Today, buildings have become the main consumers of world energy use (Cao et al. 2016). It is stated that 50% of the energy is spent on heating and cooling in buildings and industry (
Moreover, several studies focused on investigating the HDD patterns to understand and predict the heating demand of the buildings. Elizbarashvili et al. (2018) estimated daily, monthly, and annual HDD and CDD for 14 different sites of Georgia based on daily mean air temperature data for the 30-year period (1961–1990). The results show that there are significant differences for HDD and CDD among the 14 examined cities of Georgia. OrtizBeviá et al. (2012) estimated trends and interannual variability in the evolution of HDD and CDD in Spain from observations at 31 stations for an extended period of 1958–2005. The results show that there is a trend which is found to be statistically significant at roughly two-thirds of the Spanish stations used in the study. Although OrtizBeviá et al. (2012) concluded that these trends are similar to those obtained from observations in other parts of Europe, and France is said to be the most temperature-sensitive country in Europe (
This article is based on and is an extended version of a paper presented at the CCC2019 (Kuru and Calis 2019). This study aimed to construct short-term forecast models by analyzing the patterns of the HDD. In this context, two-time series analyses, namely the decomposition and Box–Jenkins methods, were conducted, and the performance of the models was assessed. The following sections describe datasets and the methodology. Next, the findings are presented. Finally, discussion and conclusions are provided.
In this study, the monthly HDD data in France were obtained from the official website of the Statistical Office of the European Union (2019). A total of 528 data covering the period of January 1974 and December 2017 were used to develop the forecasting model. The unit of data is°C * day. Table 1 presents a part of the HDD data used in the analysis.
A part of the HDD data from 1974 to 2017.
434.2 | 385.8 | 327.9 | 248.8 | 128.5 | 24.71 | 4.658 | 2.940 | 30.13 | 150.3 | 297.2 | 398.4 | |
368.1 | 345.4 | 386.7 | 270.5 | 186.6 | 77.73 | 17.10 | 11.21 | 40.27 | 215.6 | 331.3 | 469.7 | |
414.6 | 402.0 | 320.5 | 200.5 | 117.4 | 31.28 | 9.428 | 8.295 | 77.75 | 184.5 | 222.4 | 279.9 | |
363.3 | 341.3 | 347.9 | 254.1 | 135.3 | 41.40 | 12.62 | 8.048 | 21.08 | 180.9 | 289.1 | 390.3 | |
486.8 | 321.9 | 256.0 | 236.1 | 109.4 | 24.20 | 6.902 | 10.01 | 71.63 | 113.2 | 300.3 | 400.1 |
A time series consists of four different components, namely trend, seasonal, cyclical, and random components. The individual component analysis of the time series is the basis for the decomposition method. There are two decomposition approaches, which are additive and multiplicative decomposition. In the analysis of consecutive data, the multiplicative decomposition approach is preferred. Therefore, in this study, this approach was used, in which time series can be expressed as the product of the four components of the series and are Y = T.S.C.I, where Y represents the original data, whereas T, S, C, and I represent the trend, seasonal, cyclical, and random components of the series, respectively (Calis et al. 2017). In this method, first, the effect of the random change is reduced by applying an exponential correction to the data. Then, the seasonal and trend components of the data are eliminated by calculating the seasonal index and trend equation, respectively. Finally, the model was created assuming that the seasonal and trend components are in multiplication.
The Box–Jenkins method, also known as the autoregressive integrated moving average (ARIMA) model, was used for time-based analysis and time-based modeling of HDD data. This method applies the autoregressive moving average (ARMA) or ARIMA models to find the best fit of a time series model to the historical values of a time series (Baldigara and Koic 2015; Box et al. 2015). The seasonal autoregressive integrated moving average model (SARIMA) is an expanded form of ARIMA. SARIMA processes are designed to model trends, seasonal patterns, and correlated time series and have proven to be successful in estimating short-term fluctuations (Kam et al. 2010; Baldigara and Koic 2015). The SARIMA model consists of (1) automatic regression, (2) difference, and (3) moving average. SARIMA model is represented as SARIMA (p, d, q) (P, D, Q)S in which p, d, q represent the degree of nonseasonal linear autoregressive model (AR), non-seasonal difference, and degree of nonseasonal moving average (MA) model, respectively, whereas P, D, Q represent the degree of seasonal AR, seasonal difference, and the degree of seasonal MA. In addition, the seasonality length is represented by S. The d and D parameters are considered when the series is not stationary. In this study, to select the most appropriate SARIMA (p, d, q) (P, D, Q)S model for HDD series, the significance of the coefficients of the models was checked by the Ljung-Box-Pierce Chi-squared statistics and
To evaluate the performances of the models developed by the decomposition and Box–Jenkins methods, the accuracy of the fitted values was determined by calculating mean absolute percentage error (MAPE), mean absolute deviation (MAD), and mean squared deviation (MSD), which are formulated as follows:
Figure 1 shows the plot of a time series graph to evaluate the overall behavior of the HDD series over time. Figure 1 indicates that the HDD series shows similar periodic fluctuations per month which specifies that the series has a seasonal effect, in other words, it has a seasonality characteristic. Therefore, it can be concluded that the series is not stable.
In addition to the time series graph, the correlogram of the series was plotted for the
When the correlogram of the series (Figure 2) is examined, it is observed that the seasonality shows a structure similar to each other in terms of 12-month delays. In other words, there is a strong association between seasonal neighboring observations. For example, ACF (1), ACF (13), and ACF (25) show similar positive autocorrelations, which are statistically significant (
In this study, the ratio-to-moving-average method was used for calculating the seasonal index. In this method, first, the moving averages were calculated and the effects of seasonal and random changes were eliminated. Since the period numbers were equal to the season type numbers and the months of the year represented the seasons in the HDD data set, 12-months moving averages were calculated. Then, using the calculated moving averages, the central moving averages and the ratios of the actual data to these averages were calculated. Most of the random changes were eliminated by calculating the arithmetic mean of these ratios.
It should be noted that the sum of the calculated averages of each month should be 12.00. However, the fact that the sum of the averages was obtained as 11.99, the correction factor (1.0008) was calculated and applied to the calculated averages of the months. The corrected averages, in other words, seasonal indices, were obtained by multiplying the average ratio value of each month by this correction factor (Table 2). It can be said that the number of HDD in January is 109% higher than the typical index which is 100%, whereas in July and August, it is 94% less.
Monthly rate averages and seasonal indices.
2.09 | 1.78 | 1.54 | 1.13 | 0.59 | 0.21 | 0.06 | 0.06 | 0.27 | 0.79 | 1.50 | 1.97 | 11.99 | |
2.09 | 1.78 | 1.54 | 1.13 | 0,59 | 0.21 | 0.06 | 0.06 | 0.27 | 0.79 | 1.50 | 1.97 | 12.00 | |
209 | 178 | 154 | 113 | 59 | 21 | 6 | 6 | 27 | 79 | 150 | 197 | 1200 |
Next, the actual HDD data were divided into the seasonal indices to obtain the seasonally adjusted data. Actual data and the seasonally adjusted data are shown in Figure 3.
After the seasonal effect was eliminated from the series, the trend effect of the data was determined and eliminated to develop the forecasting model. For this purpose, the trend equation was determined for the seasonally adjusted data. It should be noted that the time data are numbered as 1, 2, 3..., 528 for the trend equation. Accordingly, the trend line and trend equation were obtained, as shown in Figure 4.
The trend values were calculated by using the trend equation. The trend component was removed from the data set by dividing the seasonally adjusted data by the trend values. The graph of trend corrected data is presented in Figure 5.
After eliminating the trend effect, the random component effect in each time series was removed by the exponential correction method. In this method, the corrected time series data at time
As a result, since the random component is eliminated, the forecast equation is obtained as follows:
In this study, 79 tentative models were generated. Table 3 presents 44 of these models. It should be noted that 44 tentative models shown in the table were generated without constant. Tentative models other than those with a star are also tested by including constant.
Tentative models generated in this study.
*SARIMA (1,0,1)(1,0,1)12 | SARIMA (1,0,0)(1,1,1)12 | SARIMA (1,1,1)(1,1,0)12 | *SARIMA (1,0,2)(0,1,1)12 |
SARIMA (0,0,1)(1,0,1)12 | *SARIMA (1,0,1)(0,1,1)12 | SARIMA (0,1,0)(1,1,1)12 | SARIMA (1,0,1)(0,1,2)12 |
SARIMA (1,0,0)(1,0,1)12 | SARIMA (1,0,1)(1,1,0)12 | SARIMA (1,1,0)(1,1,0)12 | *SARIMA (0,0,2)(0,1,1)12 |
SARIMA (1,0,1)(0,0,1)12 | SARIMA (1,0,0)(1,1,0)12 | SARIMA (0,1,1)(1,1,0)12 | SARIMA (0,0,1)(0,1,2)12 |
SARIMA (1,0,1)(1,0,0)12 | SARIMA (0,0,1)(1,1,0)12 | *SARIMA (0,1,1)(0,1,1)12 | SARIMA (2,0,0)(0,1,1)12 |
SARIMA (1,0,0)(1,0,0)12 | *SARIMA (0,0,1)(0,1,1)12 | SARIMA (1,1,0)(0,1,1)12 | SARIMA (1,0,0)(0,1,2)12 |
SARIMA (0,0,1)(1,0,0)12 | *SARIMA (1,0,0)(0,1,1)12 | *SARIMA (2,0,1)(1,0,1)12 | SARIMA (2,1,1)(0,1,1)12 |
SARIMA (1,0,0)(0,0,1)12 | SARIMA (1,1,1)(1,1,1)12 | SARIMA (1,0,2)(1,0,1)12 | SARIMA (1,1,2)(0,1,1)12 |
SARIMA (0,0,1)(0,0,1)12 | SARIMA (0,1,1)(1,1,1)12 | SARIMA (1,0,1)(2,0,1)12 | SARIMA (1,1,1)(0,1,2)12 |
SARIMA (1,0,1)(1,1,1)12 | SARIMA (1,1,0)(1,1,1)12 | SARIMA (1,0,1)(1,0,2)12 | SARIMA (0,1,2)(0,1,1)12 |
SARIMA (0,0,1)(1,1,1)12 | *SARIMA (1,1,1)(0,1,1)12 | SARIMA (2,0,1)(0,1,1)12 | SARIMA (0,1,1)(0,1,2)12 |
As a first step, the models which comply with both of the following conditions were selected: (1) the models with correlations that are statistically significant; (2) the models with the Chi-squared statistics of the residuals of
The selected models in the first step and second step selection criteria.
SARIMA (1,0,1)(1,0,1)12 | 0.936654 | 828032.9 | 10.33716 | 10.37759 | |
SARIMA (1,0,1)(0,1,1)12 | 0.475825 | 812361.4 | 10.26529 | 10.29820 | |
SARIMA (0,0,1)(0,1,1)12 | 0.464422 | 831655.0 | 10.27557 | 10.30026 | |
SARIMA (1,0,0)(0,1,1)12 | 0.615053 | 1090103.0 | 10.57437 | 10.59910 | |
SARIMA (1,1,1)(0,1,1)12 | 0.721348 | 787551.5 | 10.27085 | 10.30382 | |
SARIMA (0,1,1)(0,1,1)12 | 0.715896 | 804533.5 | 10.28096 | 10.30569 | |
SARIMA (2,0,1)(1,0,1)12 | 0.936911 | 816848.7 | 10.32561 | 10.37412 | |
SARIMA (1,0,2)(0,1,1)12 | 0.487184 | 793205.5 | 10.26017 | 10.30132 | |
SARIMA (0,0,2)(0,1,1)12 | 0.467373 | 825461.2 | 10.27325 | 10.30617 |
Regarding the selection of the most appropriate forecasting model among the models selected in the first step, the criteria shown in Table 5 were considered. The higher the model’s adjusted
The actual and forecasted HDD values for 2017.
497.52 | 490.2 | 417.986 | |
303.62 | 315.16 | 361.840 | |
248.77 | 251.1 | 308.607 | |
233.89 | 228.96 | 217.635 | |
95.36 | 97.75 | 111.711 | |
14.73 | 16.71 | 32.316 | |
4.98 | 4.95 | 4.335 | |
10.36 | 9.84 | 5.148 | |
78.48 | 75 | 48.420 | |
117.86 | 127.85 | 151.317 | |
321.16 | 313.33 | 294.122 | |
411.22 | 411.29 | 395.733 |
It should be noted that the residuals should be normally distributed with zero mean and constant variance. The normal probability plot and histogram in Figure 5 show that residuals except for the last residuals approximately suit to normal distribution. In addition, “versus fits” graph shows that the variance tends to increase slightly. It should be noted that this situation can be caused by the large variation of the data in the series. In addition to the plots in Figure 3, ACF and PACF plots of the residuals, which are the most examined plots in the model selection, are presented in Figures 9 and 10.
The plots indicate that most of the autocorrelations of the residuals are zero, in other words, they are not correlated. Thus, it can be concluded that the chosen model is appropriate.
The monthly actual HDD values, as well as the monthly forecasted HDD values and the confidence intervals obtained through the SARIMA (2,0,1) (1,0,1)12 model, are shown in Figure 11. The plot shows that the forecasted HDD values are close to the actual HDD values. In addition, the actual and forecasted HDD values of 2017 are presented in Table 4, which also indicates that the forecasted HDD values are close to the actual HDD values.
In this study, two methods, namely the decomposition and Box–Jenkins, are used to forecast the number of HDD of France. The results of both methods show that forecasted values are close to the actual values. The results for the next 12 months are shown in Table 5.
In addition to the comparison of actual and forecasted values, MAPE, MAD, and MSD values have to be evaluated to determine the performance of the model. MAPE, MAD, and MSD values for both the decomposition and the Box–Jenkins methods are shown in Table 6. According to Lewis (1982), if the MAPE values are below 10%, it can be said that the model is “very good,” and for the values between 10% and 20%, it can be said that the model is “good.” Furthermore, according to Witt and Witt (1992), if the MAPE value is ≤10%, it can be said that the model has a high accuracy degree, if this value is between 10% and 20%, the model can be called a “true prediction model.” As can be seen in the Table, the multiplicative model constructed by the decomposition method is “very good” and has a high accuracy degree, whereas the SARIMA model constructed by the Box–Jenkins method is “good” and a “true prediction model.”
MAPE, MAD, and MSD values of two methods.
3.771 | 18.7363 | |
4.3675 | 19.9735 | |
33.3013 | 969.8330 |
The results show that the decomposition method yields an acceptable number of HDD forecasts for supporting accurate energy prediction models. It can be concluded that the decomposition method can be preferred to forecast the number of HDD based on time. In addition to its performance, the advantage of this method is its capability of removing irregular changes and seasonal factors within the forecast.
It can be concluded that the decomposition method can be selected to forecast the number of HDD. In addition to its performance, the advantage of this method is its capability of removing irregular changes and seasonal factors within the forecast. For seasonal (monthly, weekly, or quarterly) data, the decomposition methods are often as accurate as of the ARIMA methods and they provide additional information about the trend and cycle, which may not be available in ARIMA methods (NCSS data analysis 2019).
In this study, two different time series analysis methods, namely the decomposition and Box–Jenkins, were conducted by the monthly HDD data in France between 1974 and 2017. The multiplicative model and 79 SARIMA models were constructed by the decomposition and Box–Jenkins method, respectively. The SARIMA models were evaluated according to the adjusted