
Application of time series models for heating degree day forecasting



Introduction

Today, buildings have become the main consumers of world energy use (Cao et al. 2016). It is stated that 50% of the energy used in buildings and industry is spent on heating and cooling (EU Strategy for Heating and Cooling 2019). Moreover, heating and hot water alone constitute 79% of total final energy use in EU households (EU Strategy for Heating and Cooling 2019). The heating demand is affected by several factors such as the building shell, the type of heating system, outdoor temperature, and occupant behavior. Among these factors, the outdoor temperature is the only one directly affected by climate change (Wu et al. 1992; Neto and Fiorelli 2008; Amber et al. 2018; Li 2018). To understand the variations in the demand for energy required to heat a building due to climate change, a technical index called heating degree days (HDD) has been developed (IPCC 2007). HDD expresses the intensity of coldness (in other words, the need for heating) in a given period of time, considering the outdoor temperature and the average room temperature (Statistical Office of the European Union 2019). The calculation of HDD is based on the base temperature, which is defined as the lowest daily average air temperature that does not result in indoor heating. The value of the base temperature depends, in principle, on various factors related to the building and the environment. Using a general climatic approach, the base temperature is set to a constant 15°C in the HDD calculation. Moreover, it is assumed that the building does not need any heating at temperatures above 18°C. The HDD is calculated over a period of time (typically 1 year) by summing the differences between the daily average temperature and 18°C (weatheronline.co.uk 2019):

$$\text{If } T_{im} \leq 15^{\circ}\text{C:}\quad \text{HDD} = \sum_{i}\left(18^{\circ}\text{C} - T_{im}\right); \quad \text{otherwise the contribution of day } i \text{ is } 0,$$

in which Tim is the average air temperature of day i. As a result, if there are 10% more degree days in a given day/week/month/year, 10% more heating energy consumption is expected in that day/week/month/year when all other variables remain the same; thus, the amount of energy needed to heat a building at a given frequency is directly related to the number of HDD at that particular frequency. Therefore, HDD can be used to normalize the energy consumption of buildings with respect to heating. In addition, HDD has been indicated to be a more reliable measure of climatic impact on energy consumption than temperature alone (Mourshed 2012). Moreover, models based on HDD can be useful for evaluating the impact of design parameters (e.g., the area of windows) with less data input (Durmayaz et al. 2000; Durmayaz and Kadioglu 2003; Sarak and Satman 2003; Dombayci 2007; Bolattürk 2008; Yu et al. 2009; Idchabani et al. 2015). Accordingly, many researchers have integrated HDD into building energy demand prediction models to improve accuracy (Meng and Mourshed 2017; Trigaux et al. 2017; Özmen et al. 2018). D’Amico et al. (2019) identified the relationship between HDD and heating energy performance. Fan et al. (2019) estimated the impacts of climatic factors, including HDD and cooling degree days (CDD), on electricity demand in China. Kurekci (2016) investigated the optimum insulation thickness for building walls by using HDD and CDD values of Turkey’s provincial centers. Kohler et al. (2016) developed a new degree-day method that provides accurate estimates of annual building energy demand for space heating. It should be noted that the performance of such models might be affected by the selection of the HDD data.
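For illustration only, the Eurostat-style definition above can be computed from daily mean temperatures in a few lines of Python. This is a minimal sketch under the 15°C/18°C convention stated above; the function name and the sample temperatures are our own, and the study itself uses the precomputed monthly HDD values published by Eurostat.

```python
import pandas as pd

def monthly_hdd(daily_mean_temp: pd.Series,
                heating_threshold: float = 15.0,
                base_temp: float = 18.0) -> pd.Series:
    """Eurostat-style heating degree days, aggregated per month.

    A day contributes (base_temp - T_im) degree days only when its mean
    temperature T_im is at or below the heating threshold; otherwise it
    contributes zero.
    """
    contribution = (base_temp - daily_mean_temp).where(
        daily_mean_temp <= heating_threshold, 0.0
    )
    return contribution.resample("MS").sum()  # unit: degC*day per month

# Tiny illustrative example (hypothetical temperatures, not the study data)
days = pd.date_range("2017-01-01", "2017-01-05", freq="D")
temps = pd.Series([2.0, 7.5, 16.0, 14.0, -1.0], index=days)
print(monthly_hdd(temps))  # (18-2)+(18-7.5)+0+(18-14)+(18-(-1)) = 49.5
```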

Moreover, several studies have focused on investigating HDD patterns to understand and predict the heating demand of buildings. Elizbarashvili et al. (2018) estimated daily, monthly, and annual HDD and CDD for 14 different sites in Georgia based on daily mean air temperature data for the 30-year period 1961–1990. The results show that there are significant differences in HDD and CDD among the 14 examined cities of Georgia. OrtizBeviá et al. (2012) estimated trends and interannual variability in the evolution of HDD and CDD in Spain from observations at 31 stations for the extended period 1958–2005. The results show a trend that is statistically significant at roughly two-thirds of the Spanish stations used in the study, and OrtizBeviá et al. (2012) concluded that these trends are similar to those obtained from observations in other parts of Europe. Furthermore, France is said to be the most temperature-sensitive country in Europe (Annual Electricity Report 2016). In a recent study, SARIMA models were constructed based on HDD data of France between 1974 and 2017. The results show that the SARIMA models yield fairly acceptable forecasts for supporting short-term forecasting of HDD (Kuru and Calis 2019).

This article is based on and is an extended version of a paper presented at the CCC2019 (Kuru and Calis 2019). This study aimed to construct short-term forecast models by analyzing the patterns of the HDD data. In this context, two time series analysis methods, namely the decomposition and Box–Jenkins methods, were applied, and the performance of the resulting models was assessed. The following sections describe the dataset and the methodology. Next, the findings are presented. Finally, the discussion and conclusions are provided.

Dataset and methodology
Dataset

In this study, the monthly HDD data for France were obtained from the official website of the Statistical Office of the European Union (2019). A total of 528 monthly observations covering the period from January 1974 to December 2017 were used to develop the forecasting models. The unit of the data is °C*day. Table 1 presents a part of the HDD data used in the analysis.

Table 1. A part of the HDD data from 1974 to 2017.

Year   Jan     Feb     Mar     Apr     May     Jun     Jul     Aug     Sep     Oct     Nov     Dec
1974   434.2   385.8   327.9   248.8   128.5   24.71   4.658   2.940   30.13   150.3   297.2   398.4
1975   368.1   345.4   386.7   270.5   186.6   77.73   17.10   11.21   40.27   215.6   331.3   469.7
...
2015   414.6   402.0   320.5   200.5   117.4   31.28   9.428   8.295   77.75   184.5   222.4   279.9
2016   363.3   341.3   347.9   254.1   135.3   41.40   12.62   8.048   21.08   180.9   289.1   390.3
2017   486.8   321.9   256.0   236.1   109.4   24.20   6.902   10.01   71.63   113.2   300.3   400.1
Methodology

A time series consists of four different components, namely the trend, seasonal, cyclical, and random components. Analyzing these components individually is the basis of the decomposition method. There are two decomposition approaches, namely additive and multiplicative decomposition. In the analysis of consecutive data, the multiplicative approach is preferred; therefore, it was used in this study. In this approach, the time series is expressed as the product of its four components, Y = T·S·C·I, where Y represents the original data, whereas T, S, C, and I represent the trend, seasonal, cyclical, and random components of the series, respectively (Calis et al. 2017). In this method, first, the effect of random changes is reduced by applying an exponential correction to the data. Then, the seasonal and trend components of the data are eliminated by calculating the seasonal index and the trend equation, respectively. Finally, the model is created by assuming that the components combine multiplicatively.
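As an illustrative sketch only (the study used Minitab and EViews rather than Python), a classical multiplicative split can be obtained with statsmodels' seasonal_decompose. Note that this routine returns three components and folds the cyclical component into the trend; the series below is a synthetic stand-in for the monthly HDD data, not the actual Eurostat values. Later sketches in this section reuse this `hdd` series.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic stand-in for the 528 monthly HDD values (France, 1974-2017):
# a strong 12-month cycle plus noise, clipped to stay strictly positive.
idx = pd.date_range("1974-01-01", periods=528, freq="MS")
rng = np.random.default_rng(42)
hdd = pd.Series(
    210 + 190 * np.cos(2 * np.pi * (idx.month - 1) / 12)
    + rng.normal(0, 15, size=len(idx)),
    index=idx,
).clip(lower=1.0)  # multiplicative decomposition needs strictly positive data

# Classical multiplicative decomposition: Y = T * S * I
# (trend and cycle are estimated together by a centred 12-month moving average).
decomp = seasonal_decompose(hdd, model="multiplicative", period=12)
print(decomp.seasonal.iloc[:12].round(2))  # monthly seasonal factors
print(decomp.trend.dropna().head())        # smoothed trend-cycle component
```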

The Box–Jenkins method, also known as the autoregressive integrated moving average (ARIMA) approach, was used for the time-based analysis and modeling of the HDD data. This method applies autoregressive moving average (ARMA) or ARIMA models to find the time series model that best fits the historical values of a series (Baldigara and Koic 2015; Box et al. 2015). The seasonal autoregressive integrated moving average (SARIMA) model is an expanded form of ARIMA. SARIMA processes are designed to model trends, seasonal patterns, and correlated time series and have proven to be successful in estimating short-term fluctuations (Kam et al. 2010; Baldigara and Koic 2015). The SARIMA model consists of (1) autoregressive, (2) differencing, and (3) moving average components. The SARIMA model is represented as SARIMA (p, d, q)(P, D, Q)S, in which p, d, and q represent the order of the nonseasonal autoregressive (AR) model, the nonseasonal differencing, and the order of the nonseasonal moving average (MA) model, respectively, whereas P, D, and Q represent the order of the seasonal AR, the seasonal differencing, and the order of the seasonal MA. In addition, the seasonality length is represented by S. The d and D parameters are considered when the series is not stationary. In this study, to select the most appropriate SARIMA (p, d, q)(P, D, Q)S model for the HDD series, the significance of the model coefficients was checked by the Ljung–Box–Pierce Chi-squared statistics and the t-test. In addition, the adjusted R2 values, the residual sum of squares, the Akaike Information Criterion (AIC), and the Schwarz Information Criterion (SIC) were considered. Furthermore, residual analysis, including the autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the residuals, was conducted. The Minitab 18.0 and EViews 10.0 software packages were used to calculate all the statistics and the model selection criteria.
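By way of example, a single candidate SARIMA(p, d, q)(P, D, Q)12 specification can be fitted with statsmodels' SARIMAX; this is a sketch with our own naming, reusing the synthetic `hdd` series from the decomposition sketch above (the study itself performed these fits in Minitab and EViews).

```python
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Fit one candidate model, e.g. SARIMA(2,0,1)(1,0,1)12 without a constant term;
# `hdd` is the monthly series constructed in the decomposition sketch above.
model = SARIMAX(hdd, order=(2, 0, 1), seasonal_order=(1, 0, 1, 12), trend="n")
result = model.fit(disp=False)

print(result.summary())        # coefficient estimates with their test statistics
print(result.aic, result.bic)  # information criteria (BIC plays the role of SIC)
```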

To evaluate the performances of the models developed by the decomposition and Box–Jenkins methods, the accuracy of the fitted values was determined by calculating the mean absolute percentage error (MAPE), mean absolute deviation (MAD), and mean squared deviation (MSD), which are formulated as follows:

$$\text{MAPE} = \frac{\sum_{t=1}^{n}\left|\left(y_t - \hat{y}_t\right)/y_t\right|}{n} \times 100 \qquad \left(y_t \neq 0\right)$$

$$\text{MAD} = \frac{\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right|}{n}$$

$$\text{MSD} = \frac{\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}{n}$$

MAPE corresponds to the relative scale of the prediction error between the forecasted value and the actual value. It should be noted that the smaller the error is in MAPE, the more accurate the prediction is (Kam et al. 2010). Similarly, the smaller the MAD and MSD are, the more accurate the prediction is.
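These three error measures are simple to compute; the sketch below mirrors the formulas above (the function names are our own), using the actual values and one of the forecast columns of Table 5 for the first three months of 2017 as sample inputs.

```python
import numpy as np

def mape(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.mean(np.abs((y - y_hat) / y)) * 100  # requires y_t != 0

def mad(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.mean(np.abs(y - y_hat))

def msd(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.mean((y - y_hat) ** 2)

# Example: actual vs forecasted HDD for Jan-Mar 2017 (cf. Table 5)
actual   = [497.52, 303.62, 248.77]
forecast = [490.20, 315.16, 251.10]
print(round(mape(actual, forecast), 2), round(mad(actual, forecast), 2),
      round(msd(actual, forecast), 2))
```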

Findings

Figure 1 shows the time series plot used to evaluate the overall behavior of the HDD series over time. Figure 1 indicates that the HDD series shows similar periodic fluctuations from month to month, which indicates that the series has a seasonal effect, in other words, a seasonality characteristic. Therefore, it can be concluded that the series is not stationary.

Fig. 1

Time series graph of the HDD series.

In addition to the time series graph, the correlogram of the series was plotted for lags up to k = 36 months (Figure 2). This correlogram was created in EViews to examine the seasonality.

Fig. 2

The correlogram of the HDD series for lags up to k = 36 months.

When the correlogram of the series (Figure 2) is examined, it is observed that the autocorrelation structure repeats itself at 12-month lags. In other words, there is a strong association between seasonally neighboring observations. For example, ACF(1), ACF(13), and ACF(25) show similar positive autocorrelations, which are statistically significant (p = 0.000 < 0.05). In addition, at the lags defined by ACF(3), ACF(15), and ACF(27), the autocorrelations are close to zero. As a result, it can be confirmed from Figure 2 that these autocorrelation structures continue systematically and regularly at the other lags. Therefore, in addition to the time series graph, the correlogram shows that there is a high association between seasonal observations of the series and that the series is not stationary.
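The same inspection can be reproduced by computing the sample ACF up to 36 lags; the following sketch uses the synthetic `hdd` series defined earlier (the study's correlogram was produced in EViews from the real data).

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.stattools import acf

# Sample autocorrelations of the monthly `hdd` series up to lag k = 36.
acf_values = acf(hdd, nlags=36)
print(acf_values[[1, 12, 13, 24, 25]].round(2))  # large positive values at seasonal lags

fig, ax = plt.subplots(figsize=(8, 3))
plot_acf(hdd, lags=36, ax=ax)  # correlogram with approximate confidence bounds
plt.show()
```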

Findings of the decomposition method

In this study, the ratio-to-moving-average method was used for calculating the seasonal indices. In this method, first, the moving averages were calculated to eliminate the effects of seasonal and random changes. Since the number of periods is equal to the number of season types, and the months of the year represent the seasons in the HDD data set, 12-month moving averages were calculated. Then, using the calculated moving averages, the centered moving averages and the ratios of the actual data to these averages were obtained. Most of the random changes were eliminated by calculating the arithmetic mean of these ratios for each month.

It should be noted that the sum of the calculated monthly averages should be 12.00. However, since the sum of the averages was obtained as 11.99, a correction factor (12.00/11.99 ≈ 1.0008) was calculated and applied to the monthly averages. The corrected averages, in other words, the seasonal indices, were obtained by multiplying the average ratio value of each month by this correction factor (Table 2). It can be said that the number of HDD in January is 109% above the typical index of 100%, whereas in July and August it is 94% below.

Table 2. Monthly rate averages and seasonal indices.

                           Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec    Total
Arithmetic mean of ratios  2.09   1.78   1.54   1.13   0.59   0.21   0.06   0.06   0.27   0.79   1.50   1.97   11.99
Seasonal index             2.09   1.78   1.54   1.13   0.59   0.21   0.06   0.06   0.27   0.79   1.50   1.97   12.00
Seasonal index (%)         209    178    154    113    59     21     6      6      27     79     150    197    1200
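The ratio-to-moving-average calculation can be expressed compactly in pandas. The sketch below follows the steps described above (12-month moving average, centering, ratios, monthly means, and the correction factor that rescales the indices to sum to 12.00), applied to the synthetic `hdd` series from the earlier sketch rather than to the study data.

```python
# `hdd` is the monthly series from the decomposition sketch above.
ma12 = hdd.rolling(window=12).mean()                   # 12-month moving averages
centered_ma = ma12.rolling(window=2).mean().shift(-6)  # classical 2x12 centered MA
ratios = hdd / centered_ma                             # actual / centered moving average

monthly_means = ratios.groupby(ratios.index.month).mean()  # arithmetic mean per month
correction = 12.0 / monthly_means.sum()                    # rescale so the indices sum to 12
seasonal_index = monthly_means * correction
print(seasonal_index.round(2))                             # cf. Table 2
```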

Next, the actual HDD data were divided by the seasonal indices to obtain the seasonally adjusted data. The actual data and the seasonally adjusted data are shown in Figure 3.

Fig. 3

The plot of the actual and seasonally adjusted HDD data sets.

After the seasonal effect was eliminated from the series, the trend effect of the data was determined and eliminated to develop the forecasting model. For this purpose, the trend equation was determined for the seasonally adjusted data. It should be noted that the time data are numbered as 1, 2, 3..., 528 for the trend equation. Accordingly, the trend line and trend equation were obtained, as shown in Figure 4.

Fig. 4

The plot of seasonally adjusted data and trend line.

The trend values were calculated by using the trend equation. The trend component was removed from the data set by dividing the seasonally adjusted data by the trend values. The graph of trend corrected data is presented in Figure 5.
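Continuing the sketch, the trend step can be reproduced by fitting a straight line to the seasonally adjusted series against the time index 1, 2, ..., 528 and then dividing the series by the fitted trend values (the actual trend coefficients in Figure 4 come from the study's data, not from this synthetic example).

```python
import numpy as np

# Seasonally adjusted series: divide each observation by its month's seasonal index
# (`hdd` and `seasonal_index` come from the previous sketches).
adjusted = hdd / hdd.index.month.map(seasonal_index).to_numpy()

t = np.arange(1, len(adjusted) + 1)                   # time index 1, 2, ..., 528
slope, intercept = np.polyfit(t, adjusted.to_numpy(), 1)
trend = intercept + slope * t                         # fitted trend line (cf. Figure 4)

detrended = adjusted / trend                          # trend-corrected series (cf. Figure 5)
print(round(float(slope), 4), round(float(intercept), 2))
```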

Fig. 5

The graph of trend corrected data.

After eliminating the trend effect, the random component was removed by the exponential correction method. In this method, the corrected time series value at time t was calculated by substituting values from 0.1 to 0.9, in steps of 0.1, for α in the equation below:

$$Y_t = \alpha y_t + \alpha\left(1-\alpha\right)y_{t-1} + \alpha\left(1-\alpha\right)^2 y_{t-2} + \ldots + \left(1-\alpha\right)^{t-1} y_1 \qquad \left(0 \leq \alpha \leq 1;\; t = 1, 2, 3, \ldots, n\right)$$

Then, the sum of the squares of the differences between the actual data and the corrected data was calculated for each α value. The smallest sum of squares was obtained for α = 0.9; therefore, this α value was chosen for forecasting future observations. As a result of the first-order exponential correction with α = 0.9, the random component was removed, and the graph of the obtained cyclic indices is given in Figure 6.
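The α grid search described above can be sketched as follows; this is our own minimal implementation of simple exponential smoothing applied to the detrended series from the previous sketch (the study obtained α = 0.9 on its own data in Minitab), not the authors' exact procedure.

```python
import numpy as np

def exponential_smoothing(y, alpha):
    """Simple exponential smoothing: s_t = alpha*y_t + (1-alpha)*s_{t-1}, with s_1 = y_1."""
    y = np.asarray(y, float)
    s = np.empty_like(y)
    s[0] = y[0]
    for t in range(1, len(y)):
        s[t] = alpha * y[t] + (1 - alpha) * s[t - 1]
    return s

series = detrended.to_numpy()          # trend-corrected series from the previous sketch
best_alpha, best_sse = None, np.inf
for alpha in np.round(np.arange(0.1, 1.0, 0.1), 1):
    smoothed = exponential_smoothing(series, alpha)
    sse = np.sum((series - smoothed) ** 2)   # squared differences, actual vs corrected
    if sse < best_sse:
        best_alpha, best_sse = alpha, sse
print(best_alpha, round(best_sse, 4))
```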

Fig. 6

Distribution of cyclic indices.

As a result, since the random component has been eliminated, the forecast equation is obtained as follows:

$$Y_c = T_c \cdot S \cdot T$$

where Yc represents the forecasted HDD values; Tc represents the HDD values from which the trend and seasonal effects have been removed (the smoothed series); S is the seasonal index; and T is the trend index. The actual HDD values and the forecasted values calculated by using this equation are presented in Figure 7.

Fig. 7

Actual and forecasted values obtained by the decomposition method.
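Putting the previous sketches together, a fitted value for each month is recovered by multiplying the smoothed series by the seasonal index of the corresponding month and by the trend value, mirroring Yc = Tc·S·T (again on the synthetic series, with our own variable names).

```python
import pandas as pd

# Recombine the multiplicative components: Yc = Tc * S * T.
# `detrended`, `seasonal_index`, `trend`, and `exponential_smoothing`
# come from the sketches above.
smoothed = pd.Series(exponential_smoothing(detrended.to_numpy(), 0.9),
                     index=detrended.index)
seasonal = detrended.index.month.map(seasonal_index).to_numpy()
fitted = smoothed * seasonal * trend           # cf. Figure 7
print(fitted.tail(12).round(1))                # fitted values for the last 12 months
```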

Findings of the Box–Jenkins method

In this study, 79 tentative models were generated. Table 3 presents 44 of these models. It should be noted that the 44 tentative models shown in the table were generated without a constant term. The tentative models other than those marked with a star were also tested with a constant term included.

Table 3. Tentative models generated in this study.

*SARIMA (1,0,1)(1,0,1)12   SARIMA (1,0,0)(1,1,1)12    SARIMA (1,1,1)(1,1,0)12    *SARIMA (1,0,2)(0,1,1)12
SARIMA (0,0,1)(1,0,1)12    *SARIMA (1,0,1)(0,1,1)12   SARIMA (0,1,0)(1,1,1)12    SARIMA (1,0,1)(0,1,2)12
SARIMA (1,0,0)(1,0,1)12    SARIMA (1,0,1)(1,1,0)12    SARIMA (1,1,0)(1,1,0)12    *SARIMA (0,0,2)(0,1,1)12
SARIMA (1,0,1)(0,0,1)12    SARIMA (1,0,0)(1,1,0)12    SARIMA (0,1,1)(1,1,0)12    SARIMA (0,0,1)(0,1,2)12
SARIMA (1,0,1)(1,0,0)12    SARIMA (0,0,1)(1,1,0)12    *SARIMA (0,1,1)(0,1,1)12   SARIMA (2,0,0)(0,1,1)12
SARIMA (1,0,0)(1,0,0)12    *SARIMA (0,0,1)(0,1,1)12   SARIMA (1,1,0)(0,1,1)12    SARIMA (1,0,0)(0,1,2)12
SARIMA (0,0,1)(1,0,0)12    *SARIMA (1,0,0)(0,1,1)12   *SARIMA (2,0,1)(1,0,1)12   SARIMA (2,1,1)(0,1,1)12
SARIMA (1,0,0)(0,0,1)12    SARIMA (1,1,1)(1,1,1)12    SARIMA (1,0,2)(1,0,1)12    SARIMA (1,1,2)(0,1,1)12
SARIMA (0,0,1)(0,0,1)12    SARIMA (0,1,1)(1,1,1)12    SARIMA (1,0,1)(2,0,1)12    SARIMA (1,1,1)(0,1,2)12
SARIMA (1,0,1)(1,1,1)12    SARIMA (1,1,0)(1,1,1)12    SARIMA (1,0,1)(1,0,2)12    SARIMA (0,1,2)(0,1,1)12
SARIMA (0,0,1)(1,1,1)12    *SARIMA (1,1,1)(0,1,1)12   SARIMA (2,0,1)(0,1,1)12    SARIMA (0,1,1)(0,1,2)12

As a first step, the models that comply with both of the following conditions were selected: (1) the model coefficients are statistically significant; and (2) the Chi-squared statistics of the residuals at lags k = 12, 24, and 36 months are statistically insignificant, in other words, the residuals of the model are not correlated with each other. As a result, nine models were selected for further analysis. In the next step, the adjusted R2 value, the residual sum of squares, the AIC, and the SIC were calculated to select the most appropriate model. The selected models and the selection criteria values are presented in Table 4.

Table 4. The selected models in the first step and the second-step selection criteria.

    Model                      Adjusted R2   Residual sum of squares   AIC        SIC
1   SARIMA (1,0,1)(1,0,1)12    0.936654      828032.9                  10.33716   10.37759
2   SARIMA (1,0,1)(0,1,1)12    0.475825      812361.4                  10.26529   10.29820
3   SARIMA (0,0,1)(0,1,1)12    0.464422      831655.0                  10.27557   10.30026
4   SARIMA (1,0,0)(0,1,1)12    0.615053      1090103.0                 10.57437   10.59910
5   SARIMA (1,1,1)(0,1,1)12    0.721348      787551.5                  10.27085   10.30382
6   SARIMA (0,1,1)(0,1,1)12    0.715896      804533.5                  10.28096   10.30569
7   SARIMA (2,0,1)(1,0,1)12    0.936911      816848.7                  10.32561   10.37412
8   SARIMA (1,0,2)(0,1,1)12    0.487184      793205.5                  10.26017   10.30132
9   SARIMA (0,0,2)(0,1,1)12    0.467373      825461.2                  10.27325   10.30617
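For illustration, the two-step screening described above can be sketched in Python with statsmodels: fit each candidate, check coefficient significance and the Ljung–Box statistics of the residuals, and then compare information criteria among the survivors. This is a simplified stand-in for the Minitab/EViews workflow, run on the synthetic `hdd` series and only a handful of candidate orders.

```python
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.statespace.sarimax import SARIMAX

# A few illustrative candidates (the study screened 79 tentative models).
candidates = [((1, 0, 1), (1, 0, 1, 12)),
              ((2, 0, 1), (1, 0, 1, 12)),
              ((0, 1, 1), (0, 1, 1, 12))]

for order, seasonal_order in candidates:
    res = SARIMAX(hdd, order=order, seasonal_order=seasonal_order,
                  trend="n").fit(disp=False)
    # Step 1: coefficient significance and whiteness of the residuals
    coeffs_significant = bool((res.pvalues < 0.05).all())
    lb = acorr_ljungbox(res.resid, lags=[12, 24, 36])
    residuals_white = bool((lb["lb_pvalue"] > 0.05).all())
    # Step 2: information criteria for comparing the surviving models
    print(order, seasonal_order,
          "significant:", coeffs_significant, "white residuals:", residuals_white,
          "AIC:", round(res.aic, 2), "BIC:", round(res.bic, 2))
```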

Regarding the selection of the most appropriate forecasting model among the models chosen in the first step, the criteria presented in Table 4 were considered. The higher the adjusted R2 value and the lower the residual sum of squares, the AIC, and the SIC are, the better the model is. Accordingly, the model that satisfies most of the criteria at the same time is selected. Table 4 shows that the adjusted R2 values of the models vary between 0.464422 and 0.936911 and that the highest values are observed for the first and the seventh models. Then, the residual sums of squares and the AIC and SIC values of these two models were compared. The values of these criteria for the seventh model are smaller than those of the first model. As a result, the seventh model, namely SARIMA (2,0,1)(1,0,1)12, was selected as the final forecasting model. However, it should be noted that the plots of the residuals should be examined to check the appropriateness of the final forecasting model. The plots obtained through Minitab are presented in Figure 8.

Table 5. The actual and forecasted HDD values for 2017.

Month      Actual (°C*day)   Forecasted, decomposition method (°C*day)   Forecasted, Box–Jenkins method (°C*day)
2017M01    497.52            490.2                                        417.986
2017M02    303.62            315.16                                       361.840
2017M03    248.77            251.1                                        308.607
2017M04    233.89            228.96                                       217.635
2017M05    95.36             97.75                                        111.711
2017M06    14.73             16.71                                        32.316
2017M07    4.98              4.95                                         4.335
2017M08    10.36             9.84                                         5.148
2017M09    78.48             75                                           48.420
2017M10    117.86            127.85                                       151.317
2017M11    321.16            313.33                                       294.122
2017M12    411.22            411.29                                       395.733

Fig. 8

Plots of the HDD series’ residuals according to the selected model.

It should be noted that the residuals should be normally distributed with zero mean and constant variance. The normal probability plot and histogram in Figure 8 show that the residuals, except for the last few, approximately follow a normal distribution. In addition, the “versus fits” plot shows that the variance tends to increase slightly. It should be noted that this situation can be caused by the large variation of the data in the series. In addition to the plots in Figure 8, the ACF and PACF plots of the residuals, which are the most commonly examined plots in model selection, are presented in Figures 9 and 10.

Fig. 9

ACF plot of residuals.

Fig. 10

PACF plot of residuals.

The plots indicate that most of the autocorrelations of the residuals are not significantly different from zero; in other words, the residuals are not correlated. Thus, it can be concluded that the chosen model is appropriate.

The monthly actual HDD values, as well as the monthly forecasted HDD values and the confidence intervals obtained through the SARIMA (2,0,1)(1,0,1)12 model, are shown in Figure 11. The plot shows that the forecasted HDD values are close to the actual HDD values. In addition, the actual and forecasted HDD values of 2017 are presented in Table 5, which also indicates that the forecasted HDD values are close to the actual HDD values.

Fig. 11

The actual values, the forecasted values, and the confidence intervals of the HDD data according to the SARIMA (2,0,1)(1,0,1)12 model.

Discussion

In this study, two methods, namely the decomposition and Box–Jenkins methods, were used to forecast the number of HDD in France. The results of both methods show that the forecasted values are close to the actual values. The results for the next 12 months are shown in Table 5.

In addition to the comparison of actual and forecasted values, the MAPE, MAD, and MSD values have to be evaluated to determine the performance of the models. The MAPE, MAD, and MSD values for both the decomposition and the Box–Jenkins methods are shown in Table 6. According to Lewis (1982), if the MAPE value is below 10%, the model can be considered “very good,” and for values between 10% and 20%, the model can be considered “good.” Furthermore, according to Witt and Witt (1992), if the MAPE value is ≤10%, the model has a high accuracy degree, and if this value is between 10% and 20%, the model can be called a “true prediction model.” As can be seen in Table 6, the multiplicative model constructed by the decomposition method is “very good” and has a high accuracy degree, whereas the SARIMA model constructed by the Box–Jenkins method is “good” and a “true prediction model.”

Table 6. MAPE, MAD, and MSD values of the two methods.

        Multiplicative model   SARIMA (2,0,1)(1,0,1)12 model
MAPE    3.7711                 18.7363
MAD     4.3675                 19.9735
MSD     33.3013                969.8330

The results show that the decomposition method yields an acceptable number of HDD forecasts for supporting accurate energy prediction models. It can therefore be concluded that the decomposition method can be preferred for forecasting the number of HDD over time. In addition to its performance, the advantage of this method is its capability of removing irregular changes and seasonal factors within the forecast. For seasonal (monthly, weekly, or quarterly) data, decomposition methods are often as accurate as ARIMA methods, and they provide additional information about the trend and cycle, which may not be available in ARIMA methods (NCSS data analysis 2019).

Conclusion

In this study, two different time series analysis methods, namely the decomposition and Box–Jenkins methods, were applied to the monthly HDD data of France between 1974 and 2017. The multiplicative model and 79 SARIMA models were constructed by the decomposition and Box–Jenkins methods, respectively. The SARIMA models were evaluated according to the adjusted R2 value, the residual sum of squares, the AIC and SIC criteria, as well as the analysis of the residuals; as a result, the SARIMA (2,0,1)(1,0,1)12 model was selected as the final forecasting model. Finally, MAPE, MAD, and MSD values were calculated to evaluate both the multiplicative and the SARIMA models. The multiplicative model constructed by the decomposition method is “very good” and has a high accuracy degree, whereas the SARIMA model constructed by the Box–Jenkins method is “good” and a “true prediction model.” Therefore, the results show that the decomposition method yields acceptable forecasts for supporting short-term forecasting of HDD. Future studies can focus on the integration of these models into forecasting models of the heating demand in buildings.
