INTRODUCTION

Fruits play a significant role in human lives by providing the vitamins, minerals, and dietary fibers necessary for growth and development (Kader 2001). Consuming fruits and vegetables is necessary to improve human health and to address food insecurity (Gibson et al. 1998). Tribal peoples hold the topmost position among the undernourished sections of society, as there is always uncertainty regarding their food supply (Dey & Bisai 2019). Nandal and Bhardwaj (2014) reported that exploiting horticultural crops that have not yet been utilized could offer a way out of this population's deteriorating health and nutritional insecurity. Owing to its varied climatic conditions and soil types, India holds a prominent position in world horticulture (Ravindran et al. 2007). According to FAO statistics (2021), India currently accounts for around 12 percent of total fruit production worldwide. Also, the fruit yield in India (14.66 tons per ha) is slightly greater than that of the world (13.67 tons per ha). Major fruit crops cultivated in India include mango, banana, papaya, apple, grape, pineapple, and guava (Bairwa et al. 2012). Currently, India accounts for around 3.16, 26.28, 4.00, 45.13, and 6.46 percent of the world's production of apples, bananas, grapes, mangoes (counted together with mangosteens and guavas), and pineapples, respectively.

The role of fruit in bringing nutritional and economic security to the country's people is well established (Chadha 2002). As a result, the nation must increase fruit output through sound legislation. It is also crucial to develop ways of enhancing output and productivity. Such production strategies and policies require accurate predictions of fruit production. Hence, in this study, the production of apples, bananas, grapes, mangoes, mangosteens, guavas, and pineapples in India is predicted using data from 1961–2015 (training data set) and 2016–2020 (testing data set). Many similar studies have been done in the past.

Ray et al. (2016a) estimated autoregressive integrated moving average (ARIMA) and generalized autoregressive conditional heteroscedasticity (GARCH) models of fruit cultivated area and production in India. They found that the ARIMA model was superior to the GARCH model for the area and production data series. Ahmad et al. (2005) employed the ARIMA model to forecast the production of kinnow mandarins in Pakistan. Hamjah (2014) used the Box–Jenkins ARIMA model to forecast mango, banana, and guava production in Bangladesh and found ARIMA (2,1,3), ARIMA (3,1,2), and ARIMA (1,1,2), respectively, to be the best models. Sharma et al. (2014) forecasted the production of apples in Himachal Pradesh using the ARIMA model and found that ARIMA (0,1,1) was the most suitable one. Rathod and Mishra (2017) forecasted mango production in Karnataka and concluded that the weather-based stepwise regression model performed better than the ARIMA model. Eyduran et al. (2020) forecasted banana harvest area and production in Turkey using several ARIMA and exponential smoothing models and found the Brown exponential smoothing model to be the most suitable. Rueangrit et al. (2020) used the Box–Jenkins approach to forecast durian fruit production in Thailand. They used the seasonal autoregressive integrated moving average (SARIMA) model and found that SARIMA (2,1,1)(0,1,1)₁₂ was the best suited.

ARIMA and ETS models have been used and compared in this study to predict the production of the specified fruits, viz., apples, bananas, grapes, mangoes, mangosteens, guavas, and pineapples, for 2021–2027.

MATERIALS AND METHODS

The research mainly focused on characterizing the data series and developing time series models for the fruit crops (apple, banana, grape, pineapple, and mango). Basic statistics were used to summarize the data series, and the ARIMA and ETS models were used for time series modeling. All the secondary data series were collected for the period 1961 to 2020 from www.fao.org.

Descriptive Statistics

Descriptive statistics are generally associated with the central tendency and dispersion measure used to summarize the data. Mean, standard deviation (SD), standard error (SE), coefficient of variation (CV%), skewness, kurtosis, and cumulative growth rate (CGR%) were considered in our study.
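As an illustration of the measures listed above, the following Python sketch computes most of them for an arbitrary series. It is not the authors' code: the function name `describe` and the sample values are hypothetical, and the conventions used (sample standard deviation with n − 1, moment-based skewness and kurtosis with n) are assumptions. The growth rate is left out because its exact definition is not given in the text.

```python
import math
import statistics


def describe(series):
    """Return basic descriptive statistics for a list of numbers."""
    n = len(series)
    mean = statistics.fmean(series)
    sd = statistics.stdev(series)      # sample SD (divides by n - 1)
    se = sd / math.sqrt(n)             # standard error of the mean
    cv = 100.0 * sd / mean             # coefficient of variation, in percent
    # Moment-based skewness and kurtosis use the population SD (divides by n).
    sigma = statistics.pstdev(series)
    skew = sum((x - mean) ** 3 for x in series) / (n * sigma ** 3)
    kurt = sum((x - mean) ** 4 for x in series) / (n * sigma ** 4)
    return {"mean": mean, "sd": sd, "se": se, "cv": cv,
            "skewness": skew, "kurtosis": kurt}


# Hypothetical production values (thousand tons), for illustration only.
sample = [90.0, 120.0, 150.0, 200.0, 260.0, 310.0, 400.0]
stats = describe(sample)
```

The CV line can be checked by hand against Table 1: for apples, 100 × 751.9735 / 1153.465 ≈ 65.19%.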

Time series analysis

Time series analysis is the process of analyzing a set of data compiled over time. The aim is to construct a statistical model that predicts the behavior of the data series. We built two time series models, an ARIMA model and Holt's nonlinear ETS model, to forecast fruit production (apple, banana, grape, pineapple, and mango) in India.

Autoregressive integrated moving average model

The Box–Jenkins method consists of four steps: identification, estimation, diagnostic testing, and forecasting. Before implementing the Box–Jenkins model, it is crucial to assess the stationarity of the time series and any significant seasonality that must be modeled (Ray & Bhattacharyya 2020a; Ray et al. 2021; Raghav et al. 2022). To check the stationarity of time series variables, the autocorrelation function (ACF) and partial autocorrelation function (PACF) are used (Ray & Bhattacharyya 2020b), as well as some popular statistical tests such as the augmented Dickey–Fuller (1979), Phillips–Perron (1988), and Kwiatkowski–Phillips–Schmidt–Shin (1992) tests. In the identification step, the ACF and PACF are used to estimate the model order and build ARIMA (p, d, q), where p is the order of the autoregressive part, d is the order of integration (differencing), and q is the order of the moving average part. The ARIMA (p, d, q) model is denoted as $\phi(\beta)(1-\beta)^{d} Z_t = \theta(\beta)\varepsilon_t$, where $\phi(\beta) = 1 - \phi_1\beta - \phi_2\beta^2 - \phi_3\beta^3 - \cdots - \phi_p\beta^p$ (AR model with order p); $\theta(\beta) = 1 - \theta_1\beta - \theta_2\beta^2 - \theta_3\beta^3 - \cdots - \theta_q\beta^q$ (MA model with order q); $Z_t = \frac{(1-\beta)^{d}\varepsilon_t}{\nabla^{d}}$ (ARIMA model; β is the backshift operator); and $\varepsilon_t \sim WN(0, \sigma^2)$ (the error term is white noise with zero mean and constant variance).
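As a worked example of this notation, consider ARIMA (0,1,1), the order reported later for the banana series. Setting p = 0, d = 1, and q = 1 in the general form, and applying the backshift operator as βZ_t = Z_{t−1}, gives the following expansion (a sketch that follows the sign convention θ(β) = 1 − θ₁β − ⋯ used above):

```latex
\begin{aligned}
(1-\beta)\,Z_t &= (1-\theta_1\beta)\,\varepsilon_t \\
Z_t - Z_{t-1} &= \varepsilon_t - \theta_1\varepsilon_{t-1} \\
Z_t &= Z_{t-1} + \varepsilon_t - \theta_1\varepsilon_{t-1}
\end{aligned}
```

In words, each value equals the previous value plus the current shock, less a fraction θ₁ of the previous shock.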

After the model order is determined, the parameters are estimated using maximum likelihood. The optimal model is chosen using model selection criteria such as the Akaike (1974) information criterion (AIC), root mean square error (RMSE), and mean absolute error (MAE) (Mishra et al. 2022). In the last stage, in-sample and out-of-sample forecasts are used to assess the validity and predictive performance of the chosen model (Ray et al. 2016b). Because most real-world time series show an increasing or decreasing tendency, they are classified as nonstationary. First-order differencing was applied to make the data series stationary. The normality of residuals is checked using the Jarque–Bera test (Ray et al. 2016b), the independence of residuals is checked using the Box–Ljung test, and the constant variance of residuals is checked using the ARCH LM test. The RMSE was used to assess the accuracy of the predicted values. The same procedure was carried out to predict the rainfall in Western Himalayan regions using monthly, seasonal, and annual data series.
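The first-order differencing step mentioned above can be sketched in Python as follows. The helper name `difference` and the sample numbers are hypothetical; this is only an illustration, not the software routine used in the study.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing: z_t = y_t - y_(t-1)."""
    out = list(series)
    for _ in range(d):
        # Pair each value with its predecessor and subtract.
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


# A hypothetical series with a linear trend becomes constant after one difference.
trend = [100, 110, 120, 130, 140]
first_diff = difference(trend)        # [10, 10, 10, 10]
second_diff = difference(trend, d=2)  # [0, 0, 0]
```

Each round of differencing shortens the series by one observation, which matters for short annual series.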

Holt's nonlinear models

Since most time series contain volatility and do not follow a linear trend, it may be beneficial to incorporate components that account for these characteristics in the estimated model. Holt's nonlinear models combine exponential smoothing (ETS) components to form a nonlinear dynamic model. Analyzing these models in a state-space framework with probability (likelihood) calculations supports model selection and the calculation of forecast standard errors (Hyndman et al. 2002).

The model uses three primary time series components: trend (T), seasonal (S), and error (E). The trend reflects the long-term movement of the time series, whereas the error is its unpredictable part. Because annual data series were used in this study, no seasonal component was included (S = N). The components are combined in various additive and multiplicative ways to produce yt: for example, an additive model yt = T + E or a multiplicative model yt = T × E. The options for the individual components are E [A, M]; T [N, A, M, AD, MD]; S [N, A, M]; where N – none, A – additive, M – multiplicative, AD – additive damped, and MD – multiplicative damped (damping uses an additional parameter to reduce the impact of the trend over time). Accordingly, the models that we are interested in estimating (after selecting S [N]) are given in Table A.

Table A. State space equations for each of Holt's nonlinear models

Additive error models
Trend N: y_t = l_{t−1} + ɛ_t; l_t = l_{t−1} + αɛ_t
Trend A: y_t = l_{t−1} + b_{t−1} + ɛ_t; l_t = l_{t−1} + b_{t−1} + αɛ_t; b_t = b_{t−1} + βɛ_t
Trend AD: y_t = l_{t−1} + ϕb_{t−1} + ɛ_t; l_t = l_{t−1} + ϕb_{t−1} + αɛ_t; b_t = ϕb_{t−1} + βɛ_t

Multiplicative error models
Trend N: y_t = l_{t−1}(1 + ɛ_t); l_t = l_{t−1}(1 + αɛ_t)
Trend M: y_t = (l_{t−1} + b_{t−1})(1 + ɛ_t); l_t = (l_{t−1} + b_{t−1})(1 + αɛ_t); b_t = b_{t−1} + β(l_{t−1} + b_{t−1})ɛ_t
Trend MD: y_t = (l_{t−1} + ϕb_{t−1})(1 + ɛ_t); l_t = (l_{t−1} + ϕb_{t−1})(1 + αɛ_t); b_t = ϕb_{t−1} + β(l_{t−1} + ϕb_{t−1})ɛ_t

α – smoothing factor for the level, β – smoothing factor for the trend, ϕ – damping coefficient; l – initial level components, b – initial growth components, estimated as part of the optimization problem
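To make the recursions concrete, here is a minimal Python sketch of the additive-error model with additive trend (row A of the additive error models). The function names `holt_additive_filter` and `forecast_ahead`, the starting values, and the data are hypothetical. In practice α, β, and the initial states are estimated by optimization, which this sketch does not attempt.

```python
def holt_additive_filter(y, level0, trend0, alpha, beta):
    """Run the additive-error, additive-trend recursions over a series.

    For each t:  forecast = l_(t-1) + b_(t-1)
                 e_t      = y_t - forecast
                 l_t      = l_(t-1) + b_(t-1) + alpha * e_t
                 b_t      = b_(t-1) + beta * e_t
    Returns the one-step-ahead forecasts and the final (level, trend) state.
    """
    level, trend = level0, trend0
    forecasts = []
    for obs in y:
        forecast = level + trend
        error = obs - forecast
        forecasts.append(forecast)
        level = level + trend + alpha * error
        trend = trend + beta * error
    return forecasts, (level, trend)


def forecast_ahead(level, trend, horizon):
    """Point forecasts for h = 1..horizon from the final state: l + h * b."""
    return [level + h * trend for h in range(1, horizon + 1)]


# Hypothetical data lying exactly on a line, so every one-step error is zero.
data = [12.0, 14.0, 16.0, 18.0]
fitted, (last_level, last_trend) = holt_additive_filter(
    data, level0=10.0, trend0=2.0, alpha=0.5, beta=0.1)
future = forecast_ahead(last_level, last_trend, horizon=3)  # [20.0, 22.0, 24.0]
```

When an observation departs from its forecast, α controls how much of the error moves the level and β how much moves the trend; with both set to zero the forecasts stay on the initial straight line.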

Model Selection Criteria

The best model was selected based on the following goodness of fit criteria:

$\text{MAE} = \frac{\sum_{t=1}^{n} \left| \hat{y}_t - y_t \right|}{n}$

$\text{RMSE} = \sqrt{\frac{\sum_{t=1}^{n} \left( \hat{y}_t - y_t \right)^2}{n}}$

$\text{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{\hat{y}_t - y_t}{y_t} \right| \times 100$

$\text{Theil} = \frac{\sqrt{\sum_{t=1}^{n} \left( \hat{y}_t - y_t \right)^2 / n}}{\sqrt{\sum_{t=1}^{n} \hat{y}_t^2 / n} + \sqrt{\sum_{t=1}^{n} y_t^2 / n}}$

$\text{BP} = \frac{\left( \bar{\hat{y}} - \bar{y} \right)^2}{\sum_{t=1}^{n} \left( \hat{y}_t - y_t \right)^2 / n}$

$\text{VP} = \frac{\left( s_{\hat{y}} - s_y \right)^2}{\sum_{t=1}^{n} \left( \hat{y}_t - y_t \right)^2 / n}$

$\text{CVP} = \frac{2 \left( 1 - r \right) s_{\hat{y}} s_y}{\sum_{t=1}^{n} \left( \hat{y}_t - y_t \right)^2 / n}$

where MAE is the mean absolute error index; RMSE is the root mean square error index; MAPE is the mean absolute percentage error; Theil is the Theil index; BP is the bias proportion; VP is the variance proportion; and CVP is the covariance proportion. Notably, the BP index determines how far the mean of forecast values is from the mean of the actual values, the VP index measures the proportion of variation in forecast values from the variation in actual values, and the CVP measures the remaining unsystematic errors.

Moreover, $\hat{y}_t$ and $y_t$ are the prediction and actual series, respectively; $\bar{\hat{y}}$ and $\bar{y}$ are the means of the forecast and actual series, respectively; $s_{\hat{y}}$ and $s_y$ are the standard deviations of the forecast and actual series, respectively; and r is the correlation coefficient between the forecast and actual series.
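The criteria defined above can be computed directly, as in the following Python sketch. The function name `forecast_metrics` and the numbers are hypothetical, and the use of population standard deviations (dividing by n) is an assumption, chosen because it makes BP, VP, and CVP sum to one.

```python
import math


def forecast_metrics(actual, forecast):
    """Compute MAE, RMSE, MAPE, Theil, and the BP/VP/CVP decomposition."""
    n = len(actual)
    errors = [f - a for f, a in zip(forecast, actual)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    theil = rmse / (math.sqrt(sum(f * f for f in forecast) / n)
                    + math.sqrt(sum(a * a for a in actual) / n))
    mean_a = sum(actual) / n
    mean_f = sum(forecast) / n
    # Population standard deviations (divide by n), assumed here.
    s_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    s_f = math.sqrt(sum((f - mean_f) ** 2 for f in forecast) / n)
    cov = sum((f - mean_f) * (a - mean_a)
              for f, a in zip(forecast, actual)) / n
    r = cov / (s_f * s_a)                       # Pearson correlation
    bp = (mean_f - mean_a) ** 2 / mse           # bias proportion
    vp = (s_f - s_a) ** 2 / mse                 # variance proportion
    cvp = 2.0 * (1.0 - r) * s_f * s_a / mse     # covariance proportion
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "Theil": theil,
            "BP": bp, "VP": vp, "CVP": cvp}


# Hypothetical actual and forecast values, for illustration only.
actual = [100.0, 120.0, 140.0, 160.0, 180.0]
forecast = [110.0, 115.0, 150.0, 155.0, 190.0]
metrics = forecast_metrics(actual, forecast)
```

A forecast with small BP and VP and with CVP close to one has errors that are mostly unsystematic, which is the desirable case.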

To support the decision concerning validity of forecasting, Diebold–Mariano, Giacomini–White, and Clark–West tests were used to compare ARIMA and Holt's nonlinear ETS model using Eviews 12 and 13, and Stata 17 software.

RESULTS AND DISCUSSION
Basic statistics

The first step in this analysis is to look at the behavior of the data series through descriptive statistics (Table 1) for 1961 to 2020, which show that bananas and mangoes were the most abundant fruits in India during the study period. Table 1 shows that apple production ranged from 90 to 2891 thousand tons, with an average of 1153.465 thousand tons. Banana production increased from 2257 to 31504 thousand tons over the study period, with an average of 12282.65 thousand tons. Grape production ranged from 70 to 3125 thousand tons, with an average of 889.9539 thousand tons. Mango production ranged from 6988 to 25631 thousand tons, with an average of 11091.62 thousand tons and a standard deviation of 4863.780 thousand tons. Pineapple production increased from 76 to 1984 thousand tons, with an average of 904.2272 thousand tons. Furthermore, according to the table, the grape series has the highest coefficient of variation (104.38%), followed by bananas (82.68%). The skewness is positive in all cases, showing a steady increase in production over the research period. Except for mangoes, the kurtosis values are below 3 (platykurtic distributions), suggesting that the data have no extreme outliers. The Jarque–Bera test was used to check the normality of the series: the banana, grape, and mango series do not follow a normal distribution, as their p-values are less than 0.05. On the other hand, the plotted data (Figs. 1, 2) show that all the series had a linear trend with little volatility, which suggests that none of the series is stationary in levels; for this reason, the next step in this study is to examine the stationarity of the series using different tests.

Table 1. Descriptive statistics of the empirical data set (production in thousand tons)

Statistic APPLE BANANA GRAPE MANGO PINEAPPLE
Mean 1153.465 12282.65 889.9539 11091.62 904.2272
Median 1120.822 7503.050 414.3915 9559.610 837.1935
Maximum 2891.000 31504.00 3125.000 25631.00 1984.000
Minimum 90.00000 2257.000 70.00000 6988.000 76.00000
Std. Dev. 751.9735 10155.63 928.9188 4863.780 515.1081
CV (%) 65.1925 82.6827 104.3783 43.8509 56.9666
Skewness 0.428803 0.763810 1.081578 1.592267 0.344042
Kurtosis 2.330973 2.019532 2.922950 4.824546 2.073819
Jarque–Bera (Statistic) 2.957718 8.237345 11.71295 33.67557 3.328177
Probability (Jarque–Bera test) 0.227898 0.016266 0.002861 0.000000 0.189363
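The Jarque–Bera values in Table 1 can be reproduced from the reported skewness and kurtosis. The Python sketch below uses the standard formula JB = n/6 × (S² + (K − 3)²/4); the function name `jarque_bera` is hypothetical, and n = 60 is an assumption based on the stated period of 1961 to 2020.

```python
import math


def jarque_bera(n, skewness, kurtosis):
    """Jarque-Bera statistic and p-value from n, skewness, and kurtosis."""
    stat = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Under normality the statistic is chi-square with 2 degrees of freedom,
    # whose upper-tail probability is exactly exp(-stat / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# APPLE column of Table 1, assuming n = 60 annual observations.
apple_stat, apple_p = jarque_bera(60, 0.428803, 2.330973)
```

For the apple series this gives a statistic of about 2.958 and a p-value of about 0.228, in line with the table, so normality is not rejected at the 5% level.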

Figure 1.

Forecast and actual values of the five fruits using ARIMA forecasting

Figure 2.

Forecast and actual values of the five fruits using ETS forecasting

Unit root tests

After examining the nature of the data series and before building the models, the stationarity of the data must be assessed. To test stationarity, we use two different tests to avoid spurious results: the first is the Ng–Perron test (Ng & Perron 2001), which provides four statistics (MZa, MZt, MSB, and MPT), and the second is the Dickey–Fuller test with bootstrapped critical values following the Park (2003) procedure (Table 2). The results strongly indicate that none of the variables is stationary in levels, as the calculated statistic was less than the critical value, so we cannot reject the null hypothesis (H0: the data series is not stationary). However, they are stationary in first differences, as the calculated statistic was greater than the critical value, so the null hypothesis was rejected. This also confirms that the order of integration in the ARIMA models is one or more. This result means that we should use ARIMA models in our forecasting. After the stationarity tests, and in line with the objective, we built ARIMA and ETS models for forecasting.

Table 2. Unit root test results

Fruit Ng–Perron test Park test

MZa MZt MSB MPT Statistic Critical % value
APPLE −4.655 −1.399 0.287 18.353 0.473 2.250
D (APPLE)* −44.491 −4.683 0.105 2.220 −10.140 3.058
BANANA −1.800 −0.841 0.467 42.823 0.751 2.302
D (BANANA)* −28.083 −3.747 0.133 3.246 −5.941 3.103
GRAPE −4.829 −1.323 0.274 17.601 2.367 2.166
D (GRAPE)* −66.590 −5.769 0.086 1.370 −7.225 3.032
PINEAPPLE −16.870 −2.898 0.171 5.438 0.063 2.072
D (PINEAPPLE)* −28.758 −3.783 0.131 3.215 −6.279 2.992
MANGO −1.232 −0.429 0.348 32.137 3.472 2.342
D (MANGO)* −28.891 −3.730 0.129 3.559 −7.334 3.170
Critical 5% value 17.3 2.91 0.168 5.48

* First-order difference

Model selection

After determining the stationarity of the data series, we built the statistical models. The results in Table 3 show that the ETS (Holt's nonlinear) models are the best in all cases according to the reported statistics (AIC, RMSE, MAE, MAPE, Theil, BP, VP and CVP), for which they give the smallest values; this result means that predictions using the ETS model will provide the minimum errors and deviations between forecast and actual values (Mishra et al. 2022). Nevertheless, forecasts were produced with both methods so that the results can be compared.

Table 3. Model selection: ARIMA and Holt's nonlinear ETS models and fit criteria

Fruit MODEL AIC RMSE MAE MAPE Theil BP VP CVP
ARIMA

APPLE (2, 1, 0) 13.43 258.97 215.547 35.440 0.088 0.387 0.052 0.560
BANANA (0, 1, 1) 16.90 6026.99 4972.23 80.192 0.171 0.616 0.070 0.313
GRAPE (0, 1, 2) 13.46 813.57 710.26 219.61 0.260 0.760 0.003 0.236
PINEAPPLE (0, 1, 3) 11.83 113.25 87.964 11.792 0.053 0.075 0.004 0.919
MANGO (2, 2, 4) 16.59 7706.93 6078.89 47.890 0.240 0.622 0.283 0.093

Holt's nonlinear ETS

APPLE M M A 842.86 240.93 147.58 12.97 0.086 0.014 0.026 0.95
BANANA M M A 1032.30 1085.85 696.71 5.732 0.033 0.005 0.659 19.53
GRAPE M M A 774.118 202.411 93.524 12.162 0.078 0.006 0.011 0.69
PINEAPPLE M A N 763.445 89.846 60.584 7.219 0.043 0.005 0.000 0.138
MANGO M M N 1038.17 918.334 520.421 4.014 0.038 0.022 0.353 13.84
Forecasting using ARIMA models

After building the models, we examined the forecasts obtained from the best model for each data series. Figure 1, with its confidence intervals, clearly shows significant deviations between forecast and actual values for the ARIMA models, except for the pineapple series. These results indicate that ARIMA forecasting is inappropriate for our series and that it is necessary to rely on ETS forecasting. According to the ARIMA models, apple, banana, grape, pineapple, and mango production may reach up to 2790.8844, 35096.4473, 3465.5880, 1997.4287, and 44378.1405 thousand tons, respectively, in 2027.

Forecasting using ETS

Table 4 shows the parameters of the ETS models. The smoothing factor α indicates random-walk-like behavior in the pineapple and mango series, since α is close to 1, in contrast to the apple, banana, and grape series. Furthermore, β is equal to zero in most cases, meaning that the trend component does not change over time.

Table 4. Parameters of ETS models

Fruit Parameters Initial states

α β ϕ level trend
APPLE 0.000 0.000 0.965 172.295 1.055
BANANA 0.165 0.000 1.000 2133.632 1.049
GRAPE 0.000 0.000 0.430 62.190 1.070
PINEAPPLE 0.944 0.000 - 39.588 33.480
MANGO 0.728 0.055 - 6907.784 0.055

Moreover, a damping parameter ϕ is included in the apple, banana, and grape models; it is close to one for apple and banana (0.965 and 1.000) and lower for grape (0.430). Damping reduces the effect of the trend over time. The forecasts obtained from the ETS models indicate that apple, banana, grape, pineapple, and mango production may reach up to 4741.6803, 45968.0357, 5574.1481, 2044.5578, and 33998.0305 thousand tons, respectively, in 2027 (Fig. 2).

The last step in this study is a forecast evaluation using three tests, Diebold–Mariano (2002), Giacomini–White (2006), and Clark–West (2007), to examine whether the forecasting accuracy of the two techniques differs. As mentioned before, the ETS results are the best in our study. This is supported by the tests in Table 5, where the null hypothesis of similarity between the two forecasts is rejected in most cases, and for every fruit by the Clark–West test.

Table 5. Results of statistical tests comparing the forecasts of the ARIMA and ETS models

Fruit Diebold–Mariano test Giacomini–White test Clark–West test
statistic p statistic p statistic p
APPLE 0.235 0.814 0.116 0.733 3.734 <0.001
BANANA 4.746 0.000 9.879 0.001 8.209 <0.001
GRAPE 6.017 <0.001 16.849 <0.001 10.337 <0.001
PINEAPPLE 2.052 0.046 3.182 0.074 4.163 <0.001
MANGO 4.142 <0.001 6.835 0.008 6.953 <0.001
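For readers unfamiliar with these comparisons, the idea behind the Diebold–Mariano test can be sketched in Python. This is a simplified version and not the EViews or Stata routine used in the study: it assumes squared-error loss, one-step-ahead forecasts, and no autocorrelation correction, and the function name `diebold_mariano` and all numbers are hypothetical.

```python
import math


def diebold_mariano(actual, forecast_a, forecast_b):
    """Simplified Diebold-Mariano statistic with squared-error loss.

    A positive statistic means forecast A has the larger average loss.
    """
    # Loss differential between the two forecasts at each time point.
    d = [(fa - y) ** 2 - (fb - y) ** 2
         for y, fa, fb in zip(actual, forecast_a, forecast_b)]
    n = len(d)
    d_bar = sum(d) / n
    var_d = sum((x - d_bar) ** 2 for x in d) / (n - 1)
    stat = d_bar / math.sqrt(var_d / n)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value


# Hypothetical test-period values and two competing forecasts.
actual = [10.0, 12.0, 14.0, 16.0, 18.0]
model_a = [13.0, 8.0, 17.0, 12.0, 22.0]
model_b = [11.0, 11.0, 15.0, 15.0, 19.0]
dm_stat, dm_p = diebold_mariano(actual, model_a, model_b)
```

With only a handful of test observations, as in this toy example, the normal approximation is rough, so the p-value should be read with caution.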
CONCLUSIONS

The strategy for fruit production in India should be developed based on accurate predictions and the best forecasting models. This study focused on forecasting the production of apples, bananas, grapes, mangoes, mangosteens, guavas, and pineapples in India using data from 1961–2015 (training data set) and 2016–2020 (testing data set). Two unit root tests were used: the Ng–Perron (2001) test and the Dickey–Fuller test with bootstrapped critical values following the Park (2003) procedure. The results show that all variables are stationary in first differences. ARIMA and ETS models were used and compared based on goodness of fit (RMSE, MAE, MAPE, Theil, BP, and VP); the results indicated that the ETS model was the best in all cases, as the predictions using ETS had the smallest errors and deviations between forecast and actual values. This result was confirmed using three tests (Diebold–Mariano, Giacomini–White, and Clark–West). The ETS model can consider different combinations of trend and seasonal components and can capture the complex nonlinear nature of the data series. Also, stationarity is not required, and it is often easier to deal with missing data. The ETS model outperforms other models such as ARIMA when dealing with small sample sizes, nonnormal data distributions, nonstationarity, trend, and seasonality. Forecasts for production during 2021–2027 were obtained from the best models. An increase in production is expected for apples, bananas, grapes, mangoes, mangosteens, guavas, and pineapples in India during this period. These projections may enable policy-makers in India to establish more active nutrition security and sustainability programs, as well as improved fruit production regulations, in the coming years.
Overall, fruit production forecasting empowers policymakers with crucial information for sustainable agriculture planning, resource allocation, market regulation, and food security strategies. By leveraging this data effectively, policymakers can make informed decisions to enhance fruit production, improve food security, and promote sustainability in the agricultural sector.

eISSN:
2353-3978
Language:
English
Publication frequency:
2 times per year
Journal subjects:
Life Sciences, Biotechnology, Plant Science, Ecology, other