INTRODUCTION

Drought is one of the most common environmental factors affecting plant growth and productivity in agricultural systems. Reduced water availability (water stress) induces numerous physiological, biochemical, and molecular changes in plant organs, reducing both the size and the quality of the crop yield (Klamkowski et al. 2015). Most forecasting scenarios predict a further rise in temperature with unchanged or even decreasing precipitation (Schneider et al. 2010; Polade et al. 2017), and water is essential for irrigation in areas with insufficient rainfall. Water availability therefore determines, in extreme cases, the ability of regions and even entire countries to develop.

The main weather factors limiting crop cultivation are the amount of solar radiation, the level and fluctuation of temperature, and the amount of precipitation. Chartzoulakis and Bertaki (2015) highlighted that the most popular irrigation methods have low efficiency, as crops use less than 65% of the water applied to the field. Aryalekshmi et al. (2021) stated that the leading cause of water resource depletion is irrigation, which is responsible for about 70% of water consumption worldwide. Increasing the efficiency of water use requires not only modern irrigation equipment but also precise methods for determining the water needs of plants (Gu et al. 2020). The timing of irrigation can be determined based on plant criteria (Klamkowski & Treder 2002; Jones 2004; Fernández & Cuevas 2010), soil criteria (Vereecken et al. 2008; Yu et al. 2021; Treder et al. 2022), and climatic criteria (Doorenbos & Pruitt 1977). Climatic criteria for irrigation assume that water requirements are determined mainly by the course of the weather and the phase of plant development (Ley et al. 1994). In the 1940s, Thornthwaite (1948) described the theoretical basis for the effects of climatic conditions on evaporation as a means of water loss from plant and soil surfaces. Evaporation parameters are used to estimate the plants’ water needs, determine climatic water balances, manage water resources, design efficient irrigation systems, and estimate plant growth. Accurate estimation of evapotranspiration is crucial for sustainable water management in agriculture (Wanniarachchi & Sarukkalige 2022).

Farmers can make irrigation decisions based on weather conditions. Estimating the water requirements of a specific crop (ETc) can be done by multiplying the crop coefficient (K) by the reference evapotranspiration (ETo).

$$\mathrm{ET_c} = K \times \mathrm{ET_o}$$
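As a minimal illustration of this relation, the sketch below multiplies a reference evapotranspiration value by a crop coefficient. The function name and the numeric values (K = 0.9, ETo = 4.0 mm·day−1) are hypothetical and are not taken from this study.

```python
def crop_evapotranspiration(k: float, eto_mm: float) -> float:
    """Return the crop water requirement ETc (mm/day) as K x ETo."""
    if k < 0 or eto_mm < 0:
        raise ValueError("K and ETo must be non-negative")
    return k * eto_mm


# Hypothetical inputs, for illustration only.
etc = crop_evapotranspiration(k=0.9, eto_mm=4.0)
print(f"ETc = {etc:.2f} mm/day")
```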

Reference evapotranspiration is determined using evaporometers or estimated using mathematical models (Allen 1993; Allen et al. 1998; Yuan et al. 2003). Many models have been developed to determine evapotranspiration (Doorenbos & Pruitt 1977; Hargreaves & Samani 1985; Allen 1993; Gocic & Trajkovic 2010; Sentelhas et al. 2010; Wanniarachchi & Sarukkalige 2022). The Food and Agriculture Organization of the United Nations (FAO) has defined a standard method for estimating ETo based on the Penman–Monteith equation, which was modified and reformulated by Allen et al. (1998) as the reference equation (FAO-PM56). Pereira et al. (2015) report that comparative studies conducted under different climatic conditions demonstrated the superiority of the FAO-PM56 method for estimating ETo over other equations and empirical methods. However, the application of this equation is limited by the availability of meteorological data. According to Gocic and Trajkovic (2010), these data are not readily available for some locations, especially in developing countries, and may be of questionable reliability. The Penman–Monteith model is used in automatic weather stations, software solutions such as CROPWAT 8.0 and ETo Calculator (Gabr 2022; Lykhovyd 2022), and online computing platforms, e.g., www.nawadnianie.inhort.pl/eto (Treder et al. 2013).

The need for a complete set of meteorological data, including solar radiation, limits the widespread use of the Penman–Monteith equation. In most cases, the most expensive component of an automatic weather station is the radiation probe. In agriculture, automatic weather stations are used primarily to monitor basic parameters such as air temperature and humidity, precipitation, and wind strength and direction, which are sufficient to predict the occurrence of diseases and pests. Because of the high price of the radiation sensor, most stations are not equipped with one and do not determine evapotranspiration. Thus, methods such as machine learning are needed to estimate evapotranspiration when data access is limited. The application of machine learning techniques in hydrology and environmental studies, including agriculture, has increased recently (Yamaç & Todorovic 2020; Treder et al. 2023). The large amount of data obtained using automatic weather stations allows machine learning programs to determine evapotranspiration (Kumar et al. 2011; Mehdizadeh 2018). Cobaner (2011) developed a method for estimating evapotranspiration based on a fuzzy system trained by a learning algorithm derived from neural network theory. That neuro-fuzzy model also relies on solar radiation, air temperature, and relative humidity. Adnan et al. (2017), Antonopoulos and Antonopoulos (2017), and El-Magd et al. (2023) also demonstrated the usefulness of machine learning methods for determining evapotranspiration with limited availability of measurement data.

The present study investigated the performance of four machine learning algorithms (regression trees, boosted trees, random forests, and artificial neural networks) for estimating ETo based on incomplete meteorological data.

MATERIALS AND METHODS

Weather conditions were recorded using an automated meteorological station (Metos, Pessl Instruments, Austria) located in the Experimental Orchard in Skierniewice (Poland; λ = 20°16′; φ = 51°95′; altitude = 120 m) in the years 2009–2022. The measured meteorological parameters were air temperature (T), solar radiation (solar irradiance, SR), air relative humidity (RH), and wind speed at 2 m height (U2). The following parameters were determined from the meteorological data for each day from April 15 to October 15:

Extraterrestrial solar radiation onto a horizontal surface at the top of the atmosphere (Ra) was calculated using the application located at www.engr.scu.edu/~emaurer/tools/calc_solar_cgi.pl. Ra is constant for a specific time and place on Earth.
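Because Ra depends only on the date and the latitude, it can also be computed directly. The sketch below uses the daily expressions given in FAO-56 (Allen et al. 1998); whether the online application cited above uses exactly these expressions is an assumption, and the function name and the example latitude of 52.0° N are hypothetical.

```python
import math

GSC = 0.0820  # solar constant in MJ m-2 min-1, as used in FAO-56


def extraterrestrial_radiation(day_of_year: int, latitude_deg: float) -> float:
    """Daily extraterrestrial radiation Ra (MJ m-2 day-1), FAO-56 daily formulation.

    Valid outside the polar day/night zones, where the sunset hour angle is defined.
    """
    phi = math.radians(latitude_deg)
    angle = 2.0 * math.pi * day_of_year / 365.0
    dr = 1.0 + 0.033 * math.cos(angle)        # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(angle - 1.39)    # solar declination (rad)
    omega_s = math.acos(-math.tan(phi) * math.tan(delta))  # sunset hour angle (rad)
    return (24.0 * 60.0 / math.pi) * GSC * dr * (
        omega_s * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(omega_s)
    )


# Hypothetical example: mid-July at roughly 52 degrees north.
ra_example = extraterrestrial_radiation(day_of_year=196, latitude_deg=52.0)
print(f"Ra = {ra_example:.1f} MJ m-2 day-1")
```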

Vapor pressure deficit – VPD (kPa):

$$\mathrm{VPD} = e_s - e_a$$

es – saturation vapor pressure;

es = 0.6108 × exp(17.27 × T/(T + 237.3));

ea – actual vapor pressure;

ea = (RH × es/100);

T – air temperature;

RH – relative air humidity.
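The three expressions above translate directly into code. The following is a minimal sketch; the function names and the example inputs (T = 20 °C, RH = 50%) are hypothetical.

```python
import math


def saturation_vapor_pressure(t_celsius: float) -> float:
    """Saturation vapor pressure es (kPa) at air temperature T (deg C)."""
    return 0.6108 * math.exp(17.27 * t_celsius / (t_celsius + 237.3))


def actual_vapor_pressure(t_celsius: float, rh_percent: float) -> float:
    """Actual vapor pressure ea (kPa) from relative humidity RH (%)."""
    return rh_percent * saturation_vapor_pressure(t_celsius) / 100.0


def vapor_pressure_deficit(t_celsius: float, rh_percent: float) -> float:
    """Vapor pressure deficit VPD = es - ea (kPa)."""
    es = saturation_vapor_pressure(t_celsius)
    ea = actual_vapor_pressure(t_celsius, rh_percent)
    return es - ea


# Hypothetical example values.
vpd = vapor_pressure_deficit(t_celsius=20.0, rh_percent=50.0)
print(f"VPD = {vpd:.3f} kPa")
```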

Evapotranspiration – ETo (mm) is determined by the automatic weather station according to the Penman–Monteith formula (Allen et al. 1998):

$$\mathrm{ET_o} = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,u_2)}$$

ETo – reference evapotranspiration (mm·day−1);

Δ – slope of the vapor pressure curve (kPa·°C−1);

Rn – net radiation at crop surface (MJ·m−2·day−1);

G – soil heat flux density (MJ·m−2·day−1);

T – mean daily air temperature at 2 m height (°C);

u2 – mean daily wind speed at 2 m height (m·s−1);

es – saturation vapor pressure (kPa);

ea – actual vapor pressure (kPa);

es−ea = VPD (kPa);

γ – psychrometric constant (kPa·°C−1).
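To make the structure of the formula concrete, the sketch below evaluates it for a single day once all terms are known. The function name and the input values are hypothetical; they are not measurements from the Skierniewice station, and the derivation of Δ, γ, Rn, and G from raw sensor data is deliberately left out.

```python
def penman_monteith_eto(delta: float, rn: float, g: float, gamma: float,
                        t_mean: float, u2: float, es: float, ea: float) -> float:
    """Reference evapotranspiration ETo (mm/day) from the Penman-Monteith formula.

    delta, gamma in kPa/degC; rn, g in MJ m-2 day-1; t_mean in degC;
    u2 in m/s; es, ea in kPa.
    """
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * (900.0 / (t_mean + 273.0)) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))


# Hypothetical daily values for a mild summer day.
eto = penman_monteith_eto(delta=0.122, rn=13.28, g=0.14, gamma=0.0666,
                          t_mean=16.9, u2=2.078, es=1.997, ea=1.409)
print(f"ETo = {eto:.2f} mm/day")
```

Splitting the numerator into a radiation term and an aerodynamic term mirrors the way the formula is discussed later in the text, where the relative size of the two terms depends on the climate.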

This study used four machine learning techniques to estimate evapotranspiration (ETo). These techniques included regression trees (RT), boosted trees (BRT), random forests (RF), and artificial neural networks (ANN).

Before assessing the possibility of using machine learning for ETo estimation, a correlation matrix between meteorological input variables and ETo was developed (Table 1). Due to the low correlation with ETo (0.05), the wind speed data were not considered for further analyses.

Table 1.

Pearson correlation coefficients between daily evapotranspiration (ETo) and meteorological data

        SR      Tavg    Tmax    RH      U2      Ra      VPD     #D
ETo     0.94    0.66    0.72    −0.69   0.05    0.62    0.82    −0.41

SR – average daily level of solar radiation, Tavg – mean daily air temperature, Tmax – maximum daily air temperature, RH – mean air relative humidity, U2 – wind speed at 2 m height, Ra – extraterrestrial solar radiation, VPD – vapor pressure deficit, #D – day number of the year
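For reference, a Pearson coefficient of the kind reported in Table 1 can be computed as in the sketch below. The function name and the short data series are hypothetical and serve only to show the calculation.

```python
import math


def pearson_r(x: list, y: list) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical daily values: solar radiation and ETo.
sr = [10.0, 15.0, 20.0, 25.0]
eto = [2.0, 2.9, 4.1, 5.0]
print(f"r(SR, ETo) = {pearson_r(sr, eto):.2f}")
```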

To set up the models, the meteorological variables (mean and maximum air temperatures, average air humidity, VPD, average level of solar radiation, extraterrestrial solar radiation, and day number of the year) were used as input.

Due to difficulties in obtaining data on actual solar radiation by farmers, two calculation scenarios were used in the simulation: 1 – data without solar radiation and 2 – complete data with solar radiation measured at a height of 2 meters.

Machine learning methods
Regression trees

A regression tree (RT) is a tree-based method in which each path starts at a root node and follows a sequence of data splits until a result is reached at a leaf node. Tree-based methods can be used for both classification and regression. The goal is to obtain a model that predicts the target value by learning simple decision rules (Yang 2019). The classic classification and regression trees (CART) algorithm was used (Breiman et al. 1984).
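The basic step that a regression tree repeats at every node is the search for the split that minimizes the squared error of the two resulting groups. The sketch below shows this step for a single input variable; it is an illustration under simplifying assumptions, not the STATISTICA implementation, and the function name and data are hypothetical.

```python
def best_split(x: list, y: list) -> tuple:
    """Return (threshold, left_mean, right_mean) minimizing the summed squared error."""
    pairs = sorted(zip(x, y))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k][0] == pairs[k - 1][0]:
            continue  # identical values cannot be separated by a threshold
        left = [target for _, target in pairs[:k]]
        right = [target for _, target in pairs[k:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            threshold = (pairs[k - 1][0] + pairs[k][0]) / 2.0
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("x needs at least two distinct values")
    return best[1], best[2], best[3]


# Hypothetical data: VPD (kPa) as input, ETo (mm/day) as target.
vpd = [0.2, 0.4, 0.6, 1.0, 1.2, 1.4]
eto = [1.0, 1.2, 1.1, 3.0, 3.2, 3.1]
threshold, low_eto, high_eto = best_split(vpd, eto)
print(f"if VPD <= {threshold:.2f}: ETo ~ {low_eto:.2f}, else ETo ~ {high_eto:.2f}")
```

A full tree applies the same search recursively to each group, over all input variables, until a stopping rule is met.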

Boosted trees

Boosted regression tree (BRT) models combine two techniques: decision tree algorithms and boosting methods. Boosting is a committee-based approach that can be used to improve the accuracy of classification or regression methods. Boosting uses a weighted average of the results obtained by applying the prediction method to various samples (Sutton 2005).
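One widely used form of boosting for regression fits each new tree to the residuals of the current ensemble (gradient boosting with squared-error loss). The sketch below uses that variant with depth-one trees; the text above does not state which boosting variant the software applies, so this choice, the function names, the learning rate, and the data are all assumptions.

```python
def fit_stump(x: list, y: list) -> tuple:
    """Depth-one regression tree: (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(x, y))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k][0] == pairs[k - 1][0]:
            continue
        left = [t for _, t in pairs[:k]]
        right = [t for _, t in pairs[k:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[k - 1][0] + pairs[k][0]) / 2.0, ml, mr)
    if best is None:
        raise ValueError("x needs at least two distinct values")
    return best[1:]


def predict_stump(stump: tuple, xi: float) -> float:
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def fit_boosted(x: list, y: list, n_rounds: int = 50, learning_rate: float = 0.1) -> tuple:
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * predict_stump(stump, xi) for pi, xi in zip(pred, x)]
    return base, learning_rate, stumps


def predict_boosted(model: tuple, xi: float) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)


def mean_squared_error(sim: list, obs: list) -> float:
    return sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs)


# Hypothetical data: VPD (kPa) as input, ETo (mm/day) as target.
vpd = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
eto = [1.0, 1.4, 2.1, 2.9, 3.4, 4.2]
model = fit_boosted(vpd, eto)
baseline_mse = mean_squared_error([sum(eto) / len(eto)] * len(eto), eto)
boosted_mse = mean_squared_error([predict_boosted(model, v) for v in vpd], eto)
print(f"training MSE: mean only = {baseline_mse:.3f}, boosted = {boosted_mse:.3f}")
```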

Random forests

Random forest (RF) fits many trees to a dataset and combines the predictions from all the trees. For classification, the predicted class of an observation is calculated by a majority vote of the out-of-bag predictions for that observation, with ties split randomly (Cutler et al. 2007); for regression, as in this study, the predictions of the individual trees are averaged.
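The two ingredients of a random forest, bootstrap resampling of the observations and random selection of input variables, can be sketched as follows for regression. This is a deliberately reduced stand-in: each tree has a single split and uses one randomly chosen variable, whereas a full random forest grows deeper trees and draws a random subset of variables at every split. All names and data are hypothetical.

```python
import random


def fit_stump(values: list, targets: list) -> tuple:
    """Depth-one regression tree: (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(values, targets))
    overall = sum(targets) / len(targets)
    # Fallback when all values are identical: predict the overall mean.
    best = (float("inf"), pairs[0][0], overall, overall)
    for k in range(1, len(pairs)):
        if pairs[k][0] == pairs[k - 1][0]:
            continue
        left = [t for _, t in pairs[:k]]
        right = [t for _, t in pairs[k:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
        if sse < best[0]:
            best = (sse, (pairs[k - 1][0] + pairs[k][0]) / 2.0, ml, mr)
    return best[1:]


def fit_forest(rows: list, targets: list, n_trees: int = 25, seed: int = 0) -> list:
    """Fit n_trees stumps, each on a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        feature = rng.randrange(n_features)
        stump = fit_stump([rows[i][feature] for i in sample],
                          [targets[i] for i in sample])
        forest.append((feature, stump))
    return forest


def predict_forest(forest: list, row: list) -> float:
    """Average the predictions of all trees (regression)."""
    predictions = []
    for feature, (threshold, left_mean, right_mean) in forest:
        predictions.append(left_mean if row[feature] <= threshold else right_mean)
    return sum(predictions) / len(predictions)


# Hypothetical data: [Tmax (deg C), VPD (kPa)] as inputs, ETo (mm/day) as target.
rows = [[15.0, 0.3], [18.0, 0.5], [22.0, 0.8], [25.0, 1.1], [28.0, 1.5], [31.0, 1.9]]
eto = [1.5, 2.0, 2.8, 3.5, 4.3, 5.0]
forest = fit_forest(rows, eto)
prediction = predict_forest(forest, [24.0, 1.0])
print(f"predicted ETo = {prediction:.2f} mm/day")
```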

Artificial neural networks

The artificial neural network (ANN) algorithm comprises three layers: the input layer, the hidden layer, and the output layer. In the ANN algorithm, data are placed in the input layer to train the model, weights are obtained in the hidden layer, and prediction results are generated in the output layer (Xu et al. 2018). This method can learn from a training dataset and store the data pattern as weighted neuron connections. After training, when new data are applied to the ANN algorithm, it recognizes and classifies the pattern based on the data. The ANN model is one of the best-known and widely adopted machine learning methods used for modeling in hydrological research (Kumar et al. 2011).
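The three-layer structure described above can be illustrated by the forward pass of a small network: inputs are combined by weighted connections in the hidden layer, and the output layer produces the estimate. The sketch below shows prediction only, not training; the tanh activation, the function name, and all weights are assumptions chosen for illustration.

```python
import math


def ann_forward(inputs: list, hidden_weights: list, hidden_biases: list,
                output_weights: list, output_bias: float) -> float:
    """Forward pass: input layer -> one tanh hidden layer -> linear output."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical network with 2 inputs (scaled Tmax, scaled VPD) and 2 hidden neurons.
estimate = ann_forward(
    inputs=[0.4, 0.7],
    hidden_weights=[[0.8, -0.2], [0.3, 0.9]],
    hidden_biases=[0.0, -0.1],
    output_weights=[1.2, 2.0],
    output_bias=1.5,
)
print(f"network output = {estimate:.3f}")
```

Training consists of adjusting the weights and biases so that the outputs match the observed ETo values as closely as possible.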

The calculations were performed using the STATISTICA v. 12 package (StatSoft, USA). The analyses were performed in tenfold cross-validation mode. The performance of the different machine learning models was evaluated using the mean squared error (MSE), root mean square error (RMSE), coefficient of determination (R2), and slope of the regression forced through the origin between measured and simulated ETo.
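In tenfold cross-validation, the data are divided into ten parts, and each part serves once as the test set while the remaining nine are used for training. The sketch below shows how such index splits can be generated; the shuffling step, the function name, and the sample size are assumptions and do not describe the internal procedure of the software used.

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical dataset of 25 daily records.
splits = list(k_fold_indices(n_samples=25, k=10))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_number}: {len(train)} training days, {len(test)} test days")
```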

$$\mathrm{MSE} = \frac{\sum_{i=1}^{n} (S_i - O_i)^2}{n}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (S_i - O_i)^2}{n}}$$

Oi – observed value;

Si – value generated by the machine learning models.
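The evaluation measures can be computed as in the sketch below. MSE and RMSE follow the formulas above. For the other two measures the text gives no formula, so the definitions used here are assumptions: R2 is taken as the squared Pearson correlation between observed and simulated values, and the slope is that of a regression of simulated on observed values forced through the origin. The data are hypothetical.

```python
import math


def mse(sim: list, obs: list) -> float:
    """Mean squared error between simulated (S) and observed (O) values."""
    return sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs)


def rmse(sim: list, obs: list) -> float:
    """Root mean square error."""
    return math.sqrt(mse(sim, obs))


def r_squared(sim: list, obs: list) -> float:
    """Squared Pearson correlation (assumed definition of R2)."""
    n = len(obs)
    mean_s, mean_o = sum(sim) / n, sum(obs) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs))
    var_s = sum((s - mean_s) ** 2 for s in sim)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov ** 2 / (var_s * var_o)


def slope_through_origin(sim: list, obs: list) -> float:
    """Slope b of S = b * O fitted by least squares without an intercept."""
    return sum(o * s for s, o in zip(sim, obs)) / sum(o * o for o in obs)


# Hypothetical observed and simulated ETo values (mm/day).
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.2, 3.8]
print(f"MSE = {mse(simulated, observed):.3f}, RMSE = {rmse(simulated, observed):.3f}")
print(f"R2 = {r_squared(simulated, observed):.3f}, "
      f"slope = {slope_through_origin(simulated, observed):.3f}")
```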

RESULTS AND DISCUSSION

Evapotranspiration depends strongly on the amount of solar radiation, air temperature, and humidity (Fig. 1), which is accounted for by the Penman–Monteith model. In our experiment, average solar radiation explained 88% of the variation in evapotranspiration.

Figure 1.

Relationship between evapotranspiration (ETo) and important weather parameters (Skierniewice 2009–2022)

In the Polish temperate climate, solar radiation has the strongest impact on ETo of all the weather parameters. Allen et al. (1998) state that radiation-based methods perform better in humid environments where the aerodynamic term of the combination equation is relatively small. Such methods are commonly used in crops grown under cover since irrigation control by solar radiation is an efficient irrigation scheduling method in greenhouses (Kim et al. 2010). Yamaç and Todorovic (2020), who conducted research in the dry climate of Southern Italy, showed that air temperature had the highest correlation with potato evapotranspiration, with radiation being in second place. Hargreaves and Samani (1985) reported that air temperature methods for estimating ETo showed the best performance in arid and semi-arid regions. However, regardless of the climate zone, solar radiation and air temperature are the main factors influencing the ET process (Tang et al. 2018). A high coefficient of determination (R2 = 67%) has also been found between evapotranspiration and vapor pressure deficit (VPD) (Fig. 1). VPD is the difference between the amount of moisture in the air and the amount of moisture it can hold once saturated. It affects evapotranspiration and the efficiency of water use by crops. Tanner and Sinclair (1983) indicated that VPD is the dominant factor influencing the relationship between plant dry matter production and water use. VPD expresses the driving force for water loss from a leaf more accurately than relative humidity, because the same relative humidity corresponds to a different drying power of the air at different temperatures, whereas VPD already accounts for temperature. VPD is a critical parameter calculated in evapotranspiration models, especially in combination type equations and Penman–Monteith type formulas (Howell & Dusek 1995).
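The percentages quoted for solar radiation (88%) and VPD (67%) are consistent with squaring the corresponding Pearson coefficients from Table 1, as the short check below shows. That the reported values were obtained in exactly this way is an assumption.

```python
# Pearson coefficients between ETo and two inputs, taken from Table 1.
table1_r = {"SR": 0.94, "VPD": 0.82}

# For a linear relationship, the coefficient of determination is r squared.
explained_percent = {name: round(100 * r ** 2) for name, r in table1_r.items()}

for name, percent in explained_percent.items():
    print(f"{name}: r = {table1_r[name]:.2f}, R2 = {percent}%")
```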

Preliminary analysis of the data showed that daily evapotranspiration correlated better with maximum air temperature than with average air temperature because stomatal transpiration occurs primarily during the day (when maximum daily temperatures occur). This result indicates that the maximum temperature value helps estimate evapotranspiration using simplified models. Adnan et al. (2017), who described the relationship between evapotranspiration and various meteorological parameters, also showed a higher covariance value for maximum temperature than average or minimum temperature. On the other hand, Yamaç and Todorovic (2020) reported that the correlation between potato ETc and mean air temperature in dry climates was higher than with other meteorological variables.

Figures 2, 3, and 4 show the importance of the individual input variables in the RT, BRT, and RF models fitted with and without average solar radiation. For ANN, the computing software did not make such calculations available to the user. When radiation is included in the calculations, its relative importance equals 100. The next most important parameters are VPD, maximum temperature, relative humidity, and average air temperature. When solar radiation was omitted from the analyses, VPD and maximum temperature were the most important. The vapor pressure deficit significantly affects the rate of transpiration, which is why it is included in the Penman–Monteith model. VPD is determined solely from air temperature and humidity, so it may be useful for more accurate evapotranspiration estimates when access to meteorological data is limited. If radiation is omitted, the importance of Ra and of the day number of the year, which positions the day for which calculations are carried out within the growing season, also increases significantly. The actual solar radiation reaching the ground is a part of the solar radiation reaching the top of the Earth's atmosphere and, like extraterrestrial radiation, it increases in spring and decreases toward the end of the growing season (Fig. 5). In models used to determine evapotranspiration, the calculated amount of solar radiation reaching the Earth's atmosphere (Ra) can partially replace the actual solar radiation reaching the Earth's surface (Hargreaves & Samani 1985).

Figure 2.

The importance of variables when creating regression trees

SR – average solar radiation level, VPD – vapor pressure deficit, Tmax – maximum temperature, Tavg – average temperature, RH – relative humidity, Ra – extraterrestrial solar radiation, #D – day number of the year

Figure 3.

The importance of variables when creating boosted trees

Note: see Figure 2

Figure 4.

The importance of variables when creating random forests

Note: see Figure 2

Figure 5.

Net changes in extraterrestrial solar radiation (Ra) and the average level of solar radiation (SR) during the vegetation period (Skierniewice 2009–2022)

Four machine learning algorithms were run for training and testing subsets with two scenarios of available meteorological data: with and without solar radiation. The overall results of modeling performance (for measured data and those generated with individual models) are represented by statistical indicators in Table 2. For all four models, using solar radiation data in the calculations improved the quality of evapotranspiration prediction. This result is expected because the intensity of transpiration depends on the level of solar radiation, which the PM model also considers. The use of more advanced machine learning methods improved the quality of prediction. However, even the ANN without actual solar radiation data achieved a lower R2 (0.870) than the regression tree method with solar radiation data (0.911). The results showed the usefulness of machine learning methods for estimating evapotranspiration even without data on the actual level of solar radiation.

Table 2.

Statistical analysis of the performance of the RT, BRT, RF, and ANN models in estimating daily ETo with two different meteorological input datasets

Model                         Radiation   R2      Slope   MSE     RMSE
Regression trees              +           0.911   0.911   0.108   0.329
                              −           0.813   0.813   0.228   0.478
Boosted trees                 +           0.942   0.931   0.073   0.269
                              −           0.834   0.825   0.205   0.453
Random forests                +           0.952   0.895   0.066   0.256
                              −           0.841   0.799   0.207   0.455
Artificial neural networks    +           0.963   0.947   0.023   0.152
                              −           0.870   0.843   0.082   0.286

Radiation: + with solar radiation data, − without solar radiation data; R2 – determination coefficient, Slope – slope of the regression forced through the origin, MSE – mean squared error, RMSE – root mean square error

The RT model performed worse when evapotranspiration was estimated without solar radiation data. When average solar radiation data were used, the slope of the regression forced through the origin improved from 0.813 to 0.911, R2 from 0.813 to 0.911 (the slope and R2 took the same values), MSE from 0.228 to 0.108, and RMSE from 0.478 to 0.329.

In the other models, we also observed an increase in the coefficient of determination (R2) and a decrease in the prediction errors (MSE and RMSE) when meteorological data with solar radiation were used. Yamaç and Todorovic (2020) stated that solar radiation is crucial to accurately estimate daily potato evapotranspiration using machine learning models. In our research, the highest prediction accuracy was achieved using the ANN model. As with the other models, when average solar radiation data were used, the slope of the origin-forced regression improved from 0.843 to 0.947, R2 from 0.870 to 0.963, MSE from 0.082 to 0.023, and RMSE from 0.286 to 0.152. These results agree with the findings of Aghajanloo et al. (2013), Tang et al. (2018), and Yamaç and Todorovic (2020), who indicated that the ANN method with a complete set of meteorological data performed better than the other machine learning methods.
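The gain from including solar radiation can be compared across models by expressing it as a relative reduction in RMSE. The sketch below does this with the RMSE values from Table 2; the choice of this summary measure and the names used are our illustration, not part of the original analysis.

```python
# RMSE from Table 2: (with solar radiation, without solar radiation).
rmse_table2 = {
    "Regression trees": (0.329, 0.478),
    "Boosted trees": (0.269, 0.453),
    "Random forests": (0.256, 0.455),
    "Artificial neural networks": (0.152, 0.286),
}


def relative_reduction(with_sr: float, without_sr: float) -> float:
    """Percentage by which RMSE drops when solar radiation is included."""
    return 100.0 * (without_sr - with_sr) / without_sr


reductions = {model: relative_reduction(*pair) for model, pair in rmse_table2.items()}
for model, percent in reductions.items():
    print(f"{model}: RMSE reduced by {percent:.1f}%")
```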

CONCLUSIONS

Our research confirmed previous observations that machine learning methods can be widely used in agriculture, e.g., in estimating evapotranspiration. The results of this study demonstrated that the applied models (RT, BRT, RF, and ANN) could describe nonlinear relationships between weather parameters and evapotranspiration. The accuracy of evapotranspiration estimation depended on the type of input variables and the machine learning model used. The highest level of evapotranspiration prediction was obtained using the ANN model. Including solar radiation data in the calculations improved the quality of evapotranspiration prediction in all four models. In the absence of data on the actual solar radiation reaching the Earth's surface, it is advisable to supplement the input data with data on extraterrestrial solar radiation and the day number of the year. Vapor pressure deficit is also an important parameter for determining evapotranspiration using machine learning models. Machine learning models have advantages over empirical equations since they require no knowledge of internal variables and offer simple solutions for nonlinear and multi-variable functions. Therefore, they can be useful in areas and situations with limited access to meteorological data.

eISSN: 2353-3978
Language: English
Publication frequency: 2 times per year
Journal subjects: Life Sciences, Biotechnology, Plant Science, Ecology, other