Open Access

Regional-Scale Analysis of Vegetation Dynamics Using Satellite Data and Machine Learning Algorithms: A Multi-Factorial Approach



Introduction

Human beings benefit significantly from the resources provided by Earth's unique ecosystems. Social well-being, resource management, and environmental planning could all benefit from accurate mapping of these ecosystems [1, 2]. For a wide range of end users, accurate information on vegetation distribution at global or regional scales is becoming increasingly important [3], because climate change and rapid human population growth are hastening the pace of vegetation dynamics [4]. The earliest stages of vegetation mapping relied heavily on specialists' expertise in identifying the borders of vegetation classes [5]. Despite being fairly precise at small regional scales, this strategy is not only limited in applicability but also time-consuming. The application of remote sensing (RS) technology has substantially improved mapping accuracy and efficiency [6, 7]. Different types of RS data, such as aerial photography, Landsat, Sentinel-2, IKONOS, and the Moderate Resolution Imaging Spectroradiometer (MODIS), allow map makers to generate vegetation maps at global and regional scales [8,9,10,11]. Satellite images can be used to map vegetation cover at various cartographic scales; in addition, because they register broad geographic regions on a regular basis, they constitute valuable data sources for vegetation mapping [12, 13]. However, the accuracy of satellite image classification depends strongly on the method used and on the availability of data [14]. Traditional supervised approaches [15], such as the Mahalanobis distance, minimum distance, and maximum likelihood, and unsupervised methods [16], such as the iterative self-organizing data analysis technique (ISODATA) and K-means, are used to perform image classification and vegetation mapping tasks.
Nevertheless, new methods are needed to address this problem, particularly in light of technological advancements that have enabled the development of sensors capable of acquiring images with high spatial, spectral, and temporal resolutions, resulting in a significant volume of data to be evaluated [17].

Machine learning (ML) methods are commonly used to process and analyze remote sensing data [18, 19]. Compared with traditional linear methods, ML approaches allow nonlinear and non-parametric relationships to be established between dependent and independent variables, leading to improved overall performance [20]. These classifiers can reliably model various types of data, and ML algorithms have been used in several studies to map the spatial distribution of vegetation. For example, Macintyre et al. [21] applied tasseled cap transformations and principal component analysis (PCA) to multi-temporal Sentinel-2 images and fed the data into four ML approaches, namely, classification tree (CT), random forest (RF), K-nearest neighbor (KNN), and support vector machine (SVM), to distinguish vegetation species. They found that the methods produced satisfactory results but noted that more research is needed to determine whether the findings can be replicated for different vegetation types and areas. According to another study [22], ML algorithms can be used to map vegetation cover and are particularly useful when the training data contain a significant number of observations and variables. Michez et al. [23] verified the accuracy of ML learners for invasive tree mapping with unmanned aerial vehicle (UAV) imagery in riparian zones; they computed spectral and textural properties from visible and near-infrared data at different scales and used supervised classification based on RF to identify the most significant variables. The performance of RF and SVM algorithms for categorizing cork oak woodlands from UAV images was also studied previously [24]. Parente and Ferreira [25] applied RF to map pastureland from MODIS data in Brazil and obtained 80% accuracy.
In another study [26], different ML algorithms for crop mapping were applied to Landsat 8 data in Ukraine, achieving an accuracy of around 75% with the CART method. Johansen et al. [27] used Landsat 5 and 7 images to map woody vegetation in Australia using ML methods and found that the CART and RF methods provided accurate vegetation maps. Sluiter and Pebesma [28] used Landsat 7 images, airborne imaging spectrometer (HyMap) data, and ASTER optical bands to classify natural vegetation in the Mediterranean area; they showed that ML approaches outperformed traditional statistics-based methods, yielding up to 75% accuracy. Another study [29] confirmed that, compared with SVM and neural network (NN) methods, the RF classifier produced higher classification accuracies, was less susceptible to training sample quality, and took less training time. However, accurately classifying images and identifying vegetation cover from satellite imagery remains difficult because of obstacles such as buildings, roads, and shadows, which appear as noise in the images [5]. Moreover, improving vegetation mapping accuracy in heterogeneous landscapes remains challenging because (a) vegetation field survey data are scarce, since fieldwork is time-consuming and expensive [3], and (b) existing approaches cannot distinguish small changes in vegetation types from spectral information alone [5]. Therefore, for vegetation cover mapping, additional factors and information that may affect the distribution of vegetation and the classification accuracy should be considered.

Cloud-based computing services may enable effective image processing, such as the classification of huge amounts of image data using ML approaches [5]. Google Earth Engine (GEE) is a free, cloud-based geospatial analysis platform [30]. With GEE, accessing and pre-processing large volumes of RS data has become far more convenient. GEE empowers researchers and practitioners across various domains: it facilitates comprehensive analyses of land use and land cover change, contributing to assessments of urban expansion, disaster impacts, and water resource management. Moreover, GEE supports research on climate change, biodiversity monitoring, and agricultural analysis, leveraging both historical and current satellite data. Its capabilities extend to air quality assessment, forest carbon monitoring, and infrastructure planning, proving invaluable for crisis response efforts and educational initiatives. The platform's ability to handle extensive remote sensing and geospatial data continues to unlock innovative solutions for a wide array of environmental and societal challenges. For implementation, GEE offers a variety of classification methods, such as the naïve Bayes classifier, SVM, decision tree, and RF, among which RF is the most widely used [31]. Therefore, this study analyzes season-based vegetation cover using the ML classification method RF. RF is a robust choice for this task owing to its strengths in capturing complex relationships, managing multicollinearity, avoiding overfitting, and assessing feature importance. Given the intricate factors affecting vegetation across seasons, its resilience to noise and outliers makes it well suited to remote sensing data. In addition, its ability to generalize, support parallel processing, and offer reasonable interpretability further endorses its suitability.
By leveraging these advantages, RF can provide accurate predictions of, and valuable insights into, the key drivers of seasonal vegetation cover change. Specifically, we analyze the spatial distribution of vegetation during the summer, autumn, and winter seasons of 2021 using Sentinel-2 data for a large-scale area, the Greater Sydney region, Australia, covering 12,368.2 km2, on the cloud-based GEE platform.

The following is a list of the study's major contributions: (1) With the aid of the ML technique (RF) and the cloud-based GEE platform, we assessed the spatial distribution of vegetation and the pixel area of each class across the seasons of the year by analyzing a large amount of Sentinel-2 data. (2) We used multiple data sources, such as topographic factors, textural information, spectral indices, and spectral bands, to improve the classification and overcome the aforementioned limitations of ML classification with few input variables. (3) We used the contribution of each feature to rank the features by their importance for vegetation mapping in each season and to identify which factors contribute most to the classification. The presented method has not previously been applied to this task, especially on a regional scale comparable to the Greater Sydney region, Australia. Moreover, we compared the findings obtained by RF+all factors with those of other researchers' works to demonstrate the efficacy of the proposed method for season-based vegetation analysis using multi-temporal data in a complex regional setting.

Materials and Methods

Data collection, classification and accuracy assessment, and evaluation of feature importance were the three primary stages of the classification process. First, GEE was used to collect surface reflectance data for 2021, which were then separated into three seasonal intervals: summer (1 December to 28 February), autumn (1 March to 31 May), and winter (1 June to 31 August). Then, we derived spectral indices, topographic factors, and gray-level co-occurrence matrices (GLCMs). In the second stage, the RF method and a confusion matrix were used to perform image classification and accuracy evaluation. We also applied a feature selection process using the RF model to determine how the contribution of each feature affects the classification results. Finally, we compared the vegetation maps generated by the proposed RF model with other researchers' works to further investigate the benefit of the proposed method and of multi-temporal Sentinel-2 data for mapping the vegetation cover. Figure 1 depicts the general procedure for creating season-based vegetation maps, and each phase is detailed in the following sections.

Figure 1:

Overall flowchart of vegetation classification process in the GEE platform.

Data and Study Area

The test region is Greater Sydney, centered at 33.8048° S, 150.7214° E on Australia's east coast and covering 12,368.2 km2 (Figure 2). Grasslands and woodlands are the most common types of land cover, alongside a variety of other land uses. To implement the classification, multi-temporal Sentinel-2 satellite imagery was obtained from the GEE data catalog. Sentinel-2 is a significant orbital platform for monitoring and studying global vegetation; its data have gained immense popularity in the remote sensing field owing to their comparatively high spatial and temporal resolution, global coverage, and free access (Gašparović and Jogun 2018). It is the latest generation of the European Space Agency (ESA)'s Earth observation missions, with high spatial (10–60 m) and temporal (5–10 days) resolution. Sentinel-2 provides land surface reflectance at a variety of wavelengths on local and regional scales because its high-resolution multispectral sensor operates in thirteen bands: three with 60-m, six with 20-m, and four with 10-m resolution (Mylona et al. 2018). The dataset used in this study contains cloud-free images for three seasons, namely, summer, autumn, and winter (each season spanning 3 months). For each season, we collected and processed the Sentinel-2 data separately: 39 images for summer, 61 for autumn, and 55 for winter. Then, from all of the pixels in the stack, we produced a composite image by applying selection criteria to each pixel. The median() function was used to generate a composite in which each pixel value is the median of all pixels in the stack. In total, we processed 155 Sentinel-2 images for the classification and for evaluating the spatial distribution of vegetation in each season.
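The per-pixel median compositing described above can be sketched in plain NumPy; the study itself uses GEE's built-in median() on an ImageCollection, and the array shapes and reflectance values below are illustrative assumptions only:

```python
import numpy as np

def median_composite(stack):
    """Per-pixel median across a stack of co-registered images.

    `stack` has shape (n_images, height, width, n_bands); cloudy pixels
    can be passed as NaN so that they are ignored in the median, which
    mirrors what a cloud-masked median composite does per band.
    """
    return np.nanmedian(stack, axis=0)

# Toy example: three 2x2 single-band "images", one with a masked (NaN) pixel.
stack = np.array([
    [[[0.10], [0.20]], [[0.30], [0.40]]],
    [[[0.12], [0.22]], [[0.32], [np.nan]]],
    [[[0.14], [0.24]], [[0.34], [0.44]]],
])
composite = median_composite(stack)
print(composite[0, 0, 0])  # 0.12 (median of 0.10, 0.12, 0.14)
```

Because the median is taken independently per pixel, a pixel masked in one image still receives a valid value from the remaining images, which is why a seasonal stack of 39–61 scenes yields a largely cloud-free composite.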
To create high-resolution vegetation maps, we used the 10 bands (blue, green, red, vegetation red edge, near-infrared, and SWIR) with 10-m and 20-m resolution out of the 13 bands available for Sentinel-2 data (Table 1). The red-edge and short-wave infrared bands with 20-m spatial resolution were resampled to the 10-m resolution of the visible and near-infrared bands to maintain spatial consistency across the Sentinel-2 data. The resulting 10-m imagery allows the Earth's surface to be explored in greater detail. In addition, we included the red-edge (RE) and SWIR bands, which have been shown to improve classification accuracy, especially for vegetation [32].

Figure 2:

Greater Sydney region, Australia.

Table 1: Spatial and spectral resolutions of Sentinel-2 satellite data.

Band Central wavelength (nm) Spatial resolution (m)
Coastal aerosol 443 60
Blue 490 10
Green 560 10
Red 665 10
Vegetation red edge 705 20
Vegetation red edge 740 20
Vegetation red edge 783 20
NIR 842 10
Vegetation red edge 865 20
Water vapor 945 60
SWIR-Cirrus 1,380 60
SWIR 1,610 20
SWIR 2,190 20
Training and Testing Samples

In this work, we defined the classification scheme as the dominant vegetation types, namely trees, grass areas, and crops, and non-vegetation areas, namely built-up areas and water bodies. Ground truth samples were obtained with the use of Google Earth images, exploration of the true- and false-color composites of the Sentinel-2 data, and expert knowledge.

Input Data

To characterize training samples and differentiate between various vegetation types, a time series of input variables was gathered. For the season-based vegetation mapping of the Greater Sydney region in 2021, a total of 67 input variables, including topographic parameters, spectral indices, textural information, and spectral bands, were used for each season, as given in Table 2. The variables were formed as the average over the 3 months of each season. The Shuttle Radar Topography Mission (SRTM) digital elevation layer, with a spatial resolution of 30 m, was used to calculate the topographic elements elevation, slope, and aspect (three variables). A GLCM [33, 34] computed in the 7×7 neighborhood of each pixel was used to derive texture information, and a five-layer image stack comprising variance, contrast, dissimilarity, homogeneity, and correlation was constructed. We used the glcmTexture(size, kernel, average) function in GEE to calculate the GLCM. According to experimental results in earlier investigations, the best classification outcomes were obtained with kernel sizes ranging from 5×5 to 13×13 [35], and the 7×7 kernel proved optimal for vegetation classification in this study. Correlation measures the linear relationship between the gray levels of adjacent pixels. Dissimilarity takes large values and rises linearly when local areas have significant contrast. Contrast measures the amount of local gray-level variation in an image. Homogeneity applies the inverse of the contrast weight, so that weights decrease with distance from the GLCM diagonal. Variance measures the spread of the pixels' gray levels. We calculated the GLCM measures around each pixel of every band, generating 50 elements in total. For the spectral bands, we used the bands with 10-m and 20-m spatial resolution (10 variables), as discussed in Section 2.1.
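As an illustration of the texture measures described above, the following sketch computes a gray-level co-occurrence matrix by hand for a single horizontal offset and derives three of the five measures used in the paper. GEE's glcmTexture() works analogously but averages several offsets inside a moving window; the toy image and the choice of offset here are assumptions for illustration:

```python
import numpy as np

def glcm_features(img, levels=8):
    """GLCM for the (0, 1) offset (right-hand neighbor) plus three texture
    measures. Real implementations compute this in a moving window
    (e.g. 7x7) and average over several offsets; this is the core idea only."""
    img = np.asarray(img)
    glcm = np.zeros((levels, levels), dtype=float)
    for a, b in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
        glcm[a, b] += 1
    glcm /= glcm.sum()  # normalize counts to co-occurrence probabilities
    i, j = np.indices(glcm.shape)
    contrast = (glcm * (i - j) ** 2).sum()          # local variation
    dissimilarity = (glcm * np.abs(i - j)).sum()    # linear in |i - j|
    homogeneity = (glcm / (1.0 + (i - j) ** 2)).sum()  # inverse contrast weight
    return contrast, dissimilarity, homogeneity

# Toy 4-level image with two smooth patches and one boundary.
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
print(glcm_features(img))
```

The weights make the measures complementary: contrast grows quadratically with the gray-level difference of neighboring pixels, dissimilarity linearly, and homogeneity rewards co-occurrences near the GLCM diagonal.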
Then, we calculated vegetation indices, namely, the normalized difference tillage index (NDTI) [36], modified normalized difference water index (MNDWI) [37], normalized difference built-up index (NDBI) [38], and normalized difference vegetation index (NDVI) [39], based on Eqs (1)–(4):

NDTI = (SWIR1 - SWIR2) / (SWIR1 + SWIR2) (1)

MNDWI = (Green - SWIR1) / (Green + SWIR1) (2)

NDBI = (SWIR - NIR) / (SWIR + NIR) (3)

NDVI = (NIR - Red) / (NIR + Red) (4)
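Eqs (1)–(4) all share the same normalized-difference form, so they can be implemented with a single helper; the reflectance values below are hypothetical, chosen only to illustrate a vegetated pixel:

```python
import numpy as np

def normalized_difference(a, b):
    """Generic (a - b) / (a + b), the common form of NDVI, NDBI, MNDWI, NDTI."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return (a - b) / (a + b)

# Hypothetical surface reflectances for one vegetated pixel (illustrative only).
nir, red, green, swir1, swir2 = 0.40, 0.05, 0.08, 0.20, 0.15

ndvi = normalized_difference(nir, red)       # Eq. (4): vegetation greenness
ndbi = normalized_difference(swir1, nir)     # Eq. (3): built-up surfaces
mndwi = normalized_difference(green, swir1)  # Eq. (2): open water
ndti = normalized_difference(swir1, swir2)   # Eq. (1): tillage / crop residue
print(round(float(ndvi), 3))  # 0.778
```

Because the function accepts arrays, the same code applies band-wise to an entire image, which is how the four index layers of the feature stack are produced.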

Table 2: The number of input variables used in the RF method to create season-based vegetation maps.

Category Description Input variables number
Topographic Elevation, slope, aspect 3
Spectral bands Blue, green, red, vegetation red edge, near-infrared, and SWIR 10
Spectral indices NDVI, NDBI, MNDWI, NDTI 4
Textural information Variance, contrast, dissimilarity, homogeneity, correlation 5×10
Total variable 67
Random Forest (RF) Method

We applied RF to classify the Sentinel-2 images and generate season-based vegetation maps for 2021 owing to its efficiency and robustness. RF was proposed by Breiman [40] as an ensemble learning approach. Compared with other machine learning approaches, such as SVM and the artificial neural network (ANN), only a few parameters need to be specified when running RF [40]. Furthermore, RF is increasingly used in remote sensing for image classification and vegetation mapping [41,42,43]. RF is composed of several base learners, such as classification and regression trees (CART), and can be written as

{h(x, θ_k), k = 1, 2, …, i}

where h, x, and θ_k denote the RF classifier, the input variable, and the random predictor variables utilized for creating each CART tree, respectively. The final RF response is computed from the output of all the decision trees involved. Figure 3 shows a schematic representation of RF for vegetation classification. The success of RF depends on the design of each decision tree that makes up the forest [40]. The procedure involves two random selection steps. To build each decision tree, the first step uses a bootstrap technique [40] to randomly select, with replacement, the training samples and the out-of-bag (OOB) data. In the second step, the split condition for each node of the decision tree is determined [40]: a subset of the predictor variables is randomly picked, and each tree is split using the Gini index [40], a measure of heterogeneity. Because only a random selection of predictor variables is used, the correlation between trees is lower and the generalization capability higher. RF also has the benefit of quantifying the importance of the input variables [40], which reveals their contribution to classification accuracy.
RF requires only two parameters [44]: the number of predictor variables randomly selected at each split (mtry) and the number of trees grown in the forest (ntree), which were set to the square root of the number of input variables and 300, respectively, in this study. As ntree increases, the OOB error decreases; once ntree exceeds a specific threshold, the OOB error converges, according to the law of large numbers [44]. RF is computationally light and largely unaffected by outliers or by the parameters used to run it [41]. Compared with individual decision trees, overfitting is less of a concern, and pruning the trees, a cumbersome job, is not required [44]. Furthermore, the parameters are simple to determine. All these features make RF well suited to classifying complex Sentinel-2 imagery to generate high-quality vegetation maps.
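A minimal sketch of this configuration with scikit-learn, not the GEE classifier actually used in the study; the synthetic data merely stand in for the 67-variable per-pixel feature stack:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the 67-variable feature stack and class labels.
X, y = make_classification(n_samples=2000, n_features=67, n_informative=20,
                           n_classes=5, random_state=0)

# mtry = sqrt(number of predictors) and ntree = 300, as in the study;
# oob_score=True reports accuracy on the out-of-bag samples.
rf = RandomForestClassifier(n_estimators=300, max_features="sqrt",
                            oob_score=True, random_state=0)
rf.fit(X, y)
print(round(rf.oob_score_, 3))

# Per-variable importance. Note: scikit-learn's default is mean decrease in
# impurity; the paper reports mean decrease in accuracy, a permutation-based
# variant (see sklearn.inspection.permutation_importance).
top5 = rf.feature_importances_.argsort()[::-1][:5]
print(top5)
```

The OOB score gives a built-in validation estimate without a separate hold-out set, which is one reason ntree convergence can be monitored cheaply.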

Figure 3:

The schematic diagram of RF for season-based vegetation mapping.

Accuracy Assessment Metrics

In this work, we calculated the accuracy of the proposed model for season-based vegetation mapping using the F1 score, precision, recall, overall accuracy (OA), and kappa coefficient [45, 46]. OA is a simple, straightforward summary of the likelihood that a case is categorized properly. The kappa coefficient represents the degree of concordance between the classified data and the reference data; it considers not only the OA but also the variation in the number of samples in each category [47]. The F1 score is a quantitative indicator that assesses the balance between recall and precision under unbalanced training data. Recall, also called sensitivity, is the proportion of real pixels of each category that are recognized. Precision is the proportion of pixels identified as a category that are correct. Based on true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN), the aforementioned metrics can be calculated as follows:

F1 score = (2 × Precision × Recall) / (Precision + Recall)

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

OA = (TP + TN) / N

Kappa = (p0 - pe) / (1 - pe)

where N = TP + TN + FP + FN, p0 = (TP + TN) / N, and the expected chance agreement is pe = [(TP + FN) × (TP + FP) + (FP + TN) × (FN + TN)] / N^2.
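The metrics above can be verified with a few lines of Python; the confusion-matrix counts below are made up purely for illustration:

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy metrics from a 2x2 confusion matrix, following the
    definitions in the text (kappa uses the expected chance agreement p_e)."""
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    oa = (tp + tn) / n
    p_e = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n**2
    kappa = (oa - p_e) / (1 - p_e)
    return precision, recall, f1, oa, kappa

# Hypothetical counts for one class against the rest.
print(binary_metrics(tp=80, fp=10, fn=20, tn=90))
```

With these counts, OA is 0.85 while the expected chance agreement p_e is 0.5, so kappa drops to 0.7, illustrating how kappa discounts agreement that class proportions alone would produce.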

Results

In this part, we discuss the quantitative and qualitative outcomes obtained by using the proposed RF approach for season-based vegetation mapping of Sentinel-2 images. Table 3 reports all aforementioned metrics (precision, recall, F1 score, OA, and kappa) obtained by the RF method combined with additional factors, namely spectral indices, topographic factors, and texture information. According to Table 3, the OA and kappa of the proposed RF+spectral indices were 90.65% and 86.11% for summer, 90.08% and 85.27% for autumn, and 91.35% and 87.17% for winter. By adding topographic factors to the model (RF+spectral indices+topographic factors), the OA and kappa increased to 91.29% and 87.06% for summer, 90.60% and 86.06% for autumn, and 92.08% and 88.27% for winter. In the next step, we integrated the texture information as well (RF+spectral indices+topographic factors+texture information) and found that the results improved for each season compared with RF+spectral indices and RF+spectral indices+topographic factors: the model achieved OA and kappa of 92.56% and 88.96% for summer, 91.64% and 87.60% for autumn, and 92.89% and 89.46% for winter. The proposed model with the different factor sets achieved satisfactory vegetation mapping results for the Greater Sydney region in each season. The model with all parameters obtained better OA and kappa for winter (92.89% and 89.46%) than for summer (92.56% and 88.96%) and autumn (91.64% and 87.60%). By contrast, the proposed model with all factors obtained a higher F1 score for grassland mapping in summer (90.30%) than in autumn (88.72%) or winter (89.78%). This might be because of the high vegetation growth and overall greenness of grass areas in summer.
The greenness element, which reflects fluctuations in photosynthetically active vegetation, is beneficial for vegetation mapping and can help highlight the difference between the winter and summer seasons [48]. Similarly, the red-edge band (a highly ranked factor in summer) is strongly associated with leaf chlorophyll content and is considered to have contributed to the classification of vegetation types [49]. Thus, the proposed model could achieve better results for this class in summer. Moreover, for trees, the proposed RF+spectral indices+topographic factors+texture information attained its lowest F1 score, 98.82%, in summer and its highest, 99.06%, in winter. In fact, the model misclassified tree pixels as other classes, specifically against complex backgrounds, which led to lower accuracy for tree classification in summer. In addition, we compared the proportions of correctly categorized pixels (PCCPs) of the proposed RF method with all variables against the version with only spectral indices using a two-proportion Z-test for each season [50]. All P values exceeded the 0.05 significance level: summer (P = 0.345), autumn (P = 0.339), and winter (P = 0.478). Hence, the null hypothesis of equal proportions could not be rejected, indicating that the PCCPs of RF+spectral indices+topographic factors+texture information and RF+spectral indices are not statistically different for any season, even though adding more variables consistently raised the OA. The accuracy of the proposed RF+spectral indices+topographic factors+texture information for vegetation cover classification was also assessed using the confusion matrix depicted in Figure 4.
As the figure and the quantitative results above show, the presented technique achieved better confusion matrices in all seasons after more inputs were added to the model.
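A pooled two-proportion Z-test of the kind used here can be sketched as follows; the pixel counts in the example are hypothetical, not the study's actual PCCPs:

```python
import math

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided two-proportion Z-test on counts of correctly classified
    pixels (x) out of totals (n), using the pooled proportion under H0."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                     # pooled estimate under H0
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))    # two-sided P(|Z| > z)
    return z, p_value

# Hypothetical counts: 9,256/10,000 vs 9,065/10,000 correctly classified pixels.
z, p = two_proportion_ztest(9256, 10000, 9065, 10000)
print(round(z, 2), p < 0.05)
```

When the P value exceeds the chosen significance level (0.05 here), the null hypothesis of equal proportions is retained, which is the situation reported for all three seasons above.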

Table 3: Quantitative results achieved by the proposed RF method for season-based vegetation mapping.

Precision (%) Recall (%) F1 score (%) OA (%) Kappa (%)
Summer RF+spectral indices Non-vegetation 82.45 91.22 86.61 90.65 86.11
Grass 91.54 83.96 87.59
Trees 98.93 97.64 98.28
Crops 67.01 75.01 70.78
RF+ spectral indices+topographic factors Non-vegetation 82.99 91.39 86.99 91.29 87.06
Grass 92.80 84.46 88.43
Trees 99.24 97.94 98.58
Crops 67.87 78.15 72.65
RF+ spectral indices+topographic factors+texture information Non-vegetation 84.27 92.08 88.00 92.56 88.96
Grass 93.50 87.31 90.30
Trees 99.37 98.28 98.82
Crops 74.98 80.89 77.82

Autumn RF+spectral indices Non-vegetation 81.45 88.72 84.93 90.08 85.27
Grass 90.10 83.50 86.68
Trees 98.93 97.83 98.38
Crops 66.61 73.64 69.95
RF+ spectral indices+topographic factors Non-vegetation 80.70 92.30 86.11 90.60 86.06
Grass 90.33 83.98 87.04
Trees 98.96 97.85 98.40
Crops 71.69 73.46 72.57
RF+ spectral indices+topographic factors+texture information Non-vegetation 81.87 92.49 86.86 91.64 87.60
Grass 91.97 85.70 88.72
Trees 99.44 98.21 98.83
Crops 73.50 76.56 75.00

Winter RF+spectral indices Non-vegetation 81.88 91.61 86.47 91.35 87.17
Grass 90.99 84.73 87.75
Trees 99.25 98.29 98.77
Crops 73.84 77.10 75.44
RF+ spectral indices+topographic factors Non-vegetation 82.62 91.93 87.03 92.08 88.27
Grass 91.05 88.04 89.52
Trees 99.37 98.37 98.87
Crops 78.46 75.87 77.14
RF+ spectral indices+topographic factors+texture information Non-vegetation 83.36 92.57 87.72 92.89 89.46
Grass 92.78 86.96 89.78
Trees 99.50 98.63 99.06
Crops 80.16 82.97 81.54

Figure 4:

Confusion matrix used in the proposed RF model's training process: (a) with only spectral indices; and (b) all input variables. (i), (ii), and (iii) present the normalized confusion matrix for summer, autumn, and winter seasons, respectively.

Moreover, the visualization results of season-based vegetation mapping obtained by the proposed method with all spectral, topographic, and texture factors for the summer, autumn, and winter seasons are shown in Figures 5–7, respectively. In each figure, panels (a)–(d) present the original multi-temporal Sentinel-2 image, the qualitative results of RF+spectral indices, the qualitative results of RF+spectral indices+topographic factors, and the results of RF+spectral indices+topographic factors+texture information, respectively. According to the figures, after adding topographic factors (Figures 5(c), 6(c), and 7(c)) to the RF model with only spectral indices (Figures 5(b), 6(b), and 7(b)), the visualization results improved, and the model produced smoother vegetation maps for each season. The proposed RF+spectral indices+topographic factors+texture information (Figures 5(d), 6(d), and 7(d)) predicted fewer FPs and FNs for the various vegetation classes, resulting in higher-quality vegetation maps for each season than RF+spectral indices and RF+spectral indices+topographic factors. The figures also show that the proposed RF model with all elements accurately identified grassland and crop areas and produced better maps of grass and crops in summer and winter, respectively.

Figure 5:

Visualization results of season-based vegetation mapping achieved by the proposed model for summer season: (a) original multi-temporal Sentinel-2 image, (b) results of RF+spectral indices, (c) results of RF+spectral indices+topographic factors, and (d) results of RF+spectral indices+topographic factors+texture information.

Figure 6:

Visualization results of season-based vegetation mapping achieved by the proposed model for autumn season: (a) original multi-temporal Sentinel-2 image, (b) results of RF+spectral indices, (c) results of RF+spectral indices+topographic factors, and (d) results of RF+spectral indices+topographic factors+texture information.

Figure 7:

Visualization results of season-based vegetation mapping achieved by the proposed model for winter season: (a) original multi-temporal Sentinel-2 image, (b) results of RF+spectral indices, (c) results of RF+spectral indices+topographic factors, and (d) results of RF+spectral indices+topographic factors+texture information.

Furthermore, based on the classification results achieved by the proposed RF model with all factors (RF+spectral indices+topographic factors+texture information), we calculated the area of pixels in each class for each season in km2 (Table 4). As Table 4 shows, trees and grasslands had the largest areas, with averages of 9,368.9730 km2 and 1,571.3156 km2, respectively, demonstrating that these two classes cover most of the Greater Sydney region. Moreover, the variable importance for season-based vegetation mapping obtained by the proposed RF model is shown in Figure 8. Variable importance here refers to the mean decrease in accuracy (MDA), which indicates how much accuracy the method loses when a variable is removed [51]. To ease visualization, only the top twenty of the 67 input variables for each season are presented. Topographic factors, spectral indices, and texture elements all appear among the top twenty most significant input variables. These factors were critical in separating the classes and contributed strongly to the classification of Sentinel-2 images and season-based vegetation mapping for the Greater Sydney region. This is also confirmed by the quantitative results, where adding these factors to the model improved the accuracy metrics.
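The per-class area computation reduces to counting pixels and multiplying by the pixel footprint. A sketch with a hypothetical classified array follows; the class codes and map values are assumptions for illustration:

```python
import numpy as np

# Hypothetical classified map: integer class codes per 10 m x 10 m pixel.
CLASSES = {0: "Non-vegetation", 1: "Grass", 2: "Trees", 3: "Crops"}
PIXEL_AREA_KM2 = (10 * 10) / 1e6   # one 10-m Sentinel-2 pixel = 100 m^2

classified = np.array([[2, 2, 1, 0],
                       [2, 2, 1, 3],
                       [2, 1, 1, 0],
                       [2, 2, 0, 0]])

# Count pixels per class code and convert counts to km^2.
codes, counts = np.unique(classified, return_counts=True)
areas = {CLASSES[c]: n * PIXEL_AREA_KM2 for c, n in zip(codes, counts)}
print(areas)
```

Scaled to the real classified rasters, the same count-and-multiply step yields per-class areas such as those reported in Table 4.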

Figure 8:

Input variable importance in season-based vegetation mapping achieved by the RF model for the Greater Sydney region: (a) for summer, (b) for autumn, and (c) for winter.

Table 4: Area of pixels in each class for each season (km2).

Class Area (km2)
Summer Non-vegetation 1,364.3155
Grass 1,691.4873
Trees 9,225.3798
Crops 86.8179

Autumn Non-vegetation 1,335.9587
Grass 1,537.0343
Trees 9,440.5423
Crops 54.4652

Winter Non-vegetation 1,370.4691
Grass 1,485.4252
Trees 9,440.9971
Crops 71.1091
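As a minimal sketch, per-class areas of this kind can be derived from a classified raster by counting pixels per class and multiplying by the pixel footprint; Sentinel-2 classification pixels are 10 m × 10 m, i.e. 1e-4 km2 each. The tiny raster below is illustrative, not the study data.

```python
import numpy as np

PIXEL_AREA_KM2 = (10 * 10) / 1_000_000  # 10 m Sentinel-2 pixel in km2
CLASSES = {0: "Non-vegetation", 1: "Grass", 2: "Trees", 3: "Crops"}

# Toy classified raster; in practice this would be the full scene.
classified = np.array([[2, 2, 1],
                       [2, 0, 3],
                       [1, 2, 0]])

# Count pixels per class label, then convert counts to km2.
labels, counts = np.unique(classified, return_counts=True)
areas = {CLASSES[l]: c * PIXEL_AREA_KM2 for l, c in zip(labels, counts)}
```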
Discussion

The results of our study demonstrate the effectiveness of the random forest (RF) classification model, applied to multi-temporal Sentinel-2 data together with topographic factors, texture information, and spectral indices, for regional-scale vegetation analysis. An important aspect of our study was the inclusion of factors beyond the Sentinel-2 spectral bands. Incorporating topographic factors such as elevation, slope, and aspect allowed us to account for terrain variations, which are known to influence vegetation patterns. Texture information captured finer details of the spatial arrangement and structure of vegetation, while spectral indices helped extract meaningful information on vegetation characteristics. Identifying the variables that influence classification performance is crucial for understanding the processes driving vegetation patterns. Our findings emphasize the significance of multi-temporal Sentinel-2 data, topographic factors, spectral indices, and texture features in accurately mapping vegetation; integrating these diverse data sources enables a more comprehensive characterization of the vegetation cover, contributing to improved monitoring and management of vegetation resources.

We also compared our results with those of other studies to assess the benefit of the proposed method for vegetation analysis of multi-temporal Sentinel-2 data. The results of the other works were taken from the original published articles, whereas our technique was evaluated on an empirical dataset. For instance, de Colstoun et al. [52] applied the decision tree (DT) method to multi-temporal Landsat 7 images for vegetation mapping and obtained an OA of 82% for the final map when compared against a ground-collected validation dataset. Cingolani et al. [53] used Landsat data and discriminant functions to map vegetation in a heterogeneous mountain rangeland in central Argentina, obtaining an OA of 86% under field validation. Macintyre et al. [21] applied ML classifiers such as SVM, classification trees (CTs), and nearest neighbor (NN) to vegetation mapping from multi-season Sentinel-2 images based on tasseled cap transformations (TCTs), principal components (PCs), vegetation indices, and spectral bands, achieving OAs of 50%, 72%, and 74% for CT, NN, and SVM, respectively. Sharma et al. [54] applied RF with cross-validation to vegetation mapping in Japan, combining multi-source datasets such as Landsat-8 and Sentinel-2. They obtained an OA and kappa coefficient of 77% and 74% for the Sentinel-2 dataset and 86% and 84% for the Landsat-8 dataset; combining the two datasets improved the classification somewhat (OA = 89%, kappa coefficient = 87%) over Landsat-8 alone. In comparison with these studies, our technique with additional input variables achieved better results, with an average OA and kappa coefficient of 92.37% and 88.68%, respectively, confirming the efficacy of the proposed approach for vegetation mapping from multi-temporal Sentinel-2 images over large areas.

In our study, the developed RF model outperformed other traditional classification methods. The ensemble nature of RF, which combines multiple decision trees, allows robust and accurate classification by mitigating overfitting and reducing the impact of outliers and noise in the data. Additionally, the RF model can automatically rank the importance of input variables, enabling the identification of key factors that drive vegetation patterns.
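The two metrics used throughout this comparison, overall accuracy (OA) and Cohen's kappa coefficient, are both computed from the confusion matrix of the classified map against reference data. A minimal sketch follows; the small matrix is illustrative only, not the study's validation data.

```python
import numpy as np

def overall_accuracy(cm):
    # Fraction of correctly classified pixels (diagonal over total).
    return np.trace(cm) / cm.sum()

def kappa_coefficient(cm):
    # Agreement corrected for chance: (po - pe) / (1 - pe).
    n = cm.sum()
    po = np.trace(cm) / n                        # observed agreement
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2  # chance agreement
    return (po - pe) / (1 - pe)

# Rows: reference classes; columns: predicted classes (toy values).
cm = np.array([[50,  2,  3],
               [ 4, 40,  1],
               [ 2,  3, 45]])
oa = overall_accuracy(cm)      # 135 / 150 = 0.90
k = kappa_coefficient(cm)
```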
This information is invaluable for understanding the underlying processes governing vegetation dynamics and for guiding future research and management strategies. In addition, our findings indicate that GEE is productive and efficient for accessing satellite imagery, applying machine learning algorithms, and producing vegetation maps. Unlike most commercial image-processing software, GEE does not require any special hardware. Thus, the open-access platform and methodology proposed in this work allow decision-makers to monitor vegetation cover over time using multi-temporal Sentinel-2 data without paying for or downloading software and data. However, it is important to note that our study relied solely on images from the Sentinel-2 sensor for vegetation mapping. While this dataset provided valuable information about vegetation cover and dynamics, integrating other data types has the potential to further improve the accuracy and comprehensiveness of vegetation analysis. One promising data type to combine with Sentinel-2 is synthetic aperture radar (SAR) imagery. SAR sensors emit microwave signals and measure the backscatter, which penetrates cloud cover and provides valuable information about vegetation structure, moisture content, and biomass. By combining SAR imagery with Sentinel-2 data, the complementary strengths of both sensors can be leveraged to improve vegetation analysis.
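The spectral indices used as input variables throughout this work are simple band arithmetic; for example, NDVI is derived from the Sentinel-2 red (B4) and near-infrared (B8) bands as (NIR − Red) / (NIR + Red). A minimal numpy sketch, with toy reflectance values rather than study data:

```python
import numpy as np

def ndvi(nir, red, eps=1e-10):
    # NDVI = (NIR - Red) / (NIR + Red); eps guards against zero division.
    nir = nir.astype(float)
    red = red.astype(float)
    return (nir - red) / (nir + red + eps)

# Toy reflectance arrays standing in for Sentinel-2 B4 (red) and B8 (NIR).
red_band = np.array([[0.05, 0.10], [0.20, 0.30]])
nir_band = np.array([[0.45, 0.40], [0.25, 0.30]])
index = ndvi(nir_band, red_band)  # dense vegetation yields values near 1
```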

Conclusion

This work generated high-quality season-based vegetation maps from multi-temporal Sentinel-2 images using the RF model. Leveraging cloud-based image processing on Google Earth Engine (GEE), this study provides an efficient method for mapping vegetation cover over a large area, the Greater Sydney region of Australia. We also used additional features, including topographic factors, texture information, and spectral indices, to improve the classification. The accuracy of the classification map for each season was evaluated both visually and quantitatively. In addition, we compared the results obtained by the RF model with other works to demonstrate the efficiency of the proposed methodology in producing accurate vegetation maps. The RF model with all factors achieved an OA of 92.56% for summer, 91.64% for autumn, and 92.89% for winter, improving on the quantitative results of comparable studies. The visualization results confirmed that the proposed model can produce satisfactory season-based vegetation maps. Moreover, we identified the input variables that most affected the classification results and contributed most to season-based vegetation mapping with the RF method. The results demonstrated that multi-temporal Sentinel-2 data, spectral indices, topographic factors, and texture information were critical in separating classes and effective for vegetation mapping. The findings revealed that our method can produce satisfactory vegetation maps across a variety of terrestrial environments. Furthermore, the use of GEE, multi-temporal data, machine learning, and modern computing technologies opens the prospect of developing a timely vegetation-monitoring platform.

eISSN:
1178-5608
Language:
English
Publication timeframe:
Volume Open
Journal subjects:
Engineering, Introductions and Overviews, other