Over past centuries, land use in Estonia has had a substantial impact on land cover. While the share of forest land has been as small as 20–25%, it has gradually increased as a consequence of land drainage, a decrease in the number of households managing small farms, and the abandonment of arable land (Kremser, 1998; Etverk, 2003; Jõgiste
Forest inventory provides the woodland owner with data required for forest management planning. Forest land is divided into patches based on the properties of soil and plant cover, enabling the forest stand on each generated unit to be characterized by mean values of inventory variables (tree species composition, forest height, wood volume, age of stand elements, etc.) with a sufficiently small variability within each unit (Krigul, 1972; Vaus, 2005). Such forest management inventories (FMIs) have been conducted in Estonia over the past 150 years, with a typical FMI repetition cycle of 10 years (Meikar, 1998). Forest land-use summaries have been published on the basis of FMI data (Aru & Okas, 1959; Aru
The FMI would give a complete overview of Estonian forests if all data were accessible. Natural though full accessibility is in the case of state-owned forests, however, it is not feasible for forests in private ownership. The records from FMI databases are updated only when the forest owner considers it necessary. To overcome these problems, many countries have a second type of forest inventory, a National Forest Inventory (NFI) based on a regularly spaced sample of field plots 7–15 m in radius, instrumentally measured (NFIEUROPE, 1997; Tomppo
Forest land surveys have used remote sensing since 1920, when half a million acres of forest land was inventoried in Canada on the basis of preliminary aerial photography (Howard, 1991). Aerial photography has been used for FMI in Estonia since 1961 when monochrome aerial photos were taken on emulsions sensitive to the visible and near-infrared part of the spectrum over Järvselja forests in southeast Estonia, as a basis for the construction of 1:10,000 forest-stand maps.
The automated processing of remote-sensing data to support the construction of maps of forest-inventory variables was proposed for the particular case of Finland by Poso
Although many countries have already incorporated multispectral satellite images and airborne laser scanning data into their NFI systems to construct maps of forest variables (McRoberts & Tomppo, 2007; Barrett
The main forest-forming tree species in Estonia are European aspen (
The first Estonian-wide NFI with a sample plot layout similar to the current NFI was launched in 1999 (Kohava, 2000). With a rather modest budget and equipment, the NFI was able to give a fairly accurate assessment of the forest area, resources, and cutting volume. The main initial objective of the NFI was the estimation of major characteristics of forest stands. At present, however, the NFI has a wider scope, reporting also such data as the distribution of land by land-use categories and the afforestation and growing stock of non-forest land. The smallest target unit for reporting is the individual county.
Methodologically, the NFI is designed as an annual research effort, which, using optimal methods, must ensure continuous updating of its assembled information, including the forest database. Since 2014, sampling has been conducted with increased frequency, on a network of sample plots covering the entire country (Figure 1). Under this procedure, in each given year approximately 375 clusters (i.e. 20%) from the entire ensemble of sample plots, is measured (Figure 2). This ensures re-measurement of permanent plots once in every 5 years. Point estimates of parameters are calculated with data from the sample plots, as a basis for inferences to the entire population.
Cluster network of the Estonian NFI (2014–2018).
Number of measured permanent (P), temporary (T), and new permanent (NP) sample plots per year.
The Estonian NFI is designed as a systematic sample without pre-stratification. The sampling grid is established to meet accuracy requirements at the national level. About 5,500 sample plots are measured per year, with the sampling intensity the same throughout the country. The sample (cluster) distribution is based on the national 5×5 km quadrangle grid, determined by the Estonian base-map coordinates system (EPSG:3301).
The observation unit is the individual field plot, determined by its centre coordinates. The method of sampling with partial replacement is used. The sample plot area is subdivided if its area overlaps with different types of forest stands or parcels with different land cover. To enhance the efficiency of the survey, sample plots are concentrated into clusters, defined as 800×800 m squares (Figure 3). Some sample-plot clusters are deemed permanent, others temporary. The radius of a given sample plot depends on the variables selected for assessment, and additionally on variable values (with, e.g., smaller sample plots used for lower or regrowth layers). The radii of the principal sample plots are 10 m and 7 m. For land-use category determination, plots of other radii are taken, although without modification in the scheme for selecting centre coordinates. For the construction of models for stand height and wood volume, we used the sample plots in which a tree layer was present and which were located entirely inside a forest stand.
Estonian NFI cluster design and sample plot arrangement.
The remote-sensing support system of the Estonian NFI is based on data from the European Union Copernicus programme, from the NASA/USGS Landsat programme, and from the airborne photography and laser scanning programme of the Estonian Land Board. The medium spatial resolution (10–30 m) multispectral images from Sentinel-2 MSI (ESA, 2015) and Land-sat-8 OLI (USGS, 2019) sensors and SAR images from Sentinel-1 (ESA, 2020) are used for the prediction of tree species composition and for the detection of changes. The data are available from the regional data centre ESTHub (2016).
Aerial photography and airborne laser scanning are conducted by the Estonian Land Board under a repetition schema that provides, for any given location, either summer or springtime data in each second year, and additionally produces measurements from each similar growth season after every four years (Figure 4) (Maa-amet, 2020). The point density of the archived ALS data ranges from 0.15 m−2 to 2 m−2. The ALS pulse footprint diameter at canopy level is about 0.5 m, with a scanning angle that does not exceed 30° from nadir. Data are distributed according to a 1 km2 map sheet system (Maa-amet, 2019). Before the year 2017, the Estonian Land Board used the Leica ALS50-II airborne scanning system. From 2017 onward, measurements have instead been taken with the Riegl VQ-1560i. In the Estonian NFI, orthophotos (in the RGB + NIR bands) are currently used only for visual interpretation, in the estimation of land cover type during the preparation of fieldwork. In the remote-sensing module offered in this paper for the Estonian NFI, ALS data are used for the prediction of forest height and wood volume. Ancillary data sources for our offered module are a 1:10,000 base map and soil map of Estonia, the FMI database with its stand-level forest-management inventory data, and a digital terrain module (DTM) provided by the Estonian Land Board.
Location of target areas for fitting forest height and standing-wood volume models, corresponding to the Estonian Land Board ALS measurement schema. The areas are partially overlapping.
The first two variables to be predicted are basal-area-weighted forest height (Lorey's mean height)
The parameters for height and wood volume models are estimated using NFI sample-plot field measurement data for each particular flight campaign, with each campaign covering roughly one-quarter of Estonia's area (Figure 4). No correction is currently made for springtime phenology and the consequent increased gap fraction in the canopy, and tree species are not currently distinguished for
Prediction of tree species composition for target units with a size of 10–30 m is a challenging task, since most forests in Estonia are of mixed composition, since the spectral signatures of broadleaf deciduous trees are quite similar to each other, and since variables that are related to stand age and structure have a substantial influence on forest spectral signature (Nilson & Peterson, 1994) as calculated from Land-sat-8 OLI and Sentinel-2 MSI images. The proposed solution for the Estonian NFI is based on machine learning, using a large number of samples from the FMI database and data from a 1:10,000 soil map. Lang
The machine-learning procedure starts with the selection of a potential training set of stands from the FMI database. The next step after the cloud masking of image data is the removal of outliers, based on relationships of spectral radiance with forest age and wood volume in red, near infrared, and shortwave infrared bands (Lang
The model fitting on empirical data starts with the selection of informative features. The next step is model hyper-parameter optimization. Finally, a prediction map is constructed. Although the variable being predicted is the dominant species code, additional useful information is obtained from the entire vector of class probabilities (Lang
For the construction of a tree species composition map, we used Landsat-8 OLI and Sentinel-2 images from April 2018 to September 2018. The images were resampled to 25 m spatial resolution and converted to the EPSG:3301 coordinate system. The methods for image processing and data analysis resemble those recently described by Lang
The data processing system is implemented with existing software, but with the addition of a calling script written in Python. Each task (e.g. the prediction of forest height, of wood volume, or of tree species composition; or the detection of changes; or the appraisal of predictions) is structured as a sequence of specific steps (e.g. a preparation phase, the extraction of features from remote sensing data, the fitting of a model, the application of the selected model, and validation). The user edits the script template of a specific step, inserts proper parameter values, and executes the script. Each processing execution is assigned a unique ID, which is then used in a file-storage folder structure. The scripts produce an execution history log as a directed acyclic graph (DAG). Version control, the generated data-folder structure, and the generated DAG jointly ensure traceability and repeatability of the data-processing steps. In considering alternatives to this scheme, we evaluated some formal workflow tools (Airflow, Luigi, Jenkins), but found that they failed to add sufficient value. The principal software tools used in our scheme are QGIS, GRASS GIS, Python, GDAL, ESA SNAP, the Orfeo Toolbox (OTB), LAStools, FUSION, R, RStudio, and Git.
During execution, all intermediate results are stored in the local computer. In addition, the intermediate and final results are copied to a backup file server. Intermediate results are stored for 2 years (this being the longest interval between successive measurement updates). Final results are stored for some longer period, as decided by the system administrator. The hardware configuration comprises an 8-core workstation CPU, 24 GB of RAM, and 6 TB of workstation disk storage, with additionally 2.3 TB of backup storage. All hardware is virtualized.
For characterization of forest height and wood volume models, the mean residual squared error
For the assessment of forest height and wood volume predictions, we used a set of forest stands from the overlap area of the NE and NW blocks, and additionally a set of forest stands from the overlap area of the SE and SW blocks (Figure 4). We applied the following selection criteria: the stand polygon area was required to lie in the range of 1.2–8.0 ha; the count of 10 m pixels within the stand polygon was required to be > 100; the number of pixels within the stand polygon without value (i.e. with NoData label) was required to be < 10 in the corresponding comparison maps; and the proportion of evergreen tree species in the stand was required to be either ≥ 75% or ≤ 25%. The last of these filters (the disjunctive criterion regarding the proportion of evergreens) was applied to select contrasting sets of stands according to tree species. On the overlap area of NW and NE blocks were 1,157 evergreen and 605 deciduous stands. On the overlap area of SW and SE blocks were 644 evergreen and 1,275 deciduous stands. For each stand, the mean value of pixels was calculated from forest-height and wood-volume prediction maps.
The tree species composition was validated on NFI sample plots, because we used FMI data for the prediction model, and the NFI and FMI datasets are independent. Two selection criteria were used: it was required that the sample plot be described as forest in the NFI, and the probability of the plot being non-forest was required to be < 25% according to predicted pixel values in the tree species map. This pair of filters helped to remove disturbed areas and validation points in which the spectral signature is mainly influenced by objects other than forest trees. We analyzed the prediction for dominant tree species and the prediction for the proportion of evergreen coniferous trees in the species composition.
The forest-height prediction models described 89.5–94.8% of the variability in the empirical data (Table 1). The residual standard error of the models remained below 2.4 m. The comparison of predicted forest height on ALS dataset overlap areas revealed a small systematic difference, dependent on the dominant species (Figure 5). With models fitted for each ALS data-set individually, the predicted values for forests dominated by broadleaf deciduous trees tended to be greater when a midsummer ALS dataset was used. For evergreen coniferous trees, with the midsummer ALS dataset, the species-independent prediction model yielded an indication of underestimation (Table 4).
Predicted forest height on the overlap area of ALS data: (a) blocks SW and SE, (b) blocks NW and NE. The blocks correspond to lidar data from summer (SU) and springtime (SP).
Parameters for the forest-height
ALS flight | Block | Model parameters* / | ||||
---|---|---|---|---|---|---|
RSE (dm) | R2 % | DF | ||||
SP 2017 | SW | 9.05 | 11.86 | 24 | 89.5 | 292 |
SU 2017 | SE | 6.09 | 11.58 | 21 | 94.7 | 281 |
SP 2018 | NE | 13.80 | 11.47 | 17 | 94.8 | 313 |
SU 2018 | NW | 12.07 | 21 | 91.8 | 312 |
Height percentiles are in metres. /
The fitted models for standing-wood volume prediction described 84.2–91.7% of the variability in the empirical data (Table 2, Figure 6). The residual standard error of the models remained in the range of 66–97 m3 ha−1. Model (2) is constructed with regard to the prevalence of multi-layer canopies in Estonia. In the current exploratory study, on the other hand, the variable predicted is wood volume for the dominant tree layer. We consider this to be the reason why the parameter
Parameters for the standing-wood volume
ALS flight | Model parameters* / | ||||||
---|---|---|---|---|---|---|---|
RSE (m3 ha−1) | R2 | DF | |||||
SP 2017 | 1.438 | 1.264 | 0.416 | 78.6 | 84.2 | 287 | |
SU 2017 | 1.452 | 0.861 | 97.0 | 85.2 | 276 | ||
SP 2018 | 1.277 | 1.233 | 0.454 | 65.7 | 91.7 | 308 | |
SU 2018 | 1.492 | 0.598 | 76.3 | 85.1 | 304 |
Height percentiles are in metres, and canopy cover values within the range of 0–100. /
Measured and predicted wood volume for NFI sample plots. Dominant species codes: HB = European aspen; KS = silver birch; KU = Norway spruce; LM = black alder; LV = grey alder; MA = Scots pine; XK = other.
Standing-wood volume
ALS data | Target area | Variable* | Dominant species**/ | ||||||
---|---|---|---|---|---|---|---|---|---|
HB | KS | KU | LM | LV | MA | XK | |||
SP 2017 | SW | 190 | 161 | 179 | 263 | 152 | 250 | 233 | |
SP 2017 | SW | RSE | 77 | 67 | 70 | 116 | 96 | 83 | 81 |
SP 2017 | SW | MRE | −37 | 11 | 18 | −45 | 25 | −1 | −59 |
SP 2017 | SW | N | 22 | 93 | 43 | 13 | 7 | 106 | 7 |
SU 2017 | SE | 265 | 190 | 287 | 331 | 199 | 302 | 87 | |
SU 2017 | SE | RSE | 102 | 92 | 106 | 90 | 76 | 98 | 73 |
SU 2017 | SE | MRE | −3 | 37 | −13 | −51 | 22 | −26 | 57 |
SU 2017 | SE | N | 18 | 83 | 57 | 8 | 18 | 92 | 4 |
SP 2018 | NE | 157 | 197 | 235 | 188 | 145 | 237 | 166 | |
SP 2018 | NE | RSE | 95 | 60 | 73 | 78 | 53 | 56 | 30 |
SP 2018 | NE | MRE | −37 | −3 | 6 | −26 | −9 | 10 | −26 |
SP 2018 | NE | N | 19 | 59 | 66 | 10 | 31 | 120 | 3 |
SU 2018 | NW | 239 | 162 | 248 | 314 | 158 | 211 | 235 | |
SU 2018 | NW | RSE | 104 | 59 | 91 | 125 | 59 | 72 | 144 |
SU 2018 | NW | MRE | 54 | 31 | −29 | −17 | 12 | −24 | 144 |
SU 2018 | NW | N | 9 | 99 | 54 | 16 | 36 | 93 | 1 |
Species codes: HB = European aspen; KS = silver birch; KU = Norway spruce; LM = black alder; LV = grey alder; MA = Scots pine; XK = other. / *
The wood volume prediction for evergreen stands was greater when estimated from springtime ALS maps (Table 4, Figure 7). The opposite was observed for deciduous stands. The apparent change in predicted values for evergreen forests is the result of using a prediction model that does not account for a change in the leaf area index. A second factor may also be relevant: canopy cover estimates based on Riegl VQ-1650i laser-scanner data may be expected to be sensitive to the contribution from evergreen coniferous and deciduous broadleaf trees due to their morphological differences in the shoot and crown structure.
The average difference between predicted forest height
Block SP | Block SU | ||||
---|---|---|---|---|---|
EGR | DEC | EGR | DEC | ||
SW | SE | 0.53 | −0.90 | 34.5 | −46.5 |
NE | NW | 0.13 | −0.71 | 40.4 | −38.0 |
Predicted standing-wood volume on the overlap area of ALS data: (a) blocks SW and SE, (b) blocks NW and NE. The blocks correspond to lidar data from summer (SU) and springtime (SP).
The tree species composition prediction was first compared to a dataset published by Lang
Predicted proportion (%) of evergreen conifers for 5,011 NFI sample plots with stands older than 24 years.
Predicted dominant species and field observations on 6,239 NFI sample plots.
NFI / | Predicted dominant species* / | ||||||
---|---|---|---|---|---|---|---|
KS | KU | MA | LV | LM | HB | XK | |
KS | 1,110 | 186 | 114 | 153 | 137 | 106 | 16 |
KU | 201 | 694 | 162 | 47 | 12 | 37 | 6 |
MA | 173 | 211 | 1,532 | 4 | 8 | 15 | 2 |
LV | 93 | 25 | 3 | 334 | 14 | 23 | 7 |
LM | 67 | 10 | 7 | 25 | 123 | 20 | 4 |
HB | 81 | 34 | 5 | 65 | 23 | 108 | 13 |
XK | 47 | 13 | 15 | 72 | 30 | 35 | 17 |
Species codes: HB = European aspen; KS = silver birch; KU = Norway spruce; LM = black alder; LV = grey alder; MA = Scots pine; XK = other. /
The NFI is based on contact measurements of trees on a set of small sample plots. For spatial units smaller than the individual county, feature variables from remote-sensing data can be used to construct maps of forest-inventory variables. As in the case of many other countries, Estonia has a national programme of springtime aerial photography and ALS for topographic mapping, with special flights additionally performed at higher altitudes in the summertime for forest-inventory purposes.
To construct maps of forest height based on the 3-dimensional ALS point clouds and NFI sample-plot measurements, it suffices to apply a linear model that takes as argument an upper percentile of the point-cloud height distribution. Upon taking a model independent of tree species and comparing forest height predictions based on springtime against predictions based on summer ALS data, we found a slight bias in the predicted forest height, dependent on the proportion of evergreen coniferous species in the subject forests. The difference proved more pronounced in deciduous forests, where greater heights were predicted when summer ALS data were used.
Lidar pulse returns are rarely recorded from tree stems during airborne measurements. The prediction of standing-wood volume therefore requires a model that accounts for tree height and forest density (defined as the number of trees per unit area). With detection of single trees not reliable from sparse point clouds with about 1 point per m2, canopy cover can be taken as a proxy for forest density. However, the interaction of laser pulse with forest canopy, including the degree of pulse penetration to the ground, is substantially dependent on the amount of foliage, in addition to suffering dependence on flight parameters and scanner settings. There are two options for incorporating canopy cover information into the standing-wood volume model. The first option is to use a variable that predicts canopy cover. The second option is to calculate forest height metrics from point clouds which include near-to-ground points. The former option has been adopted for Estonian forests, while jurisdictions adopting the second option include Finland (Kotivuori
A bias similar to the result encountered with evergreen coniferous-dominated forest height prediction was found in the standing-wood volume prediction, with the common model yielding an smaller by about 40 m3 ha−1 for summer than for springtime ALS data. On the other hand, a systematically greater, with a discrepancy of the same order of magnitude, was found for forests dominated by deciduous trees for summer as compared against springtime ALS data. The problem could be addressed by dividing the empirical data into subsets according to the dominant tree species and fitting model parameters for each subset. However, in this approach the number of observations decreases and it becomes necessary to know the tree species distribution with high accuracy before the model can be applied. The number of observations for model fitting can be increased by including sample-plot measurements from several years. If information about forest disturbances is reliable, then forest height or wood-volume data for the sample plots measured a few years before the collection of ALS data can be updated with a forest growth model.
Tree species composition prediction was tested in this study for 25 m target pixels. Although the uncertainty in species was substantial at the pixel level, the proportion of aggregated evergreen coniferous species in the composition was predicted reliably, and therefore could in future versions of the system be used for correcting phenology effects.
The NFI is intended to provide timely forest data for the whole country and to furnish published maps of the main variables, namely forest height, standing-wood volume, tree species composition, forest age, basal area, relative density, and location of disturbances. However, even for a small country, such as Estonia it is not economically feasible to carry out ALS measurements every year for the whole area, making it necessary in the case of many Estonian locations to update the data by other means. With a geographically comprehensive prediction of tree species composition, it becomes possible to use forest growth modelling to update forest height and wood-volume predictions for those areas lacking current-year ALS measurements. The current Estonian Land Board flight schedule makes it necessary to apply the modelling over a period of up to two years. For the determination of forest height, an algebraic difference model is used to simulate the growth of Estonian forests (Kiviste, 1997):
The remaining two variables to be predicted for forest management planning are stand basal area
As an essential part of our remote-sensing-based prediction system for forest variables, we designed a module for the estimation of prediction errors. This error-budget module is based on the
In this paper, we have presented a fully functional remote-sensing support module for the Estonian NFI, supporting the construction of maps of forest height, standing-wood volume, and tree species composition. Our study has, however, revealed the need for correction of phenology effects in the ALS data. This can be achieved by increasing the accuracy of tree species composition estimates, using longer time series of multispectral satellite data and dense time series of Sentinel-1 SAR data as proposed by Dostálová
When the system becomes fully operational, it will be possible to determine also forest age at high accuracy, by using time series of multi-temporal satellite images for the detection of forest regeneration fellings (Peterson
The remote-sensing support module offered in this paper for the Estonian NFI accepts open-source data. Our models for sparse ALS point clouds yield coefficients of determination in the ranges of 89.5–94.8% for height and 84.2–91.7% for wood volume. However, validation of the common model prediction results has revealed systematic bias, upon comparing predictions from summer against predictions from springtime ALS data. The bias problem could be addressed by including the share of evergreen tree species in the model for springtime ALS data. At the pixel level, the prediction of dominant tree species, from a set of six possible options, has been found to give a Cohen's kappa value of 95% for the confidence interval 0.51–0.54 when validated on small NFI sample plots, with the determination of the share of aggregated evergreen tree species in forests found to be reliable. Further studies are required to better account for phenology effects influencing ALS data and to increase the precision of tree species composition predictions. Already, however, our proposed solution indicates a path forward for Estonian forestry, offering the prospect of annually updated forest-inventory data for all Estonian forest at 10–30 m spatial resolution, to support sustainable forest-management planning.