Open access

GIS-Based Land Cover Analysis and Prediction Based on Open-Source Software and Data

Introduction

Changes in land use strongly affect the Earth’s surface. Since the 19th century, their impact on the geographical environment has been enormous (Vitousek 1994), as industrialisation has completely changed the face of many regions. Furthermore, land-use changes are inevitable and, in most cases, irreversible, since they are driven by population growth and progressive urbanisation. Bielecka (2020), based on an in-depth analysis of the literature, identified three main research fields related to land use/land cover (hereinafter LULC) changes. The first deals with change documentation and analysis of land cover change trajectories, the second comprises LULC change drivers and environmental impacts, and the third involves LULC forecasting. Although land use and land cover are distinct land surface concepts, the two have been mixed in research in this area since the 1970s (Fisher et al. 2005, Comber et al. 2008, Bielecka and Jenerowicz 2019). In recent decades, many studies have focussed on LULC change analysis and the development of prognostic scenarios and have identified urbanisation and agricultural land loss as the main processes of land-use change worldwide (Lambin et al. 2000, 2003, Seto et al. 2011). Land cover and land-use changes in Poland are also well documented at the local (Prus 2012, Wiatkowska et al. 2021), regional (Dukaczewski 2019) and national (Poławski 2009, Borowska-Stefańska et al. 2018, Kurowska et al. 2020) levels. These studies also revealed that urbanisation and losses in agricultural areas were the main LULC change trajectories in the past decades. Prus (2012) noted that urban development changed the character of many regions from agricultural to multifunctional. This was also confirmed by Kurowska et al. (2020), who indicated that the progressive loss of agricultural and forest land occurs despite stringent protective legal regulations.
Rapid urbanisation significantly affects the environment, economy and society (Kowalewski et al. 2013, Jończy et al. 2021, Kocur-Bera, Lyjak 2021), and achieving the United Nations (UN) Sustainable Development Goals (SDGs) (Bielecka et al. 2020) is a great challenge.

Significant advances in land-use change research have come with the advancement of geoinformation technology and the availability of satellite data. Remote sensing and geographical information systems (GISs) are widely used to quantify, monitor and map spatio-temporal changes in LULC. Furthermore, GIS-based models allow us to monitor and predict LULC changes based on many environmental and socioeconomic factors, such as elevation, access to water, climatic conditions, population density, migration, as well as access to the labour market and healthcare. LULC change models are either standalone tools, such as SLEUTH (slope, land use, excluded layer, urban extent, transportation, hillshade), CLUE (Conversion of Land Use and its Effects) and Dinamica EGO (Environment for Geoprocessing Objects), or plug-ins to GIS software, such as the Modules for Land Use Change Evaluation (MOLUSCE) plug-in to Quantum GIS (QGIS). Prediction algorithms also vary greatly; some rely on advanced artificial intelligence and deep learning methods or on Markov chains, while others rely on regression analysis. Furthermore, GIS-based LULC change and prediction models, as noted by Lambin et al. (2000, 2003), constitute integrated tools for developing various future LULC scenarios, as well as for qualitative and quantitative LULC analyses. Comparative analyses of various LULC forecasting models were presented by Mas et al. (2014), Noszczyk (2018) and Jayasinghe et al. (2021).

This paper focusses on LULC prediction in the north-eastern part of Pomerania and the Tricity metropolitan area in Poland, based on open data and software. To estimate land cover in 2024, CORINE (Coordination of Information on the Environment) Land Cover (CLC) data from 2006, 2012 and 2018 (CLMS 2022) were used, while Global Human Settlement (GHS) – Population (GHS-POP 2022) data, topographic data and digital terrain elevation data were used to create the thematic layers describing factors that influence the possibility of land-use change. The research questions concern what land cover changes occurred in the analysed periods, where they occurred and how large they were. Based on the land cover analysis during the periods 2006–2012, 2012–2018 and 2018–2024, the hypothesis of steady, albeit slow, expansion of urban areas at the expense of agricultural land was verified.

Materials and methods
Research area and data used

The study covered the counties of Gdańsk (denoted as GDA) and Kartuzy (denoted as GKA) and the cities of Gdańsk, Gdynia and Sopot, known as the Tricity region (Fig. 1), with an area of 2332.03 km2 and slightly more than 2 million inhabitants.

Fig. 1

Study area location.

The landscape of the study area is diverse. Apart from the Tricity metropolitan area, agricultural lands and forests dominate the land-use structure. The hilly western part, with natural vegetation, is largely protected by a landscape park and eight protected landscape areas. In the eastern lowland areas, agriculture predominates, including intensive agricultural production at Żuławy Gdańskie. The space–time framework of the study was determined by the significant changes in LULC and population over the past decades, data availability and research methodology.

CLC Level 1 datasets from 2006, 2012 and 2018 were used as the primary source of data to develop a predictive land cover model (inventories 2006 and 2012) and to validate it (CLC 2018). CLC is a Europe-wide, seamless vector database storing information on LULC and its changes in the reference years, based on computer-aided visual interpretation of satellite images (CLMS 2022). The hierarchical nomenclature includes five types of LULC at Level 1: (1) artificial surfaces, (2) agricultural land, (3) forest and semi-natural land, (4) wetlands and (5) water bodies, which are further subdivided into 15 classes (Level 2) and 44 classes at the most detailed, national Level 3 (Büttner 2014). The minimum mapping unit was set at 25 ha, while the minimum width of the patches was 100 m; the geometric and thematic accuracies were estimated at 70 m and 85%, respectively (Land Copernicus 2017). Despite many disadvantages, summarised by Bielecka and Jenerowicz (2019) and Gąsiorowski and Poławski (2011), CLC is commonly used in many applications regarding land-use monitoring and change analysis (e.g. Jansen and Gregorio 2002, Feranec et al. 2010, Cole et al. 2018, Kocur-Bera and Pszenny 2020). In Poland, LULC analysis based on CLC data was carried out by Mierzwiak and Calka (2019), Borowska-Stefańska et al. (2018), Pokonieczny (2018), Pabjanek and Szumacher (2017) and Łowicki and Mizgajski (2013). CLC datasets were obtained from the Copernicus Land Monitoring Service (CLMS 2022).

The National Database of Topographic Objects (BDOT10k, GUGiK 2022) was used to determine land-use change causative factors, such as proximity to roads, built-up areas and protected areas. BDOT10k is a seamless, nationwide vector topographic database with a level of detail corresponding to a 1:10,000 map, maintained by the Surveyor General (GUGiK 2022). It constitutes a component of the Polish spatial data infrastructure and is publicly available through the following Web services: Web Feature Services (WFS) and Web Map Services (WMS) (Izdebski et al. 2021). BDOT10k data are broadly used in land-use applications (Fiedeń 2019, Adamiak et al. 2021). Landform analysis, particularly for slope, was conducted based on Digital Terrain Elevation Data (DTED) Level 2 with 1 arc second (approximately 30 m) spatial resolution (USGS 2022). Its terrain representation is equivalent to the contour information on a 1:50,000 scale map (MIL-PRF-89020B 2004).

The multi-temporal GHS-POP population grid for the year 2015 provided information on the distribution of the residential population, expressed as the number of people per cell. The estimating algorithm uses Gridded Population of the World (GPW v. 4.10) and Global Human Settlement Layer (GHSL) to disaggregate census population to 250-m grid cells. GHS-POP data are available at the Joint Research Centre GHSL product website. An overview of the data used is given in Table 1.

Table 1. Summary of data used.

| Data / format | Reference years | CRS | Spatial resolution | Data provider | Explanatory variables |
| --- | --- | --- | --- | --- | --- |
| CORINE Land Cover / vector | 2006, 2012, 2018 | ETRS89 / Poland CS92 | 25 ha | Copernicus Land Monitoring Services [CLMS 2022] | – |
| BDOT10k / vector | 2018 | ETRS89 / Poland CS92 | 1:10,000 | Head Office of Geodesy and Cartography [GUGiK 2022] | Distance to roads, built-up area, protected areas |
| DTED / raster | 2004 | WGS84 | 30 m | United States Geological Survey [USGS 2022] | Slope, elevation |
| GHS-POP / grid | 2015 | World Mollweide | 250 m | Joint Research Centre (JRC), DG for Regional and Urban Policy of the European Commission [GHS-POP 2022] | Population distribution |

The land cover prediction was carried out using the MOLUSCE plug-in of QGIS. The plug-in was written in Python version 2, so it can be run in the QGIS 2.X platform only; for QGIS 3, the plug-in is not available. The MOLUSCE code is open source and can be found on GitHub.

Methods applied
Research hypothesis and methodology overview

Based on research to date and the land cover structure of the area in question, we hypothesised that the main land cover trajectory is the growth of urbanised areas at the expense of agricultural land. This hypothesis was tested by analysing past land cover during the periods 2006–2012 and 2012–2018 and the prediction for the period 2018–2024. The MOLUSCE algorithm is a model based on cellular automata and an artificial neural network (CA-ANN) that estimates future land cover from historical data and explanatory variables in raster form. The model prepares a land cover transition probability matrix and a map of land cover changes using the Markov chain approach based on CLC 2006 and 2012. Next, it trains a simulation model using an ANN, i.e. a classical multilayer perceptron (ANN-MLP) with the sigmoid function. The simulation of future land cover is based on CA. Six predictors (explanatory variables) that potentially influence land cover dynamics were set up, based on previous research (e.g. Rahman et al. 2017, Jogun et al. 2019, Alam et al. 2021) and data availability; these predictors are proximity to roads, proximity to built-up areas, location of protected areas, slope, elevation and population distribution. The predictive model that enables the simulation of land cover in 2024 was developed based on CLC 2006 and 2012 and the determinants of land cover. CLC 2018 data were used to validate the model by calculating the overall accuracy and kappa coefficient.

This research was conducted in four stages, as shown in Figure 2.

Fig. 2

Workflow scheme of land-use prediction process.

The first stage focussed on data acquisition and preprocessing, while the second developed raster layers for each land-use change predictor. In the third stage, the simulated (2018) and predicted (2024) land cover were modelled. Finally, the fourth stage focussed on model validation.

Data preprocessing

All data used in the analysis were clipped to the area of interest, reprojected into the Polish state geodetic coordinate system ETRS89/Poland CS92 and transformed into rasters with 250 m resolution. This ensured a uniform geometry across layers and, ultimately, the reliability of the obtained results. CLC Level 1 data were converted to a 250 m raster, giving Code 1 to artificial surfaces, Code 2 to agricultural land, Code 3 to forest and semi-natural ecosystems and Codes 4 and 5 to wetlands and water bodies, respectively.
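
The recoding to Level 1 can be illustrated with a minimal sketch. It assumes that the source polygons carry the standard three-digit CLC Level 3 codes (for example 112 or 211), whose first digit is the Level 1 class. The function name `clc_level1` and the sample codes are hypothetical and are not taken from the authors' workflow.

```python
# Level 1 classes as listed in the text (codes 1-5).
LEVEL1_NAMES = {
    1: "Artificial surfaces",
    2: "Agricultural areas",
    3: "Forest and semi-natural areas",
    4: "Wetlands",
    5: "Water bodies",
}


def clc_level1(level3_code):
    """Return the Level 1 code (1-5) for a three-digit CLC Level 3 code."""
    code = int(level3_code)
    level1 = code // 100  # the first digit of a three-digit code
    if not 100 <= code <= 999 or level1 not in LEVEL1_NAMES:
        raise ValueError(f"not a CLC Level 3 code: {level3_code!r}")
    return level1


# Hypothetical attribute values read from a CLC vector layer.
sample_codes = ["112", "211", "312", "411", "512"]
for code in sample_codes:
    level1 = clc_level1(code)
    print(code, "->", level1, LEVEL1_NAMES[level1])
```

The resulting integer codes are what a rasterisation step would burn into the 250 m grid.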

Land-use change predictor

The following predictors for land cover change were set up and used in the simulation and prediction processes: proximity to roads, built-up areas, protected area location, slope, elevation and population distribution (Fig. 3).

Fig. 3

Explanatory variables, raster data.

a) distance from roads; b) distance from buildings; c) population; d) restricted areas; e) Digital Elevation Model; f) slope.

The raster layers showing proximity to paved roads and built-up areas were derived by buffering the objects in five zones, with the distances specified in Table 2.

Table 2. Buffer zones applied to roads and built-up area.

| Distance to roads and built-up areas [m] | Zone number |
| --- | --- |
| ≤ 100 | 1 |
| 100–200 | 2 |
| 200–500 | 3 |
| 500–1000 | 4 |
| > 1000 | 5 |

As the conducted analysis returned the buffers in vector form, the next step was rasterisation (cell size: 250 m).
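
The zoning rule of Table 2 can be expressed as a small lookup. This is only a sketch: the function name `buffer_zone` is hypothetical, and it assumes that each upper limit belongs to the lower zone (a distance of exactly 100 m falls in zone 1), which the table does not state explicitly.

```python
from bisect import bisect_left

# Upper limits (in metres) of zones 1-4 from Table 2; zone 5 is everything beyond.
ZONE_LIMITS_M = [100, 200, 500, 1000]


def buffer_zone(distance_m):
    """Map a distance in metres to a zone number from 1 to 5.

    Assumption: each upper limit is inclusive (100 m -> zone 1).
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return bisect_left(ZONE_LIMITS_M, distance_m) + 1


for distance in (50, 100, 150, 750, 2500):
    print(distance, "m -> zone", buffer_zone(distance))
```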

Figures 3A and B show the proximity to roads and built-up area, respectively. The spatial distribution of the population is demonstrated in Figure 3C. Protected areas (Fig. 3D) are seen as restricted in land cover modelling as human impact is severely limited there. This means that the development of built-up areas is not allowed, and the land cover types fluctuate slightly. Figures 3E and F present the elevation (from 0 m to 300 m) and slope in 2° intervals. Both raster layers resulted from the Level 2 DTED elevation model resampled in QGIS software to a cell size of 250 m using bilinear interpolation. The Pearson correlation coefficient was used to reveal stochastic dependence between variables.
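
For reference, the Pearson coefficient between two predictor layers can be computed from their cell values as sketched below. The function `pearson` and the sample values are illustrative assumptions; in practice the two lists would hold the values of two co-registered 250 m rasters, cell by cell.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_x * var_y)


# Hypothetical cell values: buffer zone number and population count per cell.
zone = [1, 1, 2, 3, 4, 5, 5]
population = [820, 640, 310, 120, 45, 10, 0]
print(round(pearson(zone, population), 2))
```

A strongly negative value for such a pair would mean that population falls as the distance zone rises.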

Land cover simulation and prediction

The 2006 land cover map was the initial state raster, and the 2012 land cover was the final state raster. The simulation model was built and verified for 2018, while the land cover forecast was for 2024, the reference year for the next CLC inventory. The MOLUSCE algorithm investigated the change between the initial and final states of past land cover inventories (2006 and 2012) and thereafter generated the transition matrix and changed land cover map. Based on the explanatory variables, a probability matrix was created, which is the basis of the land cover model predicted by the ANN-MLP. The parameters used were as follows:

Neighbourhood size: ‘1’ – which means that 9 cells (3 × 3 region) were taken into the analysis.

Learning rate: 0.001 – a tuning parameter in the optimisation algorithm, controlling how fast the model adapts to the problem.

Momentum: 0.001 – a component of the learning algorithm that avoids immediate changes in the weight values after changing the error gradient, making the learning more consistent. Small values of the learning rate and momentum result in a longer but more stable learning process.

Maximum number of iterations: 200.

Number of hidden layers and neurons – two hidden layers that had 32 and 4 neurons, respectively, were set.

Number of samples – 1000 randomly distributed samples.
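
The settings listed above can be collected in one place, together with two small helpers that show what the neighbourhood size and the momentum term mean. This is a sketch under stated assumptions: the dictionary keys and function names are hypothetical, and `momentum_step` is the textbook gradient-descent update with momentum, not necessarily the exact rule implemented in MOLUSCE.

```python
# Values as reported in the text; the key names are illustrative only.
PARAMS = {
    "neighbourhood": 1,
    "learning_rate": 0.001,
    "momentum": 0.001,
    "max_iterations": 200,
    "hidden_layers": (32, 4),
    "samples": 1000,
}


def window_cells(neighbourhood):
    """Number of cells in the square window around a cell."""
    side = 2 * neighbourhood + 1
    return side * side


def momentum_step(weight, gradient, previous_delta, learning_rate, momentum):
    """One textbook weight update with momentum (assumed, not MOLUSCE-specific)."""
    delta = -learning_rate * gradient + momentum * previous_delta
    return weight + delta, delta


print(window_cells(PARAMS["neighbourhood"]))  # 3 x 3 window

weight, delta = momentum_step(
    weight=0.5,
    gradient=2.0,
    previous_delta=0.0,
    learning_rate=PARAMS["learning_rate"],
    momentum=PARAMS["momentum"],
)
print(weight, delta)
```

With both the learning rate and the momentum at 0.001, each step moves a weight only slightly, which is consistent with the slow but stable learning described above.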

Based on the learned neural network, the MOLUSCE tool simulated land cover changes for the year 2018. The simulation model was validated by comparing the predicted and observed (CLC 2018) land cover maps using two metrics: kappa and overall accuracy. Finally, the CA spatial filter developed a predicted land cover map for the year 2024, based on the transition probabilities and transition potential maps.
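
Both validation metrics can be computed directly from the cell-by-cell comparison of the two maps. The sketch below uses the standard definition of Cohen's kappa; the function name `accuracy_and_kappa` and the tiny sample lists are illustrative and do not reproduce the study data.

```python
from collections import Counter


def accuracy_and_kappa(observed, predicted):
    """Return (overall accuracy, Cohen's kappa) for two class label sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    # Observed agreement: share of cells with the same class in both maps.
    p_observed = sum(o == p for o, p in zip(observed, predicted)) / n
    # Chance agreement: product of the marginal class shares, summed over classes.
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    p_chance = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when both maps hold a single class")
    kappa = (p_observed - p_chance) / (1.0 - p_chance)
    return p_observed, kappa


# Hypothetical flattened rasters (Level 1 codes), one value per cell.
observed = [1, 1, 2, 2, 2, 3, 3, 3, 3, 5]
predicted = [1, 2, 2, 2, 2, 3, 3, 3, 3, 5]
accuracy, kappa = accuracy_and_kappa(observed, predicted)
print(f"overall accuracy = {accuracy:.2f}, kappa = {kappa:.2f}")
```

Kappa is lower than the overall accuracy because it discounts the agreement expected by chance from the class proportions alone.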

Results
Land cover transformation analysis

The predominant land cover in the study region was agricultural area, followed by forest and artificial surfaces, mainly built-up areas and transport infrastructure. In 2006, agricultural area occupied 1405.19 km2, forest area occupied 634.56 km2, while artificial area covered as much as 213.69 km2. During the period 2006–2012, there was a distinct decrease in the agricultural land area and an increase in the area occupied by artificial land (Table 3).

Table 3. Statistics of land cover changes in km2.

| Land cover class | Land cover code | 2006 | 2012 | 2018 | 2006–2012 | 2012–2018 | 2006–2018 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Artificial surfaces | 1 | 213.69 | 300.25 | 309.06 | 86.56 | 8.81 | 95.37 |
| Agricultural areas | 2 | 1405.19 | 1297.06 | 1289.38 | −108.12 | −7.69 | −115.81 |
| Forest and seminatural ecosystems | 3 | 634.56 | 655.19 | 654.12 | 20.62 | −1.06 | 19.56 |
| Wetlands | 4 | 2.56 | 3.00 | 3.00 | 0.44 | 0.00 | 0.44 |
| Water bodies | 5 | 73.31 | 73.81 | 73.75 | 0.50 | −0.06 | 0.44 |

Figure 4 shows that the biggest changes occurred in the classes of artificial surfaces (3.72 pp growth) and agricultural areas (4.64 pp decrease).
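
The percentage-point figures follow from the 2006 and 2012 areas reported in the land cover statistics table. A short check, with the class areas copied from that table and a hypothetical helper name `share_change_pp`:

```python
# Areas in km2 for 2006 and 2012, copied from the land cover statistics table.
AREAS_KM2 = {
    "Artificial surfaces": (213.69, 300.25),
    "Agricultural areas": (1405.19, 1297.06),
    "Forest and seminatural ecosystems": (634.56, 655.19),
    "Wetlands": (2.56, 3.00),
    "Water bodies": (73.31, 73.81),
}


def share_change_pp(areas):
    """Change in each class's share of the total area, in percentage points."""
    total_start = sum(start for start, _ in areas.values())
    total_end = sum(end for _, end in areas.values())
    return {
        name: 100.0 * end / total_end - 100.0 * start / total_start
        for name, (start, end) in areas.items()
    }


changes = share_change_pp(AREAS_KM2)
for name, pp in changes.items():
    print(f"{name}: {pp:+.2f} pp")
```

For artificial surfaces this gives +3.72 pp and for agricultural areas −4.64 pp, matching the values quoted above.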

Fig. 4

Percentage of land cover categories on analysed area (2006 and 2012).

In 2012, artificial surfaces increased their area by more than one third as compared to the area in 2006, i.e. by >86 km2. This is evident in Figures 5A, B, and 6.

Fig. 5

CLC maps.

(a) 2006; (b) 2012; (c) 2018. LULC types: 1 – Artificial surfaces; 2 – Agricultural areas; 3 – Forest and seminatural ecosystems; 4 – Wetlands; 5 – Water bodies.

Fig. 6

Change map between land use in 2006 and 2012.

LULC types: 1 – Artificial surfaces; 2 – Agricultural areas; 3 – Forest and seminatural ecosystems; 4 – Wetlands; 5 – Water bodies.

The biggest changes are visible on the outskirts of the Tricity and Żuławy Gdańskie, where the extensive and modernised road network (mainly the A1 motorway and the S7 expressway) was accompanied by numerous built-up areas, intended mainly for services, trade and industry, hardly ever for housing.

The period 2012–2018 (Fig. 5C) was characterised by quantitatively smaller land cover change, dominated by an increase in artificial surfaces, albeit about 10 times smaller than in the previous period, and small losses of agricultural land (8.81 km2 and 7.69 km2, respectively). Furthermore, a slight decrease in the forest area, by 1.06 km2 (0.16% of the forest area in 2012), was discerned.

Land cover simulation and prediction

The explanatory variables used in the land cover forecast have a significant impact on the results; therefore, their statistical independence was checked using the Pearson correlation coefficient. The values are juxtaposed in Table 4.

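Table 4. Correlation values between explanatory variables.

| Names of variables | Slope | Distance to buildings | Restricted areas | Distance to roads | Population | Elevation |
| --- | --- | --- | --- | --- | --- | --- |
| Slope | | −0.04 | 0.01 | −0.00 | −0.00 | 0.35 |
| Distance to buildings | | | 0.02 | 0.30 | −0.40 | 0.21 |
| Restricted areas | | | | 0.03 | −0.02 | −0.02 |
| Distance to roads | | | | | −0.30 | 0.10 |
| Population | | | | | | −0.29 |
| Elevation | | | | | | |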

The table reveals that there was no strong relationship between the variables. The coefficient values ranged from −0.40, between population and distance to buildings, to 0.35, between slope and elevation. The negative correlation means that as the distance from buildings rises, the population count decreases; the same holds for distance to roads and population (−0.30). The moderate positive correlations between distance to buildings and distance to roads (0.30) and between slope and elevation (0.35) indicate that the two factors in each pair tend to increase together. The latter shows that a steeper slope is most often accompanied by a distinguishable change in the elevation of the terrain. In conclusion, all six driving factors can be used in land cover modelling. The land cover simulation model in MOLUSCE is a predictor that computes the transition potential based on CLC raster data (2006 and 2012) and the explanatory variables (Table 5).

Table 5. Transition matrix between CLC 2006 and 2012.

| Land cover | Artificial | Agricultural | Forest, seminatural ecosystems | Wetlands | Water |
| --- | --- | --- | --- | --- | --- |
| Artificial | 0.9782 | 0.01 | 0.01 | 0.00 | 0.00 |
| Agricultural | 0.06 | 0.92 | 0.02 | 0.00 | 0.00 |
| Forest, seminatural ecosystems | 0.01 | 0.01 | 0.98 | 0.00 | 0.00 |
| Wetlands | 0.00 | 0.17 | 0.22 | 0.61 | 0.00 |
| Water | 0.01 | 0.01 | 0.00 | 0.00 | 0.98 |

The transition matrix shows the possibility of conversion of one land cover class to another. Hence, considering the agricultural area, it is seen that the possibility of conversion to artificial surface and forest equals 6% and 2%, respectively. The highest change probability occurred in the wetlands, which could be used as agricultural land (17%) or be afforested (22%).
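
A transition matrix of this kind is a row-normalised cross-tabulation of the initial and final rasters. The sketch below shows the idea on a tiny hypothetical grid; the function name `transition_matrix` and the sample rasters are assumptions for illustration and do not reproduce the MOLUSCE implementation or the study data.

```python
def transition_matrix(initial, final, classes):
    """Row-normalised transition matrix between two equally shaped class grids.

    Entry [i][j] is the share of cells of class classes[i] in the initial grid
    that hold class classes[j] in the final grid.
    """
    index = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for row_initial, row_final in zip(initial, final):
        for a, b in zip(row_initial, row_final):
            counts[index[a]][index[b]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A class absent from the initial grid gets a row of zeros.
        matrix.append([value / total if total else 0.0 for value in row])
    return matrix


# Hypothetical 3 x 4 rasters with Level 1 codes.
initial = [[2, 2, 2, 3],
           [2, 2, 1, 3],
           [4, 2, 1, 5]]
final = [[1, 2, 2, 3],
         [2, 2, 1, 3],
         [3, 2, 1, 5]]

for land_class, row in zip([1, 2, 3, 4, 5],
                           transition_matrix(initial, final, [1, 2, 3, 4, 5])):
    print(land_class, [round(value, 2) for value in row])
```

In this toy example one of six agricultural cells becomes artificial, so the agricultural row reads 0.17 for the artificial column and 0.83 on the diagonal.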

Based on the transition matrix, the CA tool simulated the land cover for the year 2018. Validation of the simulation against CLC 2018 gave an overall accuracy of 99.4% and a kappa coefficient of 0.99. Such high values mean that the neural network had been trained well enough to predict land cover changes, although there was a slight underestimation of artificial areas (8.81 km2; 2.9%) and an overestimation of agricultural land (7.69 km2; 0.6%).

Based on the simulation model, the prediction for the year 2024 is presented in Figure 7, land-use changes for 2018–2024 in Figure 8, while a statistical summary of predicted land-use changes during the period 2018–2024 is presented in Table 6.

Fig. 7

Predicted land use in 2024.

LULC types: 1 – Artificial surfaces; 2 – Agricultural areas; 3 – Forest and seminatural ecosystems; 4 – Wetlands; 5 – Water bodies.

Fig. 8

Spatial distribution of land cover changes.

LULC types: 1 – Artificial surfaces; 2 – Agricultural areas; 3 – Forest and seminatural ecosystems; 4 – Wetlands; 5 – Water bodies.

Table 6. Land cover and land cover change statistics [in km2].

| Land cover | 2018 | 2024 | 2012–2018 | 2018–2024 |
| --- | --- | --- | --- | --- |
| Artificial surfaces | 309.06 | 318.25 | 8.81 | 9.19 |
| Agricultural areas | 1289.38 | 1285.56 | −7.69 | −3.81 |
| Forest and seminatural ecosystems | 654.12 | 650.25 | −1.06 | −3.88 |
| Wetlands | 3.00 | 2.62 | 0.00 | −0.38 |
| Water bodies | 73.75 | 72.62 | −0.06 | −1.12 |

The model predicted further growth of artificial surfaces, amounting to 9.19 km2 (2.98%), mainly concentrated in the vicinity of the Tricity region (see Fig. 8). The urban sprawl will cause a decrease in agricultural land area from 1289.38 km2 in 2018 to 1285.56 km2 in 2024 (0.3%) and a decrease in forest area of 0.58% compared to the area in 2018. The model also anticipated a marginal reduction of wetlands and water bodies.

The analysis of artificial surfaces during the period 2006–2024 showed an exponential increase (exponent 0.3), while agricultural area showed an exponential decline with an exponent of −0.06; R squared in both cases was 0.85. The fractional exponents of the functions indicated slow changes in both artificial and agricultural areas. This confirmed the hypothesis, put forward in the Introduction section, of a steady, albeit slow, increase in urbanised areas at the expense of agricultural land.

Discussion

LULC changes during the period 2006–2012, reaching 3097.41 km2, took place all over the country. As shown by Hościło and Tomaszewska (2014) and Mierzwiak and Calka (2019), the dominant LULC trajectory was afforestation of arable lands, and 18.7% of changes concerned the growth in artificial areas. Hence, the national trend differed from that in the study area. The size and nature of LULC changes were largely influenced by Poland’s membership in the European Union and access to EU funds. Leśniewska-Napierała et al. (2019) noted a significant impact of EU subsidies on LULC changes in Poland, which was particularly large in the Tricity and its outskirts. However, Kocur-Bera and Pszenny (2020), based on a thorough investigation, found that agricultural land conversion into urban areas resulted from economic factors. Moreover, the authors concluded that the current Polish spatial law is conducive to the development of built-up areas, which in turn creates urban chaos. This opinion was also shared by Kowalewski et al. (2013) and Jończy et al. (2021).

The accuracy of LULC change and LULC prediction is influenced by many factors, among them, the explanatory variables used. Much worldwide research has diagnosed slope, elevation, roads, water bodies, protected area, soils and precipitation as the most powerful determinants of land cover changes (Leśniewska-Napierała et al. 2019, Alam et al. 2021, Faisal et al. 2021). Our study used some of them; the soil factor was not used due to difficulty in data acquisition.

The reported accuracy of LULC prediction measured by the maximum kappa value of 0.63 was considered by many (Alam et al. 2021, Jayasinghe et al. 2021) as favourable. However, as found by Jayasinghe et al. (2021) and Faisal et al. (2021), the platform used is of utmost importance. The scientists observed that MOLUSCE outperforms other open-source models, with kappa oscillating between 0.75 and 0.88. The kappa value achieved for the CLC 2018 forecast reached 0.99. It resulted from the short (6-year) prognostic period and the detailed spatial distribution of the population in the study area.

The study used CLC as the primary data source. CLC was found to be a valuable resource for LULC study at the regional level (Kocur-Bera and Pszenny 2020). It is worth mentioning that apart from land cover classes, as indicated by the name of the data, CLC also stores land-use information; hence, both the terms ‘land cover’ and ‘land use’ are used in the literature.

The predicted changes in LULC could have a significant impact on the land assessment process conducted by the Polish Armed Forces, as, inter alia, the expansion of forest makes the terrain more difficult to pass. The Żuławy Gdańskie region is of key importance in terms of terrain trafficability because it is unforested and mainly flat. Furthermore, this is a low-lying region where heights above sea level do not exceed 10 m and most of it lies below sea level. The visible extension and modernisation of the road network in this region has had an enormous positive impact on its passability. Studies led by Dawid and Pokonieczny (2020 and 2021) showed that the presence of road networks strongly influences the troops’ movement ability and that a flat terrain is less prone to the occurrence of microrelief shapes that negatively affect passability. These factors make Żuławy Gdańskie a very convenient area for conducting military actions and crisis management operations, which is valuable information because such geographical circumstances make this area vulnerable to flooding.

Conclusions

Over the total study period from 2006 to 2018 and the predictions for 2024, the major land cover changes concerned three dominant classes: artificial surfaces, agricultural land and forest. A vast increase in artificial surfaces, slightly more than 40% of the class area (from 213.69 km2 to 300.25 km2), took place during the period 2006–2012. In the subsequent period (2012–2018), the artificial area also increased but to a considerably lesser extent (8.81 km2). The 2024 prediction revealed similar growth of the artificial surfaces class (9.19 km2). This increase in artificial terrain resulted in a decrease in other LULC types. The greatest loss was in agricultural land, which lost almost 120 km2 over the 18 years of analysis. In the remaining land cover classes, changes were not as significant as those in artificial surfaces and agricultural areas.

The northern part of Poland, especially the environs of Tricity, is a very fast-developing area, and the knowledge regarding how this terrain may change in the future is valuable information that could be used by different economic and planning sectors.

The MOLUSCE model can be used for evaluating the trend of changes in subsequent 6-year time periods, e.g. for the period spanning 2030–2036. The credibility of such predictions decreases with the increase in simulation iterations as there are myriads of factors that could influence the land cover changes. A good example of this possibility is the unexpected coronavirus disease–2019 (COVID-19) pandemic, which broke out in the year 2020 in Poland. It has had an enormous negative impact on the Polish economy, and consequently, it substantially slowed down the land cover changes, especially in the class of artificial surfaces.

eISSN: 2081-6383
Language: English
Frequency: 4 times a year
Journal subjects: Geosciences, Geography