Historically, our understanding of soil and its characteristics has required comprehensive laboratory analysis. Conventional measurement techniques that assess the relationship between the physical and chemical properties of soil components often overlook their complex interactions. It is important to develop and improve the existing methods of measuring soil parameters so that the entire soil system is described as accurately as possible (Viscarra Rossel et al. 2006). Spectroscopy offers an alternative to traditional laboratory measurement of soil parameters by determining the relationship between electromagnetic radiation and an object in its natural environment. Spectroscopic measurements have shown enormous potential for calibration, prediction and data modelling in soil science (Milton 1987). Research has been conducted on methods of testing soils in various ranges of electromagnetic radiation. One of the best-studied approaches is diffuse reflectance spectroscopy (DRS), which has been used, inter alia, in the mid-infrared, visible and near-infrared (MVNIR) ranges. This method enables faster, more economical and chemical-free soil measurement procedures (Raupach 1991).
Visible and near-infrared (VNIR) spectroscopy in soil research enables the simultaneous measurement of several parameters without laborious and precise sample preparation. In laboratory conditions, hyperspectral spectrophotometers with very high spectral resolution in the VNIR range are used to measure soil samples. However, soil properties are also estimated at a lower spectral resolution using satellite and airborne multispectral sensors. Imaging data from these sensors are recorded in only a few bands of the VNIR range and can be used to estimate the content of soil organic carbon (SOC) (Croft et al. 2012, de Paul Obade, Lal 2013) and clay (Nanni, Dematte 2006, Demattê, Fiorio 2009). Better results can be obtained by combining satellite data with hyperspectral measurements (Peng et al. 2015). Other studies show that attempts are being made to use airborne multispectral data to improve the quality of soil maps (Wetterlind et al. 2008). The Cubist model is often used to estimate SOC. In this case, it is also advisable to use spectral indices as variables in addition to raw reflectance (Peng et al. 2015).
Unmanned aerial vehicle (UAV)-mounted multispectral VNIR sensors are very often used to observe agricultural crops in precision farming applications. However, there are also multispectral cameras that can be used for ground or laboratory imaging. An example of a multispectral data acquisition device is the agricultural digital camera (ADC). This sensor is specifically designed to capture the three spectral channels that are most sensitive to changes in plant biomass, i.e. green, red and near-infrared. This makes the ADC suitable for estimating biomass and yield (Swain et al. 2010), assessing nitrogen content at various stages of plant development (Saberioon et al. 2012), calculating vegetation indices (Liu et al. 2012) and even discriminating crop cultivars (Avola et al. 2019). The ADC can also be used in field research as part of a UAV system (Candiago et al. 2015, Vega et al. 2015, Matese et al. 2017).
Another approach is offered by multispectral satellite sensors. Since 1972, Landsat satellites have gathered images that can be useful in environmental studies. For example, the Thematic Mapper (TM) sensor onboard Landsat 5 was used to detect bare soil (Dematte et al. 2009). In 2015, the European Space Agency (ESA) began to deliver free-of-charge Earth images with good spatial resolution (10 m). Sensors onboard the optical Sentinel-2 satellites are equipped with 12 spectral bands, which can be useful for clay content mapping (Gasmi et al. 2022). Clay content has also been mapped using other multispectral sensors, such as ASTER (Gasmi et al. 2019). These studies indicate that multispectral satellite sensors should be considered in soil research more often.
The usefulness of the acquired image data largely depends on the way it is processed and analysed. Many statistical methods are used to obtain reliable soil information from multispectral images, such as multiple linear regression (MLR) analysis, principal component regression (PCR) and partial least squares (PLS) regression. Applied to hyperspectral data, the latter method allows several soil parameters to be determined with high correlation coefficients and low errors, including grain size composition, pH, cation exchange capacity (CEC) and some chemical elements (Mammadov et al. 2020, Vestergaard et al. 2021). Recently, machine learning algorithms based on random forests and Cubist models have been used to study the relationship between spectral data and soil characteristics. The Cubist model is often used to estimate SOC; in such cases, it is also advisable to use spectral indices as variables in addition to raw reflectance (Peng et al. 2015).
A proposed new approach is to estimate soil parameters from multispectral images obtained with the ADC. The possibility of determining the condition of the soil substrate based on such data is not well researched or described. This study aimed to determine whether it is possible to estimate soil parameters using a sensor that provides measurements in only three spectral channels (green, red and near-infrared). More precisely, it asks which soil parameters can be estimated from images taken with a multispectral camera in laboratory conditions, with what method of data analysis and with what accuracy.
The research was conducted within two arable fields located in Pokrzywno (Wielkopolskie Voivodeship, Poznań Poviat). This region has a temperate transitional climate characterised by a small number of frosty days and low rainfall. The average annual temperature is 8.5°C and the annual rainfall is about 500–550 mm (WIOS 2013). It is an area with unfavourable water balance, exposed to periodic droughts. Soils classified as Luvisols and Phaeozems, according to the IUSS Working Group WRB (2015), dominate this study area (Fig. 1).
A total of 151 samples were collected from both research fields. They were tested in the soil science laboratory of the Adam Mickiewicz University in Poznań. All samples were prepared for testing by drying, grinding in a ceramic mortar and sieving through a 2-mm mesh sieve. The soil texture was determined by the hydrometer method according to the standard PN-R-04032 (Polish Committee for Standardisation PKN 1998). SOC was determined using oxidation by K_{2}Cr_{2}O_{7} with H_{2}SO_{4} for 30 min on the digestion block at 150°C and titration of oxidant residues by FeSO_{4} (Nelson, Sommers 1996). Total nitrogen was determined using the modified Kjeldahl method (International Standard ISO 12261 1995). The soil pH was measured at a 1:1 soil-to-solution ratio in water and in 1 M KCl (PN-ISO 10390 1997). The forms of nutrients available to plants (K, Mg, Ca, Zn, Cu, Pb, Cd, Mn and Fe) were determined by the modified Mehlich 3 method (Mehlich 1984). CEC was determined by successive barium and magnesium chloride solution extraction and flame atomic absorption spectroscopy (International Standard ISO 11260 1994). Calcium carbonate content was determined twice, using the Scheibler volumetric method (International Standard ISO 10693 2002) and the titration method (FAO 2021). As a result of the analyses, the following were determined: soil particle size composition, organic carbon and nitrogen content, the carbon-to-nitrogen ratio, soil reaction, percentage of calcium carbonate, total CEC and the content of potassium, magnesium, calcium, zinc, copper, lead, cadmium, manganese, iron and phosphorus.
Multispectral data of soil samples were acquired in the laboratory using the ADC by Tetracam. Its design, with an optical Bayer filter mask on the complementary metal–oxide–semiconductor (CMOS) sensor, makes it possible to obtain three images at a resolution of 2048 × 1536 px (3.2 Mpx) (Swain et al. 2010). The images correspond to the Landsat Thematic Mapper spectral bands 2, 3 and 4: green (520–600 nm), red (630–690 nm) and near-infrared (760–920 nm) (Lan et al. 2010), and the estimated ground pixel resolution is 0.000707 m px^{−1} (Swain et al. 2010).
For the purposes of this study, photographs of 151 soil samples were taken under laboratory conditions. The ADC was placed on a tripod at a height of 70 cm above the test object and at an angle of 90°. A 400-W halogen lamp, set at a distance of 80 cm and at an angle of 45°, was used to illuminate the surface of the soil. The images were then converted to TIFF format in the PixelWrench2 program supplied by the manufacturer. The next step was to transform the original digital numbers to reflectance using the TNTmips software.
The evolution of remote sensing techniques has driven the development of methods for evaluating remote measurements and for processing and extracting as much information as possible from the collected data. Attempts to interpret the reflectance data from different available ranges of electromagnetic radiation have led to the development of a large number of indices and their derivatives. A large group of spectral indices relates to vegetation and the soil substrate; these are calculated as the ratio of reflectance in two or more spectral channels of the selected device, sometimes with additional parameters (Bannari et al. 1995).
In this study, vegetation indices such as the normalized difference vegetation index (NDVI), ratio vegetation index (RVI, Bannari et al. 1995, Martínez M. 2017) and infrared percentage vegetation index (IPVI, Gunathilaka 2021) were used. All of them are based on ratios between reflectance in the red and near-infrared ranges. In addition, indices using the green band were added to the dataset: the green normalized difference vegetation index (GNDVI, Candiago et al. 2015), green-red vegetation index (GRVI, Motohka et al. 2010) and modified GNDVI normal (Crippen 1990). Finally, three variants of the soil-adjusted vegetation index (SAVI, Bannari et al. 1995) were calculated, with the soil adjustment factor L set to 0.25, 0.5 and 0.75. Table 1 summarises the spectral indices used in the spectral data processing.
Table 1. Summary of spectral indices used in the study.
Spectral index  Abbreviation  Formula
Normalized Difference Vegetation Index  NDVI
Green Normalized Difference Vegetation Index  GNDVI
Green Normalized Difference Vegetation Index normal  GNDVInormal
Infrared Percentage Vegetation Index  IPVI
Soil-adjusted Vegetation Index  SAVI^{*}
Ratio Vegetation Index  RVI
Green-Red Vegetation Index  GRVI
^{*} SAVI25: L = 0.25, SAVI50: L = 0.50, SAVI75: L = 0.75.
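The indices listed above can be computed directly from the three band reflectances. The sketch below is illustrative only: it assumes the commonly published definitions of each index (the formulas are not reproduced in this text), the function name `spectral_indices` is hypothetical, and GNDVI normal is left out because its definition is not stated here. The input reflectances are example values.

```python
def spectral_indices(green, red, nir):
    """Return a dict of spectral indices for one sample.

    Inputs are band reflectances on a 0-1 scale. The formulas are the
    commonly published definitions and are an assumption of this sketch.
    """
    indices = {
        "NDVI": (nir - red) / (nir + red),
        "GNDVI": (nir - green) / (nir + green),
        "IPVI": nir / (nir + red),
        "RVI": nir / red,
        "GRVI": (green - red) / (green + red),
    }
    # SAVI for the three soil adjustment factors L used in the study.
    for label, L in (("SAVI25", 0.25), ("SAVI50", 0.50), ("SAVI75", 0.75)):
        indices[label] = (1 + L) * (nir - red) / (nir + red + L)
    return indices


# Example reflectances for the green, red and near-infrared bands.
example = spectral_indices(green=0.47, red=0.41, nir=0.81)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Under these definitions IPVI equals (NDVI + 1) / 2, so the two indices carry the same information on different scales.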
Raw spectra are subject to fluctuations and noise. For this reason, methods of standardising spectral data are often used to reduce undesirable effects in the set of spectral measurements (Gholizadeh et al. 2015). In this study, we used multiplicative scatter correction (MSC), standard normal variate (SNV), conversion of reflectance data into absorbance and scaling with minimum and maximum values. One of the most popular methods of data standardisation is MSC (Rinnan et al. 2009). This method fits each spectral measurement to an ideal reference spectrum and corrects it using the estimated additive and multiplicative factors (Rinnan et al. 2009). Another frequently used method is SNV, which centres and scales each reflectance spectrum by subtracting its mean value and dividing by its standard deviation (Vestergaard et al. 2021). MSC and SNV were applied using the prospectr 0.2.4 package in R 4.1.3 for Windows. When working with various types of data, some measured results may occasionally differ significantly from others and thus disturb the computational model (Gholizadeh et al. 2015). To limit this effect, min–max scaling can be used (available in R in the caret package). This type of data normalisation scales all data so that they fall in the range from zero to one. This reduces the value of the standard deviation and also the effect of outliers in the dataset.
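The three scaling methods can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the prospectr or caret implementations used in the study; the function names are hypothetical. It assumes that SNV is applied per spectrum with the sample standard deviation, that MSC uses the mean spectrum of the dataset as the reference, and that min–max scaling is applied per variable across samples.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(v - m) / s for v in spectrum]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum is regressed on the reference (ordinary least squares),
    then corrected as (value - intercept) / slope.
    """
    reference = [mean(column) for column in zip(*spectra)]
    ref_mean = mean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for spectrum in spectra:
        s_mean = mean(spectrum)
        slope = sum(
            (r - ref_mean) * (v - s_mean) for r, v in zip(reference, spectrum)
        ) / ref_ss
        intercept = s_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in spectrum])
    return corrected


def min_max(values):
    """Scale one variable (all samples) to the range 0-1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Example: one three-band spectrum (green, red, near-infrared).
print(snv([0.47, 0.41, 0.81]))
```

Because SNV centres each spectrum, the three scaled band values of any sample sum to zero, which is why some band means become negative after this transformation.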
Additionally, the spectral data were converted to absorbance according to the following formula (Wenjun et al. 2014):
A = \log_{10}(1/R)
where A is the absorbance and R is the reflectance.
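As a small worked example, the conversion can be written as a one-line function. The sketch assumes the common definition of apparent absorbance, A = log10(1/R), with R expressed on a 0–1 scale; the function name is hypothetical.

```python
import math


def reflectance_to_absorbance(reflectance):
    """Apparent absorbance, assuming A = log10(1 / R) with R in (0, 1]."""
    return math.log10(1.0 / reflectance)


# A sample reflecting 10% of the incident light has an absorbance of 1.
print(reflectance_to_absorbance(0.10))
```

Because the logarithm is non-linear, the mean absorbance of a set of samples is not the absorbance of their mean reflectance.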
Regression models are used to establish the relationship between variables
Variable importance in the projection (VIP) is useful for determining which explanatory variables contribute most to explaining the response variable. VIP indicates which variables contribute to the construction of a given regression model and to what extent (Chong, Jun 2005, Xu et al. 2021).
VIP values can be obtained through dedicated software.
There are many ways to determine how well an outcome estimate is guaranteed by a given regression model. One of them is to calculate the coefficient of determination
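Another measure is the root mean squared error (RMSE), which quantifies the difference between the values estimated by the model and those observed. RMSE takes values equal to or greater than zero, with zero indicating a perfect match of estimated values to those observed (Peng et al. 2015):
RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_{i} - y_{i})^{2}}
where \hat{y}_{i} is the estimated value, y_{i} is the observed value and n is the number of samples.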
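The goodness-of-fit measures reported later in the results can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure of the study: R^{2} is computed as 1 − SS_res/SS_tot, relative RMSE as RMSE divided by the observed mean, RPD as the sample standard deviation divided by RMSE, and RPIQ as the interquartile range divided by RMSE. The function name and the example data are hypothetical.

```python
import math
from statistics import mean, quantiles, stdev


def goodness_of_fit(observed, predicted):
    """Return R2, RMSE, relative RMSE, RPD and RPIQ for paired values."""
    n = len(observed)
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    # First and third quartiles of the observed values.
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": rmse,
        "relRMSE": rmse / obs_mean,
        "RPD": stdev(observed) / rmse,
        "RPIQ": (q3 - q1) / rmse,
    }


# Hypothetical observed and predicted values for five samples.
fit = goodness_of_fit([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.0])
print({name: round(value, 3) for name, value in fit.items()})
```

These definitions agree with the magnesium row of Table 4: a standard deviation of 61.70 and an RMSE of 23.463 give SD/RMSE ≈ 2.63 and RMSE/mean ≈ 0.33, matching the listed RPD and relative RMSE.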
Saeys et al. (2005) proposed to establish the criteria for the classification of the model according to the following values of
The results of laboratory analyses are presented in Table 2. The summary includes the mean, median, maximum and minimum values and the standard deviation. As shown in Table 2, mean values for all data are in the range of 0.12–1504.29 mg ∙ kg^{−1}. Calcium has the most varied values, with a standard deviation of 2014.76 mg ∙ kg^{−1}. Other variables with high standard deviation are phosphorus, iron, magnesium, potassium and manganese. All other parameters have standard deviation (SD) values <10.00; cadmium has the lowest standard deviation, 0.12 mg ∙ kg^{−1}.
Table 2. Summary of soil laboratory analyses.
Parameter  Unit  Min  Mean  Median  Max  SD 

Clay  %  1.00  4.97  5.00  8.00  1.69 
C/N  –  6.40  14.34  11.80  63.00  9.10 
CaCO3 vol  %  0.00  0.87  0.00  12.00  2.03 
CaCO3 titr  %  0.00  1.15  0.00  13.10  2.44 
Mg  mg kg^{−1}  16.30  70.59  47.70  314.80  61.70 
Ca  mg kg^{−1}  32.60  1504.29  322.70  7424.60  2014.76 
Zn  mg kg^{−1}  4.40  9.06  8.50  27.40  3.70 
Cd  mg kg^{−1}  0.00  0.12  0.09  0.55  0.12 
P  mg kg^{−1}  23.30  176.88  153.60  275.60  63.21 
Table 3 shows a summary of reflectance data obtained by the ADC and the results of all data normalisation methods used. The mean reflectance values for each spectral band are 0.47 for green, 0.41 for red and 0.81 for near-infrared (NIR). After MSC normalisation, the mean red and green band values changed slightly, while the mean NIR value remained the same. The SNV method changed the spectral data completely, with the mean red and green values becoming negative. After min–max normalisation, the mean NIR value became smaller than the mean red value. The same was observed for the absorbance values. The standard deviation varied from 0.32 in the SNV green band to 0.02 in the MSC NIR band. The MSC method had the lowest standard deviation values for each band.
Table 3. Summary of soil spectra.
Band  RAW mean  RAW SD  MSC mean  MSC SD  SNV mean  SNV SD  Min–max norm mean  Min–max norm SD  ABS mean  ABS SD
GREEN  0.47  0.12  0.50  0.11  −0.34  0.32  0.44  0.19  0.35  0.13
RED  0.41  0.15  0.38  0.09  −0.74  0.17  0.52  0.23  0.43  0.21
NIR  0.81  0.29  0.81  0.02  1.08  0.17  0.33  0.15  0.13  0.20
Figure 2 presents the relationship between the analysed soil characteristics and the ADC spectral data, including the calculated spectral indices. Correlation values range from −1.0 (marked in blue on the graph) to 1.0 (red). Most of the soil parameters have a strong negative correlation with spectral data. Only the percentage of sand and some chemical elements, such as Mn, Fe and P, have positive correlation values. Almost every soil parameter is correlated with some of the spectral bands or indices, with the exception of clay and zinc.
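A correlation matrix of this kind is built from pairwise coefficients between each soil parameter and each spectral variable. The sketch below assumes the Pearson coefficient, which is an assumption because the type of correlation is not stated in the text; the function name and the example values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical example: reflectance falling as a soil parameter increases.
soil_parameter = [0.8, 1.1, 1.5, 2.0, 2.6]
red_reflectance = [0.52, 0.47, 0.41, 0.36, 0.30]
print(pearson_r(soil_parameter, red_reflectance))  # strongly negative
```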
Both regression models were calculated for 21 variables describing soil parameters and 12 explanatory variables: the average reflectance in the three ADC bands and the spectral indices calculated from them. Each model was additionally calculated for every variant of the standardised spectral variables. For the Cubist model, the data were divided randomly into a training set (80% of the data) and a test set (the remaining 20%). For the PLS model, cross-validation was used, which divides the data into segments; the number of segments was set to 10. Both regression models were fitted in R using the dedicated packages Cubist and pls. For the Cubist model, the number of committees was set to 1 and the number of rules to 3. The predicted soil parameters were compared with those obtained by laboratory measurements using the coefficient of determination (R^{2}), root mean squared error (RMSE), relative RMSE, ratio of performance to deviation (RPD) and ratio of performance to interquartile distance (RPIQ) (Table 4).
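The two data-partitioning schemes described above can be sketched with the standard library. This illustrates only how sample indices might be assigned; it is not the R code used in the study, and the function names, the fixed seed and the rounding of the 80% cut are assumptions.

```python
import random


def train_test_split(n_samples, train_fraction=0.8, seed=0):
    """Randomly split sample indices into training and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(train_fraction * n_samples)
    return indices[:cut], indices[cut:]


def cv_segments(n_samples, n_segments=10, seed=0):
    """Randomly assign sample indices to cross-validation segments."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Every n_segments-th shuffled index goes to the same segment.
    return [indices[k::n_segments] for k in range(n_segments)]


# 151 soil samples, as in the study.
train_idx, test_idx = train_test_split(151)
segments = cv_segments(151)
print(len(train_idx), len(test_idx), [len(s) for s in segments])
```

With 151 samples, this split yields 121 training and 30 test samples, and the ten cross-validation segments contain 15 or 16 samples each.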
Table 4. Measures of goodness-of-fit for soil characteristic estimations.
Parameter  Unit  Mean  SD  Model  Preprocessing  R^{2}  RMSE  Rel RMSE  RPD  RPIQ
Clay  %  4.97  1.69  PLS  ABS  0.043  1.634  0.33  1.03  1.22
C/N  –  14.34  9.01  PLS  SNV  0.018  9.100  0.63  1.00  0.54
CaCO3 vol  %  0.87  2.03  Cubist  min–max norm  0.780  0.673  0.77  3.01  0.48
CaCO3 titr  %  1.15  2.44  Cubist  min–max norm  0.871  0.694  0.61  3.52  0.84
Mg  mg kg^{−1}  70.59  61.70  Cubist  SNV  0.951  23.463  0.33  2.63  1.78
Ca  mg kg^{−1}  1504.29  2014.76  Cubist  min–max norm  0.924  735.515  0.49  2.74  3.86
Zn  mg kg^{−1}  9.06  3.07  PLS  SNV  0.068  3.548  0.39  1.04  1.15
Cd  mg kg^{−1}  0.12  0.12  Cubist  min–max norm  0.836  0.052  0.42  2.26  3.13
P  mg kg^{−1}  176.88  63.21  Cubist  ABS  0.463  67.680  0.38  0.93  2.12
Peng et al. (2015) used the Cubist model on 328 soil samples to improve SOC modelling at the regional scale. The reference data were a combination of two satellite images and laboratory Vis-NIR measurements. The obtained results were
The next step was to answer the question of which soil parameters can be estimated based on multispectral data obtained with ADC and with what accuracy? For that purpose, the threshold values were established for
According to the model evaluation criteria (Saeys et al. 2005), models for sand, clay, C/N, Pb, Zn and P were considered not suitable for prediction. Models for K and Fe made it possible to distinguish between high and low values. Models for the percentage content of silt, CEC, Cu, Mn, Cd and the first variant of calculating the calcium carbonate content allow for approximate quantitative predictions. According to the given criteria, the models for pH_{H2O}, pH_{KCl}, Mg, Ca and the second method of determining the percentage of calcium carbonate can be considered good. Finally, the models for SOC and N were considered perfect.
The results, shown in the form of graphs (Fig. 3), present the ratio of the obtained values to the predicted values. The
For the 12 best-estimated parameters, additional graphs (Fig. 4) were created to illustrate which variables were considered the most important for building the regression model for each of them. The
Based on the conducted research, it can be concluded that multispectral data are sufficient to determine the condition of the soil substrate. Although only the reflectance values in the green, red and near-infrared bands were used in the study, it is possible to estimate 12 out of the 21 described soil parameters with the use of appropriate data normalisation and an appropriate regression model.
Although the ADC was not designed for soil research, it can partially replace the classic spectroscope.
Based on the VIP charts, it can be concluded that the use of spectral indices as additional explanatory variables is the correct assumption. Indicators played a large role in creating regression models for many soil parameters. Spectral indices such as GNDVI and NDVI were the most frequently used. The least frequently used indices were IPVI, SAVI25 and GNDVI normal.
The ease of use and portability of the ADC make it well suited for data acquisition in the field. For this reason, it is worth considering similar studies based on images taken directly in the field. It is important to determine in what lighting conditions, at what camera angle and for what types of soil the soil parameters can best be estimated.