Intensified human activities have accelerated the rate of cultural eutrophication of water bodies worldwide (Najar & Khan 2012), resulting in cyanobacterial blooms, oxygen depletion and eventually fish kills (Anita & Pooja 2013). The effects of these activities on water bodies can be estimated by the degree of eutrophication. The eutrophication level is often determined by the chlorophyll-
Chl-
Statistical methods are widely used in eutrophication studies to explore the interactions between water quality parameters and primary production (Irvine et al. 2015). However, eutrophication in surface waters is difficult to study when a large number of variables is involved. In such cases, the choice of analytical parameters is complicated by multicollinearity among the variables (Johnson et al. 1982).
One way to overcome this problem is to use principal component analysis (PCA) followed by multiple linear regression (MLR) (Çamdevirena et al. 2005). This process is called principal component regression (PCR): the principal components (PCs) obtained from PCA are used as independent variables in MLR. It reduces the complexity of the multidimensional system by maximizing the variance of component loadings and by eliminating the invalid components (Zhang et al. 2013).
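The two-step idea of PCR can be illustrated with a minimal sketch that keeps only the first principal component. It is not the procedure used in this study (which relied on SPSS and six or more PCs): the leading eigenvector is approximated here by power iteration rather than a full eigendecomposition, and the function names and toy data are hypothetical.

```python
import math

def standardize(column):
    """Return z-scores of a list of numbers (sample standard deviation)."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]

def correlation_matrix(z_columns):
    """Correlation matrix of columns that are already standardized."""
    n = len(z_columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in z_columns]
            for ci in z_columns]

def first_component(matrix, iterations=500):
    """Approximate the leading eigenvector of a symmetric matrix by power iteration."""
    vec = [1.0] * len(matrix)
    for _ in range(iterations):
        new = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(v * v for v in new))
        vec = [v / norm for v in new]
    return vec

def pcr_one_component(columns, y):
    """Regress y on the first PC score; return (intercept, slope, scores)."""
    z_columns = [standardize(c) for c in columns]
    weights = first_component(correlation_matrix(z_columns))
    n = len(y)
    scores = [sum(w * zc[j] for w, zc in zip(weights, z_columns)) for j in range(n)]
    mean_y = sum(y) / n
    # The scores have mean zero, so the least-squares intercept is mean(y)
    # and the slope reduces to sum(s * (y - mean_y)) / sum(s * s).
    slope = sum(s * (v - mean_y) for s, v in zip(scores, y)) / sum(s * s for s in scores)
    return mean_y, slope, scores

# Toy data (hypothetical): two strongly collinear predictors and one response.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
intercept, slope, scores = pcr_one_component([x1, x2], y)
predicted = [intercept + slope * s for s in scores]
```

Because the regression is run on a component score rather than on x1 and x2 directly, the collinearity between the two predictors no longer enters the model.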
The PCR method has recently been used in various studies to identify factors driving the eutrophication of aquatic systems (Praveena et al. 2011). PCR provides information on the most meaningful variables by reducing the number of latent factors contributing to eutrophication, while MLR examines the relationships between a dependent variable and a set of independent variables.
The aim of this study is to apply PCR to data from two temperate freshwater reservoirs with different trophic states. Studies using PCR methodology from this region have so far used data sets either from a single lentic (Çamdevirena et al. 2005) or lotic system (Köklü et al. 2010). The presented work also aims at determining whether the results of PCR application to a eutrophic reservoir are different from those for a mesotrophic reservoir. For this purpose, 17 physicochemical and biological water quality parameters from eutrophic temperate Çaygören and mesotrophic Ikizcetepeler reservoirs were used in PCA, and then the PCs of PCA were used in MLR to predict chl-
In Turkey, the most important problem of agriculture is the lack of water for irrigation during summer, because it has a Mediterranean climate that is extremely dry in summer. One of the effective solutions to this problem is the construction of dams on the lotic waters. The Çaygören Reservoir was built in 1971 with the objective to irrigate the Sındırgı and Bigadiç plains. It is also used for power generation. The land cover around the reservoir is mostly black pine (
The Çaygören Reservoir is located at 39°172’N, 28°191’E, 55 km southeast of Balıkesir, Turkey (Fig. 1). It lies at 273 m above sea level. It has a maximum depth of 28 m, a length of 4.6 km and a surface area of 9 km². The Simav Stream feeds the reservoir (State Water Works 2017).
Figure 1. Çaygören Reservoir and the location of sampling stations
The Ikizcetepeler Reservoir was built in 1991 for irrigation and drinking water. It is mainly used as a drinking water source for the city of Balıkesir. It is also used for irrigation of the Pamukçu Plain and for sport fishing. Therefore, it is socioeconomically important for the region (Okkan & Karakan 2016).
The Ikizcetepeler Reservoir is located at 39°482’N, 27°939’E, 15 km southwest of Balıkesir, Turkey (Fig. 2). It lies at 175 m above sea level. It has a maximum depth of 25 m, a surface area of 10 km² and a length of 6.34 km. The reservoir is mainly fed by the Kile Stream (State Water Works 2017).
Figure 2. Ikizcetepeler Reservoir and the location of sampling stations
Three sampling stations were set in each reservoir. The first station was set near the main inlet of each reservoir, the second station – between the first station and the dam, and the third station – near the dam. Sampling was initiated in February 2007 and ended in January 2009.
Chl-
Chl-
TSS was determined by filtering a known volume of water through Whatman 934-AH filters that were pre-rinsed, dried (105°C) and then tared (Okkan & Karakan 2016). PO4, NO2–N, NO3–N and NH4–N concentrations were determined spectrophotometrically on filtered water. COD was measured by the open reflux method. BOD was determined by the azide modification of the Winkler titrimetric method. Secchi disk depth was measured on each sampling date.
Statistical analyses were carried out using the SPSS (ver. 11.0) software package. Data were log-transformed prior to statistical analysis to meet the requirements of normality for parametric tests. The Kolmogorov-Smirnov normality test was applied to all variables. PCA was performed on 17 water quality parameters to describe their interrelation patterns.
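As an illustration of the preprocessing described above, the following sketch log-transforms a variable and computes the Kolmogorov-Smirnov distance D between the sample and a normal distribution fitted to it. It makes several assumptions not stated in the text: a base-10 logarithm, strictly positive data, and a normal distribution whose mean and standard deviation are estimated from the sample. It returns only the distance D, not the significance level that SPSS reports, and the example values are invented.

```python
import math

def log_transform(values):
    """Base-10 log transform; assumes strictly positive measurements."""
    return [math.log10(v) for v in values]

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def ks_statistic_normal(values):
    """Kolmogorov-Smirnov distance D between the sample and a fitted normal."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    d = 0.0
    for i, v in enumerate(sorted(values)):
        cdf = normal_cdf(v, mean, sd)
        # Compare the fitted CDF with the empirical CDF just before and at v.
        d = max(d, abs(cdf - i / n), abs((i + 1) / n - cdf))
    return d

# Hypothetical right-skewed measurements spanning several orders of magnitude.
raw = [1.0, 10.0, 100.0, 1000.0, 10000.0]
logged = log_transform(raw)
d_raw = ks_statistic_normal(raw)
d_logged = ks_statistic_normal(logged)
```

For skewed data of this kind, the distance after transformation is smaller than before, which is the motivation for log-transforming prior to parametric tests.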
The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and Bartlett’s test of sphericity {with degrees of freedom = 1/2 [p (p – 1)], where p is the number of variables} were used to verify the applicability of PCA. PC scores of 17 water quality parameters were used as independent variables in MLR to predict the chl-
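The degrees of freedom quoted for Bartlett’s test follow directly from the number of variables. The sketch below computes them and, as an assumption beyond the text, also shows the commonly used form of the test statistic, χ² = –[(n – 1) – (2p + 5)/6] ln|R|, where n is the number of observations and |R| is the determinant of the correlation matrix. The function names are hypothetical.

```python
def bartlett_degrees_of_freedom(p):
    """Degrees of freedom p(p - 1)/2 for p variables."""
    return p * (p - 1) // 2

def bartlett_chi_square(n, p, log_det_corr):
    """Bartlett's sphericity statistic from ln|R| of the correlation matrix."""
    return -((n - 1) - (2 * p + 5) / 6.0) * log_det_corr

df = bartlett_degrees_of_freedom(17)  # 136 for the 17 water quality parameters
```

A correlation matrix close to the identity has ln|R| near zero and therefore a small statistic; strongly intercorrelated variables give a large one, which supports the use of PCA.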
The stepwise variable selection procedure of MLR was used to identify the best predictors of chl-
The score value ($s_{kj}$) for the jth observation in the kth PC was obtained from the weights of the variables in the PCs and the standardized variables by using the following equation:
$$s_{kj} = t_{1k} z_{1j} + t_{2k} z_{2j} + \dots + t_{pk} z_{pj}$$
where j = 1, 2, …, n (n is the number of observations), k = 1, 2, …, q (q is the number of selected PCs), p is the number of independent variables, $s_{kj}$ is the standardized score value of the jth observation in the kth PC, $t_{pk}$ is the standardized weight of the pth variable in the kth PC, and $z_{pj}$ is the standardized value of the pth variable of the jth observation, calculated from
$$z_{pj} = \frac{x_{pj} - \bar{x}_p}{S_p}$$
where $x_{pj}$ is the original value of the pth variable for the jth observation, and $\bar{x}_p$ and $S_p$ are the mean and standard deviation of the pth variable.
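The score calculation described above, a weighted sum of standardized variables for each PC, can be sketched as follows. The weights and data are invented for illustration, and the function names are hypothetical; in the study the weights come from the PCA output.

```python
import math

def standardize(values):
    """z-scores using the sample mean and sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def pc_scores(weights, z_columns):
    """Compute scores[k][j] = sum over variables i of weights[k][i] * z_columns[i][j].

    weights[k][i]   : weight of variable i in PC k
    z_columns[i][j] : standardized value of variable i for observation j
    """
    n = len(z_columns[0])
    return [[sum(t * z[j] for t, z in zip(pc_weights, z_columns)) for j in range(n)]
            for pc_weights in weights]

# Two hypothetical variables measured on three occasions.
z_a = standardize([1.0, 2.0, 3.0])
z_b = standardize([2.0, 4.0, 9.0])
# Two hypothetical PCs: the first uses only variable a, the second averages both.
example_weights = [[1.0, 0.0], [0.5, 0.5]]
example_scores = pc_scores(example_weights, [z_a, z_b])
```

Each row of `example_scores` holds the scores of one PC across all observations, which is the form in which they enter the regression.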
χ² was calculated as 702.7 for the Çaygören Reservoir and 1223 for the Ikizcetepeler Reservoir by Bartlett’s sphericity test (d.f. = 136,
Table 1. Residual statistics (n = 17) for Çaygören Reservoir
| | Mean | Standard deviation | Minimum | Maximum |
| --- | --- | --- | --- | --- |
| Predicted value | 1.26 | 0.66 | 0.28 | 2.52 |
| S.E. of predicted value | 0.40 | 0.11 | –1.47 | 1.89 |
| Residual | 0.00 | 1.176 | –2.35 | 2.93 |
| Standard residual | 0.00 | 0.97 | –1.91 | 2.41 |
| Cook’s distance | 0.06 | 0.07 | 0.01 | 1.61 |
Table 2. Residual statistics (n = 17) for Ikizcetepeler Reservoir
| | Mean | Standard deviation | Minimum | Maximum |
| --- | --- | --- | --- | --- |
| Predicted value | 1.11 | 0.53 | –0.49 | 1.81 |
| S.E. of predicted value | 0.19 | 0.08 | 0.15 | 0.50 |
| Residual | 0.00 | 0.59 | –0.70 | 1.80 |
| Standard residual | 0.00 | 0.96 | –1.15 | 2.96 |
| Cook’s distance | 0.08 | 0.16 | 0.00 | 0.57 |
In both reservoirs, six out of 17 PCs were selected for MLR, which was called the first approach (Tables 3 and 4). In the first approach, the selected PCs explained 71% of the total variation in the Çaygören Reservoir and 75% in the Ikizcetepeler Reservoir (Tables 5 and 6).
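Table 3. Descriptive statistics of PCs for Çaygören Reservoir
| Principal component | Eigenvalue | Cumulative variance (%) |
| --- | --- | --- |
| PC1 | 4.0 | 23.8 |
| PC2 | 2.2 | 36.8 |
| PC3 | 1.8 | 47.7 |
| PC4 | 1.3 | 55.8 |
| PC5 | 1.2 | 63.4 |
| PC6 | 1.0 | 69.4 |
| PC7 | 0.9 | 75.1 |
| PC8 | 0.8 | 79.7 |
| PC9 | 0.7 | 83.7 |
| PC10 | 0.6 | 87.4 |
| PC11 | 0.5 | 90.3 |
| PC12 | 0.4 | 92.6 |
| PC13 | 0.3 | 94.6 |
| PC14 | 0.3 | 96.3 |
| PC15 | 0.3 | 97.8 |
| PC16 | 0.2 | 99.1 |
| PC17 | 0.15 | 100 |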
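Table 4. Descriptive statistics of PCs for Ikizcetepeler Reservoir
| Principal component | Eigenvalue | Cumulative variance (%) |
| --- | --- | --- |
| PC1 | 3.9 | 23.2 |
| PC2 | 2.5 | 38.0 |
| PC3 | 1.9 | 49.1 |
| PC4 | 1.6 | 58.6 |
| PC5 | 1.4 | 66.7 |
| PC6 | 1.2 | 73.8 |
| PC7 | 0.9 | 79.1 |
| PC8 | 0.8 | 83.7 |
| PC9 | 0.7 | 87.6 |
| PC10 | 0.5 | 90.5 |
| PC11 | 0.4 | 93.1 |
| PC12 | 0.3 | 94.9 |
| PC13 | 0.3 | 96.5 |
| PC14 | 0.3 | 98.1 |
| PC15 | 0.2 | 99.3 |
| PC16 | 0.1 | 99.9 |
| PC17 | 0.01 | 100 |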
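The text does not state which rule was used to select six PCs. One common choice, assumed here purely for illustration, is to retain components with an eigenvalue of at least 1. The sketch below applies that rule to the rounded eigenvalues tabulated above and also derives cumulative variance percentages; because the tabulated eigenvalues are rounded, the derived percentages will not reproduce the published ones exactly.

```python
def retained_components(eigenvalues, threshold=1.0):
    """Count components whose eigenvalue is at least the threshold."""
    return sum(1 for value in eigenvalues if value >= threshold)

def cumulative_variance_percent(eigenvalues):
    """Cumulative share of total variance, in percent, for each component."""
    total = sum(eigenvalues)
    running, shares = 0.0, []
    for value in eigenvalues:
        running += value
        shares.append(100.0 * running / total)
    return shares

# Eigenvalues as printed (rounded) for the two reservoirs.
caygoren = [4.0, 2.2, 1.8, 1.3, 1.2, 1.0, 0.9, 0.8, 0.7,
            0.6, 0.5, 0.4, 0.3, 0.3, 0.3, 0.2, 0.15]
ikizcetepeler = [3.9, 2.5, 1.9, 1.6, 1.4, 1.2, 0.9, 0.8, 0.7,
                 0.5, 0.4, 0.3, 0.3, 0.3, 0.2, 0.1, 0.01]

n_caygoren = retained_components(caygoren)
n_ikizcetepeler = retained_components(ikizcetepeler)
```

Under this assumed rule both reservoirs retain six components, which is consistent with the number of PCs used in the first approach.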
Table 5. Results of regression analysis (n = 17) with six PCs for Çaygören Reservoir
| Included independent variables | Regression coefficient (bk) | Standard error of bk | Standardized regression coefficient | t | P | R2 (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Constant | 1.2 | 0.34 | | 3.73 | 0.002 | 71 |
| Score 3 | 1.32 | 0.75 | 0.4 | 2.56 | 0.05 | |
Table 6. Results of regression analysis (n = 17) with six PCs for Ikizcetepeler Reservoir
| Included independent variables | Regression coefficient (bk) | Standard error of bk | Standardized regression coefficient | t | P | R2 (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Constant | 1.01 | 0.18 | | 5.55 | 0.00 | 75 |
| Score 1 | 1.90 | 0.89 | 0.501 | 2.95 | 0.048 | |
PCA component loadings of the first six PCs for the Çaygören Reservoir are presented in Table 7. The bold-marked loads indicate the highest existing correlation between the variables and the corresponding component. The high values of communalities indicate that the variance was efficiently reflected in the regression analysis.
Results of principal component analysis for Çaygören Reservoir
Variables in
Standardized weight of variables in selected PC (tik; i = 1, 2, …, 16 and k = 1, 2, 3, 4, 5)
Loading of variables (vik)
Communalities
1
2
3
4
5
6
1
2
3
4
5
6
pH
–0.49
–0.06
–0.45
4.88
–1.61
–0.74
0.03
–0.05
0.04
–0.05
0.01
0.73
SO4
2.21
–3.48
3.37
–2.60
–0.93
2.45
–0.12
0.04
0.32
–0.15
–0.02
0.25
0.73
ORP
1.21
–1.93
–0.44
0.91
1.27
1.80
0.07
–0.28
–0.02
0.05
0.07
0.16
0.49
Secchi
–1.11
–0.88
–2.46
0.62
4.53
–0.42
0.03
0.02
–0.08
0.19
0.11
0.75
Temp
4.51
1.61
–2.32
–0.89
0.89
–0.71
0.29
–0.34
–0.08
–0.03
0.09
0.01
0.85
Chl
–0.17
2.74
0.03
2.81
–4.51
0.97
–0.04
0.06
–0.02
0.29
0.08
0.78
TN
–1.65
3.43
1.01
–1.37
–1.86
–0.64
–0.04
0.36
0.12
–0.02
–0.05
0.02
0.71
NO3
–2.21
1.12
–2.65
3.01
0.45
–1.26
–0.05
0.34
–0.09
0.36
0.16
0.02
0.61
NO2
–4.48
6.69
–0.69
2.42
0.95
0.21
0.04
–0.09
0.09
0.01
–0.03
0.85
NH4
–1.95
0.97
–0.82
0.34
–0.19
–1.99
–0.18
–0.03
0.05
0.04
–0.11
0.78
TP
0.21
–0.24
2.62
1.07
–0.31
–4.57
–0.04
0.02
0.12
0.03
–0.08
0.61
PO4
–0.06
–0.03
4.59
–1.92
–1.41
–1.80
0.06
0.04
–0.06
–0.06
–0.05
0.59
Cond
2.13
–1.42
–2.28
–3.55
1.38
2.37
0.12
–0.02
–0.17
–0.23
0.07
0.13
0.68
TSS
–1.37
–3.11
4.01
1.65
–2.42
0.52
0.02
0.01
0.18
–0.03
0.12
0.45
BOD
3.19
–4.30
2.25
–2.31
0.54
–0.63
–0.07
0.25
–0.01
0.16
0.09
0.81
TDS
–0.73
–1.08
–0.04
–0.97
–1.29
4.73
0.05
–0.14
0.13
0.02
–0.09
0.71
COD
3.42
–1.72
0.15
1.21
–3.67
0.59
0.25
–0.18
0.01
0.09
–0.25
0.05
0.72
All 17 variables were included in the six selected PCs. However, only certain variables had significant loads within each PC, such as NO2–N had the most significant loads in PC1, NH4–N in PC2, PO4 and TSS in PC3, pH in PC4, chl-
The component loadings of the first six PCs for the Ikizcetepeler Reservoir are given in Table 8. In this reservoir, TSS and NO2–N had the most significant loads in PC3, COD and SO4 in PC4, TDS and pH in PC5 and chl-
Results of principal component analysis for Ikizcetepeler Reservoir
Variables in
Standardized weight of variables in selected PC (tik; i = 1, 2, …, 16 and k = 1, 2, 3, 4, 5)
Loading of variables (vik)
Communalities
1
2
3
4
5
6
1
2
3
4
5
6
pH
–4.36
7.67
3.30
3.93
4.09
4.98
0.02
0.24
0.11
0.12
0.13
0.69
SO4
–5.16
–4.87
14.36
16.21
3.01
4.71
–0.01
–0.09
0.28
0.00
0.06
0.63
ORP
–1.26
–1.26
1.06
–2.48
1.97
–0.22
–0.13
–0.11
0.09
–0.22
0.07
0.17
0.57
Secchi
–0.08
–0.93
–1.92
2.23
–3.77
–0.22
–0.03
–0.09
–0.19
0.22
0.74
Temp
3.55
1.86
–3.35
3.09
0.89
0.85
0.08
–0.15
0.14
–0.09
0.04
0.86
Chl
0.33
0.34
0.42
–3.05
4.21
0.42
0.08
0.03
0.04
–0.
–0.01
0.
0.74
TN
3.41
–2.59
0.83
–0.81
0.05
0.019
0.19
–0.14
0.04
–0.04
–0.02
0.03
0.73
NO3
2.49
–4.29
5.07
–1.13
1.41
2.39
0.17
–0.15
0.18
–0.04
0.31
0.05
0.93
NO2
–1.84
–0.92
7.68
4.44
2.15
0.53
–0.05
–0.03
0.31
0.17
0.08
0.74
NH4
1.76
–3.14
4.44
–0.58
–0.13
1.55
0.17
–0.16
0.22
–0.03
0.27
–0.07
0.94
TP
0.71
4.77
0.67
–2.61
–2.28
0.45
0.07
0.29
0.04
–0.16
0.06
–0.14
0.72
PO4
0.59
4.99
1.70
–1.96
–2.32
0.54
0.06
0.
0.09
–0.11
–0.02
–0.13
0.69
Cond
5.31
–2.67
–5.70
0.68
1.03
–0.82
0.18
–0.11
–0.23
0.02
–0.11
0.04
0.82
TSS
–0.39
–0.84
–3.54
–1.12
2.96
–0.11
–0.04
–0.07
–0.09
0.24
0.25
0.58
BOD
–0.67
0.13
–1.17
2.36
3.58
0.72
0.01
0.02
–0.16
0.
–0.07
0.71
TDS
–2.34
1.21
0.48
1.68
2.49
1.37
–0.15
0.13
0.05
0.18
0.27
0.72
COD
–0.67
0.13
–1.17
2.36
3.58
0.72
0.01
0.02
–0.16
0.
–0.07
0.71
According to MLR, where all PCs were used as independent variables (called the second approach), only PC scores 7, 10 and 12 were found to have significant linear relationships with chl-
Table 9. Results of regression analysis (n = 16) with all PCs for Çaygören Reservoir
| Included independent variables | Regression coefficient (bk) | Standard deviation of bk | t | P | R2 (%) |
| --- | --- | --- | --- | --- | --- |
| Constant | 1.14 | 0.3 | 3.79 | 0.02 | 76.8 |
| Score 7 | 0.77 | –2.2 | 1.8 | 0.049 | |
| Score 10 | 3.42 | 1.56 | 2.2 | 0.041 | |
| Score 12 | 0.8 | 0.5 | 2.08 | 0.045 | |
Table 10. Results of regression analysis (n = 16) with all PCs for Ikizcetepeler Reservoir
| Included independent variables | Regression coefficient (bk) | Standard deviation of bk | t | P | R2 (%) |
| --- | --- | --- | --- | --- | --- |
| Constant | 1.06 | 0.15 | 6.97 | 0.004 | 80.3 |
| Score 1 | 0.57 | 1.2 | 1.97 | 0.049 | |
| Score 12 | –5.81 | 1.72 | –3.39 | 0.042 | |
As can be seen in Table 5, 71% of the variation in chl-
In the Ikizcetepeler Reservoir, 75% of the variation in chl-
In the Çaygören Reservoir, six variables that had significant impacts on PC scores 1, 2, 4, 5 and 6, i.e. pH, chl-
In the Ikizcetepeler Reservoir, eleven variables that had significant impacts on PC scores 2, 3, 4, 5 and 6, i.e. pH, chl-
Results of the residual analysis were used to verify the applicability of the assembled regression models in both reservoirs. The models were checked for the presence of influential observations and outliers (Tables 1 and 2). Values of chl-
Observed and predicted Chlorophyll-
| Observations | Observed values (mg l–1) | First approach predicted values (mg l–1) | Second approach predicted values (mg l–1) |
| --- | --- | --- | --- |
| 1 | 1.75 | 0.7462 | 1.5680 |
| 2 | 2.04 | 2.4665 | 2.0199 |
| 3 | 2.00 | 1.2119 | 1.8576 |
| 4 | 1.21 | 0.9917 | 1.1429 |
| 5 | 1.26 | 0.6538 | 0.9731 |
| 6 | 1.09 | 0.621 | 0.8199 |
| 7 | 0.99 | 1.4281 | 0.9911 |
| 8 | 1.11 | 0.5003 | 1.2811 |
| 9 | 0.82 | 1.2926 | 0.6984 |
| 10 | 0.99 | 0.9912 | 0.8931 |
| 11 | 1.66 | 1.5972 | 1.8327 |
| 12 | 1.23 | 2.2135 | 1.2346 |
| 13 | 0.96 | 0.9209 | 1.2511 |
| 14 | 5.45 | 1.8412 | 5.4829 |
| 15 | 0.55 | 1.6093 | 0.392 |
| 16 | 1.01 | 1.6114 | 1.1065 |
| 17 | 1.04 | 1.5 | 1.34 |
Observed and predicted Chlorophyll-
| Observations | Observed values (mg l–1) | First approach predicted values (mg l–1) | Second approach predicted values (mg l–1) |
| --- | --- | --- | --- |
| 1 | 1.62 | 1.5204 | 1.420 |
| 2 | 0.99 | 0.8411 | 0.790 |
| 3 | –0.70 | 0.7428 | –0.550 |
| 4 | 0.92 | 0.6361 | 0.780 |
| 5 | 0.94 | 2.0858 | 0.9100 |
| 6 | 3.34 | 2.0732 | 3.200 |
| 7 | 0.69 | 0.9732 | 0.6300 |
| 8 | 0.74 | 0.8894 | 0.6200 |
| 9 | 0.79 | 0.9133 | 0.6800 |
| 10 | 1.16 | 0.8023 | 1.000 |
| 11 | 1.71 | 1.6937 | 1.500 |
| 12 | 1.17 | 0.7573 | 1.110 |
| 13 | 1.10 | 1.3245 | 0.980 |
| 14 | 1.27 | 1.0466 | 1.150 |
| 15 | 0.95 | 0.881 | 0.830 |
| 16 | 1.05 | 0.559 | 0.900 |
| 17 | 1.03 | 0.98 | 1.100 |
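Agreement between observed and predicted values such as those tabulated above is usually summarized by the coefficient of determination. The sketch below uses the general definition R² = 1 – SS_res/SS_tot on invented numbers; it is not a recalculation of the reported values, which come from the regression output and may have been computed on a different scale.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical values: a perfect prediction and a prediction equal to the mean.
observed_example = [1.0, 2.0, 3.0]
perfect = r_squared(observed_example, [1.0, 2.0, 3.0])
mean_only = r_squared(observed_example, [2.0, 2.0, 2.0])
```

A model that reproduces every observation gives 1, and one that only predicts the mean gives 0, so values such as 71% or 80.3% indicate how much of the spread in the response the regression accounts for.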
In the Çaygören Reservoir, 71% of the variation in the water quality parameters was explained by the PC3 scores, while 75% of the variation in the Ikizcetepeler Reservoir was explained by the PC1 scores, implying that chl-
Chl-
In the Ikizcetepeler Reservoir, when scores of all PCs were included in MLR (second approach), PC scores 1 and 12 were found to be significantly correlated with chl-
The main difference between PCR and classic MLR is that the PC scores are mutually uncorrelated, so the problem of multicollinearity is eliminated in PCR. The regression coefficients of the PC scores obtained through this approach are presented in Table 9 for the Çaygören Reservoir. The determination coefficient of this model was found to be 76.8% and the following model was used to predict chl-
In the Ikizcetepeler Reservoir, the regression coefficients of PC scores obtained through the PCR approach are presented in Table 10. The determination coefficient of this model was found to be 80.3% and the following model was used to predict chl-
In the eutrophic Çaygören Reservoir, six PCs explained 71% of the total variation in the relationships of water quality variables, while in the mesotrophic Ikizcetepeler Reservoir, the same number of components explained 75% of the total variation. These results show that PCR is a suitable tool for showing the relationships between the water quality variables in lakes with different trophic states (Chang et al. 2012).
In the Çaygören Reservoir, all variables were included in the six selected PCs. However, only nine variables (NO3–N, NO2–N, NH4–N, PO4, TSS, pH, Secchi disk depth, TDS and TP) had significant loads within these PCs. The results showed that phosphorus and nitrogen ions had a stronger effect on primary production than the other factors in the eutrophic Çaygören Reservoir. Research has shown that in eutrophic lakes nitrogen limitation occurs for a short time, usually in early summer, followed by the appearance of nitrogen-fixing Cyanobacteria, after which the algal community returns to phosphorus limitation (Schindler 2012).
In the mesotrophic Ikizcetepeler Reservoir, all variables were included in the six selected PCs. However, only seven variables (TSS, NO2–N, COD, SO4, TDS, pH and Secchi disk depth) had significant loads within these PCs. These results showed that in the mesotrophic Ikizcetepeler Reservoir, primary production was influenced mainly by water transparency (light availability) rather than by phosphorus and nitrogen. High loadings of TSS and Secchi disk depth in the selected PCs suggest that chl-
In the eutrophic Çaygören Reservoir, when all 16 PC scores were included in the stepwise MLR to predict chl-
In the eutrophic Çaygören Reservoir, the determination coefficient (R2) was 71% for the first approach’s model and 76.8% for the second approach’s model. In the mesotrophic Ikizcetepeler Reservoir, the determination coefficient was 75% for the first approach’s model and 80.3% for the second approach’s model. R2 thus rose by roughly five percentage points in both reservoirs, but this rise was not significant. R2 values are used as a criterion for the predictive success of regression models (Steyerberg et al. 2001). Generally, univariate simple regression models can yield high R2 values, but they may obscure the complexity of natural ecosystems. The predictive power of the present models shows that the multidimensional approach for predicting chl-
The results of this study show that PCR is an appropriate tool for predicting chl