Volume 35 (2016): Issue 2 (June 2016)
24 Aug 2013
4 issues per year
Open Access

Impact of sample size on principal component analysis ordination of an environmental data set: effects on eigenstructure

Published online: 28 May 2016
Page range: 173–190

In this study, we used bootstrap simulation of a real data set to investigate the impact of sample size (N = 20, 30, 40 and 50) on the eigenvalues and eigenvectors resulting from principal component analysis (PCA). For each sample size, 100 bootstrap samples were drawn from an environmental data matrix of water quality variables (p = 22) from a small data set comprising 55 samples (the stations at which water samples were collected). Because data sets in ecology and the environmental sciences are invariably small, owing to the high cost of collecting and analysing samples, we restricted our study to relatively small sample sizes. We focused on comparing the first 6 eigenvectors and the first 10 eigenvalues. Data sets were compared by agglomerative cluster analysis with Ward’s method, which does not require any stringent distributional assumptions.
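The resampling design described above can be illustrated with a minimal sketch. It is not the authors' code: the data matrix is a synthetic stand-in for the 55 × 22 water quality matrix, the function name `bootstrap_pca_eigenvalues` is hypothetical, and two details the abstract does not state are assumed, namely that rows are resampled with replacement and that PCA is run on the correlation matrix. The Ward's-method comparison step is not shown.

```python
import numpy as np


def bootstrap_pca_eigenvalues(data, sample_size, n_boot=100, n_keep=10, seed=0):
    """Return an (n_boot, n_keep) array of the leading PCA eigenvalues.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    n_rows = data.shape[0]
    out = np.empty((n_boot, n_keep))
    for b in range(n_boot):
        # Assumption: stations (rows) are resampled with replacement.
        idx = rng.integers(0, n_rows, size=sample_size)
        sample = data[idx]
        # Assumption: PCA on the correlation matrix (standardized variables).
        corr = np.corrcoef(sample, rowvar=False)
        # eigvalsh returns eigenvalues in ascending order; reverse them.
        eigvals = np.linalg.eigvalsh(corr)[::-1]
        out[b] = eigvals[:n_keep]
    return out


# Synthetic stand-in for the 55 x 22 data matrix (NOT the real data).
data = np.random.default_rng(42).normal(size=(55, 22))

# One set of 100 bootstrap replicates per sample size studied in the paper.
results = {n: bootstrap_pca_eigenvalues(data, n) for n in (20, 30, 40, 50)}

for n, eig in results.items():
    # Spread of the first eigenvalue across replicates, per sample size.
    print(n, eig[:, 0].mean(), eig[:, 0].std())
```

Comparing the spread of each eigenvalue across the four sample sizes gives a rough view of how stable the eigenstructure is as N changes. Note that for N = 20 the sample has fewer rows than variables, so the correlation matrix is rank deficient and only the leading eigenvalues are informative.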
