Volume 72 (2021): Issue 2 (December 2021)
    NLP, Corpus Linguistics and Interdisciplinarity
Journal Details
Format: Journal
eISSN: 1338-4287
First Published: 05 Mar 2010
Publication timeframe: 2 times per year
Languages: English
Access type: Open Access

A Robust Approach to Variation in Carpathian Rusyn: Resampling-Based Methods for Small Data Sets

Published Online: 30 Dec 2021
Volume & Issue: Volume 72 (2021) - Issue 2 (December 2021) - NLP, Corpus Linguistics and Interdisciplinarity
Page range: 603 - 617
Abstract

Quantitative, corpus-based research on spontaneously spoken Carpathian Rusyn raises several data-related problems: speakers use ambivalent forms in different quantities, which results in a biased data set, while a stricter data-cleaning process would lead to large-scale data loss. On top of that, polytomous categorical dependent variables are hard to analyze due to methodological limitations. This paper provides several approaches to handling unbalanced and biased data sets containing variation in the conjugational forms of the verbs maty ‘to have’ and (po-)znaty ‘to know’ in Carpathian Rusyn. Using resampling-based methods such as cross-validation, bootstrapping and random forests, we provide a strategy for circumventing possible methodological pitfalls and gaining the most information from our precious data without p-hacking the results. By calculating the predictive power of several sociolinguistic factors for linguistic variation, we can make valid statements about the (sociolinguistic) status of Rusyn and the stability of the old dialect continuum of Rusyn varieties.
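
To illustrate the kind of resampling the abstract refers to, the following is a minimal sketch of a percentile bootstrap for the share of each variant in a small sample with a polytomous outcome. It is not the authors' analysis: the function name bootstrap_shares, the variant labels and the counts are all hypothetical and invented for illustration, and the sketch uses only the Python standard library.

```python
import random
from collections import Counter


def bootstrap_shares(observations, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap intervals for the share of each category.

    Returns {category: (observed_share, lower_bound, upper_bound)}.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(observations)
    categories = sorted(set(observations))
    samples = {c: [] for c in categories}

    for _ in range(n_resamples):
        # Draw n observations with replacement from the original sample.
        resample = rng.choices(observations, k=n)
        counts = Counter(resample)
        for c in categories:
            samples[c].append(counts[c] / n)

    result = {}
    for c in categories:
        shares = sorted(samples[c])
        lower = shares[int((alpha / 2) * n_resamples)]
        upper = shares[int((1 - alpha / 2) * n_resamples) - 1]
        observed = observations.count(c) / n
        result[c] = (observed, lower, upper)
    return result


# Toy data, invented for illustration only (hypothetical variant labels).
toy = ["variant_a"] * 18 + ["variant_b"] * 9 + ["variant_c"] * 3
intervals = bootstrap_shares(toy)
for variant, (observed, lower, upper) in intervals.items():
    print(f"{variant}: share={observed:.2f}, 95% interval=[{lower:.2f}, {upper:.2f}]")
```

With so few tokens of the rarest variant, the resulting interval is wide relative to its share, which is the situation in which resampling is preferable to relying on a single point estimate.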
