Against level-3-only analyses in corpus linguistics

Stefan Th. Gries

Open Access

Published Online: May 28, 2024

ICAME Journal

Volume 48 (2024): Issue 1 (May 2024)

Page range: 23 - 47

Received: May 26, 2023

Accepted: Aug 14, 2023

DOI: https://doi.org/10.2478/icame-2024-0002

Keywords
learner corpus research, varieties research, regression, multi-level modeling, genitive alternation

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

In the last few decades, much work in corpus linguistics has attempted to discover, and then interpret, differences in the frequencies of use of linguistic elements (words, patterns, constructions, discourse features, etc.). It is probably fair to say that such studies have been particularly frequent in (i) learner corpus research, (ii) corpus-based varieties research, and (iii) sociolinguistically motivated studies. For instance, many studies have discussed the differences in how often certain elements are used (i) in corpus data from native speakers vs. corpus data from learners with different L1 backgrounds, (ii) in corpora representing different inner- and outer-circle varieties, or (iii) by speakers in corpora representing people of different gender or sexual identities.

This paper will make the admittedly bold claim that any such study is in fact by definition unable to ‘prove’ what is often its main point, namely that the distributional differences found are in fact due to the hypothesized explanatory variable(s) of L1, VARIETY, or, e.g., GENDER, even when the distributional differences are significant and come with a decent effect size. To substantiate this claim, I will first discuss some terminology from the family of methods known as multi-level modeling, namely the distinction between level-1, level-2, ... level-n variables and its relevance for many corpus studies. Second, I will demonstrate how studies using only the above kinds of variables cannot distinguish the effect of their favored predictors from the effect of local/contextual level-1 variables. Third, in discussing this, I will exemplify how such effects need to be explored quantitatively instead.
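The confounding problem described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the counts are invented, the level-1 predictor (possessor animacy) is merely one plausible contextual variable for the genitive alternation, and the two varieties "A" and "B" are hypothetical. The point is only that a raw frequency difference between varieties can arise even when speakers of both varieties make identical choices in identical contexts.

```python
# Hypothetical counts (invented for illustration, not from the article):
# (variety, possessor animacy) -> (s-genitives, of-genitives)
counts = {
    ("A", "animate"): (560, 140),
    ("A", "inanimate"): (60, 240),
    ("B", "animate"): (240, 60),
    ("B", "inanimate"): (140, 560),
}


def s_rate(cells):
    """Proportion of s-genitives across a list of (s, of) count pairs."""
    s_total = sum(s for s, _ in cells)
    n_total = sum(s + of for s, of in cells)
    return s_total / n_total


def rate_by_variety(variety):
    """Level-3-only view: pool all tokens of one variety, ignoring context."""
    return s_rate([c for (var, _), c in counts.items() if var == variety])


def rate_by_cell(variety, animacy):
    """View that also conditions on the level-1 predictor (animacy)."""
    return s_rate([counts[(variety, animacy)]])


if __name__ == "__main__":
    for variety in ("A", "B"):
        print(variety, "overall:", round(rate_by_variety(variety), 2))
        for animacy in ("animate", "inanimate"):
            print(" ", animacy, round(rate_by_cell(variety, animacy), 2))
```

In these invented data, the pooled s-genitive rate is 0.62 for variety A and 0.38 for variety B, yet within each animacy level the two varieties have identical rates (0.8 for animate and 0.2 for inanimate possessors). The pooled difference comes entirely from the different mix of animate and inanimate possessors in the two hypothetical corpora, which an analysis using only the variety predictor cannot detect.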

eISSN: 1502-5462
Language: English

Publication timeframe: Volume Open
Journal Subjects: Linguistics and Semiotics, Applied Linguistics, Quantitative, Computational, and Corpus Linguistics, Theoretical Frameworks and Disciplines, Linguistics, other, Germanic Languages, English, Social Sciences, Communication Science

© 2024 Stefan Th. Gries, published by Sciendo
