Charting orthographical reliability in a corpus of English historical letters

Research into orthography in the history of English is not a simple venture. Accounts of the history of English spelling are based primarily on printed texts, which fail to capture the range of variation inherent in the language; many manuscript phenomena are simply not found in print. Manuscript-based corpora would be the ideal research data, but because producing them is resource-intensive, linguists use editions that have been produced by non-linguists. Many editions claim to retain original spellings, but in practice the text is always normalized at the graph level, and possibly beyond it. This does not preclude using edition-based corpora for orthographical research, but there has been no systematic way to determine the philological reliability of an edited text. In this paper we present a typological methodology that we are developing for evaluating the orthographical quality of edition-based corpora, with the aim of making the best use of bad data in the context of editions and manuscript practices. As a case study, we apply this methodology to the Early Modern and Late Modern English sections of the Corpus of Early English Correspondence.

Language: English

Publication timeframe: 1 time per year
Journal Subjects: Linguistics and Semiotics, Applied Linguistics, Quantitative, Computational, and Corpus Linguistics, Theoretical Frameworks and Disciplines, Linguistics, other, Germanic Languages, English, Social Sciences, Communication Science, Communication Science, other

Anni Sairio

Samuli Kaislaniemi

Anna Merikallio

Terttu Nevalainen

Published Online: Apr 11, 2018

Page range: 79 - 96

DOI: https://doi.org/10.1515/icame-2018-0005

© 2018 Anni Sairio et al., published by De Gruyter Open

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.