Charting orthographical reliability in a corpus of English historical letters
Published Online: Apr 11, 2018
Page range: 79 - 96
DOI: https://doi.org/10.1515/icame-2018-0005
Keywords
© 2018 Anni Sairio et al., published by De Gruyter Open
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
Research into orthography in the history of English is not a simple venture. The history of English spelling is primarily based on printed texts, which fail to capture the range of variation inherent in the language; many manuscript phenomena are simply not found in printed texts. Manuscript-based corpora would be the ideal research data, but as this is resource-intensive, linguists use editions that have been produced by non-linguists. Many editions claim to retain original spellings, but in practice text is always normalized at the graph level and possibly more so. This does not preclude using such a corpus for orthographical research, but there has been no systematic way to determine the philological reliability of an edited text. In this paper we present a typological methodology we are developing for the evaluation of orthographical quality of edition-based corpora, with the aim of making the best use of bad data in the context of editions and manuscript practices. As a case study, we apply this methodology to the Early Modern and Late Modern English sections of the Corpus of Early English Correspondence.