A database of North American double modals and self-repairs from YouTube

21 October 2022


Sequences of two modal verbs in spoken English can represent either the use of a nonstandard syntactic feature (a double modal) or a corrected utterance in which a speaker begins with one modal auxiliary but switches to another (a self-repair). This article presents the Double Modals and Self-Repairs (DMSR) database, a table of naturalistic double modals and self-repairs in videos from local government entities in North America, created from the Corpus of North American Spoken English (CoNASE). The paper describes the procedures used to create the database, discusses potential uses, and presents an exploratory analysis in which a logistic regression classifier is trained on CoNASE data to distinguish authentic double modals from self-repair sequences on the basis of local discourse context. The analysis demonstrates how large corpora of speech can be used to investigate the links between syntactic and pragmatic phenomena. Specifically, it shows that double modals function as an interactive device, whereas two-modal sequences that are self-repairs may result from high cognitive load. The paper concludes with a discussion of multimodal corpus creation from YouTube for the study of lexical, syntactic, and interactional phenomena in speech, as well as for the analysis of complex, multilevel computer-mediated communication (CMC) phenomena.
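As a rough illustration of the kind of search that underlies such a database, the following minimal sketch finds adjacent pairs of modal verbs in a transcript string and collects a few words of surrounding context, which could later serve as input features for a classifier. It is not the procedure used to build DMSR: the modal list, the function name `find_two_modal_sequences`, the context window size, and the decision to skip exact repetitions such as "would would" are all assumptions made for this example.

```python
import re

# Assumed, simplified list of modal auxiliaries (not the paper's inventory).
MODALS = ("can", "could", "may", "might", "must", "shall", "should", "will", "would")
_ALT = "|".join(MODALS)

# The second modal sits in a lookahead so that overlapping sequences
# (e.g. "might could should") are each reported once.
TWO_MODALS = re.compile(rf"\b({_ALT})\s+(?=({_ALT})\b)", re.IGNORECASE)


def find_two_modal_sequences(text, window=5):
    """Return one dict per two-modal sequence with its local context.

    `window` is a hypothetical parameter: the number of words kept on
    each side of the sequence.
    """
    hits = []
    for m in TWO_MODALS.finditer(text):
        first, second = m.group(1).lower(), m.group(2).lower()
        if first == second:
            # Assumption: a repeated modal is a plain repetition, not a
            # switch from one modal to another, so it is skipped.
            continue
        left = text[:m.start()].split()[-window:]
        # The match ends just before the second modal, so drop that word.
        right = text[m.end():].split()[1:window + 1]
        hits.append({"first": first, "second": second,
                     "left": left, "right": right})
    return hits


if __name__ == "__main__":
    sample = "well we might could do that next week but I would will have to check"
    for hit in find_two_modal_sequences(sample):
        print(hit)
```

A pattern search of this kind cannot by itself tell a double modal from a self-repair; it only produces candidates whose context words would then need manual annotation or a trained classifier.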

