Open Access

Kabyle corpus digital database and exploitation. Test of lexicometric analysis of the identity dimension in the romanesque discourse

Journal of Linguistics/Jazykovedný casopis's Cover Image
Journal of Linguistics/Jazykovedný casopis
Building Web corpora as sources for linguistic research and its applications

Cite

The purpose of this contribution is to show, through a preliminary analysis of a corpus sample composed of the first five kabyle novels (1963-1990), the contribution of lexicometry as a new method based on statistics, in the treatment of large corpora and the establishment of databases. The aim is to describe all the phases intrinsic to the preliminary processing of a corpus (transcription, tagging and lemmatization) before submitting them to the various stages of its exploitation. Thus, in our corpus, we have opted to deal with the theme of identity induced by the five works by highlighting both the overused vocabulary and the singularity of each work in relation to the corpus as a whole. But before moving on to the quantitative analysis of the vocabulary, a work of data preparation is necessary. We intend to focus on the orthographic choices to be adopted by removing all ambiguities, the marking out and the lemmatization of the corpus. In order to do this, we have resorted to Lexico5 computer tool.

eISSN:
1338-4287
Language:
English
Publication timeframe:
2 times per year
Journal Subjects:
Linguistics and Semiotics, Theoretical Frameworks and Disciplines, Linguistics, other