Levels of Annotation in the Slovene Training Corpus ssj500k 2.2

This paper presents the Slovene Training Corpus ssj500k 2.2, which is annotated at the levels of tokenization, sentence segmentation, part-of-speech tagging, lemmatization, syntactic dependencies, named entities, verbal multi-word expressions, and semantic role labeling. It describes each annotation layer and illustrates how the training corpus has been used to build various lexicons, such as the lexicon of multi-word units and the valency lexicon of modern Slovene. It concludes by outlining our future work, namely the annotation of multi-word expressions based on the Slovene Lexical Database.
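To make the list of layers more concrete, here is a minimal Python sketch of how one sentence might carry all of them at once. It is not the corpus's actual distribution format. The `Token` fields, the helpers `to_tabular` and `dependency_triples`, the example sentence "Janez bere knjigo." ("Janez reads a book."), and all labels (UD-style POS tags and relations, BIO entity tags, placeholder semantic roles `ARG0`/`ARG1`) are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Token:
    """One token carrying every annotation layer named in the abstract.

    Field names and label values are illustrative assumptions, not the
    corpus's actual schema or tagsets.
    """
    idx: int                   # 1-based position in the sentence (tokenization layer)
    form: str                  # surface form
    lemma: str                 # lemmatization layer
    pos: str                   # part-of-speech tag (UD-style, assumed)
    head: int                  # index of the syntactic head, 0 = root
    deprel: str                # dependency relation label (UD-style, assumed)
    ne: str = "O"              # named-entity tag in BIO style (assumed)
    mwe: Optional[str] = None  # id of a verbal MWE the token belongs to, if any
    srl: dict = field(default_factory=dict)  # predicate idx -> role (placeholder labels)


# Sentence segmentation layer: a document is a list of sentences,
# and each sentence is a list of tokens.
sentence = [
    Token(1, "Janez", "Janez", "PROPN", 2, "nsubj", ne="B-PER", srl={2: "ARG0"}),
    Token(2, "bere", "brati", "VERB", 0, "root"),
    Token(3, "knjigo", "knjiga", "NOUN", 2, "obj", srl={2: "ARG1"}),
    Token(4, ".", ".", "PUNCT", 2, "punct"),
]
document = [sentence]


def to_tabular(sent):
    """Render one sentence as tab-separated lines, one token per line."""
    lines = []
    for t in sent:
        # Join semantic roles as "predicate:role"; "_" marks an empty field.
        roles = ",".join(f"{p}:{r}" for p, r in sorted(t.srl.items())) or "_"
        lines.append("\t".join([str(t.idx), t.form, t.lemma, t.pos,
                                str(t.head), t.deprel, t.ne, t.mwe or "_", roles]))
    return "\n".join(lines)


def dependency_triples(sent):
    """Return (head_form, relation, dependent_form) triples; head 0 maps to 'ROOT'."""
    forms = {t.idx: t.form for t in sent}
    return [(forms.get(t.head, "ROOT"), t.deprel, t.form) for t in sent]
```

For this sentence, `dependency_triples(sentence)` starts with `('bere', 'nsubj', 'Janez')`. The example contains no verbal multi-word expression, so every `mwe` field stays empty. Storing one row per token mirrors common tab-separated corpus formats, but that layout is likewise an assumption of this sketch.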

Language: English

Publication frequency: 2 issues per year
Journal subject areas: Linguistics and Semiotics, Theories and Subject Areas, Linguistics, Other

Mija Bon

Polona Gantar

Published online: 21 Dec 2019

Page range: 390-399

DOI: https://doi.org/10.2478/jazcas-2019-0068

Keywords: corpus linguistics, training corpus, corpus annotation, Slovene language

© 2019 Mija Bon et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.