Kladenští type as a problem of automatic morphological analysis

The aim of our paper is to demonstrate procedures by which the data needed to refine tools for automatic morphological analysis of Czech can be obtained from a corpus, namely the web corpora Araneum Bohemicum IV Maximum (Czech, 20.03) 7.10 G and Araneum Bohemicum Maximum (Czech, 15.04) 3.20 G of the ARANEA series (hereinafter Araneum). In particular, we focus on propria of the Kladenští type, i.e., substantivized adjectives denoting groups of persons according to affiliation. The probe into the Aranea web corpus has two goals: 1) a corpus-based description of the frequent properties of the Kladenští type, which can serve as a starting point for rule-based disambiguation; 2) a list of the most frequent lemmas belonging to the Kladenští type, which can then be included in the dictionaries of automatic morphological analyzers (e.g., the MorfFlex dictionary by Hajič and Hlaváčová). We believe that the probe can help improve the results of tools for automatic morphological analysis of Czech.

eISSN: 1338-4287
Language: English

Publication frequency: 2 issues per year
Journal subjects: Linguistics and Semiotics, Theoretical Frameworks and Disciplines, Linguistics, other


Kladenští type as a problem of automatic morphological analysis

Published online: 17 Aug 2022

Page range: 862–872

DOI: https://doi.org/10.2478/jazcas-2022-0011

Keywords: automatic morphological analysis, derivational type, part of speech transition

© 2022 Klára Osolsobě et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
