Volume 68 (2017): Issue 2 (December 2017)
Journal Details
Format: Journal
eISSN: 1338-4287
ISSN: 0021-5597
First Published: 05 Mar 2010
Publication timeframe: 2 times per year
Languages: English
Access type: Open Access

Compound Adverbs as an Issue in Machine Analysis of Czech language

Published Online: 24 Jan 2018
Volume & Issue: Volume 68 (2017) - Issue 2 (December 2017)
Page range: 396 - 403
Abstract

Compound adverbs pose an interesting problem for Automatic Morphological Analysis (AMA). In Czech, they are formed by compounding existing words that belong to different parts of speech, without any change in their form. A characteristic sign of compound adverbs is that they can always be decomposed again. Compound adverbs may be written as one word, but a multiword form sometimes coexists. A word that originally belonged to a different part of speech acquires an adverbial meaning and becomes an adverb. This article presents the results of a corpus probe aimed at mapping expressions that are demonstrably compound adverbs but were either not recognized by AMA or incorrectly tagged by AMA as another part of speech. Analysis of data obtained from the Czech National Corpus (ČNK) SYN v3 shows that the unrecognized and incorrectly tagged units can be divided into several groups. Knowledge of these groups makes it possible to refine part-of-speech tagging by AMA. The corpus probe examined units written in accordance with the current codification as well as substandard units.
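
The kind of corpus probe described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the seed list `COMPOUND_ADVERBS`, the function `probe`, and the group names are hypothetical, and the tags are toy one-letter values that assume the first character of a tag encodes the part of speech, with `D` for adverb and `X` for an unrecognized form.

```python
from collections import defaultdict

# Hypothetical seed list of compound adverbs (illustrative, not the article's data).
COMPOUND_ADVERBS = {"zpočátku", "dohromady", "najevo"}


def probe(tagged_tokens, candidates=COMPOUND_ADVERBS):
    """Group candidate compound adverbs by how the tagger handled them.

    tagged_tokens: iterable of (word_form, tag) pairs.
    Assumption: the first tag character is the part of speech,
    'D' = adverb, 'X' = unknown; an empty tag counts as unrecognized.
    """
    groups = defaultdict(list)
    for form, tag in tagged_tokens:
        if form.lower() not in candidates:
            continue  # not a candidate compound adverb
        pos = tag[:1]
        if pos == "D":
            groups["adverb"].append((form, tag))
        elif pos == "X" or not tag:
            groups["unrecognized"].append((form, tag))
        else:
            groups["other_pos"].append((form, tag))
    return dict(groups)


# Toy input: one correctly tagged, one mis-tagged, one unrecognized, one unrelated token.
sample = [("Zpočátku", "D"), ("najevo", "N"), ("dohromady", "X"), ("dům", "N")]
result = probe(sample)
```

The `unrecognized` and `other_pos` groups correspond to the two cases the abstract names: units not recognized by AMA and units tagged as another part of speech. Subdividing them further would require the article's actual data.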
