Volume 72 (2022): Issue 4 (June 2022)
Building Web corpora as sources for linguistic research and its applications
Journal details
Format: Journal
eISSN: 1338-4287
First published: 05 Mar 2010
Publication frequency: 2 issues per year
Languages: English
Access type: Open access

Consistency of morphological dictionary MorfFlex

Published online: 17 Aug 2022
Volume & Issue: Volume 72 (2022) - Issue 4 (June 2022) - Building Web corpora as sources for linguistic research and its applications
Pages: 855-861
Abstract

Language corpora usually contain, in addition to the texts themselves, various types of annotation. The most common is morphological annotation, which consists of assigning a lemma and a morphological tag to each wordform. Morphological tagging traditionally relies on morphological dictionaries. Our paper presents a new version of the so-called “Prague” morphological dictionary MorfFlex, which is used for tagging many Czech corpora (in particular the Prague Dependency Treebanks, the corpora published by the Institute of the Czech National Corpus in Prague, and the large Czech web corpora of the Aranea series). Three basic principles guided the update of the dictionary: the Golden Rule of Morphology, the Principle of Paradigm Unity, and the Principle of Paradigm Uniqueness.
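A minimal Python sketch of the dictionary-based annotation described above may help. It assumes a toy dictionary stored as (wordform, lemma, tag) triples; the entries, the tag strings (written in the style of the 15-position Prague positional tagset), and the function names `build_index`, `analyze`, and `conflicting_pairs` are hypothetical and do not reflect the actual MorfFlex file format or the MorphoDiTa API. Dictionary lookup returns every analysis of a wordform, which is why a tagger is still needed to disambiguate. The `conflicting_pairs` function illustrates one simple kind of consistency check (a lemma-tag pair mapped to two different wordforms); it is an assumed example, not the paper's definition of the Golden Rule of Morphology or of either paradigm principle.

```python
from collections import defaultdict

# Hypothetical mini-dictionary of (wordform, lemma, tag) triples.
# Tags imitate the 15-position Prague positional tagset; the entries are
# hand-written for illustration and are NOT taken from MorfFlex.
ENTRIES = [
    ("žena", "žena", "NNFS1-----A----"),  # nominative singular
    ("ženy", "žena", "NNFS2-----A----"),  # genitive singular
    ("ženy", "žena", "NNFP1-----A----"),  # nominative plural
    ("ženy", "žena", "NNFP4-----A----"),  # accusative plural
    ("ženy", "žena", "NNFP5-----A----"),  # vocative plural
]


def build_index(entries):
    """Map each wordform to the list of its (lemma, tag) analyses."""
    index = defaultdict(list)
    for form, lemma, tag in entries:
        index[form].append((lemma, tag))
    return dict(index)


def analyze(index, form):
    """Return every candidate (lemma, tag) pair for a wordform.

    Lookup alone cannot choose among the candidates; picking one in
    context is the disambiguation step a tagger performs afterwards.
    """
    return index.get(form.lower(), [])


def conflicting_pairs(entries):
    """Return (lemma, tag) pairs listed with more than one distinct wordform.

    Hypothetical check: one simple consistency property, not the paper's
    formulation of its three principles.
    """
    forms = defaultdict(set)
    for form, lemma, tag in entries:
        forms[(lemma, tag)].add(form)
    return {pair: sorted(fs) for pair, fs in forms.items() if len(fs) > 1}


if __name__ == "__main__":
    index = build_index(ENTRIES)
    # "ženy" is four-ways ambiguous; all analyses share the lemma "žena".
    for lemma, tag in analyze(index, "ženy"):
        print(lemma, tag)
    print(conflicting_pairs(ENTRIES))  # {} : no conflicts in this toy data
    # A deliberately wrong entry reusing the nominative singular tag:
    broken = ENTRIES + [("ženě", "žena", "NNFS1-----A----")]
    print(conflicting_pairs(broken))
```

Lowercasing the query in `analyze` is a simplification; a real analyzer would also have to handle capitalized lemmas such as proper names.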

References

BENKO, Vladimír: Aranea: Yet Another Family of (Comparable) Web Corpora. In: Text, Speech and Dialogue. 17th International Conference, TSD 2014, Brno, Czech Republic, September 8–12, 2014. Proceedings. Eds. P. Sojka – A. Horák – I. Kopeček – K. Pala. LNCS, Springer International Publishing Switzerland 2014, pp. 247–256.

HAJIČ, Jan: Disambiguation of Rich Inflection (Computational Morphology of Czech). Prague: Karolinum 2004.

HAJIČ, Jan – HAJIČOVÁ, Eva – MIKULOVÁ, Marie – MÍROVSKÝ, Jiří: Prague Dependency Treebank. In: Handbook of Linguistic Annotation. Eds. N. Ide – J. Pustejovsky. Berlin: Springer Verlag 2017, pp. 555–594. DOI: 10.1007/978-94-024-0881-2_21.

HAJIČ, Jan – BEJČEK, Eduard – BÉMOVÁ, Alevtina – BURÁŇOVÁ, Eva – FUČÍKOVÁ, Eva – HAJIČOVÁ, Eva – HAVELKA, Jiří – HLAVÁČOVÁ, Jaroslava – HOMOLA, Petr – IRCING, Pavel – KÁRNÍK, Jiří – KETTNEROVÁ, Václava – KLYUEVA, Natalia – KOLÁŘOVÁ, Veronika – KUČOVÁ, Lucie – LOPATKOVÁ, Markéta – MAREČEK, David – MIKULOVÁ, Marie – MÍROVSKÝ, Jiří – NEDOLUZHKO, Anna – NOVÁK, Michal – PAJAS, Petr – PANEVOVÁ, Jarmila – PETEREK, Nino – POLÁKOVÁ, Lucie – POPEL, Martin – POPELKA, Jan – ROMPORTL, Jan – RYSOVÁ, Magdaléna – SEMECKÝ, Jiří – SGALL, Petr – SPOUSTOVÁ, Johanka – STRAKA, Milan – STRAŇÁK, Pavel – SYNKOVÁ, Pavlína – ŠEVČÍKOVÁ, Magda – ŠINDLEROVÁ, Jana – ŠTĚPÁNEK, Jan – ŠTĚPÁNKOVÁ, Barbora – TOMAN, Josef – UREŠOVÁ, Zdeňka – VIDOVÁ HLADKÁ, Barbora – ZEMAN, Daniel – ZIKÁNOVÁ, Šárka – ŽABOKRTSKÝ, Zdeněk: Prague Dependency Treebank – Consolidated 1.0. Data/software, LINDAT/CLARIAH-CZ digital library. Prague: Charles University 2020a. Available at: http://hdl.handle.net/11234/1-3185.

HAJIČ, Jan – HLAVÁČOVÁ, Jaroslava – MIKULOVÁ, Marie – STRAKA, Milan – ŠTĚPÁNKOVÁ, Barbora: MorfFlex CZ 2.0. Data/software, LINDAT/CLARIAH-CZ digital library. Prague: Charles University 2020b. Available at: http://hdl.handle.net/11234/1-3186.

HLAVÁČOVÁ, Jaroslava – MIKULOVÁ, Marie – ŠTĚPÁNKOVÁ, Barbora – HAJIČ, Jan: Modifications of the Czech morphological dictionary for consistent corpus annotation. In: Jazykovedný časopis, 2019, vol. 70, no. 2, pp. 380–389. DOI: 10.2478/jazcas-2019-0067.

HLAVÁČOVÁ, Jaroslava: Formalizace systému české morfologie s ohledem na automatické zpracování českých textů [Formalization of the System of Czech Morphology with Regard to Automatic Processing of Czech Texts]. Ph.D. thesis. Prague: FF UK 2009.

HNÁTKOVÁ, Milena – KŘEN, Michal – PROCHÁZKA, Pavel – SKOUMALOVÁ, Hana: The SYN-series corpora of written Czech. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). Reykjavík: ELRA 2014, pp. 160–164.

MIKULOVÁ, Marie – HAJIČ, Jan – HANA, Jiří – HANOVÁ, Hana – HLAVÁČOVÁ, Jaroslava – JEŘÁBEK, Emil – ŠTĚPÁNKOVÁ, Barbora – VIDOVÁ HLADKÁ, Barbora – ZEMAN, Daniel: Manual for Morphological Annotation, Revision for the Prague Dependency Treebank – Consolidated 2020 release. Technical report no. 2020/TR-2020-64. Prague: Institute of Formal and Applied Linguistics, Charles University 2020.

Slovník spisovného jazyka českého [Dictionary of the Standard Czech Language]. Ed.-in-chief B. Havránek. Prague: Nakladatelství Československé akademie věd 1960–1971.

STRAKA, Milan – STRAKOVÁ, Jana: MorphoDiTa: Morphological Dictionary and Tagger. Data/software, LINDAT/CLARIAH-CZ digital library. Prague: Charles University 2014. Available at: http://hdl.handle.net/11858/00-097C-0000-0023-43CD-0.

STRAKOVÁ, Jana – STRAKA, Milan – HAJIČ, Jan: Open-Source Tools for Morphology, Lemmatization, POS Tagging and Named Entity Recognition. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Baltimore, Maryland: Association for Computational Linguistics 2014, pp. 13–18. DOI: 10.3115/v1/P14-5003.

SYN. ÚČNK FF UK, Prague. Available at: http://www.korpus.cz [accessed 25. 06. 2018].
