Volume 68 (2017): Issue 2 (December 2017)
Journal Details
License
Format: Journal
eISSN: 1338-4287
ISSN: 0021-5597
First Published: 05 Mar 2010
Publication timeframe: 2 times per year
Languages: English
Access type: Open Access

Text collections for evaluation of Russian morphological taggers

Published Online: 24 Jan 2018
Volume & Issue: Volume 68 (2017) - Issue 2 (December 2017)
Page range: 258 - 267
Abstract

The paper describes the preparation and development of the text collections for the MorphoRuEval-2017 shared task, an evaluation campaign designed to stimulate the development of automatic morphological processing technologies for Russian. The main challenge for the organizers was to convert all available Russian corpora with manually verified, high-quality tagging to a single format (Universal Dependencies CoNLL-U). The data sources were the disambiguated subcorpus of the Russian National Corpus, SynTagRus, OpenCorpora.org data, and the GICR corpus with resolved homonymy. These sources differ in their tagsets, lemmatization rules, pipeline architecture, technical solutions, and systematic errors. The collections include both normative texts (news and modern literature) and more informal discourse (social media and spoken data). The texts are available under the CC BY-NC-SA 3.0 license.


