Volume 68 (2017): Issue 2 (December 2017)
Journal Details
Format: Journal
eISSN: 1338-4287
ISSN: 0021-5597
First Published: 05 Mar 2010
Publication timeframe: 2 times per year
Languages: English
Access type: Open Access

Microsyntactic Annotation of Corpora and its Use in Computational Linguistics Tasks

Published Online: 24 Jan 2018
Volume & Issue: Volume 68 (2017) - Issue 2 (December 2017)
Page range: 169 - 178
Abstract

Microsyntax is a linguistic discipline dealing with idiomatic elements whose important properties are strongly related to syntax. In a way, these elements may be viewed as transitional entities between the lexicon and the grammar, which explains why they are often underrepresented in both of these resource types: the lexicographer fails to see such elements as full-fledged lexical units, while the grammarian finds them too specific to justify the creation of individual well-developed rules. As a result, such elements are poorly covered by the linguistic models used in advanced computational linguistics tasks such as high-quality machine translation or deep semantic analysis. One possible way to remedy this situation and to improve the coverage and adequate treatment of microsyntactic units in linguistic resources is to develop corpora with microsyntactic annotation, closely linked to specially designed lexicons. The paper shows how this task is addressed in SynTagRus, the deeply annotated corpus of Russian.
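To make the idea of corpus annotation linked to a lexicon more concrete, here is a minimal Python sketch. It assumes a toy data model that is not the actual SynTagRus format: each hypothetical LexiconEntry lists the lemma sequence of a microsyntactic unit, and each hypothetical Annotation links the token positions of one occurrence in a lemmatized sentence to that entry's ID. The matcher finds only contiguous occurrences, and the Russian compound conjunction потому что ("because") serves purely as an illustrative entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    """Hypothetical microsyntactic lexicon entry: ID, lemma sequence, gloss."""
    entry_id: str
    lemmas: tuple[str, ...]
    gloss: str


@dataclass(frozen=True)
class Annotation:
    """Hypothetical corpus annotation: links token positions to a lexicon entry ID."""
    entry_id: str
    token_indices: tuple[int, ...]


def annotate(lemmas: list[str], lexicon: list[LexiconEntry]) -> list[Annotation]:
    """Mark every contiguous occurrence of each entry's lemma sequence.

    Assumption: units are contiguous. Discontinuous microsyntactic units
    would need syntactic information (e.g. dependency links), omitted here.
    """
    found = []
    for entry in lexicon:
        n = len(entry.lemmas)
        # Slide a window of the entry's length over the sentence lemmas.
        for start in range(len(lemmas) - n + 1):
            if tuple(lemmas[start:start + n]) == entry.lemmas:
                found.append(Annotation(entry.entry_id, tuple(range(start, start + n))))
    return found


if __name__ == "__main__":
    lexicon = [LexiconEntry("POTOMU_CHTO", ("потому", "что"), "because")]
    # Lemmatized sentence: "I left because I was tired".
    sentence = ["я", "уйти", "потому", "что", "устать"]
    print(annotate(sentence, lexicon))
```

Because each annotation stores an entry ID rather than a copy of the unit's description, every corpus occurrence stays tied to a single lexicon record, which is one simple way to keep the corpus and the lexicon aligned as either is revised.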

