Volume 72 (2021): Issue 2 (December 2021)
NLP, Corpus Linguistics and Interdisciplinarity
Journal Details
Format: Journal
eISSN: 1338-4287
First Published: 05 Mar 2010
Publication timeframe: 2 times per year
Languages: English
Access type: Open Access

From Graphematics to Phrasal, Sentential, and Textual Semantics Through Morphosyntax by Means of Corpus-Driven Grammar and Ontology: A Case Study on One Tibetan Text

Published Online: 30 Dec 2021
Volume & Issue: Volume 72 (2021) - Issue 2 (December 2021) - NLP, Corpus Linguistics and Interdisciplinarity
Page range: 319 - 329
Abstract

This article presents the current results of an ongoing study on fine-tuning automatic morphosyntactic and semantic annotation by improving the underlying formal grammar and ontology, using one Tibetan text as an example. The ultimate purpose of the work at this stage was to improve linguistic software developed for natural-language processing and understanding, so as to achieve complete annotation of a specific text and a state of the formal model in which all linguistic phenomena observed in the text are explained. This purpose comprises the following tasks: analyzing error cases in the annotation of the text from the corpus; eliminating these errors in automatic annotation; and developing the formal grammar and updating the dictionaries. Alongside the morphosyntactic analysis, the current approach involves simultaneous semantic analysis. The article describes the semantic annotation of the corpus, required for grammar revision and development, which was carried out using a computer ontology. The work is carried out on one of the corpus texts, the grammatical poetic treatise Sum-cu-pa (7th century).
