Open Access

Evaluation of Word Embedding Models in Latvian NLP Tasks Based on Publicly Available Corpora


References

[1] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” in Workshop Track Proceedings of the 1st International Conference on Learning Representations, Scottsdale, Arizona, USA, May 2013, pp. 1–12.

[2] J. Pennington, R. Socher, and C. D. Manning, “GloVe: Global vectors for word representation,” in Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543.

[3] B. Wang, A. Wang, F. Chen, Y. Wang, and C.-C. J. Kuo, “Evaluating word embedding models: Methods and experimental results,” APSIPA Transactions on Signal and Information Processing, vol. 8, Art. no. e19, Jul. 2019. https://doi.org/10.1017/ATSIP.2019.12

[4] A. Znotiņš, “Word embeddings for Latvian natural language processing tools,” Human Language Technologies – The Baltic Perspective, vol. 289, IOS Press, pp. 167–173, 2016. https://doi.org/10.3233/978-1-61499-701-6-167

[5] A. Znotiņš and G. Barzdiņš, “LVBERT: Transformer-based model for Latvian language understanding,” Human Language Technologies – The Baltic Perspective, vol. 328, IOS Press, pp. 111–115, 2020. https://doi.org/10.3233/FAIA200610

[6] R. Vīksna and I. Skadiņa, “Large language models for Latvian named entity recognition,” Human Language Technologies – The Baltic Perspective, vol. 328, IOS Press, pp. 62–69, 2020. https://doi.org/10.3233/FAIA200603

[7] “EuroParl,” [Online]. Available: https://www.statmt.org/europarl/. Accessed on: May 2021.

[8] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of NAACL-HLT, Minneapolis, Minnesota, 2019, pp. 4171–4186.

[9] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, “Deep contextualized word representations,” arXiv, Art. no. 1802.05365, pp. 1–15, 2018.

[10] M. Ulčar, A. Žagar, C. Armendariz, A. Repar, S. Pollak, M. Purver, and M. Robnik-Šikonja, “Evaluation of contextual embeddings on less-resourced languages,” arXiv, Art. no. 2107.10614, pp. 1–45, 2021.

[11] X. Rong, “word2vec parameter learning explained,” arXiv, Art. no. 1411.2738v4, pp. 1–21, 2016.

[12] W. Ling, C. Dyer, A. Black, and I. Trancoso, “Two/Too simple adaptations of Word2Vec for syntax problems,” in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, May–June 2015, pp. 1299–1304. https://doi.org/10.3115/v1/N15-1142

[13] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, “Enriching word vectors with subword information,” Transactions of the Association for Computational Linguistics, vol. 5, pp. 135–146, June 2017. https://doi.org/10.1162/tacl_a_00051

[14] Z. Zhao, T. Liu, S. Li, and B. Li, “Ngram2vec: Learning improved word representations from Ngram co-occurrence statistics,” in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, Sep. 2017, pp. 244–253. https://doi.org/10.18653/v1/D17-1023

[15] “UDPipe,” [Online]. Available: https://ufal.mff.cuni.cz/udpipe/1. Accessed on: May 2021.

[16] “Gensim library,” [Online]. Available: https://radimrehurek.com/gensim/. Accessed on: May 2021.

[17] “Ngram2vec tool repository,” [Online]. Available: https://github.com/zhezhaoa/ngram2vec. Accessed on: May 2021.

[18] “Structured Skip-Gram tool repository,” [Online]. Available: https://github.com/wlin12/wang2vec. Accessed on: May 2021.

[19] M. Ulčar, K. Vaik, J. Lindström, M. Dailidėnaitė, and M. Robnik-Šikonja, “Multilingual culture-independent word analogy datasets,” in Proceedings of the 12th Language Resources and Evaluation Conference, LREC 2020, Marseille, France, May 2020, pp. 4074–4080.

[20] “Translated analogy dataset repository,” [Online]. Available: https://www.clarin.si/repository/xmlui/handle/11356/1261. Accessed on: May 2021.

[21] “spaCy tool,” [Online]. Available: https://spacy.io/. Accessed on: May 2021.

[22] “LVTB dataset repository,” [Online]. Available: https://github.com/UniversalDependencies/UD_Latvian-LVTB/tree/master. Accessed on: May 2021.

[23] “LUMII-AILab NER dataset repository,” [Online]. Available: https://github.com/LUMII-AILab/FullStack/tree/master/NamedEntities. Accessed on: May 2021.

[24] O. Levy and Y. Goldberg, “Linguistic regularities in sparse and explicit word representations,” in Proceedings of the Eighteenth Conference on Computational Natural Language Learning, June 2014, pp. 171–180. https://doi.org/10.3115/v1/W14-1618

[25] “CommonCrawl,” [Online]. Available: https://commoncrawl.org/. Accessed on: May 2021.

[26] P. Paikens, “Deep neural learning approaches for Latvian morphological tagging,” Human Language Technologies – The Baltic Perspective, vol. 289, IOS Press, pp. 160–166, 2016. https://doi.org/10.3233/978-1-61499-701-6-160
