Evaluation of Word Embedding Models in Latvian NLP Tasks Based on Publicly Available Corpora

Rolands Laucis; Gints Jēkabsons

Open Access

Published Online: Dec 30, 2021

Applied Computer Systems

Volume 26 (2021): Issue 2 (December 2021)

Page range: 132 - 138

DOI: https://doi.org/10.2478/acss-2021-0016

Keywords
Named entity recognition, natural language processing, part-of-speech tagging, word analogy, word embeddings

This work is licensed under the Creative Commons Attribution 4.0 International License.

Natural language processing (NLP) increasingly relies on pre-trained word embeddings for a variety of tasks. However, little research has been devoted to Latvian, a language that is much more morphologically complex than English. In this study, several experiments were carried out on three NLP tasks with four different methods of creating word embeddings: word2vec, fastText, Structured Skip-Gram and ngram2vec. The obtained results can serve as a baseline for future NLP research on the Latvian language. The main conclusions are the following. First, in the part-of-speech tagging task, using a training corpus 46 times smaller than in a previous study, the accuracy was 91.4 % (versus 98.3 % in the previous study). Second, fastText demonstrated the best overall effectiveness. Third, the best results for all methods were observed for embeddings with a dimension size of 200. Finally, word lemmatization generally did not improve results.

eISSN: 2255-8691
Language: English

Publication timeframe: 2 times per year
Journal Subjects: Computer Sciences, Artificial Intelligence, Information Technology, Project Management, Software Development

© 2021 Rolands Laucis et al., published by Sciendo
