RADFORD, A. ‒ WU, J. ‒ CHILD, R. ‒ LUAN, D. ‒ AMODEI, D. ‒ SUTSKEVER, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog.
PAPINENI, K. ‒ ROUKOS, S. ‒ WARD, T. ‒ ZHU, W. J. (2002). BLEU: a method for automatic evaluation of machine translation. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 311-318.
LIN, C. Y. (2004). ROUGE: A package for automatic evaluation of summaries. Text Summarization Branches Out, 74-81.
CHINCHOR, N. (1991). MUC-3 Evaluation Metrics and Linguistic Phenomena Tests. Natural Language Processing Systems Evaluation Workshop, p. 13.
ZHANG, T. ‒ KISHORE, V. ‒ WU, F. ‒ WEINBERGER, K. Q. ‒ ARTZI, Y. (2019). BERTScore: Evaluating text generation with BERT. arXiv preprint arXiv:1904.09675.
ZHAO, W. ‒ PEYRARD, M. ‒ LIU, F. ‒ GAO, Y. ‒ MEYER, C. M. ‒ EGER, S. (2019). MoverScore: Text generation evaluating with contextualized embeddings and Earth Mover Distance. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, 563-578.
BROWN, T. B. ‒ MANN, B. ‒ RYDER, N. ‒ SUBBIAH, M. ‒ KAPLAN, J. ‒ DHARIWAL, P. ‒ AMODEI, D. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
WEI, J. ‒ WANG, X. ‒ SCHUURMANS, D. ‒ BOSMA, M. ‒ ICHTER, B. ‒ XIA, F. ‒ LE, Q. (2022). Chain-of-thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903.
MAYNEZ, J. ‒ NARAYAN, S. ‒ BOHNET, B. ‒ MCDONALD, R. (2020). On faithfulness and factuality in abstractive summarization. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 1906-1919.
SHENG, E. ‒ CHANG, K. W. ‒ NATARAJAN, P. ‒ PENG, N. (2021). Societal biases in language generation: Progress and challenges. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, 4275-4293.
GEHMAN, S. ‒ GURURANGAN, S. ‒ SAP, M. ‒ CHOI, Y. ‒ SMITH, N. A. (2020). RealToxicityPrompts: Evaluating neural toxic degeneration in language models. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 3356-3369.
POPOVIĆ, M. (2017). chrF++: words helping character n-grams. Proceedings of the Second Conference on Machine Translation, 612-618.
JOSHI, P. ‒ SANTY, S. ‒ BUDHIRAJA, A. ‒ BALI, K. ‒ CHOUDHURY, M. (2020). The state and fate of linguistic diversity and inclusion in the NLP world. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 6282-6293.
BEDNÁR, P. ‒ DOBEŠ, M. ‒ GARABÍK, R. (2024). Training of large language model Mistral on Slovak language data. Jazykovedný časopis. Under review.
VAN DER LEE, C. ‒ GATT, A. ‒ VAN MILTENBURG, E. ‒ WUBBEN, S. ‒ KRAHMER, E. (2019). Best practices for the human evaluation of automatically generated text. Proceedings of the 12th International Conference on Natural Language Generation, 355-368.
CHIANG, W.-L. ‒ LI, Z. ‒ LIN, Z. ‒ SHENG, Y. ‒ WU, Z. ‒ ZHANG, P. ‒ ZHANG, C. (2023). Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality. https://lmsys.org/blog/2023-03-30-vicuna/
KOCMI, T. ‒ FEDERMANN, C. (2023). Large language models are state-of-the-art evaluators of translation quality. arXiv preprint arXiv:2302.14520.
BENDER, E. M. ‒ GEBRU, T. ‒ MCMILLAN-MAJOR, A. ‒ SHMITCHELL, S. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610-623.