Is it Possible to Re-Educate Roberta? Expert-Driven Machine Learning for Punctuation Correction

eISSN: 1338-4287
Language: English

Frequency: twice a year
Journal subjects: Linguistics and Semiotics, Theoretical Frameworks and Disciplines, Linguistics, other

Published online: 25 Dec 2023

Pages: 357–368

DOI: https://doi.org/10.2478/jazcas-2023-0052

Keywords: comma, Czech, vocative, machine learning, RoBERTa

© 2023 Jakub Machura et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
