This work is licensed under the Creative Commons Attribution 4.0 International License.
Abdelali, A., Hassan, S., & Mubarak, H. (2021). Pre-Training BERT on Arabic Tweets: Practical Considerations. arXiv preprint.
Alsentzer, E., Murphy, J., Boag, W., Weng, W.-H., Jindi, D., Naumann, T., & McDermott, M. (2019). Publicly Available Clinical BERT Embeddings. In Proceedings of the 2nd Clinical Natural Language Processing Workshop (pp. 72–78).
Canete, J., Chaperon, G., & Fuentes, R. (2019). Spanish Pre-Trained BERT Model and Evaluation Data. PML4DC at ICLR.
Cui, Y., Che, W., Liu, T., Qin, B., Yang, Z., Wang, S., & Hu, G. (2019). Pre-Training with Whole Word Masking for Chinese BERT. arXiv preprint.
Dai, A., & Le, Q. (2015). Semi-supervised sequence learning. In Advances in Neural Information Processing Systems (pp. 3079–3087).
De Vries, W., Van Cranenburgh, A., Bisazza, A., Caselli, T., Van Noord, G., & Nissim, M. (2019). BERTje: A Dutch BERT Model. arXiv preprint.
Delobelle, P., Winters, T., & Berendt, B. (2020). RobBERT: a Dutch RoBERTa-based Language Model.
Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL-HLT.
Hugging Face (2022). Transformers (4.4.2) [Computer software]. Retrieved from https://github.com/huggingface/transformers
Gerz, D., Vulić, I., Ponti, E., Naradowsky, J., Reichart, R., & Korhonen, A. (2018). Language Modeling for Morphologically Rich Languages: Character-Aware Modeling for Word-Level Prediction. Transactions of the Association for Computational Linguistics, 6, 451–465. doi: https://doi.org/10.1162/tacl_a_00032
Turian, J., Ratinov, L., & Bengio, Y. (2010). Word representations: A simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097–1105).
Kryeziu, L., & Shehu, V. (2022). A Survey of Using Unsupervised Learning Techniques in Building Masked Language Models for Low Resource Languages. In 2022 11th Mediterranean Conference on Embedded Computing (MECO) (pp. 1–6). Budva, Montenegro: IEEE.
Kryeziu, L., Shehu, V., & Caushi, A. (2022). Evaluation and Verification of NLP Datasets for the Albanian Language. International Conference on Artificial Intelligence of Things. Istanbul, Turkey.
Kamps, J., Kondylidis, N., & Rau, D. (2020). Impact of Tokenization, Pretraining Task, and Transformer Depth on Text Ranking. In TREC.
Lee, J., Yoon, W., Kim, S., Kim, D., Kim, S., So, C. H., & Kang, J. (2020). BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4), 1234–1240.
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., . . . Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint.
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. In Proceedings of NAACL-HLT.
Özçift, A., Akarsu, K., Yumuk, F., & Söylemez, C. (2021). Advancing natural language processing (NLP) applications of morphologically rich languages with bidirectional encoder representations from transformers (BERT): an empirical case study for Turkish.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI.
Tian, H., Yang, K., Liu, D., & Lv, J. (2020). AnchiBERT: A pre-trained model for ancient Chinese language understanding and generation. arXiv preprint.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A., . . . Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 6000–6010).
Virtanen, A., Kanerva, J., Ilo, R., Luoma, J., Luotolahti, J., Salakoski, T., . . . Pyysalo, S. (2019). Multilingual is not enough: BERT for Finnish. arXiv preprint.
Winograd, T. (1972). Understanding natural language. Cognitive Psychology, 3(1), 1–191.
Wu, S., & Dredze, M. (2019). The surprising cross-lingual effectiveness of BERT. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 833–844). Hong Kong, China: Association for Computational Linguistics.
Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., & Le, Q. V. (2019). XLNet: Generalized autoregressive pretraining for language understanding. In Advances in Neural Information Processing Systems (NeurIPS) (pp. 5754–5764).
Zhang, Y., & Xu, Z. (2018). BERT for question answering on SQuAD 2.0.