Advancements in Offensive Language Detection: A Comprehensive Review and Experimental Analysis
Published online: 20 Feb 2025
Pages: 162-179
DOI: https://doi.org/10.2478/ias-2024-0012
© 2024 C. Nalini et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The proliferation of offensive language in digital communication has become a significant challenge in the internet era, creating an urgent need for advanced Natural Language Processing (NLP) techniques to identify and mitigate it. This study presents a thorough review of how offensive language identification evolved from 2020 through 2023, with a particular focus on NLP techniques, machine learning, deep learning, and transformer models. We examine the datasets used in prior research, especially those for Dravidian languages such as Tamil and Malayalam. The preprocessing techniques covered include data cleaning and text-representation methods, such as TF-IDF weighting and Word2Vec embeddings, which are used to train and optimize the models. Drawing on past work, we compare standard supervised learning models such as Support Vector Machines and Naive Bayes with emerging transformer models such as BERT to identify the approach that best improves a model's accuracy and effectiveness.
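To make the classical pipeline summarized above more concrete, here is a minimal, dependency-free Python sketch of three steps: text cleaning, TF-IDF weighting, and a multinomial Naive Bayes baseline. This is not the authors' implementation. The cleaning rules, the unsmoothed idf formula, the smoothing parameter `alpha`, the `OFF`/`NOT` label names, and the toy training sentences are all hypothetical choices for illustration. The Naive Bayes model is trained on raw token counts, which is the textbook multinomial formulation. The TF-IDF function shows the weighting step by itself.

```python
import math
import re
import string
from collections import Counter

# Map ASCII punctuation and digits to spaces; non-ASCII script characters
# (e.g. Tamil or Malayalam letters and vowel signs) are left untouched.
PUNCT_TABLE = str.maketrans({c: " " for c in string.punctuation + string.digits})


def clean_and_tokenize(text):
    """Lowercase, drop URLs and @mentions, strip punctuation/digits, split."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return text.translate(PUNCT_TABLE).split()


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf = count / document length, idf = log(N / df). This is an unsmoothed
    variant; libraries commonly add smoothing or normalization.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        vectors.append(
            {t: (c / length) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over token counts with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # hypothetical default: Laplace smoothing

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for doc in docs for t in doc}
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(t for d in class_docs for t in d)
            total = sum(counts.values()) + self.alpha * len(self.vocab)
            self.log_likelihood[c] = {
                t: math.log((counts[t] + self.alpha) / total) for t in self.vocab
            }
        return self

    def predict(self, doc):
        """Return the class with the highest log-posterior score."""
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for t in doc:
                if t in self.vocab:  # words never seen in training are ignored
                    score += self.log_likelihood[c][t]
            scores[c] = score
        return max(scores, key=scores.get)


# Hypothetical toy data (not taken from any dataset discussed in the review).
train = [
    ("you are a stupid idiot", "OFF"),
    ("what an idiot, shut up", "OFF"),
    ("stupid post, total idiot", "OFF"),
    ("thanks for sharing this video", "NOT"),
    ("great movie, loved the songs", "NOT"),
    ("nice work, thanks a lot", "NOT"),
]
train_docs = [clean_and_tokenize(text) for text, _ in train]
train_labels = [label for _, label in train]

model = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(model.predict(clean_and_tokenize("such an idiot")))
print(model.predict(clean_and_tokenize("thanks for the songs")))
```

In practice, scikit-learn provides `TfidfVectorizer`, `MultinomialNB`, and `LinearSVC` for these same steps. Transformer models such as BERT replace the hand-built features with learned contextual representations.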