Advancements in Offensive Language Detection: A Comprehensive Review and Experimental Analysis

The proliferation of offensive language in digital communication has become a significant challenge in the internet era, creating an urgent need for advanced Natural Language Processing (NLP) techniques to identify and mitigate it. Focusing on NLP techniques, machine learning, deep learning, and transformer models, this study presents a thorough review of how offensive language identification evolved from 2020 through 2023. We examine the datasets used in prior research, particularly those for Dravidian languages such as Tamil and Malayalam. The preprocessing techniques covered include data cleansing and text representation methods such as TF-IDF and Word2Vec word embeddings, which are used to train and optimize the models. Drawing on past work, we compare standard supervised learning models such as Support Vector Machine (SVM) and Naive Bayes with emerging transformer models such as BERT, to identify the approach that best improves a model's accuracy and effectiveness.
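The abstract names the classical pipeline only in passing. As a rough illustration of how its stages fit together (data cleansing, TF-IDF features, then a supervised classifier such as Naive Bayes), here is a minimal sketch in plain Python, assuming whitespace tokenization and a two-class label set. The helper names (`clean`, `fit_idf`, `tfidf`, `NaiveBayes`) and the toy sentences are hypothetical and not the authors' implementation. The smoothed IDF formula mirrors scikit-learn's default, and feeding TF-IDF weights into a multinomial Naive Bayes is one common practical choice rather than the method of any reviewed paper. Word2Vec, SVM, and BERT are not covered here.

```python
import math
import re
import string
from collections import Counter

# Replace ASCII punctuation with spaces; non-Latin scripts (e.g. Tamil) are left intact.
_PUNCT_TO_SPACE = str.maketrans({c: " " for c in string.punctuation})
_URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")


def clean(text: str) -> list[str]:
    """Minimal cleansing (an assumption): lowercase, drop URLs/@mentions and ASCII punctuation, split on whitespace."""
    text = _URL_OR_MENTION.sub(" ", text.lower())
    return text.translate(_PUNCT_TO_SPACE).split()


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed IDF: idf(t) = ln((1 + n) / (1 + df(t))) + 1, as in scikit-learn's default."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Raw term count times IDF, L2-normalised; terms unseen during fitting are dropped."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing, fed TF-IDF weights instead of raw counts."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, vectors: list[dict[str, float]], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.vocab = {t for v in vectors for t in v}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            rows = [v for v, y in zip(vectors, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(labels))
            weight = Counter()
            for v in rows:
                weight.update(v)  # sum of TF-IDF weight per term within class c
            total = sum(weight.values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {t: math.log((weight[t] + self.alpha) / total) for t in self.vocab}
        return self

    def predict(self, vector: dict[str, float]) -> str:
        def score(c: str) -> float:
            return self.log_prior[c] + sum(
                w * self.log_lik[c][t] for t, w in vector.items() if t in self.vocab
            )
        return max(self.classes, key=score)


# Hypothetical toy data for illustration only (not from any reviewed dataset).
train_texts = [
    "You are a complete idiot",
    "Shut up, you idiot @user",
    "What a stupid, useless post",
    "Thanks for sharing this video",
    "Great match, well played",
    "Have a nice day everyone",
]
train_labels = ["offensive"] * 3 + ["not_offensive"] * 3

docs = [clean(t) for t in train_texts]
idf = fit_idf(docs)
model = NaiveBayes().fit([tfidf(d, idf) for d in docs], train_labels)
print(model.predict(tfidf(clean("you idiot"), idf)))  # → offensive
```

Swapping the classifier for a linear SVM, or the TF-IDF features for averaged Word2Vec vectors, would change only the corresponding step; a transformer model such as BERT would instead replace both the hand-built features and the classifier.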

Language: English

Publication frequency: 6 issues per year
Journal subjects: Computer Science, Foundations of Computer Science, Theoretical Computer Science, Computer Security and Cryptology

C. Nalini

R. Shanthakumari

Y. Agashia Maria

T. Janarthanan

M. Manibharathi

Published online: 20 Feb 2025

Pages: 162-179

DOI: https://doi.org/10.2478/ias-2024-0012

Keywords: offensive language, Dravidian language, word embedding, machine learning, deep learning, transformer model

© 2024 C. Nalini et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.