Computing, Telecommunication and Control

Информатика, телекоммуникации и управление

2687-0517

10.18721/JCSTCS.18302

Application of machine learning algorithms and neural networks for analyzing the influence of data type in hate speech detection

Mbele Ossiyi

L.P.

lucprucell@gmail.com

0000-0003-1116-7765

56049610600

Drobintsev

Pavel

drobintsev_pd@spbstu.ru

6603839750

Sergey M. Ustinov

Сергей

usm50@yandex.ru

Peter the Great St. Petersburg Polytechnic University

30 09 2025

Vol. 18, No. 3, pp. 23-35

Online social platforms have overcome geographical and linguistic barriers and brought communication to an unprecedented level of activity. This shift to online communication is accompanied by the spread of hate speech, which degrades the social environment of these platforms. Research in natural language processing aims to develop models that detect and classify hate speech in order to improve the safety and quality of the online environment. However, many of these studies rely on commonly used datasets that are unbalanced and poorly adapted to the new grammatical features of hate speech. This article presents a comparative study of the effectiveness of machine learning and deep learning algorithms in detecting hate speech using a synthetic dataset. Three separate experiments were conducted using original and synthetically perturbed data. The findings indicate that a synthetic dataset improves the representation of extremely negative or rarely encountered communication scenarios, which contributes to their more effective detection. Deep learning algorithms demonstrated superior performance in all experiments. The top-performing models in the first and second experiments, both using zero-shot learning, achieved accuracies of 52.04% and 62.13%, respectively. In the third experiment, the BiGRU + fastText architecture outperformed the other models, achieving an accuracy of 72.68%.
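The abstract does not say how the data was synthetically perturbed or how the models were scored beyond reporting accuracy. As a purely illustrative sketch, the snippet below assumes a character-level look-alike substitution (a common way hate speech is obfuscated online) and a plain accuracy metric. The names `LEET_MAP`, `perturb`, and `accuracy`, as well as the substitution table and the `rate` parameter, are hypothetical and are not taken from the article.

```python
import random

# Hypothetical substitution table; the article does not list its perturbations.
LEET_MAP = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"}


def perturb(text: str, rate: float = 0.5, seed: int = 0) -> str:
    """Replace a fraction of mappable characters with look-alike symbols."""
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    out = []
    for ch in text:
        sub = LEET_MAP.get(ch.lower())
        # Substitute only mappable characters, each with probability `rate`.
        if sub is not None and rng.random() < rate:
            out.append(sub)
        else:
            out.append(ch)
    return "".join(out)


def accuracy(y_true, y_pred) -> float:
    """Share of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With `rate=1.0` every mappable character is replaced, so `perturb("hello world", rate=1.0)` returns `"h3ll0 w0rld"`, while `rate=0.0` leaves the text unchanged. A classifier trained only on the original spelling may miss the perturbed form, which is the kind of gap that comparing accuracy on original and perturbed data can expose.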

sentiment analysis, emotion recognition in text, attention mechanism, embedding, CNN, LSTM, GRU