Application of machine learning algorithms and neural networks for analyzing the influence of data type in hate speech detection

Intelligent Systems and Technologies, Artificial Intelligence
Authors:
Abstract:

Online social platforms have overcome geographical and linguistic barriers and raised communication to an unprecedented level of activity. However, the shift to online communication is accompanied by the spread of hate speech, which negatively affects the social environment of these platforms. In natural language processing, research is under way to develop models for detecting and classifying hate speech, with the aim of improving the safety and quality of the online environment. Many of these studies, however, rely on commonly used datasets that are imbalanced and poorly adapted to the new grammatical features of hate speech. This article presents a comparative study of the effectiveness of machine learning and deep learning algorithms in detecting hate speech based on a synthetic dataset. Three separate experiments were conducted using original and synthetically perturbed data. The findings indicate that employing a synthetic dataset enhances the representation of extremely negative or infrequently encountered communication scenarios, contributing to their more effective detection. Deep learning algorithms demonstrated superior performance in all experiments. The top-performing models in the first and second experiments, both using zero-shot learning, achieved accuracies of 52.04% and 62.13%, respectively. In the last experiment, the BiGRU + fastText architecture outperformed the other models, achieving an accuracy of 72.68%.