Method for classifying risk incidents based on self-organization of semantic clusters
A method is presented for the automatic classification of textual descriptions of emergency risk incidents based on self-organizing semantic clustering, which requires no prior data labeling. Unlike traditional approaches, the method uses a two-stage scheme: first, a latent taxonomy of incidents self-organizes through hierarchical thematic decomposition of the text corpus; second, new messages are classified continuously by their degree of membership in all automatically identified classes at once. This shift from rigid single-class assignment to fuzzy membership allows hybrid incidents to be decomposed into several risk factors, reflecting their mixed nature. The algorithm produces an interpretable and stable taxonomy of incidents that preserves the structural separation of clusters even when the proportion of hybrid events is high. Testing on the NRC data corpus showed that most messages have one dominant risk factor accompanied by significant secondary components. The average semantic consistency of clusters was approximately 0.62 (cosine measure), and classification confidence was distributed around its mean value, reflecting the presence of both pure and mixed incidents. The results confirm that the proposed method provides a mathematically correct decomposition of complex situations into a set of risk factors and reduces the sensitivity of classification to noise and inaccuracies in the input text. The method is aimed at proactive risk analysis in complex technical systems and can be used for automated decision support in industrial safety systems.
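
The fuzzy-membership idea described above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes that each cluster is summarized by a centroid vector, that a new message has already been embedded in the same space, and that membership degrees are obtained by a temperature-scaled softmax over cosine similarities. The class names, the three-dimensional toy vectors, and the `temperature` parameter are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def memberships(embedding, centroids, temperature=0.2):
    """Return one membership degree per centroid; the degrees sum to 1.

    Assumption: a softmax over cosine similarities stands in for whatever
    membership function the method actually uses.
    """
    sims = [cosine(embedding, c) for c in centroids]
    top = max(sims)  # subtract the maximum for numerical stability
    exps = [math.exp((s - top) / temperature) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical cluster labels and toy centroids (not taken from the source).
labels = ["fire", "leak", "electrical"]
centroids = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# A toy "hybrid" message lying mostly along the first axis, partly the second.
message = [0.8, 0.6, 0.0]

weights = memberships(message, centroids)
dominant = labels[weights.index(max(weights))]
for label, w in zip(labels, weights):
    print(f"{label}: {w:.3f}")
print("dominant factor:", dominant)
```

In this toy case the message receives one dominant membership and a smaller but non-negligible secondary one, which mirrors the pattern of a dominant risk factor with secondary components reported above. A lower `temperature` pushes the result toward rigid single-class assignment, while a higher one flattens the distribution.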