Contextual regularization of the feature space of weakly structured data for analyzing the risk topology of complex technical systems
The paper addresses sparsity and "false orthogonality" in short, weakly structured technical messages, two properties that hinder systematic analysis and modeling of the risk topology of complex technical systems. A method of contextual regularization of the feature space is proposed that treats the enrichment of vector representations as a controlled diffusion process on a lemma co-occurrence graph. The context topology is specified by a weighted adjacency matrix built from positive pointwise mutual information (PPMI). A recursive diffuser performs iterative feature propagation with depth attenuation and adaptive IDF gating, which suppresses noisy connections and amplifies diagnostically significant terms. Tuning of the regularization parameters is formalized as maximization of a target quality functional that combines metrics of structural separability and semantic completeness with a threshold penalty for separability degradation. The boundedness of the diffusion process is established a priori, and it is proven that terminologically heterogeneous descriptions cease to be orthogonal whenever a contextual "bridge" connects them in the graph. Experimental evaluation on the NRC operational message corpus shows a significant increase in the semantic coherence of topics while preserving the geometric separability of clusters. The resulting regularized space improves the interpretability of the thematic structure of incidents and provides a basis for the subsequent self-organization of a risk event taxonomy and the construction of verifiable decision-support loops.
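The diffusion idea summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: the toy corpus, the function names (`ppmi_matrix`, `smoothed_idf`, `diffuse`), the row normalization of the PPMI matrix, the geometric attenuation factor `alpha`, and the form of the IDF gate (IDF divided by its maximum) are all assumptions chosen for illustration. The sketch shows how two messages with no shared lemma stop being orthogonal once a shared neighbor ("leak") acts as a contextual bridge.

```python
import numpy as np

# Toy lemma vocabulary and corpus (hypothetical, for illustration only).
VOCAB = ["pump", "seal", "leak", "valve", "gasket"]
CORPUS = [
    ["pump", "leak", "seal"],
    ["valve", "leak", "gasket"],
    ["pump", "seal"],
    ["valve", "gasket"],
]


def ppmi_matrix(docs, vocab):
    """Symmetric PPMI weights from document-level lemma co-occurrence."""
    index = {term: i for i, term in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for doc in docs:
        ids = sorted({index[term] for term in doc})
        for a in ids:
            for b in ids:
                if a != b:
                    counts[a, b] += 1.0
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0  # pairs that never co-occur get zero weight
    return np.maximum(pmi, 0.0)   # keep only positive associations


def smoothed_idf(docs, vocab):
    """Smoothed inverse document frequency, strictly positive."""
    n_docs = len(docs)
    df = np.array([sum(term in doc for doc in docs) for term in vocab], dtype=float)
    return np.log((1.0 + n_docs) / (1.0 + df)) + 1.0


def diffuse(X, W, idf, alpha=0.5, depth=2):
    """Add alpha**k-attenuated, IDF-gated propagation steps to X (assumed form)."""
    row_sums = W.sum(axis=1, keepdims=True)
    # Row-normalize so each propagation step redistributes mass, never creates it.
    P = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    gate = idf / idf.max()  # values in (0, 1]; damps low-IDF (common) lemmas
    enriched = X.copy()
    step = X.copy()
    for k in range(1, depth + 1):
        step = (step @ P) * gate
        enriched += alpha**k * step
    return enriched


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def bag_of_lemmas(doc, vocab):
    return np.array([float(term in doc) for term in vocab])


W = ppmi_matrix(CORPUS, VOCAB)
idf = smoothed_idf(CORPUS, VOCAB)
# Two messages with disjoint terminology: "pump seal" vs. "valve gasket".
X = np.vstack([bag_of_lemmas(CORPUS[2], VOCAB), bag_of_lemmas(CORPUS[3], VOCAB)])
X_reg = diffuse(X, W, idf)

print("cosine before diffusion:", cosine(X[0], X[1]))
print("cosine after diffusion: ", cosine(X_reg[0], X_reg[1]))
```

Under these assumptions the propagation matrix is row-stochastic and the gate never exceeds 1, so for non-negative input the L1 norm of each enriched vector stays below the input norm divided by `1 - alpha`. This gives a simple way to see why such a diffusion remains bounded when `alpha < 1`.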