Method for automated enrichment of a knowledge base on glass compositions and properties based on data from scientific publications

Intelligent Systems and Technologies, Artificial Intelligence
Authors:
Abstract:

Automating the extraction of glass composition and property data from scientific literature is critically important for accelerating the development of new materials. This work presents a method that integrates: 1) collection of full-text articles via the Elsevier Research Products APIs, 2) text preprocessing, 3) context-dependent extraction of structured data using a large language model (LLM) with a domain-specific prompt, and 4) enrichment of a knowledge base on glasses. The key achievement is a prompt that yields an F1-score of 0.99, on a sample of 50 articles, for extracting chemical compositions and their properties and for correctly establishing the relationships between them. The proposed method significantly simplifies the automatic creation and continuous updating of knowledge bases on glasses, thereby eliminating the traditional reliance on manually curated, potentially outdated resources and providing a robust, data-driven foundation for the efficient design of glasses with target properties using machine learning.
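
As an illustration of how an F1-score for this kind of extraction might be computed, the following is a minimal sketch in Python. It assumes that each extracted record is a (composition, property, value) triple and that a predicted triple counts as correct only when it exactly matches a manually annotated reference triple. The abstract does not describe the matching protocol, so the triple format, the exact-match criterion, the function name `f1_score`, and the sample data are all hypothetical.

```python
from typing import Iterable, Tuple

# Hypothetical record format: (composition, property name, value with unit).
Triple = Tuple[str, str, str]


def f1_score(predicted: Iterable[Triple], reference: Iterable[Triple]) -> float:
    """F1 over exact-match triples (an assumed matching criterion)."""
    pred, ref = set(predicted), set(reference)
    if not pred and not ref:
        return 1.0  # nothing to extract and nothing extracted
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative data only, not taken from the paper.
reference = [
    ("70SiO2-20Na2O-10CaO", "density", "2.49 g/cm3"),
    ("70SiO2-20Na2O-10CaO", "Tg", "550 C"),
    ("60SiO2-40Na2O", "density", "2.55 g/cm3"),
]
predicted = [
    ("70SiO2-20Na2O-10CaO", "density", "2.49 g/cm3"),
    ("70SiO2-20Na2O-10CaO", "Tg", "550 C"),
    ("60SiO2-40Na2O", "Tg", "550 C"),  # property linked to the wrong composition
]
score = f1_score(predicted, reference)
```

Because a triple is scored as a unit, a property value attached to the wrong composition counts as both a false positive and a false negative, so the metric penalizes relationship errors as well as missed entities.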