Dataset creation for comprehensive performance evaluation of automatic speech recognition systems

Intelligent Systems and Technologies, Artificial Intelligence
Authors:
Abstract:

The performance evaluation of Automatic Speech Recognition (ASR) systems heavily depends on the availability of diverse and representative test datasets encompassing a wide range of complexities in various domains. This work introduces a novel methodology for collecting and preparing datasets for comprehensive ASR system evaluation. The proposed dataset incorporates a modern vocabulary enriched with numerous unique terms and proper nouns, facilitating an in-depth evaluation of overall ASR performance and the effectiveness of context-biasing techniques in computer science. Additionally, the dataset retains critical text features such as Punctuation and Capitalization (P&C), enabling a rigorous evaluation of P&C prediction algorithms. We present a detailed account of the dataset creation process, along with its statistical and qualitative analysis. Furthermore, we benchmark state-of-the-art ASR models, context-biasing approaches, and P&C prediction techniques using the proposed dataset, providing valuable insights into their relative performance.