Count time series analysis of job scheduling in the hybrid supercomputer center

Applied problem solving with machine learning
Authors:
Abstract:

Increasing the efficiency of supercomputer centers is an important task, especially given the growing demand for high-performance computing and the shortage of supercomputer resources. Statistical analysis of supercomputer performance indicators supports the development of models for managing computing resources and provides a basis for applying artificial intelligence methods. The purpose of this research is to study the incoming flow of user requests (jobs), which largely determines the load on supercomputer resources. To analyze this flow, generalized linear models, generalized estimating equations, and the autoregressive conditional Poisson model were used, which made it possible to account for dependence between observations and for overdispersion. Based on observations of supercomputer operation, estimates of the time trend were obtained, along with indicators of changes in the intensity of the job flow within weekly and annual cycles, classified by area of expertise and computing cluster. The statistical significance of changes within the weekly and annual cycles was established. Further analysis using multiple comparison methods yielded statistically significant orderings of the main effects of the weekly and annual factors.
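
To make the two modeling concerns named above concrete, the following is a minimal sketch of an autoregressive conditional Poisson recursion and a dispersion index (variance-to-mean ratio). It is not the specification used in this study: the ACP(1,1) form with an identity link is one common formulation, and the function names and the parameters `omega`, `alpha`, and `beta` are hypothetical and chosen only for illustration. No covariates such as weekday or month are included.

```python
import math
import random
from statistics import mean, pvariance


def poisson_sample(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the moderate intensities used in this illustration.
    """
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def acp_intensities(counts, omega, alpha, beta):
    """Conditional means of an assumed ACP(1,1) model.

    lam_t = omega + alpha * y_{t-1} + beta * lam_{t-1},
    started at the unconditional mean omega / (1 - alpha - beta).
    """
    lam = omega / (1.0 - alpha - beta)
    intensities = []
    for y in counts:
        intensities.append(lam)
        lam = omega + alpha * y + beta * lam
    return intensities


def simulate_acp(n, omega, alpha, beta, seed=0):
    """Simulate n job counts from the same assumed ACP(1,1) recursion."""
    rng = random.Random(seed)
    lam = omega / (1.0 - alpha - beta)
    counts = []
    for _ in range(n):
        y = poisson_sample(rng, lam)
        counts.append(y)
        lam = omega + alpha * y + beta * lam
    return counts


def poisson_loglik(counts, intensities):
    """Poisson log-likelihood of counts given their conditional means."""
    return sum(
        y * math.log(lam) - lam - math.lgamma(y + 1)
        for y, lam in zip(counts, intensities)
    )


def dispersion_index(counts):
    """Variance-to-mean ratio; values above 1 indicate overdispersion."""
    return pvariance(counts) / mean(counts)
```

With illustrative values such as `simulate_acp(5000, 2.0, 0.5, 0.3)`, the simulated counts are serially dependent and their dispersion index exceeds 1, even though each count is conditionally Poisson. This is the kind of behavior that a plain Poisson regression with independent observations would not capture.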