Predictive models and dynamics of estimates of applied task characteristics using machine learning methods
The paper considers the machine learning problem of simultaneously estimating the conditional survival distribution and the dynamic characteristics of computational tasks. The problem arises in cluster workload management and is highly relevant to optimal scheduling. To solve it, a new method is proposed that combines the attention mechanism with the random survival forest. Its key feature is the use of the tree structure derived from a random survival forest. The forest construction algorithm uses only the survival dataset. Each leaf uses the unconditional Kaplan-Meier estimate, which is a serious limitation of the forest, especially when events are rare in some parts of the feature space. Moreover, the random survival forest cannot estimate the dynamic parameters of a task. The proposed method addresses both problems by extending an already constructed random survival forest with an attention mechanism inside each leaf of each tree. The Beran estimator is used to model the survival distribution, and Nadaraya-Watson regression with the same parameters is used to predict the dynamic characteristics of tasks. For this purpose, both estimators use the subsets of training data that fall into the same leaf as the input vector. The result is a joint model that estimates the survival function more accurately and at the same time predicts the dynamic characteristics of the task. The developed model combines the advantages of smooth attention-based models and stepwise decision trees.
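The per-leaf computation described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the training points sharing a leaf with the input vector have already been selected, that the attention weights are a softmax over negative squared Euclidean distances (a Gaussian kernel, which the abstract does not specify), and that there are no tied event times. The function names and the bandwidth parameter `tau` are hypothetical and introduced only for illustration.

```python
import numpy as np

def attention_weights(x, X_leaf, tau=1.0):
    """Softmax over negative squared distances (assumed Gaussian-kernel attention)."""
    scores = -np.sum((X_leaf - x) ** 2, axis=1) / (2.0 * tau ** 2)
    scores = scores - scores.max()  # shift for numerical stability
    w = np.exp(scores)
    return w / w.sum()

def beran_survival(w, times, events, t_grid):
    """Beran estimate of S(t | x) on t_grid, given attention weights w."""
    order = np.argsort(times, kind="stable")
    w_s, t_s, d_s = w[order], times[order], events[order]
    # Total weight of the points that precede each point in time order.
    prior = np.concatenate(([0.0], np.cumsum(w_s)[:-1]))
    denom = np.clip(1.0 - prior, 1e-12, None)
    # Censored points (event indicator 0) contribute a factor of 1.
    factors = np.where(d_s == 1, 1.0 - w_s / denom, 1.0)
    cum = np.cumprod(np.clip(factors, 0.0, 1.0))
    # S(t) is the product of the factors of all points with t_i <= t.
    idx = np.searchsorted(t_s, t_grid, side="right")
    return np.concatenate(([1.0], cum))[idx]

def nadaraya_watson(w, y):
    """Nadaraya-Watson prediction: attention-weighted mean of the targets."""
    return w @ y

# Toy leaf: four training tasks with durations, event indicators
# (1 = completion observed, 0 = censored) and one dynamic characteristic.
X_leaf = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.2, 0.9]])
times = np.array([3.0, 1.0, 4.0, 2.0])
events = np.array([1, 1, 1, 0])
y_leaf = np.array([10.0, 12.0, 30.0, 28.0])

x_new = np.array([0.05, 0.0])
w = attention_weights(x_new, X_leaf, tau=0.5)  # the same weights feed both estimators
S = beran_survival(w, times, events, t_grid=np.array([0.5, 1.5, 3.5]))
y_hat = nadaraya_watson(w, y_leaf)
```

With uniform weights the Beran product reduces to the Kaplan-Meier estimate, which is the sense in which the attention mechanism generalizes the unconditional estimate used in each leaf. In a forest, such per-leaf estimates would additionally be aggregated over trees, a step this sketch omits.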