Flexible deep forest classifier with multi-head attention
This paper proposes the attention-based deep forest (ABDF), a new modification of the deep forest (DF) for solving classification problems. The main idea behind the modification is to use the attention mechanism to aggregate the predictions of the random forests at each level of the DF, thereby enhancing its classification performance. The attention mechanism is implemented by assigning attention weights with trainable parameters to the class probability vectors. The trainable parameters are determined by solving an optimization problem that minimizes the loss function of the predictions at each level of the DF. In order to reduce the number of random forests, multi-head attention is incorporated into the DF. Numerical experiments with real data illustrate the ABDF and compare it with the original DF.
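The core step of one DF level, attention-weighted aggregation of the forests' class probability vectors, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a single level with one attention head, softmax attention weights parameterized by one trainable scalar per forest, and a cross-entropy loss minimized with a generic optimizer; the toy dataset and all names are placeholders.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import softmax
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy data standing in for the real datasets used in the experiments.
X, y = make_classification(n_samples=600, n_classes=3, n_informative=6,
                           random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5,
                                            random_state=0)

# One DF level: several random forests, each emitting class probabilities.
forests = [RandomForestClassifier(n_estimators=50, random_state=s).fit(X_tr, y_tr)
           for s in range(3)]
# Shape: (n_forests, n_samples, n_classes)
P_val = np.stack([f.predict_proba(X_val) for f in forests])

def loss(w):
    """Cross-entropy of the attention-weighted mixture of forest outputs."""
    a = softmax(w)                          # attention weights over forests
    mix = np.einsum('f,fnc->nc', a, P_val)  # aggregated class probabilities
    return -np.mean(np.log(mix[np.arange(len(y_val)), y_val] + 1e-12))

# Trainable parameters w found by minimizing the level's loss function.
w_opt = minimize(loss, np.zeros(len(forests)), method='Nelder-Mead').x
attn = softmax(w_opt)  # final attention weight per forest
```

In the full cascade, the aggregated probability vectors would be concatenated with the input features and passed to the next level; a multi-head variant would learn several such weight vectors in parallel and combine their outputs.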