Research and comparative analysis of the effectiveness of software and hardware implementations of transposed matrix multiplication
This article presents a comparative analysis of software and hardware implementations of the transposed matrix multiplication operation and of its modified version, the matrix multiplication transpose. A distinguishing feature of the study is the use of high-level synthesis tools to generate and optimize the hardware implementations of these operations. The study is motivated by the widespread use of matrix operations such as transposition and multiplication in applied problems, by the polynomial asymptotic complexity of matrix computations, and by the lack of data on how effective high-level synthesis tools are for building hardware devices for matrix computations. A step-by-step method for synthesizing and optimizing the hardware implementations of these operations is proposed. The software and hardware implementations of both operations are compared, and the performance gain of the hardware implementations is shown to come from increasing the degree of parallelism of the matrix computations. The hardware resources required as performance is increased through parallelization are also examined.
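To make the two operations concrete, the following is a minimal software reference sketch in plain C++. It is not the implementation studied in the article. The definitions are assumptions made for illustration: "transposed matrix multiplication" is taken to mean C = A^T * B, and "matrix multiplication transpose" is taken to mean C = (A * B)^T. The names `Mat`, `transposed_matmul` and `matmul_transpose` are hypothetical.

```cpp
// Software reference sketch. The operation definitions below are
// assumptions for illustration and are not taken from the article.

// Fixed-size integer matrix with R rows and C columns (hypothetical helper type).
template <int R, int C>
struct Mat {
    int v[R][C];
};

// Assumed "transposed matrix multiplication": C = A^T * B,
// where A is K x M and B is K x N, so C is M x N.
template <int K, int M, int N>
constexpr Mat<M, N> transposed_matmul(const Mat<K, M>& a, const Mat<K, N>& b) {
    Mat<M, N> c{};  // zero-initialized accumulator
    for (int i = 0; i < M; ++i)          // each (i, j) output is independent of the others
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < K; ++k)  // dot product of column i of A and column j of B
                c.v[i][j] += a.v[k][i] * b.v[k][j];
    return c;
}

// Assumed "matrix multiplication transpose": C = (A * B)^T,
// where A is M x K and B is K x N, so C is N x M.
template <int M, int K, int N>
constexpr Mat<N, M> matmul_transpose(const Mat<M, K>& a, const Mat<K, N>& b) {
    Mat<N, M> c{};  // zero-initialized accumulator
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < K; ++k)
                c.v[j][i] += a.v[i][k] * b.v[k][j];  // result is written directly to the transposed position
    return c;
}
```

In both functions every output element is computed independently of the others, which is the kind of parallelism a hardware implementation can exploit. How the loops are actually unrolled or pipelined depends on the synthesis tool and is not shown here.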