Optimization of the dense matrix multiplication procedure for shared memory systems

Abstract

Egunov V.A., Shabalovsky V.A., Dudkin D.M.

Article received: 11.03.2024

The study presents an extensive analysis of low-level optimization methods for the dense matrix multiplication algorithm on shared-memory computing systems. It compares several approaches: block (tiled) optimization, parallel execution with OpenMP, vectorization with AVX, and the Intel MKL library. Each of them yields a significant improvement in the performance of the resulting software implementation. In particular, block optimization reduces the number of cache misses, parallel execution makes effective use of multiple cores, and vectorization and Intel MKL deliver the largest speedups owing to more efficient software optimizations. The results emphasize the importance of selecting optimization methods carefully and matching them to the architecture of the computing system in order to reach the performance required of the software being designed.

Keywords: low-level optimization, block optimization, parallel execution, OpenMP, vectorization, AVX, Intel MKL, performance, benchmarking, matrix multiplication
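
To make the first two techniques named in the abstract concrete, the following is a minimal sketch in C of a blocked matrix multiplication whose outer loop is parallelized with OpenMP. It is not the authors' implementation. The function names matmul_naive and matmul_blocked, the row-major square-matrix layout, and the block parameter are assumptions made for illustration. Explicit AVX intrinsics and Intel MKL calls are not shown; the innermost loop is written with unit stride so that a compiler may auto-vectorize it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reference triple loop: c = a * b for row-major n x n matrices. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

static size_t min_size(size_t x, size_t y)
{
    return x < y ? x : y;
}

/*
 * Blocked (tiled) variant: the loops walk block x block tiles so that the
 * tiles of a, b and c being combined can stay in cache while they are reused.
 * "block" is a hypothetical tuning parameter; a good value depends on the
 * cache sizes of the target machine. a, b and c must not overlap.
 */
void matmul_blocked(size_t n, size_t block,
                    const double *restrict a,
                    const double *restrict b,
                    double *restrict c)
{
    assert(block > 0);
    memset(c, 0, n * n * sizeof *c);

    /* Each iteration of the ii loop owns a distinct band of rows of c,
     * so threads never write to the same element. The pragma only takes
     * effect when the compiler is run with OpenMP enabled. */
#ifdef _OPENMP
#pragma omp parallel for schedule(static)
#endif
    for (size_t ii = 0; ii < n; ii += block) {
        size_t i_end = min_size(ii + block, n);
        for (size_t kk = 0; kk < n; kk += block) {
            size_t k_end = min_size(kk + block, n);
            for (size_t jj = 0; jj < n; jj += block) {
                size_t j_end = min_size(jj + block, n);
                for (size_t i = ii; i < i_end; ++i) {
                    for (size_t k = kk; k < k_end; ++k) {
                        double aik = a[i * n + k];
                        /* Unit-stride inner loop over j: a candidate for
                         * compiler auto-vectorization. */
                        for (size_t j = jj; j < j_end; ++j)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

With GCC, a build along the lines of `gcc -O3 -march=native -fopenmp` enables both the OpenMP pragma and vector code generation for the host CPU. Without `-fopenmp` the same source compiles to a serial blocked loop. Whether the blocked version is actually faster, and by how much, depends on the matrix size, the chosen block size, and the cache hierarchy, so it should be measured on the target system.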