We consider an approach to estimating the performance of a wide range of scientific applications run on modern HPC systems with globally addressable memory. Memory bandwidth is modeled and estimated for a set of applications whose parallel structure is based on MPI/OpenMP. The HPCG benchmark was used to create a workload representative of the wide range of computation and communication tasks found in scientific applications. To check the model, a set of experiments was conducted on a real HPC system with globally addressable memory (a ccNUMA architecture with 12 TB of memory running a single operating system image); the experiments were used to estimate the task size and to highlight the benefits of using the optimized model. The optimized model makes it possible to estimate the performance of modern and future systems based on the ccNUMA architecture with 24 TB of memory in one node. The model also makes it possible to compare the results of NUMA systems with those of other modern HPC architectures.
Citation: Drobintsev P.D., Kotlyarov V.P., Levchenko A.V. Experimental aspects of memory bandwidth for HPC systems with ccNUMA architecture. St. Petersburg State Polytechnical University Journal. Computer Science. Telecommunications and Control Systems. 2017, Vol. 10, No. 3, Pp. 32–41. DOI: 10.18721/JCSTCS.10303