Performance Evaluation of Load-balanced Particle-based Plasma Kinetic Simulations

Takanobu Amano; Yosuke Matsumoto

09:00–10:30

[PEM17-P13] Performance Evaluation of Load-balanced Particle-based Plasma Kinetic Simulations

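*Takanobu Amano¹, Yosuke Matsumoto² (1. Department of Earth and Planetary Science, The University of Tokyo; 2. Institute for Advanced Academic Research, Chiba University)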

Keywords: plasma, numerical simulation, particle method

Plasma kinetic simulation has long been an indispensable tool for modeling laboratory, space, and astrophysical plasmas. To mitigate the inherently large computational cost of solving the Vlasov or Boltzmann equation in six-dimensional phase space, the standard Particle-In-Cell (PIC) scheme represents the phase-space density as a collection of computational particles. The electromagnetic field, by contrast, is defined on a mesh. The computational particles interact with the mesh quantities but move freely in continuous phase space across the mesh boundaries. This difference between the data structures for particles and fields makes it challenging to fully exploit the performance of modern supercomputers, which typically provide several hierarchical levels of parallelism.
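As a minimal illustration of how particles in continuous space couple to mesh quantities, the sketch below deposits particle weights onto a periodic one-dimensional mesh with linear (cloud-in-cell) weighting. The function name `deposit_density`, the 1D periodic setup, and the unit particle weight are assumptions made for illustration only; they are not taken from the code evaluated in this work.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: deposit unit-weight particles onto a periodic 1D mesh
// using linear (cloud-in-cell) weighting.
//   x     : particle positions, assumed to satisfy 0 <= x < ncell * dx
//   ncell : number of mesh nodes (periodic)
//   dx    : mesh spacing
// Returns the number density at each node.
std::vector<double> deposit_density(const std::vector<double>& x,
                                    std::size_t ncell, double dx)
{
    assert(ncell > 0 && dx > 0.0);
    std::vector<double> density(ncell, 0.0);
    for (double xp : x) {
        assert(xp >= 0.0);
        const double s    = xp / dx;        // position in units of cells
        const double fl   = std::floor(s);  // index of the node to the left
        const double frac = s - fl;         // fractional distance to that node
        const std::size_t i0 = static_cast<std::size_t>(fl) % ncell;
        const std::size_t i1 = (i0 + 1) % ncell;  // periodic wrap-around
        // Each particle contributes to its two neighboring nodes.
        density[i0] += (1.0 - frac) / dx;
        density[i1] += frac / dx;
    }
    return density;
}
```

The scattered writes into `density` depend on where each particle happens to be, which is one reason the particle and field data structures are hard to handle efficiently together.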

We have recently developed a general dynamic load-balancing framework in C++ for particle-based kinetic simulations. It is based on decomposing the computational domain into small chunks [e.g., Germaschewski et al., 2016; Derouillat et al., 2018; Rowan et al., 2021]. Load balancing is performed by distributing the chunks among MPI ranks so that the computational load of each rank is as nearly equal as possible. The chunk size and the number of chunks per MPI process are therefore essential parameters controlling the efficiency of dynamic load balancing. Furthermore, it is well recognized that the chunk size also affects single-core performance, because a smaller chunk allows the CPU cache to be used more efficiently.
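The idea of distributing chunks so that rank loads are nearly equal can be sketched as follows. This is a minimal sketch under stated assumptions, not the framework's actual algorithm: chunks are assumed to be already ordered along a one-dimensional index (for example, a space-filling curve), each chunk carries a scalar load estimate such as its particle count, and each rank receives a contiguous range of chunks. The names `assign_chunks` and `imbalance` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: assign ordered chunks to ranks in contiguous blocks.
// A chunk goes to the rank whose share of the total load contains the
// midpoint of that chunk's interval in the cumulative load.
// Note: a rank may receive no chunk if a single chunk is very heavy.
std::vector<int> assign_chunks(const std::vector<double>& load, int nrank)
{
    assert(nrank > 0);
    const double total = std::accumulate(load.begin(), load.end(), 0.0);
    std::vector<int> owner(load.size(), 0);
    if (total <= 0.0) return owner;  // nothing to balance

    const double target = total / nrank;  // ideal load per rank
    double prefix = 0.0;                  // cumulative load before chunk i
    for (std::size_t i = 0; i < load.size(); ++i) {
        const double mid = prefix + 0.5 * load[i];
        // Clamp to guard against round-off at the upper end.
        owner[i] = std::min(nrank - 1, static_cast<int>(mid / target));
        prefix += load[i];
    }
    return owner;
}

// Hypothetical metric: maximum rank load divided by mean rank load
// (1.0 means perfect balance). Assumes a positive total load.
double imbalance(const std::vector<double>& load,
                 const std::vector<int>& owner, int nrank)
{
    std::vector<double> rank_load(nrank, 0.0);
    for (std::size_t i = 0; i < load.size(); ++i) {
        rank_load[owner[i]] += load[i];
    }
    const double total = std::accumulate(rank_load.begin(), rank_load.end(), 0.0);
    const double max_load = *std::max_element(rank_load.begin(), rank_load.end());
    return max_load * nrank / total;
}
```

For chunk loads `{1, 1, 1, 1, 8, 8, 1, 1, 1, 1}` on four ranks, the two heavy chunks each occupy a rank of their own, and the imbalance cannot drop below 8/6 because a single chunk already exceeds the ideal per-rank load of 6. In this simplified picture, that is how the chunk size limits the balance that can be achieved.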

The aim of this work is to evaluate the performance of a standard PIC code built on top of the newly developed framework. We evaluate the load-balancing performance as a function of chunk size for problems with highly inhomogeneous density, such as magnetic reconnection. Flat-MPI and hybrid OpenMP-MPI parallelization strategies will be tested on Intel Xeon and Fujitsu A64FX architectures. SIMD optimization and efficient cache usage may also be discussed.

Presentation information

[P-EM17] Space Plasma Theory and Simulation