09:00 〜 10:30
[PEM17-P13] Performance Evaluation of Load-balanced Particle-based Plasma Kinetic Simulations
Keywords: plasma, numerical simulation, particle method
Plasma kinetic simulation has long been an indispensable tool for modeling laboratory, space, and astrophysical plasmas. To mitigate the inherently large computational cost of solving the Vlasov or Boltzmann equation in six-dimensional phase space, the standard Particle-In-Cell (PIC) scheme represents the phase-space density by a collection of computational particles. The electromagnetic field, on the other hand, is defined on a mesh. The computational particles interact with the mesh quantities but move freely in continuous phase space across the mesh boundaries. This difference in data structure between particles and fields makes it challenging to fully exploit the performance of modern supercomputers, which typically provide several hierarchical levels of parallelism.
We have recently developed a general dynamic load-balancing framework in C++ for particle-based kinetic simulations. It is based on decomposing the computational domain into small chunks [e.g., Germaschewski et al., 2016; Derouillat et al., 2018; Rowan et al., 2021]. Load balancing is performed by distributing the chunks among MPI ranks so that the computational load of each rank is as equal as possible. The chunk size and the number of chunks per MPI process are therefore essential parameters controlling the efficiency of dynamic load balancing. Furthermore, it is well recognized that the chunk size also affects single-core performance, because a smaller chunk makes it possible to use the CPU cache more efficiently.
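The chunk-to-rank distribution described above can be illustrated with a minimal sketch. It is not the framework's actual algorithm or API: the function name `assign_chunks`, the assumption that chunks are already ordered along a one-dimensional curve, and the use of a per-chunk cost estimate (for example, the particle count) are all hypothetical choices made for illustration. The sketch cuts the ordered chunk list into contiguous pieces whose cumulative load approaches an equal share per rank.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of the framework described in the abstract).
// load[i] is an assumed cost estimate for chunk i, e.g. its particle count.
// Returns nranks + 1 boundaries; rank r owns chunks [b[r], b[r + 1]).
std::vector<std::size_t> assign_chunks(const std::vector<double>& load,
                                       std::size_t nranks)
{
    assert(nranks > 0);
    const double total = std::accumulate(load.begin(), load.end(), 0.0);

    // Boundaries that are never reached stay at load.size(), so any
    // trailing ranks simply receive an empty range.
    std::vector<std::size_t> boundary(nranks + 1, load.size());
    boundary[0] = 0;

    double cumulative = 0.0;
    std::size_t rank = 1;  // index of the next boundary to place
    for (std::size_t i = 0; i < load.size() && rank < nranks; ++i) {
        cumulative += load[i];
        // Place a boundary each time the running sum reaches the
        // ideal cumulative share of the first `rank` ranks.
        while (rank < nranks && cumulative >= total * rank / nranks) {
            boundary[rank++] = i + 1;
        }
    }
    return boundary;
}
```

For the loads {4, 4, 1, 1, 1, 1, 2, 2} and four ranks, this returns the boundaries {0, 1, 2, 6, 8}, so every rank carries a load of 4. A single chunk much heavier than the ideal share cannot be split and may leave neighboring ranks with few or no chunks, which is one reason the chunk size matters for load-balancing efficiency.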
The motivation of this work is to evaluate the performance of a standard PIC code built on top of the newly developed framework. We evaluate the load-balancing performance as a function of chunk size for problems with highly inhomogeneous density, such as magnetic reconnection. Flat-MPI and OpenMP-MPI hybrid parallelization strategies will be tested on Intel Xeon and Fujitsu A64FX architectures. SIMD optimization and efficient cache usage may also be discussed.