Japan Geoscience Union Meeting 2023

Presentation Information

[J] Online Poster Presentation

Session symbol P (Space and Planetary Sciences) » P-EM Solar-Terrestrial Sciences, Space Electromagnetism & Space Environment

[P-EM17] Space Plasma Theory and Simulation

Tuesday, May 23, 2023, 09:00-10:30, Online Poster Zoom Room (2) (Online Poster)

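Conveners: Takanobu Amano (Department of Earth and Planetary Science, The University of Tokyo), Yohei Miyake (Graduate School of System Informatics, Kobe University), Takayuki Umeda (Institute for Space-Earth Environmental Research, Nagoya University), Tadas Nakamura (Fukui Prefectural University)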

On-site poster presentation date and time (2023/5/22 17:15-18:45)

09:00-10:30

[PEM17-P13] Performance Evaluation of Load-balanced Particle-based Plasma Kinetic Simulations

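*Takanobu Amano (1), Yosuke Matsumoto (2) (1. Department of Earth and Planetary Science, The University of Tokyo; 2. Institute for Advanced Academic Research, Chiba University)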

Keywords: plasma, numerical simulation, particle method

Plasma kinetic simulation has long been an indispensable tool for modeling laboratory, space, and astrophysical plasmas. To mitigate the inherently large computational cost of solving the Vlasov or Boltzmann equation in six-dimensional phase space, the standard Particle-In-Cell (PIC) scheme represents the phase-space density by a collection of computational particles. The electromagnetic field, on the other hand, is defined on a mesh. The computational particles interact with the mesh quantities but move freely in continuous phase space across mesh boundaries. This difference in data structures between particles and fields makes it challenging to fully exploit the performance of modern supercomputers, which typically provide a hierarchy of parallelism.
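The particle-mesh coupling described above can be illustrated with a minimal sketch, assuming a periodic one-dimensional mesh and linear (cloud-in-cell) weighting. The `Particle` struct and the `deposit_charge` function are hypothetical names introduced only for this illustration; they are not taken from the framework discussed in this abstract.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 1D computational particle (illustration only).
struct Particle {
  double x;  // position in units of the cell width, assumed x >= 0
  double w;  // charge carried by the computational particle
};

// Deposit particle charge onto a periodic 1D mesh with linear weighting.
// Each particle contributes to the two mesh points that bracket it.
std::vector<double> deposit_charge(const std::vector<Particle>& particles,
                                   std::size_t num_cells) {
  assert(num_cells > 0);
  std::vector<double> rho(num_cells, 0.0);
  for (const Particle& p : particles) {
    assert(p.x >= 0.0);
    const double cell = std::floor(p.x);
    const double frac = p.x - cell;  // fractional offset within the cell
    const std::size_t i = static_cast<std::size_t>(cell) % num_cells;
    const std::size_t j = (i + 1) % num_cells;  // periodic right neighbor
    rho[i] += p.w * (1.0 - frac);
    rho[j] += p.w * frac;
  }
  return rho;
}
```

Particles are stored as an unordered list while `rho` is indexed by mesh position, so the writes to `rho` follow whatever order the particles happen to be stored in. This is one simple way to see why the two data structures are hard to optimize together.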

We have recently developed a general dynamic load-balancing framework in C++ for particle-based kinetic simulations. It is based on decomposing the computational domain into small chunks [e.g., Germaschewski et al., 2016; Derouillat et al., 2018; Rowan et al., 2021]. Load balancing is performed by distributing the chunks among MPI ranks so that the computational load of each rank is as equal as possible. The chunk size and the number of chunks per MPI process are therefore essential parameters controlling the efficiency of dynamic load balancing. Furthermore, it is well recognized that the chunk size also affects single-core performance, because a smaller chunk allows more efficient use of the CPU cache.
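As a rough illustration of chunk-based balancing, the sketch below assigns chunks to ranks in contiguous blocks along a fixed one-dimensional ordering so that the summed load per rank approaches the mean. It is not the algorithm of the framework described here: the function names `assign_chunks` and `imbalance`, the one-dimensional chunk ordering, and the use of a single scalar load per chunk (for example, a particle count) are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Assign chunks, taken in a fixed 1D order, to ranks in contiguous blocks.
// A chunk goes to the rank whose ideal share of the cumulative load contains
// the midpoint of that chunk's load interval. Loads are assumed non-negative.
std::vector<int> assign_chunks(const std::vector<double>& load, int num_ranks) {
  assert(num_ranks > 0);
  const double total = std::accumulate(load.begin(), load.end(), 0.0);
  std::vector<int> owner(load.size(), 0);
  if (total <= 0.0) return owner;  // nothing to balance
  double prefix = 0.0;             // load of all chunks before chunk i
  for (std::size_t i = 0; i < load.size(); ++i) {
    const double mid = prefix + 0.5 * load[i];
    const int rank = static_cast<int>(mid / total * num_ranks);
    owner[i] = std::min(rank, num_ranks - 1);
    prefix += load[i];
  }
  return owner;
}

// Load imbalance: maximum rank load divided by mean rank load (1.0 is ideal).
double imbalance(const std::vector<double>& load,
                 const std::vector<int>& owner, int num_ranks) {
  assert(num_ranks > 0 && load.size() == owner.size());
  std::vector<double> rank_load(num_ranks, 0.0);
  for (std::size_t i = 0; i < load.size(); ++i) rank_load[owner[i]] += load[i];
  const double mean =
      std::accumulate(rank_load.begin(), rank_load.end(), 0.0) / num_ranks;
  const double peak = *std::max_element(rank_load.begin(), rank_load.end());
  return mean > 0.0 ? peak / mean : 1.0;
}
```

With chunk loads `{1, 1, 1, 1, 4, 4, 2, 2}` and four ranks, this assignment gives every rank a load of 4, whereas giving each rank two consecutive chunks leaves one rank with a load of 8, twice the mean. Smaller chunks give the assignment finer granularity, at the cost of more chunks to manage per rank.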

The aim of this work is to evaluate the performance of a standard PIC code built on top of the newly developed framework. We evaluate the load-balancing performance as a function of chunk size for problems with highly inhomogeneous density, such as magnetic reconnection. Flat-MPI and OpenMP-MPI hybrid parallelization strategies will be tested on Intel Xeon and Fujitsu A64FX architectures. SIMD optimization and efficient cache usage may also be discussed.