11:00 〜 13:00
[STT41-P01] Optimization of the integrated tsunami analysis code JAGURS for Earth Simulator 4
Keywords: tsunami, supercomputer, numerical computation
The Japan Agency for Marine-Earth Science and Technology (JAMSTEC) has been developing an integrated tsunami analysis code, JAGURS. JAGURS provides a range of higher-order analysis functions. In addition to the nonlinear long wave theory used for tsunami hazard mapping, it can account for dispersion effects (Baba et al., 2021), seawater compressibility and elastic deformation of the Earth's crust due to seawater loading (Baba et al., 2017), and submarine landslide tsunamis (Baba et al., 2019).
JAMSTEC has also been actively introducing high-performance computing hardware, and in FY2020 it launched Earth Simulator 4 (ES4), a multi-architecture supercomputer based on AMD CPUs combined with NEC's Vector Engine and NVIDIA's A100 GPU. The vector arithmetic performance of ES4 per single core is about 4.8 times that of Earth Simulator 3 (ES3), and the theoretical computing performance of the entire system is 19.5 PFLOPS. Various scientific and technological codes, including JAGURS, have been optimized and used on the ES series.
JAGURS is still being optimized and tuned for ES4, and it is being used to understand large earthquakes through high-precision numerical tsunami analysis (e.g., Kusumoto et al., 2021).
In this presentation, we report the effect of optimizing and tuning JAGURS for ES4 and the change in performance relative to ES3.
Nonlinear long wave theory (Case 1) and nonlinear dispersive wave theory (Case 2) were selected as the model cases for optimizing the JAGURS code. For both cases, basic performance information was first collected on ES3. The code was then ported to the ES4 system, and computations were run under similar boundary conditions to confirm stable operation. Before optimization, the computational performance ratio relative to ES3 was over 2x in Case 1 and less than 2x in Case 2. We then collected information on the computational load of each subroutine to identify those with the highest load, and optimized these high-load subroutines for ES4. After this optimization, the performance ratio reached about 3x in Case 1, suggesting that ES4 is effective in improving computing performance. In Case 2, on the other hand, the performance ratio remained below 2x, suggesting that memory transfer performance had a significant impact on the improvement in computing performance.
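The two quantities used above, the per-subroutine share of computational load and the performance ratio relative to ES3, can be illustrated with a minimal sketch. The subroutine names and timings below are hypothetical placeholders, not JAGURS routines or measured values, and the sketch assumes that the performance ratio is the ratio of elapsed times for the same problem.

```python
def performance_ratio(elapsed_old, elapsed_new):
    """Speedup of the new system over the old one for the same problem."""
    if elapsed_old <= 0 or elapsed_new <= 0:
        raise ValueError("elapsed times must be positive")
    return elapsed_old / elapsed_new


def rank_by_load(profile):
    """Return (name, share) pairs sorted by descending share of total time."""
    total = sum(profile.values())
    if total <= 0:
        raise ValueError("total elapsed time must be positive")
    shares = ((name, seconds / total) for name, seconds in profile.items())
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


# Hypothetical per-subroutine elapsed times in seconds (illustrative only).
profile = {
    "momentum_update": 60.0,
    "mass_update": 25.0,
    "boundary_exchange": 10.0,
    "output": 5.0,
}

ranking = rank_by_load(profile)
for name, share in ranking:
    print(f"{name:20s} {share:6.1%}")

# Hypothetical elapsed times for one case on the old and new systems.
print(f"performance ratio: {performance_ratio(300.0, 100.0):.1f}x")
```

Ranking by share of total time identifies the subroutines worth optimizing first. The ratio alone does not show whether a routine is limited by arithmetic or by memory transfer, which is the distinction drawn between Case 1 and Case 2.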