Japan Geoscience Union Meeting 2023

Presentation information

[J] Online Poster

S (Solid Earth Sciences) » S-TT Technology & Techniques

[S-TT43] Creating future of solid Earth science with high performance computing (HPC)

Tue. May 23, 2023 10:45 AM - 12:15 PM Online Poster Zoom Room (17) (Online Poster)

Convener: Takane Hori (Japan Agency for Marine-Earth Science and Technology), Yuji Yagi (Graduate School of Life and Environmental Sciences, University of Tsukuba, Tsukuba), Katsuhiko Shiomi (National Research Institute for Earth Science and Disaster Resilience), Takanori Matsuzawa (National Research Institute for Earth Science and Disaster Resilience)

On-site poster schedule (2023/5/22 17:15-18:45)

10:45 AM - 12:15 PM

[STT43-P01] Development and optimization of a numerical code for the simulation of slow slip events in GPU nodes

*Takanori Matsuzawa¹ (1. National Research Institute for Earth Science and Disaster Resilience)

Keywords: Slow Slip Event, GPU, High Performance Computing

Earthquakes and slow earthquakes span more than 10 orders of magnitude in seismic moment. Slip velocities also vary by more than 10 orders of magnitude, ranging from an almost locked state to coseismic slip. In addition, the typical slip duration of an M3-class earthquake is about 0.1 s, while recurrence intervals of megathrust earthquakes sometimes reach several hundred years. To numerically reproduce such hierarchical multiscale phenomena directly, large-scale computing is essential. This is also one of the main targets of the research project "Spatio-temporal multiscale modeling and forecast of slow and fast earthquakes" in the MEXT Grant-in-Aid for Transformative Research Areas (A) entitled “The Science of Slow-to-Fast Earthquakes”.
In recent years, reducing power consumption and offsetting carbon emissions have become significant issues, as reflected in terms such as "green computing" and "sustainable IT". For example, a supercomputer system consisting mainly of GPU nodes is going to be introduced at the Information Technology Center, the University of Tokyo, as the replacement for the Oakforest-PACS system. I am developing a numerical code for simulating earthquakes and slow earthquakes on multi-GPU nodes, so that calculations can be executed in a variety of environments.
This program aims to reproduce slow slip events (SSEs), which have durations from 1 day to several years, on the time scale of seismic cycles of megathrust earthquakes. The plate interface is modeled with small triangular elements, on which frictional stress is given by the rate- and state-dependent friction law with cutoff velocities. Interaction between elements is expressed as a stress change, assuming the quasi-static response of a semi-infinite elastic medium. Because the Green's function of stress change for unit slip can be obtained as an analytical solution, the temporal evolution of slip velocity and stress is calculated by a boundary element method with an adaptive time step Runge-Kutta method. The main bottleneck of this simulation is the evaluation of the product of a large dense matrix and a vector to calculate the stress change.
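As an illustration of adaptive time stepping, the following Python sketch integrates a generic system dy/dt = rhs(t, y) with an embedded Heun-Euler pair. The abstract does not state which Runge-Kutta scheme, error norm, or tolerance the code uses, so the scheme, the function name adaptive_heun, and all parameter values are assumptions chosen for brevity. In the simulation the right-hand side would contain the friction law and the dense matrix-vector product; here it is a toy decay equation.

```python
import math

def adaptive_heun(rhs, t, y, t_end, dt=1e-3, tol=1e-6):
    """Integrate dy/dt = rhs(t, y) from t to t_end with adaptive steps.

    Hypothetical helper: uses the embedded Heun-Euler pair (orders 2 and 1),
    which is an assumption, not the scheme used in the abstract's code.
    """
    accepted = 0
    while t < t_end:
        dt = min(dt, t_end - t)  # do not step past t_end
        k1 = rhs(t, y)
        # First-order (Euler) estimate
        y_euler = [yi + dt * a for yi, a in zip(y, k1)]
        k2 = rhs(t + dt, y_euler)
        # Second-order (Heun) estimate
        y_heun = [yi + 0.5 * dt * (a + b) for yi, a, b in zip(y, k1, k2)]
        # The difference between the two estimates serves as the error estimate
        err = max(abs(p - q) for p, q in zip(y_heun, y_euler))
        if err <= tol:  # accept the step
            t, y = t + dt, y_heun
            accepted += 1
        # Grow or shrink the next step from the error estimate (clamped)
        scale = 0.9 * math.sqrt(tol / err) if err > 0.0 else 2.0
        dt *= min(2.0, max(0.2, scale))
    return y, accepted

# Toy right-hand side: exponential decay, exact solution exp(-t)
y_end, n_steps = adaptive_heun(lambda t, y: [-y[0]], 0.0, [1.0], 1.0)
```

The step size grows when the solution changes slowly and shrinks when it changes quickly, which is what makes it feasible to cover both long quiet intervals and short slip episodes in one run.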
To accelerate the calculation on GPU nodes, I adopted parallel computing using MPI and NVIDIA CUDA. The large matrix is divided and copied to the GPUs only once, at the initial stage of a calculation, because data transfer between GPU boards and main CPU boards is relatively slow and the matrix is constant for all time steps. In the model of the Nankai region, the number of elements (N) is about 170,000, so the size of the N x N matrix is about 230 GB. This program can use both CPUs and GPUs at the same time, as the total memory on GPUs is sometimes not sufficient to store the matrix. Fortunately, the GPU nodes of Wisteria/BDEC-01 (Aquarius nodes) at the Information Technology Center, the University of Tokyo, have 8 GPUs (NVIDIA A100) per node, with about 320 GB of GPU memory in total. Therefore, the Nankai model can be executed on one node.
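The memory figures above can be checked with a few lines of arithmetic. This Python sketch assumes 8-byte (double-precision) matrix entries and an even row-wise split of the matrix across the GPUs; neither is stated in the abstract, and the helper names dense_matrix_gb and row_blocks are hypothetical.

```python
def dense_matrix_gb(n, bytes_per_value=8):
    """Size in GB (10^9 bytes) of a dense n x n matrix.

    bytes_per_value=8 assumes double precision (an assumption).
    """
    return n * n * bytes_per_value / 1e9

def row_blocks(n_rows, parts):
    """Split rows 0..n_rows into `parts` contiguous, nearly equal blocks.

    Returns a list of (start, stop) pairs with stop exclusive.
    """
    base, extra = divmod(n_rows, parts)
    blocks, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)  # first `extra` blocks get one more row
        blocks.append((start, start + size))
        start += size
    return blocks

N = 170_000          # number of elements in the Nankai model (from the text)
GPUS_PER_NODE = 8    # Aquarius node (from the text)

total_gb = dense_matrix_gb(N)
blocks = row_blocks(N, GPUS_PER_NODE)
rows_per_gpu = blocks[0][1] - blocks[0][0]
per_gpu_gb = rows_per_gpu * N * 8 / 1e9
```

Under these assumptions the full matrix takes about 231 GB and each GPU holds about 29 GB, which is below the roughly 320 GB / 8 = 40 GB available per GPU.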
I evaluated the code on Wisteria/BDEC-01. Computation on an Aquarius node was about 16 times faster than on a CPU node (Odyssey node). Since each Aquarius node has 8 GPUs, this means that the calculation per GPU board is about two times faster than on an Odyssey node.
This program can change the work load ratio between CPUs and GPUs. The calculation speed was almost the same when 0.4% of the calculation was executed on CPUs. However, the calculation became slower with more work load on CPUs. This suggests that hybrid computing with CPUs and GPUs is not effective in this environment. I also used a profiler to examine the optimization of the code. About 99% of the computation time was spent evaluating the matrix-vector product with cuBLAS. This suggests that my code is sufficiently optimized for the case of the Nankai model.
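One way to picture an adjustable CPU/GPU work load ratio is a row-wise split of the matrix-vector product, sketched below in plain Python. The abstract does not describe how the work is actually divided, so the row-wise split and the names split_rows, matvec_rows, and hybrid_matvec are assumptions. Both parts run sequentially here, whereas a real hybrid code would run them concurrently on different devices.

```python
def split_rows(n_rows, cpu_fraction):
    """Return (cpu_rows, gpu_rows) as (start, stop) pairs, stop exclusive."""
    n_cpu = round(n_rows * cpu_fraction)
    return (0, n_cpu), (n_cpu, n_rows)

def matvec_rows(matrix, vector, rows):
    """Dense matrix-vector product restricted to the given row range."""
    start, stop = rows
    return [sum(k * v for k, v in zip(matrix[i], vector))
            for i in range(start, stop)]

def hybrid_matvec(matrix, vector, cpu_fraction):
    """Compute matrix @ vector in two parts and concatenate the results."""
    cpu_rows, gpu_rows = split_rows(len(matrix), cpu_fraction)
    part_cpu = matvec_rows(matrix, vector, cpu_rows)  # stand-in for the CPU part
    part_gpu = matvec_rows(matrix, vector, gpu_rows)  # stand-in for the GPU part
    return part_cpu + part_gpu

# With N = 170,000 rows, a 0.4% CPU share corresponds to 680 rows
cpu_rows, gpu_rows = split_rows(170_000, 0.004)

# Tiny example matrix to show that the split does not change the result
result = hybrid_matvec([[1, 2], [3, 4], [5, 6]], [1, 1], 0.34)
```

Because the time per step is set by whichever part finishes last, giving the slower device more than a small share of rows lengthens each step, which is consistent with the slowdown reported above.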