5:15 PM - 7:15 PM
[SSS09-P02] Development of a high-speed parallel seismic wave propagation calculation code that runs on both the CPU and GPU
Keywords:Seismic wave propagation, Numerical simulation, GPGPU, Parallel Computing, Finite Difference Method
General-purpose computing on graphics processing units (GPGPU) is widely used in scientific and technical computing. However, conventional GPGPU programming requires a dedicated language, NVIDIA's Compute Unified Device Architecture (CUDA), rather than standard C/C++ or Fortran, which complicates implementation. Moreover, CUDA-based programs run only on GPUs. Because many large-scale CPU-based supercomputers and clusters remain in use, maintaining numerical simulation codes that run on both CPUs and GPUs has been challenging. Recently, it has become possible to retain CPU compatibility while exploiting GPU acceleration by adding a few OpenACC directives or by using the parallel features of the language standard itself. This study develops a seismic wave propagation code that runs on both GPUs and CPUs by adapting an existing CPU-based algorithm.
This study examines algorithms extracted from the OpenSWPC community code for seismic wave propagation (Maeda et al., 2017). OpenSWPC solves the equations of motion and constitutive equations of viscoelastic bodies as multivariable partial differential equations using the memory variable method. It employs parallel finite difference methods on a uniform grid in both time and space. We used a version of OpenSWPC with its main computational kernel isolated and evaluated its efficiency after GPU adaptation.
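The isolated computational kernel is, at its core, a staggered-grid finite-difference stencil update. As a minimal illustrative sketch (the variable names and 2-D formulation below are hypothetical simplifications, not the actual OpenSWPC code, which is 3-D and viscoelastic), a velocity update of this kind looks like:

```fortran
! Illustrative 2-D staggered-grid velocity update of the kind
! isolated as the main kernel. Names (vx, sxx, sxz, ...) are
! placeholders, not OpenSWPC identifiers.
subroutine update_velocity(vx, sxx, sxz, nx, nz, dt, dx, rho)
  implicit none
  integer, intent(in) :: nx, nz
  real, intent(in)    :: dt, dx, rho
  real, intent(in)    :: sxx(nx, nz), sxz(nx, nz)
  real, intent(inout) :: vx(nx, nz)
  integer :: i, k
  do k = 2, nz - 1
    do i = 2, nx - 1
      ! Divergence of stress drives the velocity; second-order
      ! differences on a uniform grid in space and time.
      vx(i, k) = vx(i, k) + (dt / (rho * dx)) * &
                 (sxx(i+1, k) - sxx(i, k) + sxz(i, k) - sxz(i, k-1))
    end do
  end do
end subroutine update_velocity
```

Loops of this shape, applied uniformly over the grid at every time step, are what the three porting methods below target.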
We explored three methods for porting CPU code to GPUs. Method A replaces Fortran's ordinary do loops with the do concurrent construct introduced in Fortran 2008. Method B adds minimal OpenACC directives that mark only the regions to be computed on the GPU. Method C additionally manages data transfers between CPU and GPU explicitly with detailed OpenACC directives.
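Applied to a simple stencil loop, the three approaches can be sketched as follows (a hedged illustration: the array names `vx`, `sxx` and coefficient `c` are placeholders, not OpenSWPC identifiers):

```fortran
! Method A: Fortran 2008 do concurrent. Compilers such as nvfortran
! can offload this to the GPU (e.g. -stdpar=gpu) while it remains
! valid standard Fortran on the CPU.
do concurrent (k = 2:nz-1, i = 2:nx-1)
  vx(i, k) = vx(i, k) + c * (sxx(i+1, k) - sxx(i, k))
end do

! Method B: minimal OpenACC. Only the compute region is marked;
! the compiler/runtime moves data implicitly.
!$acc kernels
do k = 2, nz - 1
  do i = 2, nx - 1
    vx(i, k) = vx(i, k) + c * (sxx(i+1, k) - sxx(i, k))
  end do
end do
!$acc end kernels

! Method C: an explicit data region keeps arrays resident on the
! GPU across time steps, so only the loops execute per step.
!$acc data copyin(sxx) copy(vx)
!$acc parallel loop collapse(2)
do k = 2, nz - 1
  do i = 2, nx - 1
    vx(i, k) = vx(i, k) + c * (sxx(i+1, k) - sxx(i, k))
  end do
end do
!$acc end data
```

On a CPU-only compiler the OpenACC directives are ignored as comments, which is what preserves single-source CPU/GPU compatibility.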
We implemented these methods and evaluated their performance in CPU and GPU environments. On a single node, Method A was the simplest but caused a considerable CPU slowdown. Method B achieved substantial GPU acceleration while keeping the CPU code unchanged. Method C, with optimal tuning, outperformed Method B but required explicit data transfer management; omitting necessary data directives in Method C sometimes made it slower than Method B. In parallel execution, Methods A and B required copying data back to the CPU before communication, which prevented the use of GPU-aware MPI and became a bottleneck. Thus, Method C proved the most suitable for large-scale execution. However, it was important to first measure Method B's single-GPU speedup and then tune the OpenACC directives of Method C to match or exceed it.
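The reason Method C enables GPU-aware MPI can be sketched as follows: once arrays are resident in device memory, an OpenACC host_data region hands device pointers directly to the MPI library, so halo buffers need not be staged through the host. This is a hypothetical illustration (buffer names and neighbor ranks are placeholders), and it requires an MPI build with GPU-aware support:

```fortran
! Halo exchange directly from GPU memory. sendbuf/recvbuf are
! assumed to be packed on the device inside the enclosing
! !$acc data region; rank_left/rank_right are neighbor ranks.
!$acc host_data use_device(sendbuf, recvbuf)
call MPI_Sendrecv(sendbuf, n, MPI_REAL, rank_right, 0, &
                  recvbuf, n, MPI_REAL, rank_left,  0, &
                  MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
!$acc end host_data
```

Under Methods A and B, by contrast, the runtime owns data placement, so the buffers must be updated on the host before each communication step.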
We evaluated our implementation on the Miyabi supercomputer at the University of Tokyo's Information Technology Center. Miyabi, operational since January 2025, features Intel Xeon Max 9480 CPUs and NVIDIA GH200 Superchip GPUs. The Method C implementation achieved an 8–10× speedup on GPUs over CPUs when using an equal number of computing units. Good parallel performance was also observed on CPUs, but GPUs scaled even better, sustaining significant acceleration on up to 256 GPUs.
By integrating these findings into OpenSWPC, we enable an easy-to-use numerical simulation code applicable from PC prototyping to CPU clusters and GPU supercomputers. Additionally, the CPU-GPU compatible code developed in this study will be released on GitHub on the day of the presentation. Our findings on GPU acceleration are expected to benefit various computational tasks in seismology.
