11:00 AM - 1:00 PM
[MGI33-P01] Development of a fast spherical harmonic transformation library aimed at numerical simulations of Jovian atmospheres
Keywords: spherical harmonics transformation, spectral method, Jovian atmosphere, banded structure, rotating convection
ISPACK3 was initially developed for Intel-based CPUs. One key to its fast transforms is the implementation of a new method (Ishioka 2018) for fast computation of the recurrence formula for the associated Legendre functions. This method has also been incorporated into other spherical harmonic transform libraries used for astronomical and planetary interior dynamics calculations. Another is that the lowest-level transform subroutines are written in assembly language to make the best use of the CPU's SIMD instructions. Through these efforts, the transform runs at about 60% of peak performance on Intel-based CPUs.
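For orientation, the kind of recurrence involved can be illustrated with the textbook three-term recurrence in total wavenumber n at fixed zonal wavenumber m. The Python sketch below is not the Ishioka (2018) method and is not taken from ISPACK3. The function name `normalized_legendre` and the normalization (the integral of the squared function over [-1, 1] equals 1, with no Condon-Shortley phase) are assumptions made purely for illustration.

```python
import math


def normalized_legendre(m, n_max, x):
    """Return [P_m^m(x), ..., P_{n_max}^m(x)] for fixed order m.

    Assumed normalization: the integral of P_n^m(x)**2 over [-1, 1] is 1.
    Textbook recurrence only; not the Ishioka (2018) algorithm.
    """
    # Sectoral seed P_m^m, built up from P_0^0 = sqrt(1/2).
    sin_theta = math.sqrt(1.0 - x * x)
    p_mm = math.sqrt(0.5)
    for k in range(1, m + 1):
        p_mm *= math.sqrt((2 * k + 1) / (2 * k)) * sin_theta

    values = [p_mm]
    if n_max == m:
        return values

    # First step up in degree: P_{m+1}^m = sqrt(2m + 3) * x * P_m^m.
    values.append(math.sqrt(2 * m + 3) * x * p_mm)

    # Three-term recurrence in n for the remaining degrees.
    for n in range(m + 2, n_max + 1):
        a = math.sqrt((4 * n * n - 1) / (n * n - m * m))
        b = math.sqrt(((n - 1) ** 2 - m * m) / (4 * (n - 1) ** 2 - 1))
        values.append(a * (x * values[-1] - b * values[-2]))
    return values


# Example: all degrees n = 0..4 for m = 0 at x = 0.5.
print(normalized_legendre(0, 4, 0.5))
```

In this direct form the inner loop over n is sequential for each (m, x) pair, so vectorization has to come from processing many latitudes x at once. At large m the sectoral seed can also underflow near the poles in double precision, so the sketch is suitable only for modest truncations.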
Building on this know-how from Intel-based CPUs, we have adapted the subroutines for the A64FX CPUs used in Fugaku. Here, too, it is important to use SIMD instructions as much as possible for speed. We first rewrote the lowest-level subroutines in assembly language, as we had done for Intel CPUs, but the resulting code did not use SIMD instructions as effectively as expected. Instead, by rewriting the Fortran code and relying on the SIMD vectorization function of the system's Fortran compiler, we succeeded in producing an executable that uses SIMD instructions effectively. As a result, the transform runs at about 40% of peak performance.
To demonstrate the performance of the library, we attempted on Fugaku a spherical harmonic transform with the world's largest number of degrees of freedom. We performed forward and inverse transforms with a maximum total wavenumber of 2^19-1 = 524287 on 1024 nodes of Fugaku, using 1024 MPI processes with 48 threads each, and achieved about 1300 TFlops (about 43% of peak performance).
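To give a sense of the problem size, the short Python sketch below counts the spectral coefficients at this truncation. It assumes a triangular truncation of a real-valued field, which has (N+1)^2 real degrees of freedom for maximum total wavenumber N, 8-byte double-precision storage, and an even split across nodes. The helper name `truncation_size` is hypothetical, and the memory figure covers only one spectral field, not the grid data or work arrays.

```python
def truncation_size(exponent=19, nodes=1024, bytes_per_value=8):
    """Size of one spectral field at maximum total wavenumber 2**exponent - 1.

    Assumptions: triangular truncation, real-valued field, double precision,
    coefficients spread evenly over the nodes.
    """
    n_max = 2 ** exponent - 1
    # A real field truncated at N has (N + 1)**2 independent real coefficients.
    n_real = (n_max + 1) ** 2
    total_bytes = n_real * bytes_per_value
    return n_max, n_real, total_bytes // nodes


n_max, n_real, bytes_per_node = truncation_size()
print(f"maximum total wavenumber: {n_max}")
print(f"real spectral coefficients: {n_real}")
print(f"one spectral field per node: {bytes_per_node / 2**30:.1f} GiB")
```

Under these assumptions a single spectral field holds 2^38 (about 2.7 x 10^11) real coefficients, or roughly 2 GiB per node on 1024 nodes.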
We are now incorporating this library into our anelastic rotating spherical thermal convection model for the atmospheres of Jovian-type planets, and are attempting long-time, high-resolution simulations. Some preliminary results will be shown in this presentation.