[HDS12-P04] Development of GPU Tsunami Simulator for Expanding Tsunami Scenario Bank
Keywords: Tsunami simulation, High Performance Computing, Tsunami Scenario Bank
However, the commonly used tsunami simulation technique needs 30 hours on a single CPU core to finish one case: simulating 6 hours after an earthquake over a 10 km square area at the 10 m spatial resolution needed for estimation. Under this condition, running enough tsunami simulations to cover a wide range of scenarios requires a vast amount of time and computing resources. Saving resources and speeding up tsunami simulation are therefore necessary to build a tsunami scenario bank effectively and efficiently.
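For scale, reading "10 km square" as a 10 km × 10 km domain, a single case already involves about a million grid cells at 10 m resolution, and the stated 30 CPU-hours per case multiplies quickly with the number of scenarios. The bank size of 1,000 scenarios below is a hypothetical figure chosen only for illustration.

```latex
\begin{aligned}
N_{\mathrm{cells}} &= \left(\frac{10\ \mathrm{km}}{10\ \mathrm{m}}\right)^{2} = 1000^{2} = 10^{6},\\
T_{\mathrm{bank}} &= 1000\ \text{scenarios} \times 30\ \mathrm{h} = 3\times 10^{4}\ \text{CPU-hours} \approx 3.4\ \text{years on one core}.
\end{aligned}
```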
From this perspective, we are developing a simulator that computes tsunamis faster and at lower cost using graphics processing units (GPUs), whose massively parallel computing capability is now used for many purposes. Our target area is the Sotobo coastal area of Chiba Prefecture. This area is divided into 13 regions with a minimum spatial resolution of 10 m; the total number of grid cells is 82 million, or about 6.4 million per region on average. We ported to the GPU a commonly used tsunami simulation program that is based on non-linear long-wave theory, uses the leap-frog scheme, and accounts for run-up. We then tuned the ported program to suit the GPU's execution model, memory usage, and so on.
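The core update of a leap-frog long-wave solver can be sketched in one dimension. This is a minimal sketch, not the ported program: it assumes the staggered arrangement common in such codes (surface elevation at cell centres, discharge at cell faces, offset by half a time step), constant depth, closed walls, and it omits nonlinear advection, bottom friction, and run-up. The names `leapfrog_step` and `run` are hypothetical. Each cell's update reads only neighbouring values from the previous half step, which is what makes a one-thread-per-cell GPU port natural.

```python
import math


def leapfrog_step(eta, M, D, dx, dt, g=9.81):
    """One staggered leap-frog step of the linear 1-D long-wave equations.

    eta: surface elevation at cell centres, time level n
    M:   discharge at cell faces (len(eta) + 1 values), time level n + 1/2
    Returns eta at n + 1 and M at n + 3/2; M[0] and M[-1] stay 0 (closed walls).
    """
    nx = len(eta)
    r = dt / dx  # hoisted ratio, reused by both updates
    # Continuity: d(eta)/dt + dM/dx = 0
    eta_new = [eta[i] - r * (M[i + 1] - M[i]) for i in range(nx)]
    # Linear momentum: dM/dt + g D d(eta)/dx = 0, using the freshly updated eta
    M_new = [0.0] * (nx + 1)
    for i in range(1, nx):
        M_new[i] = M[i] - g * D * r * (eta_new[i] - eta_new[i - 1])
    return eta_new, M_new


def run(nx=200, dx=100.0, D=100.0, dt=1.0, n_steps=500):
    # Courant number sqrt(g D) * dt / dx is about 0.31, inside the stability limit of 1.
    eta = [math.exp(-(((i - nx / 2) / 10.0) ** 2)) for i in range(nx)]  # initial hump
    M = [0.0] * (nx + 1)
    for _ in range(n_steps):
        eta, M = leapfrog_step(eta, M, D, dx, dt)
    return eta, M
```

Because the wall fluxes stay zero, the continuity update telescopes and the total water volume `sum(eta)` is conserved up to rounding, which is a quick sanity check for a port.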
Firstly, using the concurrent kernel execution feature of the NVIDIA CUDA API (Application Programming Interface), we modified the program to execute calculations asynchronously wherever the nested regions do not need to be synchronized with each other. By exchanging data between nested regions at appropriate timing, the computations of the coarser-grid region and the finer-grid region are executed simultaneously on one GPU. This extracts more of the GPU's computing power and speeds up the simulation.
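One plausible way to let coarse and fine regions run at the same time without breaking their data dependency is to pipeline them: the fine grid runs one step behind the coarse grid, so while the coarse grid computes step k+1 the fine grid computes step k with coarse boundary data that is already complete. The sketch below is an assumption about the timing, not the authors' scheme. Python's `ThreadPoolExecutor` stands in for two CUDA streams and `result()` for stream synchronization; `coarse_step` and `fine_step` are hypothetical placeholders for the region kernels, and a real nested grid would also take several fine substeps per coarse step.

```python
from concurrent.futures import ThreadPoolExecutor


def coarse_step(c):
    # Hypothetical coarse-grid kernel; returns the boundary value it exports.
    return 0.9 * c + 1.0


def fine_step(f, boundary):
    # Hypothetical fine-grid kernel driven by the coarse boundary value.
    return 0.5 * (f + boundary)


def run_sequential(n_steps):
    # Reference order: coarse step k, then fine step k using coarse[k].
    coarse, fine = [0.0], [0.0]
    for k in range(1, n_steps + 1):
        coarse.append(coarse_step(coarse[k - 1]))
        fine.append(fine_step(fine[k - 1], coarse[k]))
    return coarse, fine


def run_pipelined(n_steps):
    # Assumes n_steps >= 1. The fine grid lags one step, so fine step k and
    # coarse step k+1 have no data dependency and can be issued together.
    coarse, fine = [0.0], [0.0]
    coarse.append(coarse_step(coarse[0]))  # prime the pipeline
    with ThreadPoolExecutor(max_workers=2) as pool:  # two "streams"
        for k in range(1, n_steps):
            f = pool.submit(fine_step, fine[k - 1], coarse[k])
            c = pool.submit(coarse_step, coarse[k])
            fine.append(f.result())  # synchronize both before the next exchange
            coarse.append(c.result())
    fine.append(fine_step(fine[n_steps - 1], coarse[n_steps]))  # drain
    return coarse, fine
```

The pipelined run produces exactly the same sequence as the sequential one; only the order in which independent work is issued changes.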
Next, we reduced “if” branches to avoid performance degradation caused by warp divergence. Tsunami simulation programs contain many conditional branches, for example in the upwind difference scheme and in run-up boundary processing. A CPU handles these if-else clauses efficiently, but on a GPU, when threads in the same warp take different branches, the warp executes the if-clause and the else-clause one after the other, which takes time. We therefore restructured the program to eliminate if-else clauses as far as possible.
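A common way to remove such branches is to select values arithmetically. This is a minimal sketch, assuming a first-order upwind flux of the form u·q with q taken from the upwind side and a simple depth threshold for dry cells; the function names and the threshold `d_min` are hypothetical. In Python, `max` and `min` are still ordinary calls, so the sketch only demonstrates the algebra; in CUDA C the same expressions would typically map to `fmax`/`fmin` and a comparison turned into 0 or 1, which all threads of a warp can execute without diverging.

```python
def upwind_branchy(u, q_left, q_right):
    # Branching form: threads with opposite flow directions would diverge.
    if u >= 0.0:
        return u * q_left
    else:
        return u * q_right


def upwind_branchless(u, q_left, q_right):
    # max/min pick the upwind side arithmetically: for u >= 0 only the
    # q_left term survives, for u < 0 only the q_right term survives.
    return max(u, 0.0) * q_left + min(u, 0.0) * q_right


def masked_flux(flux, depth, d_min=1e-3):
    # Dry cells (depth <= d_min) carry no flux: multiply by a 0/1 mask
    # instead of branching on the wet/dry state.
    wet = float(depth > d_min)
    return flux * wet
```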
Thirdly, we tuned global memory access. Computing a tsunami on a GPU requires storing data for on the order of 100 million grid cells in global memory, and global memory access is time-consuming. We therefore modified the program to increase coalesced global memory access and to avoid repeatedly accessing frequently used variables in global memory by copying them into registers.
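Whether a warp's loads coalesce depends on how thread indices map onto the array layout. The sketch below uses a simplified model in which a warp's accesses are served in aligned 128-byte segments (real GPUs work with 32-byte sectors and 128-byte cache lines, so this is an approximation) and counts the segments a 32-thread warp touches in a row-major 2-D array. `warp_addresses` and `segments_touched` are hypothetical helpers for illustration.

```python
SEGMENT_BYTES = 128  # simplified memory transaction size (assumption)
WARP_SIZE = 32


def warp_addresses(nx, i0, j0, x_fast, elem_bytes=4):
    # Row-major array: element (i, j) lives at byte offset (j * nx + i) * elem_bytes.
    # x_fast=True maps consecutive lanes to consecutive i (the contiguous dimension).
    addrs = []
    for lane in range(WARP_SIZE):
        i, j = (i0 + lane, j0) if x_fast else (i0, j0 + lane)
        addrs.append((j * nx + i) * elem_bytes)
    return addrs


def segments_touched(addresses):
    # Number of distinct aligned segments the warp's loads fall into.
    return len({a // SEGMENT_BYTES for a in addresses})
```

With 4-byte values and lanes mapped along x, a warp touches a single segment; mapping lanes along y strides by a whole row and touches 32 segments for the same amount of useful data.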
Various other tunings were also applied, such as reducing division operations, replacing power functions with other functions, reducing implicit casts from double to float, and reducing the number of kernel calls.
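The abstract does not say which terms were rewritten, so the following is an assumption: nonlinear long-wave codes often include a Manning-type bottom friction term proportional to g n² M √(M² + N²) / D^(7/3), which invites both the division and the power-function tunings. The sketch hoists g n² out of the per-cell work, rewrites D^(7/3) as D² times a cube root, and shares one reciprocal between both flux components instead of dividing twice. The function names are hypothetical. In CUDA C the cube root would be `cbrt`/`cbrtf`, and writing constants as `0.5f` rather than `0.5` keeps single-precision arithmetic from being promoted to double.

```python
import math

# math.cbrt exists from Python 3.11; fall back to a fractional power otherwise.
_cbrt = getattr(math, "cbrt", lambda x: x ** (1.0 / 3.0))


def friction_naive(M, N, D, n_manning, g=9.81):
    # Straightforward form: a general pow() and a separate division per component.
    speed = math.sqrt(M * M + N * N)
    fx = g * n_manning ** 2 * M * speed / D ** (7.0 / 3.0)
    fy = g * n_manning ** 2 * N * speed / D ** (7.0 / 3.0)
    return fx, fy


def friction_tuned(M, N, D, gn2):
    # gn2 = g * n^2 is computed once outside the per-cell loop.
    # D^(7/3) = D^2 * cbrt(D), and a single reciprocal serves both components.
    speed = math.sqrt(M * M + N * N)
    inv_d73 = 1.0 / (D * D * _cbrt(D))
    k = gn2 * speed * inv_d73
    return k * M, k * N
```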
Through these improvements, the execution of tsunami simulations was greatly accelerated, and the simulator can now finish the calculation of one region within 1 hour on 1 GPU. Furthermore, we enabled concurrent execution across multiple GPUs and multiple nodes, creating an environment for processing massive numbers of tsunami simulations.
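When each scenario fits on a single GPU, the simplest multi-GPU, multi-node layout runs independent scenarios side by side. This is a minimal sketch, assuming whole scenarios are the unit of work and a static round-robin assignment; the authors' actual scheduling is not described, and `assign_round_robin` and `jobs_for` are hypothetical. In a real job the node index might come from an MPI rank, after which each process would select its own GPU.

```python
from itertools import cycle


def assign_round_robin(n_scenarios, n_nodes, gpus_per_node):
    """Statically spread whole scenarios over every (node, gpu) slot."""
    slots = [(node, gpu) for node in range(n_nodes) for gpu in range(gpus_per_node)]
    plan = {slot: [] for slot in slots}
    # zip stops after the last scenario; cycle wraps around the slots.
    for scenario, slot in zip(range(n_scenarios), cycle(slots)):
        plan[slot].append(scenario)
    return plan


def jobs_for(plan, node, gpu):
    # Each process looks up its own work list from its node and GPU index.
    return plan[(node, gpu)]
```

Static round-robin needs no coordination between processes; a dynamic work queue would balance the load better if scenario run times vary widely.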