Japan Geoscience Union Meeting 2025

Presentation information

[J] Oral

S (Solid Earth Sciences) » S-CG Complex & General

[S-CG60] Driving Solid Earth Science through Machine Learning

Mon. May 26, 2025 1:45 PM - 3:15 PM 105 (International Conference Hall, Makuhari Messe)

Convener: Hisahiko Kubo (National Research Institute for Earth Science and Disaster Resilience), Makoto Naoi (Hokkaido University), Keisuke Yano (The Institute of Statistical Mathematics), Yusuke Tanaka (Geospatial Information Authority of Japan); Chairperson: Yusuke Tanaka (Tohoku University), Hisahiko Kubo (National Research Institute for Earth Science and Disaster Resilience)

1:45 PM - 2:00 PM

[SCG60-01] Pioneering the Use of Large Language Models in Earthquake Research

*Hisahiko Kubo1, Wu Stephen2, Masayuki Kano3, Shinya Katoh4, Atsuko Oana5, Tomohisa Okazaki6, Nozomi Okada7, Nobuki Kame4, Yuki Kodera8, Daisuke Sato9, Takahiro Shiina10, Kengo Shimojo8, Koji Tamaribuchi11, Makoto Naoi12, Ryuichi Nishiyama4, Kazuro Hirahara6,13, Takashi Miyamoto14, Masumi Yamada15 (1. National Research Institute for Earth Science and Disaster Resilience, 2. The Institute of Statistical Mathematics, 3. Tohoku University, 4. Earthquake Research Institute, The University of Tokyo, 5. Institute of Technology, Shimizu Corporation, 6. Center for Advanced Intelligence Project, RIKEN, 7. Shizuoka University, 8. Meteorological Research Institute, 9. Japan Agency for Marine-Earth Science and Technology, 10. National Institute of Advanced Industrial Science and Technology, 11. Japan Meteorological Agency, 12. Hokkaido University, 13. Kagawa University, 14. University of Yamanashi, 15. Disaster Prevention Research Institute, Kyoto University)

Keywords: Large Language Model, Earthquake Research, Hackathon

Recent technological advances and service expansions in Large Language Models (LLMs) have been remarkable, leading to their widespread adoption in many aspects of daily life. LLMs are gradually being incorporated into research and education, but their direct application to research activities remains limited. To explore the use of LLMs in earthquake research, we organized a hackathon in the summer of 2024. Note that the LLMs used in this hackathon were those available as of August 2024. A total of 18 participants took part in the following four challenges:

1. Feature extraction and explanation for simulation results
We used LLMs to automatically generate explanations of strong-motion regions and site characteristics for the seismic intensity distribution maps of NIED J-SHIS. Although we focused on a simulation scenario for the Tachikawa Fault in this hackathon, we believe that LLMs could play a valuable role in providing comprehensive explanations for many earthquake scenarios. Additionally, we explored the use of natural language in function optimization, testing both an interactive approach using ChatGPT and a non-interactive approach using TextGrad (Hou et al., 2023). This approach is expected to be useful for tasks such as automatic outlier detection and the provision of explanations, which have previously been done manually.

2. Automated analysis of earthquake-related social data
Using LLMs, we analyzed web news articles and their user comments related to the August 8, 2024, Hyuga-nada earthquake and the Nankai Trough earthquake extra information. Specifically, we summarized comments classified into specific categories, examined the relationship between article attitudes and comment trends, analyzed qualitative changes in comments, tracked temporal changes in public sentiment toward scientists, and conducted cluster analysis of comments. In tasks that summarized comments based on predefined analytical directions, LLMs provided results that closely matched our subjective impressions. We also attempted comment labeling and quantitative analysis with LLMs; however, the reliability of LLM-generated labels remained a challenge. We believe that LLMs are a useful tool for efficiently processing large amounts of social data and monitoring societal responses in real time.
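
One common way to quantify the reliability of labels such as those discussed above is to compare them with human labels on the same comments using Cohen's kappa, which corrects raw agreement for chance. The following is a minimal sketch under that assumption; it is not the evaluation used in the hackathon, and the category names and label lists are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for six comments (not data from the hackathon).
human_labels = ["anxiety", "criticism", "anxiety", "neutral", "criticism", "neutral"]
llm_labels = ["anxiety", "criticism", "neutral", "neutral", "criticism", "anxiety"]
kappa = cohens_kappa(human_labels, llm_labels)
```

A kappa near 1 indicates close agreement with the human labels, while a value near 0 indicates agreement no better than chance.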

3. Automated generation of answers for earthquakes and volcanoes
We attempted to generate answers to anticipated questions related to earthquakes and volcanoes using LLMs with retrieval-augmented generation (RAG) that read available PDF documents from institutions such as the Japan Meteorological Agency, the Cabinet Office, the Headquarters for Earthquake Research Promotion, and the Coordinating Committee for Prediction of Volcanic Eruptions. We compared the performance of NotebookLM (Google, 2024), a closed LLM, with local LLMs such as Llama 3.1 and Gemma2. NotebookLM could provide reasonable answers using only its intuitive interface, although hallucinations occasionally occurred. Answers from local LLMs were sometimes inaccurate because only a simple RAG system was used. We also found that careful data organization and question formulation are important for improving the accuracy of LLM-generated answers.
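
To illustrate what a simple RAG system involves, the sketch below shows only the retrieval and prompt-assembly steps: text chunks are ranked against the question by bag-of-words cosine similarity, and the best match is placed in the prompt. It is not the system built in the hackathon. The function names, the chunk texts, and the prompt wording are hypothetical, and the call to an LLM is deliberately omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a if t in counts_b)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, chunks, top_k=1):
    """Return the top_k chunks most similar to the question."""
    query = Counter(tokenize(question))
    scored = [(cosine_similarity(query, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


def build_prompt(question, chunks, top_k=1):
    """Assemble a prompt that restricts the answer to the retrieved context."""
    context = "\n".join(retrieve(question, chunks, top_k))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical, simplified chunks standing in for text extracted from PDFs.
chunks = [
    "Seismic intensity describes the strength of shaking observed at a given location.",
    "Magnitude describes the size of an earthquake at its source.",
    "A volcanic alert level indicates the area where caution is required around a volcano.",
]
question = "What does a volcanic alert level indicate?"
prompt = build_prompt(question, chunks)
```

Because retrieval here depends entirely on word overlap, how the source documents are split into chunks and how the question is phrased directly determine which context the model sees.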

4. Automated generation of simulation codes
To develop Python code for earthquake cycle simulations in subduction zones and data assimilation, we tackled three tasks: (1) earthquake cycle simulation using a spring-block model, (2) data assimilation with a particle filter for the Lorenz 63 model, and (3) four-dimensional variational data assimilation to estimate frictional properties at plate boundaries. All tasks were ultimately completed, although manual intervention such as equation modification and code structure adjustment was required throughout the process. Our results demonstrated that LLMs significantly improved coding efficiency.
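
As an illustration of task (1), the sketch below simulates stick-slip cycles of a single spring-block under steady plate loading. It is not the code produced in the hackathon, and it assumes a much simpler friction law than may have been used there: a constant static strength and a constant dynamic strength, with slip treated as instantaneous. All parameter names and values are hypothetical and dimensionless.

```python
def simulate_spring_block(stiffness, plate_velocity, static_strength,
                          dynamic_strength, dt, n_steps):
    """Quasi-static single spring-block with static/dynamic friction.

    Returns a list of (event_time, slip) tuples, one per slip event.
    """
    if not static_strength > dynamic_strength > 0.0:
        raise ValueError("require static_strength > dynamic_strength > 0")
    block_position = 0.0
    events = []
    for step in range(1, n_steps + 1):
        time = step * dt
        # Steady loading: the driving plate moves at constant velocity.
        plate_position = plate_velocity * time
        force = stiffness * (plate_position - block_position)
        if force >= static_strength:
            # Undamped sliding under constant dynamic friction overshoots
            # symmetrically: the spring force ends at 2 * dynamic - force.
            slip = 2.0 * (force - dynamic_strength) / stiffness
            block_position += slip
            events.append((time, slip))
    return events


# Hypothetical dimensionless parameters chosen only for illustration.
events = simulate_spring_block(stiffness=1.0, plate_velocity=1.0,
                               static_strength=1.0, dynamic_strength=0.7,
                               dt=0.01, n_steps=1000)
intervals = [t2 - t1 for (t1, _), (t2, _) in zip(events, events[1:])]
```

Under these assumptions the recurrence interval is approximately 2(F_s - F_d)/(k v), where F_s and F_d are the static and dynamic strengths, k is the spring stiffness, and v is the plate velocity; the simulated intervals match this value to within the time-step resolution.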

Through this hackathon, we were able to outline the direction for the use of LLMs in earthquake research. As LLM technology continues to evolve, its applications in research domains are expected to expand further.