Research on improving the effectiveness of data ubiquitous on the Internet
~Topic analysis of SNS information during disasters using BERT~

Ryoma Tachibana; Ryohei Suzuki; Masahiro Ooi; Kazuhiro Kikuma

5:15 PM - 7:15 PM

[STT41-P08] Research on improving the effectiveness of data ubiquitous on the Internet
~Topic analysis of SNS information during disasters using BERT~

*Ryoma Tachibana¹, Ryohei Suzuki¹, Masahiro Ooi², Kazuhiro Kikuma¹ (1.Nihon University College of Engineering, 2.National Research Institute for Earth Science and Disaster Resilience)

Keywords: Training Scenarios, allocation information, BERT

1. Introduction
We aim to enhance information value by aggregating vast internet data (sensor, activity logs, location, SNS) and extracting useful insights. As an example, we are developing an automated method for generating disaster training scenarios for earthquakes and tsunamis.

2. Proposed Method
2.1 Issues in Scenario Creation
Creating training scenarios for tabletop exercises requires a significant amount of time and necessitates creators with expertise in predicting "damage situations" and "social impacts" based on regional characteristics. Due to these challenges, regularly creating new training scenarios is difficult (Kubo et al., 2020).
To address this issue, our research proposes a method for automatically generating training scenarios using generative AI based on a disaster information database.
2.2 Information Collection and Topic Classification
SNS contains a wide variety of information, making it challenging to efficiently extract useful disaster-related information. Therefore, this paper proposes a method utilizing a BERT model to classify collected information by topic.
The proposed method consists of the following steps:
(1) Collecting SNS data
(2) Creating a training corpus
(3) Training a classifier using BERT
Figure 1 illustrates the overall flow of the proposed method. In this study, we focus on classifying posted data using BERT (highlighted in red). Ultimately, we aim to create a disaster information database that organizes classified data in chronological order, enabling an understanding of human movements and disaster progression.
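The chronological organization described above could be sketched as follows. This is only a minimal illustration, not the authors' database design: the `ClassifiedPost` record, its field names, and the `build_timeline` helper are hypothetical, and the sample texts are placeholders rather than real posts.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ClassifiedPost:
    """Hypothetical record for one post after topic classification."""
    posted_at: datetime  # assumed field: time the post was published
    topic: str           # topic label predicted by the classifier
    text: str            # body of the post


def build_timeline(posts):
    """Group posts by topic, with each group sorted from oldest to newest."""
    timeline = defaultdict(list)
    # Sorting once up front keeps every per-topic list in chronological order.
    for post in sorted(posts, key=lambda p: p.posted_at):
        timeline[post.topic].append(post)
    return dict(timeline)


# Placeholder posts, deliberately out of order.
posts = [
    ClassifiedPost(datetime(2024, 1, 1, 16, 30), "damage", "placeholder post B"),
    ClassifiedPost(datetime(2024, 1, 1, 16, 12), "earthquake", "placeholder post A"),
    ClassifiedPost(datetime(2024, 1, 1, 16, 20), "damage", "placeholder post C"),
]
timeline = build_timeline(posts)
```

Grouping by topic first and ordering by time within each group is one possible layout; a single global timeline with the topic as a column would serve equally well for following how a disaster unfolds.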

3. Experiment and Evaluation
The proposed method was applied to the Noto Peninsula earthquake that occurred on January 1, 2024. For SNS data collection, we gathered 2,500 posts containing at least one of the keywords: "earthquake," "Noto Peninsula earthquake," "tsunami," "collapse," "landslide," or "disaster," from January 1 to January 8, 2024.
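The collection criteria above (at least one keyword, posted between January 1 and January 8, 2024) can be expressed as a simple filter. The sketch below is an assumption-laden illustration: the helper name `matches_criteria` is hypothetical, the keywords are the English renderings given in this abstract (the actual search terms were presumably in the language of the posts), and case-insensitive substring matching is our choice, not a documented detail of the study.

```python
from datetime import date

# English renderings of the keywords listed in the abstract (assumption:
# the original search used the equivalent terms in the posts' language).
KEYWORDS = (
    "earthquake",
    "Noto Peninsula earthquake",
    "tsunami",
    "collapse",
    "landslide",
    "disaster",
)
START, END = date(2024, 1, 1), date(2024, 1, 8)


def matches_criteria(text, posted_on):
    """Return True if the post falls in the window and contains any keyword."""
    in_window = START <= posted_on <= END
    lowered = text.lower()
    has_keyword = any(keyword.lower() in lowered for keyword in KEYWORDS)
    return in_window and has_keyword
```

For example, `matches_criteria("Tsunami warning issued", date(2024, 1, 1))` is true, while the same text dated January 9 is rejected because it falls outside the collection window.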
We manually assigned topics to these posts to create a training corpus. The topics were determined subjectively by reading the content of the posts, resulting in ten different topics. We then trained the model using labeled data and evaluated its classification accuracy using test data. The evaluation method involved measuring topic-wise accuracy to assess performance.
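Topic-wise accuracy can be computed as in the following sketch. It assumes that "accuracy per topic" means the share of test posts with a given true topic that received that same label (equivalent to per-class recall); the function name `topic_accuracy` and the toy labels are illustrative only and are not taken from the study's data.

```python
from collections import Counter


def topic_accuracy(true_labels, predicted_labels):
    """Return {topic: fraction of posts with that true topic predicted correctly}."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    # Number of test posts per true topic.
    totals = Counter(true_labels)
    # Number of those posts whose predicted topic matches the true topic.
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {topic: correct[topic] / n for topic, n in totals.items()}


# Toy labels for illustration only.
y_true = ["prayers", "prayers", "support", "support"]
y_pred = ["prayers", "prayers", "support", "damage"]
scores = topic_accuracy(y_true, y_pred)
```

Reporting the score per topic rather than as a single number makes weak classes visible even when the frequent classes dominate the overall figure.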

4. Results and Discussion
We trained the model on 1,999 posts and tested it on the remaining 501, of which 406 were classified correctly. Table 1 shows the accuracy per topic.
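As a quick check on these counts, the overall accuracy and the size of the split follow directly from the reported numbers (the percentages below are derived here and are not stated in the original abstract):

```latex
\[
\text{accuracy}_{\text{overall}} = \frac{406}{501} \approx 0.810,
\qquad
1{,}999 + 501 = 2{,}500,
\qquad
\frac{1{,}999}{2{,}500} \approx 0.80
\]
```

That is, about 81.0% of the test posts were classified correctly, and the training and test sets together account for all 2,500 collected posts in a split of roughly 80:20.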
Topics such as "opinions" (86.6%) and "prayers" (94.5%) achieved high accuracy, which we attribute to distinctive words and patterns that the model learns easily. The "earthquake" topic (80.6%) included structured phrases such as "Seismic intensity 6 earthquake in [location]," which suggests that the model recognized "seismic intensity" as a key term for earthquake-related posts. In contrast, "support" (27.2%) and "landslide" (0.0%) performed poorly because of ambiguous contexts, vocabulary shared with other topics, and limited training data.
Figure 2 illustrates some misclassified cases.
Posts in the "damage" topic contained words such as "tsunami," "fire," and "collapse," which also appear in other topics and made them hard to separate. Similarly, "landslide" posts were sometimes confused with general weather reports.
These results indicate the need for expanded training data, better distinguishing keywords, and improved context recognition to enhance classification performance.

5. Conclusion
We proposed a BERT-based method for classifying SNS disaster data, a step that is essential for automatic training scenario generation. Some topics had low accuracy and will require data balancing and better feature extraction. Filtering of misinformation is also needed.
Future work includes enhancing classification accuracy, structuring data chronologically, and refining automatic scenario generation with generative AI.

Presentation information

[S-TT41] Seismic monitoring and processing system
