Japan Geoscience Union Meeting 2025

Presentation information

[J] Oral

A (Atmospheric and Hydrospheric Sciences) » A-CG Complex & General

[A-CG52] Science in the Arctic Region

Thu. May 29, 2025 1:45 PM - 3:15 PM Exhibition Hall Special Setting (3) (Exhibition Hall 7&8, Makuhari Messe)

Convener: Tatsuya Kawakami (Hokkaido University), Masatake Hori (University of Tokyo, Atmosphere Ocean Research Institute), Kazuki Yanagiya (Japan Aerospace Exploration Agency), Yota Sato (Japan Agency for Marine-Earth Science and Technology); Chairperson: Masatake Hori (University of Tokyo, Atmosphere Ocean Research Institute), Tatsuya Kawakami (Hokkaido University)

1:45 PM - 2:00 PM

[ACG52-13] Predicting the categorized Siberian wildfire intensities by machine learning

*Teppei J Yasunari1, Ichigaku Takigawa2,3, Kyu-Myong Kim4, Akira Takeshima5 (1.Arctic Research Center, Hokkaido University, Sapporo, Japan, 2.Institute for Liberal Arts and Sciences, Kyoto University, Kyoto, Japan, 3.Institute for Chemical Reaction Design and Discovery, Hokkaido University, Sapporo, Japan, 4.NASA Goddard Space Flight Center, MD, USA, 5.Center for Environmental Remote Sensing, Chiba University, Chiba, Japan)

Keywords:wildfire, forest fire, machine learning, Siberia, Arctic, NASA

In 2021, we presented results of predicting Siberian wildfire counts from area-averaged NASA satellite and MERRA-2 data over the Republic of Sakha with a machine learning (ML) method (Yasunari et al., 2021, JpGU Meeting 2021, https://bit.ly/3WtRTii). However, the accuracy of the predicted fire counts was not satisfactory.

In this presentation, we change the primary target (objective variable) from the area-averaged fire counts to categorized fire intensity data, while still using the fire count data for comparison to confirm the improvement over the previous study. The Aqua and Terra MODIS monthly mean fire pixel count (FPC; https://feer.gsfc.nasa.gov/) data averaged over the Republic of Sakha were separated into seven fire intensity categories: no fire and six levels of fire intensity. We used natural breaks from the Fisher-Jenks algorithm (https://github.com/mthh/jenkspy) to group the fire count data into the six fire levels. As in Yasunari et al. (2021), we used the same 170 explanatory variables, including lagged data for up to six months for each variable (1190 variables in total). This time, we used two objective variables (modfirec: area-averaged monthly mean fire count data; modfirec_gr: fire intensity data categorized from modfirec). Linear and ensemble models from the scikit-learn library in Python (https://scikit-learn.org/stable/) were used (linear models: Linear, Lasso, and Ridge; ensemble models: ExtraTreesRegressor, RandomForestRegressor, and GradientBoostingRegressor). Within the time series from January 2003 to December 2019, we first used the data from July 2003 to December 2014 for training and from January 2015 to December 2019 for testing; training starts in July 2003 because the 6-month lagged data are included.
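The categorization and lagging steps above can be sketched as follows. This is a minimal illustration with synthetic fire counts and a single dummy predictor, not the study's data; the actual class breaks come from jenkspy's Fisher-Jenks natural breaks (shown in a comment), while illustrative break values are hard-coded here so the sketch is self-contained.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the monthly mean fire pixel counts (FPC)
# averaged over the Republic of Sakha, Jan 2003 - Dec 2019 (204 months).
rng = np.random.default_rng(0)
fpc = pd.Series(rng.gamma(0.5, 200.0, size=204),
                index=pd.date_range("2003-01", periods=204, freq="MS"),
                name="modfirec")
fpc[fpc < 1] = 0.0  # treat very small values as no-fire months

# In the study, the six fire-level breaks come from the Fisher-Jenks
# natural-breaks algorithm applied to the fire months, e.g.:
#   import jenkspy
#   breaks = jenkspy.jenks_breaks(fpc[fpc > 0].values, n_classes=6)
# Illustrative (hypothetical) breaks are used here instead.
breaks = [1.0, 50.0, 150.0, 300.0, 600.0, 1200.0, float(fpc.max())]

# Category 0 = no fire; categories 1..6 = increasing fire intensity.
modfirec_gr = pd.Series(
    np.where(fpc == 0, 0, np.digitize(fpc, breaks[1:-1]) + 1),
    index=fpc.index, name="modfirec_gr")

# Lagged copies (0 to 6 months back) of one dummy explanatory variable;
# applying this to all 170 variables gives the 1190-variable expansion.
t2m = pd.Series(rng.normal(size=204), index=fpc.index, name="t2m")
lagged = pd.concat({f"t2m_lag{k}": t2m.shift(k) for k in range(7)}, axis=1)
```

Note that 170 variables with lags 0 through 6 gives 170 × 7 = 1190 columns, matching the total stated above.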

Although performance on the fire count data (modfirec) was poor (best R2 = 0.47 and 0.59 in the linear and ensemble models, Lasso and RandomForestRegressor, respectively), broadly similar to Yasunari et al. (2021), performance on the categorized fire intensity (modfirec_gr) was much better (best R2 = 0.58 and 0.76 in the linear and ensemble models). In the previous presentation (Yasunari et al., 2021), better results were obtained only when small fire count data were removed from the baseline fire count data (modfirec). This time, much better model performance was obtained for the categorized fire data even without removing those data. In either case, the ensemble models performed much better.
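The six-model comparison can be sketched as below. The data here are synthetic stand-ins (far fewer columns than the real 1190 lagged variables, and an artificial 0-6 target), and the chronological train/test split mimics the study's split of training through December 2014 and testing on 2015-2019; R2 values from this sketch are not those reported above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.ensemble import (ExtraTreesRegressor, RandomForestRegressor,
                              GradientBoostingRegressor)
from sklearn.metrics import r2_score

# Synthetic monthly samples: Jul 2003 - Dec 2019 (198 months), 50 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(198, 50))
y = np.clip(np.rint(1.5 * X[:, 0] + X[:, 1] + 3
                    + rng.normal(scale=0.5, size=198)), 0, 6)

# Chronological split: last 5 years (60 months, 2015-2019) held out for testing.
n_test = 60
X_tr, X_te, y_tr, y_te = X[:-n_test], X[-n_test:], y[:-n_test], y[-n_test:]

# The three linear and three ensemble regressors named in the abstract.
models = {
    "Linear": LinearRegression(),
    "Lasso": Lasso(alpha=0.1),
    "Ridge": Ridge(alpha=1.0),
    "ExtraTrees": ExtraTreesRegressor(n_estimators=100, random_state=0),
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=0),
    "GradientBoosting": GradientBoostingRegressor(random_state=0),
}
scores = {name: r2_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
          for name, m in models.items()}
```

The best linear and best ensemble R2 can then simply be read off the `scores` dictionary.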

To improve model performance further, we examined how many of the top-contributing feature variables should be retained to obtain the best performance in the ensemble models. For this, we re-separated the time series of monthly mean data averaged over the Republic of Sakha into training data (13.5 years), validation data (1 year), and test data (2 years). We evaluated model performance while varying the number of feature variables from 2 up to the top 15% of the total number of feature variables used in training. Overall, the ensemble models showed good performance (R2) on the validation data (RandomForestRegressor: 0.89; ExtraTreesRegressor: 0.89; GradientBoostingRegressor: 0.93). We then took the best feature selection from each case and refit the ML models with the training data. Finally, these refitted models were applied to the test data to obtain the final model performance. RandomForestRegressor with the 169 most important feature variables (explanatory variables) showed the best performance (R2 = 0.82). Overall, categorizing the fire count data into several fire intensity levels worked better than directly predicting the fire counts. More details will be given on the day of the presentation.
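The top-k feature sweep described above can be sketched as follows, again on synthetic data. Feature importances from an ensemble fit on the training set rank the variables; the number of retained features k is chosen on the validation year; the model is then refit with the selected features and scored once on the held-out test years. The split sizes (162/12/24 months) mimic the study's 13.5-year/1-year/2-year partition, and the importance ranking here uses scikit-learn's built-in `feature_importances_`, which we assume is close to the contribution measure used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Synthetic monthly data: 198 months, 120 features, 5 of them informative.
rng = np.random.default_rng(2)
n, p = 198, 120
X = rng.normal(size=(n, p))
y = X[:, :5] @ np.array([2.0, -1.5, 1.0, 0.8, 0.5]) \
    + rng.normal(scale=0.3, size=n)

# Chronological split: 13.5 years train, 1 year validation, 2 years test.
n_tr, n_val = 162, 12
X_tr, y_tr = X[:n_tr], y[:n_tr]
X_val, y_val = X[n_tr:n_tr + n_val], y[n_tr:n_tr + n_val]
X_te, y_te = X[n_tr + n_val:], y[n_tr + n_val:]

# Rank features by importance from a model fit on the training data only.
base = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
order = np.argsort(base.feature_importances_)[::-1]  # most important first

# Sweep k from 2 up to the top 15% of features, scoring on validation data.
best_k, best_r2 = None, -np.inf
for k in range(2, int(0.15 * p) + 1):
    cols = order[:k]
    m = RandomForestRegressor(n_estimators=200, random_state=0)
    m.fit(X_tr[:, cols], y_tr)
    r2 = r2_score(y_val, m.predict(X_val[:, cols]))
    if r2 > best_r2:
        best_k, best_r2 = k, r2

# Refit with the selected features and score once on the test data.
cols = order[:best_k]
final = RandomForestRegressor(n_estimators=200, random_state=0)
final.fit(X_tr[:, cols], y_tr)
test_r2 = r2_score(y_te, final.predict(X_te[:, cols]))
```

Keeping the test years out of both the importance ranking and the choice of k is what makes `test_r2` an honest estimate of final model performance.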