9:45 AM - 10:00 AM
[ATT30-09] Improving Data Assimilation Using Machine Learning: Insights from the Lorenz-96 Model and Ensemble Kalman Filter
Keywords: Chaotic systems, Data assimilation, Observation bias, Ensemble Kalman Filter, Machine learning
In this study, we conducted an Observing System Simulation Experiment (OSSE) to assess the impact of observation bias on data assimilation performance in forecasting chaotic systems. Specifically, we used the Lorenz-96 model, which exhibits chaotic behavior akin to atmospheric dynamics, in conjunction with the Ensemble Kalman Filter (EnKF). To create observations (Xo), we added an error consisting of Gaussian noise and a constant bias to the true states (Xt). Using the EnKF, we generated analysis data (Xa) by combining Xo with initial forecast data. We iterated the OSSE by feeding Xa back into the Lorenz-96 model to produce forecasts (Xf) beyond the initial state. The results of this EnKF experiment served as both input and benchmark for the subsequent machine learning experiment.
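The OSSE cycle described above can be sketched in numpy. The state dimension, forcing F = 8, identity observation operator, and the stochastic (perturbed-observation) EnKF variant are illustrative assumptions, not details given in the abstract:

```python
import numpy as np

def lorenz96(x, F=8.0):
    """Lorenz-96 tendencies: dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt=0.05, F=8.0):
    """Advance the model one step with fourth-order Runge-Kutta."""
    k1 = lorenz96(x, F)
    k2 = lorenz96(x + 0.5 * dt * k1, F)
    k3 = lorenz96(x + 0.5 * dt * k2, F)
    k4 = lorenz96(x + dt * k3, F)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def make_observation(xt, sigma=1.0, bias=0.0, rng=None):
    """Xo = Xt + Gaussian noise + constant bias (the biased-observation setup)."""
    if rng is None:
        rng = np.random.default_rng(0)
    return xt + rng.normal(0.0, sigma, xt.shape) + bias

def enkf_analysis(Xf_ens, xo, sigma=1.0, rng=None):
    """Stochastic EnKF update with H = I: Xa = Xf + K (yo_perturbed - Xf)."""
    if rng is None:
        rng = np.random.default_rng(1)
    n_ens, n = Xf_ens.shape
    Pf = np.cov(Xf_ens, rowvar=False)        # forecast error covariance
    R = sigma**2 * np.eye(n)                 # observation error covariance
    K = Pf @ np.linalg.inv(Pf + R)           # Kalman gain (H = I)
    Yo = xo + rng.normal(0.0, sigma, (n_ens, n))  # perturbed observations
    return Xf_ens + (Yo - Xf_ens) @ K.T
```

Each analysis ensemble would then be propagated with `rk4_step` to produce the next Xf, closing the OSSE loop.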
The datasets were organized to facilitate analysis with machine learning (ML) techniques. We divided the data into training and testing sets to evaluate model performance. This study employed 15 ML algorithms, including linear regression, tree regression, support vector machines, ensemble methods, and neural networks. For training, we used Xf and Xo, comprising past consecutive time steps, as inputs and Xa as the target; Xt was used as the testing target. To gauge the models' reliability and accuracy, we assessed them using R-squared and root mean square error (RMSE).
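A minimal sketch of this dataset organization, assuming lagged Xf/Xo windows as features (the lag count is a guess) and ordinary least squares as the linear-regression member of the model suite:

```python
import numpy as np

def build_dataset(Xf, Xo, Xa, n_lags=3):
    """Stack current and past Xf/Xo values as features; Xa is the training target.
    Xf, Xo, Xa: arrays of shape (T, n) -- time series of forecasts, observations,
    and EnKF analyses (names follow the abstract; n_lags is an assumption)."""
    T = Xf.shape[0]
    feats, targets = [], []
    for t in range(n_lags, T):
        window = np.concatenate([Xf[t - n_lags:t + 1].ravel(),
                                 Xo[t - n_lags:t + 1].ravel()])
        feats.append(window)
        targets.append(Xa[t])
    return np.asarray(feats), np.asarray(targets)

def train_test_split(X, y, test_frac=0.2):
    """Chronological split: hold out the most recent fraction for testing."""
    cut = int(len(X) * (1.0 - test_frac))
    return X[:cut], X[cut:], y[:cut], y[cut:]

def fit_linear(Xtr, ytr):
    """Ordinary-least-squares fit with an intercept column."""
    A = np.hstack([Xtr, np.ones((len(Xtr), 1))])
    W, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    return W

def predict_linear(W, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ W
```

The other 14 models would consume the same feature matrix, which keeps the comparison across algorithms like-for-like.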
The study found that data assimilation accuracy and model fit varied across different bias values (0, 1, 2, 3, 5, 7). Lower RMSE values indicated better agreement, while higher R-squared scores indicated stronger relationships between predictors and targets. Although RMSE values were similar among the evaluated models, R-squared scores varied. The EnKF performed best in terms of R-squared at bias values 0 and 1, while the ensemble bagged-tree model outperformed all other models at bias values 3, 5, and 7. The EnKF's R-squared scores were consistently the lowest as the observation bias increased. Conversely, ML models surpassed the EnKF, notably at higher observation bias. This improvement is likely due to the ML models' ability to use more inputs during training, including past data from both forecasts and observations, which enables them to capture hidden dynamics in the temporal series that are not well extracted by the linear EnKF update. As a result, ML models could identify complex patterns and relationships in the data, leading to a more accurate estimation of the present state.
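The two evaluation metrics can be written out directly; these are the standard definitions, not code from the study:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error: lower values mean closer agreement with truth."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.
    1.0 is a perfect fit; 0.0 matches a constant mean predictor."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot
```

Because R-squared normalizes by the variance of the truth, it can separate models whose raw RMSE values are similar, which matches the pattern reported above.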
In chaotic systems, where slight variations in initial conditions lead to significant differences in future states, the richer input space available to ML models gives them an advantage over the EnKF. In conclusion, the superior performance of ML models relative to the EnKF is particularly notable for chaotic systems with higher observation bias.