9:45 AM - 10:00 AM
[ATT30-09] Improving Data Assimilation Using Machine Learning: Insights from the Lorenz-96 Model and Ensemble Kalman Filter
Keywords: Chaotic systems, Data assimilation, Observation bias, Ensemble Kalman Filter, Machine learning
In this study, we conducted an Observing System Simulation Experiment (OSSE) to assess the impact of observation bias on data assimilation performance in forecasting chaotic systems. Specifically, we used the Lorenz-96 model, which exhibits chaotic behavior akin to atmospheric dynamics, in conjunction with the Ensemble Kalman Filter (EnKF). To create observations (Xo), we added an error consisting of Gaussian noise and a constant bias to the true states (Xt). Using the EnKF, we generated analysis data (Xa) by combining Xo with initial forecast data. We iterated the OSSE by feeding Xa back into the Lorenz-96 model to produce forecasts (Xf) beyond the initial state. The results of this EnKF experiment served as both input and benchmark for the subsequent machine learning experiment.
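The OSSE cycle described above can be sketched in numpy. The state dimension, forcing F = 8, identity observation operator, and the stochastic (perturbed-observation) EnKF variant are illustrative assumptions, not details given in the abstract:

```python
import numpy as np

def lorenz96(x, F=8.0):
    """Lorenz-96 tendencies: dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt=0.05, F=8.0):
    """Advance the model one step with fourth-order Runge-Kutta."""
    k1 = lorenz96(x, F)
    k2 = lorenz96(x + 0.5 * dt * k1, F)
    k3 = lorenz96(x + 0.5 * dt * k2, F)
    k4 = lorenz96(x + dt * k3, F)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def make_observation(xt, sigma=1.0, bias=0.0, rng=None):
    """Xo = Xt + Gaussian noise + constant bias (the biased-observation setup)."""
    if rng is None:
        rng = np.random.default_rng(0)
    return xt + rng.normal(0.0, sigma, xt.shape) + bias

def enkf_analysis(Xf_ens, xo, sigma=1.0, rng=None):
    """Stochastic EnKF update with H = I: Xa = Xf + K (yo_perturbed - Xf)."""
    if rng is None:
        rng = np.random.default_rng(1)
    n_ens, n = Xf_ens.shape
    Pf = np.cov(Xf_ens, rowvar=False)        # forecast error covariance
    R = sigma**2 * np.eye(n)                 # observation error covariance
    K = Pf @ np.linalg.inv(Pf + R)           # Kalman gain (H = I)
    Yo = xo + rng.normal(0.0, sigma, (n_ens, n))  # perturbed observations
    return Xf_ens + (Yo - Xf_ens) @ K.T
```

Each analysis ensemble would then be propagated with `rk4_step` to produce the next Xf, closing the OSSE loop.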
The datasets were organized to facilitate analysis with machine learning (ML) techniques. We divided the data into training and testing sets to evaluate model performance. This study employed 15 ML algorithms, including linear regression, tree regression, support vector machines, ensemble methods, and neural networks. For training, we used Xf and Xo, comprising past consecutive time steps, as inputs and Xa as the target; Xt was used as the testing target. To gauge the models' reliability and accuracy, we assessed them using R-squared and root mean square error (RMSE).
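A minimal sketch of this dataset organization, assuming lagged Xf/Xo windows as features (the lag count is a guess) and ordinary least squares as the linear-regression member of the model suite:

```python
import numpy as np

def build_dataset(Xf, Xo, Xa, n_lags=3):
    """Stack current and past Xf/Xo values as features; Xa is the training target.
    Xf, Xo, Xa: arrays of shape (T, n) -- time series of forecasts, observations,
    and EnKF analyses (names follow the abstract; n_lags is an assumption)."""
    T = Xf.shape[0]
    feats, targets = [], []
    for t in range(n_lags, T):
        window = np.concatenate([Xf[t - n_lags:t + 1].ravel(),
                                 Xo[t - n_lags:t + 1].ravel()])
        feats.append(window)
        targets.append(Xa[t])
    return np.asarray(feats), np.asarray(targets)

def train_test_split(X, y, test_frac=0.2):
    """Chronological split: hold out the most recent fraction for testing."""
    cut = int(len(X) * (1.0 - test_frac))
    return X[:cut], X[cut:], y[:cut], y[cut:]

def fit_linear(Xtr, ytr):
    """Ordinary-least-squares fit with an intercept column."""
    A = np.hstack([Xtr, np.ones((len(Xtr), 1))])
    W, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    return W

def predict_linear(W, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ W
```

The other 14 models would consume the same feature matrix, which keeps the comparison across algorithms like-for-like.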
The study found that data assimilation accuracy and model fit varied across different bias values (0, 1, 2, 3, 5, 7). Lower RMSE values indicated better agreement, while higher R-squared scores indicated stronger relationships between predictors and targets. Although RMSE values were similar among the evaluated models, R-squared scores varied. The EnKF performed best in terms of R-squared at bias values 0 and 1, while the ensemble bagged-tree model outperformed all other models at bias values 3, 5, and 7. The EnKF's R-squared scores were consistently the lowest as the observation bias increased. Conversely, ML models surpassed the EnKF, notably at higher observation bias. This improvement is likely due to the ML models' ability to use more inputs during training, including past data from both forecasts and observations, which enables them to capture hidden dynamics in the temporal series that are not well extracted by the linear EnKF update. As a result, ML models could identify complex patterns and relationships in the data, leading to a more accurate estimation of the present state.
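The two evaluation metrics can be written out directly; these are the standard definitions, not code from the study:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error: lower values mean closer agreement with truth."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.
    1.0 is a perfect fit; 0.0 matches a constant mean predictor."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot
```

Because R-squared normalizes by the variance of the truth, it can separate models whose raw RMSE values are similar, which matches the pattern reported above.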
In chaotic systems, where slight variations in initial conditions lead to significant differences in future states, the richer input space available to ML models gives them an advantage over the EnKF. In conclusion, the superior performance of ML models relative to the EnKF is particularly notable for chaotic systems with higher observation bias.