Verifiable Reinforcement Learning Framework by Incremental Estimation of Environmental Model

Kento Nagata

6:50 PM - 7:10 PM

[2E6-GS-8-05] Verifiable Reinforcement Learning Framework by Incremental Estimation of Environmental Model

~Achieving Safe AI via Understanding Acquired Policy~

〇Kento Nagata¹, Sachiyo Arai¹ (1. Chiba University)

Keywords: Reinforcement Learning, Model Estimation, Incremental Estimation, Safe AI

Control tasks for automobiles, plants, and other applications are generally handled with control theory, in which the environment is described by a mathematical model. Although such models are highly readable and reliable and come with stability guarantees, they are often constructed by first-principles modeling, such as equations of motion, which limits their application to tasks involving nonlinearity and instability, such as autonomous drone flight. In contrast, reinforcement learning is increasingly applied to real-world problems because it can provide control policies without requiring an environmental model. However, since these policies exist only as neural network weights, it is difficult to guarantee their rationality and stability. We therefore propose a method that explicitly estimates an environmental model from the trajectories of states and actions obtained during the trial-and-error process of reinforcement learning, with the aim of making deep reinforcement learning policies interpretable and their stability verifiable. The results show that applying system identification yields an estimate of the environment as a linear model, but the analysis and methods need to be improved in order to construct a more interpretable model.
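As an illustration of the idea of estimating a linear environmental model from reinforcement learning trajectories, the following is a minimal sketch and not the authors' implementation. It assumes a discrete-time model x_{t+1} = A x_t + B u_t, fits A and B by least squares, and refines the estimate as each new batch of trajectories arrives by accumulating the normal equations. The class name `IncrementalLinearModel`, the helper `spectral_radius`, and the synthetic system `A_true`, `B_true` are all hypothetical and chosen only for this example.

```python
import numpy as np


class IncrementalLinearModel:
    """Least-squares estimate of x_{t+1} = A x_t + B u_t, updated batch by batch.

    Hypothetical helper for illustration; the abstract does not specify
    the estimator that the authors actually use.
    """

    def __init__(self, state_dim, action_dim):
        d = state_dim + action_dim
        self.n = state_dim
        self.ZtZ = np.zeros((d, d))          # running sum of Z^T Z
        self.ZtY = np.zeros((d, state_dim))  # running sum of Z^T Y

    def update(self, states, actions, next_states):
        """Fold one batch of (state, action, next state) rows into the sums."""
        Z = np.hstack([np.atleast_2d(states), np.atleast_2d(actions)])
        Y = np.atleast_2d(next_states)
        self.ZtZ += Z.T @ Z
        self.ZtY += Z.T @ Y

    def estimate(self):
        """Solve the accumulated normal equations and return (A, B)."""
        theta, *_ = np.linalg.lstsq(self.ZtZ, self.ZtY, rcond=None)
        return theta[: self.n].T, theta[self.n :].T


def spectral_radius(M):
    """Largest eigenvalue magnitude; below 1 means x_{t+1} = M x_t is stable."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))


# Synthetic stand-in for trajectories collected during trial-and-error learning.
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])  # assumed ground-truth dynamics
B_true = np.array([[0.0], [0.5]])

model = IncrementalLinearModel(state_dim=2, action_dim=1)
for episode in range(5):
    x = rng.normal(size=2)
    xs, us, ys = [], [], []
    for t in range(20):
        u = rng.normal(size=1)       # random exploratory action
        y = A_true @ x + B_true @ u  # noise-free transition
        xs.append(x)
        us.append(u)
        ys.append(y)
        x = y
    model.update(np.array(xs), np.array(us), np.array(ys))  # one batch per episode

A_hat, B_hat = model.estimate()
print("estimated A:\n", A_hat)
print("estimated B:\n", B_hat)
print("spectral radius of A:", spectral_radius(A_hat))
```

Once A and B are explicit, standard linear-systems tools apply. For example, if a learned policy could be approximated by a linear feedback u = K x (an additional assumption), the spectral radius of A + B K would indicate whether the closed loop is stable. Real trajectories are noisy and often nonlinear, so a linear fit of this kind is only a first approximation.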

Presentation information

[2E6-GS-8] Robot and real worlds
