Reward-oriented environment inference on reinforcement learning

Kazuki Takahashi

10:40 AM - 11:00 AM

[4E1-GS-2-03] Reward-oriented environment inference on reinforcement learning

〇Kazuki Takahashi¹, Tomoki Fukai², Yutaka Sakai³, Takashi Takekawa¹ (1. Kogakuin University, 2. Okinawa Institute of Science and Technology Graduate University, 3. Tamagawa University Brain Science Institute)

Keywords: reinforcement learning, variational Bayes, dimensionality reduction

Advances in deep neural networks have made it possible to exceed human performance on simulated reinforcement learning problems. For real-world problems, however, issues such as explainability and online learning remain. Because real-world environments include observables that are independent of reward, the number of apparent observable patterns becomes so large that it is difficult to explain the operating principles of the AI. In addition, achieving high performance requires a large amount of training data, which makes online learning difficult. In this study, we therefore attempt online policy learning in an environment that generates a huge number of observable patterns by combining a reward-dependent environment with a reward-independent one. The proposed learning method consists of two parts: action decisions that control exploration and exploitation by sampling, and reward-oriented environment inference that reduces the large number of observable patterns to a concise set of states. The results show that the reward-oriented environment inference model recovers the reward-dependent environment from a large number of observable patterns. Furthermore, combining the proposed model with the action decision method increased the speed at which the optimal policy was learned.
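The following is a minimal illustrative sketch, not the authors' method. It assumes a toy environment in which each observation is a pair of a reward-dependent component and a reward-independent component, and it uses Beta-Bernoulli Thompson sampling as one common way to balance exploration and exploitation by sampling. The function name `run_thompson`, the reward probabilities (0.9 and 0.1), and the environment sizes are all hypothetical. The sketch does not perform the variational Bayes inference described in the abstract: the `reduce_state` flag simply assumes the reward-dependent component is already known, to show how many fewer states an agent must track once reward-independent observables are discarded.

```python
import random


def run_thompson(steps, n_dep, n_indep, reduce_state, seed=0):
    """Run Beta-Bernoulli Thompson sampling on a toy environment.

    Returns (total_reward, number_of_distinct_states_tracked).
    All names and numbers here are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_actions = n_dep  # assumption: the rewarded action equals the reward-dependent state
    posterior = {}     # state key -> one [alpha, beta] pair per action
    total = 0
    for _ in range(steps):
        s_dep = rng.randrange(n_dep)      # reward-dependent component
        s_indep = rng.randrange(n_indep)  # reward-independent component
        # Either track the full observation or only the reward-dependent part.
        key = s_dep if reduce_state else (s_dep, s_indep)
        params = posterior.setdefault(
            key, [[1.0, 1.0] for _ in range(n_actions)]
        )
        # Draw one plausible reward rate per action, then pick the best draw.
        samples = [rng.betavariate(a, b) for a, b in params]
        action = max(range(n_actions), key=lambda i: samples[i])
        p_reward = 0.9 if action == s_dep else 0.1
        reward = 1 if rng.random() < p_reward else 0
        # Update the Beta posterior of the chosen action.
        params[action][0] += reward
        params[action][1] += 1 - reward
        total += reward
    return total, len(posterior)


full_reward, full_states = run_thompson(2000, 2, 200, reduce_state=False)
reduced_reward, reduced_states = run_thompson(2000, 2, 200, reduce_state=True)
print("states tracked (full, reduced):", full_states, reduced_states)
print("total reward (full, reduced):", full_reward, reduced_reward)
```

With the full observation as the key, the agent keeps a separate posterior for each of up to 2 × 200 observable patterns and sees each pattern only a handful of times. With the reduced key it tracks at most 2 states, so each posterior receives far more data per step.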

Presentation information

[4E1-GS-2] Machine learning: agents
