JSAI2023

Presentation information

Organized Session


[2Q1-OS-27a] New Developments in Reinforcement Learning

Wed. Jun 7, 2023 9:00 AM - 10:40 AM Room Q (601)

Organizers: 太田 宏之, 甲野 佑, 高橋 達二

9:00 AM - 9:20 AM

[2Q1-OS-27a-01] Theory of internal reinforcement learning

Introduction of quality into values

〇Tatsuji Takahashi1,2 (1. Tokyo Denki University, 2. RIKEN Center for Advanced Intelligence Project (AIP))

Keywords: bounded rationality, satisficing, computational rationality

The theory of reinforcement learning (RL) is organized around the optimality principle of maximizing rewards, based on Markov decision processes, dynamic programming, and Monte Carlo methods. In this paper, we propose a simple modification to how the theoretical framework of RL treats rewards. We introduce a notion of "quality" into rewards, which are usually considered purely quantitative (ordered by a total order relation). This is done by transforming rewards with an aspiration level (zero level), which is a threshold for values. This small modification leads to a new definition of subjective (or internal) regret, and from there to the existing Risk-sensitive Satisficing (RS) model, a representation of rational risk attitudes, and the qualitatively high performance of bounded (objective) regret. We review the theoretical framework of our natural (internal) RL, compare it with artificial (external) RL and with Simon's paradigm of bounded rationality and satisficing, answer frequently asked questions, and outline where we go from here. In particular, we discuss the possibilities of modeling societies and economies.
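The abstract does not spell out the reward transformation, so the following is only a minimal sketch under stated assumptions: that the transformation is a plain subtraction of the aspiration level from each reward, and that an RS-style action value weights the shifted mean reward by the fraction of trials spent on that action. The function names `shift_rewards`, `rs_values`, and `greedy_action` are hypothetical and introduced for illustration only.

```python
from typing import Sequence


def shift_rewards(rewards: Sequence[float], aspiration: float) -> list[float]:
    """Re-express rewards relative to the aspiration level (the new zero).

    Assumption: the transformation is a simple subtraction.
    Positive results are "satisfactory", negative ones are not.
    """
    return [r - aspiration for r in rewards]


def rs_values(
    means: Sequence[float], counts: Sequence[int], aspiration: float
) -> list[float]:
    """Assumed RS-style value: trial ratio times (mean reward - aspiration)."""
    total = sum(counts)
    if total == 0:
        # No trials yet: every action gets the same neutral value.
        return [0.0 for _ in means]
    return [(n / total) * (m - aspiration) for m, n in zip(means, counts)]


def greedy_action(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    means = [0.75, 0.25]   # empirical mean reward per action
    counts = [1, 3]        # number of times each action was tried
    aspiration = 0.5       # hypothetical aspiration level
    values = rs_values(means, counts, aspiration)
    print(values, greedy_action(values))
```

Under these assumptions, an action whose mean exceeds the aspiration level gets a positive value and is exploited. When every mean falls below the aspiration level, all values are negative and less-tried actions tend to sit closer to zero, so greedy selection leans toward exploration.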
