10:00 AM - 10:20 AM
[1N1-GS-5-01] Optimization of subjective utility to derive cooperative actions in a prisoner's dilemma environment
Keywords: social dilemma, intrinsic reward, reward shaping
In society, there exist situations called social dilemmas in which either individual interests or the public interest must be given priority. It is known that humans do not always prioritize their individual interests in such situations. Reinforcement learning agents, by contrast, maximize their individual rewards because that is their objective, which is problematic in a social dilemma. To address this, a previously proposed method derives a utility from the reward by evolutionary computation and applies that utility to reinforcement learning, which leads to cooperative behavior in the two-player prisoner's dilemma game, one of the standard models of social dilemmas. However, in that method the form of the utility-deriving function is fixed and only its coefficients are evolved, so it is not clear what kind of function is actually suitable. Therefore, in this study, in order to optimize the function itself, we obtain the weights of a three-layer perceptron, which can represent arbitrary functions, by evolutionary computation, and we investigate whether mutual cooperation emerges and what utility-deriving function is obtained. Simulation experiments show that, regardless of the number of neurons in the middle layer, the evolved functions satisfy a specific relation and generate mutual cooperation in the two-player prisoner's dilemma game.
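For concreteness, the sketch below illustrates the kind of setup the abstract describes: a three-layer perceptron maps rewards to a utility, that utility is used as the learning signal for two simple Q-learning agents in a repeated prisoner's dilemma, and the perceptron's weights are evolved toward mutual cooperation. The payoff values, the choice of inputs (own and opponent payoff), the fitness definition, and the evolutionary loop are all assumptions made for illustration, not the authors' actual implementation.

```python
import numpy as np

# Prisoner's dilemma payoff matrix (row player's payoff); values are illustrative.
# Actions: 0 = cooperate, 1 = defect.
PAYOFF = np.array([[3.0, 0.0],
                   [5.0, 1.0]])

def mlp_utility(weights, own_reward, opp_reward, hidden=4):
    """Three-layer perceptron mapping (own, opponent) rewards to a scalar utility.
    The input features and hidden size are assumptions; the abstract only states
    that a three-layer perceptron with some number of middle-layer neurons is used."""
    n_in = 2
    w1 = weights[:n_in * hidden].reshape(n_in, hidden)
    b1 = weights[n_in * hidden:(n_in + 1) * hidden]
    w2 = weights[(n_in + 1) * hidden:(n_in + 2) * hidden]
    b2 = weights[-1]
    h = np.tanh(np.array([own_reward, opp_reward]) @ w1 + b1)
    return float(h @ w2 + b2)

def play_generation(weights, episodes=200, alpha=0.1, eps=0.1, rng=None):
    """Two stateless Q-learning agents learn from the evolved utility while
    playing a repeated prisoner's dilemma; returns the mutual-cooperation rate."""
    rng = rng if rng is not None else np.random.default_rng(0)
    q = np.zeros((2, 2))  # Q[agent, action]; no state, for simplicity
    coop = 0
    for _ in range(episodes):
        acts = [a if rng.random() > eps else rng.integers(2)
                for a in q.argmax(axis=1)]
        r = [PAYOFF[acts[0], acts[1]], PAYOFF[acts[1], acts[0]]]
        for i in range(2):
            u = mlp_utility(weights, r[i], r[1 - i])  # shaped reward (utility)
            q[i, acts[i]] += alpha * (u - q[i, acts[i]])
        coop += int(acts[0] == 0 and acts[1] == 0)
    return coop / episodes

def evolve(hidden=4, pop=20, gens=30, sigma=0.3, seed=0):
    """Simple truncation-selection evolution of the perceptron weights, selecting
    for mutual cooperation; the fitness definition is an assumption."""
    rng = np.random.default_rng(seed)
    dim = 4 * hidden + 1  # w1 (2*h) + b1 (h) + w2 (h) + b2 (1)
    population = rng.normal(size=(pop, dim))
    for _ in range(gens):
        fitness = np.array([play_generation(w, rng=rng) for w in population])
        parents = population[np.argsort(fitness)[-pop // 2:]]
        children = parents + sigma * rng.normal(size=parents.shape)
        population = np.vstack([parents, children])
    scores = [play_generation(w, rng=rng) for w in population]
    return population[int(np.argmax(scores))]

if __name__ == "__main__":
    best = evolve()
    print("mutual cooperation rate:", play_generation(best))
```

In this toy version the fitness is simply the mutual-cooperation rate; one could also inspect the evolved `mlp_utility` over the payoff range to see what relation the function ends up satisfying, which is the kind of analysis the abstract reports.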