Learning Interpretable Control Policies with Decision Trees via the Cross-Entropy Method

Yukiko Tanaka

10:30 AM - 12:10 PM

[3Rin2-08] Learning Interpretable Control Policies with Decision Trees via the Cross-Entropy Method

〇Yukiko Tanaka^1,2, Takuya Hiraoka^1,2, Yoshimasa Tsuruoka^2,3 (1. NEC, 2. National Institute of Advanced Industrial Science and Technology, 3. The University of Tokyo)

Keywords: Reinforcement Learning, Decision Tree, Cross-Entropy Method, Control Policy

Learning interpretable policies for control problems is important in industrial settings, where safety and maintenance requirements must be met. A common approach to acquiring interpretable policies is to train a decision tree to imitate a black-box policy (e.g., a neural network) that was itself trained to maximize the expected reward in a given environment. However, a decision tree policy obtained by such approximation is suboptimal in the sense that it does not necessarily maximize the expected reward. In this paper, we propose a method that uses the cross-entropy method to learn a decision tree policy that directly maximizes the reward. Our experimental results show that our method acquires interpretable decision tree policies that outperform baseline policies learned by the imitation approach.
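To make the idea of optimizing a decision tree policy directly on reward more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the tree parameterization, the environments, or any hyperparameters, so everything here is an assumption chosen for illustration: a depth-1 tree with one threshold and two leaf actions, a toy one-dimensional task in which a point must be driven to the origin, and a Gaussian sampling distribution refit to the top-scoring candidates at each iteration. The names `tree_policy`, `evaluate`, and `cross_entropy_method` are hypothetical.

```python
import random
import statistics

# Assumed toy task: a point at position x must be driven toward 0.
START_STATES = [-1.0, -0.5, 0.5, 1.0]  # hypothetical fixed initial positions
HORIZON = 20                           # hypothetical episode length


def tree_policy(theta, x):
    """Depth-1 decision tree: one threshold test and two leaf actions."""
    threshold, left_action, right_action = theta
    leaf = left_action if x < threshold else right_action
    return max(-1.0, min(1.0, leaf))  # clip to the assumed action range [-1, 1]


def evaluate(theta):
    """Average return over the start states; the reward is -|x| per step."""
    total = 0.0
    for x in START_STATES:
        for _ in range(HORIZON):
            x += 0.1 * tree_policy(theta, x)
            total -= abs(x)
    return total / len(START_STATES)


def cross_entropy_method(n_iters=20, pop_size=100, n_elite=10, seed=0):
    """Search tree parameters by repeatedly refitting a Gaussian to elites."""
    rng = random.Random(seed)
    mean = [0.0, 0.0, 0.0]
    std = [1.0, 1.0, 1.0]
    best_theta, best_return = list(mean), evaluate(mean)
    for _ in range(n_iters):
        # Sample candidate trees from the current Gaussian.
        population = [
            [rng.gauss(m, s) for m, s in zip(mean, std)]
            for _ in range(pop_size)
        ]
        # Rank candidates by return and keep the highest-scoring ones.
        elites = sorted(population, key=evaluate, reverse=True)[:n_elite]
        top_return = evaluate(elites[0])
        if top_return > best_return:
            best_theta, best_return = elites[0], top_return
        # Refit the sampling distribution to the elite set.
        mean = [statistics.fmean(col) for col in zip(*elites)]
        std = [statistics.pstdev(col) + 1e-3 for col in zip(*elites)]
    return best_theta, best_return


best_theta, best_return = cross_entropy_method()
threshold = best_theta[0]
print(f"if x < {threshold:.2f}: action = {tree_policy(best_theta, threshold - 1.0):.2f}")
print(f"else: action = {tree_policy(best_theta, threshold + 1.0):.2f}")
print(f"average return: {best_return:.2f}")
```

The printed rule is the entire policy, which is the sense in which such a tree is interpretable. Unlike the imitation approach, no black-box teacher policy is involved in this sketch: candidates are scored only by the return they obtain in the environment.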

Presentation information

[3Rin2] Interactive Session 1
