10:30 AM - 12:10 PM
[3Rin2-08] Learning Interpretable Control Policies with Decision Trees via the Cross-Entropy Method
Keywords: Reinforcement Learning, Decision Tree, Cross-Entropy Method, Control Policy
Learning interpretable policies for control problems is important for meeting industrial requirements for safety and maintenance. A common approach to acquiring interpretable policies is to train a decision tree that imitates a black-box policy (e.g., one based on a neural network) that has been trained to maximize the expected reward in a given environment. However, such imitation-based decision tree policies are suboptimal in that they do not necessarily maximize the expected reward themselves. In this paper, we propose a method that uses the cross-entropy method to learn a decision tree policy that directly maximizes the expected reward. Our experimental results show that our method can acquire interpretable decision tree policies that outperform baseline policies learned by the imitation approach.
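The abstract does not specify the tree structure, its parameterization, or the environments used, so the following is only a minimal sketch of the general idea under stated assumptions: a hypothetical toy task (a point on [-1, 1] to be driven to the origin), a fixed depth-1 tree (a stump) whose parameters are a threshold plus two continuous leaf values whose signs choose the action, and a standard cross-entropy method with a diagonal Gaussian sampling distribution. All names (`act`, `evaluate`, `cross_entropy_method`, `STEP`, `HORIZON`, `START_STATES`) are illustrative and not taken from the paper. The tree's parameters are scored directly by episode return, with no imitation of a teacher policy.

```python
import random
import statistics

# Hypothetical toy task (not from the paper): a point on [-1, 1] that should be
# driven to the origin. Action +1 or -1 shifts the position by STEP.
STEP = 0.1
HORIZON = 20
START_STATES = [-0.9, -0.5, 0.5, 0.9]


def act(params, x):
    """Depth-1 decision tree (stump): params = [threshold, left_value, right_value].
    The chosen leaf's action is the sign of its value (+1 or -1)."""
    threshold, left_value, right_value = params
    leaf_value = left_value if x <= threshold else right_value
    return 1 if leaf_value >= 0 else -1


def episode_return(params, x0):
    """Sum of per-step rewards -|x| over a fixed horizon, starting from x0."""
    x, total = x0, 0.0
    for _ in range(HORIZON):
        x = min(1.0, max(-1.0, x + STEP * act(params, x)))
        total -= abs(x)
    return total


def evaluate(params):
    """Return averaged over the fixed set of start states (the objective CEM maximizes)."""
    return statistics.mean(episode_return(params, x0) for x0 in START_STATES)


def cross_entropy_method(n_iters=30, pop_size=50, n_elite=10, seed=0):
    """Repeatedly sample tree parameters from a diagonal Gaussian, keep the
    highest-return samples (the elite set), and refit the Gaussian to them."""
    rng = random.Random(seed)
    dim = 3
    mean = [0.0] * dim
    std = [1.0] * dim
    for _ in range(n_iters):
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(pop_size)]
        samples.sort(key=evaluate, reverse=True)  # best return first
        elite = samples[:n_elite]
        mean = [statistics.mean(p[i] for p in elite) for i in range(dim)]
        # A small floor keeps the distribution from collapsing too early.
        std = [statistics.pstdev([p[i] for p in elite]) + 1e-3 for i in range(dim)]
    return mean


if __name__ == "__main__":
    best = cross_entropy_method()
    print("threshold=%.3f" % best[0], "return=%.3f" % evaluate(best))
```

In this sketch the learned stump stays readable: it amounts to "if x <= threshold, push right; otherwise push left." The sign-of-a-continuous-value trick is one simple assumption for letting a Gaussian search over discrete leaf actions; the paper may handle leaf actions differently.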