3:00 PM - 3:20 PM
[4D3-GS-2-04] A Contextual Bandit Algorithm Method Using Decision Trees and UCB
Keywords:Contextual Bandit, Upper Confidence Bound, Decision Tree, Recommendation System, Bootstrapping
Contextual Bandit Algorithm, an online recommendation system, makes sequential recommendations based on contextual information such as user attributes and past purchase history, and estimates rewards that indicate whether or not a product will be purchased. In doing so, the method balances exploration, which recommends various products, and utilization, which recommends products with high expected rewards, to maximize the accumulated reward. To accurately estimate rewards from context, it is essential to assume a model suitable for the situation, but often a nonlinear relationship between context and reward is observed. TreeBootstrap, which uses decision trees to estimate rewards, has been proposed as a method suitable for such situations. However, since TreeBootstrap balances search and utilization by bootstrapping the training data, it may not be able to use the information obtained from search sufficiently, especially in utilization, and the cumulative reward may not be sufficient. In this study, we propose TreeUCB, which balances exploration and exploitation by using a confidence upper bound of expected reward, instead of bootstrap sampling of training data. We demonstrate the effectiveness of the proposed method through experiments using artificial and real data.
Authentication for paper PDF access
A password is required to view paper PDFs. If you are a registered participant, please log on the site from Participant Log In.
You could view the PDF with entering the PDF viewing password bellow.