[3Xin4-63] Analysis of offline model-free reinforcement learning based recommender systems
Keywords: Recommender System, Model-free Reinforcement Learning, Offline Reinforcement Learning, Deep Learning
In recommendation systems, offline reinforcement learning is expected to learn new recommendation policies solely from the log data collected during system operation in order to maximize the long-term user experience. However, this setting suffers from a challenge called distribution shift, in which the recommendation policy newly learned from the collected log data deviates from the distribution of the original recommendation policy that generated the data. In this study, we classify the model-free offline reinforcement learning methods proposed to address distribution shift into three categories, namely Supervised Regularization (SR), Batch Regularization (BR), and Uncertainty Regularization (UR), and compare each method when applied to a recommendation system. In the evaluation experiments, we compared the recommendation accuracy for clicks and purchases using a dataset for session-based recommendation systems, and evaluated the methods based on cumulative rewards obtained through simulation. The results confirmed that the more strongly an approach constrains the learned policy toward the distribution of the dataset, the worse its performance becomes compared to the baseline GRU4Rec.
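The following is a minimal, illustrative sketch of the supervised-regularization (SR) idea described above: the policy objective is penalized when the learned policy drifts away from the items observed in the logged data. It is not the paper's implementation; all names (e.g., `sr_policy_loss`, `alpha`) are hypothetical placeholders, and the loss form is only one common way such a constraint can be expressed.

```python
# Illustrative sketch of a supervised (behavior-cloning style) regularizer
# for an offline RL recommender. Assumes PyTorch; all names are hypothetical.
import torch
import torch.nn.functional as F


def sr_policy_loss(policy_logits, q_values, logged_items, alpha=1.0):
    """Combine an RL objective with a supervised constraint toward the log data.

    policy_logits: (batch, n_items) scores produced by the recommender policy.
    q_values:      (batch, n_items) estimated long-term value of each item.
    logged_items:  (batch,) item ids actually recommended/clicked in the log.
    alpha:         strength of the supervised regularization; larger values
                   constrain the policy more strongly to the dataset distribution.
    """
    log_probs = F.log_softmax(policy_logits, dim=-1)

    # RL term: move probability mass toward items with high estimated value.
    target = F.softmax(q_values, dim=-1).detach()
    rl_term = -(target * log_probs).sum(dim=-1).mean()

    # Supervised regularizer: keep the policy close to the logged behavior.
    bc_term = F.nll_loss(log_probs, logged_items)

    return rl_term + alpha * bc_term
```

Under this kind of formulation, increasing `alpha` corresponds to constraining the policy more tightly to the dataset distribution, which is the trade-off the abstract's experimental comparison examines.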