JSAI2025

Presentation information

General Session

General Session » GS-2 Machine learning

[2S5-GS-2] Machine learning:

Wed. May 28, 2025 3:40 PM - 5:20 PM Room S (Room 701-2)

Chair: 赤木 康紀 (NTT Human Informatics Laboratories, Nippon Telegraph and Telephone Corporation)

4:00 PM - 4:20 PM

[2S5-GS-2-02] Proposal of an Off-Policy Evaluation Method Considering the Entire Reward Distribution in Large Action Spaces.

〇Taishiro Takashi1, Yuta Sakai1, Masayuki Goto1 (1. Waseda University)

Keywords: Counterfactual Machine Learning, Off-Policy Evaluation, Recommender Systems, Off-Policy Evaluation based on the Conjunct Effect Model

In counterfactual machine learning, off-policy evaluation (OPE) aims to estimate the true performance of decision-making policies from logged data. Traditional estimators evaluate policy performance based solely on rewards directly induced by the policy. However, in real-world scenarios such as e-commerce recommender systems, users often take actions (e.g., purchases) outside the recommended list, leading to unaccounted rewards. To address this, estimators must evaluate performance beyond the recommended items. Existing methods struggle as the action space grows, with accuracy deteriorating in large-scale environments. For example, e-commerce platforms may have action spaces ranging from thousands to millions of items, requiring robust methods to maintain accuracy. This study proposes a novel estimator extending the OffCEM framework to mitigate this accuracy degradation, achieving high performance in large action spaces. Theoretical analysis and experiments show that the proposed method outperforms previous estimators, delivering enhanced accuracy in large-scale settings.
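To make the OPE setting in the abstract concrete, the following is a minimal sketch of the standard Inverse Propensity Scoring (IPS) baseline, which reweights logged rewards by the ratio of target-policy to logging-policy action probabilities. It is not the authors' proposed estimator; all function and variable names here are illustrative assumptions.

```python
import numpy as np

def ips_estimate(actions, rewards, pi_logging, pi_target):
    """Estimate the value of pi_target from data logged under pi_logging.

    actions    : (n,) int array of logged action indices
    rewards    : (n,) float array of observed rewards
    pi_logging : (n, num_actions) action probabilities of the logging policy
    pi_target  : (n, num_actions) action probabilities of the target policy
    """
    idx = np.arange(len(actions))
    # Importance weight for each logged interaction.
    w = pi_target[idx, actions] / pi_logging[idx, actions]
    return np.mean(w * rewards)

# Toy usage: 5 actions, uniform logging policy, skewed target policy.
rng = np.random.default_rng(0)
n, num_actions = 10_000, 5
pi_logging = np.full((n, num_actions), 1.0 / num_actions)
pi_target = np.tile(np.array([0.5, 0.2, 0.1, 0.1, 0.1]), (n, 1))
actions = rng.integers(num_actions, size=n)
rewards = rng.binomial(1, 0.1 + 0.1 * actions)  # reward depends on the action
print(ips_estimate(actions, rewards, pi_logging, pi_target))
```

The variance of the importance weights grows with the number of actions, which is the accuracy degradation in large action spaces the abstract refers to; OffCEM-style estimators address it by decomposing the policy effect at a coarser (cluster) level plus a residual term, and the proposed method builds on that framework.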
