[4Xin2-68] Posterior Tracking Algorithm for Multi Objective Thresholding Bandits
Keywords:multi-armed bandits, multi objective optimization, pure exploration problem
Multi-objective thresholding bandits aim to identify all the good arms by repeatedly selecting one arm from given set of K arms at each time to observe multi-dimensional rewards. Here, an arm is said to be good if its expected reward of each dimension is no less than its specified threshold of the dimension. In fixed confidence setting, we show the optimal allocation of each arm drawn which achieves asymptotic lower bound in this problem, and present the expression of generalized likelihood ratio statistics used for the stopping condition. We apply them and the algorithm, named P-Tracking, based on posterior sampling to this problem. We verify the effectiveness of P-Tracking by using artificial data. Through experimental comparison against C-Tracking and D-Tracking, which conduct fixing the expected reward estimation by forced exploration in stead of posterior sampling to explore for correct answer search, and naive multi-dimensional extension of HDoC, which is effective in one-dimensional reward thresholding bandits, we show that P-Tracking identifies all good arms from averagely fewer samples.
Authentication for paper PDF access
A password is required to view paper PDFs. If you are a registered participant, please log on the site from Participant Log In.
You could view the PDF with entering the PDF viewing password bellow.