Proceedings:
Proceedings of the International Symposium on Combinatorial Search, Volume 5
Volume/Issue:
Vol. 5 No. 1 (2012): Fifth Annual Symposium on Combinatorial Search
Track:
Extended Abstracts of Papers Presented Elsewhere
Abstract:
UCT, a state-of-the-art algorithm for Monte Carlo tree search (MCTS), is based on UCB, a policy for the multi-armed bandit problem (MAB) that minimizes the cumulative regret. However, search differs from MAB in that in MCTS it is usually only the final ``arm pull'' that collects a reward, rather than all ``arm pulls''. Therefore, it makes more sense to minimize the simple, rather than cumulative, regret. We introduce policies for multi-armed bandits with lower simple regret than UCB and develop a two-stage scheme (SR+CR) for MCTS which outperforms UCT empirically. We also propose a sampling scheme based on value of information (VOI), achieving an algorithm that empirically outperforms other proposed algorithms.
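The distinction the abstract draws can be illustrated with a minimal sketch. The snippet below is not the paper's SR+CR or VOI policy; it shows plain UCB1 arm selection (the cumulative-regret policy underlying UCT) next to a toy half-greedy sampling scheme that spends a fixed budget and then recommends the empirically best arm, the "simple regret" setting where only the final recommendation is scored. All function names and the 0.5-greedy choice are illustrative assumptions.

```python
import math
import random

def ucb1_select(counts, means, c=math.sqrt(2)):
    """UCB1 index policy: minimizes *cumulative* regret.

    counts[i] / means[i] are the pull count and empirical mean of arm i.
    Unpulled arms are tried first; otherwise the arm maximizing
    mean + c * sqrt(ln(total pulls) / pulls) is chosen.
    """
    total = sum(counts)
    best_val, best_arm = -float("inf"), 0
    for i, (n, m) in enumerate(zip(counts, means)):
        if n == 0:
            return i  # every arm gets at least one pull
        val = m + c * math.sqrt(math.log(total) / n)
        if val > best_val:
            best_val, best_arm = val, i
    return best_arm

def recommend_after_budget(arms, budget, rng):
    """Toy *simple-regret* setting: only the final recommendation matters.

    Samples with a 0.5-greedy rule (half the time uniform exploration,
    half the time the current empirical best), then recommends the arm
    with the highest empirical mean. This is an illustrative stand-in
    for the lower-simple-regret policies described in the abstract.
    """
    k = len(arms)
    counts, sums = [0] * k, [0.0] * k
    for _ in range(budget):
        if rng.random() < 0.5 or 0 in counts:
            i = rng.randrange(k)  # explore uniformly
        else:
            i = max(range(k), key=lambda j: sums[j] / counts[j])  # exploit
        sums[i] += arms[i](rng)  # pull arm i, observe reward
        counts[i] += 1
    # final recommendation: empirically best arm
    return max(range(k), key=lambda j: sums[j] / counts[j] if counts[j] else -1.0)
```

As a usage sketch, Bernoulli arms with well-separated means let the budgeted scheme recover the best arm reliably, while `ucb1_select` can drive a step-by-step UCT-style sampling loop.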
DOI:
10.1609/socs.v3i1.18221