Confidence Backup Updates for Aggregating MDP State Values in Monte-Carlo Tree Search

Authors

Zahy Bnaya,Alon Palombo,Rami Puzis,Ariel Felner

New York University,Ben-Gurion University,Ben-Gurion University,Ben-Gurion University

Proceedings:

Proceedings of the International Symposium on Combinatorial Search, 8

Volume and Issue:

Vol. 8 No. 1 (2015): Eighth Annual Symposium on Combinatorial Search

Track:

Short Papers

Abstract:

Monte-Carlo Tree Search (MCTS) algorithms estimate the value of MDP states based on rewards received by performing multiple random simulations. MCTS algorithms can use different strategies to aggregate these rewards and provide an estimation for the states’ values. The most common aggregation method is to store the mean reward of all simulations. Another common approach stores the best observed reward from each state. These two methods have complementary benefits and drawbacks. In this paper, we show that both of these methods are biased estimators for the real expected value of MDP states. We propose a hybrid approach that uses the best reward for states with low noise, and otherwise uses the mean. Experimental results on the Sailing MDP domain show that our method has a considerable advantage when the rewards are drawn from a noisy distribution.
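The three aggregation strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how noise is measured or where the cutoff lies, so the class name `NodeStats`, the use of the sample standard deviation as the noise measure, and the `noise_threshold` parameter are all assumptions made for illustration.

```python
import statistics


class NodeStats:
    """Rewards observed from simulations through one MDP state (hypothetical helper)."""

    def __init__(self):
        self.rewards = []

    def add(self, reward):
        # Record the reward returned by one simulation.
        self.rewards.append(reward)

    def mean_backup(self):
        # Most common aggregation: mean reward over all simulations.
        return statistics.fmean(self.rewards)

    def max_backup(self):
        # Alternative aggregation: best reward observed so far.
        return max(self.rewards)

    def hybrid_backup(self, noise_threshold=1.0):
        # ASSUMPTION: noise is estimated by the sample standard deviation and
        # compared against a fixed threshold; the paper's actual criterion
        # is not given in the abstract.
        if len(self.rewards) < 2:
            # Too few samples to estimate noise; fall back to the mean.
            return self.mean_backup()
        noise = statistics.stdev(self.rewards)
        if noise <= noise_threshold:
            return self.max_backup()
        return self.mean_backup()


# Low-noise state: the hybrid rule returns the best observed reward.
quiet = NodeStats()
for r in (10.0, 10.2, 9.9):
    quiet.add(r)

# High-noise state: the hybrid rule falls back to the mean.
noisy = NodeStats()
for r in (0.0, 10.0, 20.0):
    noisy.add(r)

print(quiet.hybrid_backup(), noisy.hybrid_backup())
```

With the assumed threshold of 1.0, the first state is treated as low-noise and backs up its best reward, while the second backs up its mean. In a full MCTS implementation this choice would be made at every node during backpropagation.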

DOI:

10.1609/socs.v6i1.18378
