Track: All Papers
Abstract:
RTDP is a recent heuristic-search DP algorithm for solving non-deterministic planning problems with full observability. In relation to other dynamic programming methods, RTDP has two benefits: first, it does not have to evaluate the entire state space in order to deliver an optimal policy, and second, it can often deliver good policies quickly. On the other hand, RTDP's final convergence is slow. In this paper we introduce a labeling scheme into RTDP that speeds up its convergence while retaining its good anytime behavior. The idea is to label a state s as solved when the heuristic values, and thus the greedy policy they define, have converged over s and the states reachable from s with the greedy policy. While the presence of cycles means these labels cannot, in general, be computed in a recursive, bottom-up fashion, we show that they can nonetheless be computed quickly, and that their overhead is compensated by the recomputations they avoid. In addition, when the labeling procedure cannot label a state as solved, it improves the heuristic value of a relevant state. As a result, the number of Labeled RTDP trials needed for convergence, unlike the number of RTDP trials, is bounded. From a practical point of view, Labeled RTDP (LRTDP) converges orders of magnitude faster than RTDP, and also faster than another recent heuristic-search DP algorithm, LAO*. Moreover, LRTDP often converges faster than value iteration, even with the heuristic h = 0, suggesting that LRTDP has quite a general scope.
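
For concreteness, the following is a minimal Python sketch of the labeling scheme as the abstract describes it: each trial performs Bellman updates along a greedy simulation, and on the way back a check-solved pass labels a state as solved when the residuals over it and all its greedy-reachable states are within eps, performing a Bellman update on at least one relevant state otherwise. The SSP interface, function names, and parameters here are illustrative assumptions for this sketch, not code from the paper.

    import random

    class SSP:
        """Assumed goal-based MDP interface; states must be hashable."""
        def actions(self, s): ...       # applicable actions in s
        def transitions(self, s, a): ...  # list of (next_state, probability)
        def cost(self, s, a): ...       # positive action cost
        def is_goal(self, s): ...       # True for goal (absorbing) states

    def lrtdp(mdp, s0, h, eps=1e-3):
        """Labeled RTDP sketch: run trials from s0 until s0 is labeled solved."""
        V = {}              # value function, lazily initialized from heuristic h
        solved = set()      # labeled states

        def value(s):
            return 0.0 if mdp.is_goal(s) else V.setdefault(s, h(s))

        def qvalue(s, a):
            return mdp.cost(s, a) + sum(p * value(t) for t, p in mdp.transitions(s, a))

        def greedy(s):
            return min(mdp.actions(s), key=lambda a: qvalue(s, a))

        def residual(s):
            return abs(value(s) - qvalue(s, greedy(s)))

        def check_solved(s, eps):
            # Label s solved iff residuals are <= eps over s and every state
            # reachable from s under the greedy policy; otherwise push an
            # improved value back onto the relevant states.
            rv, open_, closed = True, [], []
            if s not in solved:
                open_.append(s)
            while open_:
                s = open_.pop()
                closed.append(s)
                if mdp.is_goal(s):
                    continue
                if residual(s) > eps:
                    rv = False
                    continue
                # expand greedy successors not yet labeled or visited
                a = greedy(s)
                for t, p in mdp.transitions(s, a):
                    if p > 0 and t not in solved and t not in open_ and t not in closed:
                        open_.append(t)
            if rv:
                solved.update(closed)       # label the whole greedy envelope
            else:
                while closed:               # Bellman updates on the way back
                    s = closed.pop()
                    if not mdp.is_goal(s):
                        V[s] = qvalue(s, greedy(s))
            return rv

        def trial(s):
            visited = []
            while s not in solved:          # stop the trial at a solved state
                visited.append(s)
                if mdp.is_goal(s):
                    break
                a = greedy(s)
                V[s] = qvalue(s, a)         # Bellman update along the trial
                ts, ps = zip(*mdp.transitions(s, a))
                s = random.choices(ts, weights=ps)[0]
            while visited:                  # try to label states bottom-up
                if not check_solved(visited.pop(), eps):
                    break

        while s0 not in solved:
            trial(s0)
        return V, solved

Note how the two benefits claimed in the abstract show up in the sketch: trials stop as soon as they reach a labeled state, avoiding recomputation, and whenever check_solved fails it still improves the value of some relevant state, which is what bounds the number of trials needed for convergence.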