Learning to Plan Probabilistically

Ron Sun and Chad Sessions

This paper discusses learning probabilistic planning without a priori domain-specific knowledge. Unlike existing reinforcement learning algorithms, which generate only reactive policies, and existing probabilistic planning algorithms, which require a substantial amount of a priori knowledge in order to plan, we devise a two-stage, bottom-up learning-to-plan process: first, reinforcement learning/dynamic programming is applied, without the use of a priori domain-specific knowledge, to acquire a reactive policy; then, explicit plans are extracted from the learned reactive policy. Plan extraction is based on a beam search algorithm that performs temporal projection in a restricted fashion, guided by the value functions resulting from reinforcement learning/dynamic programming.
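To make the two-stage process concrete, the following is a minimal sketch in a hypothetical toy domain (a deterministic one-dimensional chain world with a single goal state): stage one runs value iteration, a standard dynamic-programming method, to obtain a value function; stage two extracts an explicit action sequence by beam search, where the value function restricts which successor states are projected forward. All names, the domain, and the specific beam-search formulation here are illustrative assumptions, not the authors' exact algorithm.

```python
import heapq

# Hypothetical toy domain: states 0..N-1 on a chain, goal at the right end.
N = 6
GOAL = N - 1
ACTIONS = (-1, +1)  # move left, move right

def step(s, a):
    """Deterministic transition, clipped to the state space."""
    return min(max(s + a, 0), N - 1)

def value_iteration(gamma=0.9, iters=100):
    """Stage 1: dynamic programming yields a value function V."""
    V = [0.0] * N
    for _ in range(iters):
        for s in range(N):
            if s == GOAL:
                V[s] = 0.0  # absorbing goal state
            else:
                # Cost of -1 per step encourages short paths to the goal.
                V[s] = max(-1 + gamma * V[step(s, a)] for a in ACTIONS)
    return V

def extract_plan(V, start, beam_width=2, max_len=10):
    """Stage 2: beam search for an explicit open-loop plan.

    Temporal projection is restricted: at each depth only the
    beam_width successor states ranked best by V are expanded.
    """
    beam = [(-V[start], start, [])]  # (negated value, state, plan so far)
    for _ in range(max_len):
        candidates = []
        for _, s, plan in beam:
            if s == GOAL:
                return plan
            for a in ACTIONS:
                s2 = step(s, a)
                candidates.append((-V[s2], s2, plan + [a]))
        beam = heapq.nsmallest(beam_width, candidates)
    return None  # no plan found within the length bound

V = value_iteration()
plan = extract_plan(V, start=0)
print(plan)  # the extracted plan: five rightward moves
```

The value function does the heavy lifting: beam search never enumerates all action sequences, only those whose projected states the learned values rank highly, which is what keeps extraction tractable.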

Copyright © AAAI. All rights reserved.