Abstract:
Recent domain-determinization techniques have been very successful in many probabilistic planning problems. We claim that traditional heuristic MDP algorithms have been unsuccessful due mostly to the lack of efficient heuristics in structured domains. Previous attempts like mGPT used classical planning heuristics to an all-outcome determinization of MDPs without discount factor; yet, discounted optimization is required to solve problems with potential dead-ends. We propose a general extension of classical planning heuristics to goal-oriented discounted MDPs, in order to overcome this flaw. We apply our theoretical analysis to the well-known classical planning heuristics Hmax and Hadd, and prove that the extended Hmax is admissible. We plugged our extended heuristics to popular graph-based (Improved-LAO*, LRTDP, LDFS) and ADD-based (sLAO*, sRTDP) MDP algorithms: experimental evaluations highlight competitive results compared with the winners of previous competitions (FF-Replan, FPG, RFF), and show that our discounted heuristics solve more problems than non-discounted ones, with better criteria values. As for classical planning, the extended Hadd outperforms the extended Hmax on most problems.
DOI:
10.1609/aaai.v25i1.8016