Solving POMDPs Using Quadratically Constrained Linear Programs

Christopher Amato, Daniel S. Bernstein, Shlomo Zilberstein

Developing scalable algorithms for solving partially observable Markov decision processes (POMDPs) is an important challenge. One approach that effectively addresses the intractable memory requirements of POMDP algorithms is based on representing POMDP policies as finite-state controllers. In this paper, we illustrate some fundamental disadvantages of existing techniques that use controllers. We then propose a new approach that formulates the problem as a quadratically constrained linear program (QCLP), which defines an optimal controller of a desired size. This representation allows a wide range of powerful nonlinear programming algorithms to be used to solve POMDPs. Although QCLP optimization techniques guarantee only local optimality, the results we obtain using an existing optimization method show significant solution improvement over the state-of-the-art techniques. The results open up promising research directions for solving large POMDPs using nonlinear programming methods.

Subjects: 15.5 Decision Theory; 3.4 Probabilistic Reasoning

Submitted: Oct 15, 2006

This page is copyrighted by AAAI. All rights reserved. Your use of this site constitutes acceptance of all of AAAI's terms and conditions and privacy policy.