AAAI Publications, Twenty-Fourth International FLAIRS Conference

Hybrid Value Iteration for POMDPs
Diego Maniloff, Piotr Gmytrasiewicz

Last modified: 2011-03-21


The Partially Observable Markov Decision Process (POMDP) provides a rich mathematical model for designing agents that must formulate plans under uncertainty. The curses of dimensionality and history associated with solving POMDPs have led to numerous refinements of the value iteration algorithm. Several exact methods with different pruning strategies have been devised; yet their limited scalability has led research to focus on ways to approximate the optimal value function. One set of approximations relies on point-based value iteration, which maintains a fixed-size value function and is typically executed offline. Another set relies on tree search, which explores the implicit tree defined by the value iteration equation and is typically executed online. In this paper we present a hybrid value iteration algorithm that combines the benefits of point-based value iteration and tree search. Using our approach, a hybrid agent executes tree search online and occasionally updates its offline-computed lower bound on the optimal value function, resulting in improved lookahead and higher obtained reward while meeting real-time constraints. Thus, unlike other hybrid algorithms that use an invariant value function computed offline, our proposed scheme uses information from the real-time tree search process to decide whether to perform a point-based backup online. Keeping track of partial results obtained during online planning makes the computation of point-based backups less prohibitive. We report preliminary results that support our approach.
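The point-based backup that the abstract refers to is the standard operation of point-based value iteration: at a single belief point, one Bellman backup is performed against the current set of alpha-vectors. The following is a minimal sketch of that operation, not the paper's hybrid algorithm itself; the array shapes and function name are assumptions for illustration.

```python
import numpy as np

def point_based_backup(b, Gamma, T, O, R, gamma):
    """One point-based (PBVI-style) Bellman backup at belief b.

    b:     belief over states, shape (S,)
    Gamma: current alpha-vector set, shape (K, S)
    T:     transition probabilities, shape (A, S, S), T[a, s, s']
    O:     observation probabilities, shape (A, S, Z), O[a, s', z]
    R:     immediate rewards, shape (A, S)
    gamma: discount factor
    Returns the alpha-vector maximizing b . alpha among the backed-up vectors.
    """
    A, S, _ = T.shape
    Z = O.shape[2]
    best_alpha, best_val = None, -np.inf
    for a in range(A):
        alpha_a = R[a].astype(float).copy()
        for z in range(Z):
            # g[k, s] = sum_{s'} T[a, s, s'] * O[a, s', z] * Gamma[k, s']
            g = Gamma @ (T[a] * O[a, :, z]).T
            # choose the alpha-vector best for this (a, z) pair at belief b
            alpha_a += gamma * g[np.argmax(g @ b)]
        val = alpha_a @ b
        if val > best_val:
            best_alpha, best_val = alpha_a, val
    return best_alpha
```

In the hybrid scheme described above, backups like this would be triggered selectively during online tree search, reusing partial results from the search; the sketch only shows the backup itself.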
