Proceedings: Book One
Issue: Proceedings of the International Conference on Automated Planning and Scheduling, 27
Track: Main Track
Abstract:
Partially observable Markov decision processes (POMDPs) offer a principled approach to control under uncertainty. However, POMDP solvers generally require rewards to depend only on the state and action. This limitation is unsuitable for information-gathering problems, where rewards are more naturally expressed as functions of belief. In this work, we consider target localization, an information-gathering task where an agent takes actions leading to informative observations and a concentrated belief over possible target locations. By leveraging recent theoretical and algorithmic advances, we investigate offline and online solvers that incorporate belief-dependent rewards. We extend SARSOP — a state-of-the-art offline solver — to handle belief-dependent rewards, exploring different reward strategies and showing how they can be compactly represented. We present an improved lower bound that greatly speeds convergence. POMDP-lite, an online solver, is also evaluated in the context of information-gathering tasks. These solvers are applied to control a hexcopter UAV searching for a radio frequency source—a challenging real-world problem.
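To make the notion of a belief-dependent reward concrete, here is a minimal sketch. It is not the paper's implementation; it shows one common choice for information-gathering tasks, a negative-entropy reward that is highest when the belief over candidate target locations is concentrated. The function name belief_entropy_reward and the example beliefs are illustrative assumptions.

import numpy as np

def belief_entropy_reward(belief):
    """Negative Shannon entropy of a belief vector.

    Illustrative sketch of a belief-dependent reward: the reward is
    maximal (zero) when the belief is concentrated on a single target
    location, and minimal when the belief is uniform.
    """
    b = np.asarray(belief, dtype=float)
    b = b / b.sum()                         # normalize defensively
    nz = b[b > 0]                           # treat 0 * log(0) as 0
    return float(np.sum(nz * np.log(nz)))   # = -H(b) <= 0

# A uniform belief over four locations scores worst; a fully
# concentrated belief scores best (0).
print(belief_entropy_reward([0.25, 0.25, 0.25, 0.25]))  # about -1.386
print(belief_entropy_reward([1.0, 0.0, 0.0, 0.0]))      # 0.0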
DOI: 10.1609/icaps.v27i1.13832