Proceedings:
Vol. 22 (2012): Twenty-Second International Conference on Automated Planning and Scheduling
Track:
Full Technical Papers
Abstract:
This paper presents a decision-theoretic planning approach for probabilistic environments where the agent's goal is to win, which we model as maximizing the probability of being above a given reward threshold. In competitive domains, second is as good as last, and it is often desirable to take risks if one is in danger of losing, even if the risk does not pay off very often. Our algorithm maximizes the probability of being above a particular reward threshold by dynamically switching between a suite of policies, each of which encodes a different level of risk. This method does not explicitly encode time or reward into the state space, and decides when to switch between policies during each execution step. We compare a risk-neutral policy to switching among different risk-sensitive policies, and show that our approach improves the agent's probability of winning.
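The per-step switching idea can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes, purely for illustration, that each policy in the suite comes with a set of sampled reward-to-go values from the current state, and it picks the policy whose samples most often push the total above the threshold. The names `prob_above_threshold`, `select_policy`, `SAMPLES`, and the policy labels are all hypothetical.

```python
from typing import Dict, Sequence

# Hypothetical reward-to-go samples for two policies (illustrative numbers only).
SAMPLES: Dict[str, Sequence[float]] = {
    "risk_neutral": [4.0, 5.0, 6.0, 5.0],   # steady, moderate returns
    "risk_seeking": [0.0, 0.0, 12.0, 0.0],  # usually nothing, occasionally a lot
}


def prob_above_threshold(reward_so_far: float,
                         reward_to_go_samples: Sequence[float],
                         threshold: float) -> float:
    """Empirical estimate of P(reward_so_far + reward_to_go > threshold)."""
    if not reward_to_go_samples:
        return 0.0
    wins = sum(1 for r in reward_to_go_samples if reward_so_far + r > threshold)
    return wins / len(reward_to_go_samples)


def select_policy(reward_so_far: float,
                  samples_by_policy: Dict[str, Sequence[float]],
                  threshold: float) -> str:
    """Return the policy name with the highest estimated win probability.

    Assumption: this would be re-evaluated at every execution step, using
    samples for the current state. Ties go to the first policy listed.
    """
    return max(
        samples_by_policy,
        key=lambda name: prob_above_threshold(
            reward_so_far, samples_by_policy[name], threshold
        ),
    )


if __name__ == "__main__":
    # Ahead of pace: the steady policy is the safer way to clear the threshold.
    print(select_policy(6.0, SAMPLES, threshold=10.0))
    # Behind: only the risky policy has any chance of clearing the threshold.
    print(select_policy(0.0, SAMPLES, threshold=10.0))
```

With these made-up numbers, an agent that has already accumulated reward 6 stays with the steady policy, while an agent at reward 0 switches to the risky one, which mirrors the abstract's point that risk becomes attractive when one is in danger of losing.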
DOI:
10.1609/icaps.v22i1.13528