Targeting Specific Distributions of Trajectories in MDPs

David L. Roberts, Mark J. Nelson, Charles L. Isbell, Jr., Michael Mateas, Michael L. Littman

We define TTD-MDPs, a novel class of Markov decision processes where the traditional goal of an agent is changed from finding an optimal trajectory through a state space to realizing a specified distribution of trajectories through the space. After motivating this formulation, we show how to convert a traditional MDP into a TTD-MDP. We derive an algorithm for finding non-deterministic policies by constructing a trajectory tree that allows us to compute locally-consistent policies. We specify the necessary conditions for solving the problem exactly and present a heuristic algorithm for constructing policies when an exact answer is impossible or impractical. We present empirical results for our algorithm in two domains: a synthetic grid world and stories in an interactive drama or game.

Subjects: 12. Machine Learning and Discovery

This page is copyrighted by AAAI. All rights reserved. Your use of this site constitutes acceptance of all of AAAI's terms and conditions and privacy policy.