Rewarding Behaviors

Authors

Fahiem Bacchus

Craig Boutilier

Adam Grove

Proceedings:

Proceedings of the AAAI Conference on Artificial Intelligence, 13

Track:

Handling Uncertainty

Abstract:

Markov decision processes (MDPs) are a very popular tool for decision-theoretic planning (DTP), partly because of the well-developed, expressive theory that includes effective solution techniques. But the Markov assumption, that dynamics and rewards depend on the current state only and not on history, is often inappropriate. This is especially true of rewards: we frequently wish to associate rewards with behaviors that extend over time. Of course, such reward processes can be encoded in an MDP, provided the state space is rich enough (that is, states encode enough history). However, it is often difficult to "hand craft" suitable state spaces that encode an appropriate amount of history. We consider this problem in the case where non-Markovian rewards are encoded by assigning values to formulas of a temporal logic. These formulas characterize the value of temporally extended behaviors. We argue that this allows a natural representation of many commonly encountered non-Markovian rewards. The main result is an algorithm that, given a decision process with non-Markovian rewards expressed in this manner, automatically constructs an equivalent MDP (with Markovian reward structure), allowing optimal policy construction using standard techniques.
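
The idea of encoding history in the state can be illustrated with a small sketch. This is not the authors' algorithm: the toy process, the proposition names `p` and `goal`, the functions `expand` and `value_iteration`, and the discount factor are all assumptions made for illustration. The reward "goal is reached after p has held at some earlier or current point" depends on history, so the sketch hand-codes one extra bit per state to remember whether p has been seen. The paper's contribution is to derive such an expansion automatically from temporal logic formulas, which this sketch does not attempt.

```python
from itertools import product

# Hypothetical toy process: each state is labelled with atomic propositions.
LABELS = {"s0": set(), "s1": {"p"}, "s2": {"goal"}}

# TRANSITIONS[state][action] -> list of (next_state, probability)
TRANSITIONS = {
    "s0": {"a": [("s1", 0.5), ("s2", 0.5)], "b": [("s2", 1.0)]},
    "s1": {"a": [("s2", 1.0)], "b": [("s0", 1.0)]},
    "s2": {"a": [("s2", 1.0)], "b": [("s0", 1.0)]},
}


def update_bit(bit, state):
    """Remember whether p has held at any point up to and including `state`."""
    return bit or ("p" in LABELS[state])


def expand(transitions):
    """Build an MDP over (state, bit) pairs whose reward is Markovian."""
    new_t, new_r = {}, {}
    for s, bit in product(transitions, (False, True)):
        if "p" in LABELS[s] and not bit:
            continue  # inconsistent pair: p holds now, so the bit must be set
        new_t[(s, bit)] = {
            a: [((s2, update_bit(bit, s2)), pr) for s2, pr in outcomes]
            for a, outcomes in transitions[s].items()
        }
        # The reward now depends only on the current expanded state.
        new_r[(s, bit)] = 1.0 if ("goal" in LABELS[s] and bit) else 0.0
    return new_t, new_r


def value_iteration(trans, reward, gamma=0.9, sweeps=200):
    """Standard value iteration on the expanded (Markovian) MDP."""
    values = {s: 0.0 for s in trans}
    for _ in range(sweeps):
        values = {
            s: reward[s] + gamma * max(
                sum(pr * values[s2] for s2, pr in outcomes)
                for outcomes in trans[s].values()
            )
            for s in trans
        }
    return values


EXPANDED_T, EXPANDED_R = expand(TRANSITIONS)
VALUES = value_iteration(EXPANDED_T, EXPANDED_R)
```

In the expanded process, `("s2", True)` and `("s2", False)` are distinct states with different rewards, even though both correspond to the same original state `s2`. That distinction is what lets a standard solver such as value iteration be applied unchanged.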

AAAI

ISBN 978-0-262-51091-2

August 4-8, 1996, Portland, Oregon. Published by The AAAI Press, Menlo Park, California.
