Efficient Exploration for Optimizing Immediate Reward

Authors

Dale Schuurmans, University of Waterloo and Lloyd Greenwald, Drexel University

Proceedings:

Proceedings of the AAAI Conference on Artificial Intelligence, 16

Volume

Issue:

Proceedings of the AAAI Conference on Artificial Intelligence, 16

Track:

Learning

Downloads:

Download PDF

Abstract:

We consider the problem of learning an effective behavior strategy from reward. Although much studied, the issue of how to use prior knowledge to scale optimal behavior learning up to real-world problems remains an important open issue. We investigate the inherent data-complexity of behavior-learning when the goal is simply to optimize immediate reward. Although easier than reinforcement learning where one must also cope with state dynamics, immediate reward learning is still a common problem and is fundamentally harder than supervised learning. For optimizing immediate reward, prior knowledge can be expressed either as a bias on the space of possible reward models, or a bias on the space of possible controllers. We investigate the two paradigmatic learning approaches of indirect (reward-model) learning and direct-control learning, and show that neither uniformly dominates the other in general. Model-based learning has the advantage of generalizing reward experiences across states and actions, but direct-control learning has the advantage of focusing only on potentially optimal actions and avoiding learning irrelevant world details. Both strategies can be strongly advantageous in different circumstances. We introduce hybrid learning strategies that combine the benefits of both approaches, and uniformly improve their learning efficiency.

AAAI

Proceedings of the AAAI Conference on Artificial Intelligence, 16

ISBN 978-0-262-51106-3

July 18-22, 1999, Orlando, Florida. Published by The AAAI Press, Menlo Park, California.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.