Abstraction-Guided Policy Recovery from Expert Demonstrations

Authors

Canmanie T. Ponnambalam

Delft University of Technology

Frans A. Oliehoek

Delft University of Technology

Matthijs T. J. Spaan

Delft University of Technology

Proceedings:

Book One

Volume

Issue:

Proceedings of the International Conference on Automated Planning and Scheduling, 31

Track:

Special Track on Planning and Learning

Downloads:

Download PDF

Abstract:

Behavior cloning is a method of automated decision-making that aims to extract meaningful information from expert demonstrations and reproduce the same behavior autonomously. It is unlikely that demonstrations will exhaustively cover the potential problem space, compromising the quality of automation when out-of-distribution states are encountered. Our approach RECO jointly learns both an imitation policy and recovery policy from expert data. The recovery policy steers the agent from unknown states back to the demonstrated states in the data set. While there is, per definition, no data available to learn the recovery policy, we exploit abstractions to generalize beyond the available data and simulate the recovery problem. When the most appropriate abstraction for the given data is unknown, our method selects the best recovery policy from a set generated by several candidate abstractions. In tabular domains, where we assume an agent must call to a human supervisor for help if it is in an unknown state, we show how RECO results in drastically fewer calls without compromising solution quality and with relatively few trajectories provided by an expert. We also introduce a continuous adaptation of our method and demonstrate the ability of RECO to recover an agent from states where its supervised learning-based imitation policy would otherwise fail.

DOI:

10.1609/icaps.v31i1.16004

ICAPS

Proceedings of the International Conference on Automated Planning and Scheduling, 31

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.