Published:
2018-02-08
Proceedings:
Proceedings of the AAAI Conference on Artificial Intelligence, 32
Issue:
Thirty-Second AAAI Conference on Artificial Intelligence 2018
Track:
AAAI Technical Track: Robotics
Abstract:
We present a reinforcement learning approach to explore and optimize a safety-constrained Markov Decision Process (MDP). In this setting, the agent must maximize discounted cumulative reward while constraining the probability of entering unsafe states, where safety is defined by a safety function staying within some tolerance. The safety values of all states are not known a priori, and we probabilistically model them via a Gaussian Process (GP) prior. As such, behaving properly in such an environment requires balancing a three-way trade-off of exploring the safety function, exploring the reward function, and exploiting acquired knowledge to maximize reward. We propose a novel approach to balance this trade-off. Specifically, our approach explores unvisited states selectively; that is, it prioritizes the exploration of a state if visiting that state significantly improves knowledge of the achievable cumulative reward. Our approach relies on a novel information gain criterion based on Gaussian Process representations of the reward and safety functions. We demonstrate the effectiveness of our approach on a range of experiments, including a simulation using real Martian terrain data.
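As a rough illustration of modeling unknown safety values with a GP prior, the sketch below fits a zero-mean GP to a few observed safety values and marks a state as confidently safe only when a lower confidence bound clears a threshold. It is not the algorithm from the paper: the squared-exponential kernel, the lower-confidence-bound rule, and the names `rbf_kernel`, `gp_posterior`, `confident_safe_mask`, `beta`, and `threshold` are all assumptions made for this example.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between rows of a and b (unit prior variance)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise_var=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    k_qo = rbf_kernel(x_query, x_obs)
    mean = k_qo @ np.linalg.solve(k_oo, y_obs)
    # Variance reduction k_q^T K^{-1} k_q, computed row by row.
    reduction = np.sum(k_qo * np.linalg.solve(k_oo, k_qo.T).T, axis=1)
    var = np.clip(1.0 - reduction, 0.0, None)  # prior variance is 1.0
    return mean, np.sqrt(var)

def confident_safe_mask(mean, std, threshold, beta=2.0):
    """Hypothetical rule: safe if the lower confidence bound meets the threshold."""
    return mean - beta * std >= threshold

# Five states on a line; safety has been observed only at states 0 and 1.
states = np.arange(5, dtype=float).reshape(-1, 1)
x_obs = states[:2]
y_obs = np.array([1.0, 0.9])  # illustrative safety values, not real data

mean, std = gp_posterior(x_obs, y_obs, states)
safe = confident_safe_mask(mean, std, threshold=0.2)
print(safe)
```

Under these assumptions, only the states near the observations are certified safe; states farther away keep a wide posterior and stay uncertified until they, or their neighbors, are observed. This is the kind of uncertainty that makes exploring the safety function part of the trade-off described above.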
DOI:
10.1609/aaai.v32i1.12103
ISSN 2374-3468 (Online) ISSN 2159-5399 (Print)
Published by AAAI Press, Palo Alto, California USA Copyright © 2018, Association for the Advancement of Artificial Intelligence All Rights Reserved.