Published:
2018-02-08
Proceedings:
Proceedings of the AAAI Conference on Artificial Intelligence, 32
Volume:
32
Issue:
Thirty-Second AAAI Conference on Artificial Intelligence 2018
Track:
AAAI Technical Track: Robotics
Abstract:
In many reinforcement learning problems, the parameters of the model may vary with its phase while the agent attempts to learn through interaction with the environment. For example, an autonomous car's reward for selecting a path may depend on traffic conditions at the time of day, and the transition dynamics of a drone may depend on the current wind direction. Many such processes exhibit a cyclic phase structure and can be represented with a control policy parameterized over a circular or cyclic phase space. Modeling such phase variations with a standard data-driven approach (e.g., deep networks) that does not explicitly represent the phase can be challenging: ambiguities arise because the optimal action for a given state can vary with the phase. To better model cyclic environments, we propose phase-parameterized policies and value function approximators that explicitly enforce a cyclic structure on the policy or value space. We apply our phase-parameterized reinforcement learning approach to both feed-forward and recurrent deep networks on trajectory optimization and locomotion problems. Our experiments show that the proposed approach achieves superior modeling performance compared to traditional function approximators in cyclic environments.
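To make the abstract's central idea concrete, the following is a minimal, hypothetical sketch of a phase-parameterized feed-forward policy, not the paper's actual architecture: one common way to enforce a cyclic structure is to encode the phase angle phi as (sin phi, cos phi) features appended to the state, making the policy periodic in the phase by construction. All class names, dimensions, and the sin/cos encoding are illustrative assumptions.

import numpy as np

def phase_features(phi):
    # Encode a cyclic phase angle phi (radians) as (sin, cos) so the
    # policy input is continuous and periodic in the phase: phi and
    # phi + 2*pi receive the identical representation.
    return np.array([np.sin(phi), np.cos(phi)])

class PhaseParameterizedPolicy:
    # Toy feed-forward policy whose input is the state concatenated
    # with cyclic phase features (illustrative architecture only).

    def __init__(self, state_dim, action_dim, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = state_dim + 2          # state + (sin phi, cos phi)
        self.W1 = rng.normal(0.0, 0.1, (hidden, in_dim))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (action_dim, hidden))
        self.b2 = np.zeros(action_dim)

    def act(self, state, phi):
        # Deterministic action from a one-hidden-layer tanh network.
        x = np.concatenate([state, phase_features(phi)])
        h = np.tanh(self.W1 @ x + self.b1)
        return self.W2 @ h + self.b2

policy = PhaseParameterizedPolicy(state_dim=3, action_dim=1)
s = np.array([0.5, -0.2, 1.0])
# Same state, different phases -> different actions...
print(policy.act(s, phi=0.0), policy.act(s, phi=np.pi))
# ...but actions at phi and phi + 2*pi coincide by construction.
print(np.allclose(policy.act(s, 0.3), policy.act(s, 0.3 + 2 * np.pi)))  # True

Because the encoding is periodic, the same state can map to different actions at different phases while phases one cycle apart behave identically, which is one way to resolve the ambiguity the abstract describes.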
DOI:
10.1609/aaai.v32i1.12105
ISSN 2374-3468 (Online) ISSN 2159-5399 (Print)
Published by AAAI Press, Palo Alto, California, USA. Copyright © 2018, Association for the Advancement of Artificial Intelligence. All Rights Reserved.