AAAI Publications, Thirty-First AAAI Conference on Artificial Intelligence

OFFER: Off-Environment Reinforcement Learning
Kamil Andrzej Ciosek, Shimon Whiteson

Last modified: 2017-02-13


Policy gradient methods have been widely applied in reinforcement learning. For reasons of safety and cost, learning is often conducted using a simulator. However, learning in simulation does not traditionally exploit the opportunity to improve learning by adjusting certain environment variables: state features that are randomly determined by the environment in a physical setting but controllable in a simulator. Exploiting environment variables is crucial in domains containing significant rare events (SREs), e.g., unusual wind conditions that can crash a helicopter, which are rarely observed under random sampling but have a considerable impact on expected return. We propose off-environment reinforcement learning (OFFER), which addresses such cases by simultaneously optimising the policy and a proposal distribution over environment variables. We prove that OFFER converges to a locally optimal policy and show experimentally that it learns better and faster than a policy gradient baseline.
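The core idea of reweighting samples drawn under a proposal distribution over an environment variable can be illustrated with importance sampling. The sketch below is hypothetical and not from the paper: all names, the binary environment variable, and the specific probabilities are illustrative assumptions, showing only why oversampling a significant rare event and correcting with the weight p(w)/q(w) yields a low-variance, unbiased estimate.

```python
import random

# Hypothetical sketch: estimate an expectation E_p[g(w)] over an environment
# variable w by sampling w from a proposal q that oversamples the rare event,
# then reweighting each sample by p(w)/q(w).

def true_env_prob(w):
    """True distribution p: the rare event w=1 has probability 0.01."""
    return 0.01 if w == 1 else 0.99

def proposal_prob(w):
    """Proposal q: draws the rare event half the time."""
    return 0.5

def gradient_term(w):
    """Toy per-episode gradient term: the rare event carries a large penalty."""
    return -199.0 if w == 1 else 1.0

def is_gradient_estimate(n, seed=0):
    """Importance-sampled Monte-Carlo estimate of E_p[gradient_term(w)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        w = 1 if rng.random() < proposal_prob(1) else 0  # sample from q
        total += (true_env_prob(w) / proposal_prob(w)) * gradient_term(w)
    return total / n

# Here E_p[gradient_term] = 0.01 * (-199) + 0.99 * 1 = -1.0. Naive sampling
# from p would rarely observe w=1, so a small-sample estimate is very noisy;
# the proposal sees the rare event often and the weights keep it unbiased.
```

OFFER additionally adapts the proposal distribution online alongside the policy; this fixed-proposal sketch only illustrates the reweighting step.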


Markov Decision Process; Policy Gradient; Variance Reduction; Actor-Critic; REINFORCE
