AAAI Publications, 2018 AAAI Spring Symposium Series

State Abstraction Synthesis for Discrete Models of Continuous Domains
Jacob Menashe, Peter Stone

Last modified: 2018-03-15


Reinforcement Learning (RL) is a paradigm for enabling autonomous learning wherein rewards are used to influence an agent's action choices in various states. As the number of states and actions available to an agent increases, so it becomes increasingly difficult for the agent to quickly learn the optimal action for any given state. One approach to mitigating the detrimental effects of large state spaces is to represent collections of states together as encompassing "abstract states".

State abstraction itself leads to a host of new challenges for an agent. One such challenge is that of automatically identifying new abstractions that balance generality and specificity; the agent must identify both the similarities and the differences between states that are relevant to its goals, while ignoring unnecessary details that would otherwise hinder the agent's progress. We call this problem of identifying useful abstract states the Abstraction Synthesis Problem (ASP).

State abstractions can provide a significant benefit to model-based agents by simplifying their models. T-UCT, a hierarchical model-learning algorithm for discrete, factored domains, is one such method that leverages state abstractions to quickly learn and control an agent's environment. Such abstractions play a pivotal role in the success of T-UCT; however, T-UCT's solution to ASP requires a fully discrete state space.

In this work we develop and compare enhancements to T-UCT that relax its assumption of discreteness. We focus on solving ASP in domains with multidimensional, continuous state factors, using only the T-UCT agent's limited experience histories and minimal knowledge of the domain's structure. Finally, we present a new abstraction synthesis algorithm, RCAST, and compare this algorithm to existing approaches in the literature. We provide the algorithmic details of RCAST and its subroutines, and we show that RCAST outperforms earlier approaches to ASP by enabling T-UCT to accumulate significantly greater total reward with minimal expert configuration and processing time.


Hierarchical Reinforcement Learning; Model-based Reinforcement Learning; State Space Abstractions for Reinforcement Learning; Bayesian Networks
