AAAI Publications, Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence

Font Size: 
Classical Planning Algorithms on the Atari Video Games
Nir Lipovetzky, Miquel Ramírez, Hector Geffner

Last modified: 2015-04-01


The Atari 2600 games supported in the Arcade Learning Environment (Bellemare et al. 2013) all feature aknown initial (RAM) state and actions that have deterministic effects. Classical planners, however, cannot be used for selecting actions for two reasons: first, nocompact PDDL-model of the games is given, and more importantly, the action effects and goals are not known a priori. Moreover, in these games there is usually no set of goals to be achieved but rewards to be collected. These features do not preclude the use of classical algorithms like breadth-first search or Dijkstra’s algorithm, but these methods are not effective over large state spaces. We thus turn to a different class of classical planning algorithms introduced recently that perform a structured exploration of the state space; namely, like breadth-first search and Dijkstra’s algorithm they are“blind” and hence do not require prior knowledge of state transitions, costs (rewards) or goals, and yet, like heuristic search algorithms, they have been shown to be effective for solving problems over huge state spaces.The simplest such algorithm, called Iterated Width or IW, consists of a sequence of calls IW(1), IW(2), . . . ,IW(k) where IW(i) is a breadth-first search in which a state is pruned when it is not the first state in the search to make true some subset of i atoms. The empirical results over 54 games suggest that the performance of IW with the k parameter fixed to 1, i.e., IW(1), is at the level of the state of the art represented by UCT. A simple best-first variation of IW that combines exploration and exploitation proves to be very competitive as well.


Atari Games, online planning

Full Text: PDF