Are Strong Policies Also Good Playout Policies? Playout Policy Optimization for RTS Games

  • Zuozhi Yang Drexel University
  • Santiago Ontañón Drexel University


Monte Carlo Tree Search has been successfully applied to complex domains such as computer Go. However, despite its success in building game-playing agents, there is little understanding of general principles to design or learn its playout policy. Many systems, such as AlphaGo, use a policy optimized to mimic human expert as the playout policy. But are strong policies good playout policies? In this paper, we take a case study in real-time strategy games. We use bandit algorithms to optimize stochastic policies as both gameplay policies and playout policies for MCTS in the context of RTS games. Our results show that strong policies do not make the best playout policies, and that policies that maximize MCTS performance as playout policies are actually weak in terms of gameplay strength