Are Strong Policies Also Good Playout Policies? Playout Policy Optimization for RTS Games

Zuozhi Yang; Santiago Ontañón

doi:10.1609/aiide.v16i1.7423

Authors

Zuozhi Yang, Drexel University
Santiago Ontañón, Drexel University

DOI:

https://doi.org/10.1609/aiide.v16i1.7423

Abstract

Monte Carlo Tree Search (MCTS) has been successfully applied to complex domains such as computer Go. However, despite its success in building game-playing agents, there is little understanding of the general principles for designing or learning its playout policy. Many systems, such as AlphaGo, use a policy optimized to mimic human experts as the playout policy. But are strong policies good playout policies? In this paper, we present a case study in real-time strategy (RTS) games. We use bandit algorithms to optimize stochastic policies as both gameplay policies and playout policies for MCTS in the context of RTS games. Our results show that strong policies do not make the best playout policies, and that policies that maximize MCTS performance as playout policies are actually weak in terms of gameplay strength.
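
The abstract says only that bandit algorithms are used to optimize stochastic policies; it does not say which algorithm or how policies are parameterized. As a rough illustration of the general idea, the sketch below runs UCB1 (chosen here as an assumption, not taken from the paper) over a small set of candidate stochastic policies. The candidate policies, the action names, and `toy_evaluate` are all hypothetical stand-ins: in a real setup the evaluation function would play a game, either with the candidate as the gameplay policy or with MCTS using it as the playout policy, and return a win or loss.

```python
import math
import random

def ucb1_select(counts, totals, t, c=math.sqrt(2)):
    """Return the index of the arm with the highest UCB1 score."""
    # Play every arm once before applying the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [totals[a] / counts[a] + c * math.sqrt(math.log(t) / counts[a])
              for a in range(len(counts))]
    return max(range(len(counts)), key=scores.__getitem__)

def optimize(candidates, evaluate, budget, seed=0):
    """Spend `budget` evaluations; return (most-played candidate index, pull counts)."""
    rng = random.Random(seed)
    counts = [0] * len(candidates)
    totals = [0.0] * len(candidates)
    for t in range(1, budget + 1):
        arm = ucb1_select(counts, totals, t)
        reward = evaluate(candidates[arm], rng)
        counts[arm] += 1
        totals[arm] += reward
    best = max(range(len(candidates)), key=counts.__getitem__)
    return best, counts

# Hypothetical candidates: probability weights over three made-up action types.
CANDIDATES = [
    {"attack": 0.8, "harvest": 0.1, "idle": 0.1},
    {"attack": 0.4, "harvest": 0.5, "idle": 0.1},
    {"attack": 0.1, "harvest": 0.1, "idle": 0.8},
]

def toy_evaluate(policy, rng):
    """Toy stand-in for one game: returns 1.0 for a win, 0.0 for a loss."""
    # Invented relationship between weights and win rate, for illustration only.
    win_prob = 0.1 + 0.8 * policy["harvest"]
    return 1.0 if rng.random() < win_prob else 0.0

best, counts = optimize(CANDIDATES, toy_evaluate, budget=3000)
print("most-played candidate:", best, "pull counts:", counts)
```

Swapping `toy_evaluate` for a different objective (gameplay win rate versus MCTS win rate with the candidate as playout policy) is what would let the same loop optimize for either role, which is the comparison the abstract describes.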

Published

2020-10-01

How to Cite

Yang, Z., & Ontañón, S. (2020). Are Strong Policies Also Good Playout Policies? Playout Policy Optimization for RTS Games. Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 16(1), 144-150. https://doi.org/10.1609/aiide.v16i1.7423

Issue

Vol. 16 No. 1 (2020): Sixteenth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment

Section

Full Oral Papers
