Exploration efficiency challenges for multi-agent reinforcement learning (MARL), as the policy learned by confederate MARL depends on the interaction among agents. Less informative reward also restricts the learning speed of MARL in comparison with the informative label in supervised learning. This paper proposes a novel communication method which helps agents focus on different exploration subarea to guide MARL to accelerate exploration. We propose a predictive network to forecast the reward of current state-action pair and use the guidance learned by the predictive network to modify the reward function. An improved prioritized experience replay is employed to help agents better take advantage of the different knowledge learned by different agents. Experimental results demonstrate that the proposed algorithm outperforms existing methods in cooperative multi-agent environments.
Published Date: 2020-06-02
Registration: ISSN 2374-3468 (Online) ISSN 2159-5399 (Print) ISBN 978-1-57735-835-0 (10 issue set)
Copyright: Published by AAAI Press, Palo Alto, California USA Copyright © 2020, Association for the Advancement of Artificial Intelligence All Rights Reserved