Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient

Shihui Li; Yi Wu; Xinyue Cui; Honghua Dong; Fei Fang; Stuart Russell

doi:10.1609/aaai.v33i01.33014213

Authors

Shihui Li Carnegie Mellon
Yi Wu University of California, Berkeley
Xinyue Cui Tsinghua University
Honghua Dong Tsinghua University
Fei Fang Carnegie Mellon
Stuart Russell University of California, Berkeley


Abstract

Despite recent advances in deep reinforcement learning (DRL), agents trained by DRL tend to be brittle and sensitive to the training environment, especially in multi-agent scenarios. In the multi-agent setting, a DRL agent's policy can easily get stuck in a poor local optimum w.r.t. its training partners: the learned policy may be only locally optimal with respect to the other agents' current policies. In this paper, we focus on the problem of training robust DRL agents with continuous actions in the multi-agent learning setting, so that the trained agents can still generalize when their opponents' policies change. To tackle this problem, we propose a new algorithm, MiniMax Multi-agent Deep Deterministic Policy Gradient (M3DDPG), with the following contributions: (1) we introduce a minimax extension of the popular multi-agent deep deterministic policy gradient algorithm (MADDPG) for robust policy learning; (2) since the continuous action space makes our minimax learning objective computationally intractable, we propose Multi-Agent Adversarial Learning (MAAL) to solve the proposed formulation efficiently. We empirically evaluate M3DDPG in four mixed cooperative and competitive multi-agent environments, and the agents trained by our method significantly outperform existing baselines.
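The abstract states that the inner minimization over the other agents' continuous actions is intractable and that MAAL solves it efficiently. One common way to approximate such an inner minimization is a single normalized gradient step, in the style of adversarial training. The sketch below illustrates that idea on a toy Q function; it is not the authors' implementation. The names `numerical_grad` and `maal_perturb`, the step size `alpha`, the L2 normalization over the other agents' actions, and the toy Q function are all hypothetical, and the paper's exact step-size and normalization scheme may differ.

```python
import math
from typing import Callable, List, Sequence

# A joint action is a flat list with one scalar action per agent (toy assumption).
QFunction = Callable[[Sequence[float]], float]


def numerical_grad(q: QFunction, actions: Sequence[float], h: float = 1e-5) -> List[float]:
    """Central-difference gradient of q w.r.t. every agent's action.

    Hypothetical helper; a real critic would use automatic differentiation.
    """
    grad = []
    for k in range(len(actions)):
        up = list(actions)
        up[k] += h
        down = list(actions)
        down[k] -= h
        grad.append((q(up) - q(down)) / (2 * h))
    return grad


def maal_perturb(q: QFunction, actions: Sequence[float], agent_i: int,
                 alpha: float = 0.1) -> List[float]:
    """One-step approximation of min over the other agents' actions of agent i's Q.

    Agent i's own action stays fixed; every other agent's action moves
    against the gradient of q, with the step scaled to total L2 length alpha
    (assumed normalization, not necessarily the paper's).
    """
    grad = numerical_grad(q, actions)
    others = [k for k in range(len(actions)) if k != agent_i]
    norm = math.sqrt(sum(grad[k] ** 2 for k in others))
    perturbed = list(actions)
    if norm == 0.0:
        return perturbed  # Q is locally flat in the others' actions: no informative direction
    for k in others:
        perturbed[k] -= alpha * grad[k] / norm
    return perturbed


if __name__ == "__main__":
    # Toy Q for agent 0 in a two-agent game (hypothetical).
    q0 = lambda a: 2 * a[0] + 3 * a[1] - a[0] ** 2
    joint = [0.5, 0.2]
    worst = maal_perturb(q0, joint, agent_i=0, alpha=0.1)
    print("original Q:", q0(joint), "perturbed Q:", q0(worst))
```

In a minimax critic update along these lines, agent i's target value would be evaluated at the perturbed joint action rather than at the sampled one, so the learned policy is trained against a locally worst-case response from the other agents.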
