Abstract:
Bowling named two desiderata for multiagent learning algorithms: rationality and convergence. This paper introduces correlated-Q learning, a natural generalization of Nash-Q and FF-Q that satisfies these criteria. Nash-Q satisfies rationality, but in general it does not converge. FF-Q satisfies convergence, but in general it is not rational. Correlated-Q satisfies rationality by construction. This paper demonstrates the empirical convergence of correlated-Q on a standard testbed of general-sum Markov games.