Polynomial-Time Reinforcement Learning of Near-Optimal Policies

Karèn Pivazyan and Yoav Shoham, Stanford University

Inspired by recent results on polynomial-time reinforcement learning algorithms that accumulate near-optimal rewards, we study the related problem of quickly learning near-optimal policies. The new problem is clearly related to the previous one, but differs in important ways. We provide simple algorithms for MDPs and for zero-sum and common-payoff stochastic games, together with a uniform framework for proving their polynomial complexity. Unlike the bounds for the previously studied problem, our bounds depend on the minimum of the mixing time and a new quantity, the spectral radius. Moreover, unlike previous results, ours apply uniformly to both the average-reward and discounted cases.

Copyright © AAAI. All rights reserved.