Error Bounds for Approximate Value Iteration

Rémi Munos

Approximate Value Iteration (AVI) is an method for solving a Markov Decision Problem by making successive calls to a supervised learning (SL) algorithm. Sequence of value representations Vn are processed iteratively by Vn+1 = A T Vn where T is the Bellman operator and A an approximation operator. Bounds on the error between the performance of the policies induced by the algorithm and the optimal policy are given as a function of weighted L_p-norms (p>=1) of the approximation errors. The results extend usual analysis in L_infinity-norm, and allow to relate the performance of AVI to the approximation power (usually expressed in L_p-norm, for p=1 or 2) of the SL algorithm. We illustrate the tightness of these bounds on an optimal replacement problem.

Content Area: 13. Markov Decision Processes & Uncertainty

Subjects: 12.1 Reinforcement Learning; 9.3 Mathematical Foundations

Submitted: May 5, 2005

This page is copyrighted by AAAI. All rights reserved. Your use of this site constitutes acceptance of all of AAAI's terms and conditions and privacy policy.