Proceedings: Book One
Issue: Proceedings of the AAAI Conference on Artificial Intelligence, 20
Track: Markov Decision Processes and Uncertainty
Abstract:
Approximate Value Iteration (AVI) is a method for solving a Markov Decision Problem by making successive calls to a supervised learning (SL) algorithm. A sequence of value representations V_n is computed iteratively by V_{n+1} = A T V_n, where T is the Bellman operator and A is an approximation operator. Bounds on the error between the performance of the policies induced by the algorithm and that of the optimal policy are given as a function of weighted L_p-norms (p >= 1) of the approximation errors. The results extend the usual analysis in L_infinity-norm and allow one to relate the performance of AVI to the approximation power (usually expressed in L_p-norm, for p = 1 or 2) of the SL algorithm. We illustrate the tightness of these bounds on an optimal replacement problem.
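The iteration V_{n+1} = A T V_n can be made concrete with a minimal Python sketch. Everything in it is an assumption made for illustration rather than taken from the paper: the toy keep-or-replace chain (not the optimal replacement problem used in the paper), the constants, the function names, and above all the choice of A, which here is a simple state-aggregation average standing in for a general SL algorithm. The sketch compares the performance loss of the greedy policy with the classical L_infinity bound 2*gamma/(1-gamma)^2 times the sup-norm approximation error, which is the kind of analysis the abstract says the L_p results extend.

```python
# Toy deterministic MDP (hypothetical, for illustration only).
N_STATES = 10        # wear levels 0..9
GAMMA = 0.9          # discount factor
REPLACE_COST = 5.0   # fixed cost of resetting the wear level
ACTIONS = (0, 1)     # 0 = keep, 1 = replace


def step(s, a):
    """Return (next_state, reward) for action a in state s."""
    if a == 0:
        # Keep: wear increases, running cost grows with wear.
        return min(s + 1, N_STATES - 1), -float(s)
    # Replace: pay a fixed cost and restart from wear level 0.
    return 0, -REPLACE_COST


def bellman(v):
    """Bellman operator T: (T v)(s) = max_a [r(s, a) + GAMMA * v(s')]."""
    tv = []
    for s in range(N_STATES):
        backups = []
        for a in ACTIONS:
            s_next, reward = step(s, a)
            backups.append(reward + GAMMA * v[s_next])
        tv.append(max(backups))
    return tv


def aggregate(v, block=2):
    """Assumed approximation operator A: average v over blocks of states."""
    out = []
    for start in range(0, len(v), block):
        chunk = v[start:start + block]
        out.extend([sum(chunk) / len(chunk)] * len(chunk))
    return out


def avi(approx, n_iters=300):
    """Approximate value iteration: V_{n+1} = A T V_n, starting from V_0 = 0."""
    v = [0.0] * N_STATES
    for _ in range(n_iters):
        v = approx(bellman(v))
    return v


def greedy_policy(v):
    """Policy that is greedy with respect to the value function v."""
    policy = []
    for s in range(N_STATES):
        def backup(a):
            s_next, reward = step(s, a)
            return reward + GAMMA * v[s_next]
        policy.append(max(ACTIONS, key=backup))
    return policy


def evaluate(policy, n_iters=500):
    """Value of a fixed policy, by iterating its own Bellman equation."""
    v = [0.0] * N_STATES
    for _ in range(n_iters):
        new_v = []
        for s in range(N_STATES):
            s_next, reward = step(s, policy[s])
            new_v.append(reward + GAMMA * v[s_next])
        v = new_v
    return v


if __name__ == "__main__":
    v_star = avi(lambda v: v, n_iters=500)   # exact VI: A is the identity
    v_hat = avi(aggregate)                   # AVI with the aggregation operator
    v_pi = evaluate(greedy_policy(v_hat))
    loss = max(vs - vp for vs, vp in zip(v_star, v_pi))
    tv = bellman(v_hat)
    eps = max(abs(x - y) for x, y in zip(tv, aggregate(tv)))
    print("performance loss (sup norm):", loss)
    print("classical sup-norm bound:   ", 2 * GAMMA / (1 - GAMMA) ** 2 * eps)
```

Averaging over blocks is a non-expansion in sup norm, so this particular A T is a contraction and the iteration converges; that is a property of the assumed operator, not of SL algorithms in general. Swapping `aggregate` for another fitting routine with the same list-in, list-out signature is enough to experiment with other choices of A.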