Proceedings: Proceedings of the Twentieth International Conference on Machine Learning
Abstract:
In Dynamic Programming, convergence of algorithms such as Value Iteration or Policy Iteration results, in discounted problems, from a contraction property of the back-up operator, which guarantees convergence to its fixed point. When approximation is considered, known results in Approximate Policy Iteration provide bounds on the closeness to optimality of the approximate value function obtained by successive policy improvement steps, as a function of the maximum norm of the value determination errors made during the policy evaluation steps. Unfortunately, such results have limited practical range, since most function approximators (such as linear regression) select the best fit within a given class of parameterized functions by minimizing some (weighted) quadratic norm. In this paper, we provide error bounds for Approximate Policy Iteration using quadratic norms, and illustrate these results in the case of feature-based linear function approximation.
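To make the setting concrete, here is a minimal sketch (not from the paper) of Approximate Policy Iteration with feature-based linear value approximation: each policy is evaluated exactly on a small known MDP and then projected onto the feature space by a weighted least-squares fit (a quadratic norm), followed by greedy policy improvement. The MDP, the feature matrix, and all names (n_states, phi, rho, evaluate, improve) are illustrative assumptions, not the paper's construction.

```python
# Illustrative sketch: approximate policy iteration with a linear,
# feature-based value function fitted in a weighted quadratic norm.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, n_features, gamma = 20, 3, 6, 0.9

# Random MDP: P[a] is a row-stochastic transition matrix, R[a] a reward vector.
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.normal(size=(n_actions, n_states))

phi = rng.normal(size=(n_states, n_features))   # feature matrix (one row per state)
rho = np.full(n_states, 1.0 / n_states)         # weights of the quadratic norm

def evaluate(policy):
    """Exact value of a deterministic policy, then its least-squares projection."""
    P_pi = P[policy, np.arange(n_states)]       # P_pi[s, s'] = P(s' | s, policy(s))
    r_pi = R[policy, np.arange(n_states)]       # reward under the policy
    v_pi = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    # Weighted least squares: argmin_w  sum_s rho(s) * (phi(s).w - v_pi(s))^2
    W = np.diag(rho)
    w = np.linalg.solve(phi.T @ W @ phi, phi.T @ W @ v_pi)
    return phi @ w

def improve(v_approx):
    """Greedy policy improvement with respect to the approximate value function."""
    q = R + gamma * P @ v_approx                # shape (n_actions, n_states)
    return q.argmax(axis=0)

policy = np.zeros(n_states, dtype=int)
for _ in range(20):
    v_approx = evaluate(policy)
    new_policy = improve(v_approx)
    if np.array_equal(new_policy, policy):      # may cycle rather than converge
        break
    policy = new_policy
print("final policy:", policy)
```

Because the evaluation step controls the error only in a quadratic norm, the sequence of policies need not converge; bounding how far such iterates can stray from optimality is exactly the question the paper addresses.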