*Sridhar Mahadevan*

Most work on value function approximation adheres to Samuel’s original design: agents learn a task-specific value function using parameter estimation, where the approximation architecture (e.g., polynomials) is specified by a human designer. This paper proposes a novel framework generalizing Samuel’s paradigm using a coordinate-free approach to value function approximation. Agents learn both representations and value functions by constructing geometrically customized, task-independent basis functions that form an orthonormal set for the Hilbert space of smooth functions on the underlying state space manifold. The approach rests on a technical result showing that the space of smooth functions on a (compact) Riemannian manifold has a discrete spectrum associated with the Laplace-Beltrami operator. In the discrete setting, spectral analysis of the graph Laplacian yields a set of geometrically customized basis functions for approximating and decomposing value functions. The proposed framework generalizes Samuel’s value function approximation paradigm by combining it with a formalization of Saul Amarel’s paradigm of representation learning through global state space analysis.
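The abstract's central construction can be illustrated concretely. The following is a minimal sketch, not the paper's implementation: it builds the combinatorial graph Laplacian of a hypothetical 20-state chain graph, takes its smoothest eigenvectors as an orthonormal basis (the discrete analogue of Laplace-Beltrami eigenfunctions), and projects a sample value function onto them by least squares. The chain topology, discount factor, and target function are all assumptions chosen for illustration.

```python
import numpy as np

# Adjacency matrix of a hypothetical 20-state chain graph (an assumption
# for illustration; the framework applies to arbitrary state space graphs).
n = 20
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

# Combinatorial graph Laplacian L = D - A, where D is the degree matrix.
D = np.diag(A.sum(axis=1))
L = D - A

# Eigenvectors of L, sorted by eigenvalue, form an orthonormal set of
# functions on the graph, ordered from smoothest to most oscillatory.
eigvals, eigvecs = np.linalg.eigh(L)

# Keep the k smoothest basis functions and project a hypothetical value
# function (discounted distance-to-goal on the chain) onto their span.
k = 5
Phi = eigvecs[:, :k]                     # n x k basis matrix
V = 0.9 ** np.arange(n)[::-1]            # hypothetical target value function
w, *_ = np.linalg.lstsq(Phi, V, rcond=None)
V_hat = Phi @ w                          # low-dimensional approximation

print("approximation error:", np.linalg.norm(V - V_hat))
```

Because the basis is orthonormal, the least-squares weights are simply inner products of the target with each eigenvector, and adding more eigenvectors monotonically reduces the residual.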

*Content Area:* 13. Markov Decision Processes & Uncertainty

*Subjects:* 12.1 Reinforcement Learning; 12. Machine Learning and Discovery

*Submitted:* May 10, 2005

This page is copyrighted by AAAI. All rights reserved.