Proceedings: Book One
Volume Issue: Proceedings of the AAAI Conference on Artificial Intelligence, 12
Track: Student Abstracts
Abstract:
The most popular delayed reinforcement learning technique, Q-learning (Watkins 1989), estimates the future reward expected from executing each action in every state. If these estimates are correct, then an agent can use them to select the action with maximal expected future reward in each state, and thus perform optimally. Watkins has proved that Q-learning produces an optimal policy (the function mapping states to actions) and that these estimates converge to the correct values given the optimal policy.
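To make the idea of per-state, per-action reward estimates concrete, the following is a minimal sketch of tabular Q-learning, not code from the abstract. The environment (a five-state chain with a reward of 1 for reaching the rightmost state), the function names `q_learning` and `greedy_policy`, and all parameter values (`alpha`, `gamma`, `epsilon`, the episode count) are assumptions chosen only for illustration.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning on a hypothetical chain: action 0 = left, 1 = right."""
    rng = random.Random(seed)
    goal = n_states - 1
    # q[s][a] estimates the discounted future reward of taking action a in state s.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Epsilon-greedy selection, breaking ties between equal estimates at random.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[s])
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])

            # Assumed deterministic dynamics: move one step, clipped to the chain.
            s_next = max(0, s - 1) if a == 0 else min(goal, s + 1)
            r = 1.0 if s_next == goal else 0.0

            # One-step update toward reward plus discounted best next estimate;
            # the goal is terminal, so nothing is bootstrapped from it.
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])

            s = s_next
            if s == goal:
                break
    return q


def greedy_policy(q):
    """Map each state to the action with the highest estimated future reward."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]
```

Calling `greedy_policy(q_learning())` returns one action per state. Under the assumptions above, the learned estimates favor moving right in every non-terminal state, which is the optimal policy for this toy chain; the entry for the terminal state is never updated and carries no meaning.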