Proceedings: Proceedings of the AAAI Conference on Artificial Intelligence, 13
No. 1: Agents, AI in Art and Entertainment, Knowledge Representation, and Learning
Track: Reinforcement Learning
Abstract:
Average-reward reinforcement learning (ARL) is an undiscounted optimality framework that is generally applicable to a broad range of control tasks. ARL computes gain-optimal control policies that maximize the expected payoff per step. However, gain-optimality has some intrinsic limitations as an optimality criterion, since, for example, it cannot distinguish between different policies that all reach an absorbing goal state but incur varying costs. A more selective criterion is bias optimality, which can filter gain-optimal policies to select those that reach absorbing goals with the minimum cost. While several ARL algorithms for computing gain-optimal policies have been proposed, none of these algorithms can guarantee bias optimality, since this requires solving at least two nested optimality equations. In this paper, we describe a novel model-based ARL algorithm for computing bias-optimal policies. We test the proposed algorithm using an admission control queuing system, and show that it is able to utilize the queue much more efficiently than a gain-optimal method by learning bias-optimal policies.
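The gap between gain and bias optimality that the abstract describes can be illustrated with a minimal sketch (a hypothetical toy MDP of our own, not the paper's algorithm or its queuing domain): in a goal-reaching task where the absorbing goal yields zero reward, every policy that reaches the goal has the same gain (average reward per step) of 0, so gain optimality cannot separate a cheap route from an expensive one; the bias, i.e. the total reward accumulated before absorption, does.

```python
# Toy deterministic goal-reaching MDP (hypothetical example).
# States: 0 = start, 1 = detour, 2 = absorbing goal (zero reward).
# rewards[s][a] is the (negative-cost) reward for action a in state s;
# next_state[s][a] is the resulting state. Action 0 = "direct", 1 = "detour".
rewards = {0: {0: -5.0, 1: -1.0}, 1: {0: -1.0}}
next_state = {0: {0: 2, 1: 1}, 1: {0: 2}}
GOAL = 2

def evaluate(policy):
    """Return (gain, bias) for a deterministic policy.

    Every policy here reaches the zero-reward absorbing goal, so the
    gain is 0. The bias is the total reward collected before
    absorption, backed up from the goal (h(GOAL) = 0)."""
    bias = {GOAL: 0.0}
    def h(s):
        if s not in bias:
            a = policy[s]
            bias[s] = rewards[s][a] + h(next_state[s][a])
        return bias[s]
    return 0.0, h(0)

# Enumerate the two deterministic policies that differ at the start state.
policies = [{0: a0, 1: 0} for a0 in (0, 1)]
results = {tuple(sorted(p.items())): evaluate(p) for p in policies}

# All policies tie on gain (0), so gain optimality alone cannot choose;
# bias optimality filters the gain-optimal set by total cost-to-goal.
best_gain = max(g for g, _ in results.values())
gain_optimal = {p: gb for p, gb in results.items() if gb[0] == best_gain}
best = max(gain_optimal, key=lambda p: gain_optimal[p][1])
```

Here the "direct" policy pays cost 5 and the "detour" policy pays 1 + 1 = 2; both are gain-optimal, and the bias criterion selects the detour. The paper's model-based algorithm solves the corresponding nested optimality equations rather than enumerating policies.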
AAAI
ISBN 978-0-262-51091-2