Proceedings:
No. 1: Thirty-First AAAI Conference On Artificial Intelligence
Volume
Issue:
Proceedings of the AAAI Conference on Artificial Intelligence, 31
Track:
Main Track: Planning and Scheduling
Downloads:
Abstract:
In the Markov decision process model, policies are usually evaluated by expected cumulative rewards. As this decision criterion is not always suitable, we propose in this paper an algorithm for computing a policy optimal for the quantile criterion. Both finite and infinite horizons are considered. Finally we experimentally evaluate our approach on random MDPs and on a data center control problem.
DOI:
10.1609/aaai.v31i1.11026
AAAI
Proceedings of the AAAI Conference on Artificial Intelligence, 31