AAAI Publications, Thirty-First AAAI Conference on Artificial Intelligence

Optimizing Quantiles in Preference-Based Markov Decision Processes
Hugo Gilbert, Paul Weng, Yan Xu

Last modified: 2017-02-12


In Markov decision processes (MDPs), policies are usually evaluated by their expected cumulative reward. Because this decision criterion is not always suitable, this paper proposes an algorithm for computing a policy that is optimal for the quantile criterion. Both finite and infinite horizons are considered. Finally, the approach is evaluated experimentally on random MDPs and on a data center control problem.
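To illustrate why the two criteria can disagree, here is a minimal sketch that compares the expectation with a lower quantile on two outcome distributions. It is not the paper's algorithm: the function names, the lower-quantile convention (smallest outcome whose cumulative probability reaches tau), and the two distributions are assumptions made up for illustration.

```python
def expectation(outcomes, probs):
    """Expected value of a discrete distribution."""
    return sum(x * p for x, p in zip(outcomes, probs))


def lower_quantile(outcomes, probs, tau):
    """Smallest outcome whose cumulative probability reaches tau.

    Assumed convention for this sketch; other quantile definitions exist.
    """
    pairs = sorted(zip(outcomes, probs))
    cumulative = 0.0
    for value, p in pairs:
        cumulative += p
        # Small tolerance guards against floating-point rounding.
        if cumulative >= tau - 1e-12:
            return value
    return pairs[-1][0]


# Hypothetical distributions over the final outcome of two policies.
risky = ([0, 100], [0.6, 0.4])  # high mean, but the outcome is 0 most of the time
safe = ([30], [1.0])            # always yields 30

# The expectation prefers the risky policy...
prefers_risky_by_mean = expectation(*risky) > expectation(*safe)

# ...while the median (tau = 0.5) prefers the safe one.
prefers_safe_by_median = lower_quantile(*safe, 0.5) > lower_quantile(*risky, 0.5)
```

Under these assumed numbers the two criteria rank the policies in opposite order, which is the kind of situation where a quantile criterion may be the more suitable choice.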


Keywords: Markov decision process; Quantile
