AAAI Publications, Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence

Pathological Effects of Variance on Classification-Based Policy Iteration
Bernardo Ávila Pires, Csaba Szepesvári

Last modified: 2015-04-01


We carry out an empirical study of classification-based policy iteration (CBPI) in a simplified Markov decision process (MDP). In this simple MDP, we expose pathological cases where variance in the state-action value estimates can degrade the performance of CBPI to the point of complete ineffectiveness. In particular, we show that with enough variance in the returns, e.g., when state-action values are estimated with a single rollout, CBPI drifts away from an optimal policy over iterations, even when it is initialized with an optimal policy. Our investigation also led us to a natural cost-sensitive classification problem in which the costs are noisy, a problem that, to the best of our knowledge, has not been studied in the classification literature.
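The effect of rollout variance on the greedy step can be illustrated with a minimal sketch. It is not the MDP or the experiment from the paper: it assumes a single state with two actions, Gaussian return noise, and made-up values for the true state-action values, and the function name `greedy_pick_rate` is hypothetical. It only measures how often a greedy choice over averaged rollout returns selects the truly better action.

```python
import random


def greedy_pick_rate(true_q, noise_std, n_rollouts, n_trials, seed=0):
    """Fraction of trials in which the greedy action under noisy
    rollout estimates matches the truly best action.

    Assumption: each rollout return is the true value plus Gaussian noise.
    """
    rng = random.Random(seed)
    best = max(range(len(true_q)), key=lambda a: true_q[a])
    hits = 0
    for _ in range(n_trials):
        # Estimate each action value by averaging n_rollouts noisy returns.
        estimates = [
            sum(rng.gauss(q, noise_std) for _ in range(n_rollouts)) / n_rollouts
            for q in true_q
        ]
        picked = max(range(len(true_q)), key=lambda a: estimates[a])
        hits += picked == best
    return hits / n_trials


if __name__ == "__main__":
    true_q = [1.0, 0.9]  # assumed values: action 0 is slightly better
    for n in (1, 10, 200):
        rate = greedy_pick_rate(true_q, noise_std=1.0, n_rollouts=n, n_trials=2000)
        print(f"rollouts per action = {n:3d}: optimal action picked in {rate:.2f} of trials")
```

Under these assumed values, a single rollout per action should select the better action only slightly more often than a coin flip, so the labels passed to the classifier would be close to random; averaging more rollouts reduces the variance and makes the greedy choice more reliable.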


policy iteration; multiclass classification; cost-sensitive classification; variance
