A Theoretical and Algorithmic Analysis of Configurable MDPs
This paper analyzes, from theoretical and algorithmic perspectives, a class of problems recently introduced in the literature of Markov decision processes—configurable Markov decision processes. In this new class of problems we jointly optimize the probability transition function and associated optimal policy, in order to improve the performance of a decision-making agent. We contribute a complexity analysis on the problem from a computational perspective, where we show that, in general, solving a configurable MDP is NP-Hard. We also discuss practical challenges often faced in solving this class of problems. Additionally, we formally derive a gradient-based approach that sheds some light on the correctness and limitations of existing methods. We conclude by demonstrating the application of different parameterizations of configurable MDPs in two scenarios, offering a discussion on advantages and drawbacks from modeling and algorithmic perspectives. Our contributions set the foundation for a better understanding of this recent problem, and the wider applicability of the underlying ideas to different planning problems.