DOI:
10.1609/socs.v15i1.21805
Abstract:
Real-world decision problems often involve multiple competing objectives or a complex reward structure that violates the Markov assumption. However, existing research on sequential decision making under uncertainty has primarily focused on Markov Decision Processes (MDPs) with scalar Markovian reward signals. My thesis considers settings where scalar Markovian rewards are not sufficient to produce the desired behaviors. The first part of my thesis develops algorithms to optimize lexicographically ordered objectives. The second part considers autonomous agents that incorporate the perspective of their observer. Because the observer's perspective can depend on how the agent has behaved so far, rewards in this setting can depend on histories (i.e., they are non-Markovian). In the final part of my thesis, I hope to characterize, from a decision-theoretic perspective, when rewards beyond scalar Markovian signals are needed.