Proceedings:
Proceedings of the AAAI Conference on Artificial Intelligence, Volume 31
Issue:
No. 2 (2017): The Twenty-Ninth Innovative Applications of Artificial Intelligence Conference
Track:
IAAI Emerging Application Papers
Abstract:
In this paper we consider the problem of evaluating one digital marketing policy (or, more generally, a policy for an MDP with unknown transition and reward functions) using data collected from the execution of a different policy. We call this problem off-policy policy evaluation. Existing methods for off-policy policy evaluation assume that the transition and reward functions of the MDP are stationary, an assumption that is typically false, particularly for digital marketing applications. This means that existing off-policy policy evaluation methods are reactive to nonstationarity, in that they slowly correct for changes after they occur. We argue that off-policy policy evaluation for nonstationary MDPs can be phrased as a time series prediction problem, which results in predictive methods that can anticipate changes before they happen. We therefore propose a synthesis of existing off-policy policy evaluation methods with existing time series prediction methods, which we show results in a drastic reduction of mean squared error when evaluating policies using a real digital marketing data set.
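To make the abstract's idea concrete, below is a minimal sketch of how off-policy evaluation and time series prediction might be combined: per-episode importance sampling estimates of the evaluation policy's value are treated as a chronologically ordered series, and a forecaster extrapolates the next value rather than averaging the past. The function names, the ordinary importance sampling estimator, and the linear-trend forecaster are illustrative assumptions; the paper's actual estimators and time series models may differ.

```python
# Hedged sketch: combine off-policy evaluation (importance sampling)
# with time series prediction under nonstationarity. All names and the
# choice of a linear-trend forecaster are hypothetical stand-ins.
import numpy as np


def per_episode_is_estimates(episodes, pi_e, pi_b):
    """One ordinary importance sampling estimate per episode, in time order.

    episodes: list of trajectories, each a list of (state, action, reward).
    pi_e(a, s): probability of action a in state s under the evaluation policy.
    pi_b(a, s): same probability under the behavior policy that logged the data.
    """
    estimates = []
    for traj in episodes:
        weight, ret = 1.0, 0.0
        for s, a, r in traj:
            weight *= pi_e(a, s) / pi_b(a, s)  # cumulative likelihood ratio
            ret += r                            # undiscounted return for brevity
        estimates.append(weight * ret)
    return np.array(estimates)


def forecast_next_value(estimates, window=10):
    """Predictive (rather than reactive) estimate of the next episode's value.

    Fits a straight line to the most recent estimates and extrapolates one
    step ahead, a simple stand-in for richer time series models.
    """
    recent = estimates[-window:]
    t = np.arange(len(recent))
    slope, intercept = np.polyfit(t, recent, 1)
    return slope * len(recent) + intercept
```

A plain average of `per_episode_is_estimates` would recover the reactive behavior the abstract criticizes; replacing the final averaging step with a forecast is what makes the estimator anticipate, rather than lag behind, changes in the MDP.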
DOI:
10.1609/aaai.v31i2.19104