Proceedings:
No. 7: AAAI-22 Technical Tracks 7
Volume
Issue:
Proceedings of the AAAI Conference on Artificial Intelligence, 36
Track:
AAAI Technical Track on Machine Learning II
Downloads:
Abstract:
In this work we consider the problem of regret minimization for logistic bandits. The main challenge of logistic bandits is reducing the dependence on a potentially large problem dependent constant that can at worst scale exponentially with the norm of the unknown parameter vector. Previous works have applied self-concordance of the logistic function to remove this worst-case dependence providing regret guarantees that move the reduce the dependence on this worst case parameter to lower order terms with only polylogarithmic dependence on the main term and as well as linear dependence on the dimension of the unknown parameter. This work improves upon the prior art by 1) removing all scaling of the worst case term on the main term and 2) reducing the dependence on the dependence to scale with the square root of dimension in the fixed arm setting by employing an experimental design procedure. Our regret bound in fact takes a tighter instance (i.e., gap) dependent regret bound for the first time in logistic bandits. We also propose a new warmup sampling algorithm that can dramatically reduce the lower order term in the regret in general and prove that it can exponentially reduce the lower order term's dependency on the worst case parameter in some instances. Finally, we discuss the impact of the bias of the MLE on the logistic bandit problem in d dimensions, providing an example where d^2 lower order regret (cf., it is d for linear bandits) may not be improved as long as the MLE is used and how bias-corrected estimators may be used to make it closer to d.
DOI:
10.1609/aaai.v36i7.20741
AAAI
Proceedings of the AAAI Conference on Artificial Intelligence, 36