An Experimental Design Approach for Regret Minimization in Logistic Bandits

Authors

Blake Mason

Rice University

Kwang-Sung Jun

University of Arizona

Lalit Jain

University of Washington

Proceedings:

No. 7: AAAI-22 Technical Tracks 7

Volume

Issue:

Proceedings of the AAAI Conference on Artificial Intelligence, 36

Track:

AAAI Technical Track on Machine Learning II

Downloads:

Download PDF

Abstract:

In this work we consider the problem of regret minimization for logistic bandits. The main challenge of logistic bandits is reducing the dependence on a potentially large problem dependent constant that can at worst scale exponentially with the norm of the unknown parameter vector. Previous works have applied self-concordance of the logistic function to remove this worst-case dependence providing regret guarantees that move the reduce the dependence on this worst case parameter to lower order terms with only polylogarithmic dependence on the main term and as well as linear dependence on the dimension of the unknown parameter. This work improves upon the prior art by 1) removing all scaling of the worst case term on the main term and 2) reducing the dependence on the dependence to scale with the square root of dimension in the fixed arm setting by employing an experimental design procedure. Our regret bound in fact takes a tighter instance (i.e., gap) dependent regret bound for the first time in logistic bandits. We also propose a new warmup sampling algorithm that can dramatically reduce the lower order term in the regret in general and prove that it can exponentially reduce the lower order term's dependency on the worst case parameter in some instances. Finally, we discuss the impact of the bias of the MLE on the logistic bandit problem in d dimensions, providing an example where d^2 lower order regret (cf., it is d for linear bandits) may not be improved as long as the MLE is used and how bias-corrected estimators may be used to make it closer to d.

DOI:

10.1609/aaai.v36i7.20741

AAAI

Proceedings of the AAAI Conference on Artificial Intelligence, 36

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.