Saving Stochastic Bandits from Poisoning Attacks via Limited Data Verification

Authors

Anshuka Rangi

University of California San Diego

Long Tran-Thanh

University of Warwick

Haifeng Xu

University of Virginia

Massimo Franceschetti

University of California San Diego

Proceedings:

No. 7: AAAI-22 Technical Tracks 7

Volume

Issue:

Proceedings of the AAAI Conference on Artificial Intelligence, 36

Track:

AAAI Technical Track on Machine Learning II

Downloads:

Download PDF

Abstract:

This paper studies bandit algorithms under data poisoning attacks in a bounded reward setting. We consider a strong attacker model in which the attacker can observe both the selected actions and their corresponding rewards, and can contaminate the rewards with additive noise. We show that any bandit algorithm with regret O(log T) can be forced to suffer a regret O(T) with an expected amount of contamination O(log T). This amount of contamination is also necessary, as we prove that there exists an O(log T) regret bandit algorithm, specifically the classical UCB, that requires Omega(log T) amount of contamination to suffer regret Omega(T). To combat such poisoning attacks, our second main contribution is to propose verification based mechanisms, which use limited verification to access a limited number of uncontaminated rewards. In particular, for the case of unlimited verifications, we show that with O(log T) expected number of verifications, a simple modified version of the Explore-then-Commit type bandit algorithm can restore the order optimal O(log T) regret irrespective of the amount of contamination used by the attacker. We also provide a UCB-like verification scheme, called Secure-UCB, that also enjoys full recovery from any attacks, also with O(log T) expected number of verifications. To derive a matching lower bound on the number of verifications, we also prove that for any order-optimal bandit algorithm, this number of verifications O(log T) is necessary to recover the order-optimal regret. On the other hand, when the number of verifications is bounded above by a budget B, we propose a novel algorithm, Secure-BARBAR, which provably achieves O(min(C,T/sqrt(B))) regret with high probability against weak attackers (i.e., attackers who have to place the contamination before seeing the actual pulls of the bandit algorithm), where C is the total amount of contamination by the attacker, which breaks the known Omega(C) lower bound of the non-verified setting if C is large.

DOI:

10.1609/aaai.v36i7.20777

AAAI

Proceedings of the AAAI Conference on Artificial Intelligence, 36

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.