Backdoor Attacks on the DNN Interpretation System

Authors

Shihong Fang

New York University Tandon School of Engineering

Anna Choromanska

New York University Tandon School of Engineering

Proceedings:

No. 1: AAAI-22 Technical Tracks 1

Volume

Issue:

Proceedings of the AAAI Conference on Artificial Intelligence, 36

Track:

AAAI Technical Track on Computer Vision I

Downloads:

Download PDF

Abstract:

Interpretability is crucial to understand the inner workings of deep neural networks (DNNs). Many interpretation methods help to understand the decision-making of DNNs by generating saliency maps that highlight parts of the input image that contribute the most to the prediction made by the DNN. In this paper we design a backdoor attack that alters the saliency map produced by the network for an input image with a specific trigger pattern while not losing the prediction performance significantly. The saliency maps are incorporated in the penalty term of the objective function that is used to train a deep model and its influence on model training is conditioned upon the presence of a trigger. We design two types of attacks: a targeted attack that enforces a specific modification of the saliency map and a non-targeted attack when the importance scores of the top pixels from the original saliency map are significantly reduced. We perform empirical evaluations of the proposed backdoor attacks on gradient-based interpretation methods, Grad-CAM and SimpleGrad, and a gradient-free scheme, VisualBackProp, for a variety of deep learning architectures. We show that our attacks constitute a serious security threat to the reliability of the interpretation methods when deploying models developed by untrusted sources. We furthermore show that existing backdoor defense mechanisms are ineffective in detecting our attacks. Finally, we demonstrate that the proposed methodology can be used in an inverted setting, where the correct saliency map can be obtained only in the presence of a trigger (key), effectively making the interpretation system available only to selected users.

DOI:

10.1609/aaai.v36i1.19935

AAAI

Proceedings of the AAAI Conference on Artificial Intelligence, 36

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.