DOI:
10.1609/aaai.v34i10.7234
Abstract:
We synthetically add data leakage to well-known image datasets. Convolutional neural networks trained naively on these spoiled datasets make wildly inaccurate predictions. We propose a method, dubbed Mask-Enhanced Training, that automatically identifies the possible leakage and makes the classifier robust to it. The method enables the model to focus on all features needed to solve the task, so its predictions on the original validation set are accurate even if the whole training dataset is spoiled with the leakage.
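
The abstract does not say what form the synthetic leakage takes. As a minimal sketch only, the Python below assumes one hypothetical form: a single pixel whose intensity encodes the class label. It illustrates why a naively trained classifier could latch onto such a shortcut. It is not the authors' procedure and does not implement Mask-Enhanced Training; the names `add_leak`, `make_dataset`, and `shortcut_predict` are illustrative.

```python
import random


def make_dataset(n, size, num_classes, seed=0):
    """Build a toy dataset of random grayscale images (lists of rows) with labels."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randrange(num_classes)
        image = [[rng.random() for _ in range(size)] for _ in range(size)]
        data.append((image, label))
    return data


def add_leak(image, label, num_classes):
    """Return a copy of the image whose top-left pixel encodes the label.

    Assumption: this label-coded pixel is a hypothetical leak chosen for
    illustration; the abstract does not describe the actual leakage.
    """
    leaked = [row[:] for row in image]  # copy rows so the clean image is untouched
    leaked[0][0] = label / (num_classes - 1)
    return leaked


def shortcut_predict(image, num_classes):
    """A 'classifier' that reads only the leaked pixel and ignores real content."""
    return round(image[0][0] * (num_classes - 1))


NUM_CLASSES = 10
clean = make_dataset(n=8, size=4, num_classes=NUM_CLASSES)
spoiled = [(add_leak(img, y, NUM_CLASSES), y) for img, y in clean]

# On spoiled data the shortcut recovers every label without using image content.
spoiled_hits = sum(shortcut_predict(img, NUM_CLASSES) == y for img, y in spoiled)
print(f"shortcut correct on spoiled data: {spoiled_hits}/{len(spoiled)}")
```

On the clean images the same shortcut reads an arbitrary pixel, so it carries no information about the label. That is the failure mode the abstract describes for models trained naively on spoiled data and evaluated on the original validation set.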