Deep Discriminative CNN with Temporal Ensembling for Ambiguously-Labeled Image Classification

Authors

  • Yao Yao, NJUST
  • Jiehui Deng, NJUST
  • Xiuhua Chen, NJUST
  • Chen Gong, NJUST
  • Jianxin Wu, NJU
  • Jian Yang, NJUST

DOI:

https://doi.org/10.1609/aaai.v34i07.6959

Abstract

In this paper, we study the problem of image classification in which training images are ambiguously annotated with multiple candidate labels, among which only one is correct but is not accessible during the training phase. Due to their non-deep frameworks and improper disambiguation strategies, traditional approaches usually lack both representation ability and discrimination ability, so their performance leaves room for improvement. To remedy these two shortcomings, this paper proposes a novel approach termed “Deep Discriminative CNN” (D2CNN) with temporal ensembling. Specifically, to improve representation ability, we innovatively employ deep convolutional neural networks for ambiguously-labeled image classification, adopting the well-known ResNet as our backbone. To enhance discrimination ability, we design an entropy-based regularizer that maximizes the margin between the potentially correct label and the unlikely ones for each image. In addition, we utilize the temporally ensembled predictions of different epochs to guide the training process so that the latent ground-truth label can be confidently highlighted. This is much superior to traditional disambiguation operations, which treat all candidate labels equally and identify the hidden ground-truth label via heuristics. Thorough experimental results on multiple datasets firmly demonstrate the effectiveness of the proposed D2CNN compared with existing state-of-the-art approaches.
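The abstract's two key ingredients, temporal ensembling of per-epoch predictions and an entropy-based regularizer over the candidate label set, can be illustrated with a minimal NumPy sketch. This is not the authors' exact formulation; the function names, the EMA decay `alpha`, and the regularizer weight `lam` are illustrative assumptions, and the ensembling follows the standard exponential-moving-average scheme with startup bias correction.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def ensemble_targets(Z, preds, candidate_mask, alpha=0.6, epoch=1):
    """Temporal ensembling: EMA of per-epoch predictions, restricted
    to each image's candidate label set.

    Z              -- running ensemble of predictions (n_samples x n_classes)
    preds          -- current epoch's softmax predictions
    candidate_mask -- 1 for candidate labels, 0 otherwise
    Returns the updated ensemble and normalized soft targets.
    """
    Z = alpha * Z + (1.0 - alpha) * preds      # exponential moving average
    z_hat = Z / (1.0 - alpha ** epoch)         # correct startup bias toward zero
    z_hat = z_hat * candidate_mask             # zero out non-candidate labels
    return Z, z_hat / z_hat.sum(axis=1, keepdims=True)

def d2cnn_loss(logits, targets, candidate_mask, lam=0.1):
    """Cross-entropy against the ensembled soft targets, plus an
    entropy regularizer (hypothetical weight `lam`) that sharpens the
    distribution over candidates, widening the margin between the
    likely ground-truth label and the unlikely candidates."""
    p = softmax(logits)
    ce = -(targets * np.log(p + 1e-12)).sum(axis=1).mean()
    pc = p * candidate_mask                    # probability mass on candidates
    pc = pc / pc.sum(axis=1, keepdims=True)
    ent = -(pc * np.log(pc + 1e-12)).sum(axis=1).mean()
    return ce + lam * ent

# Toy usage: 2 images, 3 classes, each with 2 candidate labels.
logits = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 2.0]])
mask = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
Z = np.zeros_like(logits)
Z, targets = ensemble_targets(Z, softmax(logits), mask, epoch=1)
total = d2cnn_loss(logits, targets, mask)
```

In practice the ensemble `Z` would be updated once per epoch from the network's predictions, and minimizing the entropy term pushes the model to commit to one label within each candidate set rather than spreading probability mass evenly.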

Published

2020-04-03

How to Cite

Yao, Y., Deng, J., Chen, X., Gong, C., Wu, J., & Yang, J. (2020). Deep Discriminative CNN with Temporal Ensembling for Ambiguously-Labeled Image Classification. Proceedings of the AAAI Conference on Artificial Intelligence, 34(07), 12669-12676. https://doi.org/10.1609/aaai.v34i07.6959

Section

AAAI Technical Track: Vision