Issue:
Vol. 31 No. 1: Thirty-First AAAI Conference On Artificial Intelligence
Proceedings:
Proceedings of the AAAI Conference on Artificial Intelligence, 31
Track:
AAAI Technical Track: Vision
Abstract:
Fine-grained image classification aims to recognize hundreds of sub-categories belonging to the same basic-level category, and is challenging due to large intra-class variance and small inter-class variance. Since two different sub-categories are distinguished only by subtle differences in some specific parts, semantic part localization is crucial for fine-grained image classification. Most previous works improve accuracy by looking for the semantic parts, but rely heavily on object or part annotations of images, whose labeling is costly. Recently, some researchers have begun to focus on recognizing sub-categories via weakly supervised part detection instead of using the expensive annotations. However, these works ignore the spatial relationship between the object and its parts as well as the interaction among the parts, both of which are helpful for part selection. Therefore, this paper proposes a weakly supervised part selection method with spatial constraints for fine-grained image classification, which does not use any bounding box or part annotations. We first learn a whole-object detector automatically to localize the object by jointly using saliency extraction and co-segmentation. Then two spatial constraints are proposed to select the distinguishing parts. The first spatial constraint, called the box constraint, defines the relationship between the object and its parts, and aims to ensure that the selected parts are located in the object region and have the largest overlap with the object region. The second spatial constraint, called the parts constraint, defines the relationship among the object's parts, and aims to reduce the parts' overlap with each other to avoid information redundancy and to ensure that the selected parts are the ones that best distinguish the object from other categories. Combining the two spatial constraints significantly improves part selection and achieves a notable improvement in fine-grained image classification. Experimental results on the CUB-200-2011 dataset demonstrate the superiority of our method, even compared with methods that use expensive annotations.
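The abstract does not give formulas for the two spatial constraints, so the following is only a minimal sketch of how they might be combined when choosing parts from candidate boxes. Everything in it is an assumption made for illustration: the function names, the thresholds `min_inside` and `max_part_iou`, the use of intersection-over-union to measure overlap between parts, and the exhaustive search over combinations. It is not the paper's actual objective or optimization procedure.

```python
from itertools import combinations
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical format


def area(b: Box) -> float:
    # Degenerate (inverted) boxes count as zero area.
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: Box, b: Box) -> float:
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))


def iou(a: Box, b: Box) -> float:
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def inside_ratio(part: Box, obj: Box) -> float:
    # Fraction of the part box that lies within the object box.
    a = area(part)
    return intersection(part, obj) / a if a > 0 else 0.0


def select_parts(obj_box: Box, candidates: List[Box], n_parts: int = 2,
                 min_inside: float = 0.9,
                 max_part_iou: float = 0.3) -> List[Box]:
    # Box constraint (assumed form): keep candidates lying almost
    # entirely inside the detected object region.
    feasible = [c for c in candidates
                if inside_ratio(c, obj_box) >= min_inside]
    best, best_score = None, -1.0
    for combo in combinations(feasible, n_parts):
        # Parts constraint (assumed form): reject combinations whose
        # parts overlap each other too much.
        if any(iou(p, q) > max_part_iou for p, q in combinations(combo, 2)):
            continue
        # Prefer the combination with the largest total overlap
        # with the object region.
        score = sum(intersection(p, obj_box) for p in combo) / area(obj_box)
        if score > best_score:
            best, best_score = combo, score
    return list(best) if best else []


obj = (0, 0, 100, 100)
cands = [(0, 0, 50, 50), (5, 5, 55, 55), (50, 50, 100, 100), (80, 80, 140, 140)]
selected = select_parts(obj, cands)
```

In this toy input, the last candidate lies mostly outside the object box and is filtered by the box constraint, while the first two candidates overlap heavily and cannot be chosen together under the parts constraint. A real system would also weigh how discriminative each candidate part is, which this sketch leaves out.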
DOI:
10.1609/aaai.v31i1.11223