Proceedings:
No. 3: AAAI-22 Technical Tracks 3
Volume
Issue:
Proceedings of the AAAI Conference on Artificial Intelligence, 36
Track:
AAAI Technical Track on Computer Vision III
Downloads:
Abstract:
The task of few-shot fine-grained recognition is to classify images belonging to subordinate categories merely depending on few examples. Due to the fine-grained nature, it is desirable to capture subtle but discriminative part-level patterns from limited training data, which makes it a challenging problem. In this paper, to generate fine-grained tailored representations for few-shot recognition, we propose a Dual Attention Network (Dual Att-Net) consisting of two dual branches of both hard- and soft-attentions. Specifically, by producing attention guidance from deep activations of input images, our hard-attention is realized by keeping a few useful deep descriptors and forming them as a bag of multi-instance learning. Since these deep descriptors could correspond to objects' parts, the advantage of modeling as a multi-instance bag is able to exploit inherent correlation of these fine-grained parts. On the other side, a soft attended activation representation can be obtained by applying attention guidance upon original activations, which brings comprehensive attention information as the counterpart of hard-attention. After that, both outputs of dual branches are aggregated as a holistic image embedding w.r.t. input images. By performing meta-learning, we can learn a powerful image embedding in such a metric space to generalize to novel classes. Experiments on three popular fine-grained benchmark datasets show that our Dual Att-Net obviously outperforms other existing state-of-the-art methods.
DOI:
10.1609/aaai.v36i3.20196
AAAI
Proceedings of the AAAI Conference on Artificial Intelligence, 36