Proceedings:
No. 1: AAAI-19, IAAI-19, EAAI-20
Issue:
Proceedings of the AAAI Conference on Artificial Intelligence, 33
Track:
AAAI Technical Track: Vision
Abstract:
Object detection is important in real-world applications. Existing methods mainly focus on object detection with sufficient labelled training data or zero-shot object detection with only concept names. In this paper, we address the challenging problem of zero-shot object detection with natural language description, which aims to simultaneously detect and recognize novel concept instances with textual descriptions. We propose a novel deep learning framework to jointly learn visual units, visual-unit attention and word-level attention, which are combined to achieve word-proposal affinity by an element-wise multiplication. To the best of our knowledge, this is the first work on zero-shot object detection with textual descriptions. Since there is no directly related work in the literature, we investigate plausible solutions based on existing zero-shot object detection for a fair comparison. We conduct extensive experiments on three challenging benchmark datasets. The extensive experimental results confirm the superiority of the proposed model.
DOI:
10.1609/aaai.v33i01.33018690