GTNet: Generative Transfer Network for Zero-Shot Object Detection

Authors

  • Shizhen Zhao, Huazhong University of Science and Technology
  • Changxin Gao, Huazhong University of Science and Technology
  • Yuanjie Shao, Huazhong University of Science and Technology
  • Lerenhan Li, Huazhong University of Science and Technology
  • Changqian Yu, Huazhong University of Science and Technology
  • Zhong Ji, Tianjin University
  • Nong Sang, Huazhong University of Science and Technology

DOI

https://doi.org/10.1609/aaai.v34i07.6996

Abstract

We propose a Generative Transfer Network (GTNet) for zero-shot object detection (ZSD). GTNet consists of an Object Detection Module and a Knowledge Transfer Module. The Object Detection Module learns large-scale seen-domain knowledge. The Knowledge Transfer Module leverages a feature synthesizer to generate unseen-class features, which are then used to train a new classification layer for the Object Detection Module. To synthesize features for each unseen class with both intra-class variance and IoU variance, we design an IoU-Aware Generative Adversarial Network (IoUGAN) as the feature synthesizer, which can be easily integrated into GTNet. Specifically, IoUGAN consists of three unit models: a Class Feature Generating Unit (CFU), a Foreground Feature Generating Unit (FFU), and a Background Feature Generating Unit (BFU). CFU generates unseen features with intra-class variance conditioned on the class semantic embeddings. FFU and BFU add IoU variance to the results of CFU, yielding class-specific foreground and background features, respectively. We evaluate our method on three public datasets, and the results demonstrate that it performs favorably against state-of-the-art ZSD approaches.
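The abstract describes a three-unit generator pipeline: CFU maps a class semantic embedding plus noise to a class feature, then FFU and BFU each take that feature plus fresh noise and emit class-specific foreground and background features. The sketch below illustrates only this data flow, not the paper's method: the single-layer generators, the dimensions (`SEM_DIM`, `NOISE_DIM`, `FEAT_DIM`), and the function names are all hypothetical placeholders; the actual units are full conditional GANs with trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

SEM_DIM, NOISE_DIM, FEAT_DIM = 300, 100, 1024  # assumed sizes, not from the paper


def make_generator(in_dim, out_dim):
    # Hypothetical stand-in for a trained GAN generator: one random
    # linear layer with a ReLU, just to make the data flow concrete.
    W = rng.standard_normal((in_dim, out_dim)) * 0.01
    return lambda x: np.maximum(x @ W, 0.0)


# CFU: semantic embedding + noise -> unseen-class feature (intra-class variance)
cfu = make_generator(SEM_DIM + NOISE_DIM, FEAT_DIM)
# FFU / BFU: CFU output + fresh noise -> foreground / background features
# (this second noise source models the IoU variance described in the abstract)
ffu = make_generator(FEAT_DIM + NOISE_DIM, FEAT_DIM)
bfu = make_generator(FEAT_DIM + NOISE_DIM, FEAT_DIM)


def synthesize(sem_embed, n_samples):
    """Generate n_samples foreground/background feature pairs for one class."""
    sem = np.tile(sem_embed, (n_samples, 1))
    z = rng.standard_normal((n_samples, NOISE_DIM))
    class_feats = cfu(np.concatenate([sem, z], axis=1))

    z_fg = rng.standard_normal((n_samples, NOISE_DIM))
    z_bg = rng.standard_normal((n_samples, NOISE_DIM))
    fg = ffu(np.concatenate([class_feats, z_fg], axis=1))
    bg = bfu(np.concatenate([class_feats, z_bg], axis=1))
    return fg, bg


fg, bg = synthesize(rng.standard_normal(SEM_DIM), n_samples=8)
print(fg.shape, bg.shape)  # (8, 1024) (8, 1024)
```

In the full method, features synthesized this way for unseen classes would then serve as training data for a replacement classification layer in the detector.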

Published

2020-04-03

How to Cite

Zhao, S., Gao, C., Shao, Y., Li, L., Yu, C., Ji, Z., & Sang, N. (2020). GTNet: Generative Transfer Network for Zero-Shot Object Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 34(07), 12967-12974. https://doi.org/10.1609/aaai.v34i07.6996

Section

AAAI Technical Track: Vision