Proceedings:
No. 3: AAAI-21 Technical Tracks 3
Volume
Issue:
Proceedings of the AAAI Conference on Artificial Intelligence, 35
Track:
AAAI Technical Track on Computer Vision II
Downloads:
Abstract:
Acquiring sufficient ground-truth supervision to train deep vi- sual models has been a bottleneck over the years due to the data-hungry nature of deep learning. This is exacerbated in some structured prediction tasks, such as semantic segmen- tation, which requires pixel-level annotations. This work ad- dresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level anno- tations and pixel-level segmentation. We formulate WSSS as a novel group-wise learning task that explicitly models se- mantic dependencies in a group of images to estimate more reliable pseudo ground-truths, which can be used for training more accurate segmentation models. In particular, we devise a graph neural network (GNN) for group-wise semantic min- ing, wherein input images are represented as graph nodes, and the underlying relations between a pair of images are char- acterized by an efficient co-attention mechanism. Moreover, in order to prevent the model from paying excessive atten- tion to common semantics only, we further propose a graph dropout layer, encouraging the model to learn more accurate and complete object responses. The whole network is end-to- end trainable by iterative message passing, which propagates interaction cues over the images to progressively improve the performance. We conduct experiments on the popular PAS- CAL VOC 2012 and COCO benchmarks, and our model yields state-of-the-art performance. Our code is available at: https://github.com/Lixy1997/Group-WSSS.
DOI:
10.1609/aaai.v35i3.16294
AAAI
Proceedings of the AAAI Conference on Artificial Intelligence, 35