Proceedings:
No. 1: AAAI-19, IAAI-19, EAAI-20
Volume
Issue:
Proceedings of the AAAI Conference on Artificial Intelligence, 33
Track:
AAAI Technical Track: Vision
Downloads:
Abstract:
Benefiting from the rapid development of Convolutional Neural Networks (CNNs), some salient object detection methods have achieved remarkable results by utilizing multi-level convolutional features. However, the saliency training datasets is of limited scale due to the high cost of pixel-level labeling, which leads to a limited generalization of the trained model on new scenarios during testing. Besides, some FCN-based methods directly integrate multi-level features, ignoring the fact that the noise in some features are harmful to saliency detection. In this paper, we propose a novel approach that transforms prior information into an embedding space to select attentive features and filter out outliers for salient object detection. Our network firstly generates a coarse prediction map through an encorder-decorder structure. Then a Feature Embedding Network (FEN) is trained to embed each pixel of the coarse map into a metric space, which incorporates much attentive features that highlight salient regions and suppress the response of non-salient regions. Further, the embedded features are refined through a deep-to-shallow Recursive Feature Integration Network (RFIN) to improve the details of prediction maps. Moreover, to alleviate the blurred boundaries, we propose a Guided Filter Refinement Network (GFRN) to jointly optimize the predicted results and the learnable guidance maps. Extensive experiments on five benchmark datasets demonstrate that our method outperforms state-of-the-art results. Our proposed method is end-to-end and achieves a realtime speed of 38 FPS.
DOI:
10.1609/aaai.v33i01.33019340
AAAI
Proceedings of the AAAI Conference on Artificial Intelligence, 33