Proceedings:
Proceedings of the AAAI Conference on Artificial Intelligence, 36
Volume:
36
Issue:
No. 2: AAAI-22 Technical Tracks 2
Track:
AAAI Technical Track on Computer Vision II
Abstract:
Occlusion is common in real 3D scenes and makes the boundaries of the target object ambiguous. This ambiguity complicates both labeling and learning. Current 3D detectors predict the bounding box directly, in effect treating it as a Dirac delta distribution, which does not fully account for such ambiguity. Distribution learning offers an efficient way to represent it. In this paper, we revise the common regression method by predicting the distribution of the 3D box, and we present a distribution-aware regression (DAR) module for box refinement and localization quality estimation. The module contains a scale adaptive (SA) encoder and a joint localization quality estimator (JLQE). With its adaptive receptive field, the SA encoder refines discriminative features for precise distribution learning. The JLQE further leverages the distribution statistics to provide a reliable location score that correlates with the localization quality of the target object. Combining the DAR module with the baseline VoteNet, we propose a novel 3D detector called DAVNet. Extensive experiments on both the ScanNet V2 and SUN RGB-D datasets demonstrate that the proposed DAVNet achieves significant improvement and outperforms state-of-the-art 3D detectors.
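The contrast between a Dirac delta prediction and a learned distribution can be illustrated with a minimal sketch. The abstract does not state how the box distribution is parameterized, so the following assumes one common choice: each box side offset is modeled as a softmax over evenly spaced bins, the regressed value is the expectation, and simple statistics of the distribution (top-k mass, variance) serve as a rough cue for localization quality. The function names `decode_offset` and `sharpness`, the bin layout, and the choice of statistics are all hypothetical and are not taken from DAVNet.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_offset(logits, max_offset):
    """Decode one box-side offset from bin logits (assumed parameterization).

    Bins are evenly spaced in [0, max_offset]. Returns the expected offset,
    the variance of the distribution, and the bin probabilities.
    """
    probs = softmax(logits)
    step = max_offset / (len(probs) - 1)
    bins = [i * step for i in range(len(probs))]
    mean = sum(p * b for p, b in zip(probs, bins))
    var = sum(p * (b - mean) ** 2 for p, b in zip(probs, bins))
    return mean, var, probs


def sharpness(probs, k=2):
    """Sum of the top-k probabilities: close to 1 for a peaked distribution,
    lower for a flat (ambiguous) one. Used here as a hypothetical quality cue."""
    return sum(sorted(probs, reverse=True)[:k])


# A clearly visible boundary: probability mass concentrated on one bin.
sharp_logits = [0.0, 0.0, 8.0, 0.0, 0.0]
# An occluded boundary: the model cannot prefer any bin.
flat_logits = [1.0, 1.0, 1.0, 1.0, 1.0]

sharp_mean, sharp_var, sharp_probs = decode_offset(sharp_logits, max_offset=2.0)
flat_mean, flat_var, flat_probs = decode_offset(flat_logits, max_offset=2.0)
```

Both distributions decode to the same expected offset, so a detector that outputs only the point estimate cannot tell them apart. The variance and top-k mass differ sharply, which is the kind of information a quality estimator could exploit under these assumptions.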
DOI:
10.1609/aaai.v36i2.20049