Proceedings:
No. 3: AAAI-22 Technical Tracks 3
Volume
Issue:
Proceedings of the AAAI Conference on Artificial Intelligence, 36
Track:
AAAI Technical Track on Computer Vision III
Downloads:
Abstract:
Semantic scene completion (SSC) requires the estimation of the 3D geometric occupancies of objects in the scene, along with the object categories. Currently, many methods employ RGB-D images to capture the geometric and semantic information of objects. These methods use simple but popular spatial- and channel-wise operations, which fuse the information of RGB and depth data. Yet, they ignore the large discrepancy of RGB-D data and the uncertainty measurements of depth data. To solve this problem, we propose the Frequency Fusion Network (FFNet), a novel method for boosting semantic scene completion by better utilizing RGB-D data. FFNet explicitly correlates the RGB-D data in the frequency domain, different from the features directly extracted by the convolution operation. Then, the network uses the correlated information to guide the feature learning from the RG- B and depth images, respectively. Moreover, FFNet accounts for the properties of different frequency components of RGB- D features. It has a learnable elliptical mask to decompose the features learned from the RGB and depth images, attending to various frequencies to facilitate the correlation process of RGB-D data. We evaluate FFNet intensively on the public SSC benchmarks, where FFNet surpasses the state-of- the-art methods. The code package of FFNet is available at https://github.com/alanWXZ/FFNet.
DOI:
10.1609/aaai.v36i3.20156
AAAI
Proceedings of the AAAI Conference on Artificial Intelligence, 36