Head pose estimation from RGB images without depth information is a challenging task due to the loss of spatial information as well as large head pose variations in the wild. The performance of existing landmark-free methods remains unsatisfactory as the quality of estimated pose is inferior. In this paper, we propose a novel three-branch network architecture, termed as Feature Decoupling Network (FDN), a more powerful architecture for landmark-free head pose estimation from a single RGB image. In FDN, we first propose a feature decoupling (FD) module to explicitly learn the discriminative features for each pose angle by adaptively recalibrating its channel-wise responses. Besides, we introduce a cross-category center (CCC) loss to constrain the distribution of the latent variable subspaces and thus we can obtain more compact and distinct subspaces. Extensive experiments on both in-the-wild and controlled environment datasets demonstrate that the proposed method outperforms other state-of-the-art methods based on a single RGB image and behaves on par with approaches based on multimodal input resources.
Published Date: 2020-06-02
Registration: ISSN 2374-3468 (Online) ISSN 2159-5399 (Print) ISBN 978-1-57735-835-0 (10 issue set)
Copyright: Published by AAAI Press, Palo Alto, California USA Copyright © 2020, Association for the Advancement of Artificial Intelligence All Rights Reserved