Proceedings:
No. 2: AAAI-21 Technical Tracks 2
Volume
Issue:
Proceedings of the AAAI Conference on Artificial Intelligence, 35
Track:
AAAI Technical Track on Computer Vision I
Downloads:
Abstract:
Despite of the recent great progress on multi-person pose estimation, existing solutions still remain challenging under the condition of "crowded scenes'', where RGB images capture complex real-world scenes with highly-overlapped people, severe occlusions and diverse postures. In this work, we focus on two main problems: 1) how to design an effective pipeline for crowded scenes pose estimation; and 2) how to equip this pipeline with the ability of relation modeling for interference resolving. To tackle these problems, we propose a new pipeline named Relation based Skeleton Graph Network (RSGNet). Unlike existing works that directly predict joints-of-target by labeling joints-of-interference as false positive, we first encourage all joints to be predicted. And then, a Target-aware Relation Parser (TRP) is designed to model the relation over all predicted joints, resulting in a target-aware encoding. This new pipeline will largely relieve the confusion of the joints estimation model when seeing identical joints with totally distinct labels (e.g., the identical hand exists in two bounding boxes). Furthermore, we introduce a Skeleton Graph Machine (SGM) to model the skeleton-based commonsense knowledge, aiming to estimate the target pose with the constraint of human body structure. Such skeleton-based constraint can help to deal with the challenges in crowded scenes from a reasoning perspective. Solid experiments on pose estimation benchmarks demonstrate that our method outperforms existing state-of-the-art methods.
DOI:
10.1609/aaai.v35i2.16206
AAAI
Proceedings of the AAAI Conference on Artificial Intelligence, 35