Turbo Learning Framework for Human-Object Interactions Recognition and Human Pose Estimation

  • Wei Feng SenseTime
  • Wentao Liu SenseTime
  • Tong Li SenseTime
  • Jing Peng SenseTime
  • Chen Qian SenseTime
  • Xiaolin Hu Tsinghua University


Human-object interactions (HOI) recognition and pose estimation are two closely related tasks. Human pose is an essential cue for recognizing actions and localizing the objects being interacted with. Conversely, the recognized action and the locations of the interacted objects provide guidance for pose estimation. In this paper, we propose a turbo learning framework that performs HOI recognition and pose estimation simultaneously. First, two modules are designed to enforce message passing between the tasks: a pose-aware HOI recognition module and an HOI-guided pose estimation module. Then, these two modules form a closed loop that exploits the complementary information iteratively and can be trained end-to-end. The proposed method achieves state-of-the-art performance on two public benchmarks, Verbs in COCO (V-COCO) and HICO-DET.
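
The closed loop described above can be illustrated with a minimal sketch. It shows only the control flow of alternating between the two modules. The function name `turbo_loop`, its parameters, the iteration count, and the toy numeric modules are all hypothetical stand-ins chosen for illustration; they are not the networks, interfaces, or settings used in the paper.

```python
from typing import Any, Callable, Tuple


def turbo_loop(
    features: Any,
    hoi_module: Callable[[Any, Any], Any],
    pose_module: Callable[[Any, Any], Any],
    init_pose: Any,
    num_iters: int = 3,  # assumed value, not taken from the paper
) -> Tuple[Any, Any]:
    """Alternate between two modules so each consumes the other's latest output.

    hoi_module(features, pose) stands in for a pose-aware HOI recognizer.
    pose_module(features, hoi) stands in for an HOI-guided pose estimator.
    Both callables are hypothetical placeholders.
    """
    pose = init_pose
    hoi = None
    for _ in range(num_iters):
        # Pose-aware HOI recognition: the current pose estimate is a cue.
        hoi = hoi_module(features, pose)
        # HOI-guided pose estimation: the HOI output refines the pose.
        pose = pose_module(features, hoi)
    return hoi, pose


# Toy numeric stand-ins, only to make the data flow observable.
def toy_hoi(features: float, pose: float) -> float:
    return features + pose


def toy_pose(features: float, hoi: float) -> float:
    return 0.5 * hoi


hoi_out, pose_out = turbo_loop(1.0, toy_hoi, toy_pose, init_pose=0.0, num_iters=3)
```

In a trainable version, both callables would be differentiable networks and the loop would be unrolled for a fixed number of iterations, which is one way a loop of this shape can be trained end-to-end.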