Frame-Guided Region-Aligned Representation for Video Person Re-Identification

Zengqun Chen; Zhiheng Zhou; Junchu Huang; Pengyu Zhang; Bo Li

doi:10.1609/aaai.v34i07.6632

Authors

Zengqun Chen South China University of Technology
Zhiheng Zhou South China University of Technology
Junchu Huang South China University of Technology
Pengyu Zhang South China University of Technology
Bo Li South China University of Technology

DOI:

https://doi.org/10.1609/aaai.v34i07.6632

Abstract

Pedestrians in videos are usually moving, which causes serious spatial misalignment such as scale variations and pose changes and makes video-based person re-identification more challenging. To address this issue, we propose a Frame-Guided Region-Aligned model (FGRA) that learns discriminative representations in two steps in an end-to-end manner. First, a novel alignment mechanism, built on a frame-guided feature learning strategy and a non-parametric alignment module, extracts well-aligned region features. Second, to form a sequence representation, an effective feature aggregation strategy fuses the region features along the temporal dimension using a temporal alignment score and along the spatial dimension using spatial attention. Experiments on benchmark datasets demonstrate that the proposed method is effective at addressing the misalignment problem and outperforms existing video-based person re-identification methods.
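
The second step of the abstract (fusing region features over time with an alignment score, then over space with attention) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the formulas, so the softmax weighting, the tensor layout `(T, R, D)`, and the names `aggregate_sequence`, `temporal_scores`, and `spatial_logits` are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def aggregate_sequence(region_feats, temporal_scores, spatial_logits):
    """Fuse per-frame region features into one sequence vector.

    Assumed (hypothetical) shapes:
      region_feats:    (T, R, D)  T frames, R regions, D feature dims
      temporal_scores: (T, R)     one alignment score per frame and region
      spatial_logits:  (R,)       one attention logit per region
    """
    # Temporal fusion: weight each frame's copy of a region by its score.
    w_t = softmax(temporal_scores, axis=0)                      # (T, R)
    per_region = (w_t[:, :, None] * region_feats).sum(axis=0)   # (R, D)

    # Spatial fusion: weight each region by its attention value.
    w_s = softmax(spatial_logits, axis=0)                       # (R,)
    return (w_s[:, None] * per_region).sum(axis=0)              # (D,)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 4, 16))   # 8 frames, 4 regions, 16 dims
    scores = rng.normal(size=(8, 4))
    logits = rng.normal(size=(4,))
    print(aggregate_sequence(feats, scores, logits).shape)
```

With uniform scores and logits the sketch reduces to plain average pooling over frames and regions, which is a convenient sanity check. The model described in the abstract may combine the region features differently, for example by concatenation rather than a weighted sum.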
