Published: 2018-02-08
Proceedings: Proceedings of the AAAI Conference on Artificial Intelligence, 32
Volume: 32
Issue: Thirty-Second AAAI Conference on Artificial Intelligence 2018
Track: AAAI Technical Track: Vision
Abstract:
We propose a new neural network called Temporal-enhanced Convolutional Network (T-CN) for video-based person re-identification. For each video sequence of a person, a spatial convolutional subnet is first applied to each frame to represent appearance information, and then a temporal convolutional subnet links short ranges of consecutive frames to extract local motion information. Together, these spatial and temporal convolutions construct our T-CN representation. Finally, a recurrent network is utilized to further explore global dynamics, followed by temporal pooling to generate an overall feature vector for the whole sequence. In the training stage, a Siamese network architecture is adopted to jointly optimize all the components with losses covering both identification and verification. In the testing stage, our network generates an overall discriminative feature representation for each input video sequence (whose length may vary considerably) in a feed-forward way, and even simple Euclidean-distance matching can produce good re-identification results. Experiments on the most widely used benchmark datasets demonstrate the superiority of our approach over the state of the art.
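To make the pipeline in the abstract concrete, the following is a minimal PyTorch sketch of the described stages: a per-frame spatial convolutional subnet, a temporal convolution over short ranges of consecutive frames, a recurrent network for global dynamics, and temporal pooling into one feature vector per sequence. The layer sizes, the choice of a GRU, and mean pooling are illustrative assumptions, not the paper's exact architecture, and the Siamese training losses are omitted.

```python
# Illustrative sketch of the T-CN pipeline (assumed layer sizes, GRU, mean pooling).
import torch
import torch.nn as nn


class TCNSketch(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        # Spatial convolutional subnet: applied to each frame independently
        # to capture appearance information (assumed small CNN).
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (B*T, 32, 1, 1)
        )
        # Temporal convolutional subnet: 1-D convolution over short ranges of
        # consecutive frames to capture local motion (kernel size assumed 3).
        self.temporal = nn.Conv1d(32, 64, kernel_size=3, padding=1)
        # Recurrent network over the T-CN features to model global dynamics.
        self.rnn = nn.GRU(64, feat_dim, batch_first=True)

    def forward(self, clip):
        # clip: (B, T, 3, H, W) -- a video sequence of T frames per person.
        b, t = clip.shape[:2]
        frames = clip.flatten(0, 1)                # (B*T, 3, H, W)
        app = self.spatial(frames).flatten(1)      # (B*T, 32) appearance
        app = app.view(b, t, -1).transpose(1, 2)   # (B, 32, T)
        motion = torch.relu(self.temporal(app))    # (B, 64, T) local motion
        seq, _ = self.rnn(motion.transpose(1, 2))  # (B, T, feat_dim)
        return seq.mean(dim=1)                     # temporal pooling -> (B, feat_dim)


# At test time, sequences (possibly of different lengths) are compared by
# Euclidean distance between their pooled feature vectors.
if __name__ == "__main__":
    net = TCNSketch()
    a = net(torch.randn(1, 8, 3, 64, 32))    # 8-frame sequence
    b = net(torch.randn(1, 12, 3, 64, 32))   # 12-frame sequence
    print(torch.dist(a, b).item())
```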
DOI: 10.1609/aaai.v32i1.12264
ISSN 2374-3468 (Online), ISSN 2159-5399 (Print)
Published by AAAI Press, Palo Alto, California, USA. Copyright © 2018, Association for the Advancement of Artificial Intelligence. All Rights Reserved.