The problem of visual object tracking has traditionally been handled by variant tracking paradigms, either learning a model of the object's appearance exclusively online or matching the object with the target in an offline-trained embedding space. Despite the recent success, each method agonizes over its intrinsic constraint. The online-only approaches suffer from a lack of generalization of the model they learn thus are inferior in target regression, while the offline-only approaches (e.g., convolutional siamese trackers) lack the target-specific context information thus are not discriminative enough to handle distractors, and robust enough to deformation. Therefore, we propose an online module with an attention mechanism for offline siamese networks to extract target-specific features under L2 error. We further propose a filter update strategy adaptive to treacherous background noises for discriminative learning, and a template update strategy to handle large target deformations for robust learning. Effectiveness can be validated in the consistent improvement over three siamese baselines: SiamFC, SiamRPN++, and SiamMask. Beyond that, our model based on SiamRPN++ obtains the best results over six popular tracking benchmarks and can operate beyond real-time.
Published Date: 2020-06-02
Registration: ISSN 2374-3468 (Online) ISSN 2159-5399 (Print) ISBN 978-1-57735-835-0 (10 issue set)
Copyright: Published by AAAI Press, Palo Alto, California USA Copyright © 2020, Association for the Advancement of Artificial Intelligence All Rights Reserved