Spatio-temporal associative representation for video person re-identification

Wu, G; Zhu, X; Gong, S

Accepted version
Embargoed until: 5555-01-01
Reason: Version not permitted.

Metadata

Abstract

Learning discriminative spatio-temporal representations is key to solving video person re-identification (re-id) challenges. Most existing methods focus on learning appearance features and/or selecting image frames, but ignore optimising the compatibility and interaction of appearance and motion attentive information. To address this limitation, we propose a novel model for learning a Spatio-Temporal Associative Representation (STAR). We design a local frame-level spatio-temporal association to learn discriminative attentive appearance and short-term motion features, and a global video-level spatio-temporal association to form a compact and discriminative holistic video representation. We further introduce a pyramid ranking regulariser to facilitate end-to-end model optimisation. Extensive experiments demonstrate the superiority of STAR over state-of-the-art methods on four video re-id benchmarks: MARS, DukeMTMC-VideoReID, iLIDS-VID and PRID-2011.

URI

https://qmro.qmul.ac.uk/xmlui/handle/123456789/90325

Collections

Electronic Engineering and Computer Science