Person Recognition in Low-Quality Imagery.

Cheng, Zhiyi

PhD thesis (14.32Mb)

Publisher

Queen Mary University of London.

Abstract

Person recognition aims to recognise and track the same individuals over space and time, using subtle identity class information in automatically detected person images captured from unconstrained camera views. Multiple visual biometric cues can be used for person identity recognition. Compared with other widely used cues, which tend to change easily over time and space, facial appearance is considered a more reliable non-intrusive visual cue. Person recognition, especially person face recognition, enables a wide range of practical applications, from law enforcement and information security to business, entertainment and e-commerce. However, person recognition under realistic application scenarios remains significantly challenging, mainly because images captured by low-quality cameras at unconstrained distances from people usually have low resolution (LR). Compared with high-resolution (HR) images, LR person images contain far fewer fine-grained discriminative details for robust identity recognition. To tackle the challenge of person recognition on low-resolution imagery, one effective approach is to use super-resolution (SR) methods to recover or enhance the image details that are beneficial for identity recognition. However, this thesis reveals that conventional SR models suffer a significant performance drop when applied to low-quality LR person images, especially natively captured surveillance facial images. Moreover, because SR and identity recognition models have advanced independently, direct super-resolution is less compatible with identity recognition, and hence brings only minor benefit or even a negative effect for low-resolution person recognition. To tackle these problems, this thesis explores person recognition methods with improved generalisation to realistic low-quality person images by adopting dedicated super-resolution algorithms.
More specifically, this thesis addresses person face recognition and body recognition in low-resolution images as follows.
Chapter 3: Whilst recent person face recognition techniques have made significant progress on recognising constrained high-resolution web images, the same cannot be said for native unconstrained low-resolution images at large scale. This chapter systematically examines this under-studied person face recognition problem and introduces a novel Complement Super-Resolution and Identity (CSRI) joint deep learning method with a unified end-to-end network architecture. The proposed learning mechanism is designed to overcome an inherent challenge of genuine low-resolution data: the absence of HR facial images paired with native LR faces, which are typically required for optimising image super-resolution models. This is realised by transferring super-resolving knowledge from good-quality HR web images to the genuine LR facial data, subject to the face identity label constraints of native LR faces in every training mini-batch. This chapter further constructs TinyFace, a new large-scale dataset of native unconstrained low-resolution face images drawn from selected public datasets. Extensive experiments show a significant gap between the person face recognition performance reported on popular benchmarks and the results on TinyFace, and demonstrate the advantages of the proposed CSRI over a variety of state-of-the-art face recognition and super-resolution deep models in this largely ignored person face recognition scenario. However, the lack of supervision in pixel space leads to low-fidelity super-resolved images, which may hinder downstream facial analysis applications.
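
The mini-batch scheme summarised for Chapter 3 can be illustrated with a small numerical sketch. It is not the CSRI implementation from the thesis: the function names, the use of average pooling to simulate LR input, the mean-squared-error pixel term and the `weight` parameter are all assumptions made for illustration. The only idea taken from the abstract is that synthetic LR images (down-sampled from HR web images) can receive both pixel and identity supervision, whereas native LR images receive identity supervision only.

```python
import numpy as np


def downsample(hr, factor):
    """Average-pool an (H, W) image by an integer factor to simulate LR input.

    Assumption: average pooling stands in for whatever down-sampling is
    actually used; H and W must be divisible by `factor`.
    """
    h, w = hr.shape
    assert h % factor == 0 and w % factor == 0
    return hr.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def pixel_mse(sr, hr):
    """Pixel-space fidelity term between a super-resolved image and its HR target."""
    return float(np.mean((sr - hr) ** 2))


def cross_entropy(logits, label):
    """Identity classification loss for one image (numerically stable softmax)."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])


def joint_batch_loss(synthetic, native, weight=1.0):
    """Combine the loss terms over one mini-batch (hypothetical formulation).

    synthetic: list of (sr, hr, logits, label) for down-sampled HR web images,
               which have an HR target and an identity label.
    native:    list of (logits, label) for genuine LR images, which have an
               identity label but no HR target.
    """
    sr_term = np.mean([pixel_mse(sr, hr) for sr, hr, _, _ in synthetic])
    id_synthetic = np.mean([cross_entropy(lg, y) for _, _, lg, y in synthetic])
    id_native = np.mean([cross_entropy(lg, y) for lg, y in native])
    return float(sr_term + weight * (id_synthetic + id_native))
```

In this sketch the native LR images contribute no pixel term at all, which mirrors the limitation noted at the end of the Chapter 3 summary: without pixel-space supervision on genuine LR data, nothing directly constrains the fidelity of their super-resolved outputs.
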
Chapter 4: Although Chapter 3 introduces a more advanced joint-learning scheme for person face recognition by super-resolution, one can by no means claim that the proposed method solves the real-world low-resolution face recognition problem, which remains a significantly challenging task. When people are faced with a challenging face identity recognition task, they often make decisions by selecting discriminative facial features. If a recognition model can be optimised to produce results that can be explained in a human-understandable way, such an interpretable model may shed light on the selection of discriminative facial features for better identity recognition. To achieve this, recognising faces from high-fidelity super-resolved outputs could be a viable approach. However, existing facial super-resolution methods focus mostly on improving "artificially down-sampled" LR imagery. Such SR models, although strong at handling artificial LR images, often suffer a significant performance drop on genuine LR test data. Previous unsupervised domain adaptation (UDA) methods address this issue by training a model on unpaired genuine LR and HR data with a cycle consistency loss formulation. However, this overstretches the model with two tasks: making the visual characteristics consistent and enhancing the image resolution. Importantly, it also makes end-to-end model training ineffective because of the difficulty of back-propagating gradients through two concatenated CNNs. To solve this problem, this chapter formulates a method that combines the advantages of conventional SR and UDA models. Specifically, the optimisation for characteristic consistency and the optimisation for image super-resolution are separated and controlled by introducing Characteristic Regularisation (CR) between them. This task split makes model training more effective and computationally tractable, and enables high-fidelity super-resolution on genuine low-resolution faces.
Chapter 5: Although facial appearance is a more reliable visual cue for person recognition, it is often challenging or even impossible to detect the facial region in images captured by unconstrained low-quality cameras, where faces may exhibit extreme poses, blur or distortion, or may be entirely invisible in back-view images. In such cases, person body recognition is an important means of identity recognition and tracking. However, person images captured by unconstrained surveillance cameras often have low resolution (LR). This causes a resolution mismatch when they are matched against high-resolution (HR) gallery images, which degrades the performance of person body recognition. An effective approach is to combine image super-resolution (SR) with body recognition in a joint learning manner. However, this scheme is limited because gradient back-propagation becomes considerably more difficult during training. This chapter introduces a novel model training regularisation method, called Inter-Task Association Critic (INTACT), to address this fundamental problem. Specifically, INTACT discovers the underlying association knowledge between image SR and person body recognition, and leverages it as an extra learning constraint to enhance the compatibility of the SR model with person body recognition in HR image space. This is realised by parameterising the association constraint so that it can be learned automatically from the training data. Extensive experiments on five standard datasets validate the superiority of INTACT over state-of-the-art approaches on the cross-resolution person body recognition task.
Chapter 6 draws conclusions and suggests future work on open questions arising from the studies in this thesis.

Authors

Cheng, Zhiyi

URI

https://qmro.qmul.ac.uk/xmlui/handle/123456789/72479
