Self-Supervised Facial Representation Learning with Facial Region Awareness

Gao, Z; Patras, I; 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

dc.contributor.author	Gao, Z	en_US
dc.contributor.author	Patras, I	en_US
dc.contributor.author	2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)	en_US
dc.date.accessioned	2024-03-15T15:38:54Z
dc.date.available	2024-02-26	en_US
dc.identifier.uri	https://qmro.qmul.ac.uk/xmlui/handle/123456789/95406
dc.description.abstract	Self-supervised pre-training has proven effective in learning transferable representations that benefit various visual tasks. This paper asks: can self-supervised pre-training learn general facial representations for various facial analysis tasks? Recent efforts toward this goal are limited to treating each face image as a whole, i.e., learning consistent facial representations at the image level, which overlooks the “consistency of local facial representations” (i.e., facial regions such as the eyes and nose). In this work, we make a first attempt to propose Facial Region Awareness (FRA), a novel self-supervised facial representation learning framework that learns consistent global and local facial representations. Specifically, we explicitly enforce the consistency of facial regions by matching the local facial representations across views, which are extracted with learned heatmaps highlighting the facial regions. Inspired by mask prediction in supervised semantic segmentation, we obtain the heatmaps via cosine similarity between the per-pixel projection of feature maps and “facial mask embeddings” computed from learnable positional embeddings, which leverage the attention mechanism to globally look up the facial image for facial regions. To learn such heatmaps, we formulate the learning of facial mask embeddings as a deep clustering problem by assigning the pixel features from the feature maps to them. The transfer learning results on facial classification and regression tasks show that FRA outperforms previous pre-trained models and, more importantly, that with ResNet as the unified backbone for various tasks, FRA achieves comparable or even better performance than SOTA methods in facial analysis tasks.
dc.title	Self-Supervised Facial Representation Learning with Facial Region Awareness	en_US
dc.type	Conference Proceeding
dc.rights.holder	© 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
pubs.notes	Not known	en_US
pubs.publication-status	Accepted	en_US
dcterms.dateAccepted	2024-02-26	en_US
rioxxterms.funder	Default funder	en_US
rioxxterms.identifier.project	Default project	en_US
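
The heatmap step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (cosine, heatmaps, pool_region, consistency_loss), the clipping of negative similarities before pooling, and the 2 - 2*cos loss form are all assumptions made for illustration. Plain Python lists stand in for feature maps.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-8)  # small epsilon avoids division by zero

def heatmaps(pixel_feats, mask_embs):
    """pixel_feats: H x W grid of D-dim projected pixel features.
    mask_embs: K facial mask embeddings, each D-dim.
    Returns K heatmaps (each H x W) of cosine similarities."""
    return [[[cosine(p, m) for p in row] for row in pixel_feats]
            for m in mask_embs]

def pool_region(pixel_feats, heatmap):
    """Heatmap-weighted average of pixel features: one local representation.
    Assumption: negative similarities are clipped to zero before pooling."""
    dim = len(pixel_feats[0][0])
    weights = [max(w, 0.0) for row in heatmap for w in row]
    feats = [p for row in pixel_feats for p in row]
    total = sum(weights) + 1e-8
    return [sum(w * f[i] for w, f in zip(weights, feats)) / total
            for i in range(dim)]

def consistency_loss(local_a, local_b):
    """Assumed loss form: 2 - 2*cos between the same region in two views."""
    return 2.0 - 2.0 * cosine(local_a, local_b)
```

Under these assumptions, each of the K heatmaps yields one pooled local representation per view, and the loss would be summed over matching regions across the two augmented views.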

Files in this item

Name: Gao Self-Supervised Facial 2023 ...
Size: 316.2Kb
Format: application/
Description: Accepted version

This item appears in the following Collection(s)

Electronic Engineering and Computer Science [3442]
