HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces
Abstract
In this paper, we present our method for neural face
reenactment, called HyperReenact, that aims to generate
realistic talking head images of a source identity, driven
by a target facial pose. Existing state-of-the-art face reenactment methods train controllable generative models that
learn to synthesize realistic facial images, yet they either produce
reenacted faces that are prone to significant visual artifacts,
especially under the challenging condition of extreme head
pose changes, or require expensive few-shot fine-tuning
to better preserve the source identity characteristics. We
propose to address these limitations by leveraging the photorealistic generation ability and the disentangled properties of a pretrained StyleGAN2 generator: we first invert
the real images into its latent space and then use a hypernetwork to perform (i) refinement of the source identity characteristics and (ii) facial pose re-targeting, thereby eliminating the dependence on external editing methods that typically produce artifacts. Our method operates under the one-shot setting (i.e., using a single source
frame) and allows for cross-subject reenactment, without
requiring any subject-specific fine-tuning. We compare
our method both quantitatively and qualitatively against
several state-of-the-art techniques on the standard benchmarks of VoxCeleb1 and VoxCeleb2, demonstrating that our approach produces artifact-free images and remains remarkably robust even under extreme
head pose changes. We make the code and the pretrained
models publicly available at: https://github.com/
StelaBou/HyperReenact.
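
To make the described pipeline concrete, the following is a minimal, hypothetical PyTorch sketch of the high-level flow: invert the single source frame into the latent space of a pretrained StyleGAN2 generator, extract source identity and target pose features, and let a hypernetwork predict per-layer weight offsets that modulate the frozen generator before synthesizing the reenacted image. All module designs, names, and dimensions here (StubBackbone, HyperNetwork, StubGenerator, FEAT_DIM, NUM_LAYERS, OFFSET_DIM) are illustrative assumptions for exposition, not the released implementation.

```python
import torch
import torch.nn as nn

# All sizes below are illustrative placeholders, not the paper's architecture.
IMG_SIZE = 256
LATENT_DIM = 512   # StyleGAN2 w-space dimensionality
FEAT_DIM = 256     # identity / pose feature size (assumed)
NUM_LAYERS = 14    # generator layers receiving weight offsets (assumed)
OFFSET_DIM = 512   # size of each per-layer weight offset (assumed)


class StubBackbone(nn.Module):
    """Stand-in for a pretrained network (GAN inverter, identity net, or pose net)."""
    def __init__(self, out_dim):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(16, out_dim)

    def forward(self, img):
        x = self.pool(torch.relu(self.conv(img))).flatten(1)
        return self.fc(x)


class HyperNetwork(nn.Module):
    """Maps fused source-identity and target-pose features to per-layer
    offsets that modulate the weights of the frozen generator."""
    def __init__(self):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(2 * FEAT_DIM, OFFSET_DIM) for _ in range(NUM_LAYERS)]
        )

    def forward(self, src_feat, trg_feat):
        fused = torch.cat([src_feat, trg_feat], dim=-1)
        return [head(fused) for head in self.heads]


class StubGenerator(nn.Module):
    """Stand-in for a pretrained StyleGAN2 generator that takes a latent code
    plus per-layer weight offsets and returns an image (here, a dummy tensor)."""
    def forward(self, w, offsets):
        return torch.zeros(w.shape[0], 3, IMG_SIZE, IMG_SIZE)


def reenact(inverter, id_net, pose_net, hypernet, generator, source_img, target_img):
    """One-shot reenactment: invert the single source frame, predict weight
    offsets that refine identity and retarget the facial pose, then decode."""
    w_src = inverter(source_img)                     # latent code of the source frame
    offsets = hypernet(id_net(source_img),           # source identity features
                       pose_net(target_img))         # target pose/expression features
    return generator(w_src, offsets)                 # source identity in target pose


if __name__ == "__main__":
    src = torch.randn(1, 3, IMG_SIZE, IMG_SIZE)
    trg = torch.randn(1, 3, IMG_SIZE, IMG_SIZE)
    out = reenact(StubBackbone(LATENT_DIM), StubBackbone(FEAT_DIM),
                  StubBackbone(FEAT_DIM), HyperNetwork(), StubGenerator(), src, trg)
    print(out.shape)  # torch.Size([1, 3, 256, 256])
```

In this sketch the generator itself stays fixed; only the hypernetwork-predicted offsets change per source/target pair, which is what removes the need for subject-specific fine-tuning or external latent-editing steps.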