Show simple item record

dc.contributor.author    Kim, J    en_US
dc.contributor.author    Oh, C    en_US
dc.contributor.author    Do, H    en_US
dc.contributor.author    Kim, S    en_US
dc.contributor.author    Sohn, K    en_US
dc.contributor.author    IEEE/CVF International Conference on Computer Vision and Pattern Recognition 2024    en_US
dc.date.accessioned    2024-05-17T15:08:15Z
dc.date.available    2024-02-26    en_US
dc.identifier.uri    https://qmro.qmul.ac.uk/xmlui/handle/123456789/96952
dc.description.abstract    We present a new multi-modal face image generation method that converts a text prompt and a visual input, such as a semantic mask or scribble map, into a photo-realistic face image. To do this, we combine the strengths of Generative Adversarial Networks (GANs) and diffusion models (DMs) by mapping the multi-modal features in the DM into the latent space of the pre-trained GANs. We present a simple mapping and a style modulation network to link the two models and convert meaningful representations in feature maps and attention maps into latent codes. With GAN inversion, the estimated latent codes can be used to generate 2D or 3D-aware facial images. We further present a multi-step training strategy that reflects textual and structural representations in the generated image. Our proposed network produces realistic 2D, multi-view, and stylized face images, which align well with the inputs. We validate our method using pre-trained 2D and 3D GANs, and our results outperform existing methods. Our project page is available at https://github.com/1211sh/Diffusiondriven_GAN-Inversion/.
dc.title    Diffusion-driven GAN Inversion for Multi-Modal Face Image Generation    en_US
dc.type    Conference Proceeding
dc.rights.holder    © 2024 IEEE.
pubs.notes    Not known    en_US
pubs.publication-status    Accepted    en_US
dcterms.dateAccepted    2024-02-26    en_US
rioxxterms.funder    Default funder    en_US
rioxxterms.identifier.project    Default project    en_US
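
The abstract above describes linking a diffusion model (DM) to a pre-trained GAN by converting DM feature maps and attention maps into GAN latent codes through a mapping network and a style modulation network. The following is a minimal, hedged sketch of that general idea in PyTorch; the class names, tensor shapes, default dimensions, and the commented generator call are illustrative assumptions, not the authors' implementation (their code is linked from the project page above).

import torch
import torch.nn as nn

class FeatureToLatentMapper(nn.Module):
    # Hypothetical mapping network: pools DM feature maps spatially and
    # predicts a stack of W+-style latent codes for a StyleGAN-like generator.
    def __init__(self, feat_channels=1280, num_ws=14, w_dim=512):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # collapse spatial dimensions
        self.mlp = nn.Sequential(
            nn.Linear(feat_channels, w_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(w_dim, num_ws * w_dim),
        )
        self.num_ws = num_ws
        self.w_dim = w_dim

    def forward(self, dm_features):                  # (B, C, H, W)
        pooled = self.pool(dm_features).flatten(1)   # (B, C)
        ws = self.mlp(pooled)                        # (B, num_ws * w_dim)
        return ws.view(-1, self.num_ws, self.w_dim)  # (B, num_ws, w_dim)

class StyleModulation(nn.Module):
    # Hypothetical style modulation: scales and shifts the predicted latents
    # with a code derived from DM attention maps (e.g. pooled cross-attention).
    def __init__(self, attn_dim=77, w_dim=512):
        super().__init__()
        self.to_scale = nn.Linear(attn_dim, w_dim)
        self.to_shift = nn.Linear(attn_dim, w_dim)

    def forward(self, ws, attn_code):                # ws: (B, N, D), attn_code: (B, attn_dim)
        scale = self.to_scale(attn_code).unsqueeze(1)
        shift = self.to_shift(attn_code).unsqueeze(1)
        return ws * (1 + scale) + shift

if __name__ == "__main__":
    # Shape check with dummy inputs standing in for DM features and attention.
    mapper = FeatureToLatentMapper()
    modulate = StyleModulation()
    dm_features = torch.randn(2, 1280, 8, 8)   # placeholder DM feature maps
    attn_code = torch.randn(2, 77)             # placeholder pooled attention code
    ws = modulate(mapper(dm_features), attn_code)
    print(ws.shape)                            # torch.Size([2, 14, 512])
    # image = generator.synthesis(ws)          # with a pre-trained StyleGAN-like generator

In this sketch, the latent codes produced by the mapper play the role of the GAN-inversion latents mentioned in the abstract; feeding them to a pre-trained 2D or 3D-aware generator would produce the corresponding face image.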


Files in this item

Files    Size    Format    View

There are no files associated with this item.
