Sound-and-Image-informed Music Artwork Generation Using Text-to-Image Models
Embargoed until: 2099-01-01
Reason: Not yet published.
Editors
Ferraro, A
Knees, P
Quadrana, M
Ye, T
Gouyon, F
Abstract
While some artists are active in both domains, creating music and creating artwork require different skill sets. The development of deep generative models for music and image generation has the potential to democratise these mediums and to make multi-modal creation more accessible to casual creators and other stakeholders. In this work, we propose a co-creative pipeline for generating images to accompany a musical piece. The pipeline chains state-of-the-art models for music-to-text, image-to-text, and text-to-image generation to recommend, via generation, visuals that are informed not only by the audio of the musical piece but also by a user-provided corpus of artworks and prompts, which gives the generated material a meaningful grounding. We demonstrate the potential of our pipeline using a corpus of material from artists with strongly connected visual and musical identities, and we release it as a Python notebook with which users can generate their own musical and visual compositions from a corpus of their choosing: https://github.com/alexjameswilliams/Music-Text-To-Image-Generation
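The three-stage chain the abstract describes can be sketched in a few lines of Python. The sketch below assumes the Hugging Face transformers and diffusers libraries; the chosen models, the example file names, and the caption_music stub are illustrative assumptions, not the implementation in the linked notebook.

import torch
from transformers import pipeline
from diffusers import StableDiffusionPipeline

def caption_music(audio_path: str) -> str:
    # Stage 1 (music-to-text): placeholder stub. Substitute a music
    # captioning model of your choice; the returned string is a dummy example.
    return "a melancholic ambient track with slow piano and soft synths"

# Stage 2 (image-to-text): caption each artwork in the user-provided corpus.
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
corpus_paths = ["artwork1.jpg", "artwork2.jpg"]  # assumed example files
corpus_captions = [captioner(p)[0]["generated_text"] for p in corpus_paths]

# Stage 3 (text-to-image): combine the music and corpus descriptions into a
# single prompt so the generated artwork is grounded in both sources.
music_caption = caption_music("song.mp3")  # assumed example file
prompt = f"{music_caption}, in the style of: {'; '.join(corpus_captions)}"

sd = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
sd(prompt).images[0].save("generated_artwork.png")

Concatenating the two caption sources into one prompt is one simple way to realise the grounding described above; the notebook linked in the abstract implements the authors' actual pipeline.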