Sound-and-Image-informed Music Artwork Generation Using Text-to-Image Models
Embargoed until: 2099-01-01
Reason: Not yet published.
Editors
Ferraro, A
Knees, P
Quadrana, M
Ye, T
Gouyon, F
Abstract
While some artists are active in both domains, creating music and creating artwork require different skill sets. The development of deep generative models for music and image generation has the potential to democratise these mediums and to make multi-modal creation more accessible to casual creators and other stakeholders. In this work, we propose a co-creative pipeline for generating images to accompany a musical piece. The pipeline chains state-of-the-art models for music-to-text, image-to-text, and text-to-image generation to recommend, via generation, visuals that are informed not only by the audio of the musical piece but also by a user-provided corpus of artworks and prompts, which gives the generated material a meaningful grounding. We demonstrate the potential of our pipeline using a corpus of material from artists with strongly connected visual and musical identities, and we release it as a Python notebook with which users can generate their own musical and visual compositions from a corpus of their choosing: https://github.com/alexjameswilliams/Music-Text-To-Image-Generation
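The three-stage chain the abstract describes can be sketched in a few lines of Python. The sketch below assumes the Hugging Face transformers and diffusers libraries; the chosen models, the example file names, and the caption_music stub are illustrative assumptions, not the implementation in the linked notebook.

import torch
from transformers import pipeline
from diffusers import StableDiffusionPipeline

def caption_music(audio_path: str) -> str:
    # Stage 1 (music-to-text): placeholder stub. Substitute a music
    # captioning model of your choice; the returned string is a dummy example.
    return "a melancholic ambient track with slow piano and soft synths"

# Stage 2 (image-to-text): caption each artwork in the user-provided corpus.
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
corpus_paths = ["artwork1.jpg", "artwork2.jpg"]  # assumed example files
corpus_captions = [captioner(p)[0]["generated_text"] for p in corpus_paths]

# Stage 3 (text-to-image): combine the music and corpus descriptions into a
# single prompt so the generated artwork is grounded in both sources.
music_caption = caption_music("song.mp3")  # assumed example file
prompt = f"{music_caption}, in the style of: {'; '.join(corpus_captions)}"

sd = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
sd(prompt).images[0].save("generated_artwork.png")

Concatenating the two caption sources into one prompt is one simple way to realise the grounding described above; the notebook linked in the abstract implements the authors' actual pipeline.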