MusiLingo: bridging music and text with pre-trained language models for music captioning and query response

Deng, Z; Ma, Y; Liu, Y; Guo, R; Zhang, G; Chen, W; Huang, W; Benetos, E; 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024)

dc.contributor.author	Deng, Z
dc.contributor.author	Ma, Y
dc.contributor.author	Liu, Y
dc.contributor.author	Guo, R
dc.contributor.author	Zhang, G
dc.contributor.author	Chen, W
dc.contributor.author	Huang, W
dc.contributor.author	Benetos, E
dc.contributor.author	2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024)
dc.date.accessioned	2024-04-19T09:16:45Z
dc.date.available	2024-03-13
dc.date.available	2024-04-19T09:16:45Z
dc.date.issued	2024-06-16
dc.identifier.uri	https://qmro.qmul.ac.uk/xmlui/handle/123456789/96229
dc.description.abstract	Large Language Models (LLMs) have shown immense potential in multimodal applications, yet the convergence of the textual and musical domains remains underexplored. To address this gap, we present MusiLingo, a novel system for music caption generation and music-related query responses. MusiLingo employs a single projection layer to align music representations from the pre-trained, frozen music audio model MERT with a frozen LLM, bridging the gap between music audio and textual contexts. We train it on an extensive music caption dataset and fine-tune it with instructional data. Due to the scarcity of high-quality music Q&A datasets, we created the MusicInstruct (MI) dataset from captions in the MusicCaps dataset, tailored for open-ended music inquiries. Empirical evaluations demonstrate its competitive performance in generating music captions and composing music-related Q&A pairs.	en_US
dc.format.extent	? - ? (13)
dc.title	MusiLingo: bridging music and text with pre-trained language models for music captioning and query response	en_US
dc.type	Conference Proceeding	en_US
dc.rights.holder	© 2024 ACL
pubs.notes	Not known	en_US
pubs.publication-status	Accepted	en_US
dcterms.dateAccepted	2024-03-13
rioxxterms.funder	Default funder	en_US
rioxxterms.identifier.project	Default project	en_US
qmul.funder	Resource-efficient machine listening::Royal Academy of Engineering	en_US

Files in this item

Name: Benetos MusiLingo bridging music ...
Size: 773.2Kb
Format: application/
Description: Accepted version

This item appears in the following Collection(s)

Electronic Engineering and Computer Science [3422]
