dc.contributor.author | Deng, Z | |
dc.contributor.author | Ma, Y | |
dc.contributor.author | Liu, Y | |
dc.contributor.author | Guo, R | |
dc.contributor.author | Zhang, G | |
dc.contributor.author | Chen, W | |
dc.contributor.author | Huang, W | |
dc.contributor.author | Benetos, E | |
dc.contributor.author | 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024) | |
dc.date.accessioned | 2024-04-19T09:16:45Z | |
dc.date.available | 2024-03-13 | |
dc.date.available | 2024-04-19T09:16:45Z | |
dc.date.issued | 2024-06-16 | |
dc.identifier.uri | https://qmro.qmul.ac.uk/xmlui/handle/123456789/96229 | |
dc.description.abstract | Large Language Models (LLMs) have shown immense potential in multimodal applications, yet the convergence of textual and musical domains remains underexplored. To address this gap, we present MusiLingo, a novel system for music caption generation and music-related query responses. MusiLingo employs a single projection layer to align music representations from the pre-trained frozen music audio model MERT with a frozen LLM, bridging the gap between music audio and textual contexts. We train it on an extensive music caption dataset and fine-tune it with instructional data. Due to the scarcity of high-quality music Q&A datasets, we created the MusicInstruct (MI) dataset from captions in the MusicCaps dataset, tailored for open-ended music inquiries. Empirical evaluations demonstrate MusiLingo's competitive performance in generating music captions and composing music-related Q&A pairs. | en_US |
dc.format.extent | ? - ? (13) | |
dc.title | MusiLingo: bridging music and text with pre-trained language models for music captioning and query response | en_US |
dc.type | Conference Proceeding | en_US |
dc.rights.holder | © 2024 ACL | |
pubs.notes | Not known | en_US |
pubs.publication-status | Accepted | en_US |
dcterms.dateAccepted | 2024-03-13 | |
rioxxterms.funder | Default funder | en_US |
rioxxterms.identifier.project | Default project | en_US |
qmul.funder | Resource-efficient machine listening::Royal Academy of Engineering | en_US |