MARBLE: Music Audio Representation Benchmark for Universal Evaluation
(37th Conference on Neural Information Processing Systems (NeurIPS), 2023)
In the era of extensive intersection between art and Artificial Intelligence (AI), such as image generation and fiction co-creation, AI for music remains relatively nascent, particularly in music understanding. This is ...
The Song Describer dataset: a corpus of audio captions for music-and-language evaluation
(NeurIPS Machine Learning for Audio Workshop, 2023-12-16)
We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of high-quality audio-caption pairs, designed for the evaluation of music-and-language models. The dataset consists of 1.1k human-written natural ...
MERTech: instrument playing technique detection using self-supervised pretrained model with multi-task finetuning
(IEEE, 2024-04-14)
Instrument playing techniques (IPTs) constitute a pivotal component of musical expression. However, the development of automatic IPT detection methods suffers from limited labeled data and inherent class imbalance issues. ...
On the effectiveness of speech self-supervised learning for music
(International Society for Music Information Retrieval Conference (ISMIR), 2023-11-05)
Self-supervised learning (SSL) has shown promising results in various speech and natural language processing applications. However, its efficacy in music information retrieval (MIR) still remains largely unexplored. While ...
LyricWhiz: Robust Multilingual Lyrics Transcription by Whispering to ChatGPT
(International Society for Music Information Retrieval Conference (ISMIR), 2023-11-05)
We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method achieving state-of-the-art performance on various lyrics transcription datasets, even in challenging genres such as rock ...
Contrastive audio-language learning for music
(International Society for Music Information Retrieval Conference (ISMIR), 2022-12-04)
As one of the most intuitive interfaces known to humans, natural language has the potential to mediate many tasks that involve human-computer interaction, especially in application-focused fields like Music Information ...