
dc.contributor.author: Liang, J
dc.contributor.author: Liu, X
dc.contributor.author: Liu, H
dc.contributor.author: Phan, H
dc.contributor.author: Benetos, E
dc.contributor.author: Plumbley, M
dc.contributor.author: Wang, W
dc.contributor.author: 24th Annual Conference of the International Speech Communication Association (INTERSPEECH)
dc.date.accessioned: 2023-06-05T10:03:55Z
dc.date.available: 2023-05-17
dc.date.available: 2023-06-05T10:03:55Z
dc.date.issued: 2023-08-20
dc.identifier.uri: https://qmro.qmul.ac.uk/xmlui/handle/123456789/88690
dc.description.abstract: Contrastive language-audio pretraining (CLAP) has become a new paradigm to learn audio concepts with audio-text pairs. CLAP models have shown unprecedented performance as zero-shot classifiers on downstream tasks. To further adapt CLAP with domain-specific knowledge, a popular method is to finetune its audio encoder with available labelled examples. However, this is challenging in low-shot scenarios, as the number of annotations is limited relative to the model size. In this work, we introduce a Training-efficient (Treff) adapter to rapidly learn with a small set of examples while maintaining the capacity for zero-shot classification. First, we propose a cross-attention linear model (CALM) to map a set of labelled examples and test audio to test labels. Second, we find that initialising CALM as a cosine measurement improves our Treff adapter even without training. The Treff adapter outperforms metric-based methods in few-shot settings and yields results competitive with fully-supervised methods. [en_US]
dc.format.extent: ? - ? (5)
dc.relation.isreplacedby: 123456789/88692
dc.relation.isreplacedby: https://qmro.qmul.ac.uk/xmlui/handle/123456789/88692
dc.title: Adapting Language-Audio Models as Few-Shot Audio Learners [en_US]
dc.type: Conference Proceeding [en_US]
pubs.notes: Not known [en_US]
pubs.publication-status: Accepted [en_US]
dcterms.dateAccepted: 2023-05-17
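The CALM mechanism and cosine initialisation described in the abstract above can be illustrated with a short sketch. What follows is a minimal, hypothetical PyTorch rendering, not the authors' implementation: it assumes precomputed CLAP-style audio and text embeddings, and the function names, the softmax scale, and the mixing weight alpha are illustrative assumptions.

import torch
import torch.nn.functional as F

def calm_logits(test_emb, support_embs, support_labels, scale=10.0):
    # Cross-attention linear model (CALM), as sketched in the abstract:
    # attend from the test audio embedding over the labelled support set,
    # then mix the support labels with the attention weights. With no
    # learned projections this reduces to a cosine measurement, the
    # initialisation the abstract reports works even without training.
    q = F.normalize(test_emb, dim=-1)         # (d,) test audio embedding
    k = F.normalize(support_embs, dim=-1)     # (N, d) labelled examples
    attn = F.softmax(scale * (k @ q), dim=0)  # (N,) cosine attention weights
    return attn @ support_labels              # (C,) few-shot class scores

def treff_logits(test_emb, text_embs, support_embs, support_labels, alpha=0.5):
    # Blend CLAP's zero-shot scores (audio-text cosine similarity) with
    # CALM's few-shot scores; alpha is a hypothetical mixing weight.
    zero_shot = F.normalize(test_emb, dim=-1) @ F.normalize(text_embs, dim=-1).T
    few_shot = calm_logits(test_emb, support_embs, support_labels)
    return alpha * few_shot + (1.0 - alpha) * zero_shot

# Toy usage with random embeddings: 5 labelled examples, 10 classes.
d, N, C = 512, 5, 10
support_labels = F.one_hot(torch.randint(C, (N,)), C).float()
scores = treff_logits(torch.randn(d), torch.randn(C, d),
                      torch.randn(N, d), support_labels)
print(scores.argmax().item())  # predicted class index

Retaining the zero-shot term is what lets such an adapter keep CLAP's zero-shot behaviour when few or no labelled examples are available.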

