Language-Aware Soft Prompting: Text-to-Text Optimization for Few- and Zero-Shot Adaptation of V&L Models

Bulat, A; Tzimiropoulos, G

dc.contributor.author	Bulat, A
dc.contributor.author	Tzimiropoulos, G
dc.date.accessioned	2023-11-24T15:07:46Z
dc.date.available	2023-09-01
dc.date.available	2023-11-24T15:07:46Z
dc.date.issued	2023-10-26
dc.identifier.issn	1573-1405
dc.identifier.uri	https://qmro.qmul.ac.uk/xmlui/handle/123456789/92248
dc.description.abstract	Soft prompt learning has emerged as a promising direction for adapting V&L models to a downstream task using a few training examples. However, current methods significantly overfit the training data, suffering from large accuracy degradation when tested on unseen classes from the same domain. In addition, all prior methods operate exclusively under the assumption that both vision and language data are present. To this end, we make the following 5 contributions: (1) To alleviate base class overfitting, we propose a novel Language-Aware Soft Prompting (LASP) learning method by means of a text-to-text cross-entropy loss that maximizes the probability of the learned prompts to be correctly classified with respect to pre-defined hand-crafted textual prompts. (2) To increase the representation capacity of the prompts, we also propose grouped LASP, where each group of prompts is optimized with respect to a separate subset of textual prompts. (3) Moreover, we identify a visual-language misalignment introduced by prompt learning and LASP, and, more importantly, propose a re-calibration mechanism to address it. (4) Importantly, we show that LASP is inherently amenable to including, during training, virtual classes, i.e. class names for which no visual samples are available, further increasing the robustness of the learned prompts. Expanding for the first time the setting to language-only adaptation, (5) we present a novel zero-shot variant of LASP where no visual samples at all are available for the downstream task. Through evaluations on 11 datasets, we show that our approach (a) significantly outperforms all prior works on soft prompting, and (b) matches and surpasses, for the first time, the accuracy on novel classes obtained by hand-crafted prompts and CLIP for 8 out of 11 test datasets. Finally, (c) we show that our zero-shot variant improves upon CLIP without requiring any extra data. Code will be made available.	en_US
dc.publisher	Springer	en_US
dc.relation.ispartof	International Journal of Computer Vision
dc.rights	This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
dc.rights	Attribution 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by/3.0/us/	*
dc.title	Language-Aware Soft Prompting: Text-to-Text Optimization for Few- and Zero-Shot Adaptation of V&L Models	en_US
dc.type	Article	en_US
dc.rights.holder	© 2021 The Author(s), published by Springer Nature
pubs.notes	Not known	en_US
pubs.publication-status	Published	en_US
dcterms.dateAccepted	2023-09-01
rioxxterms.funder	Default funder	en_US
rioxxterms.identifier.project	Default project	en_US
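Contribution (1) of the abstract describes a text-to-text cross-entropy loss: the text embedding of each class built from the learned soft prompt should be classified as that class when the embeddings of the hand-crafted textual prompts serve as the classifier weights. The following is a minimal, dependency-free Python sketch of that idea under stated assumptions, not the authors' implementation. It assumes the embeddings have already been computed by a text encoder (e.g. CLIP's), uses cosine similarity divided by a temperature as the logits, and averages the loss over templates and classes; the function name `text_to_text_ce`, the temperature default and the averaging scheme are illustrative assumptions.

```python
import math
from typing import List, Sequence

# Hypothetical sketch of a LASP-style text-to-text cross-entropy loss.
# Embeddings are assumed to be precomputed by a text encoder (e.g. CLIP's).

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def text_to_text_ce(
    learned: List[Vector],
    handcrafted: List[List[Vector]],
    temperature: float = 0.01,  # assumed value, not taken from the paper
) -> float:
    """Mean cross-entropy of classifying each learned-prompt class embedding
    using hand-crafted prompt embeddings as the classifier weights.

    learned[y]        -- embedding of class y built from the learned soft prompt
    handcrafted[l][j] -- embedding of class j built from hand-crafted template l
    """
    num_classes = len(learned)
    total = 0.0
    for template in handcrafted:
        if len(template) != num_classes:
            raise ValueError("each template must cover every class")
        for y, t_y in enumerate(learned):
            # Logits: similarity of the learned embedding to every class's
            # hand-crafted embedding under this template.
            logits = [cosine(t_y, w_j) / temperature for w_j in template]
            # Numerically stable log-sum-exp for the softmax normaliser.
            m = max(logits)
            log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
            # Negative log-probability that t_y is assigned its own class y.
            total += log_norm - logits[y]
    return total / (len(handcrafted) * num_classes)


if __name__ == "__main__":
    templates = [[[1.0, 0.0], [0.0, 1.0]]]  # one template, two classes
    aligned = text_to_text_ce([[1.0, 0.0], [0.0, 1.0]], templates, temperature=1.0)
    swapped = text_to_text_ce([[0.0, 1.0], [1.0, 0.0]], templates, temperature=1.0)
    print(aligned < swapped)  # True: prompts matching their own class score lower
```

Because this loss consumes only text embeddings, class names with no images could in principle be added to both lists, which is consistent with the abstract's point (4) about virtual classes. Under the same assumptions, grouped LASP (point 2) would apply this loss separately to each group of learned prompts against its own subset of templates.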

Files in this item

Name:: Tzimiropoulos Language-Aware ...
Size:: 1.134Mb
Format:: application/
Description:: Published version


Name:: license_rdf
Size:: 914 bytes
Format:: application/rdf+xml


This item appears in the following Collection(s)

Electronic Engineering and Computer Science [3490]

