Show simple item record

dc.contributor.author  Hu, J
dc.contributor.author  Lin, J
dc.contributor.author  Gong, S
dc.contributor.author  Cai, W
dc.date.accessioned  2024-05-17T14:50:26Z
dc.date.available  2024-05-17T14:50:26Z
dc.date.issued  2024-03-25
dc.identifier.issn  2159-5399
dc.identifier.uri  https://qmro.qmul.ac.uk/xmlui/handle/123456789/96948
dc.description.abstract  Camouflaged object detection (COD) approaches heavily rely on pixel-level annotated datasets. Weakly-supervised COD (WSCOD) approaches use sparse annotations such as scribbles or points to reduce annotation effort, but this can lead to decreased accuracy. The Segment Anything Model (SAM) shows remarkable segmentation ability with sparse prompts such as points. However, manual prompts are not always feasible, as they may not be accessible in real-world applications. Additionally, they provide only localization information rather than semantic information, which can intrinsically cause ambiguity in interpreting targets. In this work, we aim to eliminate the need for manual prompts. The key idea is to employ Cross-modal Chains of Thought Prompting (CCTP) to reason visual prompts using the semantic information given by a generic text prompt. To that end, we introduce a test-time instance-wise adaptation mechanism called Generalizable SAM (GenSAM) to automatically generate and optimize visual prompts from the generic task prompt for WSCOD. In particular, CCTP maps a single generic text prompt onto image-specific consensus foreground and background heatmaps using vision-language models, acquiring reliable visual prompts. Moreover, to adapt the visual prompts at test time, we further propose Progressive Mask Generation (PMG) to iteratively reweight the input image, guiding the model to focus on the targeted region in a coarse-to-fine manner. Crucially, all network parameters are fixed, avoiding the need for additional training. Experiments on three benchmarks demonstrate that GenSAM outperforms point-supervision approaches and achieves results comparable to scribble-supervision ones, relying solely on general task descriptions. Our code is available at https://github.com/jyLin8100/GenSAM.  en_US
dc.format.extent  12511 - 12518
dc.publisher  Association for the Advancement of Artificial Intelligence  en_US
dc.rights  This is a pre-copyedited, author-produced version accepted for publication in Proceedings of the AAAI Conference on Artificial Intelligence following peer review. The version of record is available at https://ojs.aaai.org/index.php/AAAI/article/view/29144
dc.title  Relax Image-Specific Prompt Requirement in SAM: A Single Generic Prompt for Segmenting Camouflaged Objects  en_US
dc.type  Conference Proceeding  en_US
dc.rights.holder  © 2024, Association for the Advancement of Artificial Intelligence
dc.identifier.doi  10.1609/aaai.v38i11.29144
pubs.issue  11  en_US
pubs.notes  Not known  en_US
pubs.publication-status  Published  en_US
pubs.volume  38  en_US
rioxxterms.funder  Default funder  en_US
rioxxterms.identifier.project  Default project  en_US

