Leveraging label hierarchies for few-shot everyday sound recognition

Liang, J; Phan, QH; Benetos, E

7th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE)

Pagination

? - ? (5)

Publisher URL

https://dcase.community/workshop2022/

Abstract

Everyday sounds cover a considerable range of sound categories in our daily life, yet for certain sound categories it is hard to collect sufficient data. Although existing works have successfully applied few-shot learning paradigms to sound recognition, most of them have not exploited the relationships between labels in audio taxonomies. This work adopts a hierarchical prototypical network to leverage the knowledge rooted in audio taxonomies. Specifically, a VGG-like convolutional neural network is used to extract acoustic features. Prototypical nodes are then calculated at each level of the tree structure. A multi-level loss is obtained by multiplying the losses at the individual levels by a decaying weight. Experimental results demonstrate that our hierarchical prototypical networks not only outperform prototypical networks with no hierarchy information but also yield better results than other state-of-the-art algorithms. Our code is available at: https://github.com/JinhuaLiang/HPNs_tagging

URI

https://qmro.qmul.ac.uk/xmlui/handle/123456789/82109

Collections

Electronic Engineering and Computer Science