ATGNN: audio tagging graph neural network

Singh, S; Steinmetz, C; Benetos, E; Phan, QH; Stowell, D

dc.contributor.author	Singh, S	en_US
dc.contributor.author	Steinmetz, C	en_US
dc.contributor.author	Benetos, E	en_US
dc.contributor.author	Phan, QH	en_US
dc.contributor.author	Stowell, D	en_US
dc.date.accessioned	2024-01-10T15:45:33Z
dc.date.available	2023-12-27	en_US
dc.date.issued	2024-01-17	en_US
dc.identifier.issn	1558-2361	en_US
dc.identifier.uri	https://qmro.qmul.ac.uk/xmlui/handle/123456789/93742
dc.description.abstract	Deep learning models such as CNNs and Transformers have achieved impressive performance for end-to-end audio tagging. Recent works have shown that, despite stacking multiple layers, the receptive field of CNNs remains severely limited. Transformers, on the other hand, are able to map global context through self-attention, but they treat the spectrogram as a sequence of patches, which is not flexible enough to capture irregular audio objects. In this work, we treat the spectrogram in a more flexible way by considering it as a graph structure and processing it with a novel graph neural architecture called ATGNN. ATGNN not only combines the capability of CNNs with the global information sharing ability of Graph Neural Networks, but also maps semantic relationships between learnable class embeddings and corresponding spectrogram regions. We evaluate ATGNN on two audio tagging tasks, where it achieves 0.585 mAP on the FSD50K dataset and 0.335 mAP on the AudioSet-balanced dataset. These results are comparable to those of Transformer-based models, with a significantly lower number of learnable parameters.	en_US
dc.format.extent	? - ? (5)	en_US
dc.publisher	Institute of Electrical and Electronics Engineers	en_US
dc.relation.ispartof	IEEE Signal Processing Letters	en_US
dc.title	ATGNN: audio tagging graph neural network	en_US
dc.type	Article
dc.rights.holder	© 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
dc.identifier.doi	10.1109/LSP.2024.3352514	en_US
pubs.notes	Not known	en_US
pubs.publication-status	Published	en_US
dcterms.dateAccepted	2023-12-27	en_US
qmul.funder	GraphNEx: Graph Neural Networks for Explainable Artificial Intelligence::Engineering and Physical Sciences Research Council	en_US

Files in this item

Name: Benetos ATGNN: audio tagging ...
Size: 766.8Kb
Format: application/
Description: Accepted version


This item appears in the following Collection(s)

Electronic Engineering and Computer Science [3387]
