dc.contributor.author | Pankajakshan, A | en_US |
dc.contributor.author | Bear, H | en_US |
dc.contributor.author | Subramanian, V | en_US |
dc.contributor.author | Benetos, E | en_US |
dc.relation.ispartof | 21st Annual Conference of the International Speech Communication Association (INTERSPEECH 2020) | en_US |
dc.date.accessioned | 2020-10-21T09:25:26Z | |
dc.date.available | 2020-07-24 | en_US |
dc.date.issued | 2020-10-25 | en_US |
dc.identifier.uri | https://qmro.qmul.ac.uk/xmlui/handle/123456789/67665 | |
dc.description.abstract | In this paper we investigate the importance of the extent of memory in sequential self attention for sound recognition. We propose to use a memory controlled sequential self attention mechanism on top of a convolutional recurrent neural network (CRNN) model for polyphonic sound event detection (SED). Experiments on the URBAN-SED dataset demonstrate the impact of the extent of memory on sound recognition performance with the self attention induced SED model. We extend the proposed idea with a multi-head self attention mechanism where each attention head processes the audio embedding with explicit attention width values. The proposed use of memory controlled sequential self attention offers a way to induce relations among frames of sound event tokens. We show that our memory controlled self attention model achieves an event based F-score of 33.92% on the URBAN-SED dataset, outperforming the F-score of 20.10% reported by the model without self attention. Index Terms: Memory controlled self attention, sound recognition, multi-head attention. | en_US |
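The abstract describes restricting sequential self attention to a fixed extent of memory, so that each audio frame attends only to a limited window of past frames. A minimal sketch of that idea, assuming single-head scaled dot-product attention over frame embeddings (the function name, shapes, and masking scheme are illustrative assumptions, not the authors' implementation):

```python
# Hypothetical sketch of memory-controlled sequential self attention:
# each frame attends only to itself and the previous `memory - 1` frames.
# This is an assumed formulation based on the abstract, not the paper's code.
import numpy as np

def memory_controlled_attention(x, memory):
    """x: (T, d) sequence of frame embeddings; memory: attention width M.
    Frame t attends to frames max(0, t - M + 1) .. t only."""
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)  # (T, T) scaled dot-product scores
    idx = np.arange(T)
    # Mask future frames and frames older than the memory window.
    mask = (idx[None, :] > idx[:, None]) | (idx[:, None] - idx[None, :] >= memory)
    scores[mask] = -np.inf
    # Row-wise softmax (the diagonal is always unmasked, so no empty rows).
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x  # (T, d) attended embeddings

# Example: 6 frames of 4-dim embeddings, memory width 3
out = memory_controlled_attention(np.random.randn(6, 4), memory=3)
print(out.shape)  # (6, 4)
```

The multi-head extension mentioned in the abstract would run several such heads in parallel, each with its own explicit `memory` value, and concatenate their outputs.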
dc.format.extent | ? - ? (5) | en_US |
dc.publisher | International Speech Communication Association (ISCA) | en_US |
dc.rights | This is a pre-copyedited, author-produced version of an article accepted for publication in 21st Annual Conference of the International Speech Communication Association (INTERSPEECH 2020) following peer review. | |
dc.title | Memory Controlled Sequential Self Attention for Sound Recognition | en_US |
dc.type | Conference Proceeding | |
dc.rights.holder | © 2020 International Speech Communication Association (ISCA) | |
pubs.notes | Not known | en_US |
pubs.publication-status | Accepted | en_US |
dcterms.dateAccepted | 2020-07-24 | en_US |
rioxxterms.funder | Default funder | en_US |
rioxxterms.identifier.project | Default project | en_US |
qmul.funder | New Frontiers in Music Information Processing (MIP-Frontiers)::European Commission | en_US |