dc.contributor.author | Pankajakshan, A | en_US |
dc.contributor.author | Bear, H | en_US |
dc.contributor.author | Subramanian, V | en_US |
dc.contributor.author | Benetos, E | en_US |
dc.relation.ispartof | 21st Annual Conference of the International Speech Communication Association (INTERSPEECH 2020) | en_US |
dc.date.accessioned | 2020-10-21T09:25:26Z | |
dc.date.available | 2020-07-24 | en_US |
dc.date.issued | 2020-10-25 | en_US |
dc.identifier.uri | https://qmro.qmul.ac.uk/xmlui/handle/123456789/67665 | |
dc.description.abstract | In this paper we investigate the importance of the extent of memory in sequential self attention for sound recognition. We propose to use a memory controlled sequential self attention mechanism on top of a convolutional recurrent neural network (CRNN) model for polyphonic sound event detection (SED). Experiments on the URBAN-SED dataset demonstrate the impact of the extent of memory on sound recognition performance with the self attention induced SED model. We extend the proposed idea with a multi-head self attention mechanism where each attention head processes the audio embedding with explicit attention width values. The proposed use of memory controlled sequential self attention offers a way to induce relations among frames of sound event tokens. We show that our memory controlled self attention model achieves an event based F-score of 33.92% on the URBAN-SED dataset, outperforming the F-score of 20.10% reported by the model without self attention. Index Terms: Memory controlled self attention, sound recognition, multi-head attention. | en_US |
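The abstract describes restricting sequential self attention to a fixed extent of memory, so that each audio frame attends only to a limited window of past frames. A minimal sketch of that idea, assuming single-head scaled dot-product attention over frame embeddings (the function name, shapes, and masking scheme are illustrative assumptions, not the authors' implementation):

```python
# Hypothetical sketch of memory-controlled sequential self attention:
# each frame attends only to itself and the previous `memory - 1` frames.
# This is an assumed formulation based on the abstract, not the paper's code.
import numpy as np

def memory_controlled_attention(x, memory):
    """x: (T, d) sequence of frame embeddings; memory: attention width M.
    Frame t attends to frames max(0, t - M + 1) .. t only."""
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)  # (T, T) scaled dot-product scores
    idx = np.arange(T)
    # Mask future frames and frames older than the memory window.
    mask = (idx[None, :] > idx[:, None]) | (idx[:, None] - idx[None, :] >= memory)
    scores[mask] = -np.inf
    # Row-wise softmax (the diagonal is always unmasked, so no empty rows).
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x  # (T, d) attended embeddings

# Example: 6 frames of 4-dim embeddings, memory width 3
out = memory_controlled_attention(np.random.randn(6, 4), memory=3)
print(out.shape)  # (6, 4)
```

The multi-head extension mentioned in the abstract would run several such heads in parallel, each with its own explicit `memory` value, and concatenate their outputs.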
dc.format.extent | ? - ? (5) | en_US |
dc.publisher | International Speech Communication Association (ISCA) | en_US |
dc.rights | This is a pre-copyedited, author-produced version of an article accepted for publication in 21st Annual Conference of the International Speech Communication Association (INTERSPEECH 2020) following peer review. | |
dc.title | Memory Controlled Sequential Self Attention for Sound Recognition | en_US |
dc.type | Conference Proceeding | |
dc.rights.holder | © 2020 International Speech Communication Association (ISCA) | |
pubs.notes | Not known | en_US |
pubs.publication-status | Accepted | en_US |
dcterms.dateAccepted | 2020-07-24 | en_US |
rioxxterms.funder | Default funder | en_US |
rioxxterms.identifier.project | Default project | en_US |
qmul.funder | New Frontiers in Music Information Processing (MIP-Frontiers)::European Commission | en_US |