Show simple item record

dc.contributor.author	Wang, L	en_US
dc.contributor.author	Cavallaro, A	en_US
dc.date.accessioned	2020-09-02T10:34:53Z
dc.date.available	2020-07-27	en_US
dc.date.issued	2020-08-24	en_US
dc.identifier.issn	2471-285X	en_US
dc.identifier.uri	https://qmro.qmul.ac.uk/xmlui/handle/123456789/66727
dc.description.abstract	This article fills the gap between the growing interest in signal processing based on deep neural networks (DNNs) and the new application of enhancing speech captured by microphones on a drone. In this context, the quality of the target sound is degraded significantly by the strong ego-noise from the rotating motors and propellers. We present the first work that integrates single-channel and multi-channel DNN-based approaches for speech enhancement on drones. We employ a DNN to estimate the ideal ratio masks at individual time-frequency bins, which are subsequently used to design three speech enhancement systems, namely single-channel ego-noise reduction (DNN-S), multi-channel beamforming (DNN-BF) and multi-channel time-frequency spatial filtering (DNN-TF). The main novelty lies in the proposed DNN-TF algorithm, which infers the noise-dominance probabilities at individual time-frequency bins from the DNN-estimated soft masks, and then incorporates them into a time-frequency spatial filtering framework for ego-noise reduction. By jointly exploiting the direction of arrival of the target sound, the time-frequency sparsity of the acoustic signals (speech and ego-noise) and the time-frequency noise-dominance probability, DNN-TF can suppress the ego-noise effectively in scenarios with very low signal-to-noise ratios (e.g., SNRs lower than -15 dB), especially when the direction of the target sound is close to that of a source of the ego-noise. Experiments with real and simulated data show the advantage of DNN-TF over competing methods, including DNN-S, DNN-BF and state-of-the-art time-frequency spatial filtering.	en_US
dc.publisher	Institute of Electrical and Electronics Engineers	en_US
dc.relation.ispartof	IEEE Transactions on Emerging Topics in Computational Intelligence	en_US
dc.rights	This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
dc.rights	Attribution 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by/3.0/us/	*
dc.title	Deep learning assisted time-frequency processing for speech enhancement on drones	en_US
dc.type	Article
dc.rights.holder	© 2020 The Author(s)
dc.identifier.doi	10.1109/TETCI.2020.3014934	en_US
pubs.notes	Not known	en_US
pubs.publication-status	Published	en_US
dcterms.dateAccepted	2020-07-27	en_US
rioxxterms.funder	Default funder	en_US
rioxxterms.identifier.project	Default project	en_US
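
The abstract above outlines a mask-based pipeline: a DNN predicts ideal ratio masks (IRMs) at individual time-frequency bins, which drive single-channel masking (DNN-S) and supply the per-bin noise-dominance probabilities consumed by the time-frequency spatial filter (DNN-TF). The minimal Python sketch below illustrates only those two ingredients under stated assumptions; it is not the authors' implementation, the oracle-style `estimate_irm` stands in for the trained DNN, and the STFT grid sizes and signals are placeholders.

import numpy as np

def estimate_irm(speech_mag, noise_mag):
    """Oracle ideal ratio mask; in the paper this is predicted by a DNN."""
    return speech_mag**2 / (speech_mag**2 + noise_mag**2 + 1e-12)

rng = np.random.default_rng(0)
F, T = 257, 100                                  # assumed STFT grid: freq bins x frames
speech = np.abs(rng.normal(size=(F, T)))          # placeholder |S(f,t)|
noise = 5.0 * np.abs(rng.normal(size=(F, T)))     # strong ego-noise, i.e. very low SNR

mixture = speech + noise                          # magnitude-domain toy mixture
mask = estimate_irm(speech, noise)                # soft mask in [0, 1]

# DNN-S idea: single-channel enhancement by point-wise masking.
enhanced = mask * mixture

# DNN-TF ingredient: per-bin noise-dominance probability, taken here as the
# complement of the speech mask; noise-dominated bins are the ones a
# multi-channel spatial filter would down-weight.
p_noise = 1.0 - mask
noise_dominated = p_noise > 0.5
print(f"{noise_dominated.mean():.0%} of time-frequency bins flagged as noise-dominated")

In the paper, these noise-dominance probabilities are combined with the target's direction of arrival inside a multi-channel spatial filtering framework; the sketch stops at the per-bin probabilities because the full filter depends on the microphone-array geometry, which this record does not include.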


Files in this item


This item appears in the following Collection(s)
