Show simple item record

dc.contributor.author	Wang, L	en_US
dc.contributor.author	Cavallaro, A	en_US
dc.date.accessioned	2020-09-02T10:34:53Z
dc.date.available	2020-07-27	en_US
dc.date.issued	2020-08-24	en_US
dc.identifier.issn	2471-285X	en_US
dc.identifier.uri	https://qmro.qmul.ac.uk/xmlui/handle/123456789/66727
dc.description.abstract	This article fills the gap between the growing interest in signal processing based on deep neural networks (DNNs) and the new application of enhancing speech captured by microphones on a drone. In this context, the quality of the target sound is degraded significantly by the strong ego-noise from the rotating motors and propellers. We present the first work that integrates single-channel and multi-channel DNN-based approaches for speech enhancement on drones. We employ a DNN to estimate the ideal ratio masks at individual time-frequency bins, which are subsequently used to design three speech enhancement systems, namely single-channel ego-noise reduction (DNN-S), multi-channel beamforming (DNN-BF) and multi-channel time-frequency spatial filtering (DNN-TF). The main novelty lies in the proposed DNN-TF algorithm, which infers the noise-dominance probabilities at individual time-frequency bins from the DNN-estimated soft masks, and then incorporates them into a time-frequency spatial filtering framework for ego-noise reduction. By jointly exploiting the direction of arrival of the target sound, the time-frequency sparsity of the acoustic signals (speech and ego-noise) and the time-frequency noise-dominance probability, DNN-TF can suppress the ego-noise effectively in scenarios with very low signal-to-noise ratios (e.g., SNRs lower than -15 dB), especially when the direction of the target sound is close to that of a source of the ego-noise. Experiments with real and simulated data show the advantage of DNN-TF over competing methods, including DNN-S, DNN-BF and state-of-the-art time-frequency spatial filtering.	en_US
dc.publisher	Institute of Electrical and Electronics Engineers	en_US
dc.relation.ispartof	IEEE Transactions on Emerging Topics in Computational Intelligence	en_US
dc.rights	This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
dc.rights	Attribution 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by/3.0/us/	*
dc.title	Deep learning assisted time-frequency processing for speech enhancement on drones	en_US
dc.type	Article
dc.rights.holder	© 2020 The Author(s)
dc.identifier.doi	10.1109/TETCI.2020.3014934	en_US
pubs.notes	Not known	en_US
pubs.publication-status	Published	en_US
dcterms.dateAccepted	2020-07-27	en_US
rioxxterms.funder	Default funder	en_US
rioxxterms.identifier.project	Default project	en_US
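
The abstract above outlines a mask-based pipeline: a DNN predicts ideal ratio masks (IRMs) at individual time-frequency bins, which drive single-channel masking (DNN-S) and supply the per-bin noise-dominance probabilities consumed by the time-frequency spatial filter (DNN-TF). The minimal Python sketch below illustrates only those two ingredients under stated assumptions; it is not the authors' implementation, the oracle-style `estimate_irm` stands in for the trained DNN, and the STFT grid sizes and signals are placeholders.

import numpy as np

def estimate_irm(speech_mag, noise_mag):
    """Oracle ideal ratio mask; in the paper this is predicted by a DNN."""
    return speech_mag**2 / (speech_mag**2 + noise_mag**2 + 1e-12)

rng = np.random.default_rng(0)
F, T = 257, 100                                  # assumed STFT grid: freq bins x frames
speech = np.abs(rng.normal(size=(F, T)))          # placeholder |S(f,t)|
noise = 5.0 * np.abs(rng.normal(size=(F, T)))     # strong ego-noise, i.e. very low SNR

mixture = speech + noise                          # magnitude-domain toy mixture
mask = estimate_irm(speech, noise)                # soft mask in [0, 1]

# DNN-S idea: single-channel enhancement by point-wise masking.
enhanced = mask * mixture

# DNN-TF ingredient: per-bin noise-dominance probability, taken here as the
# complement of the speech mask; noise-dominated bins are the ones a
# multi-channel spatial filter would down-weight.
p_noise = 1.0 - mask
noise_dominated = p_noise > 0.5
print(f"{noise_dominated.mean():.0%} of time-frequency bins flagged as noise-dominated")

In the paper, these noise-dominance probabilities are combined with the target's direction of arrival inside a multi-channel spatial filtering framework; the sketch stops at the per-bin probabilities because the full filter depends on the microphone-array geometry, which this record does not include.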


Files in this item


This item appears in the following Collection(s)
