dc.contributor.author | Luo, Y-J | |
dc.contributor.author | Dixon, S | |
dc.contributor.author | ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | |
dc.date.accessioned | 2024-07-09T10:25:12Z | |
dc.date.available | 2024-07-09T10:25:12Z | |
dc.date.issued | 2024-04-14 | |
dc.identifier.uri | https://qmro.qmul.ac.uk/xmlui/handle/123456789/97926 | |
dc.description.abstract | The class of disentangled sequential auto-encoders factorises speech into time-invariant (global) and time-variant (local) representations for speaker identity and linguistic content, respectively. Many existing models employ this assumption to tackle zero-shot voice conversion (VC), which converts the speaker characteristics of any given utterance to those of any novel speaker while preserving the linguistic content. However, balancing capacity between the two representations is intricate, as the global representation tends to collapse due to its lower information capacity along the time axis than that of the local representation. We propose a simple and effective dropout technique that applies an information bottleneck to the local representation via multiplicative Gaussian noise, in order to encourage use of the global one. We endow existing zero-shot VC models with the proposed method and show significant improvements in speaker conversion in terms of speaker verification acceptance rate, with comparable or better intelligibility measured in character error rate. | en_US |
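The multiplicative Gaussian noise bottleneck described in the abstract can be sketched as follows. This is a minimal illustration based only on the abstract's description, not the paper's implementation: the function name `gaussian_dropout` and the choice of drawing per-element noise from N(1, sigma^2), with sigma^2 supplied by the posterior variance of the local latent, are assumptions.

```python
import numpy as np

def gaussian_dropout(local_z, posterior_var, rng=None):
    """Multiplicative Gaussian dropout on the local representation.

    Each element of `local_z` is scaled by noise drawn from N(1, sigma^2),
    where sigma^2 is taken (hypothetically) from the posterior variance of
    the local latent. Noisier dimensions carry less reliable information,
    so the bottleneck limits the local path's capacity and encourages the
    model to rely on the global (speaker) representation instead.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(loc=1.0, scale=np.sqrt(posterior_var))
    return local_z * noise
```

At training time the noise is sampled per step; at inference it would typically be disabled (equivalent to passing zero variance), leaving the local representation unchanged.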
dc.format.extent | 11676 - 11680 | |
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) | en_US |
dc.title | Posterior Variance-Parameterised Gaussian Dropout: Improving Disentangled Sequential Autoencoders for Zero-Shot Voice Conversion | en_US |
dc.type | Conference Proceeding | en_US |
dc.rights.holder | © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | |
dc.identifier.doi | 10.1109/icassp48485.2024.10447835 | |
pubs.notes | Not known | en_US |
rioxxterms.funder | Default funder | en_US |
rioxxterms.identifier.project | Default project | en_US |
qmul.funder | UKRI Centre for Doctoral Training in Artificial Intelligence and Music::Engineering and Physical Sciences Research Council | en_US |
rioxxterms.funder.project | b215eee3-195d-4c4f-a85d-169a4331c138 | en_US |