Show simple item record

dc.contributor.author	Luo, Y-J
dc.contributor.author	Dixon, S
dc.contributor.author	ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
dc.date.accessioned	2024-07-09T10:25:12Z
dc.date.available	2024-07-09T10:25:12Z
dc.date.issued	2024-04-14
dc.identifier.uri	https://qmro.qmul.ac.uk/xmlui/handle/123456789/97926
dc.description.abstract	The class of disentangled sequential auto-encoders factorises speech into time-invariant (global) and time-variant (local) representations for speaker identity and linguistic content, respectively. Many existing models employ this assumption to tackle zero-shot voice conversion (VC), which converts the speaker characteristics of any given utterance to those of any novel speaker while preserving the linguistic content. However, balancing capacity between the two representations is intricate, as the global representation tends to collapse due to its lower information capacity along the time axis than that of the local representation. We propose a simple and effective dropout technique that applies an information bottleneck to the local representation via multiplicative Gaussian noise, in order to encourage the use of the global one. We endow existing zero-shot VC models with the proposed method and show significant improvements in speaker conversion in terms of speaker verification acceptance rate, with comparable or better intelligibility as measured by character error rate.	en_US
dc.format.extent	11676 - 11680
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	en_US
dc.title	Posterior Variance-Parameterised Gaussian Dropout: Improving Disentangled Sequential Autoencoders for Zero-Shot Voice Conversion	en_US
dc.type	Conference Proceeding	en_US
dc.rights.holder	© 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
dc.identifier.doi	10.1109/icassp48485.2024.10447835
pubs.notes	Not known	en_US
rioxxterms.funder	Default funder	en_US
rioxxterms.identifier.project	Default project	en_US
qmul.funder	UKRI Centre for Doctoral Training in Artificial Intelligence and Music::Engineering and Physical Sciences Research Council	en_US
rioxxterms.funder.project	b215eee3-195d-4c4f-a85d-169a4331c138	en_US
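The abstract above outlines the core mechanism: multiplicative Gaussian noise is applied to the local (time-variant) representation as an information bottleneck, so that speaker identity is pushed into the global (time-invariant) representation. A minimal sketch of such a multiplicative Gaussian dropout follows, assuming a PyTorch-style model; the function name and the choice of scaling the noise by the encoder's posterior variance are illustrative assumptions based on the title and abstract, not the authors' released implementation.

# Minimal PyTorch-style sketch of multiplicative Gaussian dropout on the
# local (time-variant) latent. Tying the noise scale to the encoder's
# posterior variance is an assumption suggested by the paper's title;
# the exact parameterisation in the paper may differ.
import torch

def posterior_variance_gaussian_dropout(z_local: torch.Tensor,
                                        posterior_var: torch.Tensor,
                                        training: bool = True) -> torch.Tensor:
    """Apply multiplicative Gaussian noise to the local representation.

    z_local:       (batch, time, dim) time-variant latent from the encoder.
    posterior_var: (batch, time, dim) posterior variance of that latent,
                   used here to set the dropout noise scale.
    """
    if not training:
        return z_local
    # eps ~ N(1, sigma^2): multiplicative noise limits the information the
    # local path can carry, encouraging use of the global representation.
    eps = 1.0 + posterior_var.sqrt() * torch.randn_like(z_local)
    return z_local * eps

As with standard dropout, the noise would typically be applied only during training and disabled at conversion time.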

