Time-domain music source separation for choirs and ensembles

Sarkar, S

dc.contributor.author	Sarkar, S	en_US
dc.date.accessioned	2024-03-28T09:52:45Z
dc.identifier.uri	https://qmro.qmul.ac.uk/xmlui/handle/123456789/95818
dc.description.abstract	Music source separation is the task of separating musical sources from an audio mixture. It has various direct applications, including automatic karaoke generation, enhancement of musical recordings, and 3D-audio upmixing, and it also has implications for downstream music information retrieval tasks such as multi-instrument transcription. However, most research has focused on fixed-stem separation of vocals, drums, and bass. While such models have demonstrated the capabilities of deep-learning-based source separation, their applicability is limited to very few use cases. Such models cannot separate most other instruments because of insufficient training data. Moreover, class-based separation inherently prevents such models from separating monotimbral mixtures. This thesis focuses on separating musical sources without requiring timbral distinction among the sources. Preliminary work focuses on separating vocal harmonies from choral ensembles using time-domain models with permutation invariant training. The method performs well but fails to generalise across datasets, mainly due to a lack of sizeable, clean training data. Recognising the challenge of obtaining sizeable, bleed-free data for ensemble recordings, this thesis presents "EnsembleSet", a new high-quality synthesised dataset, which is used to train a monotimbral ensemble separation model for string ensembles. Furthermore, a model trained with permutation invariant training is found to be capable of separating mixtures of identical, distinct, and unseen timbres. Although models trained on EnsembleSet can separate mixtures from unseen real-world datasets, performance drops are observed for out-of-domain test data. Fine-tuning is subsequently explored as a means of improving cross-dataset performance for both time-domain and complex-domain separation models.
The performance of these models under different training strategies and musical contexts is further investigated to better understand the behaviour of these timbre-agnostic separation models. The techniques developed in this work are currently used in industry for vocal harmony separation and lay the groundwork for future exploration of universal source separation based on monophonic sound event separation.	en_US
dc.language.iso	en	en_US
dc.title	Time-domain music source separation for choirs and ensembles	en_US
pubs.notes	Not known	en_US
rioxxterms.funder	Default funder	en_US
rioxxterms.identifier.project	Default project	en_US
qmul.funder	Time-domain Music Source Separation: Developing Novel Tools for Music Production::Engineering and Physical Sciences Research Council	en_US

Files in this item

Name:: Thesis_Sarkar.pdf
Size:: 4.985Mb
Format:: application/
Description:: PhD Thesis

This item appears in the following Collection(s)

Theses [4223]
Theses Awarded by Queen Mary University of London