dc.contributor.author: Sarkar, S [en_US]
dc.date.accessioned: 2024-03-28T09:52:45Z
dc.identifier.uri: https://qmro.qmul.ac.uk/xmlui/handle/123456789/95818
dc.description.abstract: Music source separation is the task of separating musical sources from an audio mixture. It has various direct applications, including automatic karaoke generation, enhancement of musical recordings, and 3D-audio upmixing, and it also has implications for downstream music information retrieval tasks such as multi-instrument transcription. However, the majority of research has focused on fixed-stem separation of vocals, drums, and bass. While such models have highlighted the capabilities of deep-learning-based source separation, their applicability is limited to very few use cases. They are unable to separate most other instruments due to insufficient training data. Moreover, class-based separation inherently prevents such models from separating monotimbral mixtures. This thesis focuses on separating musical sources without requiring timbral distinction among the sources. Preliminary attempts focus on the separation of vocal harmonies from choral ensembles using time-domain models with permutation invariant training. The method performs well but fails to generalise across datasets, mainly due to a lack of sizeable clean training data. Recognising the challenge of obtaining sizeable, bleed-free data for ensemble recordings, a new high-quality synthesised dataset, "EnsembleSet", is presented and used to train a monotimbral ensemble separation model for string ensembles. Moreover, a model trained using permutation invariant training is found to be capable of separating mixtures of identical, distinct, and unseen timbres as well. Although models trained on EnsembleSet can separate mixtures from unseen real-world datasets, performance drops are observed for out-of-domain test data. Subsequently, fine-tuning is explored as a means of improving cross-dataset performance for time-domain and complex-domain separation models.
The performance of these models under different training strategies and in different musical contexts is then investigated to better understand the behaviour of these timbre-agnostic separation models. The techniques developed in this work are currently being utilised in industry for vocal harmony separation and also lay the groundwork for future exploration toward universal source separation based on monophonic sound event separation. [en_US]
dc.language.iso: en [en_US]
dc.title: Time-domain music source separation for choirs and ensembles [en_US]
pubs.notes: Not known [en_US]
rioxxterms.funder: Default funder [en_US]
rioxxterms.identifier.project: Default project [en_US]
qmul.funder: Time-domain Music Source Separation: Developing Novel Tools for Music Production::Engineering and Physical Sciences Research Council [en_US]


This item appears in the following Collection(s)

  • Theses [4223]
    Theses Awarded by Queen Mary University of London
