ST-ITO: Controlling audio effects for style transfer with inference-time optimization

Steinmetz, C; Singh, S; Comunità, M; Ibnyahya, I; Yuan, S; Benetos, E; Reiss, J; 25th International Society for Music Information Retrieval Conference (ISMIR)

dc.contributor.author	Steinmetz, C
dc.contributor.author	Singh, S
dc.contributor.author	Comunità, M
dc.contributor.author	Ibnyahya, I
dc.contributor.author	Yuan, S
dc.contributor.author	Benetos, E
dc.contributor.author	Reiss, J
dc.contributor.author	25th International Society for Music Information Retrieval Conference (ISMIR)
dc.date.accessioned	2024-08-02T10:04:03Z
dc.date.available	2024-06-28
dc.date.available	2024-08-02T10:04:03Z
dc.date.issued	2024-11-10
dc.identifier.uri	https://qmro.qmul.ac.uk/xmlui/handle/123456789/98593
dc.description.abstract	Audio production style transfer is the task of processing an input to impart stylistic elements from a reference recording. Existing approaches often train a neural network to estimate control parameters for a set of audio effects. However, these approaches are limited in that they can only control a fixed set of effects, where the effects must be differentiable or otherwise employ specialized training techniques. In this work, we introduce ST-ITO, Style Transfer with Inference-Time Optimization, an approach that instead searches the parameter space of an audio effect chain at inference. This method enables control of arbitrary audio effect chains, including unseen and non-differentiable effects. Our approach employs a learned metric of audio production style, which we train through a simple and scalable self-supervised pretraining strategy, along with a gradient-free optimizer. Due to the limited existing evaluation methods for audio production style transfer, we introduce a multi-part benchmark to evaluate audio production style metrics and style transfer systems. This evaluation demonstrates that our audio representation better captures attributes related to audio production and enables expressive style transfer via control of arbitrary audio effects.	en_US
dc.publisher	ISMIR	en_US
dc.title	ST-ITO: Controlling audio effects for style transfer with inference-time optimization	en_US
dc.type	Conference Proceeding	en_US
pubs.notes	Not known	en_US
pubs.publication-status	Accepted	en_US
dcterms.dateAccepted	2024-06-28
rioxxterms.funder	Default funder	en_US
rioxxterms.identifier.project	Default project	en_US
qmul.funder	Resource-efficient machine listening::Royal Academy of Engineering	en_US
rioxxterms.funder.project	b215eee3-195d-4c4f-a85d-169a4331c138	en_US

Files in this item

Name:: Benetos ST-ITO: Controlling audio ...
Size:: 387.0Kb
Format:: application/
Description:: Accepted version

This item appears in the following Collection(s)

Electronic Engineering and Computer Science [3490]