Intelligent Control of Dynamic Range Compressor

Sheng, Di

Publisher

Queen Mary University of London

Abstract

Music production is an essential element in the value chain of modern music. It includes enhancing recorded audio tracks, balancing the loudness of multiple tracks, and making artistic decisions that suit the genre, style and emotion of the music. As in related professions in creative media production, the tools for music making are now highly computerised. However, many parts of the work remain labour-intensive and time-consuming, so the demand for intelligent tools is growing. This in turn encourages a growing body of research into intelligent music production tools. Since audio effects are among the main tools used by music producers, much discussion and development has targeted the control mechanisms of audio effects. This thesis aims to push the boundaries of this field by investigating the intelligent control of one of the essential audio effects, the dynamic range compressor. The research presents an innovative control system design. The core of this design is to learn from a reference audio and to control the dynamic range compressor so that the processed input audio sounds as close as possible to the reference. One of the proposed approaches can be divided into three stages: a feature extractor, a trained regression model, and an objective evaluation algorithm. In the feature extractor stage, we first test feature sets built from conventional audio features commonly used in speech and audio signal analysis. Subsequently, we test handcrafted audio features specifically designed to characterise audio properties related to the dynamic range of audio samples. Feature design has been investigated at different levels of complexity. A series of feature selection schemes is also assessed to select the optimal feature sets from both conventional and specifically designed audio features.
In the subsequent stage of the research, feature extraction is replaced by a feature-learning deep neural network (DNN). This addresses the problem that the previous features are exclusive to each parameter, whereas a general feature extractor may be formed using a DNN. A universal feature extractor can also reduce the computational cost and adapt more easily to more complex audio material. The second stage of the control system is a trained regression model. Random forest regression is selected from several algorithms through experimental validation. Since the different feature extractors are tested with increasingly complex audio material and are exclusive to individual compressor parameters, e.g., attack time or compression ratio, separate models are trained and tested for each. The third component of our approach is a method for evaluation. A computational audio similarity algorithm was designed to verify the results using auditory models. This algorithm is based on estimating the distance between two statistical models fitted on perceptually motivated audio features characterising similarity in loudness and timbre. Finally, the overall system is evaluated with both objective and subjective methods. The main contribution of this thesis is a method for using a reference audio to control a dynamic range compressor. Besides the system design, the analysis of the evaluation provides useful insights into the relations between audio effects, audio features and auditory perception. The research is conducted in such a way that the knowledge can be transferred to other audio effects and other use cases, providing an alternative research direction in the field of intelligent music production and simplifying how audio effects are controlled for end users.
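
The idea of measuring similarity as a distance between two fitted statistical models can be illustrated with a minimal sketch. This is not the algorithm from the thesis. As assumptions made purely for illustration, it uses the frame-wise RMS level in dB as a simple stand-in for the perceptually motivated loudness features, fits a single Gaussian to each signal, and takes the symmetric Kullback-Leibler divergence as the distance. All function names and the frame size are hypothetical.

```python
import math


def frame_levels_db(samples, frame_size=1024):
    """Return the RMS level in dB of each non-overlapping frame.

    Assumption: RMS level stands in for a perceptual loudness feature.
    """
    levels = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(x * x for x in frame) / frame_size)
        # Floor the RMS so that silent frames do not produce log10(0).
        levels.append(20.0 * math.log10(max(rms, 1e-10)))
    if not levels:
        raise ValueError("signal is shorter than one frame")
    return levels


def fit_gaussian(values):
    """Fit a univariate Gaussian; return (mean, variance)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    # Floor the variance so that constant-level signals stay well defined.
    return mean, max(var, 1e-6)


def symmetric_kl(p, q):
    """Symmetric KL divergence between two univariate Gaussians."""
    (m1, v1), (m2, v2) = p, q
    kl_pq = 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)
    kl_qp = 0.5 * (math.log(v1 / v2) + (v2 + (m1 - m2) ** 2) / v1 - 1.0)
    return kl_pq + kl_qp


def level_distance(signal_a, signal_b, frame_size=1024):
    """Distance between the level distributions of two signals."""
    model_a = fit_gaussian(frame_levels_db(signal_a, frame_size))
    model_b = fit_gaussian(frame_levels_db(signal_b, frame_size))
    return symmetric_kl(model_a, model_b)
```

In this sketch a processed signal whose level distribution matches the reference yields a distance of zero, and the distance grows as the mean level or the spread of levels diverges. A measure that also covers timbre would need richer features and models than this single-feature Gaussian.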

URI

https://qmro.qmul.ac.uk/xmlui/handle/123456789/69433
