Intelligent Control of Dynamic Range Compressor
Abstract
Music production is an essential element in the value chain of modern music. It includes
enhancing recorded audio tracks, balancing the loudness levels of multiple tracks, and
making artistic decisions that suit the genre, style and emotion of the music. As in related
professions in creative media production, the tools for music making are now highly computerised.
However, many parts of the work remain labour-intensive and time-consuming.
The demand for intelligent tools is therefore growing, and research into intelligent music
production tools is expanding accordingly. Since audio effects are among the main tools used
by music producers, much of this work targets the mechanisms by which audio effects are
controlled. This thesis aims to push the boundaries of this field by investigating the
intelligent control of one of the essential audio effects, the dynamic range compressor (DRC).
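To make the controlled parameters concrete, the following is a minimal sketch of a feed-forward compressor in Python. It assumes a hard-knee static curve and one-pole attack/release smoothing of the gain reduction. The function `compress` and its parameter names (`threshold_db`, `ratio`, `attack_ms`, `release_ms`) are illustrative assumptions, not the implementation used in this thesis.

```python
import math


def compress(x, sample_rate, threshold_db=-20.0, ratio=4.0,
             attack_ms=5.0, release_ms=50.0):
    """Hypothetical hard-knee feed-forward compressor (illustrative sketch only).

    x is a list of float samples; attack_ms and release_ms must be positive.
    """
    # One-pole smoothing coefficients derived from the attack/release time constants.
    alpha_att = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    alpha_rel = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    gr_smooth = 0.0  # smoothed gain reduction in dB (always <= 0)
    y = []
    for s in x:
        # Instantaneous level in dB; the floor avoids log10(0) for silent samples.
        level_db = 20.0 * math.log10(max(abs(s), 1e-10))
        # Static gain computer: above threshold, output rises 1/ratio dB per input dB.
        if level_db > threshold_db:
            target_db = threshold_db + (level_db - threshold_db) / ratio
        else:
            target_db = level_db
        gr = target_db - level_db  # desired gain reduction in dB
        # Use the attack coefficient when more reduction is needed, release otherwise.
        alpha = alpha_att if gr < gr_smooth else alpha_rel
        gr_smooth = alpha * gr_smooth + (1.0 - alpha) * gr
        y.append(s * 10.0 ** (gr_smooth / 20.0))
    return y
```

For example, `compress([1.0] * 48000, 48000)` settles at a gain of about -15 dB: a 0 dB input lies 20 dB above the -20 dB threshold, and a 4:1 ratio lets only 5 dB of that excess through. Smoothing the gain reduction in the dB domain, rather than the detected level, is one common design choice among several.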
This research presents an innovative control system design. Its core idea is to learn from a
reference audio recording and to control the dynamic range compressor so that the processed
input audio sounds as close as possible to the reference. One of the proposed approaches
consists of three stages: a feature extractor, a trained regression model, and an objective
evaluation algorithm. In the feature extraction stage, we first test feature sets built from
conventional audio features commonly used in speech and audio signal analysis. We then test
handcrafted audio features specifically designed to characterise properties related to the
dynamic range of audio samples. Feature design is investigated at different levels of
complexity. A series of feature selection schemes is also assessed to select the optimal
feature sets from both the conventional and the specifically designed audio features. In a
later stage of the research, feature extraction is replaced by a feature-learning deep neural
network (DNN). This addresses the problem that the previously designed features are each
specific to a single DRC parameter, whereas a DNN can serve as a general feature extractor.
A universal feature extractor can also reduce computational cost and adapt more easily to
more complex audio material. The second stage of the control system is a trained regression
model. Random forest regression was selected from several candidate algorithms through
experimental validation. Because the feature extractors are tested on increasingly complex
audio material and are each specific to one of the DRC's parameters (e.g., attack time or
compression ratio), separate models are trained and tested for each case. The third component
of our approach is an evaluation method. A computational audio similarity algorithm based on
auditory models was designed to verify the results. This algorithm estimates the distance
between two statistical models fitted to perceptually motivated audio features that
characterise similarity in loudness and timbre. Finally, the overall system is evaluated with
both objective and subjective methods.
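As an illustration of the kind of handcrafted descriptors the feature extraction stage refers to, the sketch below computes a few simple dynamic-range features from a mono signal: peak and RMS level, crest factor, and the mean, spread and range of frame-wise RMS level. These particular features, the frame and hop sizes, and the function names are hypothetical choices for illustration; they are not the feature set designed in this thesis.

```python
import math


def to_db(value, floor=1e-10):
    """Convert a linear amplitude to decibels, with a floor to avoid log10(0)."""
    return 20.0 * math.log10(max(value, floor))


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def dynamic_range_features(x, frame_len=1024, hop=512):
    """Hypothetical dynamic-range descriptors for a mono signal (list of floats)."""
    # Split the signal into overlapping frames; a trailing partial frame is dropped.
    frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]
    if not frames:
        raise ValueError("signal is shorter than one frame")
    frame_db = [to_db(rms(f)) for f in frames]
    mean_db = sum(frame_db) / len(frame_db)
    peak_db = to_db(max(abs(s) for s in x))
    rms_db = to_db(rms(x))
    return {
        "peak_db": peak_db,
        "rms_db": rms_db,
        # Peak-to-RMS ratio in dB; compression typically lowers it.
        "crest_factor_db": peak_db - rms_db,
        "frame_rms_mean_db": mean_db,
        # Spread of short-term level over time, a crude proxy for macro-dynamics.
        "frame_rms_std_db": math.sqrt(
            sum((v - mean_db) ** 2 for v in frame_db) / len(frame_db)),
        "frame_rms_range_db": max(frame_db) - min(frame_db),
    }
```

In the pipeline described above, a feature vector like this would be computed for each audio sample and passed to a trained regressor (random forest regression in this thesis), with a separate model per DRC parameter. Crest factor is included because compression typically reduces the peak-to-RMS ratio of a signal.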
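The evaluation stage compares two recordings through statistical models fitted to frame-level features. A minimal sketch of that idea, assuming one diagonal-covariance Gaussian per recording and the symmetrised Kullback-Leibler divergence as the distance, is shown below. The model type actually used in this thesis (for example, a Gaussian mixture), its distance measure and its loudness and timbre features may differ; here the frame-level feature vectors are assumed to be computed beforehand, and all function names are hypothetical.

```python
import math


def fit_diag_gaussian(frames, var_floor=1e-6):
    """Fit a diagonal-covariance Gaussian to a list of equal-length feature vectors."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    # Maximum-likelihood variance per dimension, floored to avoid division by zero.
    var = [max(sum((f[j] - mean[j]) ** 2 for f in frames) / n, var_floor)
           for j in range(d)]
    return mean, var


def kl_diag_gaussian(p, q):
    """Closed-form KL(p || q) for two diagonal Gaussians given as (mean, var)."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    return 0.5 * sum(
        vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp)
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def model_distance(frames_a, frames_b):
    """Symmetrised KL divergence between Gaussians fitted to two recordings."""
    p = fit_diag_gaussian(frames_a)
    q = fit_diag_gaussian(frames_b)
    return kl_diag_gaussian(p, q) + kl_diag_gaussian(q, p)
```

Lower values indicate more similar feature distributions, so if the distance between a processed input and the reference falls after compression, the two have moved closer under this measure. A closed-form divergence is used here because one exists for single Gaussians; for mixture models the divergence generally has to be approximated, for example by Monte Carlo sampling.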
The main contribution of this thesis is a method for using a reference audio recording to
control a dynamic range compressor. Besides the system design, the analysis of the evaluation
provides useful insights into the relations between audio effects, audio features and
auditory perception. The research is conducted so that the knowledge can be transferred to
other audio effects and use-case scenarios, providing an alternative research direction in
the field of intelligent music production and simplifying how audio effects are controlled
for end users.
Authors
Sheng, Di