Object coding of music using expressive MIDI
Abstract
Structured audio uses a high-level representation of a signal to produce audio output.
When it was first introduced in 1998, creating a structured audio representation
from an audio signal was beyond the state-of-the-art. Inspired by object coding and
structured audio, we present a system that reproduces audio using Expressive MIDI,
with high-level parameters representing the pitch expression of an audio signal.
This allows a low bit-rate MIDI sketch of the original audio to be produced.
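As a simple illustration of this kind of representation (a sketch only, assuming the
common MIDI pitch-bend encoding with a +/-2 semitone range, rather than the thesis's
actual parameterisation), a pitch deviation in cents can be quantised to a 14-bit
pitch-bend value:

    # Sketch: quantising a pitch deviation (in cents) to a 14-bit MIDI
    # pitch-bend value. The +/-2 semitone (200 cent) bend range is an
    # assumption; the Expressive MIDI parameters used here may differ.

    PITCH_BEND_CENTRE = 8192   # 14-bit pitch bend: 0..16383, 8192 = no bend
    BEND_RANGE_CENTS = 200.0   # assumed bend range of +/-2 semitones

    def cents_to_pitch_bend(cents: float) -> int:
        value = PITCH_BEND_CENTRE + round(cents / BEND_RANGE_CENTS * 8192)
        return max(0, min(16383, value))

    print(cents_to_pitch_bend(0.0))   # 8192, no bend
    print(cents_to_pitch_bend(50.0))  # 10240, a quarter-tone sharp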
We examine optimisation techniques which may be suitable for inferring Expressive
MIDI parameters from estimated pitch trajectories, considering the effect of data
codings on the difficulty of optimisation. We look at some less common Gray codes
and examine their effect on algorithm performance on standard test problems.
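For context, the most familiar Gray code is the standard reflected binary code, in
which consecutive integers differ in exactly one bit; a minimal sketch of encoding and
decoding it (the less common codes studied here are not shown) is:

    # Sketch: the standard reflected binary Gray code, where consecutive
    # integers differ in exactly one bit. The less common Gray codes
    # examined in the thesis are not reproduced here.

    def binary_to_gray(n: int) -> int:
        return n ^ (n >> 1)

    def gray_to_binary(g: int) -> int:
        n = g
        while g:
            g >>= 1
            n ^= g
        return n

    # Consecutive codes differ in a single bit:
    print([binary_to_gray(i) for i in range(8)])  # [0, 1, 3, 2, 6, 7, 5, 4]
    assert all(gray_to_binary(binary_to_gray(i)) == i for i in range(256))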
We build an Expressive MIDI system, estimating parameters from audio and synthesising
output from those parameters. When the parameter estimation succeeds,
we find that the system produces note pitch trajectories that match the source audio to
within 10 pitch cents. We consider the quality of the system in terms of both parameter
estimation and the final output, finding that improvements to core components
(audio segmentation and pitch estimation, both active research fields) would produce
a better system.
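The 10-cent figure refers to the standard cents measure of pitch difference, 1200 times
the base-2 logarithm of a frequency ratio; a minimal, illustrative sketch of that
comparison is:

    # Sketch: the standard cents measure (1200 cents per octave) used to
    # compare an output pitch trajectory against the source audio.
    import math

    def cents_difference(f_estimate: float, f_reference: float) -> float:
        return 1200.0 * math.log2(f_estimate / f_reference)

    print(cents_difference(442.0, 440.0))  # about +7.85 cents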
We examine the current state-of-the-art in pitch estimation, and find that some
estimators produce high precision estimates but are prone to harmonic errors, whilst
other estimators produce fewer harmonic errors but are less precise. Inspired by this,
we produce a novel pitch estimator that combines the outputs of existing estimators.
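Purely as a hypothetical illustration of the idea, and not the combination method
developed here, a robust but imprecise estimate can be used to select among
harmonically related candidates derived from a precise estimate:

    # Hypothetical sketch only: resolve harmonic (e.g. octave) errors in a
    # precise pitch estimate using a robust but less precise estimate.
    # This illustrates the idea, not the estimator developed in the thesis.

    def combine_estimates(precise_hz: float, robust_hz: float) -> float:
        factors = (0.5, 2.0 / 3.0, 1.0, 1.5, 2.0)   # common harmonic slips
        candidates = [precise_hz * f for f in factors]
        return min(candidates, key=lambda c: abs(c - robust_hz))

    print(combine_estimates(precise_hz=882.1, robust_hz=445.0))  # ~441.05 Hz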
Authors
Welburn, Stephen J.