Music Metadata Capture in the Studio from Audio and Symbolic Data
Abstract
Music Information Retrieval (MIR) tasks, in the main, are concerned with
the accurate generation of one of a number of different types of music metadata
(beat onsets, or melody extraction, for example). Almost always,
they operate on fully mixed digital audio recordings. Commonly, this
means that a large amount of signal processing effort is directed towards
the isolation, and then identification, of certain highly relevant aspects of
the audio mix. In some cases, results of one MIR algorithm are useful, if
not essential, to the operation of another: a chord detection algorithm,
for example, is highly dependent upon accurate pitch detection. Although
not clearly defined in all cases, certain rules exist which we may take from
music theory in order to assist the task, such as the particular note intervals
which make up a specific chord.
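As an illustration of how such a music-theoretic rule can be applied once accurate pitches are available, the following is a minimal sketch, not the method used in this thesis. The template table, the note naming and the function name identify_chord are assumptions made purely for the example, and input pitches are assumed to be MIDI note numbers.

```python
# Hypothetical chord templates: semitone intervals above the root.
# Only three triad qualities are listed; this is an illustrative subset.
CHORD_TEMPLATES = {
    frozenset({0, 4, 7}): "major",
    frozenset({0, 3, 7}): "minor",
    frozenset({0, 3, 6}): "diminished",
}

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def identify_chord(midi_pitches):
    """Return (root name, quality) for a set of MIDI pitches, or None.

    Each sounding pitch class is tried as a candidate root, and the
    intervals above that root are compared with the templates.
    """
    pitch_classes = {p % 12 for p in midi_pitches}
    for root in sorted(pitch_classes):
        intervals = frozenset((pc - root) % 12 for pc in pitch_classes)
        quality = CHORD_TEMPLATES.get(intervals)
        if quality is not None:
            return NOTE_NAMES[root], quality
    return None


c_major = identify_chord([60, 64, 67])   # C4, E4, G4
a_minor = identify_chord([57, 60, 64])   # A3, C4, E4 (root not lowest pitch class)
```

The sketch makes the dependency visible: if the pitch detector reports even one wrong pitch class, the interval set no longer matches any template and the lookup fails.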
On the question of generating accurate, low-level music metadata (e.g.
chromatic pitch and score onset time), a potentially huge advantage lies
in the use of multitrack, rather than mixed, audio recordings, in which
the separate instrument recordings may be analysed in isolation.
Additionally, in MIR, as in many other research areas currently, there
is an increasing push towards the use of the Semantic Web for publishing
metadata using the Resource Description Framework (RDF). Semantic
Web technologies, though, also facilitate the querying of data via the
SPARQL query language, as well as logical inferencing via the careful
creation and use of Web Ontology Language (OWL) ontologies. This, in
turn, opens up the intriguing possibility of deferring our decision regarding
which particular type of MIR query to ask of our low-level music
metadata until some point later down the line, long after all the heavy
signal processing has been carried out.
In this thesis, we describe an overarching vision for an alternative MIR
paradigm, built around the principles of early, studio-based metadata
capture, and exploitation of open, machine-readable Semantic Web
data. Using the specific example of structural segmentation, we demonstrate
that by analysing multitrack rather than mixed audio, we are able
to achieve a significant and quantifiable increase in the accuracy of our
segmentation algorithm. We also provide details of a new multitrack audio
dataset with structural segmentation annotations, created as part of
this research, and available for public use.
Furthermore, we show that it is possible to fully implement a pair of
pattern discovery algorithms (the SIA and SIATEC algorithms, which are highly
applicable, but not restricted, to symbolic music data analysis) using only
Semantic Web technologies: the SPARQL query language, acting on RDF
data, in tandem with a small OWL ontology. We describe the challenges
encountered in taking this approach and the particular solution we arrived
at, and we evaluate the implementation both in terms of its execution time
and within the wider context of our vision for a new MIR paradigm.
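For readers unfamiliar with SIA, the core idea can be sketched in a few lines of ordinary code. The sketch below is written in Python for illustration only and is not the SPARQL/RDF implementation described in this thesis. It assumes that notes are represented as (onset, pitch) tuples, and the function name sia is a hypothetical choice. For every pair of points it computes the difference vector from the earlier to the later point, then groups the starting points by vector. Each group is the maximal translatable pattern for that vector.

```python
from collections import defaultdict


def sia(points):
    """Map each difference vector to its maximal translatable pattern.

    points: iterable of equal-length numeric tuples, e.g. (onset, pitch).
    Only vectors from a lexicographically smaller point to a larger one
    are considered, so each unordered pair is visited once.
    """
    pts = sorted(set(points))
    patterns = defaultdict(list)
    for i, p in enumerate(pts):
        for q in pts[i + 1:]:
            vector = tuple(b - a for a, b in zip(p, q))
            # p can be translated by `vector` onto another dataset point.
            patterns[vector].append(p)
    return dict(patterns)


# Toy dataset (an assumption): a two-note figure repeated two beats later.
notes = [(0, 60), (1, 62), (2, 60), (3, 62)]
patterns = sia(notes)
repeated_figure = patterns[(2, 0)]
```

In this toy dataset, the vector (2, 0) picks out the two-note figure that recurs two time units later at the same pitch. The pairwise comparison is quadratic in the number of points, which is one reason execution time is a natural evaluation criterion for any implementation.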
Authors
Hargreaves, Steven