Information-Theoretic Measures of Predictability for Music Content Analysis.
Abstract
This thesis is concerned with determining similarity in musical audio, for the purpose of applications
in music content analysis. With the aim of determining similarity, we consider the
problem of representing temporal structure in music. To represent temporal structure, we propose
to compute information-theoretic measures of predictability in sequences. We apply our
measures to track-wise representations obtained from musical audio; thereafter we treat the
resulting measures as predictors of musical similarity. We demonstrate that our approach benefits
music content analysis tasks based on musical similarity.
For the intermediate-specificity task of cover song identification, we compare contrasting
discrete-valued and continuous-valued measures of pairwise predictability between sequences.
In the discrete case, we devise a method for computing the normalised compression distance
(NCD) which accounts for correlation between sequences. We observe that our measure improves
average performance over NCD, for sequential compression algorithms. In the continuous
case, we propose to compute information-based measures as statistics of the prediction error
between sequences. Evaluated using 300 Jazz standards and using the Million Song Dataset,
we observe that continuous-valued approaches outperform discrete-valued approaches. Further,
we demonstrate that continuous-valued measures of predictability may be combined to improve
performance with respect to baseline approaches. Using a filter-and-refine approach, we demonstrate
state-of-the-art performance using the Million Song Dataset.
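
For readers unfamiliar with the baseline mentioned above, the standard normalised compression distance can be sketched as follows. This is a minimal illustration of plain NCD only, not the correlation-aware variant devised in the thesis. It assumes each track has already been reduced to a discrete symbol sequence stored as bytes, and it uses zlib purely as a stand-in compressor; the function names are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Standard normalised compression distance between two symbol sequences.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size and xy is the concatenation of x and y.
    Values near 0 suggest the sequences share structure; values near 1
    suggest they share little.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

In a cover song identification setting, one would compute `ncd(query, candidate)` for each candidate track and rank candidates by increasing distance. Because real compressors only approximate the ideal, the value can fall slightly outside the interval from 0 to 1.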
For the low-specificity tasks of similarity rating prediction and song year prediction, we propose
descriptors based on computing track-wise compression rates of quantised audio features,
using multiple temporal resolutions and quantisation granularities. We evaluate our descriptors
using a dataset of 15 500 track excerpts of Western popular music, for which we have 7 800
web-sourced pairwise similarity ratings. Combined with bag-of-features descriptors, we obtain
performance gains of 31.1% and 10.9% for similarity rating prediction and song year prediction, respectively.
For both tasks, analysis of selected descriptors reveals that representing features at multiple time
scales benefits prediction accuracy.
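
The idea of a compression-rate descriptor computed at several temporal resolutions and quantisation granularities can be sketched as below. This is a rough illustration under stated assumptions, not the thesis's implementation: it assumes a one-dimensional feature sequence, window averaging for downsampling, uniform scalar quantisation with at most 256 levels, and zlib as the compressor. All function names and default parameter values are hypothetical.

```python
import zlib
from typing import Dict, List, Sequence, Tuple


def downsample(seq: Sequence[float], factor: int) -> List[float]:
    """Average consecutive non-overlapping windows of `factor` frames."""
    out = []
    for i in range(0, len(seq), factor):
        window = seq[i:i + factor]
        out.append(sum(window) / len(window))
    return out


def quantise(seq: Sequence[float], levels: int) -> bytes:
    """Uniform scalar quantisation into `levels` symbols (assumed scheme, levels <= 256)."""
    lo, hi = min(seq), max(seq)
    span = (hi - lo) or 1.0  # avoid division by zero for constant sequences
    return bytes(min(int((v - lo) / span * levels), levels - 1) for v in seq)


def compression_rate(symbols: bytes) -> float:
    """Compressed size divided by original size; lower means more predictable."""
    return len(zlib.compress(symbols, 9)) / len(symbols)


def descriptor(
    seq: Sequence[float],
    factors: Tuple[int, ...] = (1, 2, 4),
    levels: Tuple[int, ...] = (4, 16, 64),
) -> Dict[Tuple[int, int], float]:
    """One compression rate per (downsampling factor, quantisation level) pair."""
    return {
        (f, k): compression_rate(quantise(downsample(seq, f), k))
        for f in factors
        for k in levels
    }
```

The resulting dictionary could be flattened into a fixed-length vector and concatenated with other track-wise descriptors before training a predictor. A highly repetitive feature sequence should yield lower rates than an unstructured one at the same resolution and granularity.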
Authors
Foster, Peter