Information-Theoretic Measures of Predictability for Music Content Analysis.
This thesis is concerned with determining similarity in musical audio, for the purpose of applications in music content analysis. With the aim of determining similarity, we consider the problem of representing temporal structure in music. To represent temporal structure, we propose to compute information-theoretic measures of predictability in sequences. We apply our measures to track-wise representations obtained from musical audio, and we then treat the resulting measures as predictors of musical similarity. We demonstrate that our approach benefits music content analysis tasks based on musical similarity. For the intermediate-specificity task of cover song identification, we contrast discrete-valued and continuous-valued measures of pairwise predictability between sequences. In the discrete case, we devise a method for computing the normalised compression distance (NCD) which accounts for correlation between sequences. For sequential compression algorithms, we observe that our measure improves average performance over NCD. In the continuous case, we propose to compute information-based measures as statistics of the prediction error between sequences. In evaluations using 300 jazz standards and the Million Song Dataset, we observe that continuous-valued approaches outperform discrete-valued approaches. Further, we demonstrate that continuous-valued measures of predictability may be combined to improve performance with respect to baseline approaches. Using a filter-and-refine approach, we demonstrate state-of-the-art performance on the Million Song Dataset. For the low-specificity tasks of similarity rating prediction and song year prediction, we propose descriptors based on computing track-wise compression rates of quantised audio features, using multiple temporal resolutions and quantisation granularities.
We evaluate our descriptors using a dataset of 15 500 track excerpts of Western popular music, for which we have 7 800 web-sourced pairwise similarity ratings. Combining our descriptors with bag-of-features descriptors, we obtain performance gains of 31.1% and 10.9% for similarity rating prediction and song year prediction, respectively. For both tasks, analysis of selected descriptors reveals that representing features at multiple time scales benefits prediction accuracy.
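
As a rough illustration of the two compression-based quantities named in the abstract, the sketch below computes the standard normalised compression distance between two symbol sequences and a track-wise compression rate. It is not the correlation-aware variant devised in the thesis, and it makes several assumptions that do not come from the abstract: zlib stands in for the compressor, sequences are already quantised to one byte per symbol, and the function names are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Standard NCD: (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).

    C(xy) is approximated by compressing the concatenation of x and y.
    This is the baseline measure, not the thesis's modified version.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def compression_rate(symbols: bytes) -> float:
    """Compressed size divided by original size for one quantised sequence.

    Lower values indicate a more predictable (more compressible) sequence.
    """
    if not symbols:
        raise ValueError("sequence must be non-empty")
    return compressed_size(symbols) / len(symbols)
```

Under these assumptions, a pair of closely related sequences should yield a smaller NCD than an unrelated pair, and a repetitive sequence should yield a lower compression rate than a noisy one. Computing the rate at several quantisation granularities and temporal resolutions would give one descriptor value per setting.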