Automatic chord transcription from audio using computational models of musical context
This thesis is concerned with the automatic transcription of chords from audio, with an emphasis on modern popular music. Musical context such as the key and the structural segmentation aid the interpretation of chords in human beings. In this thesis we propose computational models that integrate such musical context into the automatic chord estimation process. We present a novel dynamic Bayesian network (DBN) which integrates models of metric position, key, chord, bass note and two beat-synchronous audio features (bass and treble chroma) into a single high-level musical context model. We simultaneously infer the most probable sequence of metric positions, keys, chords and bass notes via Viterbi inference. Several experiments with real world data show that adding context parameters results in a significant increase in chord recognition accuracy and faithfulness of chord segmentation. The proposed, most complex method transcribes chords with a state-of-the-art accuracy of 73% on the song collection used for the 2009 MIREX Chord Detection tasks. This method is used as a baseline method for two further enhancements. Firstly, we aim to improve chord confusion behaviour by modifying the audio front end processing. We compare the effect of learning chord profiles as Gaussian mixtures to the effect of using chromagrams generated from an approximate pitch transcription method. We show that using chromagrams from approximate transcription results in the most substantial increase in accuracy. The best method achieves 79% accuracy and significantly outperforms the state of the art. Secondly, we propose a method by which chromagram information is shared between repeated structural segments (such as verses) in a song. This can be done fully automatically using a novel structural segmentation algorithm tailored to this task. We show that the technique leads to a significant increase in accuracy and readability. The segmentation algorithm itself also obtains state-of-the-art results. A method that combines both of the above enhancements reaches an accuracy of 81%, a statistically significant improvement over the best result (74%) in the 2009 MIREX Chord Detection tasks.
- Theses