Integrating Additional Chord Information Into HMM-Based Lyrics-to-Audio Alignment
Volume: 20
Pagination: 200–210 (11 pages)
DOI: 10.1109/TASL.2011.2159595
Journal: IEEE Transactions on Audio, Speech, and Language Processing
Abstract
Aligning lyrics to audio has a wide range of applications, such as the automatic generation of karaoke scores, song browsing by lyrics, and the generation of audio thumbnails. Existing methods rely on the lyrics alone, matching them to phoneme features extracted from the audio (usually mel-frequency cepstral coefficients). Our novel idea is to integrate the textual chord information provided in the paired chords-lyrics format known from songbooks and Internet sites into the inference procedure. We propose two novel methods that implement this idea. First, assuming that all chords of a song are known, we extend a hidden Markov model (HMM) framework by including chord changes in the Markov chain and an additional audio feature (chroma) in the emission vector. Second, for the more realistic case in which some chord information is missing, we present a method that recovers the missing chord information by exploiting repetition in the song. We conducted experiments in which five parameters were varied and show that, with accuracies of 87.5% and 76.7%, respectively, both methods perform better than the baseline with statistical significance. Finally, we introduce the new accompaniment interface Song Prompter, which uses the automatically aligned lyrics to guide musicians through a song, demonstrating that the automatic alignment is accurate enough to be used in a musical performance.
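To make the first method concrete, the following minimal Python sketch illustrates the core idea described in the abstract: a joint HMM whose states pair each lyric phoneme with the chord annotated above it in the chords-lyrics text, and whose emission score combines a phoneme model over MFCCs with a chord model over chroma. This is not the authors' implementation; the state list, chord templates, phoneme means, and transition probabilities are all illustrative assumptions, and the features are conditionally independent given the state by assumption, so their log-likelihoods simply add.

import numpy as np

# Hypothetical joint (phoneme, chord) state chain for a lyric fragment,
# as read off a chords-lyrics sheet (all names here are illustrative).
STATES = [("s", "C"), ("ah", "C"), ("ng", "G")]

# Toy binary chord templates over the 12 pitch classes.
CHORD_TEMPLATES = {
    "C": np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0], float),  # C E G
    "G": np.array([0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1], float),  # G B D
}

# Stand-in phoneme means; a real system would use trained acoustic models
# (e.g., Gaussian mixtures over MFCCs) rather than these toy values.
PHONEME_MEANS = {p: np.full(13, i, float) for i, p in enumerate(["s", "ah", "ng"])}

def chord_loglik(chord, chroma_frame):
    """Score a chroma frame against a binary chord template (cosine style)."""
    t = CHORD_TEMPLATES[chord]
    sim = chroma_frame @ t / (np.linalg.norm(chroma_frame) * np.linalg.norm(t) + 1e-9)
    return np.log(max(sim, 1e-9))

def phoneme_loglik(phoneme, mfcc_frame):
    """Stand-in for a trained phoneme model scoring an MFCC frame."""
    return -0.5 * float(np.sum((mfcc_frame - PHONEME_MEANS[phoneme]) ** 2))

def emission_loglik(state, mfcc_frame, chroma_frame):
    """Joint emission: MFCCs score the phoneme, chroma scores the chord;
    assumed conditionally independent given the state, so the logs add."""
    phoneme, chord = state
    return phoneme_loglik(phoneme, mfcc_frame) + chord_loglik(chord, chroma_frame)

def viterbi_align(mfcc, chroma, states, log_self=np.log(0.9), log_next=np.log(0.1)):
    """Left-to-right Viterbi: each frame either stays in the current
    (phoneme, chord) state or advances to the next one in the lyrics."""
    T, N = len(mfcc), len(states)
    delta = np.full((T, N), -np.inf)
    psi = np.zeros((T, N), int)
    delta[0, 0] = emission_loglik(states[0], mfcc[0], chroma[0])
    for t in range(1, T):
        for j in range(N):
            cands = [(delta[t - 1, j] + log_self, j)]
            if j > 0:
                cands.append((delta[t - 1, j - 1] + log_next, j - 1))
            best, psi[t, j] = max(cands)
            delta[t, j] = best + emission_loglik(states[j], mfcc[t], chroma[t])
    path = [int(np.argmax(delta[-1]))]  # backtrace the best state sequence
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]  # one state index per audio frame

# Usage: align 30 synthetic frames and read off where each state is entered.
rng = np.random.default_rng(0)
path = viterbi_align(rng.normal(size=(30, 13)), np.abs(rng.normal(size=(30, 12))), STATES)
print([(STATES[j], t) for t, j in enumerate(path) if t == 0 or path[t - 1] != j])

The design point this sketch illustrates is exactly the one the abstract names: the chord annotations enter both the transition structure (a chord change can only occur where the sheet places it, i.e., on an advance to the next joint state) and the emission vector (chroma alongside MFCCs), so each frame is pulled toward the lyric position whose annotated chord matches the audio.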