Automatic Music Transcription using Structure and Sparsity
MetadataShow full item record
Automatic Music Transcription seeks a machine understanding of a musical signal in terms of pitch-time activations. One popular approach to this problem is the use of spectrogram decompositions, whereby a signal matrix is decomposed over a dictionary of spectral templates, each representing a note. Typically the decomposition is performed using gradient descent based methods, performed using multiplicative updates based on Non-negative Matrix Factorisation (NMF). The final representation may be expected to be sparse, as the musical signal itself is considered to consist of few active notes. In this thesis some concepts that are familiar in the sparse representations literature are introduced to the AMT problem. Structured sparsity assumes that certain atoms tend to be active together. In the context of AMT this affords the use of subspace modelling of notes, and non-negative group sparse algorithms are proposed in order to exploit the greater modelling capability introduced. Stepwise methods are often used for decomposing sparse signals and their use for AMT has previously been limited. Some new approaches to AMT are proposed by incorporation of stepwise optimal approaches with promising results seen. Dictionary coherence is used to provide recovery conditions for sparse algorithms. While such guarantees are not possible in the context of AMT, it is found that coherence is a useful parameter to consider, affording improved performance in spectrogram decompositions.
- Theses