Automatic Music Transcription using Structure and Sparsity
Abstract
Automatic Music Transcription (AMT) seeks a machine understanding of a musical signal in terms of
pitch-time activations. One popular approach to this problem is the use of spectrogram decompositions,
whereby a signal matrix is decomposed over a dictionary of spectral templates, each
representing a note. Typically the decomposition is performed using gradient descent based
methods, implemented as multiplicative updates in the manner of Non-negative Matrix Factorisation
(NMF). The final representation may be expected to be sparse, as the musical signal itself is considered
to consist of few active notes. In this thesis some concepts that are familiar in the sparse
representations literature are introduced to the AMT problem. Structured sparsity assumes that
certain atoms tend to be active together. In the context of AMT this affords the use of subspace
modelling of notes, and non-negative group sparse algorithms are proposed in order to exploit
the greater modelling capability introduced. Stepwise methods are often used for decomposing
sparse signals, but their use for AMT has previously been limited. Some new approaches to
AMT are proposed that incorporate stepwise optimal methods, with promising results.
Dictionary coherence is used to provide recovery conditions for sparse algorithms. While such
guarantees are not possible in the context of AMT, it is found that coherence is a useful parameter
to consider, affording improved performance in spectrogram decompositions.
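
As an illustration of the two quantities the abstract refers to, the following is a minimal sketch, not the thesis's own algorithm. It assumes a fixed non-negative dictionary D (one spectral template per column), a Euclidean cost, and the standard multiplicative update for the activations H; the function names decompose and mutual_coherence, and the choice of cost, are assumptions made for this example only.

```python
import numpy as np

def decompose(S, D, n_iter=200, eps=1e-12):
    """Approximate S by D @ H with H >= 0, keeping the dictionary D fixed.

    Uses the standard multiplicative update for the Euclidean cost
    (an assumption here; the abstract does not name a cost function).
    """
    rng = np.random.default_rng(0)
    # Strictly positive start so the multiplicative update can move every entry.
    H = rng.random((D.shape[1], S.shape[1])) + eps
    DtS = D.T @ S   # fixed numerator term
    DtD = D.T @ D   # Gram matrix of the templates
    for _ in range(n_iter):
        # Element-wise rescaling keeps H non-negative at every step.
        H *= DtS / (DtD @ H + eps)
    return H

def mutual_coherence(D):
    """Largest absolute inner product between distinct unit-norm columns of D."""
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)
    G = np.abs(Dn.T @ Dn)
    np.fill_diagonal(G, 0.0)  # ignore each atom's correlation with itself
    return G.max()
```

Here S would be a magnitude spectrogram (frequency bins by time frames) and each row of H the activation of one template over time. A coherence close to 1 indicates that at least two templates are nearly parallel.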
Authors
O'Hanlon, Ken
Collections
- Theses