Show simple item record

dc.contributor.author	Sigtia, Siddharth
dc.date.accessioned	2017-07-06T12:30:05Z
dc.date.available	2017-07-06T12:30:05Z
dc.date.issued	2017-02-28
dc.date.submitted	2017-07-06T12:40:20.574Z
dc.identifier.citation	Sigtia, S. 2017. Neural Networks for Analysing Music and Environmental Audio. Queen Mary University of London	en_US
dc.identifier.uri	http://qmro.qmul.ac.uk/xmlui/handle/123456789/24741
dc.description	PhD	en_US
dc.description.abstract	In this thesis, we consider the analysis of music and environmental audio recordings with neural networks. Recently, neural networks have been shown to be an effective family of models for speech recognition, computer vision, natural language processing and a number of other statistical modelling problems. The composite layer-wise structure of neural networks allows for flexible model design, where prior knowledge about the domain of application can be used to inform the design and architecture of the neural network models. Additionally, it has been shown that when trained on sufficient quantities of data, neural networks can be directly applied to low-level features to learn mappings to high-level concepts like phonemes in speech and object classes in computer vision. In this thesis we investigate whether neural network models can be usefully applied to processing music and environmental audio. With regard to music signal analysis, we investigate two different problems. The first problem, automatic music transcription, aims to identify the score or the sequence of musical notes that comprise an audio recording. We also consider the problem of automatic chord transcription, where the aim is to identify the sequence of chords in a given audio recording. For both problems, we design neural network acoustic models which are applied to low-level time-frequency features in order to detect the presence of notes or chords. Our results demonstrate that the neural network acoustic models perform similarly to state-of-the-art acoustic models, without the need for any feature engineering. The networks are able to learn complex transformations from time-frequency features to the desired outputs, given sufficient amounts of training data. Additionally, we use recurrent neural networks to model the temporal structure of sequences of notes or chords, similar to language modelling in speech. Our results demonstrate that the combination of the acoustic and language model predictions yields improved performance over the acoustic models alone. We also observe that convolutional neural networks yield better performance than other neural network architectures for acoustic modelling. For the analysis of environmental audio recordings, we consider the problem of acoustic event detection. Acoustic event detection has a similar structure to automatic music and chord transcription, where the system is required to output the correct sequence of semantic labels along with onset and offset times. We compare the performance of neural network architectures against Gaussian mixture models and support vector machines. In order to account for the fact that such systems are typically deployed on embedded devices, we compare performance as a function of the computational cost of each model. We evaluate the models on two large datasets of real-world recordings of baby cries and smoke alarms. Our results demonstrate that the neural networks clearly outperform the other models, and they are able to do so without incurring a heavy computational cost.	en_US
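The abstract describes neural network acoustic models that map low-level time-frequency features to frame-wise note probabilities. The following toy sketch illustrates that idea only; the layer sizes, feed-forward architecture, and random weights are assumptions for illustration, not the thesis's actual models or code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a 5-frame excerpt of a 229-bin log spectrogram
# is mapped to activation probabilities for 88 piano notes.
N_BINS, N_FRAMES, N_NOTES = 229, 5, 88

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Randomly initialised weights stand in for trained parameters.
W1 = rng.normal(0.0, 0.01, (256, N_BINS * N_FRAMES))
b1 = np.zeros(256)
W2 = rng.normal(0.0, 0.01, (N_NOTES, 256))
b2 = np.zeros(N_NOTES)

def acoustic_model(excerpt):
    """Map a (N_BINS, N_FRAMES) spectrogram excerpt to per-note probabilities."""
    h = relu(W1 @ excerpt.ravel() + b1)   # hidden representation
    return sigmoid(W2 @ h + b2)           # independent note probabilities per frame

excerpt = rng.normal(size=(N_BINS, N_FRAMES))  # placeholder input features
p = acoustic_model(excerpt)
```

In a full transcription system of the kind the abstract outlines, such frame-wise probabilities would then be combined with a recurrent language model over note sequences, analogous to acoustic/language model decoding in speech recognition.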
dc.language.iso	en	en_US
dc.publisher	Queen Mary University of London	en_US
dc.rights	The copyright of this thesis rests with the author and no quotation from it or information derived from it may be published without the prior written consent of the author
dc.subject	Electronic Engineering and Computer Science	en_US
dc.subject	neural networks	en_US
dc.subject	music signal analysis	en_US
dc.subject	acoustic event detection	en_US
dc.title	Neural Networks for Analysing Music and Environmental Audio	en_US
dc.type	Thesis	en_US


Files in this item

This item appears in the following Collection(s)

  • Theses [4137]
    Theses Awarded by Queen Mary University of London
