Show simple item record

dc.contributor.author: Sigtia, S
dc.contributor.author: Benetos, E
dc.contributor.author: Dixon, S
dc.date.accessioned: 2016-04-01T10:39:40Z
dc.date.available: 2016-04-01T10:39:40Z
dc.date.issued: 2016-02
dc.date.submitted: 2016-02-24T10:23:30.194Z
dc.identifier.citation: Sigtia, Siddharth, Emmanouil Benetos, and Simon Dixon, "An End-to-End Neural Network for Polyphonic Piano Music Transcription", IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24 (2016), 927-939 <http://dx.doi.org/10.1109/taslp.2016.2533858> [en_US]
dc.identifier.uri: http://qmro.qmul.ac.uk/xmlui/handle/123456789/11596
dc.description.abstract: We present a supervised neural network model for polyphonic piano music transcription. The architecture of the proposed model is analogous to speech recognition systems and comprises an acoustic model and a music language model. The acoustic model is a neural network that estimates the probabilities of pitches in a frame of audio. The language model is a recurrent neural network that models the correlations between pitch combinations over time. The proposed model is general and can be used to transcribe polyphonic music without imposing any constraints on the polyphony. The acoustic and language model predictions are combined using a probabilistic graphical model, and inference over the output variables is performed using the beam search algorithm. We perform two sets of experiments: we investigate various neural network architectures for the acoustic model, and we investigate the effect of combining acoustic and music language model predictions using the proposed architecture. We compare the performance of the neural network-based acoustic models with two popular unsupervised acoustic models. Results show that convolutional neural network acoustic models yield the best performance across all evaluation metrics. We also observe improved performance with the application of the music language models. Finally, we present an efficient variant of beam search that improves performance and reduces run-times by an order of magnitude, making the model suitable for real-time applications. [en_US]
dc.publisher: IEEE [en_US]
dc.relation.isreplacedby: 123456789/17623
dc.relation.isreplacedby: http://qmro.qmul.ac.uk/xmlui/handle/123456789/17623
dc.rights: "The final publication is available at http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7416164&tag=1"
dc.subject: Automatic Music Transcription [en_US]
dc.subject: Deep Learning [en_US]
dc.subject: Recurrent Neural Networks [en_US]
dc.subject: Music Language Models [en_US]
dc.title: An End-to-End Neural Network for Polyphonic Piano Music Transcription [en_US]
dc.type: Article [en_US]
dc.identifier.doi: 10.1109/TASLP.2016.2533858
dc.relation.isPartOf: IEEE/ACM Transactions on Audio, Speech, and Language Processing
pubs.publication-status: Accepted
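
The abstract above describes a hybrid decoder in which per-frame pitch probabilities from an acoustic model are combined with a recurrent music language model and decoded by beam search. As a rough illustration of that decoding idea only, here is a minimal Python sketch: the random "acoustic" posteriors, the pitch-change penalty standing in for the RNN language model, and all names (candidate_sets, beam_search, BEAM_WIDTH, TOP_K) are hypothetical placeholders, not the paper's actual models or code.

# Toy sketch (not from the paper): beam-search decoding that combines
# frame-level acoustic pitch posteriors with a simple sequence prior.
import itertools
import numpy as np

N_PITCHES = 8    # toy size; piano transcription would use 88 pitches
BEAM_WIDTH = 4   # number of hypotheses kept after each frame
TOP_K = 8        # candidate pitch-set configurations considered per frame

def log_acoustic(config, frame_probs):
    """Log-likelihood of a binary pitch configuration under independent
    Bernoulli posteriors for one frame."""
    p = np.where(np.asarray(config) == 1, frame_probs, 1.0 - frame_probs)
    return float(np.log(np.clip(p, 1e-9, 1.0)).sum())

def candidate_sets(frame_probs, k=TOP_K):
    """Enumerate a small pool of plausible pitch sets for one frame by
    flipping the pitches that are most uncertain around the 0.5 threshold."""
    base = (frame_probs > 0.5).astype(int)
    uncertain = np.argsort(np.abs(frame_probs - 0.5))[:4]  # least decided
    cands = set()
    for r in range(3):  # flip up to two of the uncertain pitches
        for flips in itertools.combinations(uncertain, r):
            c = base.copy()
            c[list(flips)] ^= 1
            cands.add(tuple(int(x) for x in c))
    ranked = sorted(cands, key=lambda c: log_acoustic(c, frame_probs),
                    reverse=True)[:k]
    return [(c, log_acoustic(c, frame_probs)) for c in ranked]

def log_language_model(prev, cur):
    """Placeholder prior that favours smooth note continuations; the paper
    uses an RNN over pitch combinations instead."""
    changes = sum(a != b for a, b in zip(prev, cur))
    return -0.7 * changes

def beam_search(acoustic_posteriors):
    """Keep the BEAM_WIDTH best pitch-set sequences, scoring each frame by
    acoustic log-likelihood plus the language-model transition score."""
    beams = [([tuple([0] * N_PITCHES)], 0.0)]  # (history, total log-score)
    for frame_probs in acoustic_posteriors:
        expanded = []
        for history, score in beams:
            for cand, la in candidate_sets(frame_probs):
                lm = log_language_model(history[-1], cand)
                expanded.append((history + [cand], score + la + lm))
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:BEAM_WIDTH]
    history, score = beams[0]
    return history[1:], score  # drop the all-silent start frame

rng = np.random.default_rng(0)
posteriors = rng.random((5, N_PITCHES))  # stand-in for acoustic-model output
path, score = beam_search(posteriors)
print(f"best log-score: {score:.2f}")
for frame in path:
    print(frame)

The pruning step above hints at why run-time matters here: with 88 piano keys, each frame has 2^88 possible pitch sets, so restricting the search to a small pool of high-probability candidates is what keeps decoding tractable, and it is this cost that the efficient beam search variant mentioned in the abstract reduces by an order of magnitude.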

