A study on LSTM networks for polyphonic music sequence modelling

Ycart, A; Benetos, E

18th International Society for Music Information Retrieval Conference (ISMIR 2017)

Pagination

421 - 427 (7)

Publisher

ISMIR

Publisher URL

https://ismir2017.smcnus.org/

Abstract

Neural networks, and especially long short-term memory (LSTM) networks, have become increasingly popular for sequence modelling, be it in text, speech, or music. In this paper, we investigate the predictive power of simple LSTM networks for polyphonic MIDI sequences, using an empirical approach. Such systems can then be used as a music language model which, combined with an acoustic model, can improve automatic music transcription (AMT) performance. As a first step, we experiment with synthetic MIDI data, and we compare the results obtained in various settings throughout the training process. In particular, we compare the use of a fixed sample rate against a musically-relevant sample rate. We test this system on both synthetic and real MIDI data. Results are compared in terms of note prediction accuracy. We show that the higher the sample rate is, the better the prediction is, because self-transitions are more frequent. We suggest that for AMT, a musically-relevant sample rate is crucial in order to model note transitions, beyond a simple smoothing effect.
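
The effect of sample rate on self-transitions can be illustrated with a minimal sketch. It is not the authors' code: the toy melody, the 120 bpm tempo, the 10 ms and sixteenth-note time steps, and the helper names `piano_roll` and `self_transition_rate` are all assumptions chosen for illustration. The sketch samples the same notes into a binary piano roll on two time grids and measures how often a frame is identical to the one before it.

```python
import numpy as np

def piano_roll(notes, step, duration, n_pitches=128):
    """Sample (pitch, onset_s, offset_s) notes into a binary (n_pitches, n_frames) roll."""
    n_frames = int(round(duration / step))
    roll = np.zeros((n_pitches, n_frames), dtype=bool)
    for pitch, onset, offset in notes:
        start = int(round(onset / step))
        end = int(round(offset / step))
        roll[pitch, start:end] = True
    return roll

def self_transition_rate(roll):
    """Fraction of consecutive frame pairs with exactly the same active notes."""
    same = np.all(roll[:, 1:] == roll[:, :-1], axis=0)
    return float(same.mean())

# Hypothetical toy input: four quarter notes at an assumed tempo of 120 bpm (0.5 s each).
notes = [(60, 0.0, 0.5), (62, 0.5, 1.0), (64, 1.0, 1.5), (65, 1.5, 2.0)]

# Fixed time step (assumed 10 ms) versus a musically-relevant step
# (assumed one sixteenth note, i.e. 0.125 s at 120 bpm).
fine = piano_roll(notes, step=0.01, duration=2.0)
coarse = piano_roll(notes, step=0.125, duration=2.0)

print("10 ms grid:        ", self_transition_rate(fine))
print("sixteenth-note grid:", self_transition_rate(coarse))
```

In this toy setup the 10 ms grid gives a much higher self-transition rate than the sixteenth-note grid, so a predictor that simply repeats the previous frame already scores well at the fine resolution. On the coarser, musically-relevant grid a larger share of frames are actual note changes, which is where modelling note transitions matters.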

URI

http://qmro.qmul.ac.uk/xmlui/handle/123456789/24946

Licence information

Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: Adrien Ycart and Emmanouil Benetos. “A study on LSTM networks for polyphonic music sequence modelling”, 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017.