The effect of spectrogram reconstructions on automatic music transcription: an alternative approach to improve transcription accuracy

Benetos, E; Luo, Y; Cheuk, KW; Herremans, D; 25th International Conference on Pattern Recognition (ICPR2020)

dc.contributor.author	Benetos, E	en_US
dc.contributor.author	Luo, Y	en_US
dc.contributor.author	Cheuk, KW	en_US
dc.contributor.author	Herremans, D	en_US
dc.contributor.author	25th International Conference on Pattern Recognition (ICPR2020)	en_US
dc.date.accessioned	2020-10-23T09:13:09Z
dc.date.available	2020-10-11	en_US
dc.date.issued	2021-01-10	en_US
dc.identifier.uri	https://qmro.qmul.ac.uk/xmlui/handle/123456789/67744
dc.description.abstract	Most state-of-the-art automatic music transcription (AMT) models break the main transcription task down into sub-tasks such as onset prediction and offset prediction, and train them with onset and offset labels. These predictions are then concatenated and used as the input to train another model with the pitch labels to obtain the final transcription. We attempt to use only the pitch labels (together with a spectrogram reconstruction loss) and explore how far this model can go without introducing supervised sub-tasks. In this paper, we do not aim to achieve state-of-the-art transcription accuracy; instead, we explore the effect that spectrogram reconstruction has on our AMT model. Our proposed model consists of two U-nets: the first U-net transcribes the spectrogram into a posteriorgram, and the second U-net transforms the posteriorgram back into a spectrogram. A reconstruction loss is applied between the original spectrogram and the reconstructed spectrogram to constrain the second U-net to focus only on reconstruction. We train our model on three different datasets: MAPS, MAESTRO, and MusicNet. Our experiments show that adding the reconstruction loss generally improves the note-level transcription accuracy compared to the same model without the reconstruction part. Moreover, it can also boost the frame-level precision above that of the state-of-the-art models. The feature maps learned by our U-net contain grid-like structures (not present in the baseline model), which suggests that, in the presence of the reconstruction loss, the model is probably trying to count along both the time and frequency axes, resulting in a higher note-level transcription accuracy.	en_US
dc.format.extent	? - ? (8)	en_US
dc.title	The effect of spectrogram reconstructions on automatic music transcription: an alternative approach to improve transcription accuracy	en_US
dc.type	Conference Proceeding
dc.rights.holder	© 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
pubs.notes	Not known	en_US
pubs.publication-status	Accepted	en_US
pubs.publisher-url	https://www.micc.unifi.it/icpr2020/	en_US
dcterms.dateAccepted	2020-10-11	en_US
rioxxterms.funder	Default funder	en_US
rioxxterms.identifier.project	Default project	en_US
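
The training objective described in the abstract (a transcription loss on the pitch labels plus a reconstruction loss between the original and reconstructed spectrogram) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the loss functions or their weighting, so binary cross-entropy, mean squared error, and the `recon_weight` parameter are assumptions, and `toy_transcriber` and `toy_reconstructor` are hypothetical stand-ins for the two U-nets.

```python
import math


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy over a 2-D (time x pitch) grid."""
    total, count = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
            count += 1
    return total / count


def mse(a, b):
    """Mean squared error between two 2-D grids of equal shape."""
    total, count = 0.0, 0
    for a_row, b_row in zip(a, b):
        for x, y in zip(a_row, b_row):
            total += (x - y) ** 2
            count += 1
    return total / count


def combined_loss(spectrogram, pitch_labels, transcriber, reconstructor,
                  recon_weight=1.0):
    """Transcription loss plus weighted reconstruction loss.

    Assumption: BCE for transcription, MSE for reconstruction, and a
    scalar weight; the abstract specifies none of these.
    """
    posteriorgram = transcriber(spectrogram)        # role of the first U-net
    reconstruction = reconstructor(posteriorgram)   # role of the second U-net
    transcription_loss = bce(posteriorgram, pitch_labels)
    reconstruction_loss = mse(reconstruction, spectrogram)
    total = transcription_loss + recon_weight * reconstruction_loss
    return total, transcription_loss, reconstruction_loss


# Hypothetical stand-ins for the two U-nets, for illustration only.
def toy_transcriber(spec):
    # Element-wise sigmoid, so outputs lie in (0, 1) like a posteriorgram.
    return [[1.0 / (1.0 + math.exp(-10.0 * (x - 0.5))) for x in row]
            for row in spec]


def toy_reconstructor(post):
    # Identity mapping back to the "spectrogram" domain.
    return [list(row) for row in post]


spec = [[0.0, 1.0], [1.0, 0.0]]
labels = [[0, 1], [1, 0]]
total, t_loss, r_loss = combined_loss(spec, labels,
                                      toy_transcriber, toy_reconstructor)
print(total, t_loss, r_loss)
```

Setting `recon_weight=0.0` in this sketch leaves only the transcription term, which corresponds loosely to the baseline "without the reconstruction part" that the abstract compares against.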

Files in this item

Name: Benetos The effect of 2021 ...
Size: 3.426Mb
Format: application/
Description: Accepted version


This item appears in the following Collection(s)

Electronic Engineering and Computer Science [3472]
