Abstract
Metadata such as mean opinion score (MOS) quality ratings are critical for improving the usability and accessibility of music archive collections. Developing a non-intrusive objective quality metric that predicts the MOS of archive music collections is challenging because it requires large labeled datasets of real-world recordings, which do not currently exist for this task. In this paper, we show that the self-supervised learning (SSL) model wav2vec 2.0 can be used successfully to predict the perceived audio quality of archive music collections. Using vinyl recordings, we evaluated wav2vec 2.0 on a new dataset of 620 tracks labeled through crowdsourcing. The proposed model outperforms perceptual measures adapted from speech quality prediction. Finally, we propose a new evaluation metric, pairwise ranking accuracy (PRA), that accounts for subjective rater uncertainty by measuring the ability of an objective metric to correctly rank pairs of tracks whose labels are high-confidence.
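The idea behind PRA can be illustrated with a minimal sketch. The abstract does not say how a "high-confidence" pair is selected, so the criterion used here (non-overlapping 95% confidence intervals of the two tracks' mean ratings, under a normal approximation) is an assumption, and the function names are hypothetical rather than taken from the paper.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def confidence_interval(ratings, z=1.96):
    """Normal-approximation CI of the mean rating (needs >= 2 ratings)."""
    m = mean(ratings)
    half_width = z * stdev(ratings) / sqrt(len(ratings))
    return m - half_width, m + half_width


def pairwise_ranking_accuracy(ratings_per_track, predictions, z=1.96):
    """Fraction of high-confidence pairs ranked in the same order as the MOS.

    ASSUMPTION: a pair is "high-confidence" when the confidence intervals
    of the two tracks' mean ratings do not overlap. The paper may use a
    different criterion.
    """
    mos = [mean(r) for r in ratings_per_track]
    intervals = [confidence_interval(r, z) for r in ratings_per_track]
    correct = total = 0
    for i, j in combinations(range(len(mos)), 2):
        low_i, high_i = intervals[i]
        low_j, high_j = intervals[j]
        if high_i >= low_j and high_j >= low_i:
            continue  # intervals overlap: raters do not clearly separate the pair
        total += 1
        # Same sign means the metric orders the pair the way the listeners did.
        if (mos[i] - mos[j]) * (predictions[i] - predictions[j]) > 0:
            correct += 1
    return correct / total if total else float("nan")


# Illustrative, made-up ratings for four tracks (not data from the paper).
example_ratings = [
    [1, 1, 2, 1, 2],  # clearly poor
    [4, 5, 4, 5, 4],  # clearly good
    [3, 3, 4, 2, 3],  # clearly in between
    [2, 4, 3, 5, 1],  # raters disagree, so its pairs are skipped
]
example_predictions = [1.0, 4.0, 4.5, 3.0]
example_pra = pairwise_ranking_accuracy(example_ratings, example_predictions)
```

Restricting the comparison to confidently separated pairs means an objective metric is not penalized for misordering tracks that the listeners themselves could not reliably tell apart.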
Licence information
This item is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Attribution 3.0 United States
Copyright statements
© 2023 The Author(s). Published by IEEE