Analysis and classification of phonation modes in singing

Stoller, D; Dixon, S; 17th International Society for Music Information Retrieval Conference (ISMIR 2016)

Abstract

Phonation mode is an expressive aspect of the singing voice and can be described using four categories: neutral, breathy, pressed and flow. Previous attempts at automatically classifying phonation mode on a dataset of vowels sung by a professional female singer have either lacked accuracy or not sufficiently investigated which characteristic features of the different phonation modes enable successful classification. In this paper, we extract a large range of features from this dataset, including specialised descriptors of pressedness and breathiness, to analyse their explanatory power and their robustness against changes in pitch and vowel. We train and optimise a feed-forward neural network (NN) with one hidden layer on all features using cross-validation, achieving a mean F-measure above 0.85, an improvement over previous work. Applying feature selection based on mutual information and retaining the nine highest-ranked features as input to an NN results in a mean F-measure of 0.78, demonstrating the suitability of these features for discriminating between phonation modes. Training and pruning a decision tree yields a simple rule set, based only on cepstral peak prominence (CPP), temporal flatness and average energy, that correctly categorises 78% of the recordings.
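The abstract's mutual-information feature ranking can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. It makes two assumptions. First, each continuous feature is discretised with equal-width binning (eight bins). Second, mutual information with the phonation-mode labels is then estimated from empirical counts. The paper's actual estimator may differ. The function names `discretise`, `mutual_information` and `rank_features`, and the toy feature values, are hypothetical.

```python
import math
from collections import Counter


def discretise(values, n_bins=8):
    """Equal-width binning of one continuous feature.

    Assumption: the binning scheme is chosen for illustration only.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # Clamp so the maximum value lands in the last bin, not bin n_bins.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_features(features, labels, k=9, n_bins=8):
    """Score every feature by MI with the labels and keep the k highest-ranked.

    `features` maps a (hypothetical) feature name to its per-recording values.
    """
    scores = {
        name: mutual_information(discretise(vals, n_bins), labels)
        for name, vals in features.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


if __name__ == "__main__":
    # Toy data: "cpp" separates the two classes, "noise" is constant.
    labels = ["breathy", "breathy", "pressed", "pressed"]
    features = {"cpp": [1.0, 1.1, 5.0, 5.2], "noise": [0.3, 0.3, 0.3, 0.3]}
    top, scores = rank_features(features, labels, k=1)
    print(top)     # ['cpp']
    print(scores)  # {'cpp': 1.0, 'noise': 0.0}
```

In the toy example, the feature that perfectly separates the two classes receives 1 bit of mutual information. The constant feature receives 0 bits. In a real setting, the retained top-k feature names would then select the inputs for the classifier.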

URI

http://qmro.qmul.ac.uk/xmlui/handle/123456789/13500

Collections

Electronic Engineering and Computer Science

Licence information

http://wp.nyu.edu/ismir2016/