Vocal imitation of synthesised sounds varying in pitch, loudness and spectral centroid
783 - 796 (15)
Journal of the Acoustical Society of America
Vocal imitations are often used to convey sonic ideas [Lemaitre, Dessein, Susini, and Aura. (2011). Ecol. Psych. 23 (4), 267–307]. For computer-based systems to interpret these vocalisations, it is advantageous to apply knowledge of what happens when people vocalise sounds whose acoustic features have different temporal envelopes. In the present study, 19 experienced musicians and music producers were asked to imitate 44 sounds with one or two feature envelopes applied. The study addresses two main questions: (1) How accurately can people imitate ramp and modulation envelopes for pitch, loudness, and spectral centroid? (2) What happens to this accuracy when people are asked to imitate two feature envelopes simultaneously? The results show that experienced musicians can imitate pitch, loudness, and spectral centroid accurately, and that imitation accuracy is generally preserved when the imitated stimuli combine two features that are not necessarily congruent. This demonstrates the viability of using the voice as a natural means of expressing time series of two features simultaneously.