Recognition of Sound Sources and Acoustic Events in Music and Environmental Audio
Hearing, together with the other senses, enables us to perceive the surrounding world through the sensory data we constantly receive. The information carried in this data allows us to classify the environment and the objects in it. In modern society, the loud and noisy acoustic environment that surrounds us makes the task of "listening" quite challenging, probably more so than ever before. A great deal of information has to be filtered in order to separate the sounds we want to hear from unwanted noise and interference. And yet humans, like other living organisms, have a remarkable ability to identify and track the sounds they want, irrespective of how many sounds are present, how much they overlap and how much interference surrounds them. Building systems that "listen" to the surrounding environment and identify the sounds in it the way humans do remains a challenging task: although steps have been made towards human performance, we are still a long way from systems able to identify and track most, if not all, of the different sounds within an acoustic scene. In this thesis, we deal with the task of recognising sound sources or acoustic events in two distinct kinds of audio: music and more generic environmental sounds. For each, we reformulate the problem and redefine the associated task. Music can also be regarded as a multi-source environment in which the different sound sources (musical instruments) become active at different times; recognising the musical instruments is then a central part of the more generic process of automatic music transcription. The principal question we address is whether we can develop a system able to recognise musical instruments in a multi-instrument scenario where many different instruments are active at the same time, and to do so we draw inspiration from human performance.
The proposed system is based on missing feature theory, and we find that it retains high performance even under the most adverse listening conditions (i.e. low signal-to-noise ratio). Finally, we propose a technique to fuse this system with a system for automatic music transcription, in an attempt to inform and improve the overall performance. For more generic environmental audio scenes, things are less clear and research in the area is still scarce. The central issue here is to formulate the problem of sound recognition and to define the subtasks and their associated difficulties. We set up and ran a worldwide challenge and created datasets intended to enable researchers to carry out higher-quality research in the field. We also developed systems that could serve as baselines for future research, and compared existing state-of-the-art algorithms with one another and against human performance, in order to highlight the strengths and weaknesses of existing methodologies.
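Missing feature theory treats the parts of a feature vector that are dominated by noise or by other sources as unreliable, and scores each class using only the reliable parts. The sketch below illustrates one common variant of this idea, marginalisation, and is not the system proposed in the thesis. It assumes (hypothetically) that each instrument is modelled by a diagonal-covariance Gaussian mixture model and that a binary reliability mask is already available for every frame; how such a mask is estimated is not covered here. With a diagonal covariance, marginalising out an unreliable dimension simply drops its factor from the Gaussian density.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical model format: (weights, means, variances) of a diagonal-covariance GMM.
GMM = Tuple[List[float], List[List[float]], List[List[float]]]


def log_gauss_marginal(x: Sequence[float], mean: Sequence[float],
                       var: Sequence[float], mask: Sequence[bool]) -> float:
    """Log-density of a diagonal Gaussian using only the reliable dimensions.

    Marginalising an unreliable dimension of a diagonal Gaussian integrates
    its factor to 1, so that dimension is simply skipped."""
    total = 0.0
    for xd, md, vd, reliable in zip(x, mean, var, mask):
        if reliable:
            total += -0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
    return total


def frame_log_likelihood(x: Sequence[float], gmm: GMM, mask: Sequence[bool]) -> float:
    """Log-likelihood of one frame under a GMM, computed with log-sum-exp."""
    weights, means, variances = gmm
    comps = [math.log(w) + log_gauss_marginal(x, m, v, mask)
             for w, m, v in zip(weights, means, variances)]
    peak = max(comps)
    return peak + math.log(sum(math.exp(c - peak) for c in comps))


def classify(frames: List[Sequence[float]], masks: List[Sequence[bool]],
             models: Dict[str, GMM]) -> str:
    """Return the label whose model gives the highest total log-likelihood."""
    scores = {
        label: sum(frame_log_likelihood(x, gmm, m) for x, m in zip(frames, masks))
        for label, gmm in models.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy single-component models for two hypothetical instruments.
    models = {
        "a": ([1.0], [[0.0, 0.0]], [[1.0, 1.0]]),
        "b": ([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
    }
    frame = [0.0, 100.0]  # the second dimension is corrupted by interference
    print(classify([frame], [[True, True]], models))   # all dims used
    print(classify([frame], [[True, False]], models))  # corrupted dim marginalised
```

In this toy example, the corrupted second dimension pulls the decision towards "b" when every dimension is used. Once that dimension is marked unreliable and marginalised, the clean first dimension alone favours "a". This mirrors, in miniature, why discarding unreliable evidence can help recognition at low signal-to-noise ratios.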