Analysing the predictions of a CNN-based replay spoofing detection system
92 - 97
MetadataShow full item record
Playing recorded speech samples of an enrolled speaker - "replay attack" - is a simple approach to bypass an automatic speaker verification (ASV) system. The vulnerability of ASV systems to such attacks has been acknowledged and studied, but there has been no research into what spoofing detection systems are actually learning to discriminate. In this paper, we analyse the local behaviour of a replay spoofing detection system based on convolutional neural networks (CNNs) adapted from a state-of-the-art CNN (LCNN-FFT) submitted at the ASVspoof 2017 challenge. We generate temporal and spectral explanations for predictions of the model using the SLIME algorithm. Our findings suggest that in most instances of spoofing the model is using information in the first 400 milliseconds of each audio instance to make the class prediction. Knowledge of the characteristics that spoofing detection systems are exploiting can help build less vulnerable ASV systems, other spoofing detection systems, as well as better evaluation databases.