A Study On Convolutional Neural Network Based End-To-End Replay Anti-Spoofing
The second Automatic Speaker Verification Spoofing and Countermeasures challenge (ASVspoof 2017) focused on "replay attack" detection. The best-performing deep-learning systems submitted to ASVspoof 2017 used Convolutional Neural Networks (CNNs) as feature extractors. In this paper, we study how these architectures perform in an end-to-end setting. We find that they generalize poorly to the evaluation dataset, although we identify a compact architecture that generalizes well on the development data. We show that, for this dataset, it is difficult to achieve a similar level of generalization on both the development and evaluation data. This raises several open questions: what the differences between these data are, why these differences are more evident in an end-to-end setting, and how these issues can be overcome by increasing the amount of training data.