DUAL ATTENTION POOLING NETWORK FOR RECORDING DEVICE CLASSIFICATION USING NEUTRAL AND WHISPERED SPEECH

Abinay Reddy Naini, Prasanta Kumar Ghosh, Bhavuk Singhal


SPS Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 00:11:19

13 May 2022

In this work, we propose a method to identify the recording device from a recorded speech signal. With the rapid growth in the number of mobile and professional recording devices, determining the source device has applications in forensics and in improving other speech-based applications. We propose dual and single attention pooling-based convolutional neural networks (CNNs) for recording device classification using neutral and whispered speech. We run experiments on two sets of data: simultaneous direct recordings on five recording devices from 88 speakers, each speaking in both neutral and whispered modes, and playback recordings from 21 mobile devices. The results show that the proposed dual attention pooling-based CNN outperforms the best baseline scheme. Recording device classification is also more accurate with whispered speech recordings than with the corresponding neutral speech. We further illustrate the importance of voiced and unvoiced speech and of different frequency bands in classifying recording devices.
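
As a rough illustration of the attention pooling idea named in the abstract, the sketch below replaces plain averaging of a CNN feature map with a softmax-weighted sum. It is not the authors' architecture: the abstract does not specify what "dual" refers to, so pooling once over time frames and once over frequency bins is an assumption made here for illustration only. The function `attention_pool`, the scoring vectors `w_time` and `w_freq`, and the feature map shape are all hypothetical, and random values stand in for learned parameters.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_pool(features, w):
    """Softmax-weighted sum over the first axis of `features`.

    features: array of shape (steps, dim)
    w:        scoring vector of shape (dim,); hypothetical, would be learned
    Returns the pooled vector of shape (dim,) and the weights of shape (steps,).
    """
    scores = features @ w              # one scalar score per step
    weights = softmax(scores, axis=0)  # non-negative, sums to 1 over steps
    pooled = weights @ features        # weighted sum of the step vectors
    return pooled, weights


rng = np.random.default_rng(0)

# Stand-in for a CNN output: 40 frequency bins by 100 time frames (assumed shape).
feature_map = rng.standard_normal((40, 100))

# Random stand-ins for parameters that would normally be trained.
w_time = rng.standard_normal(40)
w_freq = rng.standard_normal(100)

# Assumed reading of "dual": attend over time frames and over frequency bins,
# then concatenate the two pooled vectors into one utterance-level embedding.
time_pooled, time_weights = attention_pool(feature_map.T, w_time)
freq_pooled, freq_weights = attention_pool(feature_map, w_freq)
embedding = np.concatenate([time_pooled, freq_pooled])
```

With an all-zero scoring vector the weights become uniform and the result reduces to mean pooling, which is one way to see attention pooling as a learned generalization of averaging. Inspecting weights such as `time_weights` is also the kind of analysis that could indicate which frames or frequency bands contribute most to a decision.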

Tags: whispered speech, dual attention pooling network, recording device

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
