Robust Voice Activity Detection Using A Masked Auditory Encoder Based Convolutional Neural Network

Nan Li, Longbiao Wang, Masashi Unoki, Sheng Li, Rui Wang, Meng Ge, Jianwu Dang

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 00:14:21

10 Jun 2021

Voice activity detection (VAD) based on deep learning has achieved remarkable success. However, when traditional features (e.g., raw waveforms and MFCCs) are fed directly to a deep neural network, performance degrades because of noise interference. Here, we propose a robust VAD approach using a masked auditory encoder based convolutional neural network (M-AECNN). First, we analyze the effectiveness of using auditory features as a deep learning encoder. These features roughly simulate the transmission of sound to the hair cells of the human inner ear; thus, they are more robust than encoders built on the raw waveform or on frequency-domain features. Second, similar to the human ear’s masking effect across speech frequencies, the proposed auditory encoder further improves the robustness of VAD by increasing the gain for cleaner speech frequencies. Extensive experimental results demonstrate that this approach achieves about a 10.5% absolute improvement in area under the curve (AUC) on the AURORA-2J dataset compared with a VAD method based on a CNN and MFCCs.
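To make the reported metric concrete, here is a minimal sketch of how a frame-level area under the ROC curve could be computed and how an "absolute" improvement is read. It is not the authors' evaluation code: the function name `roc_auc`, the toy labels, and both score lists are hypothetical and chosen only for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney rank statistic.

    labels: 1 for speech frames, 0 for non-speech frames.
    scores: per-frame speech scores from any VAD model (higher = more speech-like).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one speech and one non-speech frame")
    # Count speech/non-speech pairs that are ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the paper.
labels = [0, 0, 1, 1, 0, 1]
baseline_scores = [0.2, 0.6, 0.5, 0.7, 0.4, 0.3]
proposed_scores = [0.1, 0.3, 0.6, 0.9, 0.2, 0.4]

auc_baseline = roc_auc(labels, baseline_scores)
auc_proposed = roc_auc(labels, proposed_scores)
# An "absolute" improvement is the plain difference in AUC, in percentage points.
absolute_gain_points = 100.0 * (auc_proposed - auc_baseline)
```

Under this reading, an absolute improvement of about 10.5% means the AUC itself rises by roughly 10.5 percentage points, as opposed to a 10.5% relative change of the baseline value.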

Chairs:

Douglas O'Shaughnessy

Tags:

signal processing society

IEEE icassp 2021

virtual conference

2021

sps

virtual conference icassp 2021

june 6-11 2021

icassp 2021

Value-Added Bundle(s) Including this Product

ICASSP 2021 Virtual Conference - Presentation Videos Product Bundle