A Low Complexity Long Short-Term Memory Based Voice Activity Detection

Ruiting Yang, Jie Liu, Xiang Deng, Zhuochao Zheng

SPS

Length: 09:57

24 Sep 2020

Voice Activity Detection (VAD) plays an important role in audio processing, but it remains a common challenge when the voice signal is corrupted by strong, transient noise. In this paper, an accurate and causal VAD module using a long short-term memory (LSTM) deep neural network is proposed. The input features comprise Gammatone cepstral coefficients (GTCC) and selected spectral features. Its low-complexity structure allows it to be easily implemented in speech processing algorithms and applications. The collected training data are carefully pre-processed, labeled as speech or non-speech, and used to train the LSTM network; experiments show that the proposed VAD effectively distinguishes speech from different types of noisy backgrounds. Its robustness to changes, including varying frame length, moving speech sources, and different spoken languages, is further investigated.
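To make the idea of a causal, frame-by-frame LSTM detector concrete, the following is a minimal sketch and not the authors' implementation. It assumes each frame has already been converted to a feature vector (in the paper, GTCC plus selected spectral features; here, arbitrary numbers), and it uses small random, untrained weights. The function names, layer sizes, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_params(n_in, n_hidden, seed=0):
    """Build random, untrained weights (placeholders for a trained model)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    # One (weights, bias) pair per gate: input, forget, candidate, output.
    # Each acts on the concatenation [frame features, previous hidden state].
    gates = {g: (mat(n_hidden, n_in + n_hidden), [0.0] * n_hidden) for g in "ifgo"}
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
    return gates, w_out


def affine(W, b, v):
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(gates, x, h, c):
    """Advance the standard LSTM cell by one frame."""
    v = list(x) + list(h)
    i = [sigmoid(a) for a in affine(*gates["i"], v)]
    f = [sigmoid(a) for a in affine(*gates["f"], v)]
    g = [math.tanh(a) for a in affine(*gates["g"], v)]
    o = [sigmoid(a) for a in affine(*gates["o"], v)]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def causal_vad(frames, gates, w_out, threshold=0.5):
    """Return per-frame speech probabilities and speech/non-speech decisions.

    The state (h, c) depends only on past and current frames, so the
    decision for frame t never uses frames after t.
    """
    h = [0.0] * len(w_out)
    c = [0.0] * len(w_out)
    probs, decisions = [], []
    for x in frames:
        h, c = lstm_step(gates, x, h, c)
        p = sigmoid(sum(w * hk for w, hk in zip(w_out, h)))
        probs.append(p)
        decisions.append(p >= threshold)
    return probs, decisions
```

Because the weights are untrained, the probabilities carry no meaning here; the sketch only shows the data flow. A single recurrent layer with a scalar output keeps the per-frame cost to a handful of small matrix-vector products, which is the kind of structure a low-complexity design would favor.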

Tags:

sps conference

virtual workshop

mmsp 2020

September 2020

Value-Added Bundle(s) Including this Product

MMSP 2020 Virtual Conference - Presentation Videos Product Bundle
