A Low Complexity Long Short-Term Memory Based Voice Activity Detection
Ruiting Yang, Jie Liu, Xiang Deng, Zhuochao Zheng
-
SPS
IEEE Members: $11.00
Non-members: $15.00Length: 09:57
Voice Activity Detection (VAD) plays an important role in audio processing, but it is also a common challenge when a voice signal is corrupted with strong and transient noise. In this paper, an accurate and causal VAD module using a long short-term memory (LSTM) deep neural network is proposed. A set of features including Gammatone cepstral coefficients (GTCC) and selected spectral features are used. The low complex structure allows it can be easily implemented in speech processing algorithms and applications. With carefully pre-processing and labeling the collected training data in the classes of speech or non-speech and training on the LSTM net, experiments show the proposed VAD is able to distinguish speech from different types of noisy background effectively. Its robustness against changes including varying frame length, moving speech sources and speaking in different languages, are further investigated.