Real-Time Speech Enhancement Using Equilibriated RNN
Daiki Takeuchi, Kohei Yatabe, Yuma Koizumi, Noboru Harada, Yasuhiro Oikawa
SPS
Length: 14:24
We propose a speech enhancement method using a causal deep neural network (DNN) for real-time applications. DNNs have been widely used to estimate a time-frequency (T-F) mask that enhances a speech signal. One popular DNN structure for this task is the recurrent neural network (RNN), owing to its ability to model time-sequential data such as speech effectively. In particular, long short-term memory (LSTM) is often used to alleviate the vanishing/exploding gradient problem, which makes training an RNN difficult. However, LSTM mitigates this difficulty at the cost of a larger number of parameters, which requires more computational resources. For real-time speech enhancement, a smaller network that does not sacrifice performance is preferable. In this paper, we propose using the equilibriated recurrent neural network (ERNN) to avoid the vanishing/exploding gradient problem without increasing the number of parameters. The proposed DNN also has a causal structure that uses no future information, so it can be applied in real time. Compared with uni- and bi-directional LSTM networks, the proposed method achieved similar performance with far fewer parameters.
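To make the parameter-cost argument concrete, the following is a minimal sketch that counts the trainable parameters of one plain recurrent layer and one standard LSTM layer of the same size. It is not the authors' code and does not model the ERNN itself; the abstract gives no ERNN parameter formula, so the plain RNN count serves only as a baseline. The function names and the layer sizes (`n_freq_bins`, `n_hidden`) are hypothetical values chosen for illustration, and the LSTM count assumes no peephole connections and a single bias vector per gate.

```python
def rnn_params(input_size: int, hidden_size: int) -> int:
    """Parameters of one plain (Elman) recurrent layer.

    Counts the input weights, the recurrent weights, and one bias vector.
    """
    return hidden_size * (input_size + hidden_size) + hidden_size


def lstm_params(input_size: int, hidden_size: int) -> int:
    """Parameters of one standard LSTM layer.

    Assumes no peepholes and one bias vector per gate. The LSTM holds four
    weight sets of the plain-RNN shape: input gate, forget gate, output gate,
    and cell candidate.
    """
    return 4 * rnn_params(input_size, hidden_size)


# Hypothetical sizes for illustration only (not taken from the paper).
n_freq_bins = 257  # e.g. one magnitude spectrum frame as input
n_hidden = 256

plain = rnn_params(n_freq_bins, n_hidden)
lstm = lstm_params(n_freq_bins, n_hidden)
print(f"plain RNN layer: {plain} parameters")
print(f"LSTM layer:      {lstm} parameters")
print(f"ratio:           {lstm / plain:.1f}x")
```

Under these assumptions the LSTM layer needs four times the parameters of a plain recurrent layer with the same input and hidden sizes, which is the overhead the abstract refers to when it motivates a recurrent structure that stays trainable without the extra gates.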