A PRIORI SNR ESTIMATION FOR SPEECH ENHANCEMENT BASED ON PESQ-INDUCED REINFORCEMENT LEARNING
Tong Lei, Haoxin Ruan, Kai Chen, Jing Lu
SPS
Length: 00:13:45
Perceptual evaluation of speech quality (PESQ) is widely accepted as an effective objective metric that correlates closely with speech quality as perceived by human listeners. Because it is costly to evaluate and non-differentiable, PESQ is difficult to include in the cost function for deep learning-based speech enhancement. In this paper, we focus on introducing PESQ to improve Deep Xi, a recently proposed minimum mean square error (MMSE) based speech enhancement method in which the a priori signal-to-noise ratio (SNR) is estimated by a deep neural network. Treating discretized a priori SNR values as actions, we apply reinforcement learning (RL) to select the optimal SNR at the frame level through a reward function associated with PESQ. The experimental results show that the RL-trained network achieves a better PESQ score, especially in low SNR conditions.
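The idea of treating discrete a priori SNR values as actions can be illustrated with a minimal sketch. It is not the authors' implementation. The SNR grid, the per-bin action layout, the Wiener-style gain used as the MMSE-family gain function, and the baseline-relative reward are all assumptions made for illustration, and the PESQ scores are assumed to come from an external PESQ implementation that is not shown.

```python
import numpy as np

# Hypothetical action set: candidate a priori SNR values in dB.
# The range and step are assumptions, not taken from the paper.
SNR_GRID_DB = np.arange(-10.0, 31.0, 5.0)  # -10, -5, ..., 30 dB


def select_actions(logits, rng=None):
    """Choose one SNR-grid index per (frame, bin).

    logits: policy outputs of shape (frames, bins, actions).
    Greedy (argmax) when rng is None; otherwise samples from the
    softmax policy, as is typical for exploration during RL training.
    """
    if rng is None:
        return logits.argmax(axis=-1)
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(z)
    p /= p.sum(axis=-1, keepdims=True)
    # Inverse-CDF sampling, vectorized over frames and bins.
    u = rng.random(p.shape[:-1] + (1,))
    idx = (p.cumsum(axis=-1) < u).sum(axis=-1)
    return np.minimum(idx, p.shape[-1] - 1)  # guard against rounding


def wiener_gain(actions):
    """Map chosen actions to a spectral gain xi / (1 + xi).

    The Wiener gain is used here as one simple MMSE-family gain
    function; the paper's actual gain may differ.
    """
    xi = 10.0 ** (SNR_GRID_DB[actions] / 10.0)  # dB -> linear a priori SNR
    return xi / (1.0 + xi)


def pesq_reward(pesq_rl, pesq_baseline):
    """Hypothetical reward: PESQ improvement over a baseline estimate."""
    return pesq_rl - pesq_baseline
```

In a training loop under these assumptions, the sampled actions would produce a gain mask, the enhanced utterance would be scored with PESQ, and the resulting reward would weight a policy-gradient update. Because the reward enters only as a scalar weight, the non-differentiability of PESQ does not block training.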