Mitigating Domain Dependency for Improved Speech Enhancement via SNR Loss Boosting
Lili Yin (Xinjiang University); Di Wu (Xinjiang University); Zhibin Qiu (Xinjiang University); Hao Huang (Xinjiang University)
SPS
Current supervised speech enhancement methods based on deep learning typically optimize amplitude-based loss functions, such as Mean Absolute Error (MAE) or Mean Squared Error (MSE), which measure the difference between the amplitudes of the estimated and clean speech signals. However, models trained with these losses become heavily dependent on specific domain properties, i.e., speaker, noise type, and signal-to-noise ratio (SNR). In this paper, we first validate this assumption by visually analyzing the model's internal representations, showing that these dependencies cause severe performance degradation in unseen conditions. Since SNR is independent of speaker and noise type, we then propose a simple yet effective objective function that minimizes the discrepancy between the indirectly estimated SNR and the true SNR over time-frequency units, alleviating the model's reliance on those domain properties. Experimental results demonstrate that the proposed method outperforms other prevalent loss functions in terms of both performance gain and generalization capability.
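The abstract does not give the exact formulation of the loss, so the following is only a minimal NumPy sketch of the general idea, under stated assumptions: magnitude spectrograms for the clean, estimated, and mixture signals; a per-unit noise term taken as the mixture-minus-speech residual; and the loss defined as the mean absolute discrepancy, in dB, between the SNR implied by the estimate and the true SNR. The function name `tf_snr_loss` and all details are hypothetical, not the authors' implementation.

```python
import numpy as np

def tf_snr_loss(est_mag, clean_mag, mix_mag, eps=1e-8):
    """Sketch of an SNR-discrepancy loss over time-frequency (T-F) units.

    est_mag, clean_mag, mix_mag: magnitude spectrograms, shape [freq, time].
    The "indirect" SNR of the estimate is formed against the residual
    (mixture minus estimate); the true SNR against the true noise
    (mixture minus clean). Both are compared per T-F unit in dB.
    """
    # Per-unit noise magnitudes, floored to avoid division by zero.
    true_noise = np.maximum(mix_mag - clean_mag, eps)
    est_noise = np.maximum(mix_mag - est_mag, eps)
    # Per-unit SNR in dB for the true and estimated signals.
    snr_true = 20.0 * np.log10(np.maximum(clean_mag, eps) / true_noise)
    snr_est = 20.0 * np.log10(np.maximum(est_mag, eps) / est_noise)
    # Mean absolute SNR discrepancy over all T-F units.
    return float(np.mean(np.abs(snr_est - snr_true)))
```

A loss of this shape is zero when the estimate matches the clean spectrogram exactly, and it penalizes deviations in relative (SNR) terms rather than raw amplitude, which is the property the abstract argues reduces dependence on speaker and noise-type statistics.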