Learning interpretable filters in Wav-Unet for speech enhancement
Félix MATHIEU (Telecom Paris); Thomas Courtat (Thales); Gaël Richard (Telecom Paris, Institut polytechnique de Paris); Geoffroy Peeters (LTCI - Télécom Paris, IP Paris)
Owing to their performance, deep neural networks have emerged as a major method in nearly all modern audio processing applications. Deep neural networks can be used to estimate some parameters or hyperparameters of a model or, in some cases, to replace the entire model in an end-to-end fashion. Although deep learning can lead to state-of-the-art performance, deep models also suffer from inherent weaknesses, as they usually remain complex and largely non-interpretable. For instance, the internal filters used in each layer are chosen in an ad hoc manner, with only a loose relation to the nature of the processed signal.
We propose in this paper an approach to learn interpretable filters within a specific neural architecture, which allows a better understanding of the behaviour of the neural network and reduces its complexity. We validate the approach on a speech enhancement task and show that the gain in interpretability does not degrade the performance of the model.
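The abstract does not specify how the filters are parameterized, but a common way to make learned convolutional front-end filters interpretable is to learn only a small set of physically meaningful parameters per filter (e.g., band-pass cutoff frequencies) instead of every kernel tap, in the spirit of SincNet-style layers. Below is a minimal, hypothetical PyTorch sketch of such a layer; the class name, initialization, and hyperparameters are illustrative assumptions, not the paper's actual Wav-Unet implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SincConv1d(nn.Module):
    """Band-pass filters parameterized only by learnable cutoff frequencies.

    Each filter is an ideal band-pass (difference of two windowed sinc
    low-pass filters), so its two parameters can be read directly as a
    frequency band. Illustrative sketch; the paper's parameterization
    may differ.
    """

    def __init__(self, out_channels=32, kernel_size=101, sample_rate=16000):
        super().__init__()
        self.kernel_size = kernel_size
        self.sample_rate = sample_rate
        # Learnable low cutoffs spread across the spectrum, plus bandwidths.
        self.low_hz = nn.Parameter(
            torch.linspace(30.0, sample_rate / 2 - 200.0, out_channels))
        self.band_hz = nn.Parameter(torch.full((out_channels,), 100.0))
        # Fixed (non-learned) time axis and Hamming window.
        n = torch.arange(kernel_size) - (kernel_size - 1) / 2
        self.register_buffer("t", n / sample_rate)
        self.register_buffer("window", torch.hamming_window(kernel_size))

    def forward(self, x):
        # Keep cutoffs positive, ordered, and below Nyquist.
        low = torch.clamp(self.low_hz.abs(), 30.0, self.sample_rate / 2)
        high = torch.clamp(low + self.band_hz.abs(), 30.0, self.sample_rate / 2)
        t = self.t.unsqueeze(0)                      # (1, kernel_size)
        low, high = low.unsqueeze(1), high.unsqueeze(1)
        # Band-pass impulse response: difference of two sinc low-passes.
        filters = (2 * high * torch.sinc(2 * high * t)
                   - 2 * low * torch.sinc(2 * low * t)) * self.window
        filters = filters / filters.abs().max(dim=1, keepdim=True).values
        # x: (batch, 1, time) -> (batch, out_channels, time)
        return F.conv1d(x, filters.unsqueeze(1), padding=self.kernel_size // 2)


# Usage: filter a batch of 1-second mono signals at 16 kHz.
x = torch.randn(4, 1, 16000)
y = SincConv1d()(x)  # (4, 32, 16000); each channel is a readable frequency band
```

Because each filter is fully described by two frequencies, inspecting the learned `low_hz` and `band_hz` values reveals which bands the network attends to, which is one plausible sense in which such filters reduce complexity while staying interpretable.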