Artificial Bandwidth Extension Using Conditional Variational Auto-Encoders And Adversarial Learning
Pramod Bachhav, Nicholas Evans, Massimiliano Todisco
-
SPS
IEEE Members: $11.00
Non-members: $15.00Length: 14:29
Artificial bandwidth extension (ABE) algorithms have been developed to estimate missing highband frequency components (4-8kHz) to improve quality of narrowband (0-4kHz) telephone calls. Most ABE solutions employ deep neural networks (DNNs) due to their well-known ability to model highly complex, non-linear relationship between narrowband and highband features. Generative models such as conditional variational auto-encoders (CVAEs) are capable of modelling complex data distributions via latent representation learning. This paper reports their application to ABE. CVAEs, form of directed, graphical models, are exploited to model the probability distribution of highband features conditioned on narrowband features. While CVAEs are trained with the standard mean square criterion (MSE), their combination with adversarial learning give further improvements. When compared to results obtained with the baseline approach, the wideband PESQ is improved significantly by 0.21 points. The performance is also compared on an automatic speech recognition (ASR) task on the TIMIT dataset where word error rate (WER) is decreased by an absolute value of 0.3%.