From Universal Approximation to Deep Regression: Theory and Practices

Chin-Hui Lee

Length: 01:02:57

Keynote Speech 17 Dec 2023

Many classical speech processing problems, such as enhancement, source separation, dereverberation, and bandwidth expansion, can be formulated as finding mapping functions that transform input spectra into output spectra. Leveraging machine learning and big data paradigms, we cast these spectral mapping problems as learnable deep regression. According to Kolmogorov's Representation Theorem (1957), a continuous multivariate scalar function can be expressed exactly as a superposition of a finite number of outer functions, each applied to a sum of univariate inner functions. Cybenko (1989) developed a universal approximation theorem showing that such a scalar function can be approximated by a superposition of sigmoid functions, inspiring a new wave of neural network algorithms. Barron (1993) later proved that the approximation error can be tightly bounded and related it to representation power in learning theory.

In this talk, we first present four new theorems that generalize the universal approximation theorems from sigmoid networks to deep neural networks (DNNs) and from vector-to-scalar to vector-to-vector regression. We also show that the generalization loss, or regression error, in machine learning theory can be decomposed into three terms, namely approximation, estimation, and optimization errors, each of which can be tightly bounded separately.

In practice, these theorems provide guidelines for architecture selection in DNN design. In a series of experiments on high-dimensional nonlinear regression, we validate our theory in terms of representation and generalization power and demonstrate that, under adverse acoustic conditions, deep regression achieves good speech quality and clear intelligibility for microphone-array-based speech enhancement, separation, and dereverberation. Our proposed deep regression framework was also tested on many recent challenging tasks, including CHiME-2, CHiME-4, CHiME-5, CHiME-6, REVERB, and DIHARD III. Our teams scored the lowest error rates in almost all of these open evaluation scenarios. Finally, we believe a theoretical understanding of deep classification will be needed to advance automatic speech recognition and understanding (ASRU) technologies to the next level of performance and robustness.
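
For readers who want the classical results cited in the abstract in symbolic form, the block below writes them out in common textbook notation. This is a sketch, not the notation or the new theorems of the talk: the symbols n (input dimension), N (number of sigmoid units), \Phi_q, \phi_{q,p}, \alpha_j, \mathbf{w}_j, b_j, and the Barron constant C_f are assumptions introduced here for illustration, and the Barron bound is given only up to a constant factor.

```latex
\begin{align*}
% Kolmogorov (1957): exact representation of a continuous f on [0,1]^n
% using 2n+1 outer functions \Phi_q and univariate inner functions \phi_{q,p}
f(x_1,\dots,x_n) &= \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right) \\[6pt]
% Cybenko (1989): for any \varepsilon > 0 there exist N, \alpha_j, \mathbf{w}_j, b_j
% such that a finite sum of sigmoids \sigma approximates f uniformly
\left| f(\mathbf{x}) - \sum_{j=1}^{N} \alpha_j\,
  \sigma\!\left(\mathbf{w}_j^{\top}\mathbf{x} + b_j\right) \right|
  &< \varepsilon \qquad \text{for all } \mathbf{x} \in [0,1]^n \\[6pt]
% Barron (1993): for f with finite constant C_f, the best N-unit sigmoid
% network f_N has squared L2 error decaying as 1/N
\left\| f - f_N \right\|_{2}^{2} &= O\!\left( \frac{C_f^{2}}{N} \right)
\end{align*}
```

The first line is an exact identity, whereas the second and third describe approximation: Cybenko's result guarantees that some finite N suffices for any tolerance, and Barron's result quantifies how the error shrinks as N grows.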

Tags:

IEEE ASRU 2023

automatic speech recognition

Deep regression
