What Does A Network Layer Hear? Analyzing Hidden Representations Of End-To-End ASR Through Speech Synthesis

Hung-yi Lee, Chung-Yi Li, Pei-Chieh Yuan

Length: 13:21

04 May 2020

End-to-end speech recognition systems have achieved results competitive with traditional systems. However, the complex transformations applied between layers to highly variable acoustic signals are hard to analyze. In this paper, we present our ASR probing model, which synthesizes speech from the hidden representations of an end-to-end ASR model to examine the information retained after each layer. Listening to the synthesized speech, we observe gradual removal of speaker variability and noise as the layers go deeper, which aligns with previous studies on how deep networks function in speech recognition. This paper is the first study to analyze an end-to-end speech recognition model by demonstrating what each layer hears. Speaker verification and speech enhancement measurements on the synthesized speech are also conducted to further confirm our observations.
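The layer-by-layer probing idea in the abstract can be illustrated with a toy sketch. This is not the authors' model: it swaps the speech synthesizer for a simple ridge-regression probe, and the "layers" are synthetic arrays that keep fewer speaker-related dimensions with depth. All names (`fit_linear_probe`, `probe_error`, `kept_speaker_dims`) and all data are hypothetical, chosen only to show how a per-layer probe can reveal what information a representation still carries.

```python
import numpy as np


def fit_linear_probe(hidden, target, ridge=1e-3):
    """Ridge-regression weights mapping hidden states (T, D) to targets (T, F)."""
    dim = hidden.shape[1]
    gram = hidden.T @ hidden + ridge * np.eye(dim)
    return np.linalg.solve(gram, hidden.T @ target)


def probe_error(hidden, target, ridge=1e-3):
    """Mean squared error of the fitted probe on the data it was fitted to."""
    weights = fit_linear_probe(hidden, target, ridge)
    return float(np.mean((hidden @ weights - target) ** 2))


# Synthetic stand-ins (assumption): random "content" and "speaker" features.
rng = np.random.default_rng(0)
frames, feat_dim = 400, 20
content = rng.normal(size=(frames, feat_dim))
speaker = rng.normal(size=(frames, feat_dim))

# Toy layers (assumption): deeper layers keep fewer speaker dimensions.
kept_speaker_dims = [20, 10, 0]
layers = [np.hstack([content, speaker[:, :k]]) for k in kept_speaker_dims]

# Probe each layer for the speaker features; a larger error means
# less speaker information can be recovered from that layer.
errors = [probe_error(hidden, speaker) for hidden in layers]
for depth, err in enumerate(errors, start=1):
    print(f"layer {depth}: speaker reconstruction MSE = {err:.3f}")
```

In this toy setup the reconstruction error grows with depth because the speaker dimensions were removed by construction. The paper's probing model instead synthesizes audio from real ASR hidden states, so the removal of speaker variability and noise can be heard and measured directly.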

Tags: sps conference, icassp 2020 virtual conference, May 2020, icassp 2020
