A Robust Audio-Visual Speech Enhancement Model

Wupeng Wang, Chao Xing, Dong Wang, Xiao Chen, Fengyu Sun

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 11:21
04 May 2020

Most existing audio-visual speech enhancement (AVSE) methods work well in conditions with strong noise; however, when applied to conditions with a medium SNR, serious performance degradation is often observed. This degradation can be partly attributed to the feature-fusion architecture (e.g., early fusion) that tightly couples the very strong audio information with the relatively weak visual information. In this paper, we present a safe AVSE approach based on late fusion that allows the visual stream to contribute safely to audio speech enhancement (ASE) across a wide range of SNRs. The key novelty is two-fold. First, we define power binary masks (PBMs) as a rough representation of speech signals; this rough representation acknowledges the weakness of the visual information and so can be easily predicted from the visual stream. Second, we design a posterior augmentation architecture that integrates the visual-derived PBMs into the audio-derived masks via a gating network, so that the overall performance is lower-bounded by the audio-based component. Our experiments on the Grid dataset demonstrate that this new approach consistently outperforms the audio-based system in all noise conditions, confirming that it is a safe way to incorporate visual knowledge into speech enhancement.
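To illustrate the two ideas in the abstract, the sketch below shows one plausible form of a power binary mask and of the gated late fusion. The function names, the PBM threshold, and the exact fusion rule are illustrative assumptions, not the paper's formulation; the point is that when the gate closes (g → 0), the audio-only mask is recovered, which is what lower-bounds performance by the audio-based component.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def power_binary_mask(clean_power, noisy_power, threshold=0.5):
    # Illustrative PBM: 1 in time-frequency bins where the target
    # speech dominates the noisy mixture's power, 0 elsewhere.
    # Such a coarse binary target is easier to predict from the
    # weak visual stream than a fine-grained ratio mask.
    ratio = clean_power / np.maximum(noisy_power, 1e-12)
    return (ratio > threshold).astype(np.float32)

def gated_late_fusion(audio_mask, visual_pbm, gate_logits):
    # Assumed posterior-augmentation rule: a per-bin gate g in [0, 1]
    # blends the audio-derived mask with the visual-derived PBM.
    # With g = 0 the output equals the audio-only mask, so the fused
    # system can never fall below the audio-based component.
    g = sigmoid(gate_logits)
    return (1.0 - g) * audio_mask + g * visual_pbm
```

In a learned system the gate logits would come from a small network conditioned on both streams; here they are plain inputs so the limiting behaviour is easy to check.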
