A Visual-Pilot Deep Fusion For Target Speech Separation In Multi-Talker Noisy Environment

Yun Li, Zhang Liu, Yueyue Na, Ziteng Wang, Biao Tian, Qiang Fu

Length: 11:55
04 May 2020

Separating the target speech in a multi-talker noisy environment is a challenging problem for audio-only source separation algorithms. The main difficulty is that the separated speech of the same talker can switch among the outputs across consecutive segments, causing the talker permutation issue. In this paper, we deploy face tracking and propose low-dimensional hand-crafted visual features together with low-cost deep fusion architectures to separate unseen but visible target sources in multi-talker noisy environments. We show that our approach not only addresses the talker permutation issue but also yields additional separation improvement on challenging mixtures, such as same-gender overlaps, on the public dataset. We also show a significant improvement in target speech recognition on the simulated real-world dataset. Our training is independent of the number of visible sources, which provides flexibility in deployment.
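The abstract describes an audio-visual masking paradigm: visual features of the target talker, obtained via face tracking, are fused with the mixture audio so the network consistently follows one speaker and the output permutation problem disappears. The PyTorch sketch below illustrates that general idea under stated assumptions only: the class name VisualPilotFusion, the BLSTM layers, the feature dimensions, the concatenation-based fusion, and the sigmoid time-frequency mask are all illustrative choices, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualPilotFusion(nn.Module):
    """Illustrative audio-visual fusion for target-speaker masking.

    The mixture magnitude spectrogram is encoded with a BLSTM; per-frame
    visual features of the target talker (e.g., low-dimensional lip
    descriptors from face tracking) are upsampled to the audio frame rate,
    concatenated with the audio embedding, and decoded into a
    time-frequency mask that selects the target speech.
    """

    def __init__(self, n_freq=257, vis_dim=20, hidden=256):
        super().__init__()
        self.audio_enc = nn.LSTM(n_freq, hidden, num_layers=2,
                                 batch_first=True, bidirectional=True)
        self.vis_proj = nn.Linear(vis_dim, hidden)
        self.fusion = nn.LSTM(2 * hidden + hidden, hidden,
                              batch_first=True, bidirectional=True)
        self.mask_head = nn.Linear(2 * hidden, n_freq)

    def forward(self, mix_mag, vis_feat):
        # mix_mag:  (B, T_audio, n_freq) mixture magnitude spectrogram
        # vis_feat: (B, T_video, vis_dim) target talker's visual features
        a, _ = self.audio_enc(mix_mag)
        # Upsample the slower video stream to the audio frame rate so the
        # two modalities can be concatenated frame by frame.
        v = F.interpolate(vis_feat.transpose(1, 2), size=mix_mag.size(1),
                          mode='linear', align_corners=False).transpose(1, 2)
        v = self.vis_proj(v)
        h, _ = self.fusion(torch.cat([a, v], dim=-1))
        mask = torch.sigmoid(self.mask_head(h))  # target T-F mask in [0, 1]
        return mask * mix_mag                    # masked target estimate

if __name__ == "__main__":
    net = VisualPilotFusion()
    mix = torch.randn(2, 100, 257).abs()  # 100 audio frames per utterance
    vis = torch.randn(2, 25, 20)          # 25 video frames (e.g., 25 fps)
    est = net(mix, vis)
    print(est.shape)  # torch.Size([2, 100, 257])
```

Because the visual branch is conditioned on a single tracked face, the same network can be applied once per visible talker at inference time, which is consistent with the abstract's claim that training is independent of the number of visible sources. The upsampling step reflects the usual rate mismatch between video (roughly 25 fps) and spectrogram frames (often around 100 per second).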
