Audio-Visual Recognition Of Overlapped Speech For The LRS2 Dataset

Jianwei Yu, Shi-Xiong Zhang, Jian Wu, Bo Wu, Shiyin Kang, Helen Meng, Xunying Liu, Dong Yu, Shahram Ghorbani, Shansong Liu

SPS

Length: 12:15

04 May 2020

Automatic recognition of overlapped speech remains a highly challenging task. Motivated by the bimodal nature of human speech perception, this paper investigates the use of audio-visual technologies for overlapped speech recognition. Three issues associated with the construction of audio-visual speech recognition (AVSR) systems are addressed. First, the basic architecture designs of AVSR systems, i.e. end-to-end and hybrid, are investigated. Second, purposefully designed modality fusion gates are used to robustly integrate the audio and visual features. Third, in contrast to a traditional pipelined architecture containing explicit speech separation and recognition components, a streamlined and integrated AVSR system optimized consistently using the lattice-free MMI (LF-MMI) discriminative criterion is also proposed. The proposed LF-MMI time-delay neural network (TDNN) system establishes the state of the art for the LRS2 dataset. Experiments on overlapped speech simulated from the LRS2 dataset suggest the proposed AVSR system outperformed the audio-only baseline LF-MMI DNN system by up to 29.98% absolute in word error rate (WER) reduction, and produced recognition performance comparable to a more complex pipelined system. Consistent performance improvements of 4.89% absolute in WER reduction over the baseline AVSR system using feature fusion are also obtained.
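
The abstract reports its gains as absolute WER reductions. As a reading aid, the following is a minimal sketch of how WER is conventionally computed (word-level edit distance divided by the number of reference words) and of the difference between an absolute and a relative reduction. The function names and the toy sentences are hypothetical and chosen for illustration only; nothing here reproduces the paper's scoring setup or its results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def absolute_reduction(baseline_wer: float, system_wer: float) -> float:
    """Difference in WER, in the same units as the inputs (e.g. percentage points)."""
    return baseline_wer - system_wer


def relative_reduction(baseline_wer: float, system_wer: float) -> float:
    """Reduction expressed as a fraction of the baseline WER."""
    if baseline_wer == 0:
        raise ValueError("baseline WER must be non-zero")
    return (baseline_wer - system_wer) / baseline_wer


if __name__ == "__main__":
    # Toy example (hypothetical sentences): one dropped word out of six.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

An absolute reduction is a difference in percentage points, so it cannot be converted to a relative figure without knowing the baseline WER, which the abstract does not state.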

Tags:

sps conference

icassp 2020 virtual conference

May 2020

icassp 2020
