
LIPREADING MODEL BASED ON WHOLE-PART COLLABORATIVE LEARNING

Weidong Tian, Housen Zhang, Chen Peng, Zhong-Qiu Zhao

11 May 2022

Lipreading is the task of recognizing speech content from visual information about the speaker's lip movements. Recent work has focused mainly on adequately extracting temporal information, while spatial information is used only in a simple way after extraction. In this paper, we focus on fully exploiting spatial information in lipreading. The whole lip carries global spatial information, while the parts of the lip contain fine-grained spatial information. We propose a lipreading model based on whole-part collaborative learning (WPCL), which makes full use of both the global and the fine-grained spatial information of the lip. WPCL contains two branches that process the whole-lip and part features respectively and are trained jointly through collaborative learning. Further, to account for the differing importance of part features during fusion, we propose an adaptive part feature fusion module (APFF) to fuse them. Finally, we validate our claims and evaluate WPCL through several experiments. Experiments on the LRW and CAS-VSR-W1k datasets demonstrate that our approach achieves state-of-the-art performance.
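
The abstract names the two-branch design and the APFF module but gives no implementation details. The PyTorch sketch below is one plausible instantiation of that description, not the authors' code: the encoders, the number of lip parts, and the softmax-based importance weighting inside APFF are all illustrative assumptions.

```python
# A minimal sketch of a whole-part two-branch model with adaptive part fusion.
# Backbones, part count, and the attention weighting are assumptions.
import torch
import torch.nn as nn

class APFF(nn.Module):
    """Adaptive part-feature fusion: weight each part feature by a learned
    importance score before summing (one plausible reading of 'adaptive')."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, parts):                              # (batch, num_parts, dim)
        weights = torch.softmax(self.score(parts), dim=1)  # (batch, num_parts, 1)
        return (weights * parts).sum(dim=1)                # (batch, dim)

class WPCL(nn.Module):
    """Two branches over the whole lip and its parts, trained jointly."""
    def __init__(self, dim=256, num_classes=500):
        super().__init__()
        # Placeholder encoders; the paper's actual backbones are not specified here.
        self.whole_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(dim), nn.ReLU())
        self.part_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(dim), nn.ReLU())
        self.apff = APFF(dim)
        self.whole_head = nn.Linear(dim, num_classes)
        self.part_head = nn.Linear(dim, num_classes)

    def forward(self, whole, parts):
        # whole: (B, C, H, W); parts: (B, P, C, h, w) for P lip regions
        b, p = parts.shape[:2]
        f_whole = self.whole_encoder(whole)
        f_parts = self.part_encoder(parts.flatten(0, 1)).view(b, p, -1)
        f_fused = self.apff(f_parts)
        # Collaborative learning: each branch gets its own classification loss,
        # so global and fine-grained cues supervise the model jointly.
        return self.whole_head(f_whole), self.part_head(f_fused)
```

Under this reading, joint training would sum the two branches' classification losses so that the global and fine-grained views regularize each other, while APFF lets the model down-weight uninformative lip regions at fusion time.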