
LIPREADING MODEL BASED ON WHOLE-PART COLLABORATIVE LEARNING

Weidong Tian, Housen Zhang, Chen Peng, Zhong-Qiu Zhao

11 May 2022

Lipreading is the task of recognizing speech content from visual information about the speaker's lip movements. Recent work has focused mainly on adequately extracting temporal information, while spatial information is used only in a simple way after extraction. In this paper, we focus on fully exploiting spatial information in lipreading. The whole lip carries global spatial information, while the parts of the lip contain fine-grained spatial information. We propose a lipreading model based on whole-part collaborative learning (WPCL), which makes full use of both the global and the fine-grained spatial information of the lip. WPCL contains two branches that process the whole-lip and part features respectively and are trained jointly through collaborative learning. Further, to account for the differing importance of part features during fusion, we propose an adaptive part feature fusion module (APFF) to fuse them. Finally, we validate our claims and evaluate WPCL through several experiments. Experiments on the LRW and CAS-VSR-W1k datasets demonstrate that our approach achieves state-of-the-art performance.
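
The abstract names the two-branch design and the APFF module but gives no implementation details. The PyTorch sketch below is one plausible instantiation of that description, not the authors' code: the encoders, the number of lip parts, and the softmax-based importance weighting inside APFF are all illustrative assumptions.

```python
# A minimal sketch of a whole-part two-branch model with adaptive part fusion.
# Backbones, part count, and the attention weighting are assumptions.
import torch
import torch.nn as nn

class APFF(nn.Module):
    """Adaptive part-feature fusion: weight each part feature by a learned
    importance score before summing (one plausible reading of 'adaptive')."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, parts):                              # (batch, num_parts, dim)
        weights = torch.softmax(self.score(parts), dim=1)  # (batch, num_parts, 1)
        return (weights * parts).sum(dim=1)                # (batch, dim)

class WPCL(nn.Module):
    """Two branches over the whole lip and its parts, trained jointly."""
    def __init__(self, dim=256, num_classes=500):
        super().__init__()
        # Placeholder encoders; the paper's actual backbones are not specified here.
        self.whole_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(dim), nn.ReLU())
        self.part_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(dim), nn.ReLU())
        self.apff = APFF(dim)
        self.whole_head = nn.Linear(dim, num_classes)
        self.part_head = nn.Linear(dim, num_classes)

    def forward(self, whole, parts):
        # whole: (B, C, H, W); parts: (B, P, C, h, w) for P lip regions
        b, p = parts.shape[:2]
        f_whole = self.whole_encoder(whole)
        f_parts = self.part_encoder(parts.flatten(0, 1)).view(b, p, -1)
        f_fused = self.apff(f_parts)
        # Collaborative learning: each branch gets its own classification loss,
        # so global and fine-grained cues supervise the model jointly.
        return self.whole_head(f_whole), self.part_head(f_fused)
```

Under this reading, joint training would sum the two branches' classification losses so that the global and fine-grained views regularize each other, while APFF lets the model down-weight uninformative lip regions at fusion time.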