CHANNEL-WISE AV-FUSION ATTENTION FOR MULTI-CHANNEL AUDIO-VISUAL SPEECH RECOGNITION

Gaopeng Xu, Song Yang, Wei Li, Sang Wang, Wei Guo, Junfeng Yuan, Jie Gao

07 May 2022

In this paper, we present our automatic speech recognition (ASR) system for the Multimodal Information Based Speech Processing (MISP) Challenge 2021. We propose combining a guided source separation (GSS)-based speech enhancement technique with a novel Channel-wise AV-fusion encoder (CAE)-based acoustic model, and find that a careful combination of these techniques yields substantial accuracy improvements. Our ASR system reduces the Chinese Character Error Rate (CCER) by 37.67% absolute compared to the baseline in track 2, achieving first place in the evaluation period with a CCER of 25.07%.
