THE NIO SYSTEM FOR AUDIO-VISUAL DIARIZATION AND RECOGNITION IN MISP CHALLENGE 2022
Gaopeng Xu (NIO); Xianliang Wang (NIO); Sang Wang (NIO); Junfeng Yuan (NIO); Wei Guo (NIO); Wei Li (NIO); Jie Gao (NIO)
SPS
This paper describes the NIO system for audio-visual diarization and recognition in the Multimodal Information Based Speech Processing (MISP) Challenge 2022. Our system combines an end-to-end audio-visual neural speaker diarization model with a channel-wise AV-fusion encoder with speaker signature for multi-channel audio-visual speech diarization and recognition. It reduces the concatenated minimum-permutation character error rate (cpCER) by 34.36% absolute compared to the baseline in Track 2.
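
For readers unfamiliar with the metric, the following is a minimal sketch of how a cpCER-style score can be computed. It is not the challenge's official scoring tool. It assumes that each speaker's utterances are concatenated into one character string, that the hypothesis speakers are matched to the reference speakers by trying every permutation, and that the permutation with the fewest character edits is kept. The function names `edit_distance` and `cp_cer` and the list-of-lists input layout are hypothetical choices made for this illustration.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Levenshtein distance between two character strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cp_cer(ref_by_speaker, hyp_by_speaker):
    """Illustrative cpCER: minimum total character errors over all
    speaker permutations, divided by the number of reference characters.

    Each argument is a list with one entry per speaker; each entry is
    that speaker's list of utterance strings (an assumed layout).
    """
    # Concatenate every speaker's utterances into a single string.
    refs = ["".join(utts) for utts in ref_by_speaker]
    hyps = ["".join(utts) for utts in hyp_by_speaker]

    # Pad with empty strings so both sides have the same speaker count.
    n = max(len(refs), len(hyps))
    refs += [""] * (n - len(refs))
    hyps += [""] * (n - len(hyps))

    total_ref_chars = sum(len(r) for r in refs)
    if total_ref_chars == 0:
        raise ValueError("reference contains no characters")

    # Brute-force search over speaker assignments; fine for a few speakers.
    best_errors = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best_errors / total_ref_chars


if __name__ == "__main__":
    reference = [["abc", "de"], ["xyz"]]
    hypothesis = [["xyz"], ["abc", "dd"]]  # speaker labels swapped, one wrong character
    print(cp_cer(reference, hypothesis))   # 0.125
```

In this toy example the swapped speaker labels cost nothing, because the best permutation realigns them; only the single substituted character counts, giving 1 error over 8 reference characters. The exhaustive permutation search grows factorially with the number of speakers, so a practical scorer would likely use an assignment algorithm instead.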