ATTENTION-BASED FUSION FOR BONE-CONDUCTED AND AIR-CONDUCTED SPEECH ENHANCEMENT IN THE COMPLEX DOMAIN
Heming Wang, DeLiang Wang, Xueliang Zhang
-
SPS
IEEE Members: $11.00
Non-members: $15.00Length: 00:10:42
Bone-conduction (BC) microphones capture speech signals by converting the vibrations of the human skull into electrical signals. BC sensors are insensitive to acoustic noise, but limited in bandwidth. On the other hand, conventional or air-conduction (AC) microphones are capable of capturing full-band speech, but are susceptible to background noise. We propose to combine the strengths of AC and BC microphones by employing a convolutional recurrent network that performs complex spectral mapping. To better utilize signals from both kinds of microphone, we employ attention-based fusion with early-fusion and late-fusion strategies. Experiments demonstrate the superiority of the proposed method over other recent speech enhancement methods combining BC and AC signals. In addition, our enhancement performance is significantly better than conventional speech enhancement counterparts, especially in low signal-to-noise ratio scenarios.