  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 15:24
04 May 2020

In this paper, we design a novel front-end processing system for speaker diarization under realistic conditions with challenging background noise. To cope with diverse environments, we first extend our previously proposed progressive-learning-based speech enhancement model by adding multi-task learning in each intermediate layer. The corresponding progressive multi-target (PMT) outputs in the various layers include both a progressive ratio mask (PRM) and progressively enhanced log-power spectra (PELPS) with specified signal-to-noise ratios (SNRs). Front-end processing commonly introduces speech distortions, which often degrade back-end performance. However, the proposed speech enhancement model can be regarded as a bagging of models with multiple learning objectives, which provides flexibility in selecting the most appropriate output for robust speaker diarization. In addition, a global SNR estimate is computed from the results of deep neural network (DNN) based speech activity detection (SAD) to decide whether the audio should be enhanced. We evaluate speaker diarization performance on the second DIHARD dataset, which includes several different realistic conditions. Experiments demonstrate that, compared with the original data, the data enhanced by our proposed method avoids performance loss in every single domain and achieves consistent improvements in most domains.
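The SNR-gated enhancement decision mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the estimator or the decision rule, so everything here is an assumption made for illustration: the function names, the per-frame power input, the noise-subtraction step, and the 10 dB threshold are all hypothetical and are not taken from the paper.

```python
import math


def estimate_global_snr_db(frame_power, speech_flags, eps=1e-10):
    """Estimate a global SNR in dB from per-frame power and SAD decisions.

    Assumption (not from the paper): frames flagged as non-speech give the
    noise power, and frames flagged as speech contain speech plus noise.
    """
    speech = [p for p, s in zip(frame_power, speech_flags) if s]
    noise = [p for p, s in zip(frame_power, speech_flags) if not s]
    if not speech or not noise:
        raise ValueError("need at least one speech and one non-speech frame")

    noise_power = sum(noise) / len(noise)
    # Subtract the noise estimate from the mean power of the speech frames.
    speech_power = max(sum(speech) / len(speech) - noise_power, eps)
    return 10.0 * math.log10(speech_power / max(noise_power, eps))


def should_enhance(frame_power, speech_flags, snr_threshold_db=10.0):
    """Return True when the estimated global SNR is below the threshold.

    The threshold value is a hypothetical placeholder.
    """
    return estimate_global_snr_db(frame_power, speech_flags) < snr_threshold_db
```

In this sketch, a recording whose estimated global SNR is at or above the threshold would be passed to the diarization back-end unmodified, which reflects the idea of skipping enhancement when it is more likely to add distortion than to remove noise.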
