Pitch Mark Detection from Noisy Speech Waveform using Wave-U-Net

Hyun-Joon Nam (Pohang University of Science and Technology); Hong-June Park (Pohang University of Science and Technology)

06 Jun 2023

A pitch mark (PM) is the time point at which the vocal folds close in voiced speech. PMs are useful for real-life speech processing because of their noise immunity. Wave-U-PM, a Wave-U-Net-based neural network, is proposed to detect PMs from noisy speech. The ground-truth PMs are generated from clean speech using REAPER; this increases the speech data available for training to 100 hours, whereas the dataset for electroglottograph (EGG)-based PM detection is less than 5 hours. Wave-U-PM has one encoder and two decoders. The first decoder generates a sinusoidal PM waveform whose positive peak times represent the PMs. The second decoder generates a combined pitch and formant waveform below 1000 Hz. In identification rate (IDR) at 0 dB SNR, Wave-U-PM outperforms previous PM detection works by 11% and 31% for the voiced and the entire speech intervals, respectively. The second decoder improves IDR by 2.5% for the entire speech interval.
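
To illustrate how PMs can be read off a sinusoidal PM waveform, the sketch below picks the positive local maxima of a waveform and converts their sample indices to times. It is not the authors' post-processing: the function name `pitch_marks_from_waveform`, the `min_height` threshold, and the synthetic 100 Hz cosine standing in for the first decoder's output are all assumptions made for illustration.

```python
import math


def pitch_marks_from_waveform(pm_wave, sample_rate, min_height=0.5):
    """Return the times (in seconds) of positive local maxima in pm_wave.

    min_height is a hypothetical threshold that ignores small ripples;
    the abstract does not say how peaks are selected.
    """
    marks = []
    for n in range(1, len(pm_wave) - 1):
        prev, cur, nxt = pm_wave[n - 1], pm_wave[n], pm_wave[n + 1]
        # Strictly above the previous sample and not below the next one,
        # so a flat-topped peak is reported once, at its first sample.
        if cur > min_height and cur > prev and cur >= nxt:
            marks.append(n / sample_rate)
    return marks


# Synthetic stand-in for a decoder output (assumption): a 100 Hz cosine
# sampled at 16 kHz for 50 ms, i.e. one positive peak every 10 ms.
sample_rate = 16000
f0 = 100.0
pm_wave = [math.cos(2 * math.pi * f0 * n / sample_rate) for n in range(800)]

marks = pitch_marks_from_waveform(pm_wave, sample_rate)
print(marks)
```

With this synthetic input the interior peaks fall at multiples of 10 ms; the first and last samples are skipped because a local maximum needs a neighbor on each side. A real network output would likely need a more careful choice of threshold and some handling of unvoiced intervals, which this sketch does not attempt.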
