
EFFICIENT STUTTERING EVENT DETECTION USING SIAMESE NETWORKS

Payal Mohapatra (Northwestern University); Bashima Islam (Worcester Polytechnic Institute); MD Tamzeed Islam (Amazon); Ruochen Jiao (Northwestern University); Zhu Qi (Northwestern University)

SPS
07 Jun 2023

Speech disfluency research is pivotal now that conversational technology and voice assistants have become commonplace. Stutter detection is critical for accommodating atypical speakers in current automatic speech recognition systems. However, the lack of publicly available labeled and unlabeled datasets is a significant bottleneck for this research. While many works use pseudo-disfluency data with proxy labels and formulate a self-supervised task, we see merit in using real-world data. We consolidate the publicly available speech disfluency corpora, both labeled and unlabeled, and propose DisfluentSiam, a simple Siamese network-based, small-scale pretraining pipeline that uses task-specific data from multiple domains and has only 10M trainable parameters. With DisfluentSiam, we achieve an average performance boost of 15% across five types of disfluency event detection compared to using wav2vec 2.0 embeddings directly. In particular, with only 4 to 5 minutes of labeled data for fine-tuning, DisfluentSiam demonstrates the advantage of task-specific pretraining, achieving up to 25% higher accuracy.
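The abstract does not state which objective DisfluentSiam optimizes during pretraining. As an illustration only, the sketch below assumes a SimSiam-style formulation (Chen and He), in which two augmented views of the same clip pass through a shared encoder and the network minimizes a symmetric negative cosine similarity between one branch's predictor output and the other branch's projection. The function names `cosine_similarity` and `simsiam_loss` are hypothetical and not taken from the paper. This is not the authors' implementation.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def simsiam_loss(
    p1: Sequence[float],
    z1: Sequence[float],
    p2: Sequence[float],
    z2: Sequence[float],
) -> float:
    """Symmetric negative cosine similarity (assumed SimSiam-style objective).

    p1, p2: predictor outputs for view 1 and view 2 of the same audio clip.
    z1, z2: projector outputs used as targets. In an autograd framework the
    targets would be detached (stop-gradient); plain lists carry no gradients.
    The loss ranges from -1 (views perfectly aligned) to +1 (opposite).
    """
    return -0.5 * (cosine_similarity(p1, z2) + cosine_similarity(p2, z1))
```

For example, `simsiam_loss([1.0, 0.0], [0.0, 3.0], [0.0, 1.0], [2.0, 0.0])` pairs each predictor output with a parallel target, so it evaluates to -1.0, the minimum. Because the loss depends only on the direction of the embeddings, the encoder is pushed to map different views of the same disfluent segment to similar representations without needing negative pairs.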
