Improving Disfluency Detection with Multi-scale Self Attention and Contrastive Learning
Peiying Wang (JD AI); Chaoqun Duan (JD AI Research); Meng Chen (JD AI); Xiaodong He (JDT)
Disfluency detection aims to recognize disfluencies in sentences. Existing works usually adopt a sequence labeling model to tackle this task. They also attempt to integrate into their models the observation that a disfluency is often similar to the correct phrase that follows it, the so-called "rough copy". However, they rely heavily on hand-crafted features or word-to-word matching patterns, which are insufficient to precisely capture such rough copies and lead to under-tagging and over-tagging problems. To alleviate these problems, we propose a multi-scale self-attention mechanism (MSAT) and design a contrastive learning (CL) loss for this task. Specifically, the MSAT leverages token representations to learn representations of phrases at different scales and then computes the similarity among them. The CL uses the fluent version of the input to build positive and negative samples and encourages the model to keep the fluent version semantically consistent with the input. We conduct experiments on a public English dataset, Switchboard, and an in-house Chinese dataset, Waihu, which is derived from an online conversation bot. Results show that our method outperforms the baselines and achieves superior performance on both datasets.
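The abstract does not give implementation details, but a minimal PyTorch sketch may help make the two ideas concrete. Everything below is an illustrative assumption, not the paper's exact formulation: the class `MultiScalePhraseSimilarity`, the mean-pooling scheme for building phrase representations, the `scales` and `temperature` values, and the InfoNCE-style formulation of `contrastive_loss` with in-batch negatives are all hypothetical choices consistent with the description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScalePhraseSimilarity(nn.Module):
    """Hypothetical sketch of a multi-scale self-attention layer:
    build phrase representations at several window widths by mean
    pooling over token representations, then let every token attend
    to (i.e., compute similarity with) every multi-scale phrase."""

    def __init__(self, hidden_size, scales=(1, 2, 3)):
        super().__init__()
        self.scales = scales
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)

    def forward(self, token_reprs):
        # token_reprs: (batch, seq_len, hidden)
        q = self.query(token_reprs)
        keys = []
        for s in self.scales:
            # phrase representation of width s via sliding mean pooling
            pooled = F.avg_pool1d(
                token_reprs.transpose(1, 2), kernel_size=s, stride=1
            ).transpose(1, 2)           # (batch, seq_len - s + 1, hidden)
            keys.append(self.key(pooled))
        k = torch.cat(keys, dim=1)      # all phrases across all scales
        # scaled dot-product similarity of every token to every phrase
        sim = torch.matmul(q, k.transpose(1, 2)) / q.size(-1) ** 0.5
        attn = sim.softmax(dim=-1)
        return torch.matmul(attn, k)    # (batch, seq_len, hidden)


def contrastive_loss(input_repr, fluent_repr, temperature=0.1):
    """Hypothetical InfoNCE-style CL loss: the sentence embedding of
    the (possibly disfluent) input should be close to its own fluent
    version and far from the fluent versions of other samples in the
    batch (in-batch negatives)."""
    z1 = F.normalize(input_repr, dim=-1)    # (batch, hidden)
    z2 = F.normalize(fluent_repr, dim=-1)
    logits = z1 @ z2.t() / temperature      # (batch, batch)
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)


# Example: score 20 tokens of hidden size 256 against phrases of widths 1-3
msat = MultiScalePhraseSimilarity(hidden_size=256)
out = msat(torch.randn(4, 20, 256))         # -> (4, 20, 256)
```

Under these assumptions, the per-token similarity scores over multi-scale phrases would let the model spot a "rough copy" whose span is longer than one word, while the CL term ties the input's sentence-level semantics to its fluent counterpart.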