    Length: 00:05:48
12 May 2022

Phoneme segmentation plays an important role in speech processing applications such as keyword spotting, automatic pronunciation assessment, and automatic speech recognition. In this paper, we propose a phoneme segmentation method based on a regularized attention mechanism. Specifically, frame-level representations of the speech utterance are extracted from a pre-trained acoustic encoder and combined with the presumed phoneme sequence through the attention mechanism. By fusing the acoustic representations with these aligned phoneme representations, we learn a phoneme label for each frame to obtain the final segmentation. To improve the alignment between the pronounced phoneme sequence and the utterance, we regularize the attention matrix with an additional attention loss. The whole network is optimized in a multi-task learning (MTL) framework. Experimental results on the TIMIT and Buckeye corpora show that the proposed method outperforms previous baselines and achieves state-of-the-art (SOTA) performance in F1 score and R-value.
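
The alignment idea in the abstract can be illustrated with a toy sketch. It is not the authors' network: the abstract does not specify the attention form, the fusion step, or the exact attention loss, so this example assumes scaled dot-product attention, uses the attention argmax as a stand-in for the learned frame classifier, and assumes a cross-entropy loss against a reference alignment. All function names and the toy data are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(frames, phonemes):
    """Assumed scaled dot-product attention: one weight row per frame over the phonemes."""
    scale = math.sqrt(len(phonemes[0]))
    return [
        softmax([sum(f * p for f, p in zip(frame, ph)) / scale for ph in phonemes])
        for frame in frames
    ]


def attention_loss(attention, target):
    """Assumed regularizer: mean cross-entropy between attention rows and a reference alignment."""
    eps = 1e-9
    return -sum(math.log(row[t] + eps) for row, t in zip(attention, target)) / len(target)


def boundaries(frame_labels):
    """Frame indices at which the predicted phoneme index changes."""
    return [i for i in range(1, len(frame_labels)) if frame_labels[i] != frame_labels[i - 1]]


# Toy data (hypothetical): 6 frames, 3 phonemes, 2-dimensional representations.
phonemes = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]]
frames = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0], [-1.0, 0.0], [-1.0, 0.1]]
reference = [0, 0, 1, 1, 2, 2]  # reference phoneme index per frame

attention = attend(frames, phonemes)
# Stand-in for the frame classifier: pick the most attended phoneme per frame.
frame_labels = [row.index(max(row)) for row in attention]
loss = attention_loss(attention, reference)

print(frame_labels)              # [0, 0, 1, 1, 2, 2]
print(boundaries(frame_labels))  # [2, 4]
```

In a multi-task setup of the kind the abstract describes, a loss like `attention_loss` would be added to the frame-labeling loss with some weighting; the weighting scheme is not stated in the abstract and is left out here.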
