  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:13:59
10 Jun 2021

We pursue an interpretable pitch tracking model and a jointly trained tone model for Mandarin tone classification. For pitch tracking, current deep-learning-based pitch models seldom consider the Viterbi decoding commonly implemented in prevalent hand-designed pitch tracking algorithms. We propose an RNN-based encoder-decoder framework with a gating mechanism whose underlying structure models both the state cost estimation and the Viterbi back-tracing pass implemented in the RAPT algorithm. We then apply the pitch extractor to a downstream Mandarin tone classification task. The basic motivation is to combine the two conventional components of tone classification (i.e., the pitch extractor and the tone classifier) so that the whole network is trained simultaneously in an end-to-end fashion. Various cascade methods are evaluated. We carry out pitch extraction and tone classification experiments on a Mandarin continuous speech database to demonstrate the advantages of the proposed models. Experimental results on pitch extraction show that the proposed pitch tracking model outperforms the DNN-RNN and bi-directional variants. Tone classification results show that the composite model outperforms the traditional cascade tone classification framework, which uses pitch-related features and a back-end classifier.
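For readers unfamiliar with the two passes the abstract refers to, the following is a minimal, generic sketch of Viterbi decoding over a lattice of pitch candidates: a forward pass that accumulates state costs plus transition costs, and a back-tracing pass that recovers the cheapest path. It is not the RAPT implementation and not the proposed neural model. The function name `viterbi_pitch_path`, the `jump_weight` parameter, and the log-frequency transition cost are all assumptions chosen for illustration.

```python
import numpy as np


def viterbi_pitch_path(state_cost, candidates_hz, jump_weight=1.0):
    """Return the lowest-cost pitch track through a candidate lattice.

    state_cost:    (T, K) local cost of candidate k at frame t (lower is better)
    candidates_hz: (K,) pitch value of each candidate in Hz
    jump_weight:   hypothetical weight on the transition cost (an assumption)
    """
    state_cost = np.asarray(state_cost, dtype=float)
    candidates_hz = np.asarray(candidates_hz, dtype=float)
    num_frames, num_cands = state_cost.shape

    # Assumed transition cost: absolute log-frequency jump between candidates.
    log_f = np.log(candidates_hz)
    trans = jump_weight * np.abs(log_f[:, None] - log_f[None, :])  # (prev, cur)

    acc = np.empty((num_frames, num_cands))             # accumulated path cost
    back = np.zeros((num_frames, num_cands), dtype=int)  # best predecessor

    # Forward pass: accumulate state cost plus cheapest transition.
    acc[0] = state_cost[0]
    for t in range(1, num_frames):
        total = acc[t - 1][:, None] + trans             # (prev, cur)
        back[t] = np.argmin(total, axis=0)
        acc[t] = total[back[t], np.arange(num_cands)] + state_cost[t]

    # Back-tracing pass: follow stored predecessors from the best final state.
    path = np.empty(num_frames, dtype=int)
    path[-1] = np.argmin(acc[-1])
    for t in range(num_frames - 1, 0, -1):
        path[t - 1] = back[t, path[t]]

    return candidates_hz[path]


if __name__ == "__main__":
    # Frame 1 locally prefers 200 Hz, an octave away from its neighbours.
    cost = [[0.0, 1.0], [0.6, 0.5], [0.0, 1.0]]
    print(viterbi_pitch_path(cost, [100.0, 200.0], jump_weight=1.0))
    print(viterbi_pitch_path(cost, [100.0, 200.0], jump_weight=0.0))
```

In this toy lattice, a non-zero `jump_weight` keeps the track at 100 Hz through the middle frame, whereas with `jump_weight=0.0` the decoder follows the locally cheapest candidate and jumps an octave. This illustrates why a back-tracing pass of this kind is attractive to model inside a learned pitch tracker.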

Chairs:
Torbjørn Svendsen
