ADAPTIVE TIME-SCALE MODIFICATION FOR IMPROVING SPEECH INTELLIGIBILITY BASED ON PHONEME CLUSTERING FOR STREAMING SERVICES

Sohee Jang (Hanyang University); Jiye Kim (Hanyang University); Yeon-Ju Kim (Hanyang University); Joon-Hyuk Chang (Hanyang University)

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
07 Jun 2023

Time-scale modification (TSM) is important in streaming services, including over-the-top (OTT) platforms, audiobooks, and online lectures. Although TSM changes the speed of audio while preserving other attributes such as the pitch and timbre of the speaker, it distorts the signal unnaturally and makes spoken content difficult to understand. This study proposes an adaptive time-scale modification (ATSM) algorithm that varies the speaking rate for each phoneme cluster of speech to improve speech intelligibility. The proposed algorithm performs forced alignment using the Montreal Forced Aligner and time-scale reconstruction using an adaptive speaking rate based on dynamic time warping. To validate the proposed algorithm, the diagnostic rhyme test (DRT) score, comparison mean opinion score (CMOS), and fast dynamic time warping (FastDTW) score of ATSM are compared with those of conventional TSM methods. The results show that speech compressed with the proposed algorithm is more intelligible than speech compressed with the other algorithms.
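The idea of assigning a different speaking rate to each phoneme cluster can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the clusters, the per-cluster rates, or how they are chosen, so the cluster names, the `CLUSTER_OF` mapping, the `RELATIVE_SPEED` weights, and both function names below are hypothetical. The sketch only assumes that a forced aligner has already produced phoneme segments with start and end times, and it scales the per-cluster weights so that the overall duration still matches the requested global rate.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """One aligned phoneme: label plus start/end time in seconds."""
    phoneme: str
    start: float
    end: float


# Hypothetical phoneme-to-cluster mapping (not taken from the paper).
CLUSTER_OF = {
    "IY": "vowel", "AA": "vowel",
    "S": "fricative",
    "T": "stop",
    "sil": "silence",
}

# Hypothetical relative speeds: values above 1 mean "compress this cluster
# more than average", values below 1 mean "compress it less".
RELATIVE_SPEED = {"vowel": 1.2, "fricative": 0.9, "stop": 0.8, "silence": 1.5}


def adaptive_rates(segments: List[Segment], target_rate: float) -> List[float]:
    """Return one speed factor per segment.

    The factors are proportional to the cluster weights and are scaled so
    that the summed output duration equals total_duration / target_rate.
    """
    durations = [s.end - s.start for s in segments]
    weights = [RELATIVE_SPEED[CLUSTER_OF[s.phoneme]] for s in segments]
    total = sum(durations)
    # Solve sum(d_i / (scale * w_i)) == total / target_rate for scale.
    scale = target_rate * sum(d / w for d, w in zip(durations, weights)) / total
    return [scale * w for w in weights]


def warped_durations(segments: List[Segment], target_rate: float) -> List[float]:
    """Return the new duration of each segment after adaptive scaling."""
    rates = adaptive_rates(segments, target_rate)
    return [(s.end - s.start) / r for s, r in zip(segments, rates)]
```

Under these assumptions, a 0.7 s utterance played at a global rate of 1.5 still lasts 0.7 / 1.5 s in total, but stops are shortened less than vowels and silences. The per-segment factors would then be handed to whatever TSM routine performs the actual waveform stretching.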
