F0 ESTIMATION FROM TELEPHONE SPEECH USING DEEP FEATURE LOSS

Supritha M Shetty (Indian Institute of Information Technology, Dharwad); Shraddha Revankar (K L E Technological University); Nalini Iyer (KLETech, Hubballi); Deepak T (IIIT-Dharwad)

SPS

07 Jun 2023

Accurate pitch estimation from speech signals plays a vital role in several applications. Robust pitch estimation from telephone speech remains a challenge because of the narrow bandwidth of the signal. The electroglottograph (EGG) signal is a reliable source for pitch estimation; however, it is not practical to record such a signal in many applications. In this work, a method is proposed that synthesizes an EGG signal from telephone speech using a deep feature loss network; the pitch contour is then derived from the synthesized EGG (SEGG) signal. The proposed method is evaluated on the CMU-Arctic speech database, which contains simultaneously recorded EGG signals. The telephone speech is derived using the ITU-T (International Telecommunication Union) standard, as specified in the Blizzard Challenge. The robustness of the proposed method is demonstrated under different noisy conditions, and its performance is encouraging when compared with other state-of-the-art methods.
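The abstract does not state how the pitch contour is read off the SEGG signal. As an illustration only, the sketch below assumes a common approach: differentiate the EGG, then run a frame-wise normalized cross-correlation (NCCF) pitch tracker on it. The function names `f0_contour_from_egg` and `_nccf`, the 40 ms frame and 10 ms hop, the 60 to 400 Hz search range, and the 0.6 voicing threshold are hypothetical choices, not the authors' settings.

```python
import numpy as np


def _nccf(frame, lag_min, lag_max):
    """Normalized cross-correlation of a frame with itself for lags lag_min..lag_max."""
    n = len(frame)
    r = np.zeros(lag_max - lag_min + 1)
    for i, k in enumerate(range(lag_min, lag_max + 1)):
        a, b = frame[: n - k], frame[k:]
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        r[i] = np.dot(a, b) / denom if denom > 0 else 0.0
    return r


def f0_contour_from_egg(egg, fs, frame_s=0.04, hop_s=0.01,
                        fmin=60.0, fmax=400.0, voicing_threshold=0.6):
    """Hypothetical F0 tracker for an EGG or synthesized EGG (SEGG) signal.

    Returns one F0 value in Hz per hop; 0.0 marks frames judged unvoiced.
    """
    egg = np.asarray(egg, dtype=float)
    # Differentiating emphasizes sharp glottal-closure events and removes slow drift.
    degg = np.diff(egg, prepend=egg[0])
    n = int(round(frame_s * fs))
    hop = int(round(hop_s * fs))
    # Candidate pitch periods (in samples) for the fmax..fmin search range.
    lag_min = max(1, int(np.floor(fs / fmax)))
    lag_max = min(n - 1, int(np.ceil(fs / fmin)))
    contour = []
    for start in range(0, len(degg) - n + 1, hop):
        frame = degg[start:start + n]
        frame = frame - frame.mean()
        if np.dot(frame, frame) < 1e-12:  # silent frame: no periodicity to measure
            contour.append(0.0)
            continue
        r = _nccf(frame, lag_min, lag_max)
        peak = float(r.max())
        if peak < voicing_threshold:  # weak periodicity: treat as unvoiced
            contour.append(0.0)
            continue
        # Take the smallest lag close to the global maximum, then climb to its local
        # peak; this avoids picking a multiple of the true period (F0 halving).
        idx = int(np.flatnonzero(r >= 0.9 * peak)[0])
        while idx + 1 < len(r) and r[idx + 1] > r[idx]:
            idx += 1
        contour.append(fs / (lag_min + idx))
    return np.array(contour)
```

For example, `f0_contour_from_egg(segg, 16000)` returns one value every 10 ms for a 16 kHz signal. Because the lag is an integer number of samples, the estimate is quantized (a 120 Hz input at 16 kHz comes out near 120.3 Hz); parabolic interpolation around the NCCF peak is a common refinement.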

Tags:

Audio and speech source separation