F0 ESTIMATION FROM TELEPHONE SPEECH USING DEEP FEATURE LOSS
Supritha M Shetty (Indian Institute of Information Technology, Dharwad); Shraddha Revankar (KLE Technological University); Nalini Iyer (KLETech, Hubballi); Deepak T (IIIT-Dharwad)
SPS
Accurate pitch estimation from speech signals plays a vital role in several applications. Robust pitch estimation from telephone speech remains a challenge because the narrow bandwidth of the signal attenuates the low-frequency region where the fundamental typically lies. The electroglottograph (EGG) signal is a reliable source for pitch estimation; however, it is not practical to record such a signal in many applications. In this work, a method is proposed to synthesize an EGG signal from telephone speech using a deep feature loss network, and the pitch contour is subsequently derived from the synthesized EGG (SEGG) signal. To evaluate the proposed method, the CMU-Arctic speech database is used because it contains simultaneously recorded EGG signals. The telephone speech is derived using the ITU-T (International Telecommunication Union) standard, as specified in the Blizzard Challenge. The robustness of the proposed method is demonstrated under different noisy conditions, and its performance is encouraging compared with other state-of-the-art methods.
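The abstract does not say how the pitch contour is read off the SEGG signal. As an illustration only, the sketch below assumes a generic frame-wise autocorrelation on the differentiated EGG (DEGG), which is one common way to get F0 from a clean, quasi-periodic EGG-like signal. It is not the authors' method. The function name `f0_contour_from_egg`, the frame and hop sizes, the 60 to 400 Hz search range, and the voicing threshold are all hypothetical choices.

```python
import numpy as np


def f0_contour_from_egg(egg, fs, frame_len=0.04, hop=0.01,
                        f0_min=60.0, f0_max=400.0, voicing_threshold=0.3):
    """Hypothetical sketch: frame-wise F0 contour (Hz) from an EGG or synthesized EGG signal.

    Returns one value per frame; 0.0 marks frames judged unvoiced.
    """
    egg = np.asarray(egg, dtype=float)
    n_frame = int(round(frame_len * fs))
    n_hop = int(round(hop * fs))
    lag_min = int(np.floor(fs / f0_max))                    # shortest period searched
    lag_max = min(int(np.ceil(fs / f0_min)), n_frame - 1)   # longest period that fits in a frame

    # Differentiated EGG (DEGG): emphasizes glottal closure events, suppresses baseline drift.
    degg = np.diff(egg, prepend=egg[0])

    f0 = []
    for start in range(0, len(degg) - n_frame + 1, n_hop):
        frame = degg[start:start + n_frame]
        frame = frame - frame.mean()
        energy = float(np.dot(frame, frame))
        if energy < 1e-12:                                  # silent frame
            f0.append(0.0)
            continue
        # Biased autocorrelation, normalized so that ac[0] == 1.
        ac = np.correlate(frame, frame, mode="full")[n_frame - 1:] / energy
        lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
        # Weak periodicity at the best lag: treat the frame as unvoiced.
        f0.append(fs / lag if ac[lag] >= voicing_threshold else 0.0)
    return np.array(f0)
```

For example, a 0.5 s, 125 Hz sinusoid sampled at 16 kHz yields 47 frames whose estimates lie within a few hertz of 125 Hz. The biased autocorrelation slightly favors shorter lags, so it resists picking a multiple of the period (halving F0). A real system would also need a more careful voicing decision.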