REPRESENTATION OF VOCAL TRACT LENGTH TRANSFORMATION BASED ON GROUP THEORY
Atsushi Miyashita (Nagoya University); Tomoki Toda (Nagoya University)
SPS
The acoustic characteristics of phonemes vary with the vocal tract length (VTL) of individual speakers. Disentangling this speaker information from linguistic information is important for tasks such as automatic speech recognition (ASR) and speaker recognition. In this paper, we focus on the property that vocal tract length transformations (VTLT) form a group, and we use group-theoretic analysis to derive a novel speech representation, the VTL spectrum. VTLT changes only the phase of the VTL spectrum, and this change is a simple linear shift. Moreover, we propose a method to analytically disentangle the VTL effects on the VTL spectrum. We conducted experiments on the TIMIT dataset to clarify the properties of this feature, demonstrating that 1) for ASR, normalization of the VTL spectrum reduced the phoneme recognition error rate by 1.9 under random VTLT, and 2) for speaker recognition, removal of the amplitude component of the VTL spectrum improved speaker classification performance.
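The amplitude-invariant, linear-phase behaviour described above has a familiar analogue in the Fourier shift theorem, which may help build intuition. The sketch below is not the paper's derivation of the VTL spectrum: it assumes, purely for illustration, that the transformation acts as a circular translation on a suitably warped frequency axis. The function name `shift_demo` and all of its parameters are hypothetical.

```python
import numpy as np


def shift_demo(n=64, shift=5, seed=0):
    """Check that a circular translation changes only the DFT phase.

    Assumption (not from the paper): the group action is modeled as a
    circular shift of a sampled envelope, the simplest translation group.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)   # hypothetical envelope on a warped axis
    y = np.roll(x, shift)        # group action: y[m] = x[m - shift]

    X = np.fft.fft(x)
    Y = np.fft.fft(y)

    # Shift theorem: Y[k] = X[k] * exp(-2*pi*i*k*shift/n),
    # so the phase change is linear in the bin index k.
    k = np.arange(n)
    predicted = X * np.exp(-2j * np.pi * k * shift / n)

    amp_err = np.max(np.abs(np.abs(Y) - np.abs(X)))   # amplitude difference
    phase_err = np.max(np.abs(Y - predicted))         # deviation from linear phase
    return amp_err, phase_err


if __name__ == "__main__":
    amp_err, phase_err = shift_demo()
    print("max amplitude difference:", amp_err)
    print("max deviation from linear-phase prediction:", phase_err)
```

Under this assumption, both reported errors should sit at floating-point round-off level: the amplitude carries the shift-invariant content and the phase slope encodes the size of the shift. The abstract's ASR and speaker-recognition experiments each manipulate one of these two parts of the VTL spectrum.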