OPENFEAT: IMPROVING SPEAKER IDENTIFICATION BY OPEN-SET FEW-SHOT EMBEDDING ADAPTATION WITH TRANSFORMER
Kishan K C, Zhenning Tan, Long Chen, Minho Jin, Eunjung Han, Andreas Stolcke, Chul Lee
SPS
Length: 00:12:58
Household speaker identification with few enrollment utterances is an important yet challenging problem, especially when household members share similar voice characteristics. A universal embedding space learned from a large number of speakers is not necessarily optimal for identification tasks within a household. In this work, we formulate speaker identification within a household as a few-shot open-set recognition task and propose embedding adaptation to adapt speaker representations from a universal embedding space to a household-specific embedding space through a set-to-set function for better separation. With our new algorithm, open-set Few-shot Embedding Adaptation with Transformer (openFEAT), the speaker identification equal error rate (IEER) on simulated households with 2 to 7 hard-to-discriminate speakers is reduced by 23% to 31% relative.
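The abstract describes the adaptation only at a high level. What follows is a minimal, hypothetical Python sketch of the two ingredients it names. The first is a set-to-set function, implemented as one self-attention layer over a household's enrollment prototypes. This is in the spirit of a Transformer but uses a single head, no feed-forward block, and placeholder weights. The second is an open-set decision that rejects a test utterance as an unknown speaker when its best cosine score falls below a threshold. The following are all assumptions for illustration and not the authors' openFEAT implementation: the function names (`adapt_prototypes`, `identify`), the residual-plus-normalization step, leaving test embeddings unadapted, the toy vectors, and the threshold value.

```python
import math
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(w: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in w]


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def identity(d: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]


def adapt_prototypes(prototypes: List[Vector],
                     w_q: Optional[Matrix] = None,
                     w_k: Optional[Matrix] = None,
                     w_v: Optional[Matrix] = None) -> List[Vector]:
    """Hypothetical set-to-set function: single-head self-attention.

    Every output prototype depends on the whole household set. In a trained
    system w_q, w_k, w_v would be learned; the identity defaults here are
    placeholders only.
    """
    d = len(prototypes[0])
    w_q, w_k, w_v = (w or identity(d) for w in (w_q, w_k, w_v))
    queries = [matvec(w_q, p) for p in prototypes]
    keys = [matvec(w_k, p) for p in prototypes]
    values = [matvec(w_v, p) for p in prototypes]
    adapted = []
    for p, q in zip(prototypes, queries):
        # Scaled dot-product attention weights over all household members.
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        context = [sum(a * v[i] for a, v in zip(weights, values))
                   for i in range(d)]
        # Assumed residual connection, then projection back to the unit sphere.
        adapted.append(l2_normalize([pi + ci for pi, ci in zip(p, context)]))
    return adapted


def identify(test: Vector, adapted: List[Vector], names: List[str],
             threshold: float) -> Optional[str]:
    """Open-set decision: best cosine match, or None (unknown) below threshold."""
    t = l2_normalize(test)
    scores = [dot(t, p) for p in adapted]  # cosine similarity of unit vectors
    best = max(range(len(scores)), key=lambda i: scores[i])
    return names[best] if scores[best] >= threshold else None


if __name__ == "__main__":
    household = [[1.0, 0.0], [0.0, 1.0]]  # toy 2-D enrollment prototypes
    adapted = adapt_prototypes(household)
    print(identify([1.0, 0.1], adapted, ["alice", "bob"], threshold=0.5))  # -> alice
    print(identify([-1.0, -1.0], adapted, ["alice", "bob"], threshold=0.5))  # -> None
```

With the identity placeholders, the attention step actually pulls the two toy prototypes toward each other. The better separation reported in the abstract would have to come from projection weights learned during training, which this sketch does not model.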