Auxiliary Pooling Layer For Spoken Language Understanding
Yukun Ma (Alibaba Group); Trung Hieu Nguyen (Alibaba Group); Jinjie Ni (Nanyang Technological University); Wen Wang (Alibaba Group); Qian Chen (Speech Lab, DAMO Academy, Alibaba Group); Chong Zhang (Alibaba Group); Bin Ma (Alibaba, Singapore R&D Center)
SPS
End-to-end spoken language understanding requires speech data annotated with semantic information, and such annotated data is often scarce. Recent work leverages unlabelled speech data to pre-train a speech encoder. However, it remains a challenge for the pre-trained speech encoder to encode semantic information. Existing work explores transferring knowledge from a pre-trained text model with different alignment losses at a fixed granularity. In this paper, we address the variable granularity in transferring knowledge from text to speech representations via APLY, an auxiliary pooling layer that fuses global information with adaptively encoded local context. We demonstrate the effectiveness of APLY on three benchmarks of spoken language understanding.