Transformer Based Unsupervised Pre-Training For Acoustic Representation Learning

Ruixiong Zhang, Haiwei Wu, Wubo Li, Dongwei Jiang, Wei Zou, Xiangang Li

Length: 00:07:22

10 Jun 2021

Recently, a variety of acoustic tasks and related applications have arisen. For many of these tasks, labeled data is limited. To address this problem, we propose an unsupervised pre-training method that uses a Transformer-based encoder to learn a general and robust high-level representation for all acoustic tasks. Experiments were conducted on three kinds of acoustic tasks: speech emotion recognition, sound event detection, and speech translation. All experiments show that pre-training on a task's own training data significantly improves performance. With a larger pre-training set combining the MuST-C, Librispeech, and ESC-US datasets, the UAR for speech emotion recognition improves by a further 4.3% absolute on the IEMOCAP dataset. For sound event detection, the F1 score improves by a further 1.5% absolute on the DCASE2018 Task 5 development set and 2.1% absolute on the evaluation set. For speech translation, the BLEU score improves by a further 12.2% relative on the En-De dataset and 8.4% relative on the En-Fr dataset.
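The abstract reports gains in two different ways: absolute (UAR, F1) and relative (BLEU). The sketch below illustrates the difference and shows how UAR, read here as unweighted average recall (the mean of per-class recalls), can differ from plain accuracy on imbalanced classes. It is a minimal illustration only: the function names, toy labels, and baseline numbers are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally regardless of size."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def absolute_gain(baseline, improved):
    """Difference in score, e.g. 0.600 -> 0.643 is +0.043 (4.3 points absolute)."""
    return improved - baseline


def relative_gain(baseline, improved):
    """Difference as a fraction of the baseline, e.g. 20.0 -> 22.44 is +12.2% relative."""
    return (improved - baseline) / baseline


# Hypothetical toy labels with an imbalanced class distribution.
y_true = ["angry", "angry", "happy", "happy", "happy", "happy", "sad", "sad"]
y_pred = ["angry", "happy", "happy", "happy", "happy", "happy", "sad", "happy"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"UAR      = {uar:.3f}")
print(f"accuracy = {accuracy:.3f}")

# Hypothetical baselines, chosen only to show the two ways of reporting a gain.
print(f"absolute gain = {absolute_gain(0.600, 0.643):.3f}")
print(f"relative gain = {relative_gain(20.0, 22.44):.3f}")
```

In this toy case the majority class is always recognized, so accuracy is higher than UAR; this is why UAR is a common choice for emotion recognition data with uneven class sizes.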

Chairs:

Torbjørn Svendsen

Tags:

signal processing society

IEEE icassp 2021

virtual conference

2021

sps

virtual conference icassp 2021

june 6-11 2021

icassp 2021
