09 Jun 2021

For many challenging tasks there is often limited data to train systems in an end-to-end fashion, an approach that has become increasingly popular in deep learning. However, these tasks can usually be split into multiple separate modules, with significant quantities of data associated with each module. Spoken language processing applications fit this scenario: they typically start with a speech recognition module, followed by multiple task-specific modules that achieve the end goal. This work examines how to make the best use of limited end-to-end training data for sequence-to-sequence tasks. The key to improving the use of the data is to integrate the modules more tightly via embeddings, rather than simply propagating words between modules. In this work, speech translation is considered as the spoken language application. When significant quantities of in-domain, end-to-end data are available, cascade approaches work well. When the in-domain data are limited, however, tighter integration between modules enables better use to be made of the data. One of the challenges with tighter integration is how to ensure embedding consistency between the modules. A novel form of embedding passing between modules is proposed that improves performance over both cascade and standard embedding-passing approaches when in-domain data are limited.
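
The contrast between a word-level cascade and embedding passing described above can be illustrated with a minimal sketch. The code below is not the paper's actual architecture; the module names, dimensions, and toy GRU components are assumptions made purely for illustration. The cascade path feeds 1-best decoded words from the speech recognition module into the translation module, while the tighter integration passes the recognition module's continuous hidden states (embeddings) instead.

```python
# Illustrative sketch only: a word-level cascade versus embedding passing
# between an ASR module and a translation (MT) module. All names, sizes,
# and model components here are hypothetical, not taken from the paper.
import torch
import torch.nn as nn

class ASRModule(nn.Module):
    """Toy speech recognizer: returns decoded word ids and its hidden states."""
    def __init__(self, feat_dim=80, hidden=256, vocab=1000):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, vocab)

    def forward(self, speech_feats):
        states, _ = self.encoder(speech_feats)    # (batch, frames, hidden)
        word_ids = self.proj(states).argmax(-1)   # (batch, frames) 1-best words
        return word_ids, states                   # discrete words + embeddings

class MTModule(nn.Module):
    """Toy translation module that accepts word ids (cascade)
    or continuous embeddings (tighter integration)."""
    def __init__(self, hidden=256, vocab=1000, tgt_vocab=1200):
        super().__init__()
        self.src_embed = nn.Embedding(vocab, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, from_embeddings=False):
        # Cascade: look up embeddings for hard word decisions.
        # Integration: consume the ASR hidden states directly.
        x = src if from_embeddings else self.src_embed(src)
        states, _ = self.encoder(x)
        return self.out(states)

asr, mt = ASRModule(), MTModule()
speech = torch.randn(2, 50, 80)                  # (batch, frames, features)
words, embeddings = asr(speech)

cascade_logits = mt(words)                                  # propagate words
integrated_logits = mt(embeddings, from_embeddings=True)    # pass embeddings
```

The practical difference is that in the second call the translation module sees a continuous representation rather than a hard word decision, which is the sense in which the modules are more tightly integrated; the abstract's point about embedding consistency concerns making such passed embeddings compatible with what the downstream module expects.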

Chairs:
Bhuvana Ramabhadran
