CPT: CROSS-MODAL PREFIX-TUNING FOR SPEECH-TO-TEXT TRANSLATION
Yukun Ma, Trung Hieu Nguyen, Bin Ma
Speech translation models benefit from adapting multilingual pretrained language models. However, such adaptation modifies the parameters of the pretrained model to favor a specific task. Prefix-tuning, a lightweight adaptation technique, has recently emerged as an efficient alternative that significantly reduces the number of trainable parameters and has shown great potential in low-resource settings. It inserts prefixes into the output of each layer of a pretrained model without modifying the model's parameters. During training, only the prefix parameters are updated while the rest of the model is kept frozen. In this paper, we improve speech translation performance in medium- and low-resource settings with a cross-modal prefix that bridges the gap between the speech input and translation modules, reducing the information loss inherent in the cascaded model. We show that the proposed cross-modal prefix-tuning is effective, robust, and parameter-efficient for adapting a speech recognition and translation pipeline.
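To make the prefix-tuning mechanism described above concrete, the following is a minimal PyTorch sketch of prefix-style adaptation: trainable prefix vectors are prepended to the sequence entering a frozen transformer layer, so only the prefix parameters receive gradients. The class name, prefix length, and insertion point are illustrative assumptions, not the paper's exact implementation, and the cross-modal bridging between the speech and translation modules is not shown here.

```python
import torch
import torch.nn as nn

class PrefixTunedLayer(nn.Module):
    """Minimal sketch: wrap a frozen transformer layer and prepend trainable
    prefix vectors to its input sequence (illustrative, not the paper's code)."""

    def __init__(self, frozen_layer: nn.Module, prefix_len: int, d_model: int):
        super().__init__()
        self.layer = frozen_layer
        for p in self.layer.parameters():
            p.requires_grad = False          # the pretrained layer stays frozen
        # only these prefix embeddings are updated during training
        self.prefix = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); prepend the prefix along the sequence axis
        batch = x.size(0)
        prefix = self.prefix.unsqueeze(0).expand(batch, -1, -1)
        return self.layer(torch.cat([prefix, x], dim=1))


if __name__ == "__main__":
    base = nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
    tuned = PrefixTunedLayer(base, prefix_len=10, d_model=256)
    out = tuned(torch.randn(2, 50, 256))
    print(out.shape)  # (2, 60, 256): 10 prefix positions + 50 input positions
    trainable = sum(p.numel() for p in tuned.parameters() if p.requires_grad)
    print(trainable)  # 2560: only the prefix (10 x 256) is trainable
```

Because the base model's weights never change, a single pretrained model can serve multiple tasks or language pairs by swapping in different learned prefixes, which is what makes the approach attractive in medium- and low-resource settings.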