Role of Lexical Boundary Information in Chunk-Level Segmentation for Speech Emotion Recognition

Wei-Cheng Lin (The University of Texas at Dallas); Carlos Busso (University of Texas at Dallas)

07 Jun 2023

Chunk-level speech emotion recognition (SER) is a common modeling scheme that achieves better recognition performance than sentence-level formulations. A key open question is the role of lexical boundary information in splitting a sentence into small chunks. Is there any benefit in providing precise lexical boundary information (e.g., word-level alignments) to segment the speech into chunks? This study analyzes the role of lexical boundary information by exploring alternative segmentation strategies for chunk-level SER. We compare six chunk-level segmentation strategies that rely on either word-level alignments or traditional time-based segmentation, varying the number and duration of the chunks. We conduct extensive experiments to evaluate these segmentation approaches using multiple corpora and multiple acoustic feature sets. The results show that word-level timing boundaries contribute little: centering the chunks around words does not lead to significant performance gains. Instead, the critical factor in effectively segmenting a sentence into chunks is to set the number of chunks according to the number of spoken words in the sentence.
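To make the contrast in the abstract concrete, the following is a minimal sketch of three segmentation ideas: time-based chunks with a fixed count, chunks centered on word alignments, and time-based chunks whose count equals the number of spoken words. It is not the authors' implementation and does not reproduce the six strategies evaluated in the paper; the function names, the (start, end) alignment format in seconds, and the clamping behavior at sentence edges are all assumptions made for illustration.

```python
from typing import List, Tuple

# A chunk or word alignment is a (start_sec, end_sec) pair. This format is an
# assumption for illustration, not something specified by the paper.
Span = Tuple[float, float]


def uniform_chunks(duration: float, num_chunks: int, chunk_len: float) -> List[Span]:
    """Time-based segmentation: spread num_chunks windows evenly over the sentence.

    Windows overlap when num_chunks * chunk_len exceeds the sentence duration.
    """
    if duration <= 0 or chunk_len <= 0 or num_chunks < 1:
        raise ValueError("duration, chunk_len and num_chunks must be positive")
    chunk_len = min(chunk_len, duration)  # a window cannot exceed the sentence
    if num_chunks == 1:
        return [(0.0, chunk_len)]
    step = (duration - chunk_len) / (num_chunks - 1)
    return [(i * step, i * step + chunk_len) for i in range(num_chunks)]


def word_centered_chunks(words: List[Span], duration: float, chunk_len: float) -> List[Span]:
    """Lexical segmentation: one fixed-length window centered on each word.

    Windows that would cross a sentence edge are shifted back inside it.
    """
    if duration <= 0 or chunk_len <= 0:
        raise ValueError("duration and chunk_len must be positive")
    chunk_len = min(chunk_len, duration)
    chunks = []
    for start, end in words:
        center = (start + end) / 2.0
        begin = min(max(center - chunk_len / 2.0, 0.0), duration - chunk_len)
        chunks.append((begin, begin + chunk_len))
    return chunks


def word_count_chunks(num_words: int, duration: float, chunk_len: float) -> List[Span]:
    """Time-based segmentation that uses only the word count, not the word timings."""
    return uniform_chunks(duration, num_words, chunk_len)


if __name__ == "__main__":
    # Hypothetical 4-second sentence with three aligned words.
    alignments = [(0.0, 0.4), (1.0, 2.0), (3.8, 4.0)]
    print(word_centered_chunks(alignments, duration=4.0, chunk_len=1.0))
    print(word_count_chunks(len(alignments), duration=4.0, chunk_len=1.0))
```

In this sketch, `word_centered_chunks` and `word_count_chunks` return the same number of chunks for a given sentence and differ only in where the windows are placed. That is the distinction the abstract draws: the reported gains come from matching the chunk count to the word count, not from aligning chunk positions to word boundaries.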

Tags:

Segmentation, tagging, and parsing

IEEE ICASSP 2023, 4-10 June 2023, Greece (virtual and in-person conference)
