Whether Contribution of Features Differ Between Video-mediated and In-person Meetings in Important Utterance Estimation
Fumio Nihei (NTT); Ryo Ishii (NTT); Yukiko Nakano (Seikei University); Atsushi Fukayama (NTT); Takao Nakamura (NTT)
SPS
This study investigated whether the contributions of various features to important utterance estimation differ between in-person (IP) and video-mediated (VM) meetings. We used an IP meeting corpus and a VM meeting corpus as the analysis data. A transformer model with dialogue history was used to estimate important utterances, and five types of input (text, speaker's audio, others' audio, speaker's video, and others' video) were fed to the model. A comparison of the IP and VM models revealed that the speaker's audio has a strong effect on the IP model, the video of the other participants strongly affects the VM model, and the text and others' audio strongly affect both models in estimating important utterances.
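The abstract does not say how each input's contribution was measured. One common way to compare contributions is input ablation: mask one input at a time and record how much a metric drops. The sketch below illustrates that idea only and should not be read as the authors' procedure. The classifier `toy_predict`, the scalar features, and the sample data are all hypothetical stand-ins for a trained model and real multimodal embeddings.

```python
from typing import Callable, Dict, List

# The five input types named in the abstract.
MODALITIES = ["text", "speaker_audio", "others_audio",
              "speaker_video", "others_video"]

# Assumption: one scalar per modality stands in for a real embedding.
Example = Dict[str, float]


def accuracy(predict: Callable[[Example], int],
             examples: List[Example], labels: List[int]) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    correct = sum(int(predict(x) == y) for x, y in zip(examples, labels))
    return correct / len(labels)


def ablate(example: Example, modality: str) -> Example:
    """Return a copy of the example with one modality zeroed out."""
    masked = dict(example)
    masked[modality] = 0.0
    return masked


def modality_contributions(predict: Callable[[Example], int],
                           examples: List[Example],
                           labels: List[int]) -> Dict[str, float]:
    """Accuracy drop caused by masking each modality in turn."""
    base = accuracy(predict, examples, labels)
    return {
        m: base - accuracy(predict, [ablate(x, m) for x in examples], labels)
        for m in MODALITIES
    }


def make_example(**features: float) -> Example:
    """Build an example, filling unspecified modalities with 0.0."""
    return {m: features.get(m, 0.0) for m in MODALITIES}


def toy_predict(x: Example) -> int:
    """Hypothetical classifier that uses only text and others' video."""
    return int(x["text"] + x["others_video"] > 1.0)


# Hypothetical utterances: 1 = important, 0 = not important.
examples = [
    make_example(text=0.8, others_video=0.6),
    make_example(text=0.2, others_video=0.3),
    make_example(text=1.5),
    make_example(others_video=1.2),
]
labels = [1, 0, 1, 1]

contributions = modality_contributions(toy_predict, examples, labels)
for name, drop in contributions.items():
    print(f"{name}: accuracy drop {drop:.2f}")
```

Running the same comparison once for a model trained on IP data and once for a model trained on VM data would give two contribution profiles that can be compared per input type, which is the kind of comparison the abstract reports.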