Channel-Position Self-Attention With Query Refinement Skeleton Graph Neural Network in Human Pose Estimation
Shek Wai Chu, Chaoyi Zhang, Yang Song, Weidong Cai
In this paper, we propose a new audio-visual attention dataset that records eye movements for omnidirectional videos with and without sound. We classify the videos into three types according to the number of salient objects and sound sources, and analyze the impact of sound on the distribution of visual attention and on the inter-observer consistency of viewing areas across the different video types. From quantitative and qualitative analysis, we find that visual attention is drawn to and concentrated on the sound source when sound is present, especially when there are several visually salient objects but only one sound source. Sound also enhances the consistency of observation areas among viewers to some extent. Further study is needed to investigate the impact of sound on visual attention more deeply and to develop a prospective audio-visual saliency model.