Skip to main content
  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 07:44
07 Jul 2020

Video emotion recognition is a challenging task due to complex scenes and various forms of emotion expression. Most existing works focus on fusing multiple features over the whole video clips. According to our observations, given a long video clip, the emotion is usually presented by only several actions/objects in a few short snippets, and the meaningful cues are buried in the noisy background. When human judging the emotion in videos, we first find the informative clips and then closely look for emotional cues in the frames. In this paper, we propose Dual Focus Attention Network to mimic this process. First, three kinds of features including action, object, and scene are extracted from videos. Second, Two attention modules are used to focus on the visual features of the videos from temporal and spatial dimensions respectively. With our dual focus attention network, we can effectively discover the most emotional frames along the time dimension and the most emotional visual cues in each frame. Our experiments conducted on two widely used datasets Ekman and VideoEmotion show that our proposed approach outperforms the existing approaches.

Value-Added Bundle(s) Including this Product

More Like This

  • SPS
    Members: $150.00
    IEEE Members: $250.00
    Non-members: $350.00
  • SPS
    Members: $150.00
    IEEE Members: $250.00
    Non-members: $350.00
  • SPS
    Members: $150.00
    IEEE Members: $250.00
    Non-members: $350.00