  • SPS Members: Free
  • IEEE Members: $11.00
  • Non-members: $15.00
Length: 00:14:59
21 Sep 2021

Video memorability is a cornerstone of social media platform analysis: a highly memorable video is more likely to be noticed and shared. This paper proposes a new framework that fuses multi-modal information to predict the likelihood that a video will be remembered. The framework relies on late fusion of text, visual, and motion features. Specifically, two neural networks extract features from the captions describing the video's content; two ResNet models extract visual features from selected frames; and two 3D ResNet models, combined with Fisher Vectors, extract features from the video's motion information. The extracted features are used to compute several memorability scores via Bayesian Ridge regression, which are then fused based on a greedy search for the optimal fusion parameters. Experiments on the MediaEval 2019 dataset demonstrate the superiority of the proposed framework, which outperforms the state of the art.
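The late-fusion stage described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-modality features are synthetic stand-ins, and the greedy search (step sizes, number of iterations, use of Spearman correlation as the selection criterion) is an assumed concretization of the "greedy search of the optimal fusion parameters" mentioned in the abstract.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Synthetic stand-ins for the per-modality features (text, visual, motion).
n_train, n_val = 200, 50
feats = {name: rng.normal(size=(n_train + n_val, dim))
         for name, dim in [("text", 32), ("visual", 64), ("motion", 48)]}
y = rng.uniform(0.4, 1.0, size=n_train + n_val)  # memorability scores

# One Bayesian Ridge regressor per modality; predict on a held-out split.
preds = {}
for name, X in feats.items():
    model = BayesianRidge()
    model.fit(X[:n_train], y[:n_train])
    preds[name] = model.predict(X[n_train:])

# Greedy search over fusion weights: at each step, add the weighted
# per-modality prediction that most improves Spearman correlation on
# the validation split, stopping when no addition helps.
y_val = y[n_train:]
weights = {name: 0.0 for name in preds}
fused = np.zeros_like(y_val)
best_rho = -1.0
for _ in range(20):  # fixed budget of greedy steps (assumed)
    step_best = None
    for name in preds:
        for w in (0.1, 0.2, 0.5):  # candidate weight increments (assumed)
            rho = spearmanr(fused + w * preds[name], y_val).correlation
            if rho > best_rho:
                best_rho, step_best = rho, (name, w)
    if step_best is None:
        break
    name, w = step_best
    weights[name] += w
    fused += w * preds[name]
```

Spearman correlation is a natural criterion here because the MediaEval memorability task is evaluated by rank correlation, so the fusion weights only need to produce a good ordering of videos, not calibrated scores.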

