  • SPS Members: Free
  • IEEE Members: $11.00
  • Non-members: $15.00
Length: 00:14:59
21 Sep 2021

Video memorability is a cornerstone of social media platform analysis: a highly memorable video is more likely to be noticed and shared. This paper proposes a new framework that fuses multi-modal information to predict the likelihood that a video will be remembered. The framework relies on late fusion of text, visual, and motion features. Specifically, two neural networks extract features from the captions describing the video's content; two ResNet models extract visual features from selected frames; and two 3D ResNet models, combined with Fisher Vectors, extract features from the video's motion information. The extracted features are used to compute several memorability scores via Bayesian Ridge regression, which are then fused based on a greedy search for the optimal fusion parameters. Experiments on the MediaEval 2019 dataset demonstrate the superiority of the proposed framework, which outperforms the state of the art.
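The late-fusion stage described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-modality features are synthetic stand-ins, and the greedy search (step sizes, number of iterations, use of Spearman correlation as the selection criterion) is an assumed concretization of the "greedy search of the optimal fusion parameters" mentioned in the abstract.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Synthetic stand-ins for the per-modality features (text, visual, motion).
n_train, n_val = 200, 50
feats = {name: rng.normal(size=(n_train + n_val, dim))
         for name, dim in [("text", 32), ("visual", 64), ("motion", 48)]}
y = rng.uniform(0.4, 1.0, size=n_train + n_val)  # memorability scores

# One Bayesian Ridge regressor per modality; predict on a held-out split.
preds = {}
for name, X in feats.items():
    model = BayesianRidge()
    model.fit(X[:n_train], y[:n_train])
    preds[name] = model.predict(X[n_train:])

# Greedy search over fusion weights: at each step, add the weighted
# per-modality prediction that most improves Spearman correlation on
# the validation split, stopping when no addition helps.
y_val = y[n_train:]
weights = {name: 0.0 for name in preds}
fused = np.zeros_like(y_val)
best_rho = -1.0
for _ in range(20):  # fixed budget of greedy steps (assumed)
    step_best = None
    for name in preds:
        for w in (0.1, 0.2, 0.5):  # candidate weight increments (assumed)
            rho = spearmanr(fused + w * preds[name], y_val).correlation
            if rho > best_rho:
                best_rho, step_best = rho, (name, w)
    if step_best is None:
        break
    name, w = step_best
    weights[name] += w
    fused += w * preds[name]
```

Spearman correlation is a natural criterion here because the MediaEval memorability task is evaluated by rank correlation, so the fusion weights only need to produce a good ordering of videos, not calibrated scores.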

