Adopting Self-supervised Learning into Unsupervised Video Summarization through Restorative score.

Mehryar Abbasi Boroujeni, Parvaneh Saeedi

Poster 10 Oct 2023

In this paper, we present a new method for creating video summaries without supervision. We train a transformer encoder, in a self-supervised way, to reconstruct the missing frames of a video from a partially masked version of that video. We then introduce an algorithm that uses the trained encoder to generate an importance score for each frame, and these scores are used to create the video summary. We show that the model's reconstruction loss for a video with masked frames correlates with the representativeness of the frames that remain. We validate our approach on two benchmark datasets, TVSum and SumMe, and demonstrate that it outperforms state-of-the-art (SOTA) methods. Additionally, training is more stable than for SOTA techniques based on generative adversarial learning. Our source code is publicly available.
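
The link between reconstruction loss and representativeness can be illustrated with a small sketch. This is not the authors' implementation: the trained transformer encoder is replaced by a hypothetical stand-in (`reconstruct_from_kept`, which copies the nearest kept frame in time), and the scoring rule in `frame_importance` (averaging the loss of random masks in which a frame was kept) is an assumption made only for illustration. All function names and parameters are invented for this example.

```python
import numpy as np


def reconstruct_from_kept(features, keep_mask):
    """Stand-in for a trained encoder (assumption, not the paper's model):
    every frame is replaced by the features of its nearest kept frame in time."""
    kept_idx = np.flatnonzero(keep_mask)
    if kept_idx.size == 0:
        raise ValueError("at least one frame must be kept")
    all_idx = np.arange(len(features))
    # Index of the temporally closest kept frame for each frame.
    closest = np.abs(all_idx[:, None] - kept_idx[None, :]).argmin(axis=1)
    return features[kept_idx[closest]]


def reconstruction_loss(features, keep_mask, reconstruct=reconstruct_from_kept):
    """Mean squared error between restored and original features,
    measured on the masked frames only."""
    masked = ~keep_mask
    if not masked.any():
        return 0.0
    restored = reconstruct(features, keep_mask)
    return float(np.mean((restored[masked] - features[masked]) ** 2))


def frame_importance(features, keep_ratio=0.5, n_trials=200, seed=0):
    """Hypothetical scoring rule: a frame scores higher when the random
    subsets that kept it gave a lower reconstruction loss on average."""
    rng = np.random.default_rng(seed)
    n_frames = len(features)
    n_keep = max(1, int(round(keep_ratio * n_frames)))
    loss_sum = np.zeros(n_frames)
    kept_count = np.zeros(n_frames)
    for _ in range(n_trials):
        keep_mask = np.zeros(n_frames, dtype=bool)
        keep_mask[rng.choice(n_frames, size=n_keep, replace=False)] = True
        loss = reconstruction_loss(features, keep_mask)
        loss_sum[keep_mask] += loss
        kept_count[keep_mask] += 1
    # Guard against frames that were never sampled as kept.
    mean_loss = loss_sum / np.maximum(kept_count, 1)
    return -mean_loss  # lower loss means higher importance
```

In this toy setting, a kept subset that covers every distinct segment of a video restores the masked frames with low loss, while a subset drawn from a single segment does not. A summary could then be formed by selecting the highest-scoring frames, although the actual selection procedure is not described in the abstract.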

Tags:

Unsupervised Video summarization

self-supervised learning

video generation

self-attention encoders.
