VIDEO SUMMARIZATION WITH ANCHORS AND MULTI-HEAD ATTENTION
Yi-Lin Sung, Cheng-Yao Hong, Yen-Chi Hsu, Tyng-Luh Liu
Video summarization is a challenging task that aims to automatically generate a representative and attractive highlight movie from a source video. Previous works explicitly exploit the hierarchical structure of video to train a summarizer. However, these methods often rely on fixed-length segmentation, which breaks the video structure, or require additional training data to learn a segmentation model. In this paper, we propose an Anchor-Based Attention RNN (ABA-RNN) for the video summarization problem. ABA-RNN makes two contributions. First, we obtain frame-level and clip-level features through an anchor-based approach, and the model needs only a single RNN layer by adopting the subtraction scheme used in minus-LSTM; we also employ multi-head attention to let the model select suitable segment lengths. Second, we require no extra video preprocessing to determine shot boundaries, and our architecture is trained end-to-end. In experiments, we follow the standard SumMe and TVSum benchmarks and achieve competitive performance against state-of-the-art results.
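The abstract does not give the exact formulation of the attention module, but the multi-head mechanism it refers to follows the standard scaled dot-product pattern: each head attends over the sequence of per-frame (or per-clip) features independently, and the heads' outputs are concatenated. The sketch below is a minimal, framework-free illustration of that pattern; the sequence length, feature dimension, and head count are hypothetical and not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(Q, K, V, num_heads):
    """Standard multi-head scaled dot-product attention (illustrative only).

    Q, K, V: arrays of shape (seq_len, d_model); d_model must be
    divisible by num_heads. Each head works on its own d_model/num_heads
    slice of the features, so different heads can focus on different
    temporal patterns (e.g., segments of different lengths).
    """
    seq_len, d_model = Q.shape
    d_head = d_model // num_heads
    outputs = []
    for h in range(num_heads):
        s = slice(h * d_head, (h + 1) * d_head)
        q, k, v = Q[:, s], K[:, s], V[:, s]
        scores = q @ k.T / np.sqrt(d_head)   # (seq_len, seq_len) similarities
        outputs.append(softmax(scores) @ v)  # attention-weighted values
    # Concatenate the per-head results back to (seq_len, d_model).
    return np.concatenate(outputs, axis=-1)

# Toy self-attention example: 5 frame features of dimension 8, 2 heads.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))
out = multi_head_attention(X, X, X, num_heads=2)
print(out.shape)  # (5, 8)
```

In the paper's setting, the queries/keys/values would come from the anchor-based frame- and clip-level features rather than raw random vectors; this sketch only shows the attention arithmetic itself.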