A Novel Convolutional Architecture for Video-Text Retrieval

Zheng Li, Caili Guo, Bo Yang, Zerun Feng, Hao Zhang

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 07:33

07 Jul 2020

Prevalent video-text retrieval methods typically use recurrent neural networks to encode the sequence of frames in a video and the sequence of words in a text. In this paper, we introduce an encoding architecture based entirely on convolutional neural networks. Compared with recurrent models, it has lower complexity, and because the computation over all sequence elements can be fully parallelized during training, it makes better use of the GPU. We stack convolution kernels of different scales to encode both local and long-term features of video and text. Experiments on the MSR-VTT and MSVD datasets show that our method achieves new state-of-the-art video-text retrieval performance while requiring less training time.
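The abstract does not give layer counts, kernel widths, activations, or pooling, so the following is only a minimal pure-Python sketch under hypothetical assumptions: kernel sizes of 1, 3, and 5 run as parallel branches, two stacked layers, ReLU activations, randomly initialized (untrained) weights, and temporal max pooling. It is not the authors' implementation. All names (`conv1d`, `make_encoder`, `encode`, `cosine`, `receptive_field`) are illustrative. It shows two ideas from the abstract: each convolution output step depends only on the input, so steps can be computed in parallel, unlike a recurrent update; and stacking kernels widens the span of frames or words each output can see.

```python
import math
import random


def conv1d(seq, weights, bias):
    """Zero-padded 'same' 1-D convolution over time, followed by ReLU.

    seq:     list of T vectors, each of length d_in
    weights: list of k taps (k odd); each tap is a d_out x d_in matrix
    bias:    list of d_out floats
    Each output step reads only the input, never a previous output, so
    all T steps are independent and could run in parallel on a GPU.
    """
    k, pad = len(weights), len(weights) // 2
    d_in, d_out = len(seq[0]), len(bias)
    out = []
    for t in range(len(seq)):
        acc = list(bias)
        for j in range(k):
            src = t + j - pad
            if 0 <= src < len(seq):  # positions outside the sequence count as zero
                x, w = seq[src], weights[j]
                for o in range(d_out):
                    acc[o] += sum(w[o][i] * x[i] for i in range(d_in))
        out.append([max(0.0, a) for a in acc])  # ReLU
    return out


def random_kernel(k, d_in, d_out, rng):
    # Hypothetical initialization; a real model would learn these weights.
    scale = 1.0 / math.sqrt(k * d_in)
    return [[[rng.gauss(0.0, scale) for _ in range(d_in)] for _ in range(d_out)]
            for _ in range(k)]


def make_encoder(d_in, d_branch, kernel_sizes=(1, 3, 5), num_layers=2, seed=0):
    """Stack num_layers layers; each layer runs one branch per kernel size."""
    rng = random.Random(seed)
    layers, d = [], d_in
    for _ in range(num_layers):
        layers.append([(random_kernel(k, d, d_branch, rng), [0.0] * d_branch)
                       for k in kernel_sizes])
        d = d_branch * len(kernel_sizes)  # branch outputs are concatenated
    return layers


def encode(seq, layers):
    """Map a variable-length sequence to one fixed-size embedding."""
    h = seq
    for branches in layers:
        outs = [conv1d(h, w, b) for w, b in branches]
        # Concatenate the multi-scale branch outputs channel-wise at each step.
        h = [sum((o[t] for o in outs), []) for t in range(len(h))]
    # Temporal max pooling removes the dependence on sequence length.
    return [max(step[c] for step in h) for c in range(len(h[0]))]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def receptive_field(kernel_per_layer):
    """Input steps seen by one output (stride 1, no dilation)."""
    return 1 + sum(k - 1 for k in kernel_per_layer)


# Usage: 8 video frames and 6 words, each a 4-dim feature vector (made-up data).
data_rng = random.Random(42)
frames = [[data_rng.random() for _ in range(4)] for _ in range(8)]
words = [[data_rng.random() for _ in range(4)] for _ in range(6)]
video_encoder = make_encoder(d_in=4, d_branch=4, seed=1)
text_encoder = make_encoder(d_in=4, d_branch=4, seed=2)
video_emb = encode(frames, video_encoder)
text_emb = encode(words, text_encoder)
print(len(video_emb), len(text_emb))       # 12 12
print(receptive_field([5, 5]))             # 9: widest branch, two layers
print(round(cosine(video_emb, text_emb), 3))
```

With the widest branch (kernel size 5), each output after two layers covers 1 + 4 + 4 = 9 input steps, while the size-1 branch keeps purely local features; deeper stacks extend this span further. Because the weights here are random, the similarity score carries no meaning. In a retrieval system both encoders would be trained so that matching video-text pairs score higher than mismatched ones, and the training objective is an assumption not specified in the abstract.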

Tags:

icme 2020

sps conference

Value-Added Bundle(s) Including this Product

21 Sep 2020

ICME 2020 Virtual Conference - Presentation Videos Product Bundle

More Like This

26 Apr 2024

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle

SPS

Members: $150.00
IEEE Members: $250.00
Non-members: $350.00

19 Apr 2024

IEEE ICASSP 2024, 14-19 April 2024, Seoul, Korea. Conference Presentation Videos Bundle

SPS

Members: $150.00
IEEE Members: $250.00
Non-members: $350.00

16 Oct 2022

ICIP 2022, October 16-19, 2022, Bordeaux, France - Presentation Videos Product Bundle

SPS

Members: $150.00
IEEE Members: $250.00
Non-members: $350.00