Scale Matters: Temporal Scale Aggregation Network for Precise Action Localization in Untrimmed Videos

Guoqiang Gong, Liangfeng Zheng, Yadong Mu

DOI

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 10:00

09 Jul 2020

Temporal action localization is a recently-emerging task, aiming to localize video segments from untrimmed videos which contain specific actions. This work proposes a novel integrated temporal scale aggregation network (TSA-Net). Our main insight is that ensembling convolution filters with different dilation rates can effectively enlarge the receptive field with low computational cost, which inspires us to devise multi-dilation temporal convolution (MDC) block. Furthermore, to tackle video action instances with different durations, TSA-Net consists of multiple branches of sub-networks. Each of them adopts stacked MDC blocks with different dilation parameters, accomplishing a temporal receptive field specially optimized for specific-duration actions. We follow the formulation of boundary point detection, novelly detecting three kinds of critical points (i.e., starting / mid-point / ending) and pairing them for proposal generation. Comprehensive evaluations are conducted on THUMOS14. Our proposed TSA-Net demonstrates clear and consistent better performances and recalibrates new state-of-the-art on THUMOS14 benchmark.

Tags:

icme 2020

sps conference

Scale Matters: Temporal Scale Aggregation Network for Precise Action Localization in Untrimmed Videos

Guoqiang Gong, Liangfeng Zheng, Yadong Mu

Value-Added Bundle(s) Including this Product

ICME 2020 Virtual Conference - Presentation Videos Product Bundle

More Like This

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle

IEEE ICASSP 2024, 1 4-19 April 2024, Seoul, Korea. Conference Presentation Videos Bundle

ICIP 2022, October 16-19, 2022, Bordeaux, France - Presentation Videos Product Bundle

Join the IEEE Signal Processing Society