  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 10:00
09 Jul 2020

Temporal action localization is a recently emerging task that aims to localize the segments of untrimmed videos that contain specific actions. This work proposes a novel integrated temporal scale aggregation network (TSA-Net). Our main insight is that ensembling convolution filters with different dilation rates can effectively enlarge the receptive field at low computational cost, which inspires us to devise the multi-dilation temporal convolution (MDC) block. Furthermore, to handle action instances of different durations, TSA-Net consists of multiple sub-network branches. Each branch stacks MDC blocks with different dilation parameters, yielding a temporal receptive field tailored to actions of a specific duration. We follow the boundary point detection formulation and, as a novel element, detect three kinds of critical points (starting, mid-point, and ending) and pair them for proposal generation. Comprehensive evaluations are conducted on THUMOS14. The proposed TSA-Net shows clear and consistent performance gains and sets a new state of the art on the THUMOS14 benchmark.
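
The idea behind the MDC block can be illustrated with a minimal, dependency-free sketch. It is not the TSA-Net implementation: the abstract does not say how the dilated filters are combined, which kernel sizes are used, or which dilation rates each branch adopts. The averaging step, the shared kernel, and the names `dilated_conv1d`, `mdc_block`, and `receptive_field` are all assumptions made for illustration only.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """Zero-padded, same-length 1-D cross-correlation with a dilated kernel."""
    center = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - center) * dilation
            if 0 <= idx < len(x):  # positions outside the sequence count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def mdc_block(x: Sequence[float], kernel: Sequence[float],
              dilations: Sequence[int]) -> List[float]:
    """Ensemble several dilation rates by averaging their outputs.

    Averaging is an assumption; the abstract only says the filters are ensembled.
    """
    branches = [dilated_conv1d(x, kernel, d) for d in dilations]
    return [sum(vals) / len(branches) for vals in zip(*branches)]


def receptive_field(kernel_size: int,
                    dilations_per_block: Sequence[Sequence[int]]) -> int:
    """Temporal receptive field of stacked blocks.

    Each block widens the field by (kernel_size - 1) * its largest dilation.
    """
    rf = 1
    for dilations in dilations_per_block:
        rf += (kernel_size - 1) * max(dilations)
    return rf


# A unit impulse shows which time steps a single block can reach.
impulse = [0.0] * 9
impulse[4] = 1.0
response = mdc_block(impulse, kernel=[1.0, 1.0, 1.0], dilations=(1, 2))

# Hypothetical branches: same depth and kernel size, different dilation sets.
short_branch = receptive_field(3, [(1, 2)] * 3)
long_branch = receptive_field(3, [(1, 2, 4)] * 3)
```

In this sketch, the branch with the larger dilation set covers a wider temporal window at the same depth and kernel size, which is the property the abstract relies on when it assigns separate branches to actions of different durations.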
