One-shot Action Detection via Attention Zooming In
He-Yen Hsieh (Academia Sinica); Ding-Jie Chen (Academia Sinica); Cheng-Wei Chang (Academia Sinica); Tyng-Luh Liu (Academia Sinica)
Hinted by a modest support set, few-shot action detection (FSAD) aims at localizing the action instances of unseen classes within an untrimmed query video. Existing FSAD techniques mostly rely on generating a set of class-agnostic action proposals from the query video and then finding the most plausible ones by assessing their correlation to the support set. Such two-stage approaches are feasible but not efficient, largely due to neglecting the support information in generating the proposals. This work focuses on the one-shot image scenario and introduces the attention zooming in strategy to effectively and progressively carry out support-query cross-attention while generating proposals. The resulting one-stage model yields high-quality action proposals for boosting one-shot action detection (OSAD) performance. Our extensive experiments on the ActivityNet-1.3 and THUMOS-14 datasets demonstrate that the proposed framework can achieve state-of-the-art performance in tackling challenging image-based OSAD tasks.
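
To make the idea of support-query cross-attention concrete, the following is a minimal sketch of generic scaled dot-product cross-attention in plain Python. It is not the authors' model: the abstract does not specify the architecture, so the function names, the toy feature vectors, and the choice to let query-video snippets attend over support-image patch features are all assumptions made purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attention(query_feats, support_feats):
    """Generic scaled dot-product cross-attention (illustrative only).

    query_feats:   per-snippet features of the query video (hypothetical).
    support_feats: patch features of the single support image (hypothetical).
    Each query snippet is replaced by a weighted sum of support features,
    with weights given by the softmax of the scaled similarities.
    """
    d = len(support_feats[0])
    scale = math.sqrt(d)
    attended = []
    for q in query_feats:
        weights = softmax([dot(q, k) / scale for k in support_feats])
        attended.append([
            sum(w * k[i] for w, k in zip(weights, support_feats))
            for i in range(d)
        ])
    return attended


# Toy data: three query snippets, two support patches, feature size 2.
query_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
support_feats = [[1.0, 0.0], [0.0, 1.0]]
attended = cross_attention(query_feats, support_feats)
```

In this toy setup, a snippet that resembles the first support patch receives an output dominated by that patch. A one-stage detector in the spirit of the abstract would presumably apply such support-conditioned features while generating proposals rather than only when ranking them afterwards, but the details of how the paper does this are not given here.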