AV-TAD: AUDIO-VISUAL TEMPORAL ACTION DETECTION WITH TRANSFORMER

Yangcheng Li (Shanghai Jiao Tong University); Zefang Yu (Shanghai Jiao Tong University); Suncheng Xiang (Shanghai Jiao Tong University); Ting Liu (Shanghai Jiao Tong University); Yuzhuo Fu (Shanghai Jiao Tong University)

08 Jun 2023

As an important and challenging task in video understanding, Temporal Action Detection (TAD) has been studied intensively in recent years. However, current works mainly tackle this task with visual information alone, neglecting the potential of the audio modality. To address this limitation, we propose a simple yet effective Audio-Visual Temporal Action Detection Transformer, named AV-TAD, which performs early fusion of the audio and visual modalities in an end-to-end fashion. On top of this, a novel query formulation is introduced that directly adopts temporal segment coordinates as queries in the Transformer decoder, allowing segments to be updated dynamically layer by layer. To the best of our knowledge, this is the first attempt to investigate both audio and visual features with a multi-modal Transformer for the TAD task. Extensive experiments on the THUMOS14 dataset demonstrate that our proposed AV-TAD outperforms previous methods by a clear margin.
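
The abstract only outlines the two key ideas (early audio-visual fusion and segment-coordinate queries refined layer by layer); the sketch below is an illustrative PyTorch approximation of that design, not the authors' implementation. The class name AVTADSketch, the feature dimensions, the offset-based coordinate update, and the 21-way classifier head (20 THUMOS14 classes plus background) are all assumptions for the sake of a runnable example.

```python
import torch
import torch.nn as nn


class AVTADSketch(nn.Module):
    """Hedged sketch of the AV-TAD idea: early fusion of audio and visual
    features, then a decoder whose queries are (center, width) segment
    coordinates that are refined at every decoder layer."""

    def __init__(self, vis_dim=2048, aud_dim=128, d_model=256,
                 num_queries=40, num_layers=4, nhead=8, num_classes=21):
        super().__init__()
        # Early fusion: project each modality to a shared width, then merge per time step.
        self.vis_proj = nn.Linear(vis_dim, d_model)
        self.aud_proj = nn.Linear(aud_dim, d_model)
        self.fuse = nn.Linear(2 * d_model, d_model)

        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)

        # Queries are temporal segment coordinates (center, width) in [0, 1].
        self.segment_queries = nn.Parameter(torch.rand(num_queries, 2))
        self.coord_embed = nn.Linear(2, d_model)

        self.decoder_layers = nn.ModuleList(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_layers))
        # Each layer predicts an offset that updates the segment coordinates.
        self.offset_heads = nn.ModuleList(
            nn.Linear(d_model, 2) for _ in range(num_layers))
        self.cls_head = nn.Linear(d_model, num_classes)

    def forward(self, vis_feats, aud_feats):
        # vis_feats: (B, T, vis_dim); aud_feats: (B, T, aud_dim) on the same clip grid.
        fused = self.fuse(torch.cat(
            [self.vis_proj(vis_feats), self.aud_proj(aud_feats)], dim=-1))
        memory = self.encoder(fused)                              # (B, T, d_model)

        B = vis_feats.size(0)
        segments = self.segment_queries.unsqueeze(0).expand(B, -1, -1)  # (B, Q, 2)
        tgt = self.coord_embed(segments)
        for layer, head in zip(self.decoder_layers, self.offset_heads):
            tgt = layer(tgt, memory)
            # Simplified layer-by-layer update: add a predicted offset, keep in [0, 1].
            segments = (segments + head(tgt)).clamp(0.0, 1.0)
            tgt = self.coord_embed(segments) + tgt
        return segments, self.cls_head(tgt)   # refined segments and per-query class logits


if __name__ == "__main__":
    model = AVTADSketch()
    vis = torch.randn(2, 128, 2048)   # e.g. clip-level visual features
    aud = torch.randn(2, 128, 128)    # e.g. clip-level audio features
    segs, logits = model(vis, aud)
    print(segs.shape, logits.shape)   # (2, 40, 2) (2, 40, 21)
```

The design choice mirrored here is DETR-style detection with coordinate queries: because each query is itself a (center, width) pair, every decoder layer can re-embed and refine the segment it represents rather than decoding from opaque learned embeddings.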
