Skip to main content

Jointly Visual- and Semantic-Aware Graph Memory Networks for Temporal Sentence Localization in Videos

Daizong Liu (Peking University); Pan Zhou (Huazhong University of Science and Technology)

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
06 Jun 2023

Temporal sentence localization in videos (TSLV) aims to retrieve the most interested segment in an untrimmed video according to a given sentence query. However, almost of existing TSLV approaches suffer from the same limitations: (1) They only focus on either frame-level or object-level visual representation learning and corresponding correlation reasoning, but fail to integrate them both; (2) They neglect to leverage the rich semantic contexts to further benefit the query reasoning. To address these issues, in this paper, we propose a novel Hierarchical Visual- and Semantic-Aware Reasoning Network (HVSARN), which enables both visual- and semantic-aware query reasoning from object-level to frame-level. Experimental results on three challenging datasets demonstrate that our HVSARN achieves a new state-of-the-art performance.

More Like This

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00