UAVM: Towards Unifying Audio and Visual Models

Yuan Gong (Massachusetts Institute of Technology); Alexander H Liu (MIT); Andrew Rouditchenko (MIT CSAIL); James Glass (Massachusetts Institute of Technology)

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
09 Jun 2023

Conventional audio-visual models have independent audio and video branches. In this work, we unify the audio and visual branches by designing a Unified Audio-Visual Model (UAVM). The UAVM achieves a new state-of-the-art audio-visual event classification accuracy of 65.8% on VGGSound. More interestingly, we also find that UAVM has a few intriguing properties that its modality-independent counterparts do not have. Code is available at github.com/yuangongnd/uavm.
