URBAN SOUND & SIGHT: DATASET AND BENCHMARK FOR AUDIO-VISUAL URBAN SCENE UNDERSTANDING

Magdalena Fuentes, Bea Steers, Julia Wilkins, Qianyi Shi, Yao Hou, Juan Pablo Bello, Pablo Zinemanas, Xavier Serra, Martin Rocamora, Luca Bondi, Samarjit Das

Length: 00:11:01

08 May 2022

Automatic audio-visual urban traffic understanding is a growing area of research with many potential applications of value to industry, academia, and the public sector. Yet the lack of well-curated resources for training and evaluating models in this area hinders their development. To address this, we present a curated audio-visual dataset, Urban Sound & Sight (Urbansas), developed for investigating the detection and localization of sounding vehicles in the wild. Urbansas consists of 12 hours of unlabeled data along with 3 hours of manually annotated data, including bounding boxes with class labels and unique vehicle IDs, and strong audio labels that give vehicle types and indicate off-screen sounds. We discuss the challenges presented by the dataset and how to use its annotations to localize vehicles in the wild with audio models.

Tags: dataset, traffic, urban research, audio-visual

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
