Age-Vox-Celeb: Multi-Modal Corpus For Facial And Speech Estimation

Naohiro Tawara, Atsunori Ogawa, Yuki Kitagishi, Hosana Kamiyama


Length: 00:09:09

10 Jun 2021

Estimating a speaker's age from her speech is more challenging than estimating it from her face because few public corpora are available for speech age estimation. To tackle this problem, we construct a new audio-visual age corpus named AgeVoxCeleb by annotating VoxCeleb2 videos with age labels. AgeVoxCeleb is the first large-scale, balanced, multi-modal age corpus that contains both video and speech of the same speakers across a wide age range. Using AgeVoxCeleb, our paper makes the following contributions: (i) a comparison of the state-of-the-art models for each task shows that a facial age estimation model can outperform a speech age estimation model; (ii) facial age estimation is more robust to mismatch between the training and test sets; and (iii) we develop cross-modal transfer learning from face to speech age estimation, showing that ages estimated by a facial age estimation model can be used to train a speech age estimation model. The proposed AgeVoxCeleb corpus will be published at https://github.com/nttcslab-sp/agevoxceleb.
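Contribution (iii), cross-modal transfer learning from face to speech, can be read as a pseudo-labeling scheme. A facial model estimates an age for each video, and those estimates then serve as training targets for a speech model. The sketch below is a hypothetical illustration, not the authors' pipeline. Every name in it is invented for this example. The per-frame face estimates, the median aggregation, the one-dimensional speech feature and the linear regressor all stand in for the real models. Mean absolute error is assumed as the evaluation metric because it is common in age estimation. The abstract does not name a metric.

```python
from statistics import median
from typing import Dict, List, Tuple


def pseudo_label_ages(frame_ages: Dict[str, List[float]]) -> Dict[str, float]:
    """Collapse per-frame facial age estimates into one pseudo-label per video.

    Hypothetical choice: the median damps outlier frames (e.g. a misdetected face).
    Videos with no face estimates are skipped.
    """
    return {vid: median(ages) for vid, ages in frame_ages.items() if ages}


def fit_linear(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a * x + b.

    Stand-in for a real speech age model trained on the pseudo-labels.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def mean_absolute_error(pred: List[float], true: List[float]) -> float:
    """Average absolute difference between predicted and reference ages."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


if __name__ == "__main__":
    # Toy, made-up data: per-frame face age estimates for three videos.
    frame_ages = {"v1": [30, 32, 31, 90], "v2": [50, 52, 51], "v3": [20, 21, 19]}
    # Hypothetical scalar speech feature per video (a real system would use
    # embeddings extracted from the audio track).
    speech_feature = {"v1": 1.0, "v2": 2.0, "v3": 0.5}

    labels = pseudo_label_ages(frame_ages)
    vids = sorted(labels)
    a, b = fit_linear([speech_feature[v] for v in vids], [labels[v] for v in vids])
    preds = [a * speech_feature[v] + b for v in vids]
    print("MAE against face-derived pseudo-labels:",
          mean_absolute_error(preds, [labels[v] for v in vids]))
```

The speech model in this sketch never sees a human-annotated age. Its targets come entirely from the face side, and that is the core idea behind the transfer described in (iii). The sketch reports MAE against the pseudo-labels only to show the mechanics. A real evaluation would compare against ground-truth ages on held-out speakers.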

Chairs:

Shi-Xiong Zhang

Tags:

signal processing society

IEEE ICASSP 2021

virtual conference

June 6-11, 2021


