SPS
10 May 2022

Incorporating individualized head-related transfer functions (HRTFs) into a high-fidelity sound engine can further improve the perceived quality and realism of binaurally rendered spatial audio. Traditional methods for measuring individual HRTFs tend to be cumbersome, expensive, and require physical access to the subject. To address these issues, we develop a convolutional neural network model that, given a single photo of an ear, predicts pinna landmarks that can be used to extract anthropometric features commonly used for HRTF personalization, and to match the subject to a database of subjects whose HRTFs and pictures are available. We propose and evaluate a system utilizing this model to generate an individualized HRTF from a minimal set of easily obtainable measurements: single photographs of both ears, plus head and ear scale for matching interaural time difference (ITD). To extend the reach of our database, we employ ideas from Kendall shape theory to match ears non-dimensionally, compare all ears as right ears, and make corresponding changes to the database head-related impulse responses (HRIRs). We also apply head-and-torso (HAT) models to the HRIRs to provide better matching.
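The non-dimensional ear matching described above can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: the landmark format, the orthogonal-Procrustes alignment used to compare Kendall pre-shapes, and the `best_match` helper are all assumptions.

```python
import numpy as np

def to_preshape(landmarks):
    """Map 2-D pinna landmarks to a Kendall pre-shape:
    remove translation (center) and scale (unit centroid size)."""
    x = np.asarray(landmarks, dtype=float)
    x = x - x.mean(axis=0)          # remove translation
    return x / np.linalg.norm(x)    # remove scale (assumes non-degenerate shape)

def mirror_left_to_right(landmarks):
    """Flip the horizontal axis so left-ear landmarks can be
    compared against a database of right ears."""
    x = np.asarray(landmarks, dtype=float).copy()
    x[:, 0] *= -1.0
    return x

def shape_distance(a, b):
    """Procrustes-style shape distance: align pre-shape b to pre-shape a
    with the best orthogonal transform (SVD solution), then take the
    Frobenius norm of the residual."""
    a, b = to_preshape(a), to_preshape(b)
    u, _, vt = np.linalg.svd(b.T @ a)
    r = u @ vt                      # optimal orthogonal alignment
    return np.linalg.norm(a - b @ r)

def best_match(query_landmarks, database):
    """database: list of (subject_id, right-ear landmark array) pairs.
    Returns the entry whose ear shape is nearest to the query."""
    return min(database, key=lambda item: shape_distance(query_landmarks, item[1]))
```

Because the comparison is done on pre-shapes, two ears that differ only by position, overall size, or in-plane rotation score a distance near zero, which is the point of matching "non-dimensionally": the subject's absolute head and ear scale is handled separately via the ITD match.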