Automatic classification of vocal intensity category from speech

Manila Kodali (Aalto University); Sudarsana Reddy Kadiri (Aalto University); Laura Laaksonen (Huawei); Paavo Alku (Aalto University)

08 Jun 2023

Regulation of vocal intensity is a fundamental phenomenon in speech communication. Vocal intensity can be quantified using sound pressure level (SPL), which can be measured easily by recording a standard calibration signal together with the speech and comparing the energy of the recorded speech signal with that of the calibration tone. Unfortunately, speech recordings are mostly conducted without the SPL calibration signal, and speech signals are saved to databases using arbitrary amplitude scales. Therefore, neither the SPL nor the intensity category (e.g. soft or loud phonation) of a saved speech signal can be determined afterwards. Even though the original level information of speech is lost when the signal is presented on an arbitrary amplitude scale, the speech signal contains other acoustic cues of vocal intensity. In this study, we investigate machine learning and deep learning methods for the automatic classification of vocal intensity category when the input speech is expressed using an arbitrary amplitude scale. A new gender-balanced database consisting of speech produced in four vocal intensity categories (soft, normal, loud, and very loud) was first recorded. Support vector machine and deep neural network (DNN) models were used to develop automatic classification systems using spectrograms, mel-spectrograms, and mel-frequency cepstral coefficients as features. The DNN classifier using the mel-spectrogram showed the best classification accuracy of about 90%. The database is made publicly available at https://bit.ly/3tLPGRx.
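The calibration-based SPL measurement mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function name `spl_from_calibration`, the synthetic sine signals, and the 94 dB calibration level are all assumptions chosen for illustration. The sketch compares the mean power of a speech signal with that of a calibration tone recorded with the same gain, and shows that the estimate survives a common arbitrary rescaling only while the calibration tone is available.

```python
import numpy as np


def spl_from_calibration(speech, cal_tone, cal_spl_db):
    """Estimate the SPL (dB) of `speech` from a calibration tone of known SPL.

    Both signals are assumed to be recorded with the same gain, so the
    power ratio between them is independent of the amplitude scale.
    """
    speech_power = np.mean(np.square(speech))
    cal_power = np.mean(np.square(cal_tone))
    return cal_spl_db + 10.0 * np.log10(speech_power / cal_power)


# Synthetic stand-ins (hypothetical, not real recordings).
fs = 16000
t = np.arange(fs) / fs
cal_tone = 0.5 * np.sin(2 * np.pi * 1000 * t)   # assumed to correspond to 94 dB SPL
speech = 0.05 * np.sin(2 * np.pi * 150 * t)     # amplitude 20 dB below the tone

spl = spl_from_calibration(speech, cal_tone, 94.0)

# Rescaling both signals by the same arbitrary gain leaves the estimate unchanged.
gain = 3.7
spl_scaled = spl_from_calibration(gain * speech, gain * cal_tone, 94.0)

print(round(spl, 2), round(spl_scaled, 2))
```

Without `cal_tone`, the power of the rescaled speech alone says nothing about the original SPL, which is why the study turns to other acoustic cues of vocal intensity.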

Tags:

Speech production, perception and psychoacoustics

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
