
Audio-based Emotion Recognition enhancement through Progressive GAN

Christos Athanasiadis, Enrique Hortal, Stylianos Asteriadis

Length: 14:29
27 Oct 2020

Training large-scale architectures such as Generative Adversarial Networks (GANs) to investigate audio-visual relations in emotion-enriched interactions is a challenging task. The procedure is hindered by the high complexity of these models as well as by the mode collapse phenomenon, and training them sufficiently requires a massive amount of data. Furthermore, creating extensive audio-visual datasets for specific tasks, such as emotion recognition, is a complicated task hampered by annotation cost and labelling ambiguities. On the other hand, it is much more straightforward to obtain unlabeled audio-visual datasets, mainly because of the easy access to online multimedia content. In this work, a progressive process for training GANs was conducted. The first step leverages enormous unlabeled audio-visual datasets to expose concealed cross-modal relationships. In the second step, the weights are calibrated using a limited amount of emotion-annotated data. Through experimentation, it was shown that our progressive GAN schema leads to a more efficient optimization of the whole network, and that the generated samples from the target domain, when fused with the authentic ones, provide better emotion recognition results.
