Skip to main content

IMPROVING AUDIO CAPTIONING USING SEMANTIC SIMILARITY METRICS

Rehana Mahfuz (Qualcomm); Yinyi Guo (Qualcomm); Erik Visser (Qualcomm)

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
07 Jun 2023

Audio captioning quality metrics which are typically borrowed from the machine translation and image captioning areas measure the degree of overlap between predicted tokens and gold reference tokens. In this work, we consider a metric measuring semantic similarities between predicted and reference captions instead of measuring exact word overlap. We first evaluate its ability to capture similarities among captions corresponding to the same audio file and compare it to other established metrics. We then propose a fine-tuning method to directly optimize the metric by backpropagating through a sentence embedding extractor and audio captioning network. Such fine-tuning results in an improvement in predicted captions as measured by both traditional metrics and the proposed semantic similarity captioning metric.

More Like This

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00