A NOVEL METRIC FOR EVALUATING AUDIO CAPTION SIMILARITY
Swapnil P Bhosale (TCS Research and Innovation); Rupayan Chakraborty (TCS Research); Sunil Kumar Kopparapu (TCS Research)
SPS
Automatic Audio Captioning (AAC) refers to the task of describing an audio sample in natural language (NL) text. Other NL text generation tasks rely on lexical semantic metrics such as BLEU for evaluation. An AAC evaluation metric, however, also requires acoustic semantics in addition to lexical semantics, so that NL texts describing similar sounds are mapped close to each other. In this paper, we propose a novel metric based on Text-to-Audio Grounding (TAG) to incorporate acoustic semantics. Experiments show that the proposed metric performs better for AAC than existing metrics used in the NL text and image captioning literature.
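To illustrate why purely lexical scores can mislead for AAC, the sketch below uses clipped unigram precision (the unigram component of BLEU, without the brevity penalty) as a stand-in for a lexical metric. It is not the TAG-based metric proposed in the paper, and the captions are hypothetical examples chosen for illustration. The caption describing an acoustically similar event scores lower than the one describing a different sound, because only word overlap is counted.

```python
from collections import Counter


def clipped_unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision: the n=1 term of BLEU, with no brevity penalty."""
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    cand_counts = Counter(cand_tokens)
    ref_counts = Counter(reference.lower().split())
    # Each candidate word is credited at most as often as it occurs in the reference;
    # Counter returns 0 for words missing from the reference.
    matched = sum(min(count, ref_counts[word]) for word, count in cand_counts.items())
    return matched / len(cand_tokens)


# Hypothetical captions, not taken from the paper's data.
candidate = "a dog is barking"
similar_sound = "a hound barks loudly"    # acoustically similar event
different_sound = "a cat is meowing"      # acoustically different event

print(clipped_unigram_precision(candidate, similar_sound))    # 0.25
print(clipped_unigram_precision(candidate, different_sound))  # 0.5
```

A metric that also captures acoustic semantics would be expected to reverse this ordering, which is the gap the abstract describes.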