Spectrogram Analysis Via Self-Attention For Realizing Cross-Modal Visual-Audio Generation
Huadong Tan, Guang Wu, Pengcheng Zhao, Yanxiang Chen
SPS
Human cognition is supported by the combination of multi-modal information from different sources of perception, the two most important modalities being visual and audio. Cross-modal visual-audio generation enables the synthesis of data in one modality from data acquired in the other, providing an experience that neither modality can deliver alone. In this paper, the Self-Attention mechanism is applied to cross-modal visual-audio generation for the first time; it assists in analyzing the structural characteristics of the spectrogram. A series of experiments is conducted to identify the best-performing configuration. The post-experimental comparison shows that the Self-Attention module substantially improves both the generation and the classification of audio data. Furthermore, the presented method outperforms existing cross-modal visual-audio generative models.
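The abstract does not include code, but the core idea of applying self-attention to a spectrogram can be illustrated with a minimal sketch. The snippet below treats a spectrogram as a sequence of time frames and lets every frame attend to every other frame via scaled dot-product self-attention. The frame count, mel-bin count, projection shapes, and single-head layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention.

    x: (T, D) array, e.g. T time frames of a spectrogram with D frequency bins.
    wq, wk, wv: (D, D) projection matrices (randomly initialized here).
    Returns the attended features (T, D) and the attention map (T, T).
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (T, T) pairwise frame affinities
    attn = softmax(scores, axis=-1)          # each frame attends over all frames
    return attn @ v, attn

rng = np.random.default_rng(0)
T, D = 128, 80                     # 128 time frames, 80 mel bins (assumed sizes)
spec = rng.standard_normal((T, D)) # stand-in for a log-mel spectrogram
wq, wk, wv = (0.1 * rng.standard_normal((D, D)) for _ in range(3))

out, attn = self_attention(spec, wq, wk, wv)
print(out.shape, attn.shape)  # (128, 80) (128, 128)
```

The (T, T) attention map is what lets the model capture long-range structure in the spectrogram, such as harmonics or repeated onsets far apart in time, which purely local convolutions struggle to relate.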