Deformable Alignment and Scale-Adaptive Feature Extraction Network For Continuous-Scale Satellite Video Super-Resolution
Ning Ni, Hanlin Wu, Libao Zhang
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 00:10:59
Recent deep learning approaches have achieved impressive performance in visually-guided sound source separation tasks. However, due to the lack of real-world mixed/separated audio sample pairs, most methods rely heavily on the "Mix-and-Separate" paradigm to learn sound source separation, which often generalizes poorly to real-world mixtures. To address this issue, we employ a semi-supervised learning technique that preserves audio-visual consistency to improve separation performance in real-world scenarios. In this way, our network is trained jointly on artificial and real-world mixtures. To the best of our knowledge, this may be the first attempt to improve generalization to real-world mixtures in this task. We also design a category-guided audio-visual fusion module to learn audio-visual matching. Comparative experiments are performed on two publicly available datasets, MUSIC and AudioSet. Experimental results demonstrate that our method often outperforms state-of-the-art methods in visual sound separation.
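To make the "Mix-and-Separate" paradigm mentioned above concrete, the following is a minimal, hypothetical sketch of how one artificial training example is typically constructed: two single-source clips are summed into a mixture, and ideal ratio masks over their magnitude spectra serve as the separation targets. All function names here are illustrative assumptions, not the authors' actual pipeline, and the toy |FFT| stands in for a real short-time Fourier transform.

```python
import numpy as np

def mix_and_separate_pair(audio_a, audio_b, eps=1e-8):
    """Build one artificial training pair in the 'Mix-and-Separate' style.

    Two single-source waveforms are summed into a mixture; ideal ratio
    masks over their magnitude spectra become the supervision targets.
    Illustrative sketch only (hypothetical helper, not the paper's code).
    """
    mixture = audio_a + audio_b
    # Toy magnitude "spectra" via |rFFT|; a real system would use an STFT.
    mag_a = np.abs(np.fft.rfft(audio_a))
    mag_b = np.abs(np.fft.rfft(audio_b))
    # Common training simplification: normalize by the sum of source
    # magnitudes rather than by |FFT(mixture)| itself.
    mag_sum = mag_a + mag_b
    mask_a = mag_a / (mag_sum + eps)  # ideal ratio mask for source A
    mask_b = mag_b / (mag_sum + eps)  # ideal ratio mask for source B
    return mixture, (mask_a, mask_b)

rng = np.random.default_rng(0)
a = rng.standard_normal(1024)
b = rng.standard_normal(1024)
mix, (m_a, m_b) = mix_and_separate_pair(a, b)
# The two ratio masks partition the (simplified) mixture magnitude,
# so they sum to ~1 at every frequency bin.
assert np.allclose(m_a + m_b, 1.0, atol=1e-5)
```

At training time, a separation network conditioned on visual features would be asked to predict these masks from the mixture; the semi-supervised audio-visual consistency objective described in the abstract is what allows unpaired real-world mixtures to be added alongside such artificial pairs.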