ASVFI: Audio-driven Speaker Video Frame Interpolation
Qianrui Wang, Dengshi Li, Liang Liao, Hao Song, Wei Li, Jing Xiao
Due to limited transmission bandwidth, the video frame rate is often low during online conferences, severely affecting the user experience. Video frame interpolation can alleviate this problem by synthesizing intermediate frames to increase the frame rate. Most existing video frame interpolation methods are based on a linear motion assumption. However, mouth motion is nonlinear, so these methods cannot generate high-quality intermediate frames for speaker videos. Considering the strong correlation between mouth shape and vocalization, we propose a new method named Audio-driven Speaker Video Frame Interpolation (ASVFI). First, we extract audio features with the Audio Net (ANet). Second, we use the Video Net (VNet) encoder to extract video features. Finally, we fuse the audio and video features with AVFusion and decode the intermediate frame with the VNet decoder. Experimental results show that when interpolating one frame, the PSNR is nearly 0.13 dB higher than the baseline; when interpolating seven frames, the PSNR is 0.33 dB higher than the baseline.
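The abstract outlines a three-stage pipeline (audio encoding, audio-visual fusion, frame decoding). Below is a minimal PyTorch sketch of that data flow, assuming simple module internals: only the names ANet, VNet (encoder/decoder), and AVFusion come from the paper; the layer choices, tensor shapes, and the gating-style fusion are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the ASVFI forward pass, under assumed module internals.
import torch
import torch.nn as nn


class ASVFI(nn.Module):
    def __init__(self, audio_dim=128, feat_dim=64):
        super().__init__()
        # ANet: encodes the audio segment between the two input frames (assumed MLP).
        self.anet = nn.Sequential(
            nn.Linear(audio_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )
        # VNet encoder: extracts spatial features from the two input frames (assumed CNN).
        self.venc = nn.Sequential(
            nn.Conv2d(6, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1),
        )
        # AVFusion: here modeled as channel-wise modulation of the video feature
        # by the audio feature; the paper's fusion module may differ.
        self.fuse = nn.Linear(feat_dim, feat_dim)
        # VNet decoder: reconstructs the intermediate frame from the fused feature.
        self.vdec = nn.Sequential(
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, 3, 3, padding=1),
        )

    def forward(self, frame0, frame1, audio):
        # frame0, frame1: (B, 3, H, W); audio: (B, audio_dim)
        a = self.anet(audio)                               # (B, feat_dim)
        v = self.venc(torch.cat([frame0, frame1], dim=1))  # (B, feat_dim, H, W)
        gate = torch.sigmoid(self.fuse(a))[..., None, None]
        fused = v * gate                                   # audio-gated video feature
        return self.vdec(fused)                            # predicted intermediate frame


# Usage sketch: interpolate one frame between two consecutive low-frame-rate frames.
model = ASVFI()
f0, f1 = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
aud = torch.rand(1, 128)
mid = model(f0, f1, aud)  # (1, 3, 64, 64)
```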