How Accurate Is Passive Stereo For 3D Face Reconstruction?
Jiahao Luo, Eric Sandoval Ruezga, James Davis
-
SPS
IEEE Members: $11.00
Non-members: $15.00Length: 00:05:20
Predicting multiple future frames from a given video is a challenging problem due to several factors such as changing camera, dynamically moving objects, occlusions, etc. While recent deep learning methods have made significant progress on the video prediction problem, most methods predict the immediate or a fixed number of future frames. To obtain longer-term frame predictions, existing techniques usually process the predicted frames iteratively, resulting in blurry or inconsistent predictions. in this work, we present a new approach that can predict an \emph{arbitrary} number of future video frames with a single forward pass through the network. instead of directly predicting a fixed number of future optical flows or frames, we learn temporal motion encodings, i.e., temporal motion basis vectors and a network to predict the coefficients. The learned motion basis can be easily extended to arbitrary length at inference time, enabling us to predict an arbitrary number of future frames. Experiments on benchmark datasets show that our approach performs favorably against several competitive techniques even for the next frame prediction setting. When evaluated under 5-frame or 10-frame prediction settings, the proposed method achieves higher performance gains over the state-of-the-art techniques that iteratively process the predictions.