DEEP LEARNING AND INTERACTIVITY FOR VIDEO ROTOSCOPING
Shivam Saboo, Frederic Lefebvre, Vincent Demoulin
SPS
In this work we extend the idea of object co-segmentation to interactive video segmentation. Our framework predicts the coordinates of vertices along the boundary of an object for two frames of a video simultaneously. The predicted vertices are interactive in nature: a user interaction on one frame assists the network in correcting the predictions for both frames. We employ an attention mechanism at the encoder stage and a simple combination network at the decoder stage, which allows the network to perform this simultaneous correction efficiently. The framework is also robust to the temporal gap between the two input frames, handling a distance of up to 50 frames. We train our model on a professional dataset, which consists of pixel-accurate annotations produced by professional roto artists. We test our model on DAVIS and achieve state-of-the-art results in both automatic and interactive modes, surpassing Curve-GCN and PolyRNN++.
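The core idea of the abstract — encoding two frames jointly so each frame's boundary-vertex predictions are informed by the other — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature dimensions, the dot-product cross-attention, and the linear vertex head are all illustrative assumptions standing in for the paper's encoder attention and decoder combination network.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_frame_attention(feat_a, feat_b):
    # Hypothetical cross-attention: each frame's vertex features attend
    # over the other frame's features, so the two encodings share
    # information (the co-segmentation intuition from the abstract).
    scores = feat_a @ feat_b.T / np.sqrt(feat_a.shape[1])
    attended_a = softmax(scores, axis=1) @ feat_b   # frame A informed by B
    attended_b = softmax(scores.T, axis=1) @ feat_a # frame B informed by A
    return attended_a, attended_b

def predict_vertices(feat, w):
    # Illustrative linear head mapping each vertex feature to an (x, y)
    # boundary coordinate; the paper's decoder is more elaborate.
    return feat @ w  # shape (N, 2)

rng = np.random.default_rng(0)
N, D = 40, 64  # assumed: 40 boundary vertices, 64-dim features per vertex
feat_a = rng.normal(size=(N, D))  # encoder features, frame A
feat_b = rng.normal(size=(N, D))  # encoder features, frame B
w = rng.normal(size=(D, 2)) * 0.1

att_a, att_b = cross_frame_attention(feat_a, feat_b)
poly_a = predict_vertices(att_a, w)  # predicted polygon for frame A
poly_b = predict_vertices(att_b, w)  # predicted polygon for frame B
```

In the interactive setting described above, a user's correction of one vertex on frame A would update `feat_a`, and because of the cross-attention both `poly_a` and `poly_b` would change on the next forward pass.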