Unsupervised Key Hand Shape Discovery Of Sign Language Videos With Correspondence Sparse Autoencoders
Recep Doga Siyli, Batuhan Gundogdu, Murat Saraçlar, Lale Akarun
SPS
Recognition of sign language is a difficult task that often requires tedious frame-level annotation by sign language experts. End-to-end approaches that bypass frame-level annotation have achieved some success on limited datasets, but high-quality annotations have been shown to improve performance drastically. Recent unsupervised learning methods based on deep neural networks have succeeded at learning feature extraction, yet no technique exists for high-quality frame-level classification without supervision. In this paper, we assign labels to the frames of an isolated Sign Language (SL) dataset using end-to-end neural network architectures that have proven successful at the unsupervised discovery of sub-word acoustic units in speech processing. We observe that key hand shapes (KHS), the meaningful visual building blocks of signs in an SL dataset, can be detected with unsupervised clustering techniques: sparse autoencoders successfully retrieve and cluster the KHSs used in isolated signs. In addition, training on corresponding frames in an autoencoder scheme allows the learning process to continue beyond this initial clustering.
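To make the sparse-autoencoder idea concrete, the following is a minimal sketch of an autoencoder with an L1 sparsity penalty on the hidden code, trained by gradient descent on stand-in per-frame features. All dimensions, the ReLU activation, the penalty weight, and the learning rate are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: input feature dim, hidden (code) dim, number of frames.
D, H, N = 20, 8, 256
X = rng.normal(size=(N, D))   # stand-in for per-frame hand-shape features

W1 = rng.normal(scale=0.1, size=(D, H)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.1, size=(H, D)); b2 = np.zeros(D)

def relu(z):
    return np.maximum(z, 0.0)

def forward(X):
    A = relu(X @ W1 + b1)     # hidden code, pushed toward sparsity by L1
    Xh = A @ W2 + b2          # reconstruction of the input
    return A, Xh

lam, lr = 1e-3, 0.1           # L1 weight and step size (assumed values)

def loss(X, A, Xh):
    return np.mean((Xh - X) ** 2) + lam * np.mean(np.abs(A))

A, Xh = forward(X)
loss0 = loss(X, A, Xh)

for _ in range(200):
    A, Xh = forward(X)
    # Gradients of mean squared reconstruction error + L1 sparsity term.
    dXh = 2.0 * (Xh - X) / (N * D)
    dW2 = A.T @ dXh; db2 = dXh.sum(0)
    dA = dXh @ W2.T + lam * np.sign(A) / (N * H)
    dZ = dA * (A > 0)         # ReLU gradient
    dW1 = X.T @ dZ; db1 = dZ.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

A, Xh = forward(X)
loss1 = loss(X, A, Xh)
print(loss1 < loss0)          # training reduces the combined loss
```

In the correspondence variant described in the abstract, the reconstruction target for a frame would be a matched frame from another repetition of the same sign rather than the frame itself; the training loop above would be unchanged apart from that target substitution.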