Fast and Accurate Embedded DCNN for RGB-D Based Sign Language Recognition
Ching-Chen Wang, Ching-Te Chiu, Chao-Tsung Huang, Yu-Chun Ding, Li-Wei Wang
Length: 13:50
This paper presents a fast and accurate two-path CNN architecture designed in a hardware-oriented manner. The proposed network consists of an RGB path and a depth path and recognizes gestures by fusing RGB and depth features, while following the predefined constraints of the dedicated hardware. RTL simulation results indicate that inferring a single pair of RGB image and depth map takes only 0.171 milliseconds at an operating frequency of 250 MHz. Compared with running the same model on an Intel i7 and a GTX 1080, the speedups are 593.92x and 7.68x, respectively. In addition, to improve recognition accuracy across diverse conditions, a new RGB-D dataset with complex backgrounds was captured with a Kinect. The model has only 0.17M parameters and achieves 99.79% accuracy on the ASL Finger Spelling dataset. Compared with Gao's CNN gesture recognition architecture, our model has 2.9 times fewer parameters and 6.49% higher accuracy. A demonstration video for sign language recognition is available at https://youtu.be/DvO8mI7IZ5Q
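
To put the reported timing figures in context, the following is a minimal back-of-the-envelope sketch in Python. It uses only the numbers quoted in the abstract (0.171 ms per inference, 250 MHz, 593.92x and 7.68x speedups). The function names are hypothetical, and the derived values are implied by the abstract rather than reported by the authors. The throughput figure additionally assumes that inferences run back to back with no overlap between input pairs, which the abstract does not state.

```python
# Figures quoted in the abstract.
LATENCY_MS = 0.171      # time to infer one RGB image / depth map pair
CLOCK_HZ = 250e6        # operating frequency used in the RTL simulation
SPEEDUP_CPU = 593.92    # speedup over the Intel i7 baseline
SPEEDUP_GPU = 7.68      # speedup over the GTX 1080 baseline


def cycles_per_inference(latency_ms: float, clock_hz: float) -> int:
    """Clock cycles spent on one inference (latency times frequency)."""
    return round(latency_ms * 1e-3 * clock_hz)


def throughput_per_second(latency_ms: float) -> float:
    """Inferences per second, assuming strictly sequential inferences."""
    return 1000.0 / latency_ms


def implied_baseline_latency_ms(latency_ms: float, speedup: float) -> float:
    """Baseline latency implied by a speedup factor (not a reported value)."""
    return latency_ms * speedup


if __name__ == "__main__":
    print("cycles per inference:", cycles_per_inference(LATENCY_MS, CLOCK_HZ))
    print("inferences per second:", throughput_per_second(LATENCY_MS))
    print("implied i7 latency (ms):",
          implied_baseline_latency_ms(LATENCY_MS, SPEEDUP_CPU))
    print("implied GTX 1080 latency (ms):",
          implied_baseline_latency_ms(LATENCY_MS, SPEEDUP_GPU))
```

Under these assumptions, one inference corresponds to about 42,750 clock cycles, and the implied baseline latencies are roughly 101.6 ms on the CPU and 1.3 ms on the GPU.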