Continuous interaction with a smart speaker via low-dimensional embeddings of dynamic hand pose

Songpei Xu (University of Glasgow); Chaitanya Kaul (University of Glasgow); Xuri Ge (University of Glasgow); Roderick Murray-Smith (University of Glasgow)

06 Jun 2023

This paper presents a new continuous interaction strategy for a smart music speaker that combines visual feedback of hand pose with mid-air gesture recognition and control, and that uses only two video frames to recognize a gesture. Frame-based hand pose features from MediaPipe Hands, containing 21 landmarks, are embedded into a two-dimensional pose space by an autoencoder. The corresponding space for interaction with the music content is created by embedding high-dimensional music track profiles into a compatible two-dimensional space. A PointNet-based model is then applied to classify gestures, which are used to control the device interaction or explore music spaces. By jointly optimising the autoencoder with the classifier, we learn an embedding space that is more useful for discriminating gestures. We demonstrate the functionality of the system with experienced users selecting different musical moods by varying their hand pose.
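The joint objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything beyond the 21 landmarks, the two-dimensional embedding and the two-frame input is an assumption made for illustration: the encoder and decoder are single linear maps, a linear softmax head stands in for the PointNet-based classifier, each landmark is taken as (x, y, z), and the number of gesture classes (`N_CLASSES`) and the loss weight (`weight`) are hypothetical. The sketch only evaluates the combined loss for one sample and performs no training.

```python
import math
import random

N_LANDMARKS = 21                 # landmarks per hand from MediaPipe Hands
N_COORDS = 3                     # assumed (x, y, z) per landmark
IN_DIM = N_LANDMARKS * N_COORDS  # 63 values per frame
EMB_DIM = 2                      # two-dimensional pose space
N_CLASSES = 4                    # hypothetical number of gesture classes


def init_matrix(rows, cols, rng, scale=0.1):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_loss(frames, label, enc, dec, clf, weight=1.0):
    """Reconstruction error plus weighted classification error for one sample."""
    # Embed each frame independently into the 2D pose space.
    embeddings = [matvec(enc, frame) for frame in frames]

    # Mean squared reconstruction error, averaged over the frames.
    recon_err = 0.0
    for frame, z in zip(frames, embeddings):
        recon = matvec(dec, z)
        recon_err += sum((a - b) ** 2 for a, b in zip(frame, recon)) / len(frame)
    recon_err /= len(frames)

    # Assumed stand-in classifier: linear head on the concatenated embeddings.
    features = [value for z in embeddings for value in z]
    probs = softmax(matvec(clf, features))
    class_err = -math.log(probs[label])  # cross-entropy for the true label

    return recon_err + weight * class_err, embeddings, probs


rng = random.Random(0)
enc = init_matrix(EMB_DIM, IN_DIM, rng)
dec = init_matrix(IN_DIM, EMB_DIM, rng)
clf = init_matrix(N_CLASSES, 2 * EMB_DIM, rng)

# Two synthetic frames standing in for flattened landmark coordinates.
frames = [[rng.random() for _ in range(IN_DIM)] for _ in range(2)]
loss, embeddings, probs = joint_loss(frames, label=1, enc=enc, dec=dec, clf=clf)
print(f"joint loss: {loss:.4f}")
```

Because both terms depend on the encoder weights, minimising their sum pushes the two-dimensional embedding to preserve pose information and to separate gesture classes at the same time, which is the intuition behind optimising the autoencoder jointly with the classifier.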

Tags:

Image and video coding

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
