Multimodal Active Speaker Detection And Virtual Cinematography For Video Conferencing
Ross Cutler, Ramin Mehran, Sam Johnson, Oliver Whyte, Adarsh Kowdle, Cha Zhang, Adam Kirk
SPS
Active speaker detection (ASD) and virtual cinematography (VC) can significantly improve the video-conferencing experience by automatically panning, tilting, and zooming a camera: subjectively, users rate an expertly edited video significantly higher than the unedited video. We describe a new automated ASD and VC system that performs within 0.3 MOS of an expert cinematographer, based on subjective ratings on a 1-5 scale. The system uses a 4K wide-FOV camera, a depth camera, and a microphone array; it extracts features from each modality and trains an ASD classifier using AdaBoost, which is very efficient and runs in real time. A VC is similarly trained using machine learning. To avoid distracting the room participants, the system has no moving parts: the VC works by cropping and zooming the 4K wide-FOV video stream. The system was tuned and evaluated using extensive crowdsourcing techniques on N=100 meetings, each 2-5 minutes in length.
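The abstract describes training the ASD on features fused from several modalities using AdaBoost. As a minimal sketch of that idea (not the paper's implementation), the following trains AdaBoost with decision stumps over a stand-in fused feature matrix; the feature names, data, and training loop are illustrative assumptions only.

```python
# Hedged sketch of AdaBoost with decision stumps for active-speaker
# classification over fused multimodal features. All data and
# parameters here are synthetic stand-ins, not the paper's.
import numpy as np

def train_adaboost(X, y, rounds=5):
    """Minimal AdaBoost; labels y must be in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)        # sample weights, updated each round
    stumps = []
    for _ in range(rounds):
        best = None
        # Exhaustively pick the weighted-error-minimizing stump.
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] > thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)   # stump weight
        pred = sign * np.where(X[:, j] > thr, 1, -1)
        w *= np.exp(-alpha * y * pred)          # upweight mistakes
        w /= w.sum()
        stumps.append((alpha, j, thr, sign))
    return stumps

def predict(stumps, X):
    score = sum(a * s * np.where(X[:, j] > t, 1, -1)
                for a, j, t, s in stumps)
    return np.sign(score)

# Synthetic "fused" features: e.g. video motion, depth, audio SSL cues.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))
y = np.where(X[:, 0] > 0, 1, -1)   # toy speaking/not-speaking labels
stumps = train_adaboost(X, y)
acc = (predict(stumps, X) == y).mean()
```

Because the final classifier is just a weighted vote of threshold tests, inference is a handful of comparisons per frame, consistent with the real-time efficiency the abstract claims for the AdaBoost-based ASD.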