Towards Large-Scale Data Annotation Of Audio From Wearables: Validating Zooniverse Annotations Of Infant Vocalization Types
Chiara Semenzin, Lisa Hamrick, Amanda Seidl, Bridgette Kelleher, Alejandrina Cristia
SPS
Length: 0:13:10
Recent developments allow the collection of audio data from lightweight wearable devices, potentially enabling us to study language use from everyday life samples. However, extracting useful information from these data is currently impossible with automated routines, and too expensive with trained human annotators. We explore a strategy fit for the 21st century, relying on the collaboration of citizen scientists. A large dataset of infant speech was uploaded to a citizen science platform, Zooniverse. The same data were annotated in the laboratory by highly trained annotators. We investigate whether crowd-sourced annotations are qualitatively and quantitatively comparable to those produced by expert annotators in a dataset of children at high and low risk for language disorders. Our results reveal that classification of individual vocalizations on Zooniverse was overall moderately accurate compared to the laboratory gold standard. Descriptors defined at the level of individual children were strongly correlated between Zooniverse and laboratory annotations.
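The two comparisons described in the abstract (clip-level agreement with the laboratory gold standard, and correlation of child-level descriptors) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the talk: the label names, the toy data, the choice of "proportion of canonical vocalizations" as the child-level descriptor, and the use of plain accuracy and Pearson correlation as the metrics.

```python
from collections import defaultdict
from math import sqrt


def accuracy(crowd_labels, gold_labels):
    """Fraction of vocalizations where the crowd label matches the gold label."""
    matches = sum(c == g for c, g in zip(crowd_labels, gold_labels))
    return matches / len(gold_labels)


def proportion_by_child(child_ids, labels, target):
    """Per-child share of vocalizations labeled `target` (a hypothetical descriptor)."""
    counts = defaultdict(lambda: [0, 0])  # child -> [target count, total count]
    for child, label in zip(child_ids, labels):
        counts[child][1] += 1
        if label == target:
            counts[child][0] += 1
    return {child: hits / total for child, (hits, total) in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Toy data: three children, four vocalizations each (invented, not real results).
child_ids = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
gold = [
    "canonical", "canonical", "crying", "laughing",
    "canonical", "non-canonical", "non-canonical", "crying",
    "non-canonical", "non-canonical", "non-canonical", "laughing",
]
crowd = [
    "canonical", "canonical", "crying", "crying",
    "canonical", "canonical", "non-canonical", "crying",
    "non-canonical", "non-canonical", "non-canonical", "laughing",
]

# Level 1: agreement on individual vocalizations.
clip_accuracy = accuracy(crowd, gold)

# Level 2: agreement on a descriptor aggregated per child.
gold_desc = proportion_by_child(child_ids, gold, "canonical")
crowd_desc = proportion_by_child(child_ids, crowd, "canonical")
children = sorted(gold_desc)
child_correlation = pearson(
    [gold_desc[c] for c in children],
    [crowd_desc[c] for c in children],
)

print(f"clip-level accuracy: {clip_accuracy:.3f}")
print(f"child-level correlation: {child_correlation:.3f}")
```

Separating the two levels reflects the distinction the abstract draws: individual labels can disagree fairly often while per-child aggregates still track the laboratory values closely, because some labeling errors average out within a child.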