Peter Washington, Emilie Leblanc, Kaitlyn Dunlap, Yordan Penev, Aaron Kline, Kelley Paskov, Min Woo Sun, Brianna Chrisman, Nathaniel Stockham, Maya Varma, Catalin Voss, Nick Haber, Dennis P Wall.
Abstract
Mobilized telemedicine is becoming a key, and even necessary, facet of both precision health and precision medicine. In this study, we evaluate the capability and potential of a crowd of virtual workers, defined as vetted members of popular crowdsourcing platforms, to aid in the task of diagnosing autism. We evaluate workers on the crowdsourced task of providing categorical ordinal behavioral ratings for unstructured public YouTube videos of children with autism and neurotypical controls. To evaluate emerging patterns that are consistent across independent crowds, we target workers from distinct geographic loci on two crowdsourcing platforms: an international group of workers on Amazon Mechanical Turk (MTurk) (N = 15) and Microworkers from Bangladesh (N = 56), Kenya (N = 23), and the Philippines (N = 25). We feed worker responses as input to a validated diagnostic machine learning classifier trained on clinician-filled electronic health records. We find that regardless of crowd platform or targeted country, workers vary in the average confidence of the correct diagnosis predicted by the classifier. The best worker responses produce a mean probability of the correct class above 80% and over one standard deviation above 50%, accuracy and variability on par with experts according to prior studies. There is a weak correlation between mean time spent on task and mean performance (r = 0.358, p = 0.005). These results demonstrate that while the crowd can produce accurate diagnoses, there are intrinsic differences in crowdworker ability to rate behavioral features. We propose a novel strategy for recruitment of crowdsourced workers to ensure high quality diagnostic evaluations of autism, and potentially many other pediatric behavioral health conditions. Our approach represents a viable step in the direction of crowd-based approaches for more scalable and affordable precision medicine.
Keywords: autism; crowdsourcing; diagnostics; machine learning; pediatrics; telemedicine
Year: 2020 PMID: 32823538 PMCID: PMC7564950 DOI: 10.3390/jpm10030086
Source DB: PubMed Journal: J Pers Med ISSN: 2075-4426
Figure 1. The process for calculating a probability score of autism from the categorical answers provided by crowdsourced workers. (A) Workers answer a series of multiple-choice questions per video that correspond to (B) categorical ordinal variables used in the input feature matrix to the (C) logistic regression classifier trained on electronic medical record data. This classifier emits a probability score for autism, which is the probability of the correct class when the true class is autism and 1 minus this probability when the true class is neurotypical (the latter case is depicted). (D) A vector of these probabilities is used to calculate mean worker and mean video probabilities of the correct class.
Figure 2. Distribution of average classifier probability of the correct class per video with at least ten ratings from (A) MTurk workers, (B) Bangladesh Microworkers, (C) Kenya Microworkers, and (D) Philippines Microworkers. There is wide variability in the difficulty level of rated videos.
Figure 3. Distribution of average probability of the correct class per (A) MTurk worker, (B) Bangladesh Microworker, (C) Kenya Microworker, and (D) Philippines Microworker who provided at least ten ratings. There is wide variability in the ability of workers to provide accurate categorical labels.
Figure 4. Mean classifier confidence per video vs. per worker for (A) MTurk workers, (B) Bangladesh Microworkers, (C) Kenya Microworkers, and (D) Philippines Microworkers for videos with at least 10 ratings and workers who provided at least ten ratings. Each vertical line of points contains the difficulty levels of videos rated for one worker, visually demonstrating that workers received similar distributions of video difficulties to rate despite displaying large variation in average diagnostic confidence.
Figure 5. Crowd filtration pipeline. Crowdsourced workers are first evaluated globally. The highest performers from each location are further evaluated for one or more rounds until a final skilled workforce is curated. These “super recognizers” may then be repeatedly employed in global clinical workflows.
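The scoring step described in panels (C) and (D) might be sketched as follows. This is a minimal illustration, not the authors' code: the record layout, the field names (`worker`, `video`, `p_autism`, `true_label`), and the sample values are all assumptions, and the classifier's autism probability is taken as already computed.

```python
from collections import defaultdict
from statistics import mean


def prob_correct(p_autism, true_label):
    """Convert the classifier's autism probability into the probability
    of the correct class: p for autism, 1 - p for neurotypical."""
    if true_label == "autism":
        return p_autism
    if true_label == "neurotypical":
        return 1.0 - p_autism
    raise ValueError(f"unknown label: {true_label!r}")


def mean_by(ratings, key):
    """Average the probability of the correct class, grouped by `key`
    (for example "worker" or "video")."""
    groups = defaultdict(list)
    for r in ratings:
        groups[r[key]].append(prob_correct(r["p_autism"], r["true_label"]))
    return {k: mean(v) for k, v in groups.items()}


# Hypothetical ratings: one record per (worker, video) pair.
ratings = [
    {"worker": "w1", "video": "v1", "p_autism": 0.9, "true_label": "autism"},
    {"worker": "w1", "video": "v2", "p_autism": 0.2, "true_label": "neurotypical"},
    {"worker": "w2", "video": "v1", "p_autism": 0.4, "true_label": "autism"},
    {"worker": "w2", "video": "v2", "p_autism": 0.7, "true_label": "neurotypical"},
]

worker_means = mean_by(ratings, "worker")  # mean per worker
video_means = mean_by(ratings, "video")    # mean per video
```

With these made-up values, worker `w1` averages about 0.85 and worker `w2` about 0.35, while the two videos average about 0.65 and 0.55.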
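The filtration rounds in this caption could be sketched roughly as below. The caption does not say how the highest performers are chosen, so the selection rule here (keep a fixed top fraction per location each round) is an assumption, as are the names `filter_crowd`, `evaluate`, `rounds`, and `keep_fraction`.

```python
import math
from collections import defaultdict


def filter_crowd(workers, evaluate, rounds=2, keep_fraction=0.5):
    """Repeatedly keep the top performers within each location.

    workers:  dict mapping worker id -> location
    evaluate: callable (worker_id, round_index) -> score, e.g. the mean
              probability of the correct class in that round
    """
    current = dict(workers)
    for rnd in range(rounds):
        # Score the remaining workers, grouped by location.
        by_location = defaultdict(list)
        for wid, loc in current.items():
            by_location[loc].append((evaluate(wid, rnd), wid))
        kept = {}
        for loc, scored in by_location.items():
            scored.sort(reverse=True)  # highest score first
            # Assumed rule: keep a top fraction, but at least one worker.
            n_keep = max(1, math.ceil(len(scored) * keep_fraction))
            for _, wid in scored[:n_keep]:
                kept[wid] = loc
        current = kept
    return current


# Hypothetical workers and fixed scores, for illustration only.
workers = {"a": "Kenya", "b": "Kenya", "c": "Kenya", "d": "Kenya",
           "e": "Philippines", "f": "Philippines"}
scores = {"a": 0.9, "b": 0.8, "c": 0.6, "d": 0.4, "e": 0.85, "f": 0.5}

final_workforce = filter_crowd(workers, lambda wid, rnd: scores[wid])
```

In practice each round would score workers on a fresh set of videos rather than reuse fixed scores. A threshold on the mean probability of the correct class would be an equally plausible selection rule.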