Luca Pion-Tonachini, Scott Makeig, Ken Kreutz-Delgado.
Abstract
Large, unlabeled datasets are abundant nowadays, but getting labels for those datasets can be expensive and time-consuming. Crowd labeling is a crowdsourcing approach for gathering such labels from workers whose suggestions are not always accurate. While a variety of algorithms exist for this purpose, we present crowd labeling latent Dirichlet allocation (CL-LDA), a generalization of latent Dirichlet allocation that can solve a more general set of crowd labeling problems. We show that it performs as well as, and at times better than, other methods on a variety of simulated and actual datasets while treating each label as compositional rather than as indicating a discrete class. In addition, prior knowledge of workers' abilities can be incorporated into the model through a structured Bayesian framework. We then apply CL-LDA to the EEG independent component labeling dataset, using its generalizations to further explore the utility of the algorithm. We discuss prospects for creating classifiers from the generated labels.
Keywords: Bayesian; Crowd labelling; EEG; Generative model; Latent Dirichlet allocation
Year: 2017; PMID: 30416242; PMCID: PMC6223327; DOI: 10.1007/s10115-017-1053-1
Source DB: PubMed; Journal: Knowl Inf Syst; ISSN: 0219-3116; Impact factor: 2.822