

Microtask crowdsourcing for disease mention annotation in PubMed abstracts.

Benjamin M Good, Max Nanis, Chunlei Wu, Andrew I Su.

Abstract

Identifying concepts and relationships in biomedical text enables knowledge to be applied in computational analyses. Many biological natural language processing (BioNLP) projects attempt to address this challenge, but the state of the art still leaves much room for improvement. Progress in BioNLP research depends on large, annotated corpora for evaluating information extraction systems and training machine learning models. Traditionally, such corpora are created by small numbers of expert annotators, often working over extended periods of time. Recent studies have shown that workers on microtask crowdsourcing platforms such as Amazon's Mechanical Turk (AMT) can, in aggregate, generate high-quality annotations of biomedical text. Here, we investigated the use of AMT for capturing disease mentions in PubMed abstracts. We used the NCBI Disease corpus as a gold standard for refining and benchmarking our crowdsourcing protocol. After several iterations, we arrived at a protocol that reproduced the annotations of the 593 documents in the 'training set' of this gold standard with an overall F measure of 0.872 (precision 0.862, recall 0.883). The output can also be tuned to optimize for precision (max = 0.984 when recall = 0.269) or recall (max = 0.980 when precision = 0.436). Each document was completed by 15 workers, and their annotations were merged based on a simple voting method. In total, 145 workers combined to complete all 593 documents in the span of 9 days at a cost of $0.066 per abstract per worker. The quality of the annotations, as judged with the F measure, increases with the number of workers assigned to each task; however, minimal performance gains were observed beyond 8 workers per task. These results add further evidence that microtask crowdsourcing can be a valuable tool for generating well-annotated corpora in BioNLP. Data produced for this analysis are available at http://figshare.com/articles/Disease_Mention_Annotation_with_Mechanical_Turk/1126402.
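The voting-based merge and the precision/recall trade-off described above can be illustrated with a minimal Python sketch. The abstract says only that annotations were merged with "a simple voting method", so the details here are assumptions made for illustration: annotations are represented as hypothetical (start, end) character spans, a span is kept when at least `min_votes` workers marked it, and a prediction counts as correct only on an exact match with a gold span. The function names are likewise illustrative and are not taken from the authors' code.

```python
from collections import Counter


def merge_by_votes(worker_annotations, min_votes):
    """Keep every span marked by at least `min_votes` workers (assumed rule)."""
    votes = Counter()
    for spans in worker_annotations:
        votes.update(set(spans))  # a worker counts at most once per span
    return {span for span, n in votes.items() if n >= min_votes}


def f_measure(precision, recall):
    """Balanced F measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(predicted, gold):
    """Score predicted spans against gold spans using exact matching."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


# Toy data (invented for illustration): three workers, two gold disease mentions.
gold = {(0, 13), (40, 52)}
workers = [
    [(0, 13), (40, 52)],
    [(0, 13)],
    [(0, 13), (60, 70)],
]

# Raising the vote threshold trades recall for precision.
for min_votes in (1, 2, 3):
    merged = merge_by_votes(workers, min_votes)
    print(min_votes, precision_recall_f(merged, gold))

# The reported precision and recall are consistent with the reported F measure.
print(round(f_measure(0.862, 0.883), 3))
```

In this toy case a threshold of one vote keeps every span (high recall, lower precision), while a threshold of two or three keeps only the span all workers agree on (high precision, lower recall), which mirrors the tuning behaviour reported in the abstract.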

Year: 2015 PMID: 25592589 PMCID: PMC4299946

Source DB: PubMed Journal: Pac Symp Biocomput ISSN: 2335-6928

