Microtask crowdsourcing for disease mention annotation in PubMed abstracts.

Benjamin M Good, Max Nanis, Chunlei Wu, Andrew I Su.

Abstract

Identifying concepts and relationships in biomedical text enables knowledge to be applied in computational analyses. Many biological natural language processing (BioNLP) projects attempt to address this challenge, but the state of the art still leaves much room for improvement. Progress in BioNLP research depends on large, annotated corpora for evaluating information extraction systems and training machine learning models. Traditionally, such corpora are created by small numbers of expert annotators, often working over extended periods of time. Recent studies have shown that workers on microtask crowdsourcing platforms such as Amazon's Mechanical Turk (AMT) can, in aggregate, generate high-quality annotations of biomedical text. Here, we investigated the use of AMT in capturing disease mentions in PubMed abstracts. We used the NCBI Disease corpus as a gold standard for refining and benchmarking our crowdsourcing protocol. After several iterations, we arrived at a protocol that reproduced the annotations of the 593 documents in the 'training set' of this gold standard with an overall F measure of 0.872 (precision 0.862, recall 0.883). The output can also be tuned to optimize for precision (max = 0.984 when recall = 0.269) or recall (max = 0.980 when precision = 0.436). Each document was completed by 15 workers, and their annotations were merged based on a simple voting method. In total, 145 workers combined to complete all 593 documents in the span of 9 days at a cost of $0.066 per abstract per worker. The quality of the annotations, as judged with the F measure, increases with the number of workers assigned to each task; however, minimal performance gains were observed beyond 8 workers per task. These results add further evidence that microtask crowdsourcing can be a valuable tool for generating well-annotated corpora in BioNLP. Data produced for this analysis are available at http://figshare.com/articles/Disease_Mention_Annotation_with_Mechanical_Turk/1126402.
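
The vote-based merging and the precision/recall trade-off described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract only says "a simple voting method", so the exact-span matching, the `min_votes` threshold, the function names, and the toy data below are all assumptions made for illustration.

```python
from collections import Counter


def merge_by_vote(worker_annotations, min_votes):
    """Keep every span marked by at least `min_votes` workers.

    Assumption: a span is a (start, end) character-offset tuple and two
    workers agree only if they marked exactly the same span.
    """
    votes = Counter()
    for spans in worker_annotations:
        votes.update(set(spans))  # each worker votes at most once per span
    return {span for span, n in votes.items() if n >= min_votes}


def f_measure(precision, recall):
    """Balanced F measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(predicted, gold):
    """Score a set of predicted spans against a set of gold spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


# Hypothetical toy data: five workers annotating one abstract.
gold = {(0, 15), (40, 52)}
workers = [
    [(0, 15), (40, 52)],
    [(0, 15), (60, 70)],
    [(0, 15), (40, 52), (60, 70)],
    [(0, 15)],
    [(40, 52), (80, 90)],
]

# Sweep the vote threshold to see how it trades recall for precision.
for k in range(1, len(workers) + 1):
    p, r, f = precision_recall_f(merge_by_vote(workers, k), gold)
    print(f"min_votes={k}: precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

In this sketch a low threshold accepts any span a single worker marked, which favors recall, while a high threshold keeps only widely agreed spans, which favors precision. That is one plausible way the output could be "tuned" as the abstract describes. As a consistency check, `f_measure(0.862, 0.883)` evaluates to roughly 0.872, matching the reported overall F measure.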

Year:  2015        PMID: 25592589      PMCID: PMC4299946     

Source DB:  PubMed          Journal:  Pac Symp Biocomput        ISSN: 2335-6928


