Semi-automatic semantic annotation of PubMed queries: a study on quality, efficiency, satisfaction.

Aurélie Névéol, Rezarta Islamaj Doğan, Zhiyong Lu.

Abstract

Information processing algorithms require significant amounts of annotated data for training and testing. The availability of such data is often hindered by the complexity and high cost of production. In this paper, we investigate the benefits of a state-of-the-art tool to help with the semantic annotation of a large set of biomedical queries. Seven annotators were recruited to annotate a set of 10,000 PubMed® queries with 16 biomedical and bibliographic categories. About half of the queries were annotated from scratch, while the other half were automatically pre-annotated and manually corrected. The impact of the automatic pre-annotations was assessed on several aspects of the task: time, number of actions, annotator satisfaction, inter-annotator agreement, and the quality and number of the resulting annotations. The analysis of annotation results showed that the number of required hand annotations was 28.9% lower when pre-annotated results from automatic tools were used. As a result, the overall annotation time was substantially lower when pre-annotations were used, while inter-annotator agreement was significantly higher. In addition, there was no statistically significant difference in the semantic distribution or number of annotations produced when pre-annotations were used. The annotated query corpus is freely available to the research community. This study shows that most annotators find automatic pre-annotations helpful. Our experience suggests using an automatic tool to assist large-scale manual annotation projects. Doing so helps speed up annotation and improve annotation consistency while maintaining the high quality of the final annotations. Published by Elsevier Inc.

Year:  2010        PMID: 21094696      PMCID: PMC3063330          DOI: 10.1016/j.jbi.2010.11.001

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317

