
Intrinsic evaluation of text mining tools may not predict performance on realistic tasks.

J Gregory Caporaso, Nita Deshpande, J Lynn Fink, Philip E Bourne, K Bretonnel Cohen, Lawrence Hunter.

Abstract

Biomedical text mining and other automated techniques are beginning to achieve performance that suggests they could be applied to aid database curators. However, few studies have evaluated how these systems might work in practice. In this article, we focus on the problem of annotating mutations in Protein Data Bank (PDB) entries, and evaluate how the performance of two automated techniques, a text-mining-based approach (MutationFinder) and an alignment-based approach, compares in intrinsic versus extrinsic evaluations. We find that high performance on gold standard data (an intrinsic evaluation) does not necessarily translate to high performance for database annotation (an extrinsic evaluation). We show that this is in part a result of lack of access to the full text of journal articles, which appears to be critical for comprehensive database annotation by text mining. Additionally, we evaluate the accuracy and completeness of manually annotated mutation data in the PDB, and find that it is far from perfect. We conclude that currently the most cost-effective and reliable approach for database annotation might combine manual and automatic annotation methods.
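The distinction between the two kinds of evaluation can be illustrated with a minimal sketch. It is not taken from the article: the function name `precision_recall_f1`, the document and entry identifiers, and every mutation string below are hypothetical, and the metrics shown (precision, recall, F-measure over sets of pairs) are only an assumption about how such comparisons are commonly scored. The same extractor output is compared first against a gold standard of mentions in the text it actually read (intrinsic), then against the full set of mutations a database entry should carry (extrinsic).

```python
def precision_recall_f1(predicted, gold):
    """Score a set of predicted items against a set of reference items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Intrinsic evaluation (hypothetical data): mutation mentions found in
# abstracts, scored against a hand-annotated gold standard of those abstracts.
gold_mentions = {("doc1", "A42G"), ("doc1", "L10P"), ("doc2", "R5K")}
extracted_mentions = {("doc1", "A42G"), ("doc2", "R5K")}
intrinsic = precision_recall_f1(extracted_mentions, gold_mentions)

# Extrinsic evaluation (hypothetical data): mutations assigned to database
# entries, scored against everything the entries should be annotated with.
# Mutations reported only in the full text are missed if only abstracts are read.
entry_annotations = {
    ("entry1", "A42G"), ("entry1", "L10P"),
    ("entry2", "R5K"), ("entry2", "T7S"),
}
extracted_for_entries = {("entry1", "A42G")}
extrinsic = precision_recall_f1(extracted_for_entries, entry_annotations)

print("intrinsic P/R/F1:", intrinsic)
print("extrinsic P/R/F1:", extrinsic)
```

In this toy setup precision stays high in both settings while recall drops in the extrinsic one, which is the kind of gap the abstract describes; the numbers themselves carry no meaning beyond the illustration.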

Year:  2008        PMID: 18229722      PMCID: PMC2517250     

Source DB:  PubMed          Journal:  Pac Symp Biocomput        ISSN: 2335-6928


  17 in total

1.  The Protein Data Bank.

Authors:  H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  Factors associated with success in searching MEDLINE and applying evidence to answer clinical questions.

Authors:  William R Hersh; M Katherine Crabtree; David H Hickam; Lynetta Sacherek; Charles P Friedman; Patricia Tidmarsh; Craig Mosbaek; Dale Kraemer
Journal:  J Am Med Inform Assoc       Date:  2002 May-Jun       Impact factor: 4.497

3.  Automated extraction of mutation data from the literature: application of MuteXt to G protein-coupled receptors and nuclear hormone receptors.

Authors:  Florence Horn; Anthony L Lau; Fred E Cohen
Journal:  Bioinformatics       Date:  2004-01-22       Impact factor: 6.937

Review 4.  A survey of current work in biomedical text mining.

Authors:  Aaron M Cohen; William R Hersh
Journal:  Brief Bioinform       Date:  2005-03       Impact factor: 11.622

Review 5.  Biomedical language processing: what's beyond PubMed?

Authors:  Lawrence Hunter; K Bretonnel Cohen
Journal:  Mol Cell       Date:  2006-03-03       Impact factor: 17.970

6.  Enhanced semantic access to the protein engineering literature using ontologies populated by text mining.

Authors:  Rene Witte; Thomas Kappler; Christopher J O Baker
Journal:  Int J Bioinform Res Appl       Date:  2007

7.  PreBIND and Textomy--mining the biomedical literature for protein-protein interactions using a support vector machine.

Authors:  Ian Donaldson; Joel Martin; Berry de Bruijn; Cheryl Wolting; Vicki Lay; Brigitte Tuekam; Shudong Zhang; Berivan Baskin; Gary D Bader; Katerina Michalickova; Tony Pawson; Christopher W V Hogue
Journal:  BMC Bioinformatics       Date:  2003-03-27       Impact factor: 3.169

8.  Textpresso: an ontology-based information retrieval and extraction system for biological literature.

Authors:  Hans-Michael Müller; Eimear E Kenny; Paul W Sternberg
Journal:  PLoS Biol       Date:  2004-09-21       Impact factor: 8.029

9.  Automatic extraction of mutations from Medline and cross-validation with OMIM.

Authors:  Dietrich Rebholz-Schuhmann; Stephane Marcel; Sylvie Albert; Ralf Tolle; Georg Casari; Harald Kirsch
Journal:  Nucleic Acids Res       Date:  2004-01-02       Impact factor: 16.971

10.  An evaluation of GO annotation retrieval for BioCreAtIvE and GOA.

Authors:  Evelyn B Camon; Daniel G Barrell; Emily C Dimmer; Vivian Lee; Michele Magrane; John Maslen; David Binns; Rolf Apweiler
Journal:  BMC Bioinformatics       Date:  2005-05-24       Impact factor: 3.169

  19 in total

1.  TRANSLATING BIOLOGY: TEXT MINING TOOLS THAT WORK.

Authors:  K Bretonnel Cohen; Hong Yu; Philip E Bourne; Lynette Hirschman
Journal:  Pac Symp Biocomput       Date:  2008-01-01

2.  Text mining for modeling of protein complexes enhanced by machine learning.

Authors:  Varsha D Badal; Petras J Kundrotas; Ilya A Vakser
Journal:  Bioinformatics       Date:  2021-05-01       Impact factor: 6.937

3.  Improving precision in concept normalization.

Authors:  Mayla Boguslav; K Bretonnel Cohen; William A Baumgartner; Lawrence E Hunter
Journal:  Pac Symp Biocomput       Date:  2018

Review 4.  What the papers say: text mining for genomics and systems biology.

Authors:  Nathan Harmston; Wendy Filsell; Michael P H Stumpf
Journal:  Hum Genomics       Date:  2010-10       Impact factor: 4.639

5.  Prospects for the automated extraction of mutation data from the scientific literature.

Authors:  Peter D Stenson; David N Cooper
Journal:  Hum Genomics       Date:  2010-10       Impact factor: 4.639

6.  Algorithms and semantic infrastructure for mutation impact extraction and grounding.

Authors:  Jonas B Laurila; Nona Naderi; René Witte; Alexandre Riazanov; Alexandre Kouznetsov; Christopher J O Baker
Journal:  BMC Genomics       Date:  2010-12-02       Impact factor: 3.969

7.  Text mining improves prediction of protein functional sites.

Authors:  Karin M Verspoor; Judith D Cohn; Komandur E Ravikumar; Michael E Wall
Journal:  PLoS One       Date:  2012-02-29       Impact factor: 3.240

8.  Using ODIN for a PharmGKB revalidation experiment.

Authors:  Fabio Rinaldi; Simon Clematide; Yael Garten; Michelle Whirl-Carrillo; Li Gong; Joan M Hebert; Katrin Sangkuhl; Caroline F Thorn; Teri E Klein; Russ B Altman
Journal:  Database (Oxford)       Date:  2012-04-23       Impact factor: 3.451

9.  Biomedical text mining and its applications.

Authors:  Raul Rodriguez-Esteban
Journal:  PLoS Comput Biol       Date:  2009-12-24       Impact factor: 4.475

10.  Literature mining of protein-residue associations with graph rules learned through distant supervision.

Authors:  Ke Ravikumar; Haibin Liu; Judith D Cohn; Michael E Wall; Karin Verspoor
Journal:  J Biomed Semantics       Date:  2012-10-05
