
Toward an automatic method for extracting cancer- and other disease-related point mutations from the biomedical literature.

Emily Doughty, Attila Kertesz-Farkas, Olivier Bodenreider, Gary Thompson, Asa Adadey, Thomas Peterson, Maricel G Kann.

Abstract

MOTIVATION: A major goal of biomedical research in personalized medicine is to find relationships between mutations and their corresponding disease phenotypes. However, most of the disease-related mutational data are currently buried in the biomedical literature in textual form and lack the necessary structure to allow easy retrieval and visualization. We introduce a high-throughput computational method for the identification of relevant disease mutations in PubMed abstracts applied to prostate (PCa) and breast cancer (BCa) mutations.
RESULTS: We developed the extractor of mutations (EMU) tool to identify mutations and their associated genes. We benchmarked EMU against MutationFinder, a tool that extracts point mutations from text. Our results show that both methods achieve comparable performance on two manually curated datasets. We also benchmarked EMU's performance for extracting the complete mutational information and phenotype. Remarkably, we show that one of the steps in our approach, a filter based on sequence analysis, increases the precision for that task from 0.34 to 0.59 (PCa) and from 0.39 to 0.61 (BCa). We also show that this high-throughput approach can be extended to other diseases.
DISCUSSION: Our method improves the current status of disease-mutation databases by significantly increasing the number of annotated mutations. We found 51 and 128 mutations manually verified to be related to PCa and BCa, respectively, that are not currently annotated for these cancer types in the OMIM or Swiss-Prot databases. EMU's retrieval performance represents a 2-fold improvement in the number of annotated mutations for PCa and BCa. We further show that our method can benefit from full-text analysis once there is an increase in Open Access availability of full-text articles.
AVAILABILITY: Freely available at http://bioinf.umbc.edu/EMU/ftp.
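
To make the two ideas in the abstract concrete (pattern-based extraction of point mutations, followed by a sequence-based filter), here is a minimal Python sketch. It is not EMU's implementation: the regular expression, the function names `extract_mutations` and `passes_sequence_filter`, and the filtering rule itself (keep a mutation only if its wild-type residue matches the protein sequence at the stated position) are assumptions chosen for illustration, since the abstract does not describe the actual patterns or filter.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE = "ACDEFGHIKLMNPQRSTVWY"
THREE = "|".join(THREE_TO_ONE)

# Hypothetical pattern covering two common notations: "A123T" and "Ala123Thr".
# Real tools use far larger pattern sets; this is only a sketch.
POINT_MUTATION = re.compile(
    rf"\b(?:([{ONE}])(\d+)([{ONE}])|({THREE})(\d+)({THREE}))\b"
)


def extract_mutations(text):
    """Return (wild_type, position, mutant) tuples in one-letter code."""
    found = []
    for m in POINT_MUTATION.finditer(text):
        if m.group(1):  # one-letter form, e.g. "G3D"
            wt, pos, mut = m.group(1), m.group(2), m.group(3)
        else:           # three-letter form, e.g. "Ala2Val"
            wt = THREE_TO_ONE[m.group(4)]
            pos = m.group(5)
            mut = THREE_TO_ONE[m.group(6)]
        found.append((wt, int(pos), mut))
    return found


def passes_sequence_filter(mutation, protein_sequence):
    """Assumed filter: the wild-type residue must occur at the 1-based
    position in the candidate gene's protein sequence."""
    wt, pos, _ = mutation
    in_range = 1 <= pos <= len(protein_sequence)
    return in_range and protein_sequence[pos - 1] == wt


if __name__ == "__main__":
    sentence = "The G3D and Ala2Val variants, plus K3R."
    toy_sequence = "MAGK"  # made-up sequence, for illustration only
    for mutation in extract_mutations(sentence):
        print(mutation, passes_sequence_filter(mutation, toy_sequence))
```

In this toy run, a candidate such as K3R is rejected because residue 3 of the made-up sequence is G, not K. A check of this kind is one plausible way a sequence filter could remove false gene-mutation pairings and so raise precision.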

Year:  2010        PMID: 21138947      PMCID: PMC3031038          DOI: 10.1093/bioinformatics/btq667

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


Cited by (40 in total)

1.  Beyond accuracy: creating interoperable and scalable text-mining web services.

Authors:  Chih-Hsuan Wei; Robert Leaman; Zhiyong Lu
Journal:  Bioinformatics       Date:  2016-02-16       Impact factor: 6.937

2.  Incorporating molecular and functional context into the analysis and prioritization of human variants associated with cancer.

Authors:  Thomas A Peterson; Nathan L Nehrt; Dohwan Park; Maricel G Kann
Journal:  J Am Med Inform Assoc       Date:  2012 Mar-Apr       Impact factor: 4.497

Review 3.  Crowdsourcing in biomedicine: challenges and opportunities.

Authors:  Ritu Khare; Benjamin M Good; Robert Leaman; Andrew I Su; Zhiyong Lu
Journal:  Brief Bioinform       Date:  2015-04-17       Impact factor: 11.622

Review 4.  Text Mining for Precision Medicine: Bringing Structure to EHRs and Biomedical Literature to Understand Genes and Health.

Authors:  Michael Simmons; Ayush Singhal; Zhiyong Lu
Journal:  Adv Exp Med Biol       Date:  2016       Impact factor: 2.622

5.  DES-Mutation: System for Exploring Links of Mutations and Diseases.

Authors:  Vasiliki Kordopati; Adil Salhi; Rozaimi Razali; Aleksandar Radovanovic; Faroug Tifratene; Mahmut Uludag; Yu Li; Ameerah Bokhari; Ahdab AlSaieedi; Arwa Bin Raies; Christophe Van Neste; Magbubah Essack; Vladimir B Bajic
Journal:  Sci Rep       Date:  2018-09-06       Impact factor: 4.379

6.  tmVar: a text mining approach for extracting sequence variants in biomedical literature.

Authors:  Chih-Hsuan Wei; Bethany R Harris; Hung-Yu Kao; Zhiyong Lu
Journal:  Bioinformatics       Date:  2013-04-05       Impact factor: 6.937

7.  Text mining for precision medicine: automating disease-mutation relationship extraction from biomedical literature.

Authors:  Ayush Singhal; Michael Simmons; Zhiyong Lu
Journal:  J Am Med Inform Assoc       Date:  2016-04-27       Impact factor: 4.497

8.  ResidueFinder: extracting individual residue mentions from protein literature.

Authors:  Ton E Becker; Eric Jakobsson
Journal:  J Biomed Semantics       Date:  2021-07-21

9.  tmVar 2.0: integrating genomic variant information from literature with dbSNP and ClinVar for precision medicine.

Authors:  Chih-Hsuan Wei; Lon Phan; Juliana Feltz; Rama Maiti; Tim Hefferon; Zhiyong Lu
Journal:  Bioinformatics       Date:  2018-01-01       Impact factor: 6.937

Review 10.  Towards precision medicine: advances in computational approaches for the analysis of human variants.

Authors:  Thomas A Peterson; Emily Doughty; Maricel G Kann
Journal:  J Mol Biol       Date:  2013-08-17       Impact factor: 5.469
