Automated extraction of mutation data from the literature: application of MuteXt to G protein-coupled receptors and nuclear hormone receptors.

Florence Horn, Anthony L Lau, Fred E Cohen.

Abstract

MOTIVATION: The amount of genomic and proteomic data published daily in the scientific literature is outstripping the ability of experimental scientists to stay current. Reviews, the traditional medium for collating published observations, are also unable to keep pace. For some specific classes of information (e.g. sequences and protein structures), obligatory data deposition policies have helped. However, a great deal of other valuable information is spread throughout the literature, which hinders coherent access. We are involved in the Molecular Class-Specific Information System (MCSIS) project, a collaborative effort to design and automate the maintenance of protein family databases. The first two databases, the GPCRDB and NucleaRDB, are focused on G protein-coupled receptors (GPCRs) and nuclear hormone receptors (NRs), respectively. The main aim of the MCSIS project is to gather heterogeneous data from a variety of electronic and literature sources in order to draw new inferences about the target protein families.
RESULTS: We present a computational method that identifies and extracts mutation data from the scientific literature. We focused on the extraction of single point mutations for the GPCR and NR superfamilies. After validation by plausibility filters, the mutation data are integrated into the corresponding MCSIS, where they are combined with structural and sequence information already stored in these databases. We extracted and validated 2736 true point mutations from 914 articles on GPCRs and 785 true point mutations from 1094 articles on NRs. The current version of our automated extraction algorithm identifies 49.3% of the GPCR point mutations with a specificity of 87.9%, and 64.5% of the NR point mutations with a specificity of 85.8%. MuteXt routinely analyzes 100 electronic articles in approximately 1 h.
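
The abstract does not spell out how MuteXt recognizes mutations or what its plausibility filters check, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that point mutations appear in the common wild-type/position/mutant notation (e.g. D113N or Ser204Ala) and that one plausible filter is to confirm that the stated wild-type residue really occurs at that position in a reference sequence. The function names `find_point_mutations` and `is_plausible`, the regular expression, and the toy sequence are all hypothetical.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_LETTER = "ACDEFGHIKLMNPQRSTVWY"
_THREE = "|".join(THREE_TO_ONE)

# Assumed notation: <wild type><position><mutant>, e.g. "D113N" or "Ser204Ala".
MUTATION_RE = re.compile(
    rf"\b(?:(?P<w1>[{ONE_LETTER}])(?P<p1>\d+)(?P<m1>[{ONE_LETTER}])"
    rf"|(?P<w3>{_THREE})(?P<p3>\d+)(?P<m3>{_THREE}))\b"
)


def find_point_mutations(text):
    """Return (wild_type, position, mutant) tuples in one-letter code."""
    hits = []
    for m in MUTATION_RE.finditer(text):
        if m.group("w1"):
            wt, pos, mut = m.group("w1"), m.group("p1"), m.group("m1")
        else:
            wt = THREE_TO_ONE[m.group("w3")]
            pos = m.group("p3")
            mut = THREE_TO_ONE[m.group("m3")]
        if wt != mut:  # a substitution must change the residue
            hits.append((wt, int(pos), mut))
    return hits


def is_plausible(mutation, sequence):
    """Hypothetical filter: the wild-type residue must match the sequence."""
    wt, pos, _ = mutation
    return 1 <= pos <= len(sequence) and sequence[pos - 1] == wt


# Toy example: "T47D" is a cell line name that merely looks like a mutation.
text = "Mutants D3N and Ser5Ala abolished binding; T47D cells were used."
sequence = "MKDLSQ"  # made-up reference sequence, for illustration only

candidates = find_point_mutations(text)
validated = [c for c in candidates if is_plausible(c, sequence)]
print(candidates)  # [('D', 3, 'N'), ('S', 5, 'A'), ('T', 47, 'D')]
print(validated)   # [('D', 3, 'N'), ('S', 5, 'A')]
```

In this toy case the pattern alone picks up the cell line name T47D as a candidate, and the sequence check discards it. That illustrates why some form of validation against stored sequence data matters for the precision of pattern-based extraction.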

Year: 2004   PMID: 14990452   DOI: 10.1093/bioinformatics/btg449

Source DB: PubMed   Journal: Bioinformatics   ISSN: 1367-4803   Impact factor: 6.937


  54 in total

1.  An examination of the OMIM database for associating mutation to a consensus reference sequence.

Authors:  Zuofeng Li; Beili Ying; Xingnan Liu; Xiaoyan Zhang; Hong Yu
Journal:  Protein Cell       Date:  2012-04-04       Impact factor: 14.870

2.  Toward an automatic method for extracting cancer- and other disease-related point mutations from the biomedical literature.

Authors:  Emily Doughty; Attila Kertesz-Farkas; Olivier Bodenreider; Gary Thompson; Asa Adadey; Thomas Peterson; Maricel G Kann
Journal:  Bioinformatics       Date:  2010-12-07       Impact factor: 6.937

Review 3.  Biomedical language processing: what's beyond PubMed?

Authors:  Lawrence Hunter; K Bretonnel Cohen
Journal:  Mol Cell       Date:  2006-03-03       Impact factor: 17.970

Review 4.  Frontiers of biomedical text mining: current progress.

Authors:  Pierre Zweigenbaum; Dina Demner-Fushman; Hong Yu; Kevin B Cohen
Journal:  Brief Bioinform       Date:  2007-10-30       Impact factor: 11.622

5.  Manual curation is not sufficient for annotation of genomic databases.

Authors:  William A Baumgartner; K Bretonnel Cohen; Lynne M Fox; George Acquaah-Mensah; Lawrence Hunter
Journal:  Bioinformatics       Date:  2007-07-01       Impact factor: 6.937

6.  Prediction of functional nonsynonymous single nucleotide polymorphisms in human G-protein-coupled receptors.

Authors:  Dan Xue; Jingyuan Yin; Mingfeng Tan; Junjie Yue; Yuelan Wang; Long Liang
Journal:  J Hum Genet       Date:  2008-02-26       Impact factor: 3.172

7.  Intrinsic evaluation of text mining tools may not predict performance on realistic tasks.

Authors:  J Gregory Caporaso; Nita Deshpande; J Lynn Fink; Philip E Bourne; K Bretonnel Cohen; Lawrence Hunter
Journal:  Pac Symp Biocomput       Date:  2008

Review 8.  Recent progress in automatically extracting information from the pharmacogenomic literature.

Authors:  Yael Garten; Adrien Coulet; Russ B Altman
Journal:  Pharmacogenomics       Date:  2010-10       Impact factor: 2.533

9.  Pharmspresso: a text mining tool for extraction of pharmacogenomic concepts and relationships from full text.

Authors:  Yael Garten; Russ B Altman
Journal:  BMC Bioinformatics       Date:  2009-02-05       Impact factor: 3.169

10.  Extraction of human kinase mutations from literature, databases and genotyping studies.

Authors:  Martin Krallinger; Jose M G Izarzugaza; Carlos Rodriguez-Penagos; Alfonso Valencia
Journal:  BMC Bioinformatics       Date:  2009-08-27       Impact factor: 3.169
