
Automatic extraction of mutations from Medline and cross-validation with OMIM.

Dietrich Rebholz-Schuhmann, Stephane Marcel, Sylvie Albert, Ralf Tolle, Georg Casari, Harald Kirsch.

Abstract

Mutations help us to understand the molecular origins of diseases. Researchers therefore both publish and seek disease-relevant mutations in public databases and in the scientific literature, e.g. Medline. Manual retrieval tends to be time-consuming and incomplete; automated screening of the literature is more efficient. We developed extraction methods (called MEMA) that scan Medline abstracts for mutations. MEMA identified 24,351 singleton mutations in conjunction with a HUGO gene name in 16,728 abstracts. From a sample of 100 abstracts, we estimated the recall for the identification of mutation-gene pairs at 35% with a precision of 93%. Recall for mutation detection alone was >67% with a precision of >96%. This shows that our system produces reliable data. The subset of protein sequence mutations (PSMs) from MEMA was compared to the entries in OMIM (20,503 entries versus 6699, respectively). We found 1826 PSM-gene pairs common to both datasets (cross-validated). This is 27% of all PSM-gene pairs in OMIM and 91% of those pairs from OMIM that co-occur in at least one Medline abstract. We conclude that Medline covers a large portion of the mutations known to OMIM. Another large portion of the extracted mutations could be artificially produced mutations from mutagenesis experiments. The database of extracted mutation-gene pairs is accessible through the web pages of the EBI (http://www.ebi.ac.uk/rebholz/index.html).
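The abstract describes MEMA only at a high level. As a rough illustration, the Python sketch below shows one way such a pipeline could work: regular expressions detect protein sequence mutations (PSMs) written as `Arg117His` or `R117H`, each PSM is paired with the HUGO gene symbols that co-occur in the same abstract, and the mined pairs are intersected with a reference set such as OMIM. The patterns, the co-occurrence pairing rule, `HUGO_SYMBOLS`, and all example pairs are hypothetical assumptions, not MEMA's actual implementation or data.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
AA3 = "|".join(THREE_TO_ONE)
AA1 = "".join(THREE_TO_ONE.values())

# Simplified PSM patterns ("Arg117His" or "R117H"). Assumption for
# illustration only; these are not MEMA's actual extraction rules.
PSM_RE = re.compile(
    rf"\b(?:(?P<wt3>{AA3})(?P<pos3>\d+)(?P<mt3>{AA3})"
    rf"|(?P<wt1>[{AA1}])(?P<pos1>\d+)(?P<mt1>[{AA1}]))\b"
)

# Hypothetical stand-in for the full HUGO gene nomenclature.
HUGO_SYMBOLS = {"CFTR", "HBB", "TP53"}


def extract_psms(text: str) -> set[str]:
    """Return PSMs normalised to one-letter form, e.g. 'R117H'."""
    found = set()
    for m in PSM_RE.finditer(text):
        if m.group("wt3"):
            wt = THREE_TO_ONE[m.group("wt3")]
            pos = m.group("pos3")
            mt = THREE_TO_ONE[m.group("mt3")]
        else:
            wt, pos, mt = m.group("wt1"), m.group("pos1"), m.group("mt1")
        found.add(f"{wt}{pos}{mt}")
    return found


def extract_pairs(abstract: str) -> set[tuple[str, str]]:
    """Pair every PSM with every HUGO symbol found in the same abstract."""
    tokens = set(re.findall(r"\b[A-Z][A-Z0-9]+\b", abstract))
    genes = tokens & HUGO_SYMBOLS
    return {(gene, psm) for gene in genes for psm in extract_psms(abstract)}


def cross_validate(mined: set, reference: set) -> tuple[set, float]:
    """Return pairs found in both sets and the fraction of the reference covered."""
    common = mined & reference
    return common, len(common) / len(reference)


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)


if __name__ == "__main__":
    abstract = "We analysed the Arg117His (R117H) and G551D mutations in CFTR."
    mined = extract_pairs(abstract)
    # mined == {("CFTR", "R117H"), ("CFTR", "G551D")}
    omim = {("CFTR", "R117H"), ("HBB", "E6V"), ("TP53", "R175H")}  # hypothetical
    common, coverage = cross_validate(mined, omim)
    # common == {("CFTR", "R117H")}, coverage == 1/3
    print(f"{1826 / 6699:.0%}")  # prints 27%
```

The last line reproduces the abstract's arithmetic: 1826 cross-validated pairs out of the 6699 OMIM PSM-gene pairs is about 27%. Plain co-occurrence is the simplest pairing rule and would likely attach mutations to the wrong gene in abstracts that mention several genes; the high pair-level precision reported above suggests that MEMA applies stricter criteria.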

Year:  2004        PMID: 14704350      PMCID: PMC373272          DOI: 10.1093/nar/gkh162

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971

