A hybrid approach for automated mutation annotation of the extended human mutation landscape in scientific literature.

Antonio Jimeno Yepes, Andrew MacKinlay, Natalie Gunn, Christine Schieber, Noel Faux, Matthew Downton, Benjamin Goudey, Richard L Martin.

Abstract

As the cost of DNA sequencing continues to fall, an increasing amount of information on human genetic variation is being produced that could help advance precision medicine. However, information about such mutations is typically first made available in the scientific literature and only later manually curated into more standardized genomic databases. This curation process is expensive and time-consuming, and many variants are never fully curated, if they are curated at all. Detecting mutations in the literature is the first key step towards automating this process. However, most current methods have focused on identifying mutations that follow existing nomenclatures. In this work, we show that this standard approach misses a large number of mutations. Furthermore, we implement the first mutation annotator to cover an extended mutation landscape, and we show that its F1 performance is comparable to that of human annotation (F1 78.29 for manual annotation vs F1 79.56 for automatic annotation).
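
To illustrate why methods that target existing nomenclatures can miss mentions, the following is a minimal sketch of a nomenclature-style matcher. It is not the annotator described in the abstract: the pattern, the function name `find_nomenclature_mentions`, and the example sentence are all assumptions made for illustration, and the pattern covers only a small subset of compact protein and DNA substitution notations.

```python
import re

# Hypothetical, deliberately narrow pattern (an assumption for illustration,
# not the paper's method). It matches:
#   - protein substitutions in one-letter form, e.g. "V600E"
#   - protein substitutions in three-letter form, e.g. "p.Val600Glu"
#   - DNA substitutions, e.g. "c.1799T>A"
AA1 = "ACDEFGHIKLMNPQRSTVWY"
AA3 = ("Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
       "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val")

PATTERN = re.compile(
    rf"\b(?:p\.)?(?:[{AA1}]\d+[{AA1}]|(?:{AA3})\d+(?:{AA3}))\b"
    rf"|\bc\.\d+[ACGT]>[ACGT]\b"
)


def find_nomenclature_mentions(text: str) -> list[str]:
    """Return the substrings of `text` that match the narrow pattern above."""
    return [m.group(0) for m in PATTERN.finditer(text)]


# Made-up example sentence mixing compact notation with a free-text mention.
sentence = (
    "The BRAF V600E (p.Val600Glu, c.1799T>A) mutation and "
    "a deletion of exon 19 in EGFR were reported."
)
mentions = find_nomenclature_mentions(sentence)
print(mentions)
```

In this sketch the three compact mentions are found, while the free-text phrase "deletion of exon 19" is not, which is the kind of gap an extended mutation landscape is meant to cover.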

Year:  2018        PMID: 30815103      PMCID: PMC6371299     

Source DB:  PubMed          Journal:  AMIA Annu Symp Proc        ISSN: 1559-4076


