A hybrid approach for automated mutation annotation of the extended human mutation landscape in scientific literature.


Antonio Jimeno Yepes¹, Andrew MacKinlay¹, Natalie Gunn¹, Christine Schieber¹, Noel Faux¹, Matthew Downton¹, Benjamin Goudey¹, Richard L Martin².

Abstract

As the cost of DNA sequencing continues to fall, an increasing amount of information on human genetic variation is being produced that could help advance precision medicine. However, information about such mutations is typically first made available in the scientific literature and only later manually curated into more standardized genomic databases. This curation process is expensive and time-consuming, and many variants are never fully curated, if they are curated at all. Detecting mutations in the literature is the first key step towards automating this process. However, most current methods have focused on identifying mutations that follow existing nomenclatures. In this work, we show that a large number of mutations are missed by this standard approach. Furthermore, we implement the first mutation annotator to cover an extended mutation landscape, and we show that its F1 performance is comparable to that of human annotation (F1 78.29 for manual annotation vs F1 79.56 for automatic annotation).


Year: 2018 PMID: 30815103 PMCID: PMC6371299

Source DB: PubMed Journal: AMIA Annu Symp Proc ISSN: 1559-4076

