tmVar 2.0: integrating genomic variant information from literature with dbSNP and ClinVar for precision medicine.

Chih-Hsuan Wei, Lon Phan, Juliana Feltz, Rama Maiti, Tim Hefferon, Zhiyong Lu.

Abstract

Motivation: Despite significant efforts in expert curation, the clinical relevance of most of the 154 million dbSNP reference variants (RS) remains unknown. However, a wealth of knowledge about the biological function and disease impact of variants is buried in unstructured literature data. Previous studies have attempted to harvest and unlock such information with text-mining techniques, but they are of limited use because their mutation extraction results are not standardized or integrated with curated data.
Results: We propose an automatic method to extract variant mentions and normalize them to unique identifiers (dbSNP RSIDs). In benchmarking, our method achieves a high F-measure of ∼90% and compares favorably with the state of the art. Next, we applied our approach to the entire PubMed and validated the results by verifying that each extracted variant-gene pair matched the dbSNP annotation based on mapped genomic position, and by analyzing variants curated in ClinVar. We then determined which text-mined variants and genes constituted novel discoveries. Our analysis reveals 41 889 RS numbers (associated with 9151 genes) not found in ClinVar. Moreover, we obtained a rich set worth further review: 12 462 rare variants (MAF ≤ 0.01) in 3849 genes, which are presumed to be deleterious and are not frequently found in the general population. To our knowledge, this is the first large-scale study to analyze and integrate text-mined variant data with curated knowledge in existing databases. Our results suggest that databases can be significantly enriched by text mining and that the combined information can greatly assist human efforts in evaluating and prioritizing variants in genomic research.
Availability and implementation: The tmVar 2.0 source code and corpus are freely available at https://www.ncbi.nlm.nih.gov/research/bionlp/Tools/tmvar/.
Contact: zhiyong.lu@nih.gov.
Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.
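The novelty analysis described in the Results (text-mined RS numbers absent from ClinVar, then the rare subset with MAF ≤ 0.01) can be illustrated with a small sketch. This is not tmVar 2.0's extraction or normalization method, which the abstract does not detail: the regular expression only catches explicit "rs" identifiers, and the function names, the RS identifiers, the ClinVar set and the MAF values are all made-up assumptions for illustration.

```python
import re

# Assumption: only explicit dbSNP identifiers such as "rs123" are matched.
# Real variant mentions (e.g. protein or DNA changes) need far more than this.
RSID_PATTERN = re.compile(r"\brs(\d+)\b", re.IGNORECASE)


def extract_rsids(text):
    """Return the set of explicit RS identifiers in text, lowercased."""
    return {f"rs{digits}" for digits in RSID_PATTERN.findall(text)}


def novel_and_rare(mined_rsids, clinvar_rsids, maf_by_rsid, maf_cutoff=0.01):
    """Split text-mined RSIDs into those absent from ClinVar and the rare subset.

    maf_by_rsid maps an RSID to its minor allele frequency; RSIDs with no
    known frequency are left out of the rare subset.
    """
    novel = mined_rsids - clinvar_rsids
    rare = {
        rsid
        for rsid in novel
        if rsid in maf_by_rsid and maf_by_rsid[rsid] <= maf_cutoff
    }
    return novel, rare


# Toy inputs: every identifier and frequency below is a placeholder.
sample_text = "Carriers of rs900000001 and RS900000002 were compared with rs900000003."
toy_clinvar = {"rs900000001"}
toy_maf = {"rs900000002": 0.004, "rs900000003": 0.25}

mined = extract_rsids(sample_text)
novel, rare = novel_and_rare(mined, toy_clinvar, toy_maf)
print(sorted(novel))
print(sorted(rare))
```

Set difference keeps the comparison with the curated database explicit, and the frequency cutoff is a parameter so that a threshold other than 0.01 can be tried.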

Year:  2018        PMID: 28968638      PMCID: PMC5860583          DOI: 10.1093/bioinformatics/btx541

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


