tmVar 2.0: integrating genomic variant information from literature with dbSNP and ClinVar for precision medicine.

Chih-Hsuan Wei¹, Lon Phan¹, Juliana Feltz¹, Rama Maiti¹, Tim Hefferon¹, Zhiyong Lu¹.

Abstract

Motivation: Despite significant efforts in expert curation, the clinical relevance of most of the 154 million dbSNP reference variants (RS) remains unknown. However, a wealth of knowledge about the biological function and disease impact of variants is buried in unstructured literature data. Previous studies have attempted to harvest and unlock such information with text-mining techniques, but they are of limited use because their mutation extraction results are not standardized or integrated with curated data.
Results: We propose an automatic method to extract variant mentions and normalize them to unique identifiers (dbSNP RSIDs). In benchmarking, our method achieves a high F-measure of ∼90% and compares favorably with the state of the art. Next, we applied our approach to the entire PubMed and validated the results by verifying that each extracted variant-gene pair matched the dbSNP annotation based on mapped genomic position, and by analyzing variants curated in ClinVar. We then determined which text-mined variants and genes constituted novel discoveries. Our analysis reveals 41 889 RS numbers (associated with 9151 genes) not found in ClinVar. Moreover, we obtained a rich set worth further review: 12 462 rare variants (MAF ≤ 0.01) in 3849 genes, which are presumed to be deleterious and are not frequently found in the general population. To our knowledge, this is the first large-scale study to analyze and integrate text-mined variant data with curated knowledge in existing databases. Our results suggest that databases can be significantly enriched by text mining and that the combined information can greatly assist human efforts in evaluating and prioritizing variants in genomic research.
Availability and implementation: The tmVar 2.0 source code and corpus are freely available at https://www.ncbi.nlm.nih.gov/research/bionlp/Tools/tmvar/.
Contact: zhiyong.lu@nih.gov.
Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.
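The filtering described in the Results (keep text-mined RS numbers that are absent from ClinVar, then keep the rare ones with MAF ≤ 0.01) can be illustrated with a minimal Python sketch. This is not the tmVar 2.0 implementation: the record type, function names, RS numbers, gene names and frequencies below are all hypothetical, and the sketch assumes the rare set is drawn from the variants not found in ClinVar, which the abstract implies but does not state outright.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

# Rare-variant cutoff quoted in the abstract (MAF <= 0.01).
MAF_THRESHOLD = 0.01


@dataclass(frozen=True)
class MinedVariant:
    """Hypothetical record for one text-mined, normalized variant."""
    rsid: str             # dbSNP RS number the mention was normalized to
    gene: str             # gene the mention was paired with in the text
    maf: Optional[float]  # minor allele frequency, None if unavailable


def novel_variants(mined: Iterable[MinedVariant],
                   clinvar_rsids: Set[str]) -> List[MinedVariant]:
    """Keep variants whose RS number does not appear in the ClinVar set."""
    return [v for v in mined if v.rsid not in clinvar_rsids]


def rare_variants(variants: Iterable[MinedVariant],
                  threshold: float = MAF_THRESHOLD) -> List[MinedVariant]:
    """Keep variants with a known MAF at or below the threshold."""
    return [v for v in variants if v.maf is not None and v.maf <= threshold]


# Toy data: placeholder identifiers, not real dbSNP or ClinVar entries.
mined = [
    MinedVariant("rs0000001", "GENE_A", 0.002),
    MinedVariant("rs0000002", "GENE_B", 0.15),
    MinedVariant("rs0000003", "GENE_A", None),
    MinedVariant("rs0000004", "GENE_C", 0.01),
]
clinvar_rsids = {"rs0000004"}

novel = novel_variants(mined, clinvar_rsids)
rare_novel = rare_variants(novel)
novel_genes = {v.gene for v in novel}

print(len(novel), "RS numbers in", len(novel_genes), "genes not found in ClinVar")
print(len(rare_novel), "of them are rare (MAF <= 0.01)")
```

Variants with no frequency on record are excluded from the rare set here rather than treated as rare; that is a design choice of this sketch, not something the abstract specifies.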

Year: 2018 PMID: 28968638 PMCID: PMC5860583 DOI: 10.1093/bioinformatics/btx541

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

34 in total

1. dbSNP: the NCBI database of genetic variation.

Authors: S T Sherry; M H Ward; M Kholodov; J Baker; L Phan; E M Smigielski; K Sirotkin
Journal: Nucleic Acids Res Date: 2001-01-01 Impact factor: 16.971

2. Toward an automatic method for extracting cancer- and other disease-related point mutations from the biomedical literature.

Authors: Emily Doughty; Attila Kertesz-Farkas; Olivier Bodenreider; Gary Thompson; Asa Adadey; Thomas Peterson; Maricel G Kann
Journal: Bioinformatics Date: 2010-12-07 Impact factor: 6.937

3. Literature mining for the biologist: from information retrieval to biological discovery. (Review)

Authors: Lars Juhl Jensen; Jasmin Saric; Peer Bork
Journal: Nat Rev Genet Date: 2006-02 Impact factor: 53.242

4. Knowledge environments representing molecular entities for the virtual physiological human.

Authors: Martin Hofmann-Apitius; Juliane Fluck; Laura Furlong; Oriol Fornes; Corinna Kolárik; Susanne Hanser; Martin Boeker; Stefan Schulz; Ferran Sanz; Roman Klinger; Theo Mevissen; Tobias Gattermayer; Baldo Oliva; Christoph M Friedrich
Journal: Philos Trans A Math Phys Eng Sci Date: 2008-09-13 Impact factor: 4.226

5. MutationFinder: a high-performance system for extracting point mutation mentions from text.

Authors: J Gregory Caporaso; William A Baumgartner; David A Randolph; K Bretonnel Cohen; Lawrence Hunter
Journal: Bioinformatics Date: 2007-05-11 Impact factor: 6.937

6. Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers.

Authors: Philippe E Thomas; Roman Klinger; Laura I Furlong; Martin Hofmann-Apitius; Christoph M Friedrich
Journal: BMC Bioinformatics Date: 2011-07-05 Impact factor: 3.169

7. Mutation extraction tools can be combined for robust recognition of genetic variants in the literature.

Authors: Antonio Jimeno Yepes; Karin Verspoor
Journal: F1000Res Date: 2014-01-21

8. OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders.

Authors: Joanna S Amberger; Carol A Bocchini; François Schiettecatte; Alan F Scott; Ada Hamosh
Journal: Nucleic Acids Res Date: 2014-11-26 Impact factor: 19.160

9. Text Mining Genotype-Phenotype Relationships from Biomedical Literature for Database Curation and Precision Medicine.

Authors: Ayush Singhal; Michael Simmons; Zhiyong Lu
Journal: PLoS Comput Biol Date: 2016-11-30 Impact factor: 4.475

10. Analysis of protein-coding genetic variation in 60,706 humans.

Authors: Monkol Lek; Konrad J Karczewski; Eric V Minikel; Kaitlin E Samocha; Eric Banks; Timothy Fennell; Anne H O'Donnell-Luria; James S Ware; Andrew J Hill; Beryl B Cummings; Taru Tukiainen; Daniel P Birnbaum; Jack A Kosmicki; Laramie E Duncan; Karol Estrada; Fengmei Zhao; James Zou; Emma Pierce-Hoffman; Joanne Berghout; David N Cooper; Nicole Deflaux; Mark DePristo; Ron Do; Jason Flannick; Menachem Fromer; Laura Gauthier; Jackie Goldstein; Namrata Gupta; Daniel Howrigan; Adam Kiezun; Mitja I Kurki; Ami Levy Moonshine; Pradeep Natarajan; Lorena Orozco; Gina M Peloso; Ryan Poplin; Manuel A Rivas; Valentin Ruano-Rubio; Samuel A Rose; Douglas M Ruderfer; Khalid Shakir; Peter D Stenson; Christine Stevens; Brett P Thomas; Grace Tiao; Maria T Tusie-Luna; Ben Weisburd; Hong-Hee Won; Dongmei Yu; David M Altshuler; Diego Ardissino; Michael Boehnke; John Danesh; Stacey Donnelly; Roberto Elosua; Jose C Florez; Stacey B Gabriel; Gad Getz; Stephen J Glatt; Christina M Hultman; Sekar Kathiresan; Markku Laakso; Steven McCarroll; Mark I McCarthy; Dermot McGovern; Ruth McPherson; Benjamin M Neale; Aarno Palotie; Shaun M Purcell; Danish Saleheen; Jeremiah M Scharf; Pamela Sklar; Patrick F Sullivan; Jaakko Tuomilehto; Ming T Tsuang; Hugh C Watkins; James G Wilson; Mark J Daly; Daniel G MacArthur
Journal: Nature Date: 2016-08-18 Impact factor: 49.962

35 in total

1. PubTator central: automated concept annotation for biomedical full text articles.

Authors: Chih-Hsuan Wei; Alexis Allot; Robert Leaman; Zhiyong Lu
Journal: Nucleic Acids Res Date: 2019-07-02 Impact factor: 16.971

2. Assisting document triage for human kinome curation via machine learning.

Authors: Yi-Yu Hsu; Chih-Hsuan Wei; Zhiyong Lu
Journal: Database (Oxford) Date: 2018-01-01 Impact factor: 3.451

3. ResidueFinder: extracting individual residue mentions from protein literature.

Authors: Ton E Becker; Eric Jakobsson
Journal: J Biomed Semantics Date: 2021-07-21

4. RegEl corpus: identifying DNA regulatory elements in the scientific literature.

Authors: Samuele Garda; Freyda Lenihan-Geels; Sebastian Proft; Stefanie Hochmuth; Markus Schülke; Dominik Seelow; Ulf Leser
Journal: Database (Oxford) Date: 2022-06-27 Impact factor: 4.462

5. BERN2: an advanced neural biomedical named entity recognition and normalization tool.

Authors: Mujeen Sung; Minbyul Jeong; Yonghwa Choi; Donghyeon Kim; Jinhyuk Lee; Jaewoo Kang
Journal: Bioinformatics Date: 2022-10-14 Impact factor: 6.931

6. Identification of lncRNA Biomarkers and LINC01198 Promotes Progression of Chronic Rhinosinusitis with Nasal Polyps through Sponge miR-6776-5p.

Authors: Xueping Wang; Xiaoyuan Zhu; Li Peng; Yulin Zhao
Journal: Biomed Res Int Date: 2022-05-06 Impact factor: 3.246

7. HerbKG: Constructing a Herbal-Molecular Medicine Knowledge Graph Using a Two-Stage Framework Based on Deep Transfer Learning.

Authors: Xian Zhu; Yueming Gu; Zhifeng Xiao
Journal: Front Genet Date: 2022-04-27 Impact factor: 4.772