Miaosen Liu, Jian Yang, Huilong Duan, Lan Yu, Dingwen Wu, Haomin Li.
Abstract
New technologies, such as next-generation sequencing, have advanced the ability to diagnose diseases and improve prognosis, but they require the identification of thousands of variants in each report based on several databases scattered across different sources. Curating an integrated interpretation database is time-consuming and costly, and the database needs regular updates. On the other hand, automatic curation of knowledge sources typically results in information overload. In this study, an automated pipeline was proposed to create an integrated visual single-nucleotide polymorphism (SNP) interpretation tool called SNPMap. The SNPMap pipeline periodically obtains SNP-related information from the LitVar, PubTator, and GWAS Catalog API tools and presents it to the user after extraction, integration, and visualization. Keywords and their semantic relations to each SNP are rendered into two graphs, with their significance represented by the size of the circles and the width of the lines. Moreover, the most related SNPs for each keyword that appears in SNPMap are calculated and sorted. SNPMap retains the advantage of an automatic process while helping users access clearer and more detailed information through visualization and integration with other materials.
Keywords: precision medicine; single-nucleotide polymorphism; variant interpretation; visualization; web application
Year: 2022 PMID: 36061173 PMCID: PMC9437274 DOI: 10.3389/fgene.2022.985500
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
FIGURE 1. The data processing workflow of SNPMap. (A) The process of obtaining keyword information and rendering graphs from SNPs. (B) The process of calculating the SNP or external keyword connections of keywords mentioned in SNPMap.
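The keyword-to-SNP step in panel (B) can be pictured as inverting a per-SNP keyword table and sorting each keyword's SNPs. The following is a minimal sketch only: the record does not state which relatedness measure SNPMap uses, so ranking by raw co-mention count is an assumption, and the function name, data layout, and sample values are all hypothetical.

```python
from collections import defaultdict


def rank_snps_by_keyword(snp_keyword_counts):
    """Invert {snp: {keyword: count}} into {keyword: [(snp, count), ...]}.

    Assumption: "most related" is approximated by the co-mention count.
    """
    index = defaultdict(list)
    for snp, keywords in snp_keyword_counts.items():
        for keyword, count in keywords.items():
            index[keyword].append((snp, count))
    # Highest count first; ties are broken by SNP identifier for stable output.
    return {
        keyword: sorted(pairs, key=lambda pair: (-pair[1], pair[0]))
        for keyword, pairs in index.items()
    }


# Hypothetical identifiers and counts, invented purely for illustration.
example = {
    "rsA": {"diabetes": 5, "obesity": 2},
    "rsB": {"diabetes": 9},
    "rsC": {"obesity": 2},
}
ranking = rank_snps_by_keyword(example)
```

Here `ranking["diabetes"]` lists `rsB` before `rsA` because it has the larger count. A production pipeline would likely normalize counts or weight them by source, which this sketch does not attempt.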
FIGURE 2. Visualization of SNP-associated keywords. (A) All keywords shown. (B) Fewer keywords shown. Nodes represent keywords, and edges represent connections. The size of a node represents the significance of the keyword, and the width of an edge represents the strength of the connection. Node colors are assigned according to the category of the keyword (gene, disease, or chemical).
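The visual encoding described in this caption (node size for significance, edge width for connection strength, color for category) could be produced along the following lines. This is an illustrative sketch, not SNPMap's actual rendering code: the linear scaling, the pixel ranges, the color palette, and all names and sample counts are assumptions.

```python
# Assumed palette; the caption only says colors depend on the category.
CATEGORY_COLORS = {"gene": "#1f77b4", "disease": "#d62728", "chemical": "#2ca02c"}


def scale(value, lo, hi, out_min, out_max):
    """Linearly map value from [lo, hi] onto [out_min, out_max]."""
    if hi == lo:
        return (out_min + out_max) / 2
    return out_min + (value - lo) * (out_max - out_min) / (hi - lo)


def build_graph(keywords, links):
    """keywords: {name: (category, count)}; links: {(a, b): weight}."""
    counts = [count for _, count in keywords.values()]
    weights = list(links.values())
    c_lo, c_hi = min(counts, default=0), max(counts, default=0)
    w_lo, w_hi = min(weights, default=0), max(weights, default=0)
    nodes = [
        {"id": name, "color": CATEGORY_COLORS[category],
         "size": scale(count, c_lo, c_hi, 10, 50)}
        for name, (category, count) in keywords.items()
    ]
    edges = [
        {"source": a, "target": b, "width": scale(weight, w_lo, w_hi, 1, 8)}
        for (a, b), weight in links.items()
    ]
    return {"nodes": nodes, "edges": edges}


# Hypothetical counts and weights, invented purely for illustration.
keywords = {
    "TCF7L2": ("gene", 10),
    "Diabetes": ("disease", 30),
    "Glucose": ("chemical", 20),
}
links = {("TCF7L2", "Diabetes"): 4, ("Diabetes", "Glucose"): 2}
graph = build_graph(keywords, links)
```

The resulting dictionary of nodes and edges is the kind of structure most graph-drawing front ends accept. Panel (B), which shows fewer keywords, would correspond to filtering `keywords` before calling `build_graph`.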
Most frequent keywords in SNPMap.
| Keyword | Frequency |
|---|---|
| Cancer | 48412 |
| Tumor | 35927 |
| Breast cancer | 20082 |
| BRCA1 | 12951 |
| BRCA2 | 11636 |
| Diabetes | 11312 |
| TP53 | 10779 |
| Colorectal cancer | 9984 |
| Lipid | 9962 |
| Alzheimer’s disease | 9851 |
| EGFR | 8442 |
| AD | 8008 |
| Toxicity | 7852 |
| Parkinson’s disease | 7733 |
| Inflammation | 7302 |
| BRAF | 7178 |
| KRAS | 7172 |
| Hypertension | 6968 |
| Lung cancer | 6879 |
| Cholesterol | 6672 |
FIGURE 3. Graph containing 654 SNPMap keywords (nodes) and their internal connections (edges). Keywords with fewer than ten total counts or fewer than 50 connections are excluded. The size of a node represents the total count of the keyword. The color of a node represents the number of its connections to other keywords.
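The exclusion rule in this caption can be expressed as a simple threshold filter. The sketch below assumes one reading of the caption, namely that a keyword is kept only if it has at least ten total counts and at least 50 connections. The function name, data layout, and sample values are hypothetical.

```python
def filter_keyword_graph(stats, edges, min_count=10, min_connections=50):
    """Keep keywords meeting both thresholds, plus edges between kept keywords.

    stats: {keyword: (total_count, n_connections)}
    edges: iterable of (keyword_a, keyword_b) pairs
    """
    kept = {
        keyword
        for keyword, (count, connections) in stats.items()
        if count >= min_count and connections >= min_connections
    }
    # "Internal" connections are those whose endpoints both survive the filter.
    internal = [(a, b) for a, b in edges if a in kept and b in kept]
    return kept, internal


# Hypothetical keywords and values, invented purely for illustration.
stats = {
    "kw_a": (500, 120),  # kept
    "kw_b": (40, 60),    # kept
    "kw_c": (4, 80),     # dropped: fewer than ten total counts
    "kw_d": (300, 12),   # dropped: fewer than 50 connections
}
edges = [("kw_a", "kw_b"), ("kw_a", "kw_c"), ("kw_b", "kw_d")]
kept, internal = filter_keyword_graph(stats, edges)
```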
FIGURE 4. Differences in keywords between SNPMap and ClinVar for 100 randomly selected SNPs. (A) Numbers of keywords mentioned only in SNPMap, only in ClinVar, or in both. (B) Numbers of keywords mentioned in SNPMap and in ClinVar. (C) Numbers of keywords mentioned in only one platform or in both platforms. The Venn diagram was generated with jvenn (Bardou et al., 2014).
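The three regions of a two-set Venn diagram such as the one in this figure reduce to set differences and an intersection. The sketch below is illustrative only: the case-insensitive exact matching is an assumption (the record does not say how keywords from the two platforms were matched), and the sample keywords are hypothetical.

```python
def compare_keywords(snpmap_keywords, clinvar_keywords):
    """Split two keyword lists into SNPMap-only, ClinVar-only, and shared sets.

    Assumption: keywords match if they are equal after trimming and lowercasing.
    """
    snpmap = {keyword.strip().lower() for keyword in snpmap_keywords}
    clinvar = {keyword.strip().lower() for keyword in clinvar_keywords}
    return {
        "snpmap_only": sorted(snpmap - clinvar),
        "clinvar_only": sorted(clinvar - snpmap),
        "both": sorted(snpmap & clinvar),
    }


# Hypothetical keyword lists for a single SNP, invented for illustration.
result = compare_keywords(
    ["Breast cancer", "Ovarian cancer", "BRCA1"],
    ["breast cancer", "Hereditary cancer-predisposing syndrome"],
)
region_sizes = {region: len(items) for region, items in result.items()}
```

Summing `region_sizes` over many SNPs would give counts of the kind plotted in panel (A). Synonym handling (for example, mapping both spellings of a disease to one concept) is deliberately left out.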
Some selected SNPs and comparisons of their concepts under SNPMap and ClinVar.
| SNP | SNPMap (diseases) | ClinVar |
|---|---|---|
| rs146632606 | | |
| rs7482144 | | |
| rs80358086 | | |
| rs137853334 | | |
| rs199498900 | | |
| rs111656822 | | |
FIGURE 5Comparing SNPMap with LitVar. (A) rs10993994 in SNPMap. (B) rs10993994 in LitVar.
Some selected SNPs and comparisons of their concepts under LitVar and SNPMap.
| SNP | SNPMap (top keywords) | LitVar (top keywords) |
|---|---|---|
| rs10993994 | Diseases: Prostatic Neoplasms (93), Neoplasms (29), Mental Disorders (15), Breast Neoplasms (9), Colorectal Neoplasms (5) | Prostate cancer (228), Cancer (89), MSMB (43), PCA (35), Prostate-specific antigen (29), Tumor (28), PSA (25), KLK3 (16), Mortality (16), Androgen receptor (15), etc. |
| | Chemicals: Igsf5 protein, rat (9), Androgens (6), Calcium (4), SS-B antigen (3), Carbon (3) | |
| rs334 | Diseases: Sickle Cell Anemia (81), Systemic carnitine deficiency (48), Malaria (43), Anemia (40), Thalassemia (16), Genetic Diseases, Inborn (13) | Malaria (93), Sickle cell disease (91), HBB (53), Anemia (45), SCD (44), Thalassemia (36), Stroke (27), Mortality (26), Hydroxyurea (21), Infection (21), etc. |
| | Chemicals: Glutamic Acid (13), Valine (10), Valine-Valine-Saquinavir (9), Oxytocin, Glu (4)- (8), Adenine (5) | |
| rs7903146 | Diseases: Diabetes Mellitus (507), Type 2 Diabetes Mellitus (237), Obesity (130), Glucose Intolerance (40), Stroke (35) | Diabetes (692), TCF7L2 (556), Type 2 Diabetes (382), Glucose (303), Insulin (251), Transcription factor 7-like 2 (206), Obesity (154), Diabetic (134), Diabetes mellitus (127), Type 2 Diabetes Mellitus (92), etc. |
| | Chemicals: Glucose (195), Cholesterol (41), Triglycerides (40), Metformin (27), Carbohydrates (22) | |
| rs112445441 | Diseases: Colorectal Neoplasms (510), Neoplasms (451), Adenomatous Polyposis (59), Carcinoma, Non-Small-Cell (47), Melanoma (44) | KRAS (1198), Cancer (1084), Tumor (1030), Colorectal cancer (1018), EGFR (490), BRAF (466), CRC (386), NRAS (266), PIK3CA (240), Epidermal Growth Factor Receptor (237), etc. |
| | Chemicals: AT 61 (53), Cetuximab (52), Guanosine Triphosphate (41), Glycine (32), irinotecan (25) | |
| rs121913500 | Diseases: Glioma (774), Neoplasms (678), Glioblastoma (340), Astrocytoma (291), Oligodendroglioma (228) | Glioma (1328), Tumor (1307), Glioblastoma (890), Cancer (842), IDH1 (578), IDH (504), GBM (290), Brain tumor (246), Astrocytoma (242), IDH1/2 (184), etc. |
| | Chemicals: Alpha-hydroxyglutarate (144), Isocitrates (144), Arginine Vasopressin (72), Activated-Leukocyte Cell Adhesion Molecule (67), Histidine-pyridine-histidine-3 (55) | |