Ning-I Yang, Chi-Hsiao Yeh, Tsung-Hsien Tsai, Yi-Ju Chou, Paul Wei-Che Hsu, Chun-Hsien Li, Yun-Hsuan Chan, Li-Tang Kuo, Chun-Tai Mao, Yu-Chiau Shyu, Ming-Jui Hung, Chi-Chun Lai, Huey-Kang Sytwu, Ting-Fen Tsai.
Abstract
Heart failure (HF) is a global pandemic and public health burden that affects one in five people in the general population during their lifetime. For high-risk individuals, early detection and prediction of HF progression reduces hospitalizations and mortality, improves quality of life, and lowers associated medical costs. Using an artificial intelligence (AI)-assisted genome-wide association study of single nucleotide polymorphism (SNP) data from 117 asymptomatic high-risk individuals, we identified a signature composed of 13 SNPs. These were annotated and mapped to six protein-coding genes (GAD2, APP, RASGEF1C, MACROD2, DMD, and DOCK1), a pseudogene (PGAM1P5), and six non-coding RNA genes (LINC01968, LINC00687, LOC105372209, LOC101928047, LOC105372208, and LOC105371356). The SNP signature performed well in predicting HF progression, with an accuracy of 0.857 and an area under the curve (AUC) of 0.912. Intriguingly, analysis of the protein connectivity map revealed that DMD, RASGEF1C, MACROD2, DOCK1, and PGAM1P5 appear to form a protein interaction network in the heart, suggesting that together they may contribute to the pathogenesis of HF. Our findings demonstrate that combining AI-assisted identification of SNP signatures with clinical parameters can effectively identify asymptomatic high-risk subjects who are predisposed to HF.
Keywords: artificial intelligence; genetic factors; heart failure; single nucleotide polymorphism
Year: 2021 PMID: 34572079 PMCID: PMC8470162 DOI: 10.3390/cells10092430
Source DB: PubMed Journal: Cells ISSN: 2073-4409 Impact factor: 6.600
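The abstract summarizes predictive performance as an accuracy and an AUC. Assuming the AUC is the usual area under the receiver operating characteristic (ROC) curve (the record does not say so explicitly), it equals the probability that a randomly chosen Stage B subject receives a higher model score than a randomly chosen Stage A subject. The minimal sketch below computes it that way; the helper name is illustrative and the scores are hypothetical toy values, not study data.

```python
from itertools import product


def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking, which makes this equal to the
    Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    correct = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            correct += 1.0
        elif p == n:
            correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


# Hypothetical model scores (higher = more likely Stage B); not study data.
stage_b_scores = [0.91, 0.80, 0.65, 0.40]
stage_a_scores = [0.70, 0.35, 0.20, 0.10, 0.05]

print(roc_auc(stage_b_scores, stage_a_scores))  # 0.9
```

Unlike accuracy, this quantity does not depend on a classification threshold, which is one reason the two metrics are usually reported together.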
Baseline characteristics of both groups.
| Characteristic | Stage A HF 1 | Stage B HF | p-Value |
|---|---|---|---|
| Age, years | 68.35 ± 6.19 | 69.88 ± 6.53 | 0.156 |
| Male, n (%) | 72 (63.7) | 32 (65.3) | 1.00 |
| BMI 2 (kg/m²) | 26.60 ± 3.76 | 27.21 ± 3.80 | 0.342 |
| Smoking, n (%) | 22 (19.5) | 12 (24.5) | 0.53 |
| ASCVD 3 risk score, % | 24 ± 11.25 | 27.46 ± 11.83 | 0.087 |
| Coronary artery disease, n (%) | 34 (30.1) | 11 (22.4) | 0.35 |
| Hypertension, n (%) | 93 (82.3) | 44 (89.9) | 0.343 |
| Diabetes mellitus, n (%) | 40 (35.4) | 16 (32.7) | 0.858 |
| Urine Alb/Cre 4 (mg/g) | 67.21 ± 173.22 | 175.26 ± 379.36 | 0.084 |
| NT-proBNP 5 (pg/mL) | 78.30 ± 87.55 | 117.35 ± 114.93 | 0.040 |
| Hs-Tnt 6 (ng/L) | 10.37 ± 5.47 | 12.70 ± 9.67 | 0.128 |
| HbA1c 7 (%) | 6.44 ± 9.85 | 6.58 ± 1.56 | 0.549 |
| Creatinine (mg/dL) | 1.00 ± 0.27 | 1.06 ± 0.40 | 0.287 |
| AST 8 (U/L) | 24.79 ± 11.44 | 30.80 ± 37.52 | 0.277 |
| ALT 9 (U/L) | 29.05 ± 17.13 | 38.19 ± 66.93 | 0.351 |
| Sodium (mEq/L) | 141.41 ± 1.92 | 141.65 ± 2.56 | 0.501 |
| Potassium (mEq/L) | 4.33 ± 0.40 | 4.27 ± 0.41 | 0.393 |
| Albumin (g/dL) | 4.58 ± 0.23 | 4.52 ± 0.29 | 0.194 |
| Total bilirubin (mg/dL) | 0.68 ± 0.28 | 0.66 ± 0.25 | 0.561 |
| Total protein (g/dL) | 7.36 ± 0.40 | 7.31 ± 0.36 | 0.494 |
| Uric acid (mg/dL) | 6.03 ± 1.48 | 5.73 ± 1.55 | 0.252 |
| HDL 10 (mg/dL) | 51.83 ± 13.4 | 52.24 ± 15.28 | 0.863 |
| LDL 11 (mg/dL) | 111.71 ± 26.09 | 111.02 ± 26.66 | 0.879 |
| Total cholesterol (mg/dL) | 178.60 ± 28.71 | 180 ± 35.05 | 0.758 |
| Insulin (μIU/mL) | 14.29 ± 7.50 | 13.79 ± 8.67 | 0.715 |
| Hs-CRP 12 (mg/L) | 2.69 ± 11.00 | 2.21 ± 2.30 | 0.772 |
| Ferritin (ng/mL) | 303.39 ± 240.80 | 280.78 ± 186.70 | 0.559 |
| Adiponectin (μg/mL) | 8.74 ± 8.86 | 13.13 ± 10.32 | 0.011 |
| Leptin (ng/mL) | 11.47 ± 11.41 | 10.77 ± 9.65 | 0.708 |
1 HF: heart failure; 2 BMI: body mass index; 3 ASCVD: atherosclerotic cardiovascular disease; 4 Alb/Cre: albumin-to-creatinine ratio; 5 NT-proBNP: N-terminal pro-B-type natriuretic peptide; 6 Hs-Tnt: high-sensitivity troponin T; 7 HbA1c: hemoglobin A1c; 8 AST: aspartate aminotransferase; 9 ALT: alanine aminotransferase; 10 HDL: high-density lipoprotein; 11 LDL: low-density lipoprotein; and 12 Hs-CRP: high-sensitivity C-reactive protein.
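The Figure 1 caption states that NT-proBNP levels were compared with an independent t-test, but the table does not say whether a pooled-variance (Student) or unequal-variance (Welch) version was used. As a minimal sketch assuming Welch's version, the test statistic can be recomputed directly from the tabulated mean ± SD values. The group sizes are not reported in the table; n = 113 (Stage A) and n = 49 (Stage B) are inferred from the percentages (for example, 72 males = 63.7% of 113). The helper name is illustrative.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom,
    computed from summary statistics (mean, SD, group size)."""
    se1_sq = sd1 ** 2 / n1  # squared standard error of the group 1 mean
    se2_sq = sd2 ** 2 / n2  # squared standard error of the group 2 mean
    t = (mean2 - mean1) / math.sqrt(se1_sq + se2_sq)
    df = (se1_sq + se2_sq) ** 2 / (
        se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1)
    )
    return t, df


# NT-proBNP (pg/mL), Stage A vs. Stage B; group sizes inferred, not reported.
t, df = welch_t(78.30, 87.55, 113, 117.35, 114.93, 49)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Assuming equal variances instead gives a larger statistic on the same numbers (about 2.36 rather than about 2.13), so the choice of variant affects the resulting p-value.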
Figure 1. Representative echocardiographic images and NT-proBNP levels of subjects with Stage A and Stage B heart failure (HF). (A) NT-proBNP levels of the subjects with Stage A and Stage B HF. * indicates p < 0.05 by independent t-test. (B) Upper panel: a 68-year-old male subject with Stage A HF (body surface area (BSA), 1.8 m²). His echocardiographic parameters, including interventricular septum thickness (IVS; A1, 7 mm; normal range 6–10 mm), left ventricular end-diastolic volume index (LVEDVI; A2, 36 mL/m²; normal range < 95 mL/m²), and left atrial volume index (LAVI; A3, 31 mL/m²; normal range < 34 mL/m²), were within normal ranges. Bottom panel: a 70-year-old male subject with asymptomatic Stage B HF (BSA, 1.9 m²). His echocardiographic examination showed myocardial remodeling, including a thickened IVS (B1, 13 mm), an enlarged LVEDVI (B2, 102 mL/m²), and an enlarged LAVI (B3, 51 mL/m²). The red arrow indicates that the images were taken at the end-diastolic phase in A2 and B2 and at the end-systolic phase in A3 and B3.
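The caption's reasoning (compare each measurement with a normal range, where volumes are indexed to BSA) can be made explicit. A minimal sketch using only the thresholds and example values quoted in the caption; the dictionary keys and helper names are illustrative, and the thresholds are the caption's, not a clinical reference. Because a volume index is the absolute volume divided by BSA, multiplying back recovers the volume (for example, 36 mL/m² × 1.8 m² ≈ 64.8 mL).

```python
# Normal ranges exactly as quoted in the Figure 1 caption (not a guideline).
NORMAL_RANGES = {
    "IVS_mm": lambda v: 6 <= v <= 10,       # 6-10 mm, inclusive
    "LVEDVI_mL_per_m2": lambda v: v < 95,   # < 95 mL/m^2
    "LAVI_mL_per_m2": lambda v: v < 34,     # < 34 mL/m^2
}


def abnormal_parameters(measurements):
    """Return the names of parameters outside the caption's normal ranges."""
    return [name for name, value in measurements.items()
            if not NORMAL_RANGES[name](value)]


def absolute_volume(index_ml_per_m2, bsa_m2):
    """Undo BSA indexing: volume (mL) = volume index (mL/m^2) x BSA (m^2)."""
    return index_ml_per_m2 * bsa_m2


stage_a_example = {"IVS_mm": 7, "LVEDVI_mL_per_m2": 36, "LAVI_mL_per_m2": 31}
stage_b_example = {"IVS_mm": 13, "LVEDVI_mL_per_m2": 102, "LAVI_mL_per_m2": 51}

print(abnormal_parameters(stage_a_example))  # []
print(abnormal_parameters(stage_b_example))  # all three parameters
print(round(absolute_volume(36, 1.8), 1))    # LVEDV of the Stage A example, mL
```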
Echocardiographic parameters of both groups.
| Parameter | Stage A HF 1 | Stage B HF | p-Value |
|---|---|---|---|
| Left atrial volume index (mL/m²) | 34.61 ± 9.62 | 42.92 ± 12.32 | |
| Left ventricle EDVI 2 (mL/m²) | 54.22 ± 12.21 | 62.39 ± 19.79 | |
| Left ventricle ESVI 3 (mL/m²) | 21.10 ± 5.69 | 23.64 ± 9.10 | 0.075 |
| Left ventricle EF 4, 2D (%) | 63.54 ± 6.58 | 63.92 ± 6.18 | 0.736 |
| Left ventricle mass index (g/m²) | 75.76 ± 17.56 | 104.45 ± 28.73 | |
| Relative wall thickness | 0.28 ± 0.06 | 0.29 ± 0.07 | 0.201 |
| Peak GLS 5 (%) | −17.22 ± 3.06 | −18.04 ± 3.31 | 0.134 |
| Mitral valve E 6 velocity (m/s) | 0.69 ± 0.19 | 0.74 ± 0.20 | 0.207 |
| Mitral valve A 7 velocity (m/s) | 0.84 ± 0.23 | 0.96 ± 0.20 | |
| Mitral valve deceleration time (ms) | 204.45 ± 64.00 | 201.50 ± 47.24 | 0.775 |
| Mitral valve E/A ratio | 1.28 ± 3.36 | 0.78 ± 0.21 | 0.306 |
| Tissue Doppler septal S’ 8 (cm/s) | 6.28 ± 1.30 | 5.98 ± 1.12 | 0.177 |
| Tissue Doppler septal E’ 9 (cm/s) | 5.98 ± 1.12 | 4.95 ± 1.37 | |
| Septal E/E’ | 11.97 ± 3.74 | 15.26 ± 3.66 | |
| TAPSE 10 (mm) | 22.46 ± 3.22 | 27.37 ± 34.28 | 0.307 |
1 HF: heart failure; 2 EDVI: end-diastolic volume index; 3 ESVI: end-systolic volume index; 4 EF: ejection fraction; 5 GLS: global longitudinal strain; 6 E: early diastolic transmitral flow; 7 A: late diastolic transmitral flow; 8 S’: septal mitral annular systolic tissue velocity; 9 E’: septal mitral annular early diastolic tissue velocity; and 10 TAPSE: tricuspid annular plane systolic excursion.
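E/E’ divides a transmitral velocity by a tissue Doppler velocity, so the two must be in the same unit. In this table the mitral E and A values (about 0.7 to 1.0) are magnitudes typical of m/s, whereas septal E’ (about 5 to 6) is a magnitude typical of cm/s. A minimal sketch under that unit assumption; the helper names are illustrative.

```python
def e_over_e_prime(e_m_per_s, e_prime_cm_per_s):
    """Septal E/E' ratio.

    Assumes mitral E is given in m/s and septal E' in cm/s, so E is
    converted to cm/s first to make the ratio dimensionless.
    """
    return (e_m_per_s * 100.0) / e_prime_cm_per_s


def e_over_a(e_velocity, a_velocity):
    """Mitral E/A ratio; both velocities must share a unit."""
    return e_velocity / a_velocity


# Plugging in the Stage A group means gives a ratio of means, which need not
# equal the tabulated mean of per-subject ratios (11.97 and 1.28).
print(round(e_over_e_prime(0.69, 5.98), 2))  # 11.54
print(round(e_over_a(0.69, 0.84), 2))        # 0.82
```

The ratio of the Stage A means (about 11.5) is close to the reported mean E/E’ of 11.97, which is consistent with the unit reading above; the residual difference is expected if, as seems likely, the table averages per-subject ratios rather than dividing group averages.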
Multiple logistic regression analysis for Stage B heart failure prediction.
| Variable | Odds Ratio | 95% Confidence Interval | p-Value |
|---|---|---|---|
| Male | 1.456 | 0.543–3.902 | 0.455 |
| Hypertension | 1.976 | 0.574–6.805 | 0.280 |
| Diabetes Mellitus | 0.851 | 0.329–2.201 | 0.740 |
| Smoking | 2.396 | 0.836–6.862 | 0.104 |
| Urine Alb/Cre 1 | 1.002 | 1.000–1.003 | 0.065 |
| NT-proBNP 2 | 1.005 | 1.000–1.010 | 0.032 |
| Adiponectin | 1.031 | 0.986–1.078 | 0.182 |
| GLS 3 | 0.929 | 0.809–1.067 | 0.298 |
1 Alb/Cre: albumin-to-creatinine ratio; 2 NT-proBNP: N-terminal pro-B-type natriuretic peptide; and 3 GLS: global longitudinal strain.
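Odds ratios for continuous predictors such as NT-proBNP look close to 1 because they refer to a one-unit change. Assuming the NT-proBNP odds ratio of 1.005 is per 1 pg/mL (the table does not state the increment) and that the model is linear in NT-proBNP on the log-odds scale, the odds ratio for a 100 pg/mL increase is 1.005^100, roughly 1.65. A minimal sketch; the helper name is illustrative.

```python
import math


def odds_ratio_for_change(or_per_unit, delta_units):
    """Odds ratio for a change of `delta_units` in a continuous predictor.

    In a logistic model the log-odds are linear in the predictor, so the
    per-unit odds ratio compounds multiplicatively: OR ** delta.
    """
    return math.exp(delta_units * math.log(or_per_unit))


# NT-proBNP: tabulated OR 1.005, assumed to be per 1 pg/mL.
print(round(odds_ratio_for_change(1.005, 100), 2))  # per 100 pg/mL
```

Because the tabulated odds ratio is rounded to three decimals (anywhere from 1.0045 to 1.0055), the scaled value is only approximate, spanning about 1.57 to 1.73.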
Figure 2. Prediction models built from clinical features and genome-wide SNPs were used to classify subjects as Stage A or Stage B HF with machine-learning methods. (A) Machine-learning workflow for selecting reliable, predictive features to build prediction models for group classification. (B) Accuracy and area under the curve (AUC) of the support vector machine (SVM) model with different numbers of selected features. (C) Classification performance of the four machine-learning algorithms using feature subsets selected from genome-wide SNPs. (D) Confusion matrix of the SVM model, used to calculate prediction accuracy. (E) Classification of the selected SNPs.
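Panel D derives accuracy from a confusion matrix. A minimal sketch, treating Stage B as the positive class (an assumption) and using hypothetical counts rather than the study's matrix; the class name is illustrative. Sensitivity and specificity are worth reading alongside accuracy because the SNP cohort is imbalanced (83 Stage A vs. 34 Stage B subjects in the genotype counts), so a classifier that always predicts the majority class still scores about 0.71.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """2 x 2 confusion matrix with Stage B treated as the positive class."""
    tp: int  # Stage B predicted as Stage B
    fn: int  # Stage B predicted as Stage A
    fp: int  # Stage A predicted as Stage B
    tn: int  # Stage A predicted as Stage A

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)  # recall for Stage B

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)  # recall for Stage A


# Hypothetical counts for illustration only (not the study's matrix).
example = ConfusionMatrix(tp=8, fn=3, fp=2, tn=15)
print(f"{example.accuracy:.3f} {example.sensitivity:.3f} {example.specificity:.3f}")

# A trivial classifier that always predicts Stage A on the 83/34 split.
always_a = ConfusionMatrix(tp=0, fn=34, fp=0, tn=83)
print(f"{always_a.accuracy:.3f} {always_a.sensitivity:.3f}")  # 0.709 0.000
```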
Top 20 SNPs of both groups.
| SNP | Position | Gene | Allele | Stage A HF | Stage B HF | p-Value |
|---|---|---|---|---|---|---|
| rs1999241 | chr10:26277498 | GAD2 | AA/AG/GG/00 | 34/44/4/1 | 17/6/11/0 | <0.001 |
| rs7645985 | chr3:194777389 | LINC01968 | CC/CT/TT | 78/4/1 | 21/13/0 | <0.001 |
| rs6516709 | chr21:25893395 | APP | AA/AG/GG | 37/43/3 | 12/11/11 | <0.001 |
| rs6078354 | chr20:11820605 | LINC00687 | CC/CT/TT | 3/22/58 | 7/17/10 | <0.001 |
| rs7725201 | chr5:180102124 | RASGEF1C | AA/AG/GG | 45/38/0 | 19/9/6 | <0.001 |
| rs10859918 | chr12:95652910 | PGAM1P5 | GG/GT/TT/00 | 45/26/11/1 | 8/23/3/0 | <0.001 |
| rs6110516 | chr20:15097697 | MACROD2 | AG/GG/AA | 13/70/0 | 13/18/3 | <0.001 |
| rs4693641 | chr4:83755117 | None | TT/CC/CT | 83/0/0 | 27/1/6 | <0.001 |
| rs5928104 | chrX:32899133 | DMD | CC/TT/CT | 3/80/0 | 0/27/7 | <0.001 |
| rs8084397 | chr18:76251014 | LOC105372209; LOC105372210 | AA/AG/GG/00 | 58/19/5/1 | 10/21/3/0 | <0.001 |
| rs4715127 | chr6:49320956 | None | CC/CT/TT | 6/22/55 | 0/22/12 | <0.001 |
| rs2496369 | chr6:49156247 | None | GG/GT/TT | 1/25/57 | 4/19/11 | <0.001 |
| Affx-2716217 | chr10:126685478 | DOCK1 | CC/CT/TT | 53/23/7 | 12/21/1 | <0.001 |
| rs56352414 | chr22:48917877 | None | CC/CT/TT | 50/32/1 | 16/10/8 | <0.001 |
| rs2806810 | chr13:103997325 | None | AA/AC/CC | 38/42/3 | 9/16/9 | <0.001 |
| rs4934985 | chr10:33616333 | None | CC/CT/TT/00 | 5/30/48/0 | 12/13/8/1 | <0.001 |
| rs6912291 | chr6:110065139 | None | CC/CT/TT | 1/34/48 | 6/3/25 | <0.001 |
| rs201036 | chr6:6708885 | LOC101928047; LOC101928004 | CC/CT/TT | 32/39/12 | 2/17/15 | <0.001 |
| rs9965164 | chr18:76220796 | LOC105372208 | CC/CT/TT | 59/19/5 | 10/21/3 | <0.001 |
| rs3813579 | chr16:79715379 | LOC105371356 | AA/AG/GG | 43/31/9 | 5/21/8 | <0.001 |
Allele: ‘0’ denotes that no allele was called.
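The Figure 3 caption states that genotype distributions were compared with a chi-square test. As a minimal sketch of a Pearson chi-square test on a 2 × k genotype table, here for rs1999241, assuming the single no-call is excluded (the paper does not say how no-calls were handled): a 2 × 3 table has 2 degrees of freedom, and for df = 2 the upper-tail probability has the closed form exp(−χ²/2), so no statistics library is needed. The helper names are illustrative.

```python
import math


def chi_square_2xk(row_a, row_b):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k
    table of genotype counts (one row per group)."""
    # Drop genotype columns that are empty in both groups.
    columns = [(a, b) for a, b in zip(row_a, row_b) if a + b > 0]
    n_a = sum(a for a, _ in columns)
    n_b = sum(b for _, b in columns)
    total = n_a + n_b
    stat = 0.0
    for a, b in columns:
        col_total = a + b
        for observed, row_total in ((a, n_a), (b, n_b)):
            expected = row_total * col_total / total
            stat += (observed - expected) ** 2 / expected
    return stat, len(columns) - 1


def chi_square_p_df2(stat):
    """Upper-tail p-value; exp(-x/2) is exact only for 2 degrees of freedom."""
    return math.exp(-stat / 2)


# rs1999241 called genotypes (the single no-call in Stage A is excluded).
stat, df = chi_square_2xk([34, 44, 4], [17, 6, 11])
print(f"chi2 = {stat:.2f}, df = {df}, p = {chi_square_p_df2(stat):.1e}")
```

The result (χ² ≈ 21.7, p well below 0.001) agrees with the tabulated p < 0.001. One expected count here (Stage B, third genotype, about 4.4) is below 5, so the chi-square approximation is rough; Fisher's exact test is the usual alternative for sparse tables.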
Stage A vs. Stage B: feature importance.
| Rank | SNPs | Type | Gene | Gene Type | Function | Expression |
|---|---|---|---|---|---|---|
| 1 | rs1999241 | Intron variant | GAD2 | Protein-coding | Major autoantigen in insulin-dependent diabetes | Pancreas and brain |
| 2 | rs7645985 | Intron variant | LINC01968 | lcRNA 1 | N/A | Testis and placenta |
| 3 | rs6516709 | Intron variant | APP | Protein-coding | Neurite growth, neuronal adhesion, and axonogenesis | Brain |
| 4 | rs6078354 | Intron variant | LINC00687 | lcRNA 1 | N/A | Testis |
| 5 | rs7725201 | Stop gained | RASGEF1C | Protein-coding | Guanine nucleotide exchange factor | Ubiquitous |
| 6 | rs10859918 | Intron variant | PGAM1P5 | Pseudo | Pseudogene | Ubiquitous |
| 7 | rs6110516 | Intron variant | MACROD2 | Protein-coding | Removes ADP-ribose from mono-ADP-ribosylated proteins | Ubiquitous |
| 8 | rs5928104 | Intron variant | DMD | Protein-coding | The DGC 3 bridges the inner cytoskeleton and the ECM 4; involved in DMD 5, BMD 6, or cardiomyopathy | Ubiquitous |
| 9 | rs8084397 | Intron variant | LOC105372209 and LOC105372210 | ncRNA 2 | N/A | Heart, testis, placenta, and brain |
| 10 | Affx-2716217 | Upstream variant | DOCK1 | Protein-coding | Dedicator of cytokinesis protein; phagocytosis and cell migration | Ubiquitous |
| 11 | rs201036 | Non-coding transcript; intron variant | LOC101928047 and LOC101928004 | ncRNA 2 | N/A | Heart, kidney, bone marrow, and fat |
| 12 | rs9965164 | Non-coding transcript | LOC105372208 | ncRNA 2 | N/A | Heart, testis, and placenta |
| 13 | rs3813579 | Intron variant | LOC105371356 | ncRNA 2 | N/A | Liver, testis, kidney, and skin |
1 lcRNA: long intergenic non-protein-coding RNA; 2 ncRNA: uncharacterized model non-protein-coding RNA; 3 DGC: dystrophin-glycoprotein complex; 4 ECM: extracellular matrix; 5 DMD: Duchenne muscular dystrophy; and 6 BMD: Becker muscular dystrophy.
Figure 3. Pie charts showing the genotype frequencies of the SNPs identified by AI-assisted analysis of the SNP datasets from Stage A and Stage B subjects. (A) SNPs within protein-coding genes, excluding the pseudogene (PGAM1P5). Abbreviations include GAD2: glutamate decarboxylase 2; APP: amyloid beta precursor protein; RASGEF1C: RasGEF domain family member 1C; MACROD2: mono-ADP ribosylhydrolase 2; DMD: dystrophin; and DOCK1: dedicator of cytokinesis 1. (B) SNPs within non-coding genes. rs8084397 is the SNP of LOC105372209 and LOC105372210, and rs201036 is the SNP of LOC101928047 and LOC101928004. # indicates that the SNP array signal was below the calling threshold, so no genotype was called. ** p < 0.005 by chi-square test.
Figure 4. Protein–protein interaction network of the genes containing the SNPs identified by AI-assisted analysis. The network was built from the protein-coding genes that contain a SNP. Red dots indicate the genes containing the AI-assisted SNPs, and blue dots indicate proteins connected to these SNP-containing proteins within the network. PGAM1P5 is annotated as a pseudogene. Abbreviations include VIRMA: Vir-like m6A methyltransferase associated; PRKACA: protein kinase cAMP-activated catalytic subunit alpha; TERF1: telomeric repeat binding factor 1; TERF2: telomeric repeat binding factor 2; and TRIM25: tripartite motif containing 25.