Xuetong Zhao, Yang Sui, Xiuyan Ruan, Xinyue Wang, Kunlun He, Wei Dong, Hongzhu Qu, Xiangdong Fang.
Abstract
BACKGROUND: Heart failure with preserved ejection fraction (HFpEF), which is shaped jointly by genetic and environmental factors, is a common subtype of chronic heart failure. Although existing risk assessment methods for HFpEF have made some progress, they rely on either clinical or genetic features alone. Here, we developed a deep learning framework, HFmeRisk, that combines 5 clinical features with 25 DNA methylation loci to predict the early risk of HFpEF in the Framingham Heart Study cohort.
Keywords: DNA methylation; Deep learning; Early risk prediction; Heart failure with preserved ejection fraction
Year: 2022 PMID: 35045866 PMCID: PMC8772140 DOI: 10.1186/s13148-022-01232-8
Source DB: PubMed Journal: Clin Epigenetics ISSN: 1868-7075 Impact factor: 6.551
Fig. 1 Overview of study population and study design. FHS Framingham Heart Study, UMN University of Minnesota, JHU Johns Hopkins University, CHF chronic heart failure, LVEF left ventricular ejection fraction, HFpEF heart failure with preserved ejection fraction
Demographics of participants in the training and testing sets (simplified version)
| Characteristic | Training set: No-CHF | Training set: HFpEF | P value | Testing set: No-CHF | Testing set: HFpEF | P value |
|---|---|---|---|---|---|---|
| Male | 268 (36.3) | 30 (50.8) | 0.037 | 100 (71.9) | 21 (65.6) | 0.62 |
| Age, years | 64.0 ± 7.93 | 73.3 ± 7.84 | < 0.001 | 68.6 ± 8.48 | 76.8 ± 7.73 | < 0.001 |
| Smoking | 63 (8.5) | 3 (5.1) | 0.496 | 5 (3.6) | 2 (6.2) | 0.85 |
| BMI, kg/m² | 27.9 ± 5.20 | 30.1 ± 5.97 | 0.004 | 28.4 ± 4.18 | 30.5 ± 5.92 | 0.041 |
| Fasting blood glucose, mg/dL† | 104 ± 21.8 | 118 ± 36.7 | < 0.001 | 110 ± 26.7 | 111 ± 24.8 | 0.52 |
| LDL cholesterol, mg/dL† | 112 ± 30.5 | 100 ± 23.6 | 0.011 | 95.2 ± 29.8 | 87.4 ± 28.4 | 0.36 |
| HDL cholesterol, mg/dL | 58.6 ± 16.9 | 52.6 ± 16.5 | 0.007 | 53.7 ± 14.4 | 47.2 ± 16.9 | 0.0087 |
| Average diastolic blood pressure, mmHg | 74.3 ± 9.56 | 71.3 ± 11.3 | 0.044 | 73.2 ± 10.2 | 64.0 ± 10.6 | < 0.001 |
| Average systolic blood pressure, mmHg | 127 ± 16.6 | 137 ± 16.9 | < 0.001 | 130 ± 16.5 | 133 ± 23.7 | 0.49 |
| Total cholesterol, mg/dL | 194 ± 36.4 | 177 ± 29.9 | 0.002 | 170 ± 34.5 | 159 ± 44.9 | 0.089 |
| Triglycerides, mg/dL | 116 ± 60.1 | 122 ± 54.9 | 0.22 | 109 ± 56.7 | 115 ± 94.9 | 0.63 |
| Creatinine serum, mg/dL† | 0.87 ± 0.20 | 1.13 ± 0.73 | < 0.001 | 0.986 ± 0.24 | 1.38 ± 1.13 | 0.069 |
| Creatinine urine, mg/100 mL† | 101 ± 60.5 | 108 ± 56.8 | 0.25 | 113 ± 79.4 | 104 ± 62.3 | 0.92 |
| Albuminuria urine, mg/L† | 11.2 ± 38.5 | 93.0 ± 255 | < 0.001 | 11.6 ± 21.5 | 116 ± 240 | < 0.001 |
| Hemoglobin A1c, whole blood, % | 5.66 ± 0.61 | 5.99 ± 1.19 | 0.017 | 5.79 ± 0.844 | 6.14 ± 0.96 | 0.011 |
| C reactive protein, mg/L† | 3.24 ± 6.91 | 3.82 ± 4.19 | 0.004 | 2.06 ± 2.02 | 5.33 ± 9.62 | 0.0012 |
| Ejection fraction, % † | 66.6 ± 5.14 | 66.1 ± 6.57 | 0.85 | 65.6 ± 5.24 | 67.3 ± 7.44 | 0.13 |
| Ventricular rate per minute by ECG, beats/min | 62.1 ± 10.0 | 63.5 ± 10.1 | 0.22 | 59.7 ± 9.46 | 59.7 ± 13.0 | 0.85 |
| Atrial fibrillation | 14 (1.9) | 6 (10.2) | < 0.001 | 16 (11.5) | 21 (65.6) | < 0.001 |
| Stroke | 2 (0.3) | 1 (1.7) | 0.54 | 15 (10.8) | 7 (21.9) | 0.16 |
| Left ventricular hypertrophy† | 5 (0.7) | 2 (3.4) | 0.15 | 0 (0) | 0 (0) | – |
| Atrial enlargement† | 8 (1.1) | 4 (6.8) | 0.003 | 6 (4.3) | 2 (6.2) | 1 |
| Coronary heart disease | 17 (2.3) | 9 (15.3) | < 0.001 | 45 (32.4) | 16 (50.0) | 0.095 |
| Myocardial infarction | 3 (0.4) | 0 (0) | 1 | 24 (17.3) | 7 (21.9) | 0.72 |
| Right ventricular hypertrophy‡ | 0 (0) | 0 (0) | – | 0 (0) | 0 (0) | – |
| Aspirin | 239 (32.4) | 31 (52.5) | 0.003 | 89 (64.0) | 21 (65.6) | 1 |
| Folic acid | 30 (4.1) | 6 (10.2) | 0.065 | 11 (7.9) | 4 (12.5) | 0.631 |
| Statin | 220 (29.8) | 24 (40.7) | 0.11 | 94 (67.6) | 21 (65.6) | 0.993 |
| Thiazides | 86 (11.7) | 9 (15.3) | 0.54 | 22 (15.8) | 7 (21.9) | 0.575 |
| Diuretics | 17 (2.3) | 12 (20.3) | < 0.001 | 4 (2.9) | 10 (31.2) | < 0.001 |
| Potassium | 21 (2.8) | 2 (3.4) | 1 | 2 (1.4) | 0 (0) | 1 |
| Aldosterone | 6 (0.8) | 1 (1.7) | 1 | 10 (7.2) | 4 (12.5) | 0.529 |
| Amiodarone | 2 (0.3) | 0 (0) | 1 | 0 (0) | 0 (0) | – |
| Omega 3 | 73 (9.9) | 4 (6.8) | 0.583 | 24 (17.3) | 3 (9.4) | 0.404 |
| Vasodilators | 6 (0.8) | 1 (1.7) | 1 | 10 (7.2) | 4 (12.5) | 0.529 |
| Co-Q 10 | 18 (2.4) | 1 (1.7) | 1 | 4 (2.9) | 1 (3.1) | 1 |
| β-blocker | 128 (17.3) | 23 (39.0) | < 0.001 | 61 (43.9) | 20 (62.5) | 0.0882 |
| Angiotensin II antagonists | 41 (5.6) | 10 (16.9) | 0.002 | 12 (8.6) | 5 (15.6) | 0.388 |
| ACEI | 133 (18.0) | 19 (32.2) | 0.013 | 52 (37.4) | 15 (46.9) | 0.431 |
| Warfarin | 13 (1.8) | 3 (5.1) | 0.204 | 4 (2.9) | 3 (9.4) | 0.239 |
| Clopidogrel | 4 (0.5) | 1 (1.7) | 0.824 | 9 (6.5) | 6 (18.8) | 0.062 |
Categorical variables were compared between the two groups with the Chi-square test, and continuous variables with the Mann–Whitney U test
Values are mean ± SD or n (%). P values compare HFpEF patients with no-CHF controls
CHF chronic heart failure, HFpEF heart failure with preserved ejection fraction, LDL low density lipoprotein, HDL high density lipoprotein, ACEI angiotensin-converting enzyme inhibitor
†Fewer than 20% of samples missing. ‡More than 20% of samples missing
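As an illustration of the categorical comparison described in the table footnotes, the sketch below runs a Chi-square test on one 2x2 row of the table (sex, training set). It is a minimal example under stated assumptions, not the authors' code: the group sizes (738 and 59) are back-calculated from the reported percentages because the table header does not show them, and the Yates continuity correction is assumed. Under those assumptions the P value comes out close to the reported 0.037.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square test of independence for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates continuity correction (assumed here), floored at zero.
        diff = max(0.0, diff - n / 2)
    statistic = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# "Male" row, training set: 268 no-CHF and 30 HFpEF participants.
# ASSUMPTION: group sizes 738 and 59 are back-calculated from the
# reported percentages (36.3% and 50.8%); they are not stated in the table.
males_no_chf, n_no_chf = 268, 738
males_hfpef, n_hfpef = 30, 59

stat, p = chi_square_2x2(
    males_no_chf, n_no_chf - males_no_chf,
    males_hfpef, n_hfpef - males_hfpef,
)
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
```

The continuous variables would need the participant-level data for a Mann–Whitney U test, so they cannot be re-derived from the summary values shown here.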
Fig. 2 The 30 features obtained by the LASSO and XGBoost algorithms. a AUC for different numbers of features, as revealed by the LASSO model. b Misclassification error for different numbers of features, as revealed by the LASSO model. In a and b, the grey lines represent the standard error, and the vertical dotted lines mark the value of lambda chosen by the minimum criterion (left) and the largest value of lambda such that the error is within one standard error of the minimum (right). The upper abscissa is the number of non-zero coefficients in the model at that value of lambda, and the lower abscissa is log(lambda), where lambda is the tuning parameter chosen by tenfold cross-validation in the LASSO model. c The intersection of the non-zero coefficients in a and b; 80 non-zero coefficients were obtained in the LASSO model. d The best model features, ranked by the gain index in the XGBoost model. The XGBoost model further reduced the 80 features from the LASSO model to 30 final features. The gain index represents the fractional contribution of each feature to the model, based on the total gain of that feature's splits
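The two selection rules named in the Fig. 2 caption can be sketched in a few lines. The example below is illustrative only: the function names, the lambda grid, the cross-validation errors and the feature names are all hypothetical, and it does not reproduce the LASSO or XGBoost fits themselves. It shows how the minimum-error lambda and the one-standard-error lambda are picked from a cross-validation curve, and how a gain index is formed as each feature's share of the total gain.

```python
def select_lambdas(lambdas, cv_mean_error, cv_std_error):
    """Return (lambda_min, lambda_1se) from a cross-validation error curve.

    lambda_min minimises the mean CV error; lambda_1se is the largest
    lambda whose mean CV error is within one standard error of that minimum.
    """
    best = min(range(len(lambdas)), key=lambda i: cv_mean_error[i])
    threshold = cv_mean_error[best] + cv_std_error[best]
    lambda_min = lambdas[best]
    lambda_1se = max(
        lam for lam, err in zip(lambdas, cv_mean_error) if err <= threshold
    )
    return lambda_min, lambda_1se


def gain_fractions(total_gain):
    """Convert per-feature total gain into fractional contributions,
    sorted from most to least important."""
    total = sum(total_gain.values())
    ranked = sorted(total_gain.items(), key=lambda kv: kv[1], reverse=True)
    return {name: gain / total for name, gain in ranked}


# Hypothetical cross-validation curve (not values from the study).
lambdas = [0.001, 0.01, 0.05, 0.1, 0.5]
mean_err = [0.125, 0.10, 0.11, 0.13, 0.20]
std_err = [0.02, 0.02, 0.02, 0.02, 0.02]
print(select_lambdas(lambdas, mean_err, std_err))

# Hypothetical total gains for three placeholder features.
print(gain_fractions({"feature_a": 6.0, "feature_b": 3.0, "feature_c": 1.0}))
```

Choosing the one-standard-error lambda trades a small loss in cross-validated accuracy for a sparser model, which is why it sits to the right of the minimum in panels a and b.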
Fig. 3 Performance of the HFmeRisk model. a AUC results of the prediction performance according to different features in the testing set. “(HFmeRisk/EHR/CpG model)” indicates the model with EHR and DNA methylation data, the model with EHR data only, and the model with DNA methylation data only, respectively. b Calibration plot of the DeepFM model in the testing set using 30 features. The Hosmer–Lemeshow statistic was 6.17, with P = 0.632. c Decision curve analyses of the HFmeRisk, 5 EHR model risk and 25 CpGs model risk in the testing cohort. d AUC results for the HFmeRisk model versus William's model in male and female participants. e The association between each CpG (cg10083824/cg03233656) and the expression of its DMG (GRM4/SLC1A4) in blood samples of FHS participants. The X-axis is the beta value of DNA methylation, and the Y-axis is the expression value from RNA data. Rug plots display individual cases on the X- and Y-axes. The fitted lines show linear smooths by case/control status. The Pearson's correlation between CpG and DMG is driven mainly by case–control status. DMG, differentially methylated gene. The triangles represent the no-CHF participants; the dots represent the HFpEF participants
The 25 CpGs associated with the HFmeRisk model
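Two of the metrics shown in Fig. 3, the AUC (panels a and d) and the net benefit plotted in a decision curve (panel c), can be written out directly from their definitions. The sketch below uses made-up labels and risk scores and hypothetical function names; it illustrates the standard definitions and is not the evaluation code used in the study.

```python
def auc(labels, scores):
    """AUC as the probability that a randomly chosen case scores higher
    than a randomly chosen control (ties count as one half)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def net_benefit(labels, scores, threshold):
    """Net benefit at a given risk threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Made-up example: 1 = HFpEF, 0 = no-CHF; scores are predicted risks.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]

print(f"AUC = {auc(labels, scores):.3f}")
for pt in (0.1, 0.3, 0.5):
    print(f"threshold {pt:.1f}: net benefit = {net_benefit(labels, scores, pt):.3f}")
```

A decision curve is obtained by evaluating the net benefit over a range of thresholds and comparing it with the treat-all and treat-none strategies.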
| Probe | Chr | Position | Closest gene | Distance to gene | Side | UCSC RefGene Group | Relation to UCSC CpG Island | Enhancer |
|---|---|---|---|---|---|---|---|---|
| cg00045910 | chr10 | 23,466,070 | PTF1A | 15,184 | R | IGR | S Shelf | NA |
| cg00495303 | chr18 | 3,771,110 | DLGAP1 | 0 | – | Body | N Shore | NA |
| cg00522231 | chr2 | 9,549,277 | ITGB1BP1 | 0 | – | Body | Open sea | NA |
| cg03233656 | chr2 | 65,214,625 | SLC1A4 | 0 | – | TSS1500 | N Shore | NA |
| cg03556243 | chr3 | 114,343,779 | ZBTB20 | 0 | – | 5'UTR;1stExon;TSS1500 | Open sea | NA |
| cg05363438 | chr1 | 224,301,382 | FBXO28 | 0 | – | TSS1500 | N Shore | NA |
| cg05481257 | chr2 | 20,870,211 | GDF7 | 0 | – | Body | Island | NA |
| cg05845376 | chr5 | 140,683,632 | SLC25A2 | 0 | – | TSS200 | Island | NA |
| cg06344265 | chr11 | 120,530,973 | GRIK4 | 0 | – | TSS200 | Open sea | NA |
| cg07041999 | chr8 | 2,178,272 | MYOM2 | −64,796 | L | IGR | Open sea | NA |
| cg08101977 | chr16 | 1,231,407 | CACNA1H | 0 | – | Body | S Shore | NA |
| cg08614290 | chr7 | 158,938,491 | VIPR2 | 0 | – | TSS1500 | Island | NA |
| cg10083824 | chr6 | 34,102,147 | GRM4 | 0 | – | TSS1500 | Open sea | NA |
| cg10556349 | chr10 | 835,070 | DIP2C | −99,386 | L | IGR | Open sea | NA |
| cg11853697 | chr20 | 60,510,235 | CDH4 | 0 | – | Body | N Shore | TRUE |
| cg13352914 | chr1 | 63,760,405 | FOXD3 | 28,323 | R | IGR | Open sea | TRUE |
| cg16781992 | chr4 | 20,985,623 | KCNIP4 | 0 | – | Body;5'UTR | Open sea | NA |
| cg17766026 | chr10 | 102,405,781 | HIF1AN | −86,025 | L | IGR | Open sea | TRUE |
| cg20051875 | chr12 | 68,201,286 | DYRK2 | −142,099 | L | IGR | Open sea | TRUE |
| cg21024264 | chr10 | 135,341,025 | CYP2E1 | 0 | – | 1stExon | N Shore | NA |
| cg21429551 | chr7 | 30,635,762 | GARS | 0 | – | Body | S Shore | NA |
| cg23299445 | chr15 | 73,113,226 | ADPGK | −35,038 | L | IGR | Open sea | TRUE |
| cg24205914 | chr10 | 62,761,575 | RHOBTB1 | 0 | – | TSS1500 | Island | NA |
| cg25755428 | chr19 | 13,875,111 | MRI1 | 0 | – | TSS1500 | Island | NA |
| cg27401945 | chr10 | 118,919,088 | VAX1 | −21,275 | L | IGR | N Shelf | TRUE |
Fig. 4 Gene ontology categories and pathway analysis of the DNA methylation loci in the HFmeRisk model. a Gene ontology enrichment of CpG loci (MF molecular function, CC cellular component, BP biological process). b KEGG pathway enrichment of CpG loci. In a and b, the red line marks −log10(P) = 1.3 (P = 0.05). c Pathways identified with ReactomePA, sorted by fold enrichment (x-axis). Fold enrichment is defined as the ratio of two proportions, the gene ratio and the BG ratio. The gene ratio is the number of genes annotated to a pathway relative to the list of differential genes among the major contributors that are included in the database; the BG ratio is the total number of genes in the gene set relative to the total number of genes in any gene set. From these values, the raw P value is calculated using a hypergeometric test. The size of each dot indicates the number of genes annotated to the pathway, and the color of the dot indicates the P value. d Correspondence between genes and pathways in c
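The quantities defined in the Fig. 4 caption can be made concrete with a small worked example. The sketch below is an illustration under assumed counts: the function names and the numbers (5 of 20 listed genes in a pathway that holds 10 of 100 background genes) are hypothetical and are not taken from the study. It computes the fold enrichment as gene ratio over BG ratio, the upper-tail hypergeometric P value, and the −log10 threshold that corresponds to the red line at P = 0.05.

```python
import math


def fold_enrichment(k, n, K, N):
    """Gene ratio (k / n) divided by background ratio (K / N).

    k: listed genes annotated to the pathway
    n: listed genes present in the database
    K: background genes annotated to the pathway
    N: all background genes annotated to any pathway
    """
    return (k / n) / (K / N)


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which belong to the pathway."""
    total = math.comb(N, n)
    upper = min(n, K)
    favourable = sum(
        math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return favourable / total


# Hypothetical counts, for illustration only.
k, n, K, N = 5, 20, 10, 100
p = hypergeom_upper_tail(k, n, K, N)
print(f"fold enrichment = {fold_enrichment(k, n, K, N):.2f}")
print(f"P = {p:.4f}, -log10(P) = {-math.log10(p):.2f}")
print(f"red line at -log10(0.05) = {-math.log10(0.05):.2f}")
```

A pathway bar or dot that extends past the red line therefore has a raw P value below 0.05.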