Meeshanthini V Dogan, Steven R H Beach, Ronald L Simons, Amaury Lendasse, Brandan Penaluna, Robert A Philibert.
Abstract
An improved approach for predicting the risk for incident coronary heart disease (CHD) could lead to substantial improvements in cardiovascular health. Previously, we have shown that genetic and epigenetic loci could predict CHD status more sensitively than conventional risk factors. Herein, we examine whether similar machine learning approaches could be used to develop a similar panel for predicting incident CHD. Training and test sets consisted of 1180 and 524 individuals, respectively. Data mining techniques were employed to mine for predictive biosignatures in the training set. An ensemble of Random Forest models consisting of four genetic and four epigenetic loci was trained on the training set and subsequently evaluated on the test set. The test sensitivity and specificity were 0.70 and 0.74, respectively. In contrast, the Framingham risk score and atherosclerotic cardiovascular disease (ASCVD) risk estimator performed with test sensitivities of 0.20 and 0.38, respectively. Notably, the integrated genetic-epigenetic model predicted risk better for both genders and very well in the three-year risk prediction window. We describe a novel DNA-based precision medicine tool capable of capturing the complex genetic and environmental relationships that contribute to the risk of CHD, and being mapped to actionable risk factors that may be leveraged to guide risk modification efforts.
Keywords: biomarkers; coronary heart disease; epigenetics; genetics; machine learning; risk factors; risk prediction
Year: 2018 PMID: 30567402 PMCID: PMC6315411 DOI: 10.3390/genes9120641
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Summary of demographics and conventional coronary heart disease (CHD) risk factors at the eighth examination cycle for the 1180 and 524 individuals in the training and test sets, respectively.
| | Training (n = 1180) | | Test (n = 524) | |
|---|---|---|---|---|
| | CHD ¹ | No CHD ² | CHD ¹ | No CHD ² |
| n | | | | |
| Male | 23 | 462 | 12 | 219 |
| Female | 19 | 676 | 8 | 285 |
| Age (years) | | | | |
| Male | 70.8 ± 9.7 | 65.2 ± 8.1 | 66.8 ± 7.5 | 61.6 ± 8.6 |
| Female | 68.5 ± 9.0 | 65.6 ± 8.2 | 70.3 ± 10.7 | 63.8 ± 9.1 |
| Total cholesterol (mg/dL) | | | | |
| Male | 166 ± 50 | 178 ± 32 | 173 ± 34 | 185 ± 32 |
| Female | 204 ± 50 | 201 ± 36 | 181 ± 38 | 196 ± 33 |
| HDL cholesterol (mg/dL) | | | | |
| Male | 48 ± 15 | 50 ± 14 | 49 ± 17 | 51 ± 15 |
| Female | 53 ± 16 | 66 ± 19 | 60 ± 19 | 66 ± 19 |
| HbA1c (%) | | | | |
| Male | 5.8 ± 0.4 | 5.7 ± 0.8 | 5.7 ± 0.7 | 5.6 ± 0.5 |
| Female | 5.9 ± 0.8 | 5.7 ± 0.5 | 5.6 ± 0.4 | 5.6 ± 0.5 |
| SBP (mmHg) | | | | |
| Male | 134 ± 17 | 130 ± 17 | 138 ± 25 | 128 ± 16 |
| Female | 137 ± 19 | 128 ± 18 | 135 ± 24 | 125 ± 17 |
| DBP (mmHg) | | | | |
| Male | 73 ± 11 | 77 ± 10 | 75 ± 8 | 78 ± 9 |
| Female | 77 ± 9 | 73 ± 10 | 66 ± 10 | 73 ± 10 |
| | | | | |
| Male | 1 (4%) | 29 (6%) | 3 (25%) | 12 (5%) |
| Female | 2 (11%) | 49 (7%) | 0 (0%) | 28 (10%) |
| | | | | |
| Male | 16 (70%) | 223 (48%) | 7 (58%) | 86 (39%) |
| Female | 10 (53%) | 250 (37%) | 4 (50%) | 114 (40%) |

¹ Those who developed symptomatic coronary heart disease (CHD) within five years of contributing biomaterial during the Offspring cohort eighth examination cycle. ² Those who did not develop symptomatic CHD within five years of contributing biomaterial during the Offspring cohort eighth examination cycle. HDL: high-density lipoprotein; SBP: systolic blood pressure; DBP: diastolic blood pressure.
Integrated genetic-epigenetic Random Forest model prediction performance in the training set.
| Model | OOB Error Rate | AUC | Sensitivity | Specificity |
|---|---|---|---|---|
| 1 | 0.24 | 0.76 | 0.71 | 0.67 |
| 2 | 0.22 | 0.78 | 0.67 | 0.68 |
| 3 | 0.19 | 0.81 | 0.71 | 0.71 |
| 4 | 0.20 | 0.80 | 0.76 | 0.74 |
| 5 | 0.13 | 0.87 | 0.88 | 0.74 |
| 6 | 0.22 | 0.78 | 0.69 | 0.67 |
| 7 | 0.17 | 0.83 | 0.74 | 0.72 |
| 8 | 0.10 | 0.90 | 0.81 | 0.86 |
| 9 | 0.15 | 0.85 | 0.76 | 0.74 |
| 10 | 0.11 | 0.89 | 0.88 | 0.76 |
| 11 | 0.14 | 0.86 | 0.81 | 0.74 |
| 12 | 0.14 | 0.86 | 0.74 | 0.80 |
| 13 | 0.13 | 0.87 | 0.79 | 0.77 |
| 14 | 0.20 | 0.80 | 0.74 | 0.74 |
| 15 | 0.18 | 0.82 | 0.71 | 0.77 |
| 16 | 0.22 | 0.78 | 0.71 | 0.70 |
| 17 | 0.20 | 0.80 | 0.71 | 0.76 |
| 18 | 0.24 | 0.76 | 0.67 | 0.63 |
| 19 | 0.19 | 0.81 | 0.74 | 0.76 |
| 20 | 0.23 | 0.77 | 0.74 | 0.56 |
| 21 | 0.20 | 0.80 | 0.76 | 0.70 |
| 22 | 0.18 | 0.82 | 0.79 | 0.79 |
| 23 | 0.27 | 0.73 | 0.67 | 0.72 |
| 24 | 0.10 | 0.90 | 0.86 | 0.81 |
| 25 | 0.18 | 0.82 | 0.81 | 0.67 |
| 26 | 0.17 | 0.83 | 0.79 | 0.72 |
| 27 | 0.19 | 0.81 | 0.69 | 0.74 |
OOB: out-of-bag; AUC: area under the receiver operating characteristic curve
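The 27 Random Forest models listed above are used together as an ensemble. This record does not state how their individual predictions are combined, so the following is only a minimal sketch that assumes a simple majority vote; the function name `majority_vote` and the example votes are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def majority_vote(votes: Sequence[int]) -> int:
    """Combine binary model votes (1 = at high risk, 0 = not at high risk).

    Assumption: a plain majority rule. The combination rule actually used
    by the authors is not given in this record.
    """
    if not votes:
        raise ValueError("need at least one vote")
    # Strict majority: more than half of the models must vote 1.
    return int(2 * sum(votes) > len(votes))


# Hypothetical votes from 27 models for a single individual (illustrative only).
example_votes = [1] * 15 + [0] * 12
ensemble_call = majority_vote(example_votes)
print(ensemble_call)
```

With an odd number of models, such as 27, a strict majority rule can never tie, which is one practical reason to prefer an odd ensemble size under this assumption.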
The confusion matrix of the integrated genetic-epigenetic ensemble of 27 models on the test dataset consisting of 524 individuals for predicting the five-year risk of developing symptomatic CHD.
| | Predicted: Not at High Risk | Predicted: At High Risk |
|---|---|---|
| True: Not at High Risk | 372 | 132 |
| True: At High Risk | 6 | 14 |
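The sensitivity and specificity quoted in the abstract can be recomputed from this matrix. The sketch below assumes the first row holds the individuals who truly were not at high risk and the second row those who were, a reading consistent with the 20 CHD cases in the test set; the helper name `sensitivity_specificity` is hypothetical.

```python
def sensitivity_specificity(tn: int, fp: int, fn: int, tp: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from the four confusion-matrix cells."""
    sensitivity = tp / (tp + fn)  # fraction of true high-risk individuals flagged
    specificity = tn / (tn + fp)  # fraction of true low-risk individuals cleared
    return sensitivity, specificity


# Cell values as read from the ensemble's confusion matrix above.
sens, spec = sensitivity_specificity(tn=372, fp=132, fn=6, tp=14)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
# prints "sensitivity = 0.70, specificity = 0.74"
```

These values match the test sensitivity and specificity reported in the abstract.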
Figure 1. The performance of the integrated genetic-epigenetic tool when identifying those at high risk of symptomatic CHD within five years by age, gender, and days to event.
Nominal p-values of the main and interaction terms of the genetic and epigenetic biosignatures with respect to conventional CHD risk factors.
| Locus | Total Cholesterol | HDL Cholesterol | HbA1c | Smoking Status | SBP | DBP | Age | Gender |
|---|---|---|---|---|---|---|---|---|
| rs2599737 | 0.94 | 0.81 | 0.55 | 0.59 | 0.29 | 0.95 | 0.28 | 0.28 |
| rs6797484 | 0.27 | 0.86 | 0.59 | 0.15 | 0.006 | 0.21 | 0.06 | 0.62 |
| cg26119740 | 0.68 | 0.74 | 0.20 | 0.98 | 0.47 | 0.003 | 0.007 | 0.02 |
| cg00524912 | 0.67 | 0.91 | 0.91 | 0.88 | 0.82 | 0.33 | 0.06 | 0.05 |
| rs898550 | 0.40 | 0.17 | 0.27 | 0.80 | 0.05 | 0.75 | 0.79 | 0.25 |
| cg24221633 | 0.47 | 0.56 | 0.07 | 0.03 | 0.12 | 0.007 | 0.45 | 0.20 |
| cg08224787 | 0.18 | 0.12 | 0.71 | 0.61 | 0.85 | 0.53 | 0.66 | 0.40 |
| rs7250088 | 0.80 | 0.67 | 0.36 | 0.02 | 0.64 | 0.83 | 0.44 | 0.18 |
| rs2599737: cg26119740 | 0.04 | 0.67 | 0.55 | 0.15 | 0.41 | 0.20 | 0.59 | 0.29 |
| rs2599737: cg00524912 | 0.25 | 0.03 | 0.94 | 0.67 | 0.49 | 0.07 | 0.88 | 0.34 |
| rs2599737: cg24221633 | 0.94 | 0.32 | 0.38 | 0.06 | 0.65 | 0.002 | 0.01 | 0.09 |
| rs2599737: cg08224787 | 0.68 | 0.08 | 0.82 | 0.78 | 0.50 | 0.53 | 0.36 | 0.54 |
| rs6797484: cg26119740 | 0.41 | 0.36 | 0.01 | 0.57 | 0.68 | 0.09 | 0.12 | 0.37 |
| rs6797484: cg00524912 | 0.04 | 0.54 | 0.10 | 0.16 | 0.97 | 0.96 | 0.29 | 0.97 |
| rs6797484: cg24221633 | 0.41 | 0.50 | 0.84 | 0.05 | 0.11 | 0.56 | 0.21 | 0.97 |
| rs6797484: cg08224787 | 0.63 | 0.49 | 0.17 | 0.19 | 0.95 | 0.74 | 0.35 | 0.27 |
| rs898550: cg26119740 | 0.06 | 0.99 | 0.003 | 0.59 | 0.72 | 0.85 | 0.36 | 0.97 |
| rs898550: cg00524912 | 0.66 | 0.13 | 0.55 | 0.88 | 0.39 | 0.59 | 0.32 | 0.32 |
| rs898550: cg24221633 | 0.45 | 0.09 | 0.09 | 0.26 | 0.55 | 0.49 | 0.86 | 0.15 |
| rs898550: cg08224787 | 0.08 | 0.14 | 0.87 | 0.55 | 1.00 | 0.48 | 0.72 | 0.26 |
| rs7250088: cg26119740 | 0.35 | 0.92 | 0.71 | 0.89 | 0.13 | 0.26 | 0.98 | 0.31 |
| rs7250088: cg00524912 | 0.81 | 0.44 | 0.88 | 0.98 | 0.79 | 0.79 | 0.61 | 0.48 |
| rs7250088: cg24221633 | 0.46 | 0.05 | 0.68 | 0.35 | 0.22 | 0.63 | 0.71 | 0.20 |
| rs7250088: cg08224787 | 0.81 | 0.22 | 0.84 | 0.89 | 0.40 | 0.95 | 0.43 | 0.59 |
The confusion matrix of the Framingham risk score on the test dataset for predicting a high risk of developing symptomatic CHD within five years.
| | Predicted: Not at High Risk | Predicted: At High Risk |
|---|---|---|
| True: Not at High Risk | 423 | 20 |
| True: At High Risk | 12 | 3 |
The confusion matrix of the ASCVD risk estimator on the test dataset for predicting a high risk of developing symptomatic CHD within five years.
| | Predicted: Not at High Risk | Predicted: At High Risk |
|---|---|---|
| True: Not at High Risk | 382 | 69 |
| True: At High Risk | 10 | 6 |
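The same arithmetic can be applied to the two conventional risk tools. As a rough sketch, and again assuming that in each matrix the first row is the true not-at-high-risk group and the second row the true at-high-risk group, the cells of the Framingham and ASCVD matrices give the following; the dictionary layout and the name `metrics` are illustrative choices only.

```python
# Confusion-matrix cells (tn, fp, fn, tp) as read from the two tables above.
matrices = {
    "Framingham risk score": (423, 20, 12, 3),
    "ASCVD risk estimator": (382, 69, 10, 6),
}


def metrics(tn: int, fp: int, fn: int, tp: int) -> tuple[float, float, int]:
    """Return (sensitivity, specificity, number of individuals scored)."""
    return tp / (tp + fn), tn / (tn + fp), tn + fp + fn + tp


results = {name: metrics(*cells) for name, cells in matrices.items()}
for name, (sens, spec, n) in results.items():
    print(f"{name}: sensitivity = {sens:.3f}, specificity = {spec:.3f}, n = {n}")
```

The sensitivities come out at 3/15 = 0.20 and 6/16 = 0.375, in line with the 0.20 and 0.38 quoted in the abstract. The two matrices cover 458 and 467 individuals rather than all 524 in the test set; the reason is not given in this record.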
Figure 2. The performance of the Framingham risk score model when identifying those at high risk of symptomatic CHD within five years by age, gender, and days to event.
Figure 3. The performance of the ASCVD risk estimator when identifying those at high risk of symptomatic CHD within five years by age, gender, and days to event.