Rui Fan¹, Ning Zhang², Longyan Yang², Jing Ke², Dong Zhao³, Qinghua Cui⁴.
Abstract
Type 2 diabetes mellitus (T2DM) is a common chronic disease caused by disordered insulin secretion. It often leads to severe outcomes, and even death, through its complications, of which coronary heart disease (CHD) is the most common and most severe. Given the large number of T2DM patients, it is increasingly important to identify those at high risk of developing CHD, but a quantitative method for doing so is still lacking. Here, we first curated a dataset of 1,273 T2DM patients, 304 with CHD and 969 without. We then trained an artificial intelligence (AI) model on a randomly selected 4/5 of the dataset and used the remaining 1/5 to validate its performance. The model achieved an AUC of 0.77 (fivefold cross-validation) on the training dataset and 0.80 on the testing dataset. To further confirm its performance, we recruited 1,253 new T2DM patients, 200 with CHD and 1,053 without, as a fully independent testing dataset, on which the model achieved an AUC of 0.71. In addition, we implemented a model that quantitatively evaluates the risk contribution of each feature and can therefore provide personalized guidance for specific individuals. Finally, we built an online web server for the model. This study presents an AI model for estimating the risk that T2DM patients will develop CHD. It has potential value as a personalized early warning of CHD risk for both T2DM patients and clinicians.
Year: 2020 PMID: 32879331 PMCID: PMC7467935 DOI: 10.1038/s41598-020-71321-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
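The evaluation protocol in the abstract has three parts: a random 4/5 of patients for training, fivefold cross-validation within that part, and the remaining 1/5 held out for testing. The sketch below shows that protocol in plain Python. It is not the authors' code. Stratifying the split by CHD status, the seed, and the helper names `stratified_split` and `kfold_indices` are assumptions made for illustration. Under those assumptions, the held-out set contains 255 patients, 61 of them with CHD.

```python
import random

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into (train, test), keeping the CHD/non-CHD ratio in each part.

    Stratifying by label is an assumption; the abstract only says "randomly selected".
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

def kfold_indices(indices, k=5, seed=0):
    """Yield (fit, validation) index lists for plain (unstratified) k-fold cross-validation."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        fit = sorted(i for g in range(k) if g != f for i in folds[g])
        yield fit, sorted(folds[f])

# 1,273 patients: 304 with CHD (label 1) and 969 without (label 0).
labels = [1] * 304 + [0] * 969
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 1018 255

# In each fold a model would be fitted on `fit` and scored (e.g. by AUC) on `val`.
fold_sizes = [len(val) for _, val in kfold_indices(train_idx)]
print(fold_sizes)  # [204, 204, 204, 203, 203]
```

The cross-validation AUC (0.77) would be the average over the five validation folds. The testing AUC (0.80) comes from a single model scored once on the held-out 1/5.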
Brief description of each feature in the dataset with 1,273 subjects.
| Feature | Non-CHD | CHD |
|---|---|---|
| Subjects, n (%) | 969 (76.1) | 304 (23.9) |
| Female, n (%) | 468 (48.2) | 161 (47.04) |
| Smokers, n (%) | 370 (38.11) | 129 (42.43) |
| Drinking alcohol, n (%) | 309 (31.82) | 97 (31.91) |
| Age | 54.07 ± 14.30 | 64.91 ± 9.75 |
| Duration of hypertension (years) | 5.45 (4.92, 5.98) | 10.33 (9.10, 11.55) |
| Duration of diabetes (years) | 7.26 (6.80, 7.73) | 11.35 (10.38, 12.36) |
| Systolic pressure (mmHg) | 129 ± 18.74 | 128.68 ± 20.94 |
| Diastolic pressure (mmHg) | 77.11 ± 11.61 | 72.80 ± 12.70 |
| Heart rate (beats per minute) | 83.42 ± 12.97 | 76.94 ± 11.13 |
| Body mass index (kg/m²) | 26.12 ± 4.04 | 26.38 ± 3.72 |
| Waist-to-hip ratio | 0.94 ± 0.07 | 0.95 ± 0.07 |
| Platelet count (×10⁹/L) | 225.38 ± 68.91 | 205.14 ± 82.72 |
| Hemoglobin A1c (%) | 9.84 ± 2.28 | 9.38 ± 2.20 |
| Serum creatinine (µmol/L) | 67.25 ± 27.66 | 76.21 ± 31.97 |
| Uric acid (µmol/L) | 319.74 (312.95, 326.53) | 334.91 (321.41, 346.41) |
| Serum triglyceride (mmol/L) | 2.03 (1.88, 2.17) | 1.77 (1.58, 1.95) |
| Total cholesterol (mmol/L) | 4.71 ± 1.21 | 4.13 ± 1.17 |
| LDL cholesterol (mmol/L) | 3.02 ± 0.88 | 2.57 ± 0.87 |
| HDL cholesterol (mmol/L) | 1.08 ± 0.27 | 1.04 ± 0.28 |
| Fasting blood glucose (mmol/L) | 9.07 (8.78, 9.38) | 8.61 (8.15, 9.08) |
| Insulin 0 h (µU/mL) | 17.92 (14.26, 21.60) | 29.63 (18.98, 40.30) |
| Insulin 1 h (µU/mL) | 58.56 (53.64, 64.38) | 78.72 (64.38, 93.06) |
| Insulin 2 h (µU/mL) | 65.09 (59.82, 70.36) | 84.83 (69.47, 100.19) |
| Insulin 3 h (µU/mL) | 53.35 (48.16, 58.52) | 75.46 (60.06, 90.87) |
| C-peptide 0 h (ng/mL) | 1.57 (1.51, 1.64) | 1.82 (1.68, 1.95) |
| C-peptide 1 h (ng/mL) | 2.75 (2.63, 2.88) | 2.82 (2.62, 3.01) |
| C-peptide 2 h (ng/mL) | 4.13 (3.94, 4.33) | 4.16 (3.84, 4.48) |
| C-peptide 3 h (ng/mL) | 4.24 (4.05, 4.44) | 4.50 (4.14, 4.86) |
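The table uses three summary formats:

- n (%) for categorical variables
- mean ± SD
- a point estimate followed by a parenthesised interval

The interval type is not stated. Most intervals are roughly symmetric about the point estimate, which is consistent with a mean and 95% confidence interval but does not prove it. The sketch below shows how each format could be computed from raw values. It assumes a normal-approximation interval (mean ± 1.96·SE), and the function names are hypothetical.

```python
import math
import statistics

def count_pct(flags):
    """'n (%)' format: count of True entries and their percentage."""
    n = sum(1 for f in flags if f)
    return n, 100.0 * n / len(flags)

def mean_sd(values):
    """'mean ± SD' format, using the sample standard deviation."""
    return statistics.mean(values), statistics.stdev(values)

def mean_ci95(values):
    """'mean (lower, upper)' format with a normal-approximation 95% CI (an assumption)."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m, m - 1.96 * se, m + 1.96 * se

# Toy values, not study data.
print(count_pct([True, False, True, True]))  # (3, 75.0)
m, lo, hi = mean_ci95([2, 4, 6, 8])
print(f"{m:.2f} ({lo:.2f}, {hi:.2f})")  # 5.00 (2.47, 7.53)
```

For groups as large as these (969 and 304 patients), a t-based multiplier would be about 1.97 instead of 1.96, so the choice barely changes the intervals.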
Figure 1. Importance scores of the top 8 features, calculated by the information entropy-based feature selection model.
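The caption names an information entropy-based feature selection model but does not give its scoring function. One common entropy-based score is information gain. It measures how much a feature reduces uncertainty about the CHD label: the label entropy H(Y) minus the remaining entropy H(Y | X) after the feature is split at a threshold. The sketch below, with hypothetical helper names and toy data, shows how such a score could rank features. It illustrates the general technique, not the authors' exact procedure.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Reduction in label entropy from splitting a numeric feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - conditional

def best_gain(values, labels):
    """Best information gain over candidate thresholds (midpoints between sorted unique values)."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max((information_gain(values, labels, t) for t in candidates), default=0.0)

# Toy data, not study data: CHD labels and two hypothetical features.
y = [0, 0, 0, 1, 1, 1]
features = {
    "age": [45, 50, 52, 60, 66, 70],  # separates the two classes perfectly
    "bmi": [24, 27, 25, 26, 24, 27],  # only weakly related to the label
}
ranking = sorted(features, key=lambda name: best_gain(features[name], y), reverse=True)
print(ranking)  # ['age', 'bmi']
```

Keeping the top-k features by such a score (k = 8 in the figure) is a simple filter-style selection. It scores each feature independently of the others.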
Figure 2. ROC curves and AUCs of the predictive model for fivefold cross-validation (blue) and the independent testing dataset (green), using (a) all features and (b) the top 8 selected features.
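The AUC values reported for these curves summarise each ROC curve in a single number. The sketch below assumes only that the model outputs one risk score per patient. The AUC then equals the Mann-Whitney probability that a randomly chosen CHD patient receives a higher score than a randomly chosen non-CHD patient, with ties counted as one half. `roc_auc` is a hypothetical helper, not part of the authors' web server.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a CHD case (label 1) outscores a non-CHD case (label 0).

    Uses the Mann-Whitney rank-sum identity; tied scores get their average rank,
    which counts each positive/negative tie as one half.
    """
    pairs = sorted(zip(scores, labels))
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both CHD and non-CHD examples")
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block i..j
        rank_sum_pos += avg_rank * sum(1 for _, y in pairs[i:j + 1] if y == 1)
        i = j + 1
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Toy scores, not study data.
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The rank-based form takes O(n log n) time, whereas comparing every CHD/non-CHD pair takes O(n·m). For cohorts of about 1,250 patients, either approach is fast.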
Performance scores of the predictive model.
| Dataset | ACC | TPR (recall) | FPR | Precision | F1 |
|---|---|---|---|---|---|
| Training set | 0.7892 | 0.2041 | 0.0129 | 0.8 | 0.2712 |
| | 0.8088 | 0.5306 | 0.0968 | 0.625 | 0.5618 |
| | 0.701 | 0.8367 | 0.3419 | 0.4333 | 0.5612 |
| Testing set | 0.7922 | 0.2131 | 0.0155 | 0.7857 | 0.2933 |
| | 0.7255 | 0.5082 | 0.2113 | 0.4348 | 0.4615 |
| | 0.6353 | 0.8033 | 0.4175 | 0.375 | 0.5079 |
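Each score column (ACC, TPR, FPR, precision, F1) can be derived from the four confusion-matrix counts at a given decision threshold. The sketch below treats CHD as the positive class and uses hypothetical counts, not the study's. F1 is the harmonic mean of precision and TPR (recall).

```python
def classification_metrics(tp, fp, tn, fn):
    """ACC, TPR (recall), FPR, precision and F1 from confusion-matrix counts.

    CHD is the positive class; a ratio with a zero denominator is reported as 0.0.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    tpr = ratio(tp, tp + fn)
    return {
        "ACC": ratio(tp + tn, tp + fp + tn + fn),
        "TPR": tpr,
        "FPR": ratio(fp, fp + tn),
        "Precision": precision,
        "F1": ratio(2 * precision * tpr, precision + tpr),  # harmonic mean
    }

# Hypothetical counts, not taken from the tables.
m = classification_metrics(tp=40, fp=20, tn=130, fn=10)
print({k: round(v, 4) for k, v in m.items()})
# {'ACC': 0.85, 'TPR': 0.8, 'FPR': 0.1333, 'Precision': 0.6667, 'F1': 0.7273}
```

When scores are averaged over cross-validation folds, the mean F1 generally differs from the F1 computed from mean precision and mean recall.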
Figure 3. ROC curve and AUC of the predictive model on the newly recruited dataset.
Performance scores of the predictive model on the newly recruited dataset.
| Dataset | ACC | TPR (recall) | FPR | Precision | F1 |
|---|---|---|---|---|---|
| New set | 0.834 | 0.2 | 0.0437 | 0.4524 | 0.2676 |
| | 0.7558 | 0.5 | 0.1994 | 0.3257 | 0.3929 |
| | 0.5499 | 0.815 | 0.5071 | 0.2331 | 0.3605 |
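Within each block of rows in the performance tables, TPR and FPR rise together while precision falls. This pattern suggests, although the tables do not say so, that each row is the same model evaluated at a different decision threshold, with recall near 0.2, 0.5 and 0.8. Under that assumption, a cutoff reaching a target recall could be chosen as in the sketch below. `threshold_for_recall` and `rates_at` are hypothetical helpers that operate on per-patient risk scores.

```python
import math

def threshold_for_recall(scores, labels, target):
    """Highest cutoff t such that flagging scores >= t as CHD reaches at least `target` recall."""
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not pos:
        raise ValueError("need at least one CHD (label 1) example")
    # Number of CHD patients that must be flagged; the epsilon guards float error, e.g. 0.7 * 10.
    k = max(1, math.ceil(target * len(pos) - 1e-9))
    return pos[min(k, len(pos)) - 1]

def rates_at(scores, labels, t):
    """(TPR, FPR) when every score >= t is predicted as CHD."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
    n_pos = sum(1 for y in labels if y == 1)
    return tp / n_pos, fp / (len(labels) - n_pos)

# Toy scores and labels, not study data.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 1, 0, 0, 1, 0]
for target in (0.25, 0.5, 0.75):
    t = threshold_for_recall(scores, labels, target)
    print(target, t, rates_at(scores, labels, t))
```

Lowering the threshold buys recall at the cost of false positives. Class imbalance makes this cost large. In the new set (200 CHD vs 1,053 non-CHD), the highest-recall row has an FPR of about 0.51. That means roughly 530 false positives against about 160 true positives, which is why precision drops to about 0.23.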
Figure 4. Predicted result for a case study. (a) The predicted CHD risk (red bar) and non-CHD risk (green bar). (b) The predicted feature contributions for the input individual.
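The caption does not specify how the per-feature contributions in panel (b) are computed. As a swapped-in illustration (not the authors' method), consider a logistic-regression risk model and a reference profile. Each feature's contribution to an individual's log-odds is w_j·(x_j − baseline_j). These contributions sum exactly to the difference between the individual's logit and the reference logit. All feature names, weights and values below are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def explain(weights, bias, x, baseline):
    """Return (risk, per-feature log-odds contributions) for a logistic model.

    Contribution of feature j = w_j * (x_j - baseline_j), measured against a reference profile.
    """
    base_logit = bias + sum(w * baseline[f] for f, w in weights.items())
    contrib = {f: w * (x[f] - baseline[f]) for f, w in weights.items()}
    risk = sigmoid(base_logit + sum(contrib.values()))
    return risk, contrib

# Hypothetical model and patient; the numbers are illustrative, not fitted to the study data.
weights = {"age": 0.06, "diabetes_course_y": 0.08, "heart_rate": -0.03}
bias = -3.0
baseline = {"age": 55, "diabetes_course_y": 7, "heart_rate": 80}
patient = {"age": 68, "diabetes_course_y": 12, "heart_rate": 74}

risk, contrib = explain(weights, bias, patient, baseline)
print(f"CHD risk {risk:.2f}, non-CHD {1 - risk:.2f}")
for f, c in sorted(contrib.items(), key=lambda kv: -abs(kv[1])):
    print(f"{f:>18}: {c:+.2f} log-odds")
```

Because the contributions are additive in log-odds, they can be drawn directly as bars. Non-linear models such as tree ensembles need a separate attribution method (for example, SHAP values) to produce a comparable breakdown.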