Jiunn-Diann Lin, Dee Pei, Fang-Yu Chen, Chung-Ze Wu, Chieh-Hua Lu, Li-Ying Huang, Chun-Heng Kuo, Shi-Wen Kuo, Yen-Lin Chen.
Abstract
Patients with type 2 diabetes mellitus (T2DM) have a high risk of coronary artery disease (CAD). The thallium-201 myocardial perfusion scan (Th-201 scan) is a non-invasive tool widely used to detect CAD in clinical settings. In this study, we compared the accuracy of traditional multiple linear regression (MLR) with that of four machine learning (ML) methods in predicting abnormal Th-201 scans. The aims were to determine whether ML outperforms traditional MLR, to rank the clinical variables by importance, and to compare that ranking with previous reports. In total, 796 T2DM patients, including 368 men and 528 women, were enrolled. In addition to traditional MLR, classification and regression tree (CART), random forest (RF), stochastic gradient boosting (SGB), and eXtreme gradient boosting (XGBoost) were used to analyze abnormal Th-201 scans. The stress sum score was used as the endpoint (dependent variable). All four ML methods had smaller root mean square errors than MLR, which implies that ML is more precise than MLR in determining abnormal Th-201 scans from clinical parameters. The seven most important factors, from most to least important, were body mass index, hemoglobin, age, glycated hemoglobin, creatinine, systolic blood pressure, and diastolic blood pressure. In conclusion, ML is not inferior to traditional MLR in predicting abnormal Th-201 scans, and the seven factors above are the most important predictors. ML methods are superior in studies of this kind.
Keywords: coronary artery disease; machine learning; thallium-201 myocardial perfusion scan; type 2 diabetes mellitus
Year: 2022 PMID: 35885524 PMCID: PMC9324130 DOI: 10.3390/diagnostics12071619
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
Figure 1. Flowchart of sample selection from the Cardinal Tien Hospital Diabetes Study Cohort.
Variable definition.
| Variables | Description | Unit |
|---|---|---|
| V1: Sex | Male/Female | - |
| V2: Age | Patient age | year |
| V3: Body mass index | Body mass index | kg/m² |
| V4: Duration of diabetes | Duration of diabetes | year |
| V5: Smoking | No/Yes | - |
| V6: Systolic blood pressure | Systolic blood pressure | mmHg |
| V7: Diastolic blood pressure | Diastolic blood pressure | mmHg |
| V8: Hemoglobin | Hb | |
| V9: Glycated hemoglobin | HbA1c (glycated hemoglobin) | % |
| V10: Triglyceride | Triglyceride baseline | mg/dL |
| V11: High density lipoprotein cholesterol | High-density lipoprotein cholesterol | mg/dL |
| V12: Low density lipoprotein cholesterol | Low-density lipoprotein cholesterol | mg/dL |
| V13: Alanine aminotransferase baseline | Alanine aminotransferase | U/L |
| V14: Creatinine | Creatinine | mg/dL |
| V15: Microalbuminuria | Urine albumin to creatinine ratio = microalbumin (mg/dL) / urine creatinine (mg/dL) | mg/g |
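The V15 row defines the ratio from two mg/dL concentrations but reports it in mg/g. A minimal sketch of that unit conversion follows, assuming the usual factor of 1000 (1 g = 1000 mg) is what bridges the two; the function name and the sample values are hypothetical and do not come from the study.

```python
def albumin_creatinine_ratio(urine_albumin_mg_dl, urine_creatinine_mg_dl):
    """Return the urine albumin-to-creatinine ratio in mg/g.

    Assumption: both inputs are spot-urine concentrations in mg/dL, so the
    raw quotient is mg albumin per mg creatinine; multiplying by 1000
    expresses it per gram of creatinine.
    """
    if urine_creatinine_mg_dl <= 0:
        raise ValueError("urine creatinine must be positive")
    return urine_albumin_mg_dl * 1000 / urine_creatinine_mg_dl


# Hypothetical example values, not taken from the cohort.
print(albumin_creatinine_ratio(3.0, 100.0))
```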
Figure 2. Proposed scheme in the study.
Equations of the performance metrics.
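| Metrics | Calculation * |
|---|---|
| SMAPE | $\frac{1}{m}\sum_{i=1}^{m}\frac{\lvert p_i - q_i\rvert}{(\lvert p_i\rvert + \lvert q_i\rvert)/2}$ |
| RAE | $\frac{\sum_{i=1}^{m}\lvert p_i - q_i\rvert}{\sum_{i=1}^{m}\lvert q_i - \bar{q}\rvert}$ |
| RRSE | $\sqrt{\frac{\sum_{i=1}^{m}(p_i - q_i)^2}{\sum_{i=1}^{m}(q_i - \bar{q})^2}}$ |
| RMSE | $\sqrt{\frac{1}{m}\sum_{i=1}^{m}(p_i - q_i)^2}$ |
SMAPE, symmetric mean absolute percentage error; RAE, relative absolute error; RRSE, root relative squared error; RMSE, root mean squared error. * $p_i$ and $q_i$ represent predicted and actual values, respectively; $\bar{q}$ is the mean of the actual values; $m$ is the total number of data points. The formulas are given in their standard forms.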
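The four metrics can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: it uses the standard textbook definitions (SMAPE as a fraction between 0 and 2, without a factor of 100), with `pred` and `actual` playing the roles of p and q. The function names and the sample data are hypothetical.

```python
import math


def smape(pred, actual):
    """Symmetric mean absolute percentage error, as a fraction in [0, 2]."""
    total = 0.0
    for p, q in zip(pred, actual):
        denom = (abs(p) + abs(q)) / 2
        # Assumed convention: a pair with p == q == 0 contributes 0.
        total += abs(p - q) / denom if denom else 0.0
    return total / len(actual)


def rae(pred, actual):
    """Relative absolute error: absolute error relative to a mean predictor."""
    q_bar = sum(actual) / len(actual)
    num = sum(abs(p - q) for p, q in zip(pred, actual))
    return num / sum(abs(q - q_bar) for q in actual)


def rrse(pred, actual):
    """Root relative squared error: squared error relative to a mean predictor."""
    q_bar = sum(actual) / len(actual)
    num = sum((p - q) ** 2 for p, q in zip(pred, actual))
    return math.sqrt(num / sum((q - q_bar) ** 2 for q in actual))


def rmse(pred, actual):
    """Root mean squared error, in the units of the dependent variable."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(pred, actual)) / len(actual))


# Hypothetical data: the predictor always outputs the mean of the actual values.
actual = [2.0, 4.0, 6.0]
pred = [4.0, 4.0, 4.0]
print(smape(pred, actual), rae(pred, actual), rrse(pred, actual), rmse(pred, actual))
```

Under these definitions, a predictor that always outputs the mean of the actual values has RAE = RRSE = 1, which gives a reference point for reading the two relative metrics.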
The demographics of enrolled type 2 diabetes patients.
| Variables | Mean ± SD | N |
|---|---|---|
| Age | 68.09 ± 10.07 | 796 |
| Body mass index | 26.17 ± 3.89 | 588 |
| Duration of diabetes | 13.81 ± 8.02 | 589 |
| Fasting plasma glucose | 150.09 ± 46.05 | 591 |
| Glycated hemoglobin | 7.68 ± 1.39 | 590 |
| Triglyceride | 123.65 ± 79.32 | 586 |
| High-density lipoprotein cholesterol | 49.53 ± 14.98 | 524 |
| Low-density lipoprotein cholesterol | 95.52 ± 26.18 | 588 |
| Alanine aminotransferase baseline | 23.66 ± 13.60 | 588 |
| Creatinine | 1.16 ± 0.99 | 587 |
| Systolic blood pressure | 131.08 ± 15.36 | 514 |
| Diastolic blood pressure | 73.35 ± 10.09 | 514 |
| Microalbuminuria | 196.53 ± 723.55 | 551 |
| Variables | n (%) | N |
| Sex | | 796 |
| Male | 369 (46.36%) | |
| Female | 427 (53.64%) | |
| Smoking | | 329 |
| No | 212 (64.44%) | |
| Yes | 117 (35.56%) | |
SD, standard deviation.
The performance of multiple linear regression (MLR) and different machine learning methods.
| Mean (SD) | SMAPE | RAE | RRSE | RMSE |
|---|---|---|---|---|
| MLR | 1.120 (0.04) | 1.049 (0.06) | 1.054 (0.03) | 7.760 (0.39) |
| RF | 1.070 (0.03) | 1.043 (0.05) | 1.042 (0.02) | 7.683 (0.48) |
| SGB | 1.074 (0.03) | 1.026 (0.05) | 1.039 (0.03) | 7.661 (0.45) |
| CART | 1.055 (0.04) | 1.031 (0.06) | 1.049 (0.03) | 7.736 (0.56) |
| XGBoost | 1.058 (0.04) | 1.017 (0.05) | 1.032 (0.02) | 7.613 (0.58) |
RF, random forest; SGB, stochastic gradient boosting; CART, classification and regression tree; XGBoost, eXtreme gradient boosting; SMAPE, symmetric mean absolute percentage error; RAE, relative absolute error; RRSE, root relative square error; RMSE, root mean square error. The numbers in the parentheses are standard errors.
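To make the comparison with MLR concrete, the mean RMSE values in the table can be turned into relative reductions. The sketch below only restates the table's means as arithmetic; the helper name `relative_reduction` is hypothetical, and the dispersion values in parentheses are not used.

```python
# Mean RMSE per method, copied from the performance table.
rmse_mean = {
    "MLR": 7.760,
    "RF": 7.683,
    "SGB": 7.661,
    "CART": 7.736,
    "XGBoost": 7.613,
}


def relative_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline * 100


reductions = {
    name: relative_reduction(rmse_mean["MLR"], value)
    for name, value in rmse_mean.items()
    if name != "MLR"
}

# List the ML methods from largest to smallest reduction.
for name, pct in sorted(reductions.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {pct:.2f}% lower mean RMSE than MLR")
```

All four reductions are positive, ranging from roughly 0.3% for CART to roughly 1.9% for XGBoost.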
Importance ranking of each risk factor using the four machine learning methods.
| Variables | RF | SGB | CART | XGBoost | Average |
|---|---|---|---|---|---|
| Sex | 5 | 14 | 6 | 14 | 9.75 |
| Age | 2 | 4 | 3 | 15 | 6 |
| Body mass index | 4 | 1 | 1 | 6 | 3 |
| Duration of diabetes | 1 | 13 | 11 | 8 | 8.25 |
| Smoking | 6 | 15 | 15 | 1 | 9.25 |
| Hemoglobin | 8 | 6 | 4 | 2 | 5 |
| Glycated hemoglobin | 9 | 2 | 5 | 10 | 6.5 |
| Triglyceride | 10 | 10 | 8 | 12 | 10 |
| High density lipoprotein cholesterol | 11 | 7 | 10 | 5 | 8.25 |
| Low density lipoprotein cholesterol | 12 | 5 | 12 | 11 | 10 |
| Alanine aminotransferase baseline | 13 | 8 | 13 | 13 | 11.75 |
| Creatinine | 14 | 3 | 2 | 9 | 7 |
| Systolic blood pressure | 7 | 11 | 9 | 4 | 7.75 |
| Diastolic blood pressure | 3 | 12 | 14 | 3 | 8 |
| Microalbuminuria | 15 | 9 | 7 | 7 | 9.5 |
RF, random forest; SGB, stochastic gradient boosting; CART, classification and regression tree; XGBoost, eXtreme gradient boosting.
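The Average column is the arithmetic mean of the four per-method ranks, and sorting by it gives an overall order. A minimal sketch follows, assuming the integrated ranking simply orders factors by that average (lower is more important). It uses only the seven rows that the abstract names as most important, and the variable names are hypothetical.

```python
# Ranks per method in the order (RF, SGB, CART, XGBoost), copied from the table.
ranks = {
    "Age": (2, 4, 3, 15),
    "Body mass index": (4, 1, 1, 6),
    "Hemoglobin": (8, 6, 4, 2),
    "Glycated hemoglobin": (9, 2, 5, 10),
    "Creatinine": (14, 3, 2, 9),
    "Systolic blood pressure": (7, 11, 9, 4),
    "Diastolic blood pressure": (3, 12, 14, 3),
}

# Average rank across the four methods.
average = {name: sum(r) / len(r) for name, r in ranks.items()}

# A lower average rank means a more important factor.
ordered = sorted(average, key=average.get)

for position, name in enumerate(ordered, start=1):
    print(position, name, average[name])
```

With these rows, the sorted order matches the sequence given in the abstract, from body mass index (average 3) to diastolic blood pressure (average 8).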
Figure 3. Integrated importance ranking of all risk factors.