Jian Yu, Yan Zhou, Qiong Yang, Xiaoling Liu, Lili Huang, Ping Yu, Shuyuan Chu.
Abstract
Carotid atherosclerosis (CAS) is a risk factor for cardiovascular and cerebrovascular events, but medical guidelines do not recommend duplex ultrasonography for routine screening of asymptomatic populations. We aimed to develop machine learning models to screen for CAS in asymptomatic adults. A total of 2732 asymptomatic subjects who underwent routine physical examination in our hospital were included in the study. We developed machine learning models to classify subjects with or without CAS using decision tree, random forest (RF), extreme gradient boosting (XGBoost), support vector machine (SVM) and multilayer perceptron (MLP) classifiers with 17 candidate features. Model performance was assessed on the testing dataset. The MLP model achieved the highest accuracy (0.748), positive predictive value (0.743), F1 score (0.742), area under the receiver operating characteristic curve (AUC) (0.766) and Kappa score (0.445) among all classifiers, followed by the XGBoost and SVM models. In conclusion, the MLP model performed best for screening CAS in asymptomatic adults based on the results of routine physical examination, followed by the XGBoost and SVM models. These models may provide an effective and applicable method for physicians and primary care doctors to screen for CAS in asymptomatic members of the general population without risk factors, and may improve risk prediction and prevention of cardiovascular and cerebrovascular events in asymptomatic adults.
Year: 2021 PMID: 34782634 PMCID: PMC8593081 DOI: 10.1038/s41598-021-01456-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Subject characteristics in the CAS and non-CAS groups.
| Variables | CAS group | Non-CAS group | P value |
|---|---|---|---|
| Gender (male) | 686 (72.8%) | 1058 (59.1%) | < 0.001 |
| Age (years) | 56.3 ± 7.4 | 49.4 ± 6.8 | < 0.001 |
| BMI (kg/m²) | 25.1 ± 3.0 | 24.7 ± 3.1 | 0.003 |
| SP (mmHg) | 132 ± 20 | 123 ± 18 | < 0.001 |
| DP (mmHg) | 80 ± 12 | 76 ± 12 | < 0.001 |
| AST (U/L) | 20.87 ± 7.86 | 20.22 ± 7.27 | 0.031 |
| ALT (U/L) | 22.94 ± 12.53 | 22.35 ± 13.38 | 0.269 |
| BUN (mmol/L) | 5.1 ± 1.5 | 4.7 ± 1.2 | < 0.001 |
| Scr (μmol/L) | 81.64 ± 21.80 | 76.81 ± 16.10 | < 0.001 |
| TG (mmol/L) | 1.93 ± 1.87 | 1.72 ± 1.48 | 0.003 |
| TC (mmol/L) | 4.89 ± 0.87 | 4.74 ± 0.83 | < 0.001 |
| LDL-C (mmol/L) | 3.29 ± 0.81 | 3.14 ± 0.81 | < 0.001 |
| HDL-C (mmol/L) | 1.24 ± 0.35 | 1.29 ± 0.33 | < 0.001 |
| UA (μmol/L) | 376.5 ± 96.9 | 352.7 ± 93.8 | < 0.001 |
| HCY (μmol/L) | 13.22 ± 5.86 | 11.70 ± 5.31 | < 0.001 |
| FPG (mmol/L) | 5.88 ± 1.68 | 5.47 ± 1.21 | < 0.001 |
| NAFLD (Yes) | 285 (30.3%) | 417 (23.3%) | < 0.001 |
CAS, carotid atherosclerosis; BMI, body mass index; SP, systolic blood pressure; DP, diastolic blood pressure; AST, serum aspartate aminotransferase; ALT, serum alanine aminotransferase; BUN, blood urea nitrogen; Scr, serum creatinine; TG, triglyceride; TC, total cholesterol; LDL-C, low-density lipoprotein cholesterol; HDL-C, high-density lipoprotein cholesterol; UA, blood uric acid; HCY, homocysteine; FPG, fasting plasma glucose; NAFLD, nonalcoholic fatty liver disease.
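As an illustration of how a categorical row such as gender can be compared between the two groups, the following is a minimal sketch of a Pearson chi-square test on a 2×2 table. The record does not say which statistical test the authors used, so the choice of test is an assumption. The group sizes (942 and 1790) are also not reported directly; they are inferred here from the counts and percentages in the table and sum to the 2732 subjects stated in the abstract. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = 0.0
    for observed, row, col in ((a, row1, col1), (b, row1, col2),
                               (c, row2, col1), (d, row2, col2)):
        expected = row * col / n  # count expected if group and gender were independent
        chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# ASSUMPTION: group sizes inferred from the table (686 / 0.728 and 1058 / 0.591).
n_cas, n_non_cas = 942, 1790
male_cas, male_non_cas = 686, 1058

chi2, p_value = chi_square_2x2(male_cas, n_cas - male_cas,
                               male_non_cas, n_non_cas - male_non_cas)
print(f"chi2 = {chi2:.1f}, p = {p_value:.2e}")
```

Under these assumptions the resulting P value falls below 0.001, in line with the gender row of the table.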
Model performance on the testing data, in rank order.
| Model | Accuracy | PPV | F1 score | Kappa score | AUC (95% CI) |
|---|---|---|---|---|---|
| MLP | 0.748 | 0.743 | 0.742 | 0.445 | 0.766 (0.754–0.769) |
| XGBoost | 0.741 | 0.736 | 0.735 | 0.429 | 0.763 (0.724–0.764) |
| SVM | 0.744 | 0.739 | 0.733 | 0.413 | 0.757 (0.718–0.757) |
| Random forest | 0.730 | 0.724 | 0.722 | 0.401 | 0.752 (0.734–0.766) |
| Decision tree | 0.726 | 0.723 | 0.706 | 0.354 | 0.741 (0.699–0.749) |
PPV, positive predictive value; AUC, area under curve; CI, confidence interval; MLP, multilayer perceptron; SVM, support vector machine; XGBoost, extreme gradient boosting.
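For readers unfamiliar with the metrics in this table, the sketch below computes accuracy, PPV, F1 score and Cohen's Kappa from a binary confusion matrix. The counts passed in are hypothetical and are not taken from the study, and the function name `classification_metrics` is an assumption. The record does not say whether the reported PPV and F1 are per-class or averaged across classes, so the plain positive-class definitions are used here.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, PPV, F1 and Cohen's Kappa for a binary confusion matrix."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    ppv = tp / (tp + fp)        # positive predictive value (precision)
    recall = tp / (tp + fn)     # sensitivity, needed for F1
    f1 = 2 * ppv * recall / (ppv + recall)
    # Agreement expected by chance, from the row and column marginals.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {"accuracy": accuracy, "ppv": ppv, "f1": f1, "kappa": kappa}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Kappa discounts the agreement that would be expected by chance, which is why it is much lower than accuracy for every model in the table.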
Figure 1. Receiver operating characteristic curves. (A) Decision tree. (B) Random forest. (C) Extreme gradient boosting. (D) Support vector machine. (E) Multilayer perceptron.
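The area under a receiver operating characteristic curve can be read as the probability that a randomly chosen positive subject receives a higher score than a randomly chosen negative one. The sketch below computes it that way by direct pairwise comparison. The labels and scores are hypothetical, the function name `roc_auc` is an assumption, and this is not necessarily how the authors computed the AUC values reported for their models.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = CAS, 0 = non-CAS) and model scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of subjects, which is acceptable for a sketch; a sort-based rank formulation gives the same value more efficiently on large datasets.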
Figure 2. Decision tree. DP, diastolic blood pressure; SP, systolic blood pressure; HCY, homocysteine; Y = Yes; N = No.
Figure 3. Feature importance in the random forest model. FPG, fasting plasma glucose; SP, systolic blood pressure; HCY, homocysteine; UA, blood uric acid; TC, total cholesterol; DP, diastolic blood pressure; BUN, blood urea nitrogen; AST, serum aspartate aminotransferase; LDL-C, low-density lipoprotein cholesterol; HDL-C, high-density lipoprotein cholesterol; BMI, body mass index; Scr, serum creatinine; TG, triglyceride; ALT, serum alanine aminotransferase; NAFLD, nonalcoholic fatty liver disease.
Figure 4. Feature importance in the XGBoost model. f1: age; f5: DP; f14: HDL-C; f16: HCY; f4: SP; f15: FPG; f0: gender.