Yan Zhang1, Xiaoxu Zhang1, Jaina Razbek1, Deyang Li1, Wenjun Xia1, Liangliang Bao1, Hongkai Mao1, Mayisha Daken1, Mingqin Cao2.
Abstract
OBJECTIVE: The internal workings of machine learning algorithms are complex, and such models are often regarded as poorly interpretable "black boxes", which makes it difficult for domain experts to understand and trust them. This study uses metabolic syndrome (MetS) as a case study to evaluate how useful model interpretability methods are for explaining predictive models that are otherwise hard to interpret.
Keywords: Data mining; Machine learning; Metabolic syndrome; Model interpretability; Risk prediction model
Year: 2022 PMID: 36028865 PMCID: PMC9419421 DOI: 10.1186/s12902-022-01121-4
Source DB: PubMed Journal: BMC Endocr Disord ISSN: 1472-6823 Impact factor: 3.263
Fig. 1 RFE cross-validation result curve. Each point in the graph represents a different variable
Performance evaluation of MetS risk prediction in the test set
| Classification model | Accuracy (%) | Sensitivity (%) | Specificity (%) | Youden index | AUROC (95% CI) |
|---|---|---|---|---|---|
| LR | 92.3 | 64.5 | 97.0 | 0.615 | 0.807(0.800 ~ 0.815)a |
| RF | 99.5 | 96.9 | 100 | 0.969 | 0.984(0.982 ~ 0.987)b |
| XGBoost | 99.7 | 98.5 | 99.9 | 0.984 | |
a: AUROC of the XGBoost model compared with that of LR, Z = 30.986, P < 0.001
b: AUROC of the XGBoost model compared with that of RF, Z = 3.920, P < 0.001
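The Youden index column can be checked against the sensitivity and specificity columns using the standard definition J = sensitivity + specificity - 1. The following is a minimal sketch of that arithmetic, not the authors' code; the dictionary and function names are illustrative, and the input values are copied from the table above.

```python
# Sensitivity and specificity (in percent) as reported in the table above.
reported = {
    "LR": (64.5, 97.0),
    "RF": (96.9, 100.0),
    "XGBoost": (98.5, 99.9),
}


def youden_index(sensitivity_pct, specificity_pct):
    """Youden's J = sensitivity + specificity - 1, with inputs given in percent."""
    return (sensitivity_pct + specificity_pct) / 100.0 - 1.0


# Round to three decimals, matching the precision used in the table.
youden = {name: round(youden_index(se, sp), 3) for name, (se, sp) in reported.items()}

for name, j in youden.items():
    print(f"{name}: J = {j:.3f}")
```

The three computed values agree with the reported Youden indices (0.615, 0.969 and 0.984).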
Fig. 2 ROC curve of MetS risk prediction model in the test set
Fig. 3 Variable importance of XGBoost model based on training set showing the top 10 variables
Specific data for 4 subjects in the training set
| Variable | 3271 | 6506 | 25,392 | 10,557 |
|---|---|---|---|---|
| Gender (0 = female, 1 = male) | 0 | 1 | 1 | 0 |
| Age (years) | 26 | 61 | 45 | 59 |
| Eosinophil percentage | 8.3 | 3.3 | 2.6 | 1.6 |
| Erythrocyte distribution width coefficient of variation | 12.4 | 14.4 | 12.9 | 12.2 |
| Creatinine (μmol/L) | 79 | 52 | 54 | 72 |
| Uric acid (μmol/L) | 386 | 260 | 304 | 373 |
| Glutamyl transpeptidase (U/L) | 32 | 44 | 16 | 48 |
| Alkaline phosphatase (U/L) | 48 | 115 | 56 | 69 |
| Previous fatty liver (0 = no, 1 = yes) | 0 | 1 | 1 | 1 |
| Previous hypertension (0 = no, 1 = yes) | 0 | 0 | 0 | 1 |
| Previous diabetes (0 = no, 1 = yes) | 0 | 0 | 0 | 0 |
| WC (cm) | 72 | 91 | 84 | 90 |
| SBP (mmHg) | 139 | 154 | 122 | 169 |
| DBP (mmHg) | 71 | 85 | 83 | 109 |
| FPG (mmol/L) | 4.4 | 5.13 | 5.18 | 6.38 |
| TC (mmol/L) | 3.76 | 5.16 | 5.19 | 6.22 |
| TG (mmol/L) | 0.77 | 2.49 | 1.79 | 2.98 |
| HDL-C (mmol/L) | 1.54 | 1.22 | 1.67 | 1.33 |
| MetS (0 = no, 1 = yes) | 0 | 1 | 0 | 1 |
Fig. 4 LIME-based heat map of the variable combinations for four medical examinees (training set). Color shows the direction of each feature's effect: blue (feature weight > 0) means the feature supports the outcome variable, and red (feature weight < 0) means the feature opposes it. The shade shows the degree of influence: a darker color indicates that the feature has a larger influence on the metabolic syndrome prediction
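The color coding described in this caption can be sketched as a small mapping from a LIME feature weight to a color and a shade. This is an illustration only: the function name, the linear shade scaling, and the example weights are assumptions made here, and the weights are invented values, not results from the study.

```python
def describe_weight(feature, weight, max_abs_weight):
    """Map a LIME feature weight to the color coding described in the caption."""
    if weight > 0:
        color, direction = "blue", "supports"
    elif weight < 0:
        color, direction = "red", "opposes"
    else:
        color, direction = "neutral", "has no effect on"
    # Assumed scaling: shade runs from 0.0 (lightest) to 1.0 (darkest),
    # proportional to the absolute weight.
    shade = abs(weight) / max_abs_weight if max_abs_weight else 0.0
    return {
        "feature": feature,
        "color": color,
        "direction": direction,
        "shade": round(shade, 2),
    }


# Hypothetical weights for illustration only; NOT taken from the study.
example_weights = {"TG": 0.30, "WC": 0.15, "HDL-C": -0.06}
max_abs = max(abs(w) for w in example_weights.values())
cells = [describe_weight(f, w, max_abs) for f, w in example_weights.items()]

for cell in cells:
    print(f"{cell['feature']}: {cell['color']} (shade {cell['shade']}), "
          f"{cell['direction']} the outcome")
```

A real heat map would take the weights from the fitted LIME explanation for each subject rather than from hand-written values.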
Fig. 5 Interpretation of individual predictions (training set) based on LIME. The length of each bar is proportional to the strength of the feature's effect
Fig. 6 PDP diagram of important variables in the XGBoost model (training set)