Yasutaka Kuniyoshi, Haruka Tokutake, Natsuki Takahashi, Azusa Kamura, Sumie Yasuda, Makoto Tashiro.
Abstract
We sought to construct an optimal machine learning (ML) method for predicting intravenous immunoglobulin (IVIG) resistance in children with Kawasaki disease (KD) using commonly available clinical and laboratory variables. We retrospectively collected 98 clinical records of hospitalized children with KD (2–109 months of age). Twenty (20%) children were resistant to initial IVIG therapy. We trained three ML models (logistic regression, linear support vector machine, and eXtreme gradient boosting) on 10 variables to predict IVIG resistance, and estimated their predictive performance using nested 5-fold cross-validation (CV). We also selected variables using recursive feature elimination and repeated the nested 5-fold CV with the selected variables. We then compared the predictive performance of the ML models with that of existing scoring systems. The area under the receiver operator characteristic curve ranged from 0.58 to 0.60 for the all-variable models and from 0.60 to 0.75 for the select-variable models. The specificities were above 0.90 and higher than those of the existing scoring systems, but the sensitivities were lower. The three ML models based on demographics and routine laboratory variables did not provide reliable performance. This is possibly the first study that has attempted to establish a better predictive model. Additional biomarkers are probably needed to generate an effective prediction model.
Keywords: area under the curve; extreme gradient boosting; logistic regression; nested cross-validation; predictive model; support vector machine
Year: 2020 PMID: 33344380 PMCID: PMC7744372 DOI: 10.3389/fped.2020.570834
Source DB: PubMed Journal: Front Pediatr ISSN: 2296-2360 Impact factor: 3.418
Figure 1. Flow chart of the 5-fold nested cross-validation. Vali., validation.
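The nested structure in the flow chart can be illustrated with a minimal, standard-library-only Python sketch. The outer loop holds out one fold for testing, while the inner loop tunes a hyperparameter on the remaining data only. Everything in it is an assumption made for illustration: the data are synthetic, a one-feature k-nearest-neighbour classifier stands in for the study's logistic regression, SVM, and XGBoost models, accuracy replaces AUC as the score, and the folds are not stratified. In practice a library such as scikit-learn would normally be used instead.

```python
import random

def make_synthetic_cohort(n_resistant=20, n_responsive=78, seed=0):
    """Hypothetical data: one feature (arbitrary units) and a 0/1 label."""
    rng = random.Random(seed)
    resistant = [(rng.gauss(2.0, 1.0), 1) for _ in range(n_resistant)]
    responsive = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_responsive)]
    return resistant + responsive

def k_folds(samples, k, rng):
    """Shuffle a copy of the samples and deal them into k folds."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def knn_predict(train, x, k):
    """Majority vote among the k training samples nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def accuracy(train, test, k):
    """Fraction of test samples that a k-NN 'fitted' on train gets right."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

def all_but(folds, held_out):
    """Concatenate every fold except the one at index held_out."""
    return [s for i, fold in enumerate(folds) if i != held_out for s in fold]

def nested_cv(samples, k_values, n_outer=5, n_inner=5, seed=0):
    """Return one held-out score per outer fold."""
    rng = random.Random(seed)
    outer_folds = k_folds(samples, n_outer, rng)
    outer_scores = []
    for i, test in enumerate(outer_folds):
        outer_train = all_but(outer_folds, i)
        # Inner loop: choose the hyperparameter without touching the test fold.
        inner_folds = k_folds(outer_train, n_inner, rng)

        def inner_score(k):
            scores = [accuracy(all_but(inner_folds, j), val, k)
                      for j, val in enumerate(inner_folds)]
            return sum(scores) / len(scores)

        best_k = max(k_values, key=inner_score)
        # Outer loop: refit on all outer-training data, score on the test fold.
        outer_scores.append(accuracy(outer_train, test, best_k))
    return outer_scores

scores = nested_cv(make_synthetic_cohort(), k_values=[1, 3, 5, 7])
print([round(s, 2) for s in scores])
```

Because the hyperparameter is chosen inside each outer training set, the outer-fold scores are not biased by the tuning step, which is the reason for nesting the two loops.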
Comparison of clinical and laboratory characteristics in IVIG-responsive and -resistant patients.
| Characteristic | IVIG-responsive (n = 78) | IVIG-resistant (n = 20) | p-value |
|---|---|---|---|
| Age, months, median (IQR) | 22 (9–37) | 26 (17–30) | 0.49 |
| Illness days at IVIG administration, days, median (IQR) | 5 (4–6) | 4 (3.8–5) | 0.16 |
| Gender, male, n (%) | 40 (51) | 13 (65) | 0.40 |
| White blood cell count, × 10²/mm³, median (IQR) | 151 (121–175) | 144 (113–179) | 0.96 |
| Neutrophil, %, median (IQR) | 66 (59–76) | 73 (67–79) | 0.12 |
| Hematocrit, %, median (IQR) | 34 (32–36) | 35 (33–36) | 0.65 |
| Platelet count, × 10⁴/mm³, median (IQR) | 35 (28–42) | 32 (27–38) | 0.41 |
| Aspartate aminotransferase, IU/L, median (IQR) | 30 (24–43) | 96 (34–308) | <0.001 |
| Alanine aminotransferase, IU/L, median (IQR) | 20 (12–32) | 75 (20–232) | 0.004 |
| Total bilirubin, mg/dl, median (IQR) | 0.53 (0.41–0.69) | 0.81 (0.50–1.37) | 0.36 |
| Sodium, mmol/L, median (IQR) | 133 (131–134) | 132 (131–134) | 0.88 |
| Albumin, g/dl, median (IQR) | 3.3 (3.1–3.6) | 3.4 (3.1–3.6) | 0.85 |
| C-reactive protein, mg/dl, median (IQR) | 6.3 (3.8–9.3) | 7.4 (5.3–10.6) | 0.26 |
| Coronary artery abnormalities, n (%) | 5 (6.4) | 6 (30) | 0.007 |
IVIG, intravenous immunoglobulin; SD, standard deviation; IQR, interquartile range.
Data were analyzed using Mann–Whitney U tests for continuous variables and chi-square tests for categorical variables.
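As an illustration of the categorical comparison, the sketch below runs a chi-square test on the 2 × 2 table implied by the gender row (40 of 78 responsive and 13 of 20 resistant patients were male). It uses only the Python standard library. The Yates continuity correction is an assumption: the footnote does not say whether a correction was applied, and the function name is hypothetical.

```python
import math

def chi2_yates_2x2(a, b, c, d):
    """Yates-corrected chi-square test for the 2x2 table [[a, b], [c, d]].

    Assumption: the continuity correction is applied; the source does not
    state which variant of the chi-square test was used.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += max(abs(observed[i][j] - expected) - 0.5, 0.0) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Gender row: male vs. female counts in responsive and resistant patients.
stat, p = chi2_yates_2x2(40, 78 - 40, 13, 20 - 13)
print(f"chi2 = {stat:.2f}, p = {p:.2f}")  # chi2 = 0.72, p = 0.40
```

Under this assumption the result agrees with the p-value of 0.40 reported for that row. The same agreement is not claimed for other rows.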
Prediction performances of the three machine learning models and existing scoring systems.
| Model type | Algorithm | Model | Variables | AUC | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|---|---|---|
| All-variable model | LR | | All 10 variables | 0.59 ± 0.052 | 0.22 ± 0.055 | 0.94 ± 0.017 | 0.79 ± 0.021 |
| All-variable model | Linear SVM | | All 10 variables | 0.58 ± 0.040 | 0.20 ± 0.059 | 0.95 ± 0.014 | 0.79 ± 0.018 |
| All-variable model | XGBoost | | All 10 variables | 0.60 ± 0.048 | 0.26 ± 0.095 | 0.99 ± 0.021 | 0.78 ± 0.026 |
| Select-variable model | LR | Model 1 | AST | 0.75 ± 0.011 | 0.15 ± 0.039 | 0.97 ± 0.006 | 0.79 ± 0.012 |
| Select-variable model | LR | Model 2 | WBC, AST | 0.67 ± 0.027 | 0.16 ± 0.037 | 0.97 ± 0.008 | 0.80 ± 0.011 |
| Select-variable model | LR | Model 3 | Day, WBC, PLT, AST, CRP | 0.67 ± 0.022 | 0.19 ± 0.049 | 0.96 ± 0.049 | 0.80 ± 0.010 |
| Select-variable model | SVM | Model 1 | AST | 0.75 ± 0.011 | 0.14 ± 0.037 | 0.96 ± 0.012 | 0.79 ± 0.015 |
| Select-variable model | SVM | Model 2 | WBC, Ht, PLT, AST | 0.66 ± 0.035 | 0.16 ± 0.039 | 0.97 ± 0.010 | 0.80 ± 0.010 |
| Select-variable model | SVM | Model 3 | WBC, AST | 0.68 ± 0.024 | 0.14 ± 0.035 | 0.97 ± 0.006 | 0.79 ± 0.007 |
| Select-variable model | XGBoost | Model 1 | Na, AST | 0.65 ± 0.032 | 0.28 ± 0.078 | 1.00 ± 0.008 | 0.78 ± 0.021 |
| Select-variable model | XGBoost | Model 2 | Age, Day, Ht, Na, AST, CRP | 0.61 ± 0.036 | 0.31 ± 0.073 | 0.99 ± 0.008 | 0.79 ± 0.025 |
| Select-variable model | XGBoost | Model 3 | Age, Day, Ht, Na, AST, Alb, CRP | 0.60 ± 0.035 | 0.33 ± 0.078 | 0.99 ± 0.008 | 0.79 ± 0.022 |
| Existing scoring systems | Kobayashi (8) system | | | NA | 0.70 | 0.62 | 0.63 |
| Existing scoring systems | Egami (9) system | | | NA | 0.55 | 0.81 | 0.76 |
| Existing scoring systems | Sano (10) system | | | NA | 0.41 | 0.96 | 0.81 |
AUC, area under the receiver operator characteristic curve; LR, logistic regression; SVM, support vector machine; XGB, eXtreme gradient boosting; Age, months of age; Day, illness days with IVIG administration; WBC, white blood cell count; Ht, hematocrit; PLT, platelet count; AST, aspartate aminotransferase; Na, sodium; Alb, albumin; CRP, C-reactive protein; NA, not applicable.
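Sensitivity, specificity, and accuracy for a binary classifier follow directly from confusion-matrix counts. The short Python sketch below shows the arithmetic. The counts are illustrative values for a cohort with 20 resistant and 78 responsive patients, not the study's actual confusion matrix, and the function name is hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    The positive class is IVIG resistance:
    tp/fn = resistant patients predicted resistant/responsive,
    tn/fp = responsive patients predicted responsive/resistant.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Illustrative counts only (20 resistant, 78 responsive); not study data.
sens, spec, acc = classification_metrics(tp=4, fn=16, tn=74, fp=4)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
# sensitivity=0.20 specificity=0.95 accuracy=0.80

# Baseline: labelling every patient as responsive in the same cohort.
_, _, baseline_acc = classification_metrics(tp=0, fn=20, tn=78, fp=0)
print(f"all-responsive baseline accuracy={baseline_acc:.2f}")  # 0.80
```

With roughly 20% of patients in the positive class, a model with high specificity and low sensitivity ends up with an accuracy close to that of always predicting "responsive" (78/98 ≈ 0.80). Accuracy alone therefore says little about how well resistant patients are identified.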