Pan Ma¹, Ruixiang Liu¹, Wenrui Gu¹, Qing Dai¹, Yu Gan¹, Jing Cen¹, Shenglan Shang², Fang Liu¹, Yongchuan Chen¹.
Abstract
Objective: To establish an optimal machine learning model for predicting teicoplanin trough concentrations, and to explain the importance of each feature in the prediction model using the SHapley Additive exPlanation (SHAP) method.
Keywords: SHAP; algorithm; machine learning; model explanation; precision medicine; prediction model; teicoplanin
Year: 2022 PMID: 35360734 PMCID: PMC8963816 DOI: 10.3389/fmed.2022.808969
Source DB: PubMed Journal: Front Med (Lausanne) ISSN: 2296-858X
Figure 1. The workflow of data processing and algorithm selection.
The description of the study samples.
| Variable | Training group (n = 223) | Testing group (n = 56) | P-value |
|---|---|---|---|
| Teicoplanin trough concentration (mg/L) | 13.32 (8.87, 19.12) | 11.291 (7.62, 22.11) | 0.603 |
| Teicoplanin administration | | | |
| Loading dose (mg/kg) | 7.91 (6.67, 12.18) | 7.45 (6.09, 12.23) | 0.149 |
| Number of loading doses, n (%) | | | 0.747 |
| <3 | 68 (30.49) | 14 (25) | |
| 3–5 | 149 (66.82) | 41 (73.21) | |
| >5 | 6 (2.69) | 1 (1.79) | |
| Loading interval (h), n (%) | | | 0.924 |
| 12 | 201 (90.13) | 52 (92.86) | |
| 24 | 14 (6.28) | 3 (5.36) | |
| Others | 8 (3.59) | 1 (1.79) | |
| Maintenance dose (mg/kg) | 8 (6.09, 12.18) | 7.08 (5.64, 12.23) | 0.086 |
| Maintenance interval (h), n (%) | | | 0.249 |
| 12 | 13 (5.83) | 2 (3.57) | |
| 24 | 196 (87.89) | 47 (83.93) | |
| Others | 14 (6.28) | 7 (12.5) | |
| Total duration of treatment (days) | 5 (3, 8) | 4 (3, 6.75) | 0.079 |
| Demographic information | | | |
| Age (years) | 53 (40, 66) | 52.5 (38.25, 65.75) | 0.823 |
| Height (cm) | 165 (155.07, 168.27) | 165 (155.07, 170) | 0.782 |
| Weight (kg) | 59 (53.44, 65.7) | 65 (53.44, 73.75) | 0.059 |
| Gender, male (n, %) | 80 (35.87%) | 25 (44.64%) | 0.226 |
| APACHE II score | 24 (20, 28) | 23 (19, 26) | 0.053 |
| Laboratory parameters | | | |
| ALB (g/L) | 32.32 ± 4.91 | 32.40 ± 4.59 | 0.911 |
| eGFR (mL/min/1.73 m²) | 97.06 (61.84, 120.57) | 86.63 (50.61, 111.41) | 0.063 |
| Cys-C (mg/L) | 1.52 (0.97, 1.7) | 1.6 (1.05, 1.76) | 0.453 |
| CLcr (mL/min) | 85.55 (55.43, 125.25) | 78.23 (40.22, 123.97) | 0.317 |
| AST (IU/L) | 41.7 (22.5, 79.9) | 34.2 (19.63, 81.6) | 0.754 |
| ALT (IU/L) | 26.5 (14, 49.2) | 24.8 (10.78, 69.25) | 0.695 |
| TBIL (μmol/L) | 20.4 (12.4, 48.1) | 19.25 (13.93, 50.05) | 0.984 |
| NEU% | 77.1 (64.5, 87.5) | 79.5 (67.53, 89.1) | 0.457 |
| PLT (×10⁹/L) | 83 (37, 196) | 135.5 (40, 252.25) | 0.142 |
| Concomitant therapy | | | |
| ECMO (n, %) | 11 (4.93%) | 2 (3.57%) | 1.000 |
| CRRT (n, %) | 61 (27.35%) | 16 (28.57%) | 0.855 |
| Co-medication (n, %) | 17 (7.62%) | 3 (5.36%) | 0.774 |
| Concomitant diseases | | | |
| AML (n, %) | 30 (13.45%) | 8 (14.29%) | 0.871 |
| Hypoproteinemia (n, %) | 143 (64.13%) | 34 (60.71%) | 0.636 |
| Sepsis (n, %) | 80 (35.87%) | 14 (25%) | 0.110 |
APACHE II, Acute Physiology and Chronic Health Evaluation II; ALB, albumin; Cys-C, cystatin C; eGFR, estimated glomerular filtration rate; CLcr, creatinine clearance rate; AST, aspartate aminotransferase; ALT, alanine aminotransferase; TBIL, total bilirubin; NEU%, percentage of neutrophils; PLT, platelet count; ECMO, extracorporeal membrane oxygenation; CRRT, continuous renal replacement therapy; AML, acute myeloid leukemia.
Continuous data are presented as median (interquartile range, IQR) for non-normally distributed variables and as mean ± SD for normally distributed variables. Categorical data are expressed as n (%).
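A minimal sketch of how these three summary formats could be produced with Python's standard `statistics` module. The quartile convention (`method="inclusive"`) is an assumption: statistical packages compute quartiles differently, so Q1 and Q3 from this sketch may not match the table to the last digit. The function names are illustrative.

```python
import statistics


def median_iqr(values: list[float]) -> str:
    """Format a non-normally distributed variable as 'median (Q1, Q3)'."""
    # Assumed quartile convention; other software may use a different one.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    median = statistics.median(values)
    return f"{median:g} ({q1:g}, {q3:g})"


def mean_sd(values: list[float]) -> str:
    """Format a normally distributed variable as 'mean ± SD' (sample SD)."""
    return f"{statistics.mean(values):.2f} ± {statistics.stdev(values):.2f}"


def n_percent(count: int, total: int) -> str:
    """Format a categorical count as 'n (%)'."""
    return f"{count} ({100 * count / total:.2f}%)"


print(median_iqr([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # 5 (3, 7)
print(n_percent(80, 223))                        # 80 (35.87%)
```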
P-values were obtained with the Mann–Whitney U test, the independent t-test, the chi-squared test, or Fisher's exact test.
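Of the tests listed above, the chi-squared test is easy to sketch without third-party libraries. For a 2×2 table the statistic has one degree of freedom, so its p-value equals erfc(√(χ²/2)). This version applies no Yates continuity correction (an assumption, not stated in the source); with that choice, the gender counts in the sample table (80 of 223 vs 25 of 56 male) give P ≈ 0.226, the same as the reported value.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test for the 2x2 table [[a, b], [c, d]].

    No Yates continuity correction is applied. With 1 degree of freedom,
    the chi-squared survival function reduces to erfc(sqrt(stat / 2)).
    """
    table = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Gender row of the sample table: male vs female in each group.
stat, p = chi2_2x2(80, 223 - 80, 25, 56 - 25)
print(f"chi2 = {stat:.3f}, P = {p:.3f}")  # chi2 = 1.466, P = 0.226
```

For tables with small expected counts (such as the ECMO row), Fisher's exact test is the usual alternative.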
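Creatinine clearance (mL/min) was calculated with the Cockcroft–Gault formula, where SCr is the serum creatinine concentration:

$$\mathrm{CLcr} = \frac{(140 - \text{age [years]}) \times \text{weight [kg]} \times 0.85\ (\text{if female})}{0.818 \times \mathrm{SCr}\ [\mu\text{mol/L}]}$$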
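The Cockcroft–Gault footnote can be expressed as a small function. The function and argument names are illustrative, and the example patient is hypothetical.

```python
def cockcroft_gault(age_years: float, weight_kg: float, scr_umol_l: float,
                    female: bool) -> float:
    """Creatinine clearance (mL/min) from the SI form of Cockcroft-Gault.

    (140 - age) x weight x 0.85 (if female), divided by
    0.818 x serum creatinine in umol/L.
    """
    if scr_umol_l <= 0:
        raise ValueError("serum creatinine must be positive")
    clcr = (140 - age_years) * weight_kg / (0.818 * scr_umol_l)
    return clcr * 0.85 if female else clcr


# Hypothetical patient: 60-year-old man, 70 kg, SCr 88.4 umol/L (about 1.0 mg/dL).
print(round(cockcroft_gault(60, 70, 88.4, female=False), 1))  # 77.4
```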
Co-medication included furosemide, amikacin sulfate, cyclosporine, isepamicin sulfate, liposomal amphotericin B, and colistin sulfate.
Performance metrics of the six algorithms.
| Algorithm | R² | MAE | MSE | Absolute accuracy | Relative accuracy |
|---|---|---|---|---|---|
| SVR | 0.676 | 3.868 | 26.071 | 76.79% | 62.50% |
| GBRT | 0.670 | 4.054 | 26.568 | 69.64% | 62.50% |
| RF | 0.656 | 4.410 | 27.683 | 62.50% | 48.21% |
| Bagging | 0.652 | 4.440 | 28.059 | 64.29% | 44.64% |
| Adaboost | 0.610 | 4.743 | 31.386 | 55.36% | 48.21% |
| XGBoost | 0.551 | 4.630 | 36.186 | 60.71% | 55.36% |
SVR, support vector regression; GBRT, gradient boosted regression trees; RF, random forest; Bagging, bootstrap aggregating; Adaboost, adaptive boosting; XGBoost, eXtreme Gradient Boosting.
Absolute accuracy: proportion of predicted trough concentrations within ±5 mg/L of the observed trough concentration.
Relative accuracy: proportion of predicted trough concentrations within ±30% of the observed trough concentration.
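A sketch of how each row of the table could be computed from paired observed and predicted trough concentrations. It assumes the three regression metrics reported are R² (coefficient of determination, 1 − SS_res/SS_tot), mean absolute error (MAE) and mean squared error (MSE), which the source does not spell out, and it counts a prediction lying exactly on the ±5 mg/L or ±30% boundary as accurate. The function name `evaluate` is hypothetical.

```python
def evaluate(observed: list[float], predicted: list[float]) -> dict[str, float]:
    """Regression metrics plus the absolute and relative accuracy rates."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need equal-length, non-empty inputs")
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    r2 = 1 - ss_res / ss_tot
    # Absolute accuracy: |predicted - observed| <= 5 mg/L (boundary assumed inclusive).
    abs_acc = sum(abs(e) <= 5 for e in errors) / n
    # Relative accuracy: |predicted - observed| <= 30% of the observed value.
    rel_acc = sum(abs(e) <= 0.30 * o for o, e in zip(observed, errors)) / n
    return {"R2": r2, "MAE": mae, "MSE": mse,
            "abs_acc": abs_acc, "rel_acc": rel_acc}


# Hypothetical trough concentrations (mg/L), not data from the study.
print(evaluate([10, 20, 15, 8], [11, 17, 16, 12]))
```

In this toy example every prediction is within 5 mg/L, but the last one (12 vs 8 mg/L) misses the ±30% band, which shows how the two accuracy rates can diverge for low observed concentrations.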
Performance metrics of the ensemble model.
| Group | R² | MAE | MSE | Absolute accuracy | Relative accuracy |
|---|---|---|---|---|---|
| Testing group | 0.720 | 3.628 | 22.571 | 83.93% | 60.71% |
| Validation group | 0.686 | 3.196 | 18.260 | 77.78% | 88.89% |
Absolute accuracy: proportion of predicted trough concentrations within ±5 mg/L of the observed trough concentration.
Relative accuracy: proportion of predicted trough concentrations within ±30% of the observed trough concentration.
Figure 2. Comparison of predicted and observed values. (A) Each blue dot is a testing sample, with the observed value on the x-axis and the predicted value on the y-axis; dots between the dotted lines are predictions within ±30% of the observed value. (B) Each blue dot is a testing sample, with the observed value on the x-axis and the predicted value on the y-axis; dots between the dotted lines are predictions within ±5 mg/L of the observed value. (C) Red dots are observed values and blue dots are predicted values; the green band marks ±30% and the red band marks ±5 mg/L of the observed values.
Figure 3. Interpretation of the model by SHapley Additive exPlanation (SHAP). eGFR, estimated glomerular filtration rate; CLcr, creatinine clearance rate; ALB, albumin; Cys-C, cystatin C; APACHE II, Acute Physiology and Chronic Health Evaluation II; CRRT, continuous renal replacement therapy; ALT, alanine aminotransferase; PLT, platelet count; NEU%, percentage of neutrophils. (A) SHAP summary plot of the top 20 relevant variables. The SHAP value (x-axis) is a unified index of a variable's effect on the output of the ensemble model. In each variable's row, every patient's attribution to the outcome is plotted as a dot, colored red for high and blue for low values of that variable. The higher a variable's SHAP value, the higher the predicted teicoplanin trough concentration. (B) Importance ranking of the top 20 variables by mean |SHAP value|.
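To make the quantities in Figure 3 concrete, here is a brute-force sketch of exact Shapley values for a small model, followed by the mean |SHAP value| ranking used in panel B. It is a simplification under stated assumptions: features outside a coalition are set to a single baseline (reference) vector, whereas the `shap` library's explainers average over a background dataset and use much faster algorithms. The toy model, feature names and SHAP numbers in the usage lines are hypothetical, not the study's ensemble model.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float],
                   baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition take their baseline value. The cost grows
    as 2^M model calls per feature, so this only suits a handful of features.
    """
    m = len(x)

    def value(coalition: frozenset[int]) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(m)]
        return model(z)

    phi = []
    for j in range(m):
        others = [i for i in range(m) if i != j]
        total = 0.0
        for size in range(m):
            # Shapley weight |S|! (M - |S| - 1)! / M! for coalitions of this size.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {j}) - value(s))
        phi.append(total)
    return phi


def rank_by_mean_abs(shap_rows: list[list[float]],
                     names: list[str]) -> list[tuple[str, float]]:
    """Order features by mean |SHAP value| across samples, largest first."""
    n = len(shap_rows)
    means = [sum(abs(row[j]) for row in shap_rows) / n for j in range(len(names))]
    return sorted(zip(names, means), key=lambda item: item[1], reverse=True)


# Hypothetical two-feature model with a pure interaction: credit is split evenly.
print(shapley_values(lambda z: z[0] * z[1], [2.0, 3.0], [0.0, 0.0]))  # [3.0, 3.0]
# Hypothetical SHAP matrix (rows = patients, columns = features).
print(rank_by_mean_abs([[-2.0, 1.0, 0.5], [3.0, -0.5, 0.1]], ["eGFR", "CLcr", "ALB"]))
```

The values satisfy additivity: for each patient, the SHAP values sum to the model output minus the baseline output, which is why a positive SHAP value can be read as pushing the predicted trough concentration upward.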
Figure 4. SHAP dependence plots of the model. eGFR, estimated glomerular filtration rate; CLcr, creatinine clearance rate; ALB, albumin; Cys-C, cystatin C. Each dependence plot shows how the value of a relevant variable affects the output of the ensemble prediction model. A SHAP value above 0 for a variable indicates that it increases the predicted teicoplanin trough concentration.