Xiuqing Zhu, Jinqing Hu, Tao Xiao, Shanqing Huang, Yuguan Wen, Dewei Shang.
Abstract
Background and Aim: Therapeutic drug monitoring (TDM) has evolved over the years into an important tool for personalized medicine. Nevertheless, traditional TDM has some limitations. Emerging data-driven model forecasting [e.g., machine learning (ML)-based approaches] has been used for individualized therapy. This study proposes an interpretable stacking-based ML framework to predict drug concentrations in real time after olanzapine (OLZ) treatment.
Keywords: drug concentration; electronic health record; interpretability; machine learning; model-informed precision dosing; olanzapine; stacking; therapeutic drug monitoring
Year: 2022 PMID: 36238557 PMCID: PMC9552071 DOI: 10.3389/fphar.2022.975855
Source DB: PubMed Journal: Front Pharmacol ISSN: 1663-9812 Impact factor: 5.988
FIGURE 1 Flowchart of this study.
Brief descriptions of 10 candidate regression models, including the related packages and their parameters (default settings).
| Model | Package | Key hyperparameters |
|---|---|---|
| ETR | scikit-learn 0.23.2 (from sklearn.ensemble import ExtraTreesRegressor) | “n_estimators”: 100, “max_depth”: None, “min_samples_leaf”: 1, “min_samples_split”: 2, “max_features”: auto |
| RFR | scikit-learn 0.23.2 (from sklearn.ensemble import RandomForestRegressor) | “n_estimators”: 100, “max_depth”: None, “min_samples_leaf”: 1, “min_samples_split”: 2, “max_features”: auto |
| BR | scikit-learn 0.23.2 (from sklearn.ensemble import BaggingRegressor) | “n_estimators”: 10, “max_features”: 1.0, “max_samples”: 1.0 |
| GBR | scikit-learn 0.23.2 (from sklearn.ensemble import GradientBoostingRegressor) | “n_estimators”: 100, “max_depth”: 3, “min_samples_leaf”: 1, “min_samples_split”: 2, “alpha”: 0.9, “learning_rate”: 0.1, “max_features”: None |
| ABR | scikit-learn 0.23.2 (from sklearn.ensemble import AdaBoostRegressor) | “n_estimators”: 50, “loss”: linear, “learning_rate”: 1.0 |
| XGBR | xgboost 1.3.3 (from xgboost import XGBRegressor) | “n_estimators”: 100, “max_depth”: None, “min_child_weight”: None, “gamma”: None, “colsample_bytree”: None, “subsample”: None, “reg_alpha”: None, “reg_lambda”: None, “learning_rate”: None |
| SVR | scikit-learn 0.23.2 (from sklearn.svm import SVR) | “C”: 1.0, “gamma”: scale, “epsilon”: 0.1, “kernel”: rbf |
| KNR | scikit-learn 0.23.2 (from sklearn.neighbors import KNeighborsRegressor) | “weights”: uniform, “n_neighbors”: 5, “p”: 2 |
| DTR | scikit-learn 0.23.2 (from sklearn.tree import DecisionTreeRegressor) | “criterion”: mse, “max_depth”: None, “min_samples_leaf”: 1, “min_samples_split”: 2, “max_features”: None |
| MLR | scikit-learn 0.23.2 (from sklearn.linear_model import LinearRegression) | “fit_intercept”: True, “normalize”: False, “n_jobs”: None, “copy_X”: True |
FIGURE 2 Proposed architecture of stacking with stratified five-fold cross-validation.
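Figure 2 specifies stratified five-fold cross-validation, but the excerpt does not say how a continuous target such as C (OLZ) was stratified. The sketch below assumes one common approach: sort the samples by the target and deal each block of five consecutive samples across the five folds, so every fold spans the whole concentration range. It then builds out-of-fold predictions, the usual training input of a stacking meta-learner. All function names are hypothetical, and `mean_baseline` is only a stand-in for a real base model.

```python
import random


def stratified_kfold_regression(y, n_splits=5, seed=0):
    """Assign a fold id (0..n_splits-1) to every sample of a continuous target.

    Assumption: samples are sorted by y and each consecutive block of
    n_splits samples is dealt, in random order, one sample per fold.
    """
    rng = random.Random(seed)
    order = sorted(range(len(y)), key=lambda i: y[i])
    folds = [0] * len(y)
    for start in range(0, len(order), n_splits):
        block = order[start:start + n_splits]
        labels = list(range(n_splits))
        rng.shuffle(labels)
        for idx, fold in zip(block, labels):
            folds[idx] = fold
    return folds


def out_of_fold_predictions(X, y, fit_predict, folds, n_splits=5):
    """Level-1 feature for one base model: each sample is predicted by a model
    trained only on the other folds. fit_predict(X_train, y_train, X_test)
    is a hypothetical callable returning predictions for X_test."""
    oof = [None] * len(y)
    for k in range(n_splits):
        train = [i for i, f in enumerate(folds) if f != k]
        test = [i for i, f in enumerate(folds) if f == k]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        for i, p in zip(test, preds):
            oof[i] = p
    return oof


def mean_baseline(X_train, y_train, X_test):
    """Hypothetical stand-in base model: always predicts the training mean."""
    m = sum(y_train) / len(y_train)
    return [m] * len(X_test)
```

Running `out_of_fold_predictions` once per base model and placing the resulting lists side by side gives the meta-learner's training matrix; the meta-learner itself is not shown here.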
A summary of features in the original dataset.
| Items | Features |
|---|---|
| General patient information (four features) | Gender, age, body weight (abbreviation: BW), height |
| Substance abuse (three features) | Smoking history, drinking history, history of other substance abuse |
| Diagnosis and disorder history (three features) | Diagnosis of schizophrenia, diagnosis of bipolar affective disorder, allergic history |
| Blood types (two features) | ABO blood type, Rh blood type |
| Phenotypes, genotypes, and gene polymorphisms (13 features) | |
| Dosage regimens (three features) | Daily dose of OLZ [abbreviation: daily dose (OLZ)], dosage regimen of OLZ, daily dose frequency of OLZ |
| Combined drugs (280 features) | Valproic acid, risperidone, rifampicin, warfarin, clozapine, sertraline, fluvoxamine, perphenazine, carbamazepine, fluoxetine, etc. |
| Results of TDM measurements of other medications (24 features) | Concentrations of valproic acid [abbreviation: C (Valproic acid)], sertraline, fluoxetine, fluvoxamine, risperidone, oxcarbazepine, venlafaxine, clozapine, lamotrigine, perphenazine, aripiprazole, etc. |
| Information relating to biochemical analyses (140 features) | Time of TDM blood sampling after first administration of OLZ (abbreviation: time of blood sampling after first administration), alanine transaminase (abbreviation: ALT), serum potassium (abbreviation: K), serum sodium (abbreviation: Na), absolute monocyte count (abbreviation: MONO#), white blood cell count (abbreviation: WBC), red blood cell count (abbreviation: RBC), serum creatinine (abbreviation: Cr), uric acid (abbreviation: UA), creatine kinase (abbreviation: CK), C-reactive protein (abbreviation: CRP), total cholesterol (abbreviation: TC), etc. |
Distribution of selected continuous and categorical variables in the original dataset (n = 2,142).
| Continuous data | Values [median (min–max)] | Missing [n (%)] | Categorical data | Values | Distribution [n (%)] |
|---|---|---|---|---|---|
| C (OLZ) (ng/mL) | 26.43 (2.00–127.31) | 0 (0%) | Gender | Male | 1235 (57.66%) |
| Age (years) | 47 (12–94) | 0 (0%) | | Female | 907 (42.34%) |
| BW (kg) | 61 (26–121) | 462 (21.57%) | Smoking history | Yes | 308 (14.38%) |
| Daily dose of OLZ (mg) | 15 (1.25–30) | 0 (0%) | | No | 1347 (62.89%) |
| Time of TDM blood sampling after first administration of OLZ (days) | 18 (0–483) | 0 (0%) | | Unknown | 487 (22.73%) |
| ALT (U/L) | 18 (3–632) | 1021 (47.67%) | Diagnosis of schizophrenia | Yes | 999 (46.64%) |
| Na (mmol/L) | 140.3 (121.0–150.0) | 1046 (48.83%) | | No | 1143 (53.36%) |
| K (mmol/L) | 3.99 (2.20–5.21) | 1046 (48.83%) | Co-administration of valproic acid | Yes | 778 (36.32%) |
| Cr (μmol/L) | 70 (28–203) | 1067 (49.81%) | | No | 1364 (63.68%) |
| MONO# (×10⁹/L) | 0.49 (0.07–2.07) | 985 (45.99%) | | | |
| C (Valproic acid) (mg/L) | 0.00 (0.00–150.00) | 231 (10.78%) | | | |
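Each "Missing" entry is the count of absent values divided by the 2,142 records, and Figure 3C keeps only features with fewer than 50% missing values. A minimal sketch of that bookkeeping, assuming records are dicts in which `None` marks a missing value; the function names are hypothetical.

```python
def missing_summary(records, features):
    """Return {feature: (n_missing, fraction_missing)}; None counts as missing."""
    n = len(records)
    summary = {}
    for f in features:
        n_missing = sum(1 for r in records if r.get(f) is None)
        summary[f] = (n_missing, n_missing / n)
    return summary


def keep_features(summary, max_fraction=0.5):
    """Keep features whose missing fraction is strictly below max_fraction."""
    return [f for f, (_, frac) in summary.items() if frac < max_fraction]
```

For example, 462 missing body weights out of 2,142 records gives 462 / 2142 ≈ 21.57%, matching the table, while Cr (49.81%) stays just under the 50% cut-off.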
FIGURE 3 Frequency histograms (A) and Q–Q plots (B) of C (OLZ) and log-transformed C (OLZ). (C) Missing-data matrix of the 54 features with fewer than 50% missing values in the original dataset.
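The histograms and Q–Q plots compare C (OLZ) before and after a log transform, which pulls in the long right tail of the concentration distribution. The source says only "log-transformed", so this sketch assumes the natural log; the helper names are hypothetical.

```python
import math


def log_transform(concentrations):
    """Assumption: natural log; the base is not stated in the source."""
    return [math.log(c) for c in concentrations]


def back_transform(log_values):
    """Map log-scale predictions back to ng/mL."""
    return [math.exp(v) for v in log_values]


def skewness(values):
    """Moment-based (biased) skewness; close to 0 for a symmetric sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5
```

If the natural log was indeed used, an MAE of d on the log scale means exp(d) is the geometric mean of the fold error max(pred/obs, obs/pred); for d = 0.060 that is about 1.06.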
Overall comparison of the candidate models in terms of MAE with 95% confidence intervals (CIs) in the derivation cohort of the transformed dataset.
| Candidate models | MAE in the training set | 95% CI of MAE in the training set (±) | MAE in the test set | 95% CI of MAE in the test set (±) |
|---|---|---|---|---|
| ETR | 2.005 × 10⁻⁷ | 1.773 × 10⁻⁷ | 0.060 | 0.011 |
| XGBR | 0.008 | 0.001 | 0.066 | 0.011 |
| RFR | 0.025 | 0.001 | 0.066 | 0.008 |
| BR | 0.028 | 0.001 | 0.071 | 0.011 |
| GBR | 0.059 | 0.001 | 0.071 | 0.009 |
| SVR | 0.063 | 0.001 | 0.076 | 0.009 |
| KNR | 0.067 | 0.001 | 0.086 | 0.016 |
| DTR | 2.005 × 10⁻⁷ | 1.773 × 10⁻⁷ | 0.086 | 0.013 |
| ABR | 0.080 | 0.003 | 0.086 | 0.010 |
| MLR | 0.055 | 0.001 | 2.767 × 10⁸ | 5.208 × 10⁸ |
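The table reports each model's MAE with a ± half-width for the 95% CI, but the excerpt does not say how the interval was computed. One plausible reading, sketched here as an assumption, is the mean of per-fold MAEs with a normal-approximation half-width z · SD / √k; with few folds a t critical value (about 2.26 for 10 folds) is the stricter choice. Function names are hypothetical.

```python
import math
import statistics


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_and_ci_half_width(fold_scores, z=1.96):
    """Mean of per-fold scores and the half-width z * SD / sqrt(k) of a
    normal-approximation 95% CI (an assumed construction)."""
    k = len(fold_scores)
    mean = statistics.mean(fold_scores)
    half_width = z * statistics.stdev(fold_scores) / math.sqrt(k)
    return mean, half_width
```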
FIGURE 4 (A) Evolution of prediction errors for various compositions of feature subsets selected by the random forest-based sequential forward feature selection strategy. The corresponding 95% CIs of the MAE obtained by 10-fold cross-validation are represented by the colored areas. (B) Relative feature importance of the top 10 features.
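Sequential forward selection starts from an empty set and, at each step, adds the feature that lowers the cross-validated error the most; plotting the error after each step gives a curve like Figure 4A. In the paper the error would come from 10-fold cross-validation of a random forest; in this sketch `cv_error` is any callable returning an error for a feature subset, and the toy gains used to exercise it are hypothetical.

```python
def sequential_forward_selection(features, cv_error, max_features=None):
    """Greedy SFS. cv_error(subset) -> error (lower is better).

    Returns [(selected_subset, error), ...], one entry per step, so the
    error curve can be plotted and the best-scoring subset chosen.
    """
    remaining = list(features)
    selected = []
    history = []
    limit = max_features or len(features)
    while remaining and len(selected) < limit:
        best_feature, best_error = None, float("inf")
        for f in remaining:
            err = cv_error(selected + [f])
            if err < best_error:
                best_feature, best_error = f, err
        selected.append(best_feature)
        remaining.remove(best_feature)
        history.append((list(selected), best_error))
    return history
```

Taking `min(history, key=lambda h: h[1])` then picks the subset at the bottom of the error curve; a common refinement is to prefer the smallest subset whose error lies within the CI of that minimum.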
FIGURE 5 Heatmap of the Pearson correlations between the log-transformed C (OLZ) and the finally selected features.
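Each heatmap cell is a Pearson correlation coefficient, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² Σ(y − ȳ)²). A dependency-free sketch that fills such a matrix from named columns; the column names in any usage would be placeholders, not the paper's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """columns: {name: values}. Returns {(name_a, name_b): r} for all pairs."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}
```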
FIGURE 6 Comparison of the prediction performance of our models on the validation cohorts under different conditions in terms of MAE, R², MSE, MRE, and IR.
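The excerpt names the validation metrics without defining them. The sketch below computes all five under stated assumptions: RE = (predicted − observed) / observed, MRE = mean |RE|, and IR = the share of predictions with |RE| ≤ 30% (the ±30% band that also appears in Figure 8). The paper may define MRE or IR differently.

```python
def regression_metrics(observed, predicted, ir_threshold=0.30):
    """MAE, MSE, R^2, MRE and IR for paired observed/predicted values.

    Assumptions: RE = (predicted - observed) / observed, MRE = mean |RE|,
    IR = fraction of samples with |RE| <= ir_threshold.
    """
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rel = [e / o for e, o in zip(errors, observed)]  # RE as a fraction
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": ss_res / n,
        "R2": 1 - ss_res / ss_tot,
        "MRE": sum(abs(r) for r in rel) / n,
        "IR": sum(1 for r in rel if abs(r) <= ir_threshold) / n,
    }
```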
Optimized hyperparameters of each base model.
| Base model | Hyperparameters |
|---|---|
| ETR | ‘n_estimators’: 251, ‘max_depth’: 30, ‘min_samples_leaf’: 1, ‘min_samples_split’: 2, ‘max_features’: sqrt |
| XGBR | ‘n_estimators’: 271, ‘max_depth’: 8, ‘min_child_weight’: 4, ‘gamma’: 0, ‘colsample_bytree’: 1.0, ‘subsample’: 1.0, ‘learning_rate’: 0.19 |
| RFR | ‘n_estimators’: 102, ‘max_depth’: 23, ‘min_samples_leaf’: 1, ‘min_samples_split’: 2 |
| BR | ‘n_estimators’: 106, ‘max_features’: 0.9, ‘max_samples’: 1.0 |
| GBR | ‘n_estimators’: 178, ‘max_depth’: 8, ‘min_samples_leaf’: 1, ‘min_samples_split’: 3 |
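The tuned settings above map directly onto scikit-learn estimator arguments. This sketch stacks four of them with `StackingRegressor`, under assumptions not stated in the excerpt: a `LinearRegression` meta-learner, stratification by quantile bins of the target, and XGBR left out so that only scikit-learn is required. `stratified_splits` and `build_stacking` are hypothetical helper names.

```python
import numpy as np
from sklearn.ensemble import (
    BaggingRegressor,
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import StratifiedKFold


def stratified_splits(y, n_splits=5, n_bins=5, seed=0):
    """(train, test) index pairs stratified on quantile bins of a continuous y
    (an assumed stratification scheme)."""
    y = np.asarray(y)
    inner_edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(y, inner_edges)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return list(skf.split(np.zeros((len(y), 1)), bins))


def build_stacking(y, seed=0):
    """Base models use the optimized hyperparameters from the table;
    the LinearRegression meta-learner is an assumption."""
    base_models = [
        ("etr", ExtraTreesRegressor(n_estimators=251, max_depth=30, min_samples_leaf=1,
                                    min_samples_split=2, max_features="sqrt",
                                    random_state=seed)),
        ("rfr", RandomForestRegressor(n_estimators=102, max_depth=23, min_samples_leaf=1,
                                      min_samples_split=2, random_state=seed)),
        ("br", BaggingRegressor(n_estimators=106, max_features=0.9, max_samples=1.0,
                                random_state=seed)),
        ("gbr", GradientBoostingRegressor(n_estimators=178, max_depth=8, min_samples_leaf=1,
                                          min_samples_split=3, random_state=seed)),
    ]
    return StackingRegressor(
        estimators=base_models,
        final_estimator=LinearRegression(),
        # Out-of-fold base-model predictions on these splits train the meta-learner.
        cv=stratified_splits(y, seed=seed),
    )
```

With xgboost installed, an `("xgbr", XGBRegressor(n_estimators=271, max_depth=8, min_child_weight=4, gamma=0, colsample_bytree=1.0, subsample=1.0, learning_rate=0.19))` entry could be appended to `base_models`. The precomputed splits must form a partition of the samples, which `StratifiedKFold` guarantees.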
FIGURE 7 Residual plots: residuals versus predicted values (A) and normal probability plot of the residuals (B).
FIGURE 8 Forecasting performance of the proposed stacking model across different ranges of C (OLZ) in the validation cohort. (A) Histograms of various metrics for the low and intermediate-to-high ranges. (B) Scatterplot of the relative error (RE%) versus the observed C (OLZ) in the intermediate-to-high range; the red dotted lines denote the MRE, the colored areas denote the ±30% (green) and ±50% (yellow) RE ranges, and the dots labeled ID 1174 and ID 1570 mark the sample with the maximum prediction RE and the sample with the maximum observed C (OLZ), respectively. LIME interpretation of the predictions for samples ID 1174 (C) and ID 1570 (D) using different random_state values. The four views for each sample show, from left to right, the predicted values of the explanation and stacking models, the feature coefficients (orange and blue depict positive and negative relationships, respectively), the feature values in this sample, and the local explanation plot of these features.
FIGURE 9 (A) One-way PDPs for features included in the stacking model. (B) Two-way PDPs of the interactions between the daily dose (OLZ) and other features.
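A one-way partial dependence plot (PDP) fixes one feature at each grid value in every record and averages the model's predictions; a two-way PDP fixes two features jointly, so non-parallel curves reveal an interaction such as one with the daily dose. A pure-Python sketch, where `predict` is any callable mapping a list of feature dicts to predictions; `toy_predict` and its feature names are hypothetical and only illustrate the mechanics.

```python
def partial_dependence_1d(predict, rows, feature, grid):
    """One-way PDP: for each grid value v, set `feature` to v in every row
    and average the model's predictions over all rows."""
    curve = []
    for v in grid:
        preds = predict([{**r, feature: v} for r in rows])
        curve.append(sum(preds) / len(preds))
    return curve


def partial_dependence_2d(predict, rows, f1, grid1, f2, grid2):
    """Two-way PDP: surface[i][j] is the mean prediction with
    f1 = grid1[i] and f2 = grid2[j] in every row."""
    surface = []
    for a in grid1:
        line = []
        for b in grid2:
            preds = predict([{**r, f1: a, f2: b} for r in rows])
            line.append(sum(preds) / len(preds))
        surface.append(line)
    return surface


def toy_predict(rows):
    """Hypothetical model with a dose x smoking interaction, for illustration only."""
    return [2 * r["dose"] + r["age"] / 10 + r["dose"] * r["smoker"] for r in rows]
```

With `toy_predict`, the dose curve is steeper when `smoker` is fixed at 1 than at 0, which is the kind of pattern a two-way PDP makes visible.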
FIGURE 10 Illustration of a general framework of the self-learning and optimization processes of the ML model for a more precise, individualized dose of OLZ.