Lina Keutzer1, Huifang You1, Ali Farnoud2, Joakim Nyberg3, Sebastian G Wicha4, Gareth Maher-Edwards5, Georgios Vlasakakis5, Gita Khalili Moghaddam5,6, Elin M Svensson3,7, Michael P Menden2,8,9, Ulrika S H Simonsson1.
Abstract
Pharmacometrics (PM) and machine learning (ML) are both valuable for drug development to characterize pharmacokinetics (PK) and pharmacodynamics (PD). Pharmacokinetic/pharmacodynamic (PKPD) analysis using PM provides mechanistic insight into biological processes but is time- and labor-intensive. In contrast, ML models are trained much more quickly but offer less mechanistic insight. Using ML predictions of drug PK as input for a PKPD model could therefore strongly accelerate analysis efforts. Here, exemplified by rifampicin, a widely used antibiotic, we explored the ability of different ML algorithms to predict drug PK. Based on simulated data, we trained LASSO-regularized linear regression, Gradient Boosting Machines (GBM), XGBoost and Random Forest to predict the plasma concentration-time series and the rifampicin area under the concentration-versus-time curve from 0 to 24 h (AUC0-24h) after repeated dosing. XGBoost performed best for prediction of the entire PK series (R2: 0.84; root mean square error (RMSE): 6.9 mg/L; mean absolute error (MAE): 4.0 mg/L) in the scenario with the largest data size. For AUC0-24h prediction, LASSO showed the highest performance (R2: 0.97; RMSE: 29.1 h·mg/L; MAE: 18.8 h·mg/L). Increasing the number of plasma concentrations per patient (0, 2 or 6 concentrations per occasion) improved model performance. For example, for AUC0-24h prediction using LASSO, R2 was 0.41, 0.69 and 0.97 when using predictors only (no plasma concentrations), 2 or 6 plasma concentrations per occasion as input, respectively. Run times for the ML models ranged from 1.0 s to 8 min, while the run time for the PM model was more than 3 h. Furthermore, building a PM model is more time- and labor-intensive than training an ML model. ML predictions of drug PK could thus be used as input into a PKPD model, enabling time-efficient analysis.
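As a toy illustration of the boosted-tree family (GBM/XGBoost) compared in the abstract, the sketch below fits depth-1 regression trees ("stumps") to successive residuals under squared-error loss. The data, learning rate and number of boosting rounds are illustrative assumptions; nothing here reproduces the paper's models, features or results.

```python
# Toy gradient boosting with depth-1 trees ("stumps") and squared-error loss.
# Purely illustrative: synthetic 1-D data, not the study's models or features.

def fit_stump(x, residuals):
    """Best single-split constant predictor on one feature (squared error)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = None
    for cut in range(1, len(x)):
        left = [residuals[order[i]] for i in range(cut)]
        right = [residuals[order[i]] for i in range(cut, len(x))]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((v - lmean) ** 2 for v in left)
               + sum((v - rmean) ** 2 for v in right))
        if best is None or sse < best[0]:
            thr = (x[order[cut - 1]] + x[order[cut]]) / 2
            best = (sse, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return lambda xi, t=thr, l=lmean, r=rmean: l if xi < t else r

def boost(x, y, n_rounds=300, lr=0.1):
    """Fit stumps to successive residuals; return a prediction function."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, resid)
        stumps.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: base + sum(lr * s(xi) for s in stumps)
```

On a toy monotone relationship the ensemble recovers the trend closely; real PK data would of course require the multi-feature implementations (XGBoost, GBM, Random Forest, LASSO) used in the study.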
Keywords: feature selection; machine learning; pharmacokinetics; pharmacometrics; population pharmacokinetics; rifampicin; simulation
Year: 2022 PMID: 35893785 PMCID: PMC9330804 DOI: 10.3390/pharmaceutics14081530
Source DB: PubMed Journal: Pharmaceutics ISSN: 1999-4923 Impact factor: 6.525
Figure 1. Overall proposed workflow. Blue panels indicate pharmacometrics and yellow panels machine learning.
Figure 2. Illustration of a two-compartment pharmacokinetic model for a hypothetical drug. AGI, amount of drug in the gastrointestinal tract; k01, absorption rate constant; k10, elimination rate constant; k12, rate constant describing distribution from the central to the peripheral compartment; k21, rate constant describing distribution from the peripheral to the central compartment; V1, volume of the central compartment (e.g., blood); V2, volume of the peripheral compartment (e.g., brain tissue). Drug clearance is expressed as CL = k10 · V1.
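The two-compartment model in Figure 2 can be written as a small system of ordinary differential equations and integrated numerically. Below is a forward-Euler sketch; all parameter values are hypothetical assumptions, not the rifampicin estimates used in the study.

```python
# Forward-Euler simulation of the two-compartment oral-dosing model in
# Figure 2. All parameter values are hypothetical, not rifampicin estimates.
k01, k10, k12, k21 = 1.0, 0.2, 0.3, 0.15  # rate constants (1/h), assumed
V1 = 10.0                                  # central volume (L), assumed
dose = 100.0                               # oral dose (mg) given at t = 0

CL = k10 * V1  # drug clearance (L/h), per the caption: CL = k10 * V1

def simulate(t_end=24.0, dt=0.01):
    """Return (time, plasma concentration) pairs over one dosing interval."""
    a_gi, a1, a2 = dose, 0.0, 0.0  # amounts in gut, central, peripheral (mg)
    series = []
    steps = int(t_end / dt)
    for step in range(steps + 1):
        series.append((step * dt, a1 / V1))  # plasma concentration (mg/L)
        da_gi = -k01 * a_gi
        da1 = k01 * a_gi - (k10 + k12) * a1 + k21 * a2
        da2 = k12 * a1 - k21 * a2
        a_gi += da_gi * dt
        a1 += da1 * dt
        a2 += da2 * dt
    return series
```

A stiff or long simulation would normally use an adaptive ODE solver rather than fixed-step Euler; the fixed step keeps the sketch dependency-free.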
Overview of terminology commonly used by the pharmacometrics and/or machine learning community.
| Term (PM) | Term (ML) | Description |
|---|---|---|
| Covariates | Features | Both terms describe predictors. Features are all input variables used to train a model. Covariates are predictors explaining variability between patients in addition to the variables already included in the structural pharmacometrics model. |
| Objective function value (OFV) | Loss | The OFV is one of the main metrics for model evaluation in pharmacometrics model building. It is proportional to −2 × the log-likelihood of the observed data given the model parameters. |
| Build/Fit a model | Train a model | Both terms define the process of developing a model by determining model parameters that describe the input data in order to reach a predefined objective. |
| Validation dataset | Validation dataset | In PM, the term validation dataset is often used for external validation. In ML, the term is commonly used for the data that are held back for internal validation to evaluate model performance during training. |
| Overparameterization | Overfitting | In PM, a model can be overparameterized, meaning too many parameters are estimated relative to the amount of information in the data, leading to minimization issues. Overfitting in ML describes a phenomenon where the model has been trained to fit the training data too closely, capturing noise rather than signal, which may result in poor predictive ability on unseen data. |
| Model parameters | Model parameters | Even though both communities use the same term, model parameters in PM are different from parameters in ML. Model parameters in PM describe biological or pharmacological processes, such as drug clearance, drug distribution volume or rate of absorption. These parameters are directly interpretable. In ML, on the other hand, model parameters are mathematical parameters learnt during the model training process and are part of the final model describing the data. They do not provide biological interpretation in the first instance at least. |
| Model averaging | Ensemble model | An ensemble model combines multiple ML algorithms, which in most cases leads to better predictive performance compared to single algorithms. In PM, model averaging similarly combines the predictions of several candidate models rather than selecting a single final model. |
| Shrinkage | Shrinkage | The term “shrinkage” has a different meaning in the PM and ML communities. In PM, shrinkage quantifies how strongly individual parameter estimates are pulled toward the population value: a value of 0 indicates very informative data and no overfit, while values approaching 1 indicate uninformative data and overparameterization. In ML, shrinkage methods reduce the risk of overfitting or underfitting by providing a trade-off between bias and variance. |
| Bootstrapping | Bootstrapping | Describes a random resampling method with replacement. In PM, it is used during model development and evaluation for estimation of the model performance. In ML, bootstrapping is part of some algorithms, such as XGBoost or Random Forest, and is also used to estimate the model’s predictive performance. |
| Cross-validation | Cross-validation | In PM, cross-validation is used occasionally, for example, in covariate selection procedures in order to assess the true alpha error. In ML, cross-validation is commonly applied to prevent overfitting and to obtain robust predictions. Cross-validation describes the process of splitting the data into a training dataset and a test dataset. The training dataset is used for model development and the test dataset for external model evaluation. In n-fold cross-validation, the data are split into n non-overlapping subsets, where n − 1 subsets are used for training and the left-out subset for evaluation. This procedure is repeated until all subsets have been used for model evaluation. Model performance is then computed across all test sets. |
| - | Holdout/test dataset | Describes the test/unseen dataset used for external validation. It is of great importance that the holdout/test data are not used for model training or hyperparameter tuning, in order not to overestimate the model’s predictive performance. |
| - | Oversampling/Upsampling | Oversampling is an approach used to deal with highly imbalanced data. Observations in sparsely populated regions of the data are resampled or synthesized using different methods, for example, the Synthetic Minority Oversampling Technique (SMOTE). |
| Empirical Bayes Estimates (EBEs) | Bayesian optimization | EBEs in PM are the model parameter estimates for an individual, estimated based on the final model parameters as well as the observed data using Bayesian estimation. In ML, Bayesian optimization is an unrelated technique: it tunes hyperparameters by building a probabilistic surrogate of the objective function and selecting the next candidate settings to evaluate. |
| Typical value | Typical value | The typical value in PM is the most likely parameter estimate for the whole population given a set of covariates. It could, e.g., be the drug clearance estimate that best summarizes the clearance of the whole population. In ML, the typical value in unsupervised learning, for example, is the center of a cluster (e.g., k-means). |
| Inter-individual variability (IIV) | - | Variability between individuals in a population. Describes the difference between typical and individual PK parameters. Often assumed to be log-normally distributed. |
| Inter-occasion variability (IOV) | - | Variability within an individual on different occasions (e.g., sampling or dosing occasions). Often assumed to be log-normally distributed. |
| Residual error variability (RUV) | - | Remaining random unexplained variability. Describes the difference between individual prediction and observed value. |
| Population prediction | - | The population prediction is the most likely representation of the population given a set of covariates. |
| Individual prediction | - | Predictions for an individual using the population estimates in combination with the observed data for this individual, computed in a Bayesian posthoc step. |
L1, least absolute deviations; L2, least squares errors; MAPE, mean absolute prediction error; ML, machine learning; PM, pharmacometrics.
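The n-fold cross-validation procedure described in the table above can be sketched as a plain index-splitting routine. The fold count and sample size in the test are illustrative; in practice the data would typically be shuffled first.

```python
# n-fold cross-validation split, as described in the terminology table:
# the data are cut into n non-overlapping subsets, and each subset serves
# exactly once as the held-out test fold. Indices only; no real data.

def n_fold_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) for each of the n folds."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, n_folds)
    folds, start = [], 0
    for f in range(n_folds):
        size = fold_size + (1 if f < remainder else 0)  # spread the remainder
        folds.append(indices[start:start + size])
        start += size
    for f in range(n_folds):
        test = folds[f]
        train = [i for g in range(n_folds) if g != f for i in folds[g]]
        yield train, test
```

Model performance is then averaged across the n test folds, exactly as the RMSE and MAE footnotes in the performance tables describe.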
Figure 3. Comparison of the general model development workflow between pharmacometrics and machine learning. The different colors represent different steps of model development. Green: data preparation, blue: model building, red: model evaluation, orange: finalizing the model.
Different scenarios of data sizes used for model training and predicted outcome.
| Scenario | Model input | Predictions |
|---|---|---|
| 1 | Features only | Rifampicin concentration-time series c |
| 2 | Features + 2 observed rifampicin concentrations a | Rifampicin concentration-time series c |
| 3 | Features + 6 observed rifampicin concentrations b | Rifampicin concentration-time series c |
| 4 | Features only | AUC0–24h |
| 5 | Features + 2 observed rifampicin concentrations a | AUC0–24h |
| 6 | Features + 6 observed rifampicin concentrations b | AUC0–24h |
a Time-points of rifampicin concentrations are at 2 and 4 h post-dose at days 7 and 14, representing a sparse sampling schedule. b Time-points of rifampicin concentrations are at 0.5, 1, 2, 4, 8 and 24 h post-dose at days 7 and 14, representing a richer sampling schedule. c At pre-dose and 0.5, 1, 1.5, 2, 3, 4, 6, 8, 12, and 24 h post-dose at days 7 and 14. AUC0–24h, area under the rifampicin plasma concentration-time curve up to 24 h.
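The AUC0–24h used as the outcome in scenarios 4–6 is conventionally derived non-compartmentally with the linear trapezoidal rule over the sampled time-points. A minimal sketch follows; the concentration values are illustrative assumptions, not the study's data.

```python
# Linear trapezoidal AUC0-24h, the standard non-compartmental (NCA)
# estimate over the sampled time-points. Concentrations are illustrative.

def auc_trapezoid(times, concs):
    """Area under the concentration-time curve by the trapezoidal rule."""
    return sum((t2 - t1) * (c1 + c2) / 2.0
               for (t1, c1), (t2, c2) in zip(zip(times, concs),
                                             zip(times[1:], concs[1:])))

times = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 24.0]   # h, the rich schedule above
concs = [0.0, 4.1, 7.8, 10.2, 8.6, 5.0, 0.9]   # mg/L, illustrative values
auc_0_24 = auc_trapezoid(times, concs)          # h*mg/L
```

With rich sampling (scenario 6) the trapezoidal estimate is close to the true exposure, which is one reason the ML models given 6 concentrations per occasion approach the NCA-derived reference so closely.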
Figure 4. Importance scores for evaluated features shown for the different machine learning algorithms. (A) GBM, (B) Random Forest and (C) XGBoost using features only (scenario 1) as input for prediction of plasma concentration versus time. The error bars represent the standard deviation. AGE, age (years); BMI, body mass index (kg/m2); DOSE, daily rifampicin dose (mg); FFM, fat-free mass (kg); HIV, HIV-coinfection; HT, body height (cm); OCC, treatment week; RACE, race; SEX, gender; TAD, time after dose (h); WT, bodyweight (kg).
Model performance for prediction of plasma concentration over time using varying amounts of information as input.
| Metric | GBM: Scenario 1 | GBM: Scenario 2 | GBM: Scenario 3 | XGBoost: Scenario 1 | XGBoost: Scenario 2 | XGBoost: Scenario 3 | Random Forest: Scenario 1 | Random Forest: Scenario 2 | Random Forest: Scenario 3 | LASSO: Scenario 1 | LASSO: Scenario 2 | LASSO: Scenario 3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| R2 | 0.57 | 0.76 | 0.83 | 0.60 | 0.76 | 0.84 | 0.54 | 0.75 | 0.82 | 0.25 | 0.36 | 0.39 |
| Pearson correlation | 0.77 | 0.87 | 0.90 | 0.78 | 0.87 | 0.91 | 0.75 | 0.86 | 0.90 | 0.52 | 0.62 | 0.65 |
| RMSE (mg/L) | 10.9 (8.9–13.3) | 8.3 | 7.1 | 10.6 | 8.3 | 6.9 | 11.3 | 8.5 | 7.2 | 14.5 | 13.3 | 12.9 |
| MAE (mg/L) | 7.1 | 5.2 | 4.1 | 7.0 | 5.1 | 4.0 | 7.0 | 4.9 | 3.8 | 10.2 | 9.6 | 9.3 |
| Run time (s) | 6.8 | 8.2 | 11.1 | 1.4 | 1.2 | 4.7 | 309.9 | 362.6 | 508.7 | 1.1 | 1.3 | 1.1 |
In scenario 1, the models were trained to predict the rifampicin plasma concentration-time series (11 time-points at days 7 and 14) based only on features (no plasma concentrations). In scenario 2, the models were trained to predict the rifampicin plasma concentration-time series (11 time-points at days 7 and 14) based on features and 2 plasma concentrations at time-points 2 and 4 h post-dose at days 7 and 14. In scenario 3, the models were trained to predict the rifampicin plasma concentration-time series (11 time-points at days 7 and 14) based on features and 6 plasma concentrations at time-points 0.5, 1, 2, 4, 8 and 24 h post-dose at days 7 and 14. MAE, mean absolute error averaged across the n-folds (range); RMSE, root mean square error averaged across the n-folds (range).
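R2, RMSE and MAE as reported in the performance tables are standard regression metrics; a minimal sketch of how they are computed (plain Python, no study data involved):

```python
# The regression metrics reported in the performance tables, computed from
# paired observed and predicted values.
import math

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def r2(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot
```

In the study these metrics carry the units of the predicted quantity (mg/L for concentrations, h·mg/L for AUC0–24h) and are averaged across the cross-validation folds.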
Figure 5. Predictions of rifampicin plasma concentration-time series from the different ML algorithms compared to the simulations from the population PK model, considered to be observations in this study. Panel (A) is the scenario where the model was trained to predict the rifampicin plasma concentration-time series using features only as input. In panel (B), the models were trained to predict the rifampicin plasma concentration-time series based on features and 2 plasma concentrations at time-points 2 and 4 h post-dose at days 7 and 14. In panel (C), the models were trained to predict the rifampicin plasma concentration-time series based on features and 6 plasma concentrations at time-points 0.5, 1, 2, 4, 8 and 24 h post-dose at days 7 and 14. The red dashed line represents a trendline through the data. The black solid line is the line of identity, indicating 100% agreement between true and predicted values.
Figure 6. Prediction interval visual predictive check for the best-performing model (XGBoost) trained using 6 plasma concentrations as input (scenario 3) shown for the whole population. Open circles are the rifampicin plasma concentrations simulated from the population PK model, considered to be observed data in this study. The shaded area is the 95% prediction interval of the machine learning model predictions (XGBoost) and the solid blue line is the median of the model predictions. The upper and lower red dashed lines are the 97.5th and 2.5th percentiles of the observed data, respectively, and the solid red line is the median of the observed data.
Figure 7. Individual rifampicin plasma concentrations predicted from the XGBoost model (solid line and open circles) compared to the concentrations simulated from the population PK model, considered to be observations in this study (black closed circles), shown for scenario 3 (features and 6 plasma concentrations used for prediction) for 15 randomly selected IDs. Panel (A) represents the predictions for each individual in the test dataset at day 7. Panel (B) represents the predictions for each individual in the test dataset at day 14. The different colors indicate the different daily rifampicin doses.
Figure 8. Visual predictive check for the re-estimated population PK model based on the simulated data. Open blue circles are the rifampicin plasma concentrations simulated from the population PK model, considered to be observed data in this study. The upper and lower dashed lines are the 95th and 5th percentiles of the observed data, respectively, and the solid line is the median of the observed data. The shaded areas (top to bottom) are the 95% confidence intervals of the 95th (blue shaded area), median (red shaded area) and 5th (blue shaded area) percentiles of the simulated data.
Model performance for prediction of rifampicin AUC0-24h using varying amounts of information as input.
| Metric | GBM: Scenario 4 | GBM: Scenario 5 | GBM: Scenario 6 | XGBoost: Scenario 4 | XGBoost: Scenario 5 | XGBoost: Scenario 6 | Random Forest: Scenario 4 | Random Forest: Scenario 5 | Random Forest: Scenario 6 | LASSO: Scenario 4 | LASSO: Scenario 5 | LASSO: Scenario 6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| R2 | 0.27 | 0.61 | 0.73 | 0.44 | 0.71 | 0.84 | 0.22 | 0.62 | 0.78 | 0.41 | 0.69 | 0.97 |
| Pearson correlation | 0.59 | 0.73 | 0.83 | 0.63 | 0.75 | 0.83 | 0.55 | 0.73 | 0.83 | 0.67 | 0.84 | 0.98 |
| RMSE (h·mg/L) | 131.7 | 103.0 | 88.2 | 121.0 | 92.6 | 69.6 | 137.1 | 103.5 | 79.9 | 117.9 | 86.8 | 29.1 |
| MAE (h·mg/L) | 85.5 | 61.3 | 47.6 | 76.7 | 52.6 | 30.4 | 84.6 | 59.4 | 38.3 | 74.2 | 54.5 | 18.8 |
| Run time (s) | 1.3 | 1.6 | 1.8 | 0.7 | 4.7 | 4.1 | 20.5 | 21.9 | 22.8 | 1.1 | 1.0 | 1.1 |
In scenario 4, the models were trained to predict rifampicin AUC0–24h based only on features (no plasma concentrations) at days 7 and 14. In scenario 5, the models were trained to predict rifampicin AUC0–24h based on features and 2 plasma concentrations at time-points 2 and 4 h post-dose at days 7 and 14. In scenario 6, the models were trained to predict rifampicin AUC0–24h based on features and 6 plasma concentrations at time-points 0.5, 1, 2, 4, 8 and 24 h post-dose at days 7 and 14. AUC0–24h, Area under the rifampicin plasma concentration-time curve from 0 to 24 h; MAE, mean absolute error averaged across the n-folds (range); RMSE, root mean square error averaged across the n-folds (range).
Figure 9. Predictions of rifampicin AUC0–24h at days 7 and 14 from the different ML algorithms compared to the NCA-derived AUC0–24h, considered to be observations in this study. Panel (A) is the scenario where the model was trained using features only as input. In panel (B), the models were trained to predict rifampicin AUC0–24h based on features and 2 plasma concentrations at time-points 2 h and 4 h post-dose at days 7 and 14. In panel (C), the models were trained to predict rifampicin AUC0–24h based on features and 6 plasma concentrations at time-points 0.5 h, 1 h, 2 h, 4 h, 8 h and 24 h post-dose at days 7 and 14. The red dashed line represents a trendline through the data. The black solid line is the line of identity, indicating 100% agreement between true and predicted values.