Yizhuo Wang1, Bing Z Carter2, Ziyi Li1, Xuelin Huang1. 1. Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA. 2. Section of Molecular Hematology and Therapy, Department of Leukemia, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA.
Abstract
OBJECTIVE: A key component for precision medicine is a good prediction algorithm for patients' response to treatments. We aim to implement machine learning (ML) algorithms into the response-adaptive randomization (RAR) design and improve the treatment outcomes. MATERIALS AND METHODS: We incorporated 9 ML algorithms to model the relationship of patient responses and biomarkers in clinical trial design. Such a model predicted the response rate of each treatment for each new patient and provided guidance for treatment assignment. Realizing that no single method may fit all trials well, we also built an ensemble of these 9 methods. We evaluated their performance by quantifying the benefits for trial participants, such as the overall response rate and the percentage of patients who receive their optimal treatments. RESULTS: Simulation studies showed that the adoption of ML methods resulted in more personalized optimal treatment assignments and higher overall response rates among trial participants. Compared with each individual ML method, the ensemble approach achieved the highest response rate and assigned the largest percentage of patients to their optimal treatments. For the real-world study, we successfully showed the potential improvements if the proposed design had been implemented in the study. CONCLUSION: In summary, the ML-based RAR design is a promising approach for assigning more patients to their personalized effective treatments, which makes the clinical trial more ethical and appealing. These features are especially desirable for late-stage cancer patients who have failed all the Food and Drug Administration (FDA)-approved treatment options and can only get new treatments through clinical trials.
It is known that patients respond differently to the same treatments. The demand for selecting
the optimal treatment for each and every patient has resulted in a rapidly
developing field called precision medicine, also known as personalized
medicine. This field
aims to provide guidance to select the most effective treatment based on distinctive
patient biomarkers. As clinical trials also evolve in the age of precision medicine,
there is a substantial need for novel trial designs to deliver more ethical and
precise care. Compared with classical nonadaptive trials, adaptive trials have
become popular among clinicians as they integrate accumulating patient data to
modify the parameters of the trial protocol, provide personalized treatment
assignment, and ultimately optimize patients’ outcomes. For example, the
adaptive designs in phase 2/3 clinical trials take advantage of the interim
treatment response data during the course of the trial and allocate more patients to
the presumably more effective treatments.

Among different adaptive designs, one common adaptation is response-adaptive
randomization (RAR). It refers to the adjustments of treatment allocations based on
intermediate patient responses and new patients’ characteristics collected
during the clinical trials. This RAR design is useful when the interaction between
biomarkers and treatments are only putative or not known at the beginning of a
trial, and it is also practical when there are multiple treatments to be considered.
Its ultimate objective is to provide more patients with their personalized optimal
therapies according to their biomarker profiles. The starting point of RAR can be
traced back to Thompson,
who proposed employing a posterior probability estimated from the interim data to
assign patients to the more effective treatment. Following his idea, the application
of Bayesian methods with an inherent adaptive nature has boomed in the area of RAR
designs.

Currently, there are several major successes in applying Bayesian RAR concepts in
clinical trials, from protocol development through legitimate registration. The
BATTLE-1 trial (Biomarker-integrated Approaches of Targeted Therapy for Lung Cancer
Elimination) for patients with advanced non–small cell lung cancer (NSCLC)
and the I-SPY 2 trial (Investigation of Serial Studies to Predict Your Therapeutic
Response with Imaging and Molecular Analysis) for patients with breast cancer in the
setting of neoadjuvant chemotherapy are 2 biomarker-based, Bayesian RAR clinical
trials. However, Bayesian RAR designs have a number of
challenges and limitations. Due to the modeling restrictions, Bayesian RAR methods
usually consider only a very small number of biomarkers. With complex diseases or
symptoms, hundreds or even thousands of biomarkers may need to be considered at the
same time for treatment assignment. Also, some Bayesian RAR methods adjust the design by
separating the cohort based on the existence of biomarker(s), and thus these methods
rely heavily on how well the biomarker(s) interact(s) with the treatments. If the
biomarker is chosen incorrectly, it is possible to make wrong adjustments
afterwards.

With the development of modern sequencing technology, clinicians face a massive
volume of high-dimensional data with a complex, nonlinear structure. How to build an
effective and scalable algorithm for randomization becomes a fundamental question
for the research of RAR trial designs. Machine learning (ML) methods have been
applied to solve many real-world problems and have successfully demonstrated their
strengths in processing large data sets, as well as capturing nonlinear data
structures. With the expectations and resources to analyze this large amount of
complex healthcare data, ML methods have established their supremacy in disease
prediction, disease classification, imaging diagnosis, drug manufacturing,
medication assignment, and genomic feature identification tasks.

Although several supervised ML approaches have been applied to drug response
prediction, little work has explored
incorporating ML methods into RAR trial designs. In this study, we implemented 9 ML
algorithms in RAR designs and further presented an ML-ensemble RAR design
combining these 9 ML algorithms. Specifically, the ML methods match patient
biomarker profiles with predicted treatment outcomes and, in turn,
determine treatment allocation for future patients. These ML methods are able to
handle large data sets and complex structures. We demonstrate, in
both a simulation study and a real-world example, that ML-based RAR designs achieve
higher response rates because more patients receive effective treatments. The
ensemble method outperformed each individual ML method.
MATERIALS AND METHODS
Adaptive design: response-adaptive randomization
In clinical trial design, adaptive design means making changes to the trial
protocol after the trial has started and some data have been collected. These
changes are based on the information from the collected data, including (1) the
total sample size, (2) interim analyses, (3) patient allocation to different
treatment arms, and more. Item (3) refers to the RAR design, in which the
treatment allocation probability varies in order to favor the treatment
estimated to be more effective and to increase the response rate in patients.
The initial concept can be traced back to Thompson and Robbins, and was later extended by others. Some well-known RAR
trials include the extracorporeal membrane oxygenation (ECMO) trial, which
tested the efficacy of ECMO in patients with severe acute respiratory distress
syndrome (ARDS), and the first large-scale double-blind, placebo-controlled study, which tested the
superiority of fluoxetine over placebo in children and adolescents with
depression. A general scheme of the RAR designs is shown in Figure 1.
Figure 1.
Response-adaptive randomization (RAR) design. The number of adaptive
randomizations is adjustable per application.
Benchmark design: equal randomization
Randomization as a standard means for addressing the selection bias in treatment
assignments has been extensively used in clinical trials. It helps to achieve balance among
treatment groups and accounts for the genuine uncertainty about which treatment
is better at the beginning of the trial. Randomly assigning patients to
treatment arms on a 1:1 basis is known as equal randomization (ER). Friedman et al (p. 41)
stated that equal allocation in principle maximizes statistical power and is
consistent with the concept of equipoise that should exist before the trial
starts. Here, we used the ER design as a benchmark randomization design to
evaluate the performance of ML-based RAR designs.
Allocation rule
The key to the proposed method is to model the relationship of patient responses
and biomarkers. Such a model will then predict the response rate of each
treatment for each new patient and provide guidance for treatment assignment. In
detail, the biomarker profiles and treatment response data of currently enrolled
patients were used to train ML models, which were later used to predict
future patients’ treatment responses based on their biomarker profiles.
Given treatment A and B, the probability of
allocating each treatment for patient i is shown as below:
where and , respectively, denote the response probability
of treatment and for patient predicted by the ML model.
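As a minimal sketch of how predicted response probabilities could drive treatment assignment, the Python snippet below converts two predicted probabilities into allocation probabilities and draws an assignment. It assumes a simple proportional rule, in which the chance of receiving a treatment is its predicted response probability divided by the sum over both treatments. This form and the function names `allocation_probs` and `assign_treatment` are illustrative assumptions, not necessarily the rule used in this study.

```python
import random
from typing import Tuple


def allocation_probs(p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (probability of assigning A, probability of assigning B).

    Assumed proportional rule, used here for illustration only.
    """
    total = p_a + p_b
    if total == 0:
        # Neither arm is favored, so fall back to equal randomization.
        return 0.5, 0.5
    return p_a / total, p_b / total


def assign_treatment(p_a: float, p_b: float, rng: random.Random) -> str:
    """Randomly assign 'A' or 'B' according to the allocation probabilities."""
    prob_a, _ = allocation_probs(p_a, p_b)
    return "A" if rng.random() < prob_a else "B"


rng = random.Random(0)
prob_a, prob_b = allocation_probs(0.6, 0.3)  # predicted response probabilities
print(round(prob_a, 3), round(prob_b, 3))
print(assign_treatment(0.6, 0.3, rng))
```

Under this assumed rule, a patient whose predicted response probability is twice as high under treatment A is twice as likely to be assigned to A, while assignment to B remains possible.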
ML algorithms and an ML ensemble
We selected 9 mainstream ML algorithms and implemented them in
the RAR design to predict treatment response. The prediction models were built
using the best-fitting parameters for each model, which were obtained by the
grid search method with 10-fold cross-validation. Grid search is a standard method
which allows us to try a variety of tuning parameter combinations for the model
within a reasonable amount of time. The 10-fold cross-validation repeats the
fitting process 10 times, each time using a randomly selected nine-tenths of
the data (90%) to train the model and the remaining tenth
to validate. By doing this, we avoid bias from using a random single split. The
selected model will generalize better to all of the samples in the dataset.
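The tuning procedure can be sketched from scratch in Python. The example below is only an illustration of grid search combined with 10-fold cross-validation: it tunes a small k-nearest-neighbors classifier on synthetic data, and the candidate values of `k`, the data-generating rule, and all function names are assumptions rather than the settings used in this study.

```python
import random
from itertools import product


def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def cv_accuracy(data, k, n_folds=10):
    """Mean validation accuracy of k-NN across n_folds cross-validation folds."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, valid in enumerate(folds):
        # Train on the other folds, validate on the held-out fold.
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        hits = sum(knn_predict(train, x, k) == label for x, label in valid)
        scores.append(hits / len(valid))
    return sum(scores) / n_folds


def grid_search(data, grid, n_folds=10):
    """Score every parameter combination in the grid and return the best one."""
    names = list(grid)
    results = {}
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results[values] = cv_accuracy(data, n_folds=n_folds, **params)
    best = max(results, key=results.get)
    return dict(zip(names, best)), results[best]


# Synthetic stand-in for (biomarker profile, binary response) pairs.
rng = random.Random(0)
data = []
for _ in range(200):
    x = (rng.gauss(0, 1), rng.gauss(0, 1))
    label = 1 if x[0] + x[1] + rng.gauss(0, 0.5) > 0 else 0
    data.append((x, label))
rng.shuffle(data)  # randomize fold membership

best_params, best_score = grid_search(data, {"k": [1, 5, 15]})
print(best_params, round(best_score, 3))
```

Each candidate value is scored by its average accuracy over the 10 held-out folds, so the chosen parameter does not depend on a single random split.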
Combining grid search with cross-validation, we evaluate the performance of each
parameter combination and select the best parameters for each ML model. Here we
conducted this hyperparameter tuning procedure in R using the
“caret” package; similar techniques are available in the
scikit-learn Python ML library. These selected ML algorithms can be roughly
divided into 2 categories:

Parametric models: logistic regression, LASSO regression, and Ridge regression.

Nonparametric models: gradient boosting machine (GBM), random forest (RF),
support vector machine (SVM), Naive Bayes, k-nearest neighbors (KNN),
and artificial neural networks (NNs).

Logistic regression, Ridge regression, and LASSO regression are all
considered parametric models. In detail, logistic regression assumes a linear
relationship between the independent variables and the log odds. It is a particular form of
the generalized linear model (GLM). Ridge
regression and LASSO regression assume that there is a linear relationship
between the dependent variable and the explanatory variables.
They are 2 regularization methods for GLMs that prevent overfitting by
penalizing the size of the regression coefficients, which shrinks the contribution of less informative predictors.

KNN, which classifies each data point based on the points that are most similar to
it, is a typical nonparametric model: it makes no assumption about the
underlying data distribution, and the number of parameters grows with the size
of the data. With
NNs, however, there has been some debate regarding whether they are
parametric or nonparametric methods. NNs typically consist of 3 types of layers: input
layer, hidden layer, and output layer. Here we classify NNs as a nonparametric
method, as the network architecture grows adaptively to match the complexity of
the given data.

Both GBM and RF are nonparametric methods that consist of sets of decision trees.
Specifically, GBM builds one tree at a time, and each new tree helps to correct
errors made by the previously trained trees by adding weights to the observations
with the worst prediction from the previous iteration; RF trains each tree
independently using a random sample of the data, and the results are aggregated
in the end.

Naive Bayes (NB) and SVM can be either parametric or nonparametric depending on whether they
use kernel tricks. The NB classifier becomes nonparametric if it uses
kernel density estimation (KDE) to obtain a more realistic estimate of the
probability of an observation belonging to a class. For SVM, the basic idea is
finding a hyperplane that best divides a dataset into 2 classes. It is
considered nonparametric when using the kernel trick to find this hyperplane.
This is because the kernel is constructed by computing the pair-wise distances
between the training points, and the complexity of the model grows with the size
of the dataset.

Combining these 9 models, an ML ensemble method was built and implemented in the
RAR design to obtain a better treatment allocation rule. We defined the
treatment allocation probability function for patient i as
follows: where is the number of agreed models, is the total number of models, is the threshold number of agreed models, and
is the threshold treatment allocation
probability. Here we chose and for ; these threshold values can be adjusted
accordingly for different application purposes. To further understand the
impacts of selecting different parameter values, we ran simulations using
different threshold numbers of agreed models and different threshold allocation
probabilities. The results are shown in Supplementary Figures
S1 to S4. In the right-hand panels of Supplementary Figures S1 and S3, when
the treatment main effect is high, as the threshold parameter increases, the
response rate of the ensemble method increases and the individual loss
decreases. This intuitively makes sense because when the consensus method
reaches the correct decision, increasing the threshold allocation probability will increase the probability of patients
receiving their optimal treatments (Supplementary Figure S2). When the treatment main effect
is low, the increments of the response rate are not very significant (Supplementary Figure S1,
left), but we still observe obvious differences regarding the optimal treatment
percentage and the individual loss (Supplementary Figures S2 and S3, left). Meanwhile, it is
still desirable to maintain some randomness of treatment assignment in a
clinical trial, and thus an allocation probability of 1 for the optimal treatment
is not recommended. Supplementary Figure S4 shows the results of the response rate
and the optimal treatment percentage; the difference between
different threshold numbers of agreed ML models is minor.

Apart from comparing with the ER “benchmark” design, the current
study also examined whether the ML ensemble could assign more patients to the
best available treatment than the individual ML methods in an adaptive design with the
same assessment of individuals.
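A consensus-based allocation rule of this kind can be sketched as follows. The piecewise form below is an assumption made for illustration: the threshold count `m0 = 7`, the symmetric lower branch, and the use of the share of agreeing models in the middle range are hypothetical choices, while the cap `pi0 = 0.85` mirrors the constant assignment probability mentioned in the Discussion.

```python
def ensemble_allocation_prob(m: int, M: int = 9, m0: int = 7, pi0: float = 0.85) -> float:
    """Probability of assigning treatment A when m of M models favor A.

    Hypothetical piecewise rule for illustration:
      - strong consensus for A (m >= m0): cap at pi0 to keep some randomness
      - strong consensus for B (M - m >= m0): 1 - pi0
      - otherwise: the share of models favoring A
    """
    if not 0 <= m <= M:
        raise ValueError("m must be between 0 and M")
    if m >= m0:
        return pi0
    if M - m >= m0:
        return 1.0 - pi0
    return m / M


for m in range(10):
    print(m, round(ensemble_allocation_prob(m), 3))
```

Capping the probability below 1 keeps every assignment random even when all models agree, which matches the recommendation against an allocation probability of 1.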
Inverse probability of treatment weighting
Similar to an observational study, in which outcomes are measured without
attempting to change them, the treatment selection of future patients in
RAR trials is often influenced by individual characteristics of the initial
block of patients.
As a result, when estimating the impact of treatment on responses, systematic
variations in baseline characteristics between differently treated individuals
must be taken into account. Here we applied the inverse probability of treatment
weighting (IPTW) method to decrease or remove the effects of confounding when
using the observational data to estimate the treatment effects. The idea of IPTW
is to use weights based on the propensity score to create a synthetic sample in
which the distribution of baseline characteristics is independent of
treatment. The
propensity score refers to the probability of treatment allocation conditional on the
observed individual characteristics. The weight based on it is defined as
follows:

$$w_i = \frac{Z_i}{e_i} + \frac{1 - Z_i}{1 - e_i},$$

where $Z_i$ is the treatment indicator and $e_i$ is the propensity score for the $i$th subject. Different estimators for treatment
effects based on IPTW have been developed; here we used an estimator of the
average treatment effect (ATE), which is defined as $E[Y_i(1) - Y_i(0)]$, where $Y_i(1) - Y_i(0)$ is the effect of treatment. Incorporated with the
IPTW idea, the ATE estimator is defined as follows:

$$\widehat{\mathrm{ATE}} = \frac{1}{n}\sum_{i=1}^{n}\frac{Z_i Y_i}{e_i} - \frac{1}{n}\sum_{i=1}^{n}\frac{(1 - Z_i)\,Y_i}{1 - e_i},$$

where $Y_i$ denotes the response variable of the
$i$th subject, $n$ denotes the total number of subjects, and
$e_i$ still denotes the propensity score.
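The weighting idea can be illustrated with a few lines of Python. This sketch assumes the standard IPTW definitions, in which a treated subject receives weight 1/e and an untreated subject receives weight 1/(1 - e), with e the propensity score. The toy data and the function names `iptw_weights` and `iptw_ate` are made up for illustration.

```python
def iptw_weights(z, e):
    """Inverse probability of treatment weights: 1/e if treated, 1/(1 - e) if not."""
    return [zi / ei + (1 - zi) / (1 - ei) for zi, ei in zip(z, e)]


def iptw_ate(y, z, e):
    """IPTW estimate of the average treatment effect for responses y."""
    n = len(y)
    treated = sum(zi * yi / ei for yi, zi, ei in zip(y, z, e)) / n
    control = sum((1 - zi) * yi / (1 - ei) for yi, zi, ei in zip(y, z, e)) / n
    return treated - control


# Toy data: binary responses, treatment indicators, and propensity scores.
y = [1, 0, 1, 1, 0, 0]
z = [1, 1, 0, 1, 0, 0]
e = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]

print(iptw_weights(z, e))
print(iptw_ate(y, z, e))
```

With a constant propensity score of 0.5 and equally sized arms, as in this toy example, the estimate reduces to the simple difference in response rates between the two arms; the weights matter once allocation probabilities vary across patients, as they do under RAR.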
Evaluation metrics
Two commonly used criteria in the field of precision medicine, namely the
overall response rate and the percentage of individuals receiving optimal
treatments, are our primary evaluation metrics. The formulas of the response
rate and the optimal treatment percentage are as follows:

$$\text{Response rate} = \frac{\text{number of patients who responded}}{\text{total number of patients}}, \qquad \text{Optimal treatment percentage} = \frac{\text{number of patients who received their optimal treatment}}{\text{total number of patients}}.$$

The power and the average treatment effect (ATE) adjusted by the IPTW method were
also reported to thoroughly evaluate each method’s performance. The
power of a clinical trial refers to the probability of detecting a difference
between treatment groups when it truly exists; ATE was defined in the
previous section. Additionally, we proposed a new criterion, the individual loss,
to quantify the loss for each patient due to receiving suboptimal treatments.
For the individual loss, we first define a match for the enrolled patients. A
match occurs when the patient’s actual treatment received is the same as
the best treatment from the true model. For an enrolled patient with signature let denote the probability of responding to the
received treatment . Let denote the probability of responding to the
optimal treatment determined by the true model. Then we define the personalized
loss function as follows:

A low individual loss value suggests that the majority of patients have received
a treatment to which they respond about as well as to the true model’s
optimal therapy.
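The primary metrics are simple proportions, and the individual loss compares response probabilities under the received and the optimal treatment. The sketch below computes all three on toy data. The averaged-difference form of `individual_loss` is an assumption made for illustration, as are the function names and the example values.

```python
def response_rate(responses):
    """Share of patients who responded (responses are coded 0/1)."""
    return sum(responses) / len(responses)


def optimal_treatment_pct(received, optimal):
    """Share of patients whose received treatment matches their optimal treatment."""
    matches = sum(r == o for r, o in zip(received, optimal))
    return matches / len(received)


def individual_loss(p_received, p_optimal):
    """Assumed form: mean gap between optimal and received response probabilities."""
    gaps = [po - pr for pr, po in zip(p_received, p_optimal)]
    return sum(gaps) / len(gaps)


# Toy trial with 4 patients.
responses = [1, 0, 1, 1]
received = ["A", "B", "A", "B"]
optimal = ["A", "A", "A", "B"]
p_received = [0.7, 0.4, 0.6, 0.5]  # response probability under the received treatment
p_optimal = [0.7, 0.6, 0.6, 0.5]   # response probability under the optimal treatment

print(response_rate(responses))
print(optimal_treatment_pct(received, optimal))
print(round(individual_loss(p_received, p_optimal), 3))
```

In this toy example only the second patient is mismatched, so the loss comes entirely from that patient's 0.2 gap in response probability.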
RESULTS
Simulation
We used simulation studies to evaluate the proposed methods.
Setting
We generated the patient’s response from a logistic
regression model with 10 biomarkers: where is the treatment indicator (either
treatment 0 or treatment 1), is the treatment main effect coefficient,
is the biomarker for patient , and incorporates some polynomial and step
function terms. is the biomarker-treatment interaction
coefficient and were assumed to interact with the
treatment. A random noise for each subject is denoted as . In detail, each biomarker was assumed to follow a normal distribution
with a mean of 0 and a standard deviation of 1. Among these 10 biomarkers,
contributed to the true model as
third-degree polynomials, while contributed to the true model as step
functions:

Seven scenarios with different treatment main effects (0, 0.5, 0.7, 1, 1.3, 1.5, and 1.7) and
a fixed treatment-biomarker interaction (0.5) were considered. We conducted 1000 Monte
Carlo simulations for each scenario and compared the results obtained by the
ML-based and ML-ensemble RAR designs with the results from the ER
design.
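To make the data-generating mechanism concrete, the following Python sketch draws patients from a logistic model of this general shape. Only the 10 standard-normal biomarkers, the binary treatment indicator, the treatment main effect, and the interaction coefficient of 0.5 follow the description above. Which biomarkers enter as cubic or step terms, which ones interact with the treatment, the 0.1 scaling factors, and the noise scale are all hypothetical choices made for illustration.

```python
import math
import random


def simulate_patient(treatment, beta_main, beta_inter=0.5, rng=random):
    """Draw one patient's biomarkers, response probability, and binary response.

    The roles assigned to individual biomarkers are illustrative assumptions.
    """
    x = [rng.gauss(0.0, 1.0) for _ in range(10)]  # 10 standard-normal biomarkers
    cubic = x[0] ** 3 + x[1] ** 3                 # assumed third-degree polynomial terms
    step = (1.0 if x[2] > 0 else 0.0) + (1.0 if x[3] > 0 else 0.0)  # assumed step terms
    interaction = beta_inter * treatment * (x[0] + x[2])  # assumed interacting biomarkers
    noise = rng.gauss(0.0, 0.1)                   # assumed noise scale
    eta = beta_main * treatment + 0.1 * cubic + 0.1 * step + interaction + noise
    prob = 1.0 / (1.0 + math.exp(-eta))           # logistic link
    response = 1 if rng.random() < prob else 0
    return x, prob, response


rng = random.Random(0)
treated = [simulate_patient(1, beta_main=1.0, rng=rng)[2] for _ in range(1000)]
control = [simulate_patient(0, beta_main=1.0, rng=rng)[2] for _ in range(1000)]
print(sum(treated) / 1000, sum(control) / 1000)
```

Repeating such draws under each allocation design, and for each value of the treatment main effect, gives the kind of Monte Carlo comparison described in this section.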
Response rate and optimal treatment percentage
The response rate results and the percentage of receiving the optimal
treatment are shown in Figure 2. Overall, the performance of ML-based RAR
designs is better than the performance of the ER design. When the treatment
main effect is zero, the differences for both response rate and the optimal
treatment percentage between ML-based RAR designs and the ER design are not
significant. As the treatment effect increases, these differences become
more obvious. Among these 9 ML algorithms, the neural network has the
highest response rate and the highest proportion of patients receiving their
optimal treatments. Additionally, the ensemble method combining these 9 ML
methods outperforms all other methods and achieves an approximately 5%
higher response rate and a more than 20% larger optimal treatment
percentage compared to the ER design.
Figure 2.
Simulation result: response rate (left), percentage of patients
receiving their optimal treatments (right). The treatment-biomarker
interaction is fixed at 0.5. Boxplots display
the median (middle line), the interquartile range (hinges), and 1.5
times the interquartile range (lower and upper whiskers) based on
1000 simulations. The mean (over 1000 simulations) response
rate ranges from 0.53 to 0.69, and the mean of optimal treatment
percentages ranges from 0.50 to 0.71.
Individual loss, ATE, and power
The individual loss and the ATE results are shown in Figure 3. The interpretation of the
individual loss results coincides with the previous response rate results
and the optimal treatment percentage results such that the ML-ensemble RAR
design has the lowest individual loss value among all scenarios, which is
preferred in the trial. The ATE has been adjusted by the IPTW method to
account for the confounding effect of using observational data. The logistic
regression method now has the highest ATE, followed by the NN method. The
ensemble method has a relatively low ATE, but it is higher than the ER
method when the treatment main effect becomes larger. This shows that the
average effect of changing the entire population from untreated to treated
using RAR designs is better than that of using the ER design.
Figure 3.
Simulation result: individual loss (left), average treatment effect
(ATE, right). The treatment-biomarker interaction is fixed at 0.5. Boxplots display
the median (middle line), the interquartile range (hinges), and 1.5
times the interquartile range (lower and upper whiskers) based on
1000 simulations. The mean (over 1000 simulations) individual
loss ranges from 0.04 to 0.10, and the mean ATE ranges from -0.12 to
0.30.
Power
The power results are shown in Figure 4. The power is also weighted by the IPTW method
to address potential bias. For the power analysis, the Type I error is
controlled at 0.05. Several papers have shown in their simulation studies
that the correlation among treatment assignments was inevitable when
performing inference on the data from RAR design-implemented studies. This correlation
can increase the binomial variability and lower the power. In our
simulation, the RAR design using the NN method has the lowest power,
followed by using the logistic regression method. However, other ML-based
RAR designs have comparable or even higher power than that of the ER design.
The ensemble method has a relatively low power, but it is still better than
the NN method.
Figure 4.
Simulation result: power. The treatment-biomarker interaction
is fixed at 0.5. The Type I error
is controlled at 0.05. The power ranges from 0.04 to 0.97.
Real-world example
We analyzed a publicly available acute myeloid leukemia (AML) dataset from
Kornblau et al, where
most of the clinical biomarkers are expression levels of cellular proteins.
Kornblau et al sequenced protein expressions in leukemia-enriched cells from 256
newly diagnosed AML patients with a primary goal of eventually establishing a
proteomic-based categorization of AML. The treatment and the response variables
were carefully adjusted to binary variables. Specifically, the treatments were
binarized to high-dose ara-C (HDAC)–based treatments and non-HDAC
treatments; the responses were binarized to complete response (CR) and
non-CR.

We first performed a feature selection to decide what interaction terms should be
included in the model. We used each protein-treatment interaction term to build
a generalized linear model (GLM) and reported the P-value for each
interaction to assess whether it has a strong correlation with the dependent
variable, that is, the treatment response. The top 10 proteins whose interaction variables
have the smallest P-values were selected. We then performed a
gene network analysis on the genes that code for these proteins using GeneMANIA
(http://genemania.org). This analysis helps
to illustrate the hidden interaction and network of the corresponding genes.
Additionally, it shows other genes that have been reported to associate with the
input 10 genes, using extensive existing knowledge such as protein and genetic
interactions, pathways, co-expression, co-localization, and protein domain
similarity. The results are presented in Figure 5. The top 10 genes corresponding to the biomarkers
identified in our study are highlighted with red circles.
Figure 5.
AML data: the gene network analysis. The input 10 genes, namely the genes
coding for top 10 proteins that significantly interacted with the
treatment, were highlighted using red circles. Other genes that were
presumably involved in AML were returned by GeneMANIA.

Using a cut-off P-value of 0.1 among 71 proteins, the expression
levels of 3 of them were found to have the most significant interactions with
the treatment, that is, the strongest correlation with the treatment outcomes:
phosphothreonine 308 of Akt (Thr 308 p-Akt), the mechanistic target of rapamycin
(mTOR), and signal transducer and activator of transcription 1 (STAT1). Studies
have shown that these 3 proteins play critical roles in human AML. The level of
Thr 308 p-Akt is associated with high-risk cytogenetics and predicts poor
overall survival for AML patients. In AML, the mTOR signaling pathway is deregulated
and activated as a consequence of genetic and cytogenetic abnormalities. The
mTOR inhibitors are often used to target aberrant mTOR activation and
signaling. The STAT1 transcription factor is constitutively
activated in human AML cell lines and might contribute to the autonomous
proliferation of AML blasts. The inhibition of this pathway can be of great
interest for AML treatments. Hence, we chose these 3 proteins to build ML
models in our proposal.

The whole dataset (256 observations) was randomly shuffled and divided into 2
equal-sized blocks: block 1 and block 2. Each block was taken in turn as either
the training set or the testing set. The results were aggregated after 100
repetitions. Since this clinical trial is already completed and it is not
possible to get actual treatment responses using our methods, we separated the
enrolled patients into 2 groups: a consistent group whose real treatments are
the same as the treatments using the ML-based RAR designs and an inconsistent
group whose real treatments are different from the treatments using the ML-based
RAR designs. We compared the response rates in these 2 groups to elucidate the
potential gain if the proposed RAR had been implemented. The results of each
method are shown in Figure 6. In the consistent group, the percentages of
responding patients are at least 10% higher than 50%, whereas the
percentages in the inconsistent group are all lower than 50%;
that is, we observe higher response rates in the consistent group. This suggests
that patients in the inconsistent group might have benefited from the RAR method
we developed.
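The consistent versus inconsistent comparison can be sketched as follows. The records below are made-up toy values, not the AML data, and the function name `group_response_rates` is hypothetical. The snippet only illustrates how patients are split by whether their actual treatment matches the treatment recommended by an ML-based RAR design.

```python
def group_response_rates(actual, recommended, responses):
    """Response rates for patients whose actual treatment matches the
    recommendation (consistent) and for those whose treatment differs (inconsistent)."""
    consistent = [r for a, rec, r in zip(actual, recommended, responses) if a == rec]
    inconsistent = [r for a, rec, r in zip(actual, recommended, responses) if a != rec]

    def rate(group):
        return sum(group) / len(group) if group else float("nan")

    return rate(consistent), rate(inconsistent)


# Toy records: actual treatment, model-recommended treatment, response (1 = CR, 0 = non-CR).
actual = ["HDAC", "HDAC", "non-HDAC", "non-HDAC", "HDAC", "non-HDAC"]
recommended = ["HDAC", "non-HDAC", "non-HDAC", "HDAC", "HDAC", "non-HDAC"]
responses = [1, 0, 1, 0, 1, 0]

consistent_rate, inconsistent_rate = group_response_rates(actual, recommended, responses)
print(consistent_rate, inconsistent_rate)
```

A higher response rate in the consistent group than in the inconsistent group is the pattern interpreted here as a potential gain from following the ML-based recommendation.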
Figure 6.
AML data result: the response percentage. Patients in the consistent
group (left) were assigned to the same treatments using our ML-based RAR
designs, while patients in the inconsistent group (right) were assigned
to different treatments using our ML-based RAR designs. The 50%
response percentage is marked with a black dashed line.
DISCUSSION
Patients are accrued in groups sequentially. RAR designs determine the treatment
allocation for new groups of patients based on the accrued information of how
previous groups of patients responded to their treatments. The number of RAR
implementations, , should be predefined. The choice of
may depend on the total sample size, trial length,
and other logistics and practical considerations. Our simulation study used
for a total sample size of . In the real data analysis with a smaller sample
size of 256 subjects, we used .

We developed novel methods for RAR designs by incorporating 9 ML methods to predict
treatment response and assign treatments accordingly. We showed that our ML-based
RAR designs can effectively improve treatment response rates among patients. We
further proposed an ensemble approach based on the consensus of the 9 ML methods to
improve the prediction and decision making. Our proposed ML-ensemble RAR design
builds on the predictive ability of 9 ML methods and can further improve prediction
accuracy and patient outcome. Specifically, suppose out of 9 models indicate that treatment
is better than treatment for patient , then we let for , let for , and for . For , we keep the assignment probability as a constant
of 0.85 because we still want to reserve some randomness in the trial. These
settings can be tuned based on prior knowledge of the treatment selections.

We also tried combining the NN and GLM algorithms as a binary combination
and conducted additional simulations. Since these 2 models may not always
agree on the optimal treatment for each individual, we took the
average of the treatment assignment probabilities from the NN and GLM methods. As
we did for the 9 ML methods and the ensemble approach, we
evaluated its performance through the overall response rate in simulated trials, the
percentage of patients receiving their individually optimal treatment, and the
average individual loss for all trial participants. The results are provided in
Supplementary Figure S5. Although this combination performs slightly better than
either NN or GLM alone, it is still substantially worse than the
ensemble of 9 ML algorithms, especially when the treatment main effect is
high.

While we only considered settings with 2 treatment options in this work, the ML-based RAR
design can extend to multiple targeted treatments. Given treatments, the treatment allocating probability of patient
is shown as , where denotes the response probability of treatment for patient predicted by the ML algorithm. For example, NN can
naturally adapt to a multiclass classification problem by replacing the binary
cross-entropy loss with a categorical cross-entropy loss.

Although our work can effectively improve treatment outcomes in the clinical
trial, there are a few limitations that we would like to point out as directions for
further research. First, equal weight was given to each of the 9 ML algorithms in
the ensemble method. However, different ML methods are likely to differ in
prediction accuracy across scenarios. Incorporating such information by
assigning different weights to the ML algorithms in the ensemble method could
potentially lead to better adaptation to the data and may provide more precise
treatment suggestions for personalized medicine. Second, although our method has
been extensively evaluated using simulated and real data, we did not consider the
setting with high-dimensional data, for example, the data from omics experiments.
With the development of modern sequencing technology, more clinical trials seek to
include such information in clinical decision making and trial design. With
high-dimensional data, additional challenges arise, such as the need for appropriate feature
selection steps. Moreover, our current model did not consider the situation
in which complex interactions between treatment and individualized biomarkers exist in
the dataset. When this problem is of interest, we might resort to other models that
are specifically designed to address the heterogeneous treatment effect caused by
these interactions, such as the honest causal forest model.
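
To make the weighting idea raised in the limitations more concrete, the following Python sketch shows one way model votes could be combined with unequal weights and mapped to an assignment probability, together with a possible multi-arm allocation rule. It is illustrative only and is not the authors' implementation. The function names are hypothetical. The linear mapping from vote share to probability, the symmetric lower bound of 0.15, and the proportional multi-arm rule are assumptions; only the 0.85 ceiling and the equal-weight default come from the text.

```python
from typing import List, Optional, Sequence


def weighted_vote_share(votes: Sequence[int],
                        weights: Optional[Sequence[float]] = None) -> float:
    """Weighted fraction of models that favor treatment A for one patient.

    votes[j] is 1 if model j predicts A is better than B, else 0.
    weights=None gives every model the same weight, as in the
    equal-weight ensemble described in the text.
    """
    if weights is None:
        weights = [1.0] * len(votes)
    if len(votes) != len(weights):
        raise ValueError("votes and weights must have the same length")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * v for w, v in zip(weights, votes)) / total


def assignment_prob(share: float, cap: float = 0.85) -> float:
    """Probability of assigning treatment A given the vote share.

    ASSUMPTION: a linear mapping clipped to [1 - cap, cap]. The text states
    only the 0.85 ceiling; its vote-count thresholds are not reproduced here.
    """
    return min(cap, max(1.0 - cap, share))


def multiarm_allocation(pred: Sequence[float]) -> List[float]:
    """Allocation probabilities over K arms for one patient.

    ASSUMPTION: probabilities proportional to the predicted response
    probability of each arm; falls back to equal allocation if all are zero.
    """
    if len(pred) == 0:
        raise ValueError("pred must contain at least one arm")
    total = float(sum(pred))
    if total <= 0:
        return [1.0 / len(pred)] * len(pred)
    return [p / total for p in pred]


if __name__ == "__main__":
    # Six of nine equally weighted models favor treatment A.
    votes = [1, 1, 1, 1, 1, 1, 0, 0, 0]
    print(assignment_prob(weighted_vote_share(votes)))
    # Three hypothetical arms with predicted response probabilities.
    print(multiarm_allocation([0.6, 0.3, 0.3]))
```

In such a scheme the weights could, for example, be derived from each model's out-of-sample accuracy on previously accrued patient groups, although how best to choose them remains an open question.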
CONCLUSION
ML methods have successfully demonstrated their superior prediction performance in
many applications, but have not been applied to conduct RAR in clinical trials. In
this study, we developed novel methods for RAR designs by incorporating ML
algorithms to predict treatment response and assign treatments accordingly. We
showed that the ML-based RAR designs perform better than the
traditional ER design, and that the ensemble approach improves on
the ER design to the greatest extent. As the ML field matures and
packages become widely available across programming environments, our method is easy to
implement in current clinical trial systems.
FUNDING
The research of XH was partially supported by the US National Institutes of Health
grants U54CA096300, U01CA253911, and 5P50CA100632, and the Dr. Mien-Chie Hung and
Mrs. Kinglan Hung Endowed Professorship.
AUTHOR CONTRIBUTIONS
XH, ZL, and YW conceived the concept of the study and designed the method. YW
implemented the method, performed the experiments, and drafted the initial
manuscript. BC interpreted the real-world data for the work. All authors edited and
approved the final manuscript.
SUPPLEMENTARY MATERIAL
Supplementary material is
available at JAMIA Open online.
CONFLICT OF INTEREST STATEMENT
None declared.
DATA AVAILABILITY
Simulation data can be reproduced with the R script deposited in the
online Dryad repository.
The AML dataset used for real-world illustration can be downloaded from https://bioinformatics.mdanderson.org/public-datasets/supplements/
under “RPPA Data in AML”.