Corrado Lanera, Paola Berchialla, Giulia Lorenzoni, Aslihan Şentürk Acar, Valentina Chiminazzo, Danila Azzolina, Dario Gregori, Ileana Baldi.
Abstract
A critical early step in a clinical trial is defining the study sample that appropriately represents the target population from which the sample will be drawn. Envisaging a "run-in" process in study design may accomplish this task; however, the traditional run-in requires additional patients, increasing times, and costs. The possible use of the available a-priori data could skip the run-in period. In this regard, ML (machine learning) techniques, which have recently shown considerable promising usage in clinical research, can be used to construct individual predictions of therapy response probability conditional on patient characteristics. An ensemble model of ML techniques was trained and validated on twin randomized clinical trials to mimic a run-in process within this framework. An ensemble ML model composed of 26 algorithms was trained on the twin clinical trials. SuperLearner (SL) performance for the Verum (Treatment) arm is above 70% sensitivity. The Positive Predictive Value (PPP) achieves a value of 80%. Results show good performance in the direction of being useful in the simulation of the run-in period; the trials conducted in similar settings can train an optimal patient selection algorithm minimizing the run-in time and costs of conduction.Entities:
Mesh:
Year: 2022 PMID: 36128052 PMCID: PMC9482682 DOI: 10.1155/2022/4306413
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.809
Baseline characteristics, stratified by trial (A or B) and treatment arm (Placebo or Verum). Continuous variables are expressed as first, second (median), and third quartiles (I/II/III); categorical variables as percentages with absolute counts.
| Variables | Valid cases | Trial A, Placebo | Trial A, Verum | Trial B, Placebo | Trial B, Verum |
|---|---|---|---|---|---|
| Age (years) | 253 | 59/63/69 | 57/63/66 | 60/65/69 | 59/65/71 |
| Body mass index | 257 | 26/27/27 | 24/26/27 | 26/28/30 | 25/28/30 |
| Gender: male | 257 | 30%(16) | 20%(13) | 23%(16) | 21%(14) |
| Height (cm) | 257 | 160/165/173 | 163/166/170 | 154/160/165 | 154/160/166 |
| Weight (kg) | 257 | 66/71/79 | 65/70/75 | 64/69/77 | 63/70/77 |
| Therapy responder | 257 | 50%(27) | 62%(41) | 49%(34) | 55%(37) |
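The quartile summaries in the table (e.g. 59/63/69 for age) are the first, second, and third quartiles of each variable within a stratum. A minimal sketch of how such a summary can be computed, using made-up ages rather than the trial data:

```python
from statistics import quantiles

# Hypothetical ages for one stratum (e.g. Trial A, Placebo);
# these values are illustrative only, not the trial data.
ages = [55, 57, 59, 61, 63, 63, 65, 67, 69, 71]

# statistics.quantiles with n=4 returns the three cut points:
# first, second (median), and third quartiles.
q1, q2, q3 = quantiles(ages, n=4, method="inclusive")

print(f"{q1}/{q2}/{q3}")  # table-style I/II/III summary
```

The `method="inclusive"` option interpolates between observed values (like a spreadsheet QUARTILE function); the table's exact quartile convention is not stated in the record, so this choice is an assumption.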
Base learners used for each trained SL; the risk (average MSE over the cross-validation procedure) and the coefficient (weight of the base learner in the convex combination forming the SL) are reported. Weights equal to zero are omitted. Each entry identifies the algorithm composing the SL ("average" denotes the SL average ensemble prediction algorithm) and the screening (feature selection) algorithm. For example, "SL, Mars Algorithm, RF screened features" identifies the risk associated with the Mars algorithm within the SL ensemble, with an RF-based feature selection procedure.
| SL trained on study A – Placebo | Risk | Coefficient |
|---|---|---|
| SL, Mars Algorithm, all features | 0.177 | 0.213 |
| SL, Mars Algorithm, RF screened features | 0.161 | 0.257 |
| SL, average, all features | 0.139 | 0.311 |
| SL, Rpart, RF screened features | 0.150 | 0.219 |
| SL trained on study A – Verum | Risk | Coefficient |
|---|---|---|
| SL, average, all features | 0.121 | 0.539 |
| SL, Polymars, RF screened features | 0.131 | 0.410 |
| SL, RF, RF screened features | 0.132 | 0.051 |
| SL trained on study B – Placebo | Risk | Coefficient |
|---|---|---|
| SL, Mars Algorithm, all features | 0.099 | 0.170 |
| SL, Glmnet Algorithm, all features | 0.082 | 0.119 |
| SL, Glmnet Algorithm, RF screened features | 0.075 | 0.298 |
| SL, average, all features | 0.127 | 0.015 |
| SL, RF, RF screened features | 0.076 | 0.398 |
| SL trained on study B – Verum | Risk | Coefficient |
|---|---|---|
| SL, Rpart, all features | 0.126 | 0.124 |
| SL, average, all features | 0.127 | 0.523 |
| SL, Polymars, RF screened features | 0.191 | 0.141 |
| SL, RF, all features | 0.126 | 0.213 |
Abbreviations: SL = SuperLearner; RF = Random Forest; Glmnet = Lasso and Elastic-Net Regularized Generalized Linear Models; Mars = Multivariate Adaptive Regression Splines; Polymars = Polychotomous classification based on Multivariate Adaptive Regression Splines; Rpart = Recursive Partitioning Trees.
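The coefficients above are the weights of a convex combination: the SL prediction is a weighted average of the base-learner predictions, with non-negative weights summing to one. A minimal sketch using the study A – Placebo coefficients from the table; the per-learner probabilities are hypothetical, not trial data:

```python
# Reported SL coefficients for study A - Placebo (from the table above).
weights = {
    "Mars, all features": 0.213,
    "Mars, RF screened": 0.257,
    "average, all features": 0.311,
    "Rpart, RF screened": 0.219,
}

# Hypothetical base-learner probabilities of therapy response for one patient.
preds = {
    "Mars, all features": 0.62,
    "Mars, RF screened": 0.58,
    "average, all features": 0.55,
    "Rpart, RF screened": 0.70,
}

# Convex combination: weights are non-negative and sum to one.
assert abs(sum(weights.values()) - 1.0) < 1e-9

sl_pred = sum(weights[k] * preds[k] for k in weights)
print(round(sl_pred, 3))  # the SL ensemble probability for this patient
```

In the SuperLearner framework these weights are chosen to minimize the cross-validated risk (here, MSE) of the combined prediction, which is why each base learner's risk is reported alongside its coefficient.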
Predictive performance statistics. The notation "X to Y" (where X is one trial and Y is the other) indicates the performance of an algorithm trained on study X and tested on study Y (restricted to the indicated arm).
| | Sens | Spec | Acc | PPV | NPV | AUC |
|---|---|---|---|---|---|---|
| A to B Placebo | 0.611 | 0.529 | 0.571 | 0.579 | 0.563 | 0.658 |
| B to A Placebo | 0.370 | 0.778 | 0.574 | 0.625 | 0.553 | 0.630 |
| A to B Verum | 0.700 | 0.541 | 0.612 | 0.553 | 0.690 | 0.693 |
| B to A Verum | 0.760 | 0.634 | 0.682 | 0.559 | 0.813 | 0.763 |
Sens = sensitivity; Spec = specificity; Acc = accuracy; PPV = Positive Predictive Value; NPV = Negative Predictive Value; AUC = Area Under the ROC Curve.
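All of these statistics except AUC derive directly from the test-set confusion matrix. A minimal sketch with illustrative counts (not the trial data):

```python
# Hypothetical confusion matrix for a classifier trained on one trial and
# tested on the matching arm of the other; counts are illustrative only.
tp, fn, fp, tn = 19, 6, 17, 20

sens = tp / (tp + fn)                    # responders correctly flagged
spec = tn / (tn + fp)                    # non-responders correctly flagged
acc = (tp + tn) / (tp + fn + fp + tn)    # overall agreement
ppv = tp / (tp + fp)                     # precision of a positive call
npv = tn / (tn + fn)                     # precision of a negative call

print(sens, spec, acc, ppv, npv)
```

AUC is not computable from a single confusion matrix: it summarizes sensitivity/specificity trade-offs across all classification thresholds applied to the SL's predicted probabilities, which is what the ROC curves in Figure 1 display.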
Figure 1. ROC curves for the SL performance. ap = SL trained on trial A Placebo, tested on trial B Placebo; av = SL trained on trial A Verum, tested on trial B Verum; bp = SL trained on trial B Placebo, tested on trial A Placebo; bv = SL trained on trial B Verum, tested on trial A Verum.