Roni Shouval, Myriam Labopin, Ron Unger, Sebastian Giebel, Fabio Ciceri, Christoph Schmid, Jordi Esteve, Frederic Baron, Norbert Claude Gorin, Bipin Savani, Avichai Shimoni, Mohamad Mohty, Arnon Nagler.
Abstract
Models for predicting mortality related to allogeneic hematopoietic stem cell transplantation (HSCT) only partially account for transplant risk. Improving predictive accuracy requires an understanding of the factors that limit prediction, such as the statistical methodology used, the number and quality of features collected, or simply the population size. Using an in-silico approach (i.e., iterative computerized simulations) based on machine learning (ML) algorithms, we set out to analyze these factors. A cohort of 25,923 adult acute leukemia patients from the European Society for Blood and Marrow Transplantation (EBMT) registry was analyzed. The predictive objective was non-relapse mortality (NRM) 100 days following HSCT. Thousands of prediction models were developed under varying conditions: increasing sample size, specific subpopulations, and an increasing number of variables, which were selected and ranked by separate feature selection algorithms. Depending on the algorithm, predictive performance plateaued at a population size of 6,611-8,814 patients, reaching a maximal area under the receiver operating characteristic curve (AUC) of 0.67. AUCs of models developed on specific subpopulations ranged from 0.59 for patients in second complete remission to 0.67 for patients receiving reduced intensity conditioning. Only 3-5 variables were necessary to achieve near-maximal AUCs. The top 3 ranking variables, shared by all algorithms, were disease stage, donor type, and conditioning regimen. Our findings empirically demonstrate that, with regard to NRM prediction, few variables "carry the weight" and that traditional HSCT data has been "worn out". "Breaking through" the predictive boundaries will likely require additional types of inputs.
Year: 2016 PMID: 26942424 PMCID: PMC4778768 DOI: 10.1371/journal.pone.0150637
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Patient Characteristics.
Interquartile range (IQR), Body mass index (BMI), Recipient (R), Donor (D), Cytomegalovirus (CMV), Acute lymphoblastic leukemia (ALL), Total body irradiation (TBI), Graft versus host disease (GVHD), Antithymocyte globulin (ATG), Peripheral blood (PB), Bone marrow (BM)
| Characteristic | Value | N | Missing, n |
|---|---|---|---|
| Median year (IQR) | 2009 (2007–2011) | 25923 | 0 |
| Median recipient age (IQR) | 45 (33–56) | 25923 | 0 |
| Median BMI (IQR) | 24 (22–27) | 9350 | 16573 |
| Median days between diagnosis and HSCT (IQR) | 191 (138–363) | 25914 | 9 |
| Median donor's age (IQR) | 38.8 (29–48) | 10027 | 15896 |
| Recipient gender | | 25872 | 51 |
| Male | 14228 (55.0%) | | |
| Female | 11644 (45.0%) | | |
| Recipient CMV serostatus | | 22855 | 3068 |
| - | 7788 (34.1%) | | |
| + | 15067 (65.9%) | | |
| Karnofsky at transplant | | 24369 | 1554 |
| ≥80 | 22966 (94.2%) | | |
| <80 | 1403 (5.8%) | | |
| Comorbidity score merged | | 2469 | 23454 |
| 0 | 403 (16.3%) | | |
| 1 | 747 (30.3%) | | |
| 2 | 457 (18.5%) | | |
| 3 | 419 (17.0%) | | |
| ≥4 | 443 (17.9%) | | |
| Diagnosis | | 25923 | 0 |
| AML | 18610 (71.8%) | | |
| ALL | 7313 (28.2%) | | |
| Cytogenetics risk | | 13430 | 12493 |
| Standard | 10080 (75.1%) | | |
| Poor | 3350 (24.9%) | | |
| Disease stage | | 25923 | 0 |
| CR1 | 16201 (62.5%) | | |
| CR2 | 4909 (18.9%) | | |
| Advanced | 4813 (18.6%) | | |
| Previous autograft | | 25923 | 0 |
| - | 25235 (97.3%) | | |
| + | 688 (2.7%) | | |
| Donor gender | | 25357 | 566 |
| Male | 15712 (62.0%) | | |
| Female | 9645 (38.0%) | | |
| Donor CMV serostatus | | 22726 | 3197 |
| - | 10927 (48.1%) | | |
| + | 11799 (51.9%) | | |
| D-R sex combination | | 25318 | 605 |
| Male D to male R | 9153 (36.2%) | | |
| Female D to female R | 4863 (19.2%) | | |
| Male D to female R | 6528 (25.8%) | | |
| Female D to male R | 4774 (18.9%) | | |
| D-R CMV serostatus combination | | 22395 | 3528 |
| D-CMV–/R-CMV– | 5572 (24.9%) | | |
| D-CMV+/R-CMV– or D-CMV–/R-CMV+ | 8917 (39.8%) | | |
| D-CMV+/R-CMV+ | 7906 (35.3%) | | |
| Donor type | | 25923 | 0 |
| HLA matched unrelated donor | 13585 (52.4%) | | |
| HLA identical sibling | 12338 (47.6%) | | |
| HLA match degree | | 9090 | 16833 |
| 10/10 | 6519 (71.7%) | | |
| 9/10 | 2068 (22.8%) | | |
| <9/10 | 503 (5.5%) | | |
| Source of stem cells | | 25923 | 0 |
| BM | 4109 (15.9%) | | |
| PB or BM+PB | 21814 (84.1%) | | |
| Conditioning | | 25420 | 503 |
| MAC | 16836 (66.2%) | | |
| RIC | 8584 (33.8%) | | |
| TBI | | 25742 | 181 |
| No | 15042 (58.4%) | | |
| Yes | 10700 (41.6%) | | |
| GVHD prevention | | 23228 | 2695 |
| Ex-vivo T cell depletion | 800 (3.4%) | | |
| In-vivo T cell depletion | 9825 (42.3%) | | |
| No T cell depletion | 12603 (54.3%) | | |
| Relapse at day 100 | | 25923 | 0 |
| - | 23384 (90.2%) | | |
| + | 2539 (9.8%) | | |
| Non relapse related mortality at day 100 | | 25923 | 0 |
| - | 23536 (90.8%) | | |
| + | 2387 (9.2%) | | |
| Overall mortality at day 100 | | 25923 | 0 |
| - | 22643 (87.3%) | | |
| + | 3280 (12.7%) | | |
Fig 1. In-silico predictive modeling: experimental design.
The original dataset was randomly split into an optimization dataset and an experimental dataset. The former was used for tuning the machine learning algorithms and for feature selection. A. Several experiments were run on the experimental dataset, testing the effects of population size, specific subpopulations, and the number of variables included on predictive performance. B. A detailed explanation of the increasing population size experiment displayed in panel A. Patients were randomly sampled from the experimental dataset, creating samples of expanding size, which were then introduced to six machine learning algorithms. For each sample, a prediction model for day 100 NRM was developed, and performance was measured through the area under the receiver operating characteristic curve (AUC). Models were trained and tested with 10-fold cross-validation. The sampling process was repeated 5 times. C. To estimate variable importance (ranked variables experiment in panel A) and the number of variables necessary for optimal prediction of day 100 NRM, we ran a feature selection algorithm on the optimization set. Variables were ranked according to their predictive contribution to each algorithm. The next step involved serial introduction of the variables, in order of importance, to six machine learning algorithms, which were applied to the experimental dataset. In each iteration, a prediction model for day 100 NRM was trained and tested with 10-fold cross-validation. For instance, in the first iteration the top-ranking variable was introduced, in the second the top 2 variables, and so on until all 23 variables were used. Performance was estimated according to the AUC. Machine learning (ML), Algorithm (Alg).
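The increasing population size experiment in panel B can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the synthetic patients (three binary risk factors with invented effect sizes), the single Bernoulli naive Bayes learner standing in for the six algorithms, the sample sizes, and the choice to pool out-of-fold scores into one AUC per sample. It is not the authors' pipeline and does not use registry data.

```python
import math
import random
from statistics import mean

def auc(labels, scores):
    """Rank-based AUC; tied scores count as half a concordant pair."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    wins, neg_below, i = 0.0, 0, 0
    while i < len(pairs):
        j, pos_tied, neg_tied = i, 0, 0
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            pos_tied += pairs[j][1]
            neg_tied += 1 - pairs[j][1]
            j += 1
        wins += pos_tied * (neg_below + 0.5 * neg_tied)
        neg_below += neg_tied
        i = j
    return wins / (n_pos * n_neg)

def make_patient(rng):
    """SYNTHETIC record: three binary risk factors and a binary outcome."""
    x = [rng.randint(0, 1) for _ in range(3)]
    risk = 0.05 + 0.10 * x[0] + 0.08 * x[1] + 0.05 * x[2]  # invented effects
    return x, 1 if rng.random() < risk else 0

def fit_naive_bayes(rows):
    """Bernoulli naive Bayes with Laplace smoothing; returns a log-odds scorer."""
    counts = {0: 0, 1: 0}
    ones = {0: [0, 0, 0], 1: [0, 0, 0]}
    for x, y in rows:
        counts[y] += 1
        for j, v in enumerate(x):
            ones[y][j] += v

    def score(x):
        log_odds = math.log((counts[1] + 1) / (counts[0] + 1))
        for j, v in enumerate(x):
            p1 = (ones[1][j] + 1) / (counts[1] + 2)
            p0 = (ones[0][j] + 1) / (counts[0] + 2)
            log_odds += math.log(p1 / p0) if v else math.log((1 - p1) / (1 - p0))
        return log_odds

    return score

def cross_validated_auc(rows, k=10):
    """k-fold cross-validation; out-of-fold scores are pooled into one AUC."""
    labels, scores = [], []
    for fold in range(k):
        test = rows[fold::k]
        train = [r for i, r in enumerate(rows) if i % k != fold]
        score = fit_naive_bayes(train)
        labels += [y for _, y in test]
        scores += [score(x) for x, _ in test]
    return auc(labels, scores)

def learning_curve(pool, sizes, repeats=5, seed=0):
    """Mean cross-validated AUC per sample size, over repeated random samples."""
    rng = random.Random(seed)
    return {
        n: mean(cross_validated_auc(rng.sample(pool, n)) for _ in range(repeats))
        for n in sizes
    }

data_rng = random.Random(42)
pool = [make_patient(data_rng) for _ in range(5000)]
curve = learning_curve(pool, sizes=[250, 500, 1000, 2000, 4000])
for n, value in curve.items():
    print(f"n={n:5d}  mean AUC={value:.3f}")
```

Plotting `curve` against sample size gives a toy counterpart of the performance-versus-population-size curves; repeating the loop with other learners would mimic the per-algorithm comparison.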
Fig 2. Predictive performance of day 100 NRM prediction models with increasing sample size.
A gradually increasing sample from the experimental dataset was introduced to 6 machine learning algorithms. Prediction models were developed for each incremental step, and their discriminative performance is plotted on the Y axis. Alternating decision tree (ADT), Logistic regression (LR), Multilayer perceptron (MLP), Naïve Bayes (NB), Random forest (RF).
Predictive performance of day 100 NRM prediction models on varying subpopulations.
| Subpopulation | Sample size | AdaBoost AUC | AdaBoost STDV | ADT AUC | ADT STDV | LR AUC | LR STDV | MLP AUC | MLP STDV | NB AUC | NB STDV | RF AUC | RF STDV | Average AUC | Average STDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Full dataset | 22035 | 0.67 | 0.02 | 0.66 | 0.02 | 0.67 | 0.02 | 0.63 | 0.01 | 0.65 | 0.02 | 0.66 | 0.02 | 0.66 | 0.01 |
| Age<45 | 10820 | 0.66 | 0.03 | 0.65 | 0.03 | 0.66 | 0.03 | 0.64 | 0.02 | 0.65 | 0.03 | 0.66 | 0.03 | 0.65 | 0.01 |
| Age> = 45 | 11215 | 0.66 | 0.03 | 0.65 | 0.03 | 0.66 | 0.03 | 0.64 | 0.02 | 0.65 | 0.03 | 0.65 | 0.03 | 0.65 | 0.01 |
| ALL | 6214 | 0.65 | 0.04 | 0.64 | 0.04 | 0.66 | 0.04 | 0.64 | 0.02 | 0.64 | 0.03 | 0.65 | 0.03 | 0.65 | 0.01 |
| AML | 15821 | 0.67 | 0.02 | 0.66 | 0.03 | 0.67 | 0.02 | 0.65 | 0.02 | 0.65 | 0.02 | 0.66 | 0.02 | 0.66 | 0.01 |
| CR1 | 13787 | 0.63 | 0.03 | 0.61 | 0.03 | 0.64 | 0.03 | 0.63 | 0.02 | 0.61 | 0.03 | 0.62 | 0.03 | 0.62 | 0.01 |
| CR2 | 4165 | 0.58 | 0.05 | 0.55 | 0.04 | 0.59 | 0.05 | 0.59 | 0.03 | 0.58 | 0.05 | 0.58 | 0.05 | 0.58 | 0.01 |
| Advanced | 4083 | 0.62 | 0.04 | 0.61 | 0.04 | 0.61 | 0.04 | 0.59 | 0.03 | 0.60 | 0.03 | 0.61 | 0.04 | 0.61 | 0.01 |
| MAC | 14754 | 0.66 | 0.02 | 0.65 | 0.02 | 0.66 | 0.02 | 0.63 | 0.02 | 0.65 | 0.02 | 0.66 | 0.02 | 0.65 | 0.01 |
| RIC | 7703 | 0.67 | 0.03 | 0.66 | 0.03 | 0.67 | 0.03 | 0.66 | 0.02 | 0.65 | 0.03 | 0.66 | 0.03 | 0.66 | 0.01 |
| MRD | 10458 | 0.65 | 0.03 | 0.64 | 0.03 | 0.66 | 0.03 | 0.65 | 0.03 | 0.65 | 0.03 | 0.65 | 0.03 | 0.65 | 0.01 |
| MUD | 11577 | 0.64 | 0.02 | 0.63 | 0.03 | 0.64 | 0.03 | 0.62 | 0.02 | 0.62 | 0.03 | 0.63 | 0.02 | 0.63 | 0.01 |
* p-value <0.05 (t-test). The performance of each model was compared with the performance of the model developed on the full experimental dataset with the designated algorithm. Standard deviation (STDV), Alternating decision tree (ADT), Logistic regression (LR), Multilayer perceptron (MLP), Naïve Bayes (NB), Random forest (RF), HLA matched related donor (MRD), HLA matched unrelated donor (MUD).
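As a rough illustration of the comparison described in the footnote, the sketch below computes a t statistic from the summary values in the table. The footnote does not say which t-test variant was used or how many AUC observations entered each mean, so two assumptions are made here: Welch's unequal-variance form is used, and each mean is treated as coming from 10 observations (one per cross-validation fold). The function name `welch_t` is hypothetical.

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    computed from summary statistics (mean, standard deviation, count)."""
    var_a = sd_a ** 2 / n_a  # squared standard error of group a
    var_b = sd_b ** 2 / n_b  # squared standard error of group b
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t, df

# AdaBoost AUCs from the table: full dataset 0.67 (STDV 0.02), CR2 0.58 (STDV 0.05).
# ASSUMPTION: n = 10 observations per group; the source does not state n.
t_stat, dof = welch_t(0.67, 0.02, 10, 0.58, 0.05, 10)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

Turning the statistic into a p-value requires the t distribution with `dof` degrees of freedom, which is not in the Python standard library; a statistics package or a printed t table would be needed for that step.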
Fig 3. Mean variable ranking of day 100 NRM prediction models.
Variable importance was extracted using a feature selection algorithm for 6 machine learning prediction models of day 100 NRM. The circle marks the mean ranking of each variable and the bars describe 2 standard deviations. Disease stage (Ds_Stage); Time from diagnosis to transplant (Time_dx); Diagnosis (Dx); Body mass index (BMI); Donor (D); Recipient (R); Previous autograft (2nd_tx); # of HLA mismatches (#HLA_miss); Graft versus host disease prophylaxis (GVHD_prox); Total body irradiation (TBI).
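The mean ranking and the 2 standard deviation bars described in this caption can be sketched as follows. The per-algorithm orderings are invented for illustration and are not the study's rankings; only the variable abbreviations come from the caption. Using the sample standard deviation is also an assumption, since the caption does not say which estimator was plotted.

```python
from statistics import mean, stdev

# HYPOTHETICAL rankings (most important first) for three of the algorithms.
rankings = {
    "LR": ["Ds_Stage", "TBI", "Time_dx", "BMI"],
    "NB": ["Ds_Stage", "Time_dx", "TBI", "BMI"],
    "RF": ["TBI", "Ds_Stage", "Time_dx", "BMI"],
}

def rank_summary(rankings):
    """Map each variable to (mean rank, 2 x sample SD of its rank),
    ordered from best (lowest) to worst mean rank."""
    variables = next(iter(rankings.values()))
    summary = {}
    for var in variables:
        # Rank 1 is the most important variable for a given algorithm.
        ranks = [order.index(var) + 1 for order in rankings.values()]
        summary[var] = (mean(ranks), 2 * stdev(ranks))
    return dict(sorted(summary.items(), key=lambda item: item[1][0]))

summary = rank_summary(rankings)
for var, (avg, bar) in summary.items():
    print(f"{var:10s} mean rank = {avg:.2f}, bar = +/- {bar:.2f}")
```

A variable ranked identically by every algorithm gets a zero-length bar, while disagreement between algorithms widens it.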
Fig 4. Predictive performance of day 100 NRM prediction models on a cumulative ranked variable list.
Alternating decision tree (ADT); Logistic regression (LR); Multilayer perceptron (MLP); Naïve Bayes (NB); Random forest (RF).