Haneen Banjar (1, 2), Damith Ranasinghe (1, 3), Fred Brown (1), David Adelson (4), Trent Kroger (1), Tamara Leclercq (5, 6), Deborah White (5, 6, 7, 8, 9), Timothy Hughes (5, 6, 8, 9, 10), Naeem Chaudhri (11).
Abstract
BACKGROUND: Treatment of patients with chronic myeloid leukaemia (CML) has become increasingly difficult in recent years because of the variety of available treatment options and the challenge of deciding on the most appropriate treatment strategy for an individual patient. To support this decision, disease assessment for an individual patient should include the molecular response to initial treatment. Patients predicted not to achieve major molecular response (MMR) at 24 months on frontline imatinib may be better treated with alternative frontline therapies, such as nilotinib or dasatinib. The aims of this study were to (i) understand the clinical prediction 'rules' for predicting MMR at 24 months in CML patients treated with imatinib, using clinical, molecular, and cell count observations (predictive factors collected at diagnosis and categorised based on available knowledge), and (ii) develop a predictive model for CML treatment management. The predictive model was developed by framing the problem as a machine learning task, using CML patients receiving imatinib therapy in the TIDEL II clinical trial, divided into experimentally identified groups that did and did not achieve MMR. The recommended model was validated externally using an independent data set from King Faisal Specialist Hospital and Research Centre, Saudi Arabia.
Year: 2017; PMID: 28045960; PMCID: PMC5207707; DOI: 10.1371/journal.pone.0168947
Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.240
Previous methods: current predictive assays and scoring systems, the factors they include, the methods used, the prediction target, and the reported data and results.
| Study | Factors | Method | Prediction target | Data and results |
|---|---|---|---|---|
| White | OA (ng/200,000 cells) | Kaplan-Meier analysis | MMR by 60 months on IM | TIDEL I clinical trial (n = 56); high OA: 89%, low OA: 55% |
| White | IC50IM (μM) | Kaplan-Meier analysis | MMR by 12 months on IM | TIDEL I clinical trial (n = 116); low IC50IM: 65%, high IC50IM: 39% |
| Sokal score, Sokal | Age, spleen size (cm), blasts (%), and platelets (×10⁹/L) | Multivariate analysis of survival | Risk groups under chemotherapy | Six European and American sources (n = 813); low 39%, intermediate 38%, high 23% |
| Hasford score, Hasford | Age, spleen size (cm), blasts (%), eosinophils (%), basophils (%), and platelets (×10⁹/L) | Multivariate analysis of survival | Risk groups under interferon alpha alone | 14 studies (n = 981); low 40.6%, intermediate 44.7%, high 14.6% |
| EUTOS score, Hasford | Basophils (%) and spleen size (cm) | Multivariate analysis of response | CCgR at 18 months on IM | Five national study groups (n = 2060); low 79%, high 21% |
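The three scoring systems in the table each combine the listed factors into a single number. A minimal sketch of that arithmetic follows, using the formulas as they are commonly cited for Sokal (1984), Hasford/Euro (1998) and EUTOS (2011). It is an illustrative reimplementation, not code from this study, and the unit conventions in the docstrings (spleen in cm below the costal margin; blasts, eosinophils and basophils in % of peripheral blood; platelets in ×10⁹/L) are assumptions to check against the original publications.

```python
import math


def sokal_score(age, spleen_cm, platelets, blasts_pct):
    """Sokal relative risk: exponential of a weighted sum of centred factors.

    Assumed units: age in years, spleen in cm, platelets in 10^9/L, blasts in %.
    """
    return math.exp(
        0.0116 * (age - 43.4)
        + 0.0345 * (spleen_cm - 7.51)
        + 0.188 * ((platelets / 700.0) ** 2 - 0.563)
        + 0.0887 * (blasts_pct - 2.10)
    )


def hasford_score(age, spleen_cm, platelets, blasts_pct, eosinophils_pct, basophils_pct):
    """Hasford (Euro) score; age, basophils and platelets enter as 0/1 indicators."""
    total = (
        0.6666 * (1 if age >= 50 else 0)
        + 0.0420 * spleen_cm
        + 0.0584 * blasts_pct
        + 0.0413 * eosinophils_pct
        + 0.2039 * (1 if basophils_pct >= 3 else 0)
        + 1.0956 * (1 if platelets >= 1500 else 0)
    )
    return total * 1000


def eutos_score(spleen_cm, basophils_pct):
    """EUTOS score: 7 x basophils (%) + 4 x spleen size (cm)."""
    return 7 * basophils_pct + 4 * spleen_cm
```

For example, a patient with 5% basophils and a 10 cm spleen has an EUTOS score of 7 × 5 + 4 × 10 = 75, below the high-risk cut-off of 87 used in the category table.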
Predictive factor descriptions, factor types, and median (range) values.
| Factor | Description | Type | Median (range) |
|---|---|---|---|
| Age (years) | Clinical factor recorded at the time of diagnosis | Continuous | 49 (17–81) |
| Gender | Clinical factor recorded at the time of diagnosis | Categorical | |
| Spleen size (cm) | Clinical factor measured by observation at diagnosis | Continuous | 3.8 (0–30) |
| BCR-ABL1 transcript type | Genetic factor identified by quantitative PCR analysis to | Categorical | |
| OA (ng/200,000 cells) | OCT-1 protein activity, a measure of protein function, measured by uptake in the presence and absence of a specific OCT-1 inhibitor in mRNA. | Continuous | 4.7 (0–16.32) |
| IC50IM (μM) | Biological factor measured as the concentration of IM producing a 50% decrease in the level of p-Crkl. | Continuous | 1 (0.2–4.5) |
| BCR-ABL1 transcript level pre-therapy | Real-time quantitative polymerase chain reaction (RQ-PCR) can measure the level of | Continuous | 107.14 (1.96–969) |
| ANC | Biological factor (neutrophils, granulocytes) that can be measured from peripheral blood | Continuous | 32.59 (0.5–219.2) |
| | Biological factor in white blood cells that can be measured from peripheral blood | Continuous | 1.7 (0–13.02) |
| | Biological factor in white blood cells that can be measured from peripheral blood | Continuous | 3.37 (0–13.9) |
| | Biological factor in white blood cells that can be measured from peripheral blood | Continuous | 2.16 (0–38.89) |
| | Biological factor in white blood cells that can be measured from peripheral blood | Continuous | 1.11 (0–17.81) |
| | Biological factor and important cells in CML that can be measured from peripheral blood | Continuous | 49.6 (1.1–353.50) |
| | Biological factor that can be measured from peripheral blood | Continuous | 1.27 (0–13.9) |
| | Biological factor that can be measured from peripheral blood | Continuous | 485 (91–1219) |
| Sokal score | Risk score developed in 1984 | Continuous | 1 (0.45–8.08) |
| Hasford score | Risk score developed in 1998 | Continuous | 801 (0–2137.6) |
| EUTOS score | Risk score developed in 2011 | Continuous | 46.94 (0–228.5) |
Fig 1. The schema for the CML predictive model: building, evaluation, and final model selection.
To build the predictive model, we studied data from a clinical trial, preparing them for analysis by imputing missing values and recoding each predictive factor into subcategories defined by comprehensive standard boundaries based on domain knowledge. For evaluation and final model selection, a nested design was used to split the dataset into training, validation, and test sets: the model was trained on the training set, features were selected on the validation set, and performance was evaluated on the test set. The final models were then compared with previous methods.
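The nested design can be pictured as two loops over index splits: an outer loop that holds out each test fold once, and an inner loop that divides the remaining patients into a training part and a validation part (the part on which features are selected). A minimal sketch using only the standard library follows; the helper names, the fold counts `outer_k` and `inner_k`, and the shuffling seed are hypothetical illustration choices, since the exact inner configuration is not stated here.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) once and deal the indices into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nested_splits(n, outer_k=10, inner_k=5, seed=0):
    """Yield (train, validation, test) index lists for a nested design.

    Each outer fold is the test set exactly once; the remaining samples are
    re-split into inner folds, each used once as the validation set while
    the other inner folds form the training set.
    """
    outer = k_fold_indices(n, outer_k, seed)
    for i, test in enumerate(outer):
        rest = [j for f, fold in enumerate(outer) if f != i for j in fold]
        inner = k_fold_indices(len(rest), inner_k, seed + i + 1)
        for v, val_pos in enumerate(inner):
            val = [rest[p] for p in val_pos]
            train = [rest[p] for q, fold in enumerate(inner) if q != v for p in fold]
            yield train, val, test
```

Each yielded triple is disjoint and together covers every patient, so a model fitted on `train` and whose features are chosen on `val` never sees the `test` patients before it is scored on them.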
Fig 2. TIDEL II patients in this study.
Inclusion and exclusion criteria.
Categories used to transform each predictive factor into categorical data, with the number of patients in each category.
| Factor | Category | TIDEL II, no. of patients | TIDEL II, % of patients | Saudi population, no. of patients | Saudi population, % of patients |
|---|---|---|---|---|---|
| Age (years) | Young ≤30 | 21 | 12.21% | 36 | 33.03% |
| | Middle age >30, ≤60 | 104 | 60.47% | 67 | 61.47% |
| | Older >60 | 47 | 27.33% | 6 | 5.50% |
| Gender | Male | 92 | 40.70% | 45 | 41.28% |
| | Female | 118 | 59.30% | 64 | 58.72% |
| Spleen size (cm) | Not palpable ≤1 | 99 | 57.56% | 52 | 47.71% |
| | Small >1, ≤10 | 44 | 25.58% | 34 | 31.19% |
| | Large >10 | 28 | 16.28% | 23 | 21.10% |
| BCR-ABL1 transcript type | b2a2 | 68 | 39.53% | None | None |
| | b3a2 | 68 | 39.53% | None | None |
| | Both | 34 | 19.77% | None | None |
| | e1a2 | 2 | 1.16% | None | None |
| OA | Low ≤4 | 80 | 46.51% | None | None |
| | Standard >4 | 87 | 50.58% | None | None |
| IC50IM (μM) | Group 1 ≤0.5 | 19 | 11.05% | None | None |
| | Group 2 >0.5, ≤0.7 | 31 | 18.02% | None | None |
| | Group 3 >0.7, ≤0.95 | 31 | 18.02% | None | None |
| | Group 4 >0.95 | 79 | 45.93% | None | None |
| BCR-ABL1 transcript level pre-therapy | Low ≤20 | 8 | 4.65% | None | None |
| | Moderate >20, ≤100 | 96 | 55.81% | None | None |
| | High >100 | 66 | 38.37% | None | None |
| ANC | Low <1.8 | 3 | 1.74% | None | None |
| | Normal ≥1.8, ≤7.5 | 35 | 20.35% | None | None |
| | High >7.5, ≤50 | 97 | 56.40% | None | None |
| | Very high >50 | 36 | 20.93% | None | None |
| Monocytes | Low <0.2 | 16 | 9.30% | 12 | 11.01% |
| | Normal ≥0.2, ≤0.8 | 46 | 26.74% | 23 | 21.10% |
| | High >0.8 | 109 | 63.37% | 74 | 67.89% |
| Lymphocytes | Low <1 | 12 | 6.98% | None | None |
| | Normal ≥1, ≤3.5 | 98 | 56.98% | None | None |
| | High >3.5 | 61 | 35.47% | None | None |
| Basophils | Normal ≤0.1 | 28 | 16.28% | 30 | 27.52% |
| | High >0.1, ≤1 | 76 | 44.19% | 10 | 9.17% |
| | Very high >1 | 67 | 38.95% | 69 | 63.30% |
| Eosinophils | Normal ≤0.5 | 91 | 52.91% | 28 | 25.69% |
| | High >0.5 | 79 | 45.93% | 81 | 74.31% |
| WCC | Low <10 | 29 | 16.86% | None | None |
| | Normal >10, <100 | 122 | 70.93% | None | None |
| | High >100 | 20 | 11.63% | None | None |
| Blasts | Normal >0, ≤5 | 154 | 89.53% | 99 | 90.83% |
| | High >5 | 13 | 7.56% | 10 | 9.17% |
| Platelets | Low ≥20, <150 | 3 | 1.74% | 8 | 7.34% |
| | Normal >150, ≤400 | 91 | 52.91% | 53 | 48.62% |
| | High >400 | 75 | 43.60% | 48 | 44.04% |
| Sokal score | Low, intermediate ≤1.2 | 132 | 79.51% | 92 | 84.40% |
| | High >1.2 | 34 | 20.48% | 17 | 15.60% |
| Hasford score | Low, intermediate <1480 | 154 | 93.9% | 101 | 92.66% |
| | High ≥1481 | 10 | 6% | 8 | 7.34% |
| EUTOS score | Low <87 | 140 | 83.33% | 92 | 84.40% |
| | High ≥87 | 28 | 16.67% | 17 | 15.60% |
Each predictive factor and the number of CML patients in each of its categories. Expert haematologists and previous publications defined the categories.
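Recoding a continuous factor into the categories above amounts to a threshold lookup. A minimal sketch follows that takes the boundaries for age, spleen size, OA, IC50IM and pre-therapy BCR-ABL1 transcript level from the table, and fills missing values with the median of the observed values before recoding. Median imputation is an assumption made for illustration (the imputation method is not specified in this section), and the names `BINS`, `impute_median`, `categorize` and `categorize_factor` are hypothetical.

```python
from statistics import median

INF = float("inf")

# (inclusive upper bound, label) pairs copied from the category table,
# e.g. age: Young <=30, Middle age >30 and <=60, Older >60.
BINS = {
    "age": [(30, "Young"), (60, "Middle age"), (INF, "Older")],
    "spleen_cm": [(1, "Not palpable"), (10, "Small"), (INF, "Large")],
    "oa": [(4, "Low"), (INF, "Standard")],
    "ic50im": [(0.5, "Group 1"), (0.7, "Group 2"), (0.95, "Group 3"), (INF, "Group 4")],
    "bcr_abl1_level": [(20, "Low"), (100, "Moderate"), (INF, "High")],
}


def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def categorize(value, bins):
    """Return the label of the first bin whose upper bound is >= value."""
    for upper, label in bins:
        if value <= upper:
            return label
    raise ValueError(f"value {value!r} is not covered by the bins")


def categorize_factor(values, factor):
    """Impute missing values, then recode each value into its category."""
    return [categorize(v, BINS[factor]) for v in impute_median(values)]
```

For ages `[45, None, 72, 28]`, the missing value is filled with the median 45, giving the categories Middle age, Middle age, Older and Young.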
Predictive performance of the feature selection approaches combined with the machine learning technique, from nested cross-validation. The wrapper method selected the final models by their highest cross-validation performance.
| Features | Training accuracy | Training sensitivity | Training specificity | Training PPV | Training NPV | Training G-mean | Training F-score | CV accuracy | CV sensitivity | CV specificity | CV PPV | CV NPV | CV G-mean | CV F-score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.81 | 0.83 | 0.77 | 0.86 | 0.72 | 0.80 | 0.84 | 0.51 (0.43, 0.59) | 0.60 (0.53, 0.67) | 0.31 (0.17, 0.45) | 0.63 (0.49, 0.77) | 0.33 (0.15, 0.51) | 0.37 (0.20, 0.54) | 0.61 (0.46, 0.76) |
| 8, 9 | 0.64 | 0.67 | 0.54 | 0.80 | 0.38 | 0.60 | 0.72 | 0.57 (0.50, 0.64) | 0.62 (0.55, 0.69) | 0.42 (0.23, 0.61) | 0.75 (0.63, 0.87) | 0.28 (0.15, 0.41) | 0.46 (0.29, 0.63) | 0.67 (0.55, 0.79) |
| 8, 9, 10, 16 | 0.70 | 0.73 | 0.64 | 0.81 | 0.53 | 0.68 | 0.76 | 0.50 (0.38, 0.62) | 0.60 (0.48, 0.72) | 0.31 (0.17, 0.45) | 0.59 (0.43, 0.75) | 0.36 (0.16, 0.56) | 0.38 (0.21, 0.55) | 0.59 (0.43, 0.75) |
| 2, 3, 7, 13, 15 | 0.78 | 0.77 | 0.81 | 0.92 | 0.57 | 0.79 | 0.83 | 0.78 (0.71, 0.85) | 0.71 (0.51, 0.91) | 0.88 (0.78, 0.98) | 0.57 (0.39, 0.75) | 0.71 (0.54, 0.88) | 0.82 (0.65, 0.99) |
| 2, 3, 6, 7, 8, 10, 15, 16 | 0.77 | 0.77 | 0.75 | 0.88 | 0.59 | 0.76 | 0.82 | 0.75 (0.69, 0.81) | 0.79 (0.72, 0.86) | 0.75 (0.61, 0.89) | 0.85 (0.76, 0.94) | 0.59 (0.42, 0.76) | 0.81 (0.69, 0.93) |
| 3, 7, 8, 15 | 0.73 | 0.72 | 0.82 | 0.94 | 0.40 | 0.77 | 0.81 | 0.76 (0.68, 0.84) | 0.78 (0.71, 0.85) | 0.71 (0.51, 0.91) | 0.88 (0.78, 0.98) | 0.57 (0.39, 0.75) | 0.71 (0.54, 0.88) |
Feature indexes: 1 = all features, 2 = Age, 3 = Spleen size, 4 = Platelets, 5 = Basophils, 6 = Eosinophils, 7 = Blasts, 8 = OA, 9 = IC50IM, 10 = BCR-ABL1 transcript level pre-therapy, 11 = WCC, 12 = ANC, 13 = Monocytes, 14 = Lymphocytes, 15 = Gender, and 16 = BCR-ABL1 transcript type. For each model, the table gives the training and cross-validation performance; cross-validation values are means from 10-fold cross-validation, with 95% confidence intervals in parentheses.
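A wrapper method scores candidate feature subsets by the cross-validated performance of the classifier itself. A minimal sketch follows using greedy forward selection, one common wrapper strategy (an assumption; the search procedure is not detailed here), with a pluggable `score_fn` that would return, for example, the mean 10-fold accuracy, G-mean or F-score of a CART model trained on the subset. It also computes a normal-approximation 95% confidence interval from per-fold scores, which is likewise an assumed way of forming intervals like those in the table. The names `forward_select` and `mean_ci` are hypothetical.

```python
import math
from statistics import mean, stdev


def forward_select(features, score_fn):
    """Greedy forward wrapper selection.

    Starting from an empty set, repeatedly add the feature whose inclusion
    gives the best score_fn value; stop as soon as no addition improves on
    the current best score.
    """
    selected, best = [], float("-inf")
    remaining = list(features)
    while remaining:
        # Ties between equal scores are broken by the larger feature id.
        score, feat = max((score_fn(selected + [f]), f) for f in remaining)
        if score <= best:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best


def mean_ci(fold_scores, z=1.96):
    """Mean of per-fold scores with a normal-approximation confidence interval."""
    m = mean(fold_scores)
    half = z * stdev(fold_scores) / math.sqrt(len(fold_scores))
    return m, (m - half, m + half)
```

Forward search evaluates far fewer subsets than an exhaustive search over the 15 individual factors, at the cost of possibly missing combinations whose factors are only useful together.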
Testing performance: comparison between previous methods and our predictive models.
| Method | Accuracy | Sensitivity | Specificity | PPV | NPV | G-mean | F-score |
|---|---|---|---|---|---|---|---|
| OA | 0.73 | 0.62 |
| IC50IM | 0.54 | 0.34 | 0.60 | 0.51 | 0.50 | 0.43 |
| Sokal score | 0.58 | 0.84 | 0.29 | 0.56 | 0.63 | 0.49 | 0.67 |
| Hasford score | 0.56 | 0.16 | 0.54 | 0.66 | 0.39 | 0.68 |
| EUTOS score | 0.52 | 0.84 | 0.16 | 0.52 | 0.50 | 0.37 | 0.64 |
| Model A | 0.60 | 0.59 | 0.61 | 0.45 | 0.60 | 0.65 |
| Model B | 0.62 | 0.59 | 0.69 | 0.37 | 0.64 | 0.69 |
| Model C | 0.58 | 0.56 | 0.61 | 0.33 | 0.59 | 0.65 |
| Model D | |
| Model E | 0.66 | 0.62 | 0.73 | 0.45 |
| Model F | 0.64 | 0.59 | 0.29 |
Bold values indicate the values compared between our methods and the previous methods. Model A = all predictive factors; Model B = OA and IC50IM; Model C = OA, IC50IM, BCR-ABL1 transcript level pre-therapy, and BCR-ABL1 transcript type; Model D = CART algorithm with the highest accuracy; Model E = CART algorithm with the highest G-mean; Model F = CART algorithm with the highest F-score.
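All the performance tables report the same seven measures, which follow directly from the confusion matrix of a binary classifier. A minimal sketch computing them follows, assuming that "achieved MMR at 24 months" is the positive class; `binary_metrics` is a hypothetical helper and does not guard against empty denominators.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, PPV, NPV, G-mean and F-score."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    ppv = tp / (tp + fp)          # positive predictive value (precision)
    npv = tn / (tn + fn)          # negative predictive value
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        # Geometric mean of the two class-wise rates; low if either class is missed.
        "g_mean": math.sqrt(sensitivity * specificity),
        # Harmonic mean of precision and recall.
        "f_score": 2 * ppv * sensitivity / (ppv + sensitivity),
    }
```

As a consistency check, the Sokal score row (sensitivity 0.84, specificity 0.29, PPV 0.56) gives a G-mean of √(0.84 × 0.29) ≈ 0.49 and an F-score of 2 × 0.56 × 0.84 / (0.56 + 0.84) ≈ 0.67, matching the table.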
Fig 3. Tree structures for (a) Model B and (b) Model C, showing the relations between molecular predictive factors and MMR. Each node of the tree represents a predictive factor and has two branches, each leading either to another predictive factor or to a leaf that assigns patients to the positive or negative MMR group. This graphical structure illustrates the predictive rules, which can be applied to unseen data to predict the target. A predictive rule of the form IF (conditions) THEN (class) is equivalent to a path from the root node to a leaf of the decision tree (Yes: achieved MMR at 24 months; No: did not achieve MMR at 24 months).
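Reading rules off a tree is a depth-first walk that records the test taken at each node on the way to every leaf. A minimal sketch follows for a small binary tree. The tree itself, its thresholds (OA ≤4 and IC50IM ≤0.95, borrowed from the category boundaries) and the classes `Node` and `Leaf` are hypothetical and are not the trees shown in Fig 3.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # "Yes" = achieved MMR at 24 months, "No" = did not


@dataclass
class Node:
    feature: str
    threshold: float
    left: "Tree"   # branch taken when feature <= threshold
    right: "Tree"  # branch taken when feature > threshold


Tree = Union[Node, Leaf]


def extract_rules(tree, conditions=()):
    """Turn every root-to-leaf path into an IF ... THEN ... rule."""
    if isinstance(tree, Leaf):
        cond = " AND ".join(conditions) or "TRUE"
        return [f"IF {cond} THEN MMR = {tree.label}"]
    return (extract_rules(tree.left, conditions + (f"{tree.feature} <= {tree.threshold}",))
            + extract_rules(tree.right, conditions + (f"{tree.feature} > {tree.threshold}",)))


def predict(tree, sample):
    """Follow the tests from the root until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.left if sample[tree.feature] <= tree.threshold else tree.right
    return tree.label


# Hypothetical two-level tree for illustration only.
example_tree = Node("OA", 4.0,
                    left=Node("IC50IM", 0.95, left=Leaf("Yes"), right=Leaf("No")),
                    right=Leaf("Yes"))

if __name__ == "__main__":
    for rule in extract_rules(example_tree):
        print(rule)
```

Because each rule is one root-to-leaf path, the three printed rules partition the feature space, and `predict` applies exactly one of them to any unseen patient.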
Fig 4. Model D structure.
Tree graph of the final model, which achieved the highest accuracy.
Validation performance: comparison between previous methods and our recommended model on the Saudi population.
| Accuracy | Sensitivity | Specificity | PPV | NPV | G-mean | F-score |
|---|---|---|---|---|---|---|
| 0.63 | 0.83 | 0.13 | 0.71 | 0.24 | 0.33 | 0.17 |
| 0.92 | 0.06 | 0.71 | 0.25 | 0.24 | 0.1 |
| 0.67 | 0.86 | 0.19 | 0.41 | 0.25 |
| 0.50 | 0.55 | 0.68 | 0.24 |