Sébastien Benzekry1, Mathieu Grangeon2, Mélanie Karlsen1, Maria Alexa1, Isabella Bicalho-Frazeto1, Solène Chaleat2, Pascale Tomasini2,3, Dominique Barbolosi1, Fabrice Barlesi3,4, Laurent Greillier1,2.
Abstract
BACKGROUND: Immune checkpoint inhibitors (ICIs) are now a therapeutic standard in advanced non-small cell lung cancer (NSCLC), but strong predictive markers of ICI efficacy are still lacking. We evaluated machine learning models built on simple clinical and biological data to predict individual response to ICIs.
Keywords: blood counts; lung cancer; machine learning; prediction; response; survival
Year: 2021 PMID: 34944830 PMCID: PMC8699503 DOI: 10.3390/cancers13246210
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.639
Patients and disease characteristics.
| Variable | N = 298 1 |
|---|---|
| Age | 62 (55, 69) |
| Sex | |
| Female | 99 (33%) |
| Male | 199 (67%) |
| Smoking status | |
| Former Smoker | 140 (47%) |
| Non smoker | 36 (12%) |
| Smoker | 122 (41%) |
| | 72 (29%) |
| Performance status | |
| ≥2 | 26 (8.9%) |
| 0–1 | 265 (91%) |
| Mutation status | |
| ALK | 1 (0.8%) |
| BRAF | 9 (7.4%) |
| EGFR | 14 (11%) |
| KRAS | 87 (71%) |
| Other mutation | 5 (4.1%) |
| ROS1 | 2 (1.6%) |
| Wild type | 4 (3.3%) |
| ICI type | |
| anti-CTLA-4 | 3 (1.0%) |
| anti-PD-1 | 266 (89%) |
| anti-PD-L1 | 26 (8.7%) |
| Combination | 3 (1.0%) |
| | |
| Chemotherapy | 281 (95%) |
| Chemotherapy + immunotherapy | 5 (1.7%) |
| Targeted therapy | 11 (3.7%) |
| Response | |
| Complete response | 2 (0.7%) |
| Partial response | 44 (15%) |
| Progressive disease | 131 (45%) |
| Stable disease | 113 (39%) |
1 Median (inter-quartile range); n (%).
Figure 1. Exploratory data analysis. (A) Boxplots of continuous variables. (B) Barplots of categorical variables. BMI = body mass index, NLR = neutrophil-to-lymphocyte ratio, PLR = platelet-to-lymphocyte ratio, CR = complete response, PR = partial response, SD = stable disease and PD = progressive disease. Stars indicate statistical significance: **: p < 0.01, ***: p < 0.001, ****: p < 0.0001, n.s. = non-significant.
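The ratio variables named in this caption, together with the dNLR that appears in the regression table, are simple functions of routine blood counts. The snippet below is a minimal sketch of how they could be derived. The function name, argument names and example counts are hypothetical, and the dNLR formula (neutrophils divided by leukocytes minus neutrophils) is the commonly used definition, assumed here because this page does not spell it out.

```python
def blood_count_ratios(neutrophils, lymphocytes, platelets, leukocytes):
    """Derive NLR, PLR and dNLR from absolute counts given in the same unit.

    Assumption: dNLR = neutrophils / (leukocytes - neutrophils), the usual
    definition of the derived neutrophil-to-lymphocyte ratio.
    """
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    if leukocytes <= neutrophils:
        raise ValueError("leukocyte count must exceed neutrophil count")
    return {
        "NLR": neutrophils / lymphocytes,
        "PLR": platelets / lymphocytes,
        "dNLR": neutrophils / (leukocytes - neutrophils),
    }


# Illustrative counts only, not taken from the study data.
ratios = blood_count_ratios(
    neutrophils=6.0, lymphocytes=1.5, platelets=300.0, leukocytes=9.0
)
print(ratios)
```

Because the three ratios share numerators and denominators, they are strongly correlated by construction.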
Logistic regression analysis for disease control.
| Variable | Univariable Odds Ratio [95% CI] | Univariable p | Signif | Multivariable Odds Ratio [95% CI] | Multivariable p | Signif |
|---|---|---|---|---|---|---|
| Lymphocytes | 1.1 [0.83, 1.4] | 0.678 | | 0.98 [0.15, 5.2] | 0.984 | |
| NLR | 0.49 [0.31, 0.73] | 0.000879 | *** | 0.68 [0.098, 1.9] | 0.651 | |
| Platelets | 1 [0.82, 1.3] | 0.762 | | 1.3 [0.72, 2.4] | 0.404 | |
| PLR | 0.84 [0.64, 1.1] | 0.156 | | 1.1 [0.5, 2.4] | 0.788 | |
| Leukocytes | 0.68 [0.5, 0.89] | 0.00791 | ** | 0.6 [0.0022, 3 × 10²] | 0.847 | |
| Hemoglobin | 1.9 [1.5, 2.5] | 9.26 × 10⁻⁷ | *** | 1.8 [1.3, 2.4] | 0.000122 | *** |
| dNLR | 0.63 [0.47, 0.83] | 0.00155 | ** | 0.8 [0.33, 2.7] | 0.689 | |
| Neutrophils | 0.62 [0.45, 0.83] | 0.00232 | ** | 1.5 [0.0047, 2.7 × 10²] | 0.863 | |
| Monocytes | 0.87 [0.69, 1.1] | 0.226 | | 0.86 [0.5, 1.4] | 0.545 | |
| Eosinophils | 1.3 [0.97, 1.9] | 0.139 | | 1.1 [0.75, 1.8] | 0.582 | |
| Basophils | 1.2 [0.95, 1.8] | 0.177 | | 1.2 [0.89, 1.8] | 0.321 | |
| BMI | 1.2 [0.95, 1.5] | 0.123 | | 1 [0.76, 1.3] | 0.997 | |
| Performance status | 0.5 [0.39, 0.64] | 6.21 × 10⁻⁸ | *** | 0.58 [0.44, 0.75] | 7.79 × 10⁻⁵ | *** |
Stars indicate statistical significance: **: p < 0.01, ***: p < 0.001. CI = confidence interval. Signif = significance.
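To help read the odds ratios in this table: in logistic regression, an odds ratio is the exponential of a fitted coefficient, and a Wald-type 95% interval is obtained by exponentiating the coefficient plus or minus 1.96 standard errors. The sketch below shows that relationship in both directions. It assumes Wald intervals, which this page does not confirm, and the function names are hypothetical.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Turn a logistic-regression coefficient and its standard error into
    an odds ratio with a Wald-type confidence interval (assumed method)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_odds_ratio(odds_ratio, lower, upper, z=1.96):
    """Approximate the coefficient and standard error behind a reported
    odds ratio and interval, again assuming a Wald interval."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Multivariable hemoglobin row of the table: 1.8 [1.3, 2.4].
beta, se = coef_from_odds_ratio(1.8, 1.3, 2.4)
odds_ratio, lower, upper = odds_ratio_with_ci(beta, se)
print(f"beta={beta:.3f}, se={se:.3f}")
print(f"OR={odds_ratio:.2f} [{lower:.2f}, {upper:.2f}]")
```

The round trip does not reproduce the printed bounds exactly, since the reported values are rounded and the interval may have been computed by another method. The very wide multivariable intervals for leukocytes and neutrophils correspond to very large standard errors on the log scale.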
Figure 2. Variable selection. (A) Feature importance based on random forest classification and mean decrease in accuracy. (B) Accuracy score of incremental logistic regression models built on an increasing number of predictors (i.e., the first one contains only hemoglobin, the second hemoglobin and NLR, etc.). NLR = neutrophil-to-lymphocyte ratio. PLR = platelet-to-lymphocyte ratio. BMI = body mass index.
Figure 3. Predictive performance of the machine learning algorithms. (A) Receiver operating characteristic (ROC) curves for prediction on the test sets from each fold of the outer cross-validation loop, for each model. AUC = area under the curve. (B) Precision (positive predictive value)–recall (sensitivity) curves. (C) Main performance metrics for each algorithm. (D) Decision tree obtained after tuning and training. Each node shows the predicted class (0 = PD, 1 = CR + PR + SD), the predicted probability of response and the percentage of total observations in the node.
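As an aid to reading panel (A), the ROC AUC can be understood as the probability that a randomly chosen patient with disease control (class 1) receives a higher predicted score than a randomly chosen patient with progressive disease (class 0). The following is a minimal sketch of that pairwise definition, not the authors' implementation. The function name and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for disease control (CR + PR + SD), 0 for progressive disease.
    scores: predicted probability of class 1. Ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example, not study data: 8 of the 9 positive/negative pairs are ordered correctly.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

In a cross-validated setting such as the one described in the caption, a value like this would be computed on each outer test fold and then summarized across folds.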
Summary of the predictive performance of the machine learning algorithms (mean ± standard deviation; bold entries are maximum values).
| Model | Accuracy | ROC AUC | PPV | NPV | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| Random Forest | | | 0.70 ± 0.08 | | | 0.78 ± 0.06 |
| Logistic Regression | 0.67 ± 0.04 | 0.73 ± 0.03 | 0.69 ± 0.08 | 0.67 ± 0.06 | 0.57 ± 0.09 | 0.77 ± 0.07 |
| Naive Bayes | 0.67 ± 0.04 | 0.73 ± 0.03 | | 0.65 ± 0.06 | 0.49 ± 0.07 | 0.83 ± 0.05 |
| Single Layer Neural Network | 0.66 ± 0.03 | 0.72 ± 0.03 | 0.69 ± 0.09 | 0.66 ± 0.06 | 0.54 ± 0.09 | 0.78 ± 0.07 |
| k-Nearest Neighbour | 0.66 ± 0.04 | 0.69 ± 0.04 | 0.65 ± 0.07 | 0.66 ± 0.06 | 0.58 ± 0.07 | 0.73 ± 0.07 |
| Linear SVM | 0.58 ± 0.09 | 0.73 ± 0.03 | 0.72 ± 0.09 | 0.58 ± 0.10 | 0.19 ± 0.25 | |
| Polynomial SVM | 0.55 ± 0.08 | 0.73 ± 0.03 | 0.61 ± 0.13 | 0.58 ± 0.13 | 0.19 ± 0.29 | 0.89 ± 0.23 |
| Radial basis SVM | 0.55 ± 0.08 | 0.73 ± 0.03 | 0.67 ± 0.17 | 0.56 ± 0.06 | 0.20 ± 0.28 | 0.88 ± 0.25 |
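The threshold-based columns of this table (accuracy, PPV, NPV, sensitivity and specificity) all derive from the four cells of a confusion matrix. The sketch below shows the standard formulas, taking class 1 (disease control) as the positive class. The function name and the counts are hypothetical and chosen only for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fn: class-1 patients predicted as 1/0; tn/fp: class-0 patients
    predicted as 0/1.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),          # precision
        "npv": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),  # recall
        "specificity": tn / (tn + fp),
    }


# Illustrative counts only, not taken from the study.
metrics = classification_metrics(tp=30, fp=10, tn=40, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

A classifier that rarely predicts class 1 ends up with low sensitivity and high specificity, which is the pattern shown by the three SVM rows.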