Chintan Parmar1,2,3, Patrick Grossmann1,4, Johan Bussink5, Philippe Lambin2, Hugo J W L Aerts1,6,4.
Abstract
Radiomics extracts and mines a large number of medical imaging features that quantify tumor phenotypic characteristics. Highly accurate and reliable machine-learning approaches can drive the success of radiomic applications in clinical care. In this radiomic study, fourteen feature selection methods and twelve classification methods were examined in terms of their performance and stability for predicting overall survival. A total of 440 radiomic features were extracted from pre-treatment computed tomography (CT) images of 464 lung cancer patients. To ensure an unbiased evaluation of the different machine-learning methods, publicly available implementations along with reported parameter configurations were used. Furthermore, we used two independent radiomic cohorts for training (n = 310 patients) and validation (n = 154 patients). We identified that the Wilcoxon test based feature selection method WLCX (stability = 0.84 ± 0.05, AUC = 0.65 ± 0.02) and the random forest classification method RF (RSD = 3.52%, AUC = 0.66 ± 0.03) had the highest prognostic performance with high stability against data perturbation. Our variability analysis indicated that the choice of classification method is the most dominant source of performance variation (34.21% of total variance). Identification of optimal machine-learning methods for radiomic applications is a crucial step towards stable and clinically relevant radiomic biomarkers, providing a non-invasive way of quantifying and monitoring tumor-phenotypic characteristics in clinical practice.
Year: 2015 PMID: 26278466 PMCID: PMC4538374 DOI: 10.1038/srep13087
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Table defining the acronyms of the feature selection and classification methods used.
| Classification method acronym | Classification method name | Feature Selection method acronym | Feature selection method name |
|---|---|---|---|
| Nnet | Neural network | RELF | Relief |
| DT | Decision Tree | FSCR | Fisher score |
| BST | Boosting | GINI | Gini index |
| BY | Bayesian | CHSQ | Chi-square score |
| BAG | Bagging | JMI | Joint mutual information |
| RF | Random forest | CIFE | Conditional infomax feature extraction |
| MARS | Multivariate adaptive regression splines | DISR | Double input symmetric relevance |
| SVM | Support vector machines | MIM | Mutual information maximization |
| DA | Discriminant analysis | CMIM | Conditional mutual information maximization |
| NN | Nearest neighbour | ICAP | Interaction capping |
| GLM | Generalized linear models | TSCR | T-test score |
| PLSR | Partial least squares and principal component regression | MRMR | Minimum redundancy maximum relevance |
| — | — | MIFS | Mutual information feature selection |
| — | — | WLCX | Wilcoxon |
Figure 1. A total of 440 radiomic features were extracted from the segmented tumor regions of the pre-treatment CT images of 464 NSCLC patients.
Feature selection and classifier training were performed on the training cohort Lung1 (n = 310), whereas the Lung2 cohort (n = 154) was used for validation.
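
The train-then-validate workflow in this caption can be illustrated with a small standard-library Python sketch. It is not the authors' pipeline: the cohorts are synthetic, the helper names (`auc`, `rank_features`, `make_cohort`) are hypothetical, and a single selected feature stands in for a trained classifier such as RF. Features are ranked by |AUC - 0.5| on the training cohort, which orders them the same way as the magnitude of the Wilcoxon rank-sum statistic when tie corrections are ignored, and the top feature is then scored on the held-out cohort.

```python
import random

def auc(scores, labels):
    """AUC via the Mann-Whitney U statistic; tied pairs count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rank_features(X, y):
    """Order feature indices by |AUC - 0.5|, strongest first."""
    n_features = len(X[0])
    strength = [abs(auc([row[j] for row in X], y) - 0.5)
                for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: strength[j], reverse=True)

def make_cohort(n, rng, n_features=5):
    """Synthetic cohort (assumption): only feature 0 relates to the label."""
    y = [int(rng.random() < 0.5) for _ in range(n)]
    X = [[rng.gauss(label if j == 0 else 0.0, 1.0) for j in range(n_features)]
         for label in y]
    return X, y

rng = random.Random(0)
X_train, y_train = make_cohort(310, rng)   # same size as Lung1
X_valid, y_valid = make_cohort(154, rng)   # same size as Lung2

# Select on the training cohort only, then evaluate on the validation cohort.
ranking = rank_features(X_train, y_train)
best = ranking[0]
train_auc = auc([row[best] for row in X_train], y_train)
sign = 1.0 if train_auc >= 0.5 else -1.0   # orient the score using training data
valid_auc = auc([sign * row[best] for row in X_valid], y_valid)
print(f"selected feature {best}, validation AUC = {valid_auc:.2f}")
```

Keeping selection and orientation on the training cohort means the validation AUC is not influenced by the data it is measured on.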
Figure 2. Heatmap depicting the predictive performance (AUC) of feature selection (in rows) and classification (in columns) methods.
It can be observed that the classification methods RF, BAG and BY and the feature selection methods WLCX, MRMR and MIFS show relatively high predictive performance in many cases.
Table describing the median values of AUC and stability for different Classification and Feature Selection methods.
| Classification method | AUC | RSD % | Feature Selection method | AUC | Stability |
|---|---|---|---|---|---|
| Nnet | 0.57 ± 0.04 | 6.41 | RELF | 0.61 ± 0.04 | 0.91 ± 0.05 |
| DT | 0.54 ± 0.04 | 7.89 | FSCR | 0.62 ± 0.04 | 0.78 ± 0.08 |
| BST | 0.58 ± 0.04 | 8.23 | GINI | 0.62 ± 0.04 | 0.68 ± 0.10 |
| BY | 0.64 ± 0.05 | 0.86 | CHSQ | 0.60 ± 0.04 | 0.69 ± 0.09 |
| BAG | 0.64 ± 0.03 | 5.56 | JMI | 0.61 ± 0.04 | 0.68 ± 0.05 |
| RF | 0.66 ± 0.03 | 3.52 | CIFE | 0.60 ± 0.03 | 0.69 ± 0.05 |
| MARS | 0.61 ± 0.03 | 6.98 | DISR | 0.62 ± 0.05 | 0.69 ± 0.05 |
| SVM | 0.61 ± 0.03 | 6.39 | MIM | 0.61 ± 0.04 | 0.94 ± 0.02 |
| DA | 0.61 ± 0.02 | 6.37 | CMIM | 0.62 ± 0.04 | 0.73 ± 0.04 |
| NN | 0.61 ± 0.02 | 4.08 | ICAP | 0.61 ± 0.03 | 0.72 ± 0.04 |
| GLM | 0.63 ± 0.02 | 2.19 | TSCR | 0.61 ± 0.02 | 0.78 ± 0.12 |
| PLSR | 0.63 ± 0.02 | 2.24 | MRMR | 0.63 ± 0.06 | 0.74 ± 0.03 |
| — | — | — | MIFS | 0.63 ± 0.06 | 0.80 ± 0.03 |
| — | — | — | WLCX | 0.65 ± 0.02 | 0.84 ± 0.05 |
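
The RSD column is not defined in this excerpt. A minimal sketch, assuming RSD is the relative standard deviation of AUC over repeated resamplings of the data (standard deviation divided by mean, times 100), could look like the following. The function name `rsd`, the use of the sample standard deviation, and the AUC values are illustrative assumptions, not figures from the study.

```python
from statistics import mean, stdev

def rsd(values):
    """Relative standard deviation in percent: 100 * sd / mean.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return 100.0 * stdev(values) / mean(values)

# Hypothetical AUCs of one classifier across three resampled cohorts.
resampled_aucs = [0.60, 0.65, 0.70]
print(f"RSD = {rsd(resampled_aucs):.2f}%")
```

Under this definition a lower RSD means the classifier's AUC changes less when the data are perturbed, which is why low RSD is read as high stability.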
Figure 3. Scatterplots between the stability and predictive performance (AUC) of feature selection (FS) methods (left) and classification (CF) methods (right).
Feature selection methods with stability ≥ 0.735 (median stability of FS) and AUC ≥ 0.615 (median AUC of FS) are considered highly reliable and predictive. Similarly, classification methods with RSD ≤ 5.97 (median RSD of CF) and AUC ≥ 0.61 (median AUC of CF) are considered highly reliable and accurate. Highly reliable and predictive methods are displayed in a gray square region.
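
The median thresholds quoted in this caption can be recomputed from the tabulated medians. The sketch below copies the table values into two dictionaries and applies the stated rules; the variable names are illustrative.

```python
from statistics import median

# (AUC, stability) per feature selection method, from the table of medians.
FS = {"RELF": (0.61, 0.91), "FSCR": (0.62, 0.78), "GINI": (0.62, 0.68),
      "CHSQ": (0.60, 0.69), "JMI": (0.61, 0.68), "CIFE": (0.60, 0.69),
      "DISR": (0.62, 0.69), "MIM": (0.61, 0.94), "CMIM": (0.62, 0.73),
      "ICAP": (0.61, 0.72), "TSCR": (0.61, 0.78), "MRMR": (0.63, 0.74),
      "MIFS": (0.63, 0.80), "WLCX": (0.65, 0.84)}

# (AUC, RSD %) per classification method, from the same table.
CF = {"Nnet": (0.57, 6.41), "DT": (0.54, 7.89), "BST": (0.58, 8.23),
      "BY": (0.64, 0.86), "BAG": (0.64, 5.56), "RF": (0.66, 3.52),
      "MARS": (0.61, 6.98), "SVM": (0.61, 6.39), "DA": (0.61, 6.37),
      "NN": (0.61, 4.08), "GLM": (0.63, 2.19), "PLSR": (0.63, 2.24)}

fs_auc_med = median(a for a, _ in FS.values())
fs_stab_med = median(s for _, s in FS.values())
cf_auc_med = median(a for a, _ in CF.values())
cf_rsd_med = median(r for _, r in CF.values())

# High stability and high AUC for FS; low RSD and high AUC for CF.
reliable_fs = [m for m, (a, s) in FS.items()
               if s >= fs_stab_med and a >= fs_auc_med]
reliable_cf = [m for m, (a, r) in CF.items()
               if r <= cf_rsd_med and a >= cf_auc_med]

print("FS thresholds:", fs_stab_med, fs_auc_med, "->", reliable_fs)
print("CF thresholds:", cf_rsd_med, cf_auc_med, "->", reliable_cf)
```

The computed medians match the caption (the RSD median of 5.965 rounds to 5.97). With these table values the rule selects FSCR, MRMR, MIFS and WLCX among the feature selection methods, and BY, BAG, RF, NN, GLM and PLSR among the classifiers.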
Figure 4. Variation of AUC explained by the experimental factors and their interactions.
It can be observed that the classification method was the most dominant source of variability. The size of the selected (representative) feature subset accounted for the smallest share of the total variance.
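
To make the idea of a "share of total variance" concrete, here is a simplified sketch of a sum-of-squares decomposition for a balanced two-factor grid of AUC values. It is an assumption-laden illustration rather than the study's analysis: it uses only two factors, lumps interaction and residual together, and the grid values and the function name `variance_shares` are hypothetical.

```python
def variance_shares(grid):
    """Percent of total sum of squares per factor for a balanced grid.

    grid[i][j] = AUC for feature selection method i and classifier j
    (one value per cell, so interaction and residual cannot be separated).
    """
    r, c = len(grid), len(grid[0])
    grand = sum(map(sum, grid)) / (r * c)
    row_means = [sum(row) / c for row in grid]
    col_means = [sum(grid[i][j] for i in range(r)) / r for j in range(c)]

    ss_total = sum((v - grand) ** 2 for row in grid for v in row)
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_rest = ss_total - ss_rows - ss_cols

    parts = {"feature_selection": ss_rows,
             "classification": ss_cols,
             "interaction_and_residual": ss_rest}
    return {name: 100.0 * ss / ss_total for name, ss in parts.items()}

# Hypothetical AUC grid: rows = 3 FS methods, columns = 3 classifiers.
grid = [[0.60, 0.64, 0.66],
        [0.61, 0.65, 0.67],
        [0.59, 0.65, 0.66]]

shares = variance_shares(grid)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}% of total variance")
```

In this toy grid the columns differ far more than the rows, so the classification factor takes the largest share. Adding further factors, such as the number of selected features, follows the same pattern with additional sums of squares.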