Yucheng Zhang¹, Anastasia Oikonomou¹, Alexander Wong², Masoom A Haider¹, Farzad Khalvati¹
Abstract
Radiomics characterizes tumor phenotypes by extracting large numbers of quantitative features from radiological images. Several studies have shown that radiomic features carry prognostic value for predicting clinical outcomes. However, challenges including feature redundancy, unbalanced data, and small sample sizes have led to relatively low predictive accuracy. In this study, we explore strategies for overcoming these challenges and improving the predictive performance of radiomics-based prognosis for non-small cell lung cancer (NSCLC). CT images of 112 patients (mean age 75 years) with NSCLC who underwent stereotactic body radiotherapy were used to predict recurrence, death, and recurrence-free survival in a comprehensive radiomics analysis. Different feature selection and predictive modeling techniques were compared to determine the optimal configuration for prognosis analysis. To address feature redundancy, the analysis indicated that Random Forest and Principal Component Analysis were the best predictive modeling and feature selection methods, respectively, for achieving high prognostic performance. To address unbalanced data, the Synthetic Minority Over-sampling Technique (SMOTE) was found to significantly increase predictive accuracy. A full analysis of variance showed that the data endpoint, the feature selection technique, and the classifier each significantly affected predictive accuracy, suggesting that these factors must be investigated when building radiomics-based predictive models for cancer prognosis.
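The abstract credits SMOTE with the gain in accuracy on unbalanced endpoints but does not give its settings. As a rough illustration of the idea, the sketch below creates synthetic minority-class samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. It is a minimal, stdlib-only sketch, not the authors' pipeline: the function name `smote`, the default `k = 5`, the fixed `seed`, and the toy data are all assumptions made for illustration.

```python
import random
from typing import List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: List[Vector], n_synthetic: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Generate n_synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    Hypothetical helper for illustration, not the paper's implementation.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    k = min(k, len(minority) - 1)

    # Precompute the k nearest minority neighbours of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: _sq_dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        base, nb = minority[i], minority[j]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


# Toy example: 4 minority-class samples described by 2 features.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points))  # 6
```

In a cross-validated setup, oversampling is normally applied to the training folds only, so that no synthetic point is evaluated alongside the real samples it was interpolated from.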
Year: 2017 PMID: 28418006 PMCID: PMC5394465 DOI: 10.1038/srep46349
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Radiomics analytics pipeline.
Summary of radiomics features.
| Feature group | Number of features | Description |
|---|---|---|
| Statistical - First order | 11 | ROI size (number of pixels), ROI size (mm²), Mean gray level, Standard Deviation, Median gray level, Min ROI, Max ROI, Mean Positive Values, Uniformity, Kurtosis, Skewness |
| Textural - Second order | 19 | Contrast, Energy, Correlation, Homogeneity, Entropy, Normalized Entropy, Variance, Inverse Difference Moment, Sum Average, Sum Variance, Sum Entropy, Difference Variance, Difference Entropy, Information Measure of Correlation, Autocorrelation, Dissimilarity, Cluster Shade, Cluster Prominence, Maximum Probability |
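The feature table lists feature names but not their formulas, and toolboxes differ in details (for example, whether energy is the angular second moment or its square root, whether kurtosis is reported as excess kurtosis, and which logarithm base entropy uses). The hypothetical helpers below show one common convention for a few of the first-order statistics and for a symmetric gray-level co-occurrence matrix (GLCM) with some of its Haralick-style second-order features. The offset `(dx, dy)`, the assumption that the ROI is already quantised to integer gray levels in `[0, levels)`, and all function names are assumptions, not the authors' settings.

```python
import math
from collections import Counter
from typing import Dict, List

Image = List[List[int]]  # 2-D ROI of quantised gray levels in [0, levels)


def first_order(roi: Image) -> Dict[str, float]:
    """A few first-order (histogram) statistics of the ROI gray levels."""
    values = [v for row in roi for v in row]
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # population variance
    std = math.sqrt(var)
    probs = [c / n for c in Counter(values).values()]
    return {
        "mean": mean,
        "std": std,
        "uniformity": sum(p * p for p in probs),  # sum of squared histogram probabilities
        "skewness": (sum((v - mean) ** 3 for v in values) / n) / std ** 3 if std else 0.0,
        # Pearson (non-excess) kurtosis; subtract 3 for excess kurtosis.
        "kurtosis": (sum((v - mean) ** 4 for v in values) / n) / var ** 2 if var else 0.0,
    }


def glcm(roi: Image, levels: int, dx: int = 1, dy: int = 0) -> List[List[float]]:
    """Normalised, symmetric gray-level co-occurrence matrix for offset (dx, dy)."""
    m = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(roi), len(roi[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = roi[r][c], roi[r2][c2]
                m[i][j] += 1
                m[j][i] += 1  # count both directions to make the matrix symmetric
    total = sum(map(sum, m))
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the ROI")
    return [[x / total for x in row] for row in m]


def haralick_subset(p: List[List[float]]) -> Dict[str, float]:
    """Contrast, energy (angular second moment), homogeneity and entropy of a GLCM."""
    cells = [(i, j, p[i][j]) for i in range(len(p)) for j in range(len(p)) if p[i][j] > 0]
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells),  # base-2 logarithm
    }


roi = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 2, 2, 2],
       [2, 2, 3, 3]]
print(first_order(roi)["mean"])  # 1.25
print(haralick_subset(glcm(roi, levels=4)))
```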
Summary of feature selection and classification methods.
| Feature Reduction methods | Abbreviation | Classifiers | Abbreviation |
|---|---|---|---|
| No selection | NON | Random Forest | RF |
| Principal component analysis | PCA | Generalized linear model | GLM |
| Independent component analysis | ICA | Support Vector Machine | SVM |
| Near zero variance | NZV | Naïve Bayes | NB |
| Zero Variance | ZV | Neural network | NNET |
| Consensus Clustering + PCA | CC + PCA | k-nearest neighbor | KNN |
| Wilcoxon | WLCX | Mixture Discriminant Analysis | MDA |
| | | Partial Least Squares GLM | PLS |
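The NZV and ZV entries in the table above are unsupervised filters that drop features carrying almost no information. A minimal sketch follows, assuming thresholds modeled on the defaults of R caret's `nearZeroVar` (`freqCut = 95/5`, `uniqueCut = 10`). The paper does not state which implementation or thresholds it used, and `variance_filter` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List


def variance_filter(columns: Dict[str, List[float]], freq_cut: float = 95 / 5,
                    unique_cut: float = 10.0) -> Dict[str, str]:
    """Label each feature column 'zero', 'near-zero' or 'keep'.

    Zero variance: the column holds a single distinct value.
    Near-zero variance: the most common value is more than freq_cut times as
    frequent as the second most common one AND the distinct values make up
    less than unique_cut percent of the samples.
    """
    labels = {}
    for name, values in columns.items():
        counts = Counter(values).most_common()  # [(value, count), ...] by count
        if len(counts) == 1:
            labels[name] = "zero"
            continue
        freq_ratio = counts[0][1] / counts[1][1]
        percent_unique = 100.0 * len(counts) / len(values)
        if freq_ratio > freq_cut and percent_unique < unique_cut:
            labels[name] = "near-zero"
        else:
            labels[name] = "keep"
    return labels


features = {
    "constant": [1.0] * 40,
    "mostly_zero": [0.0] * 39 + [5.0],
    "informative": [float(i % 7) for i in range(40)],
}
print(variance_filter(features))
# {'constant': 'zero', 'mostly_zero': 'near-zero', 'informative': 'keep'}
```

Both filters look only at the feature values, never at the outcome, so they can be fitted once per training set before any supervised step such as PCA or a classifier.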
Figure 2. Individual results for three outcomes: recurrence (REC), death, and recurrence-free survival (RFS).
Figure 3. AUC variance analysis.
Figure 4. AUCs for the death endpoint using the SMOTE subsampling method.
Figure 5. AUCs for the death endpoint at different sample sizes.
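Figures 3 to 5 report performance as the area under the ROC curve (AUC). For reference, AUC equals the probability that a randomly chosen positive case (for example, a patient who died) receives a higher model score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that pairwise definition; `roc_auc` and the toy scores are illustrative assumptions, and the quadratic pairwise loop is only meant for small cohorts.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties count as one half, matching the Mann-Whitney U formulation.
    labels: 1 for the positive class (e.g. event occurred), 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores: 5 of the 6 positive/negative pairs are ranked correctly.
print(round(roc_auc([0.9, 0.35, 0.4, 0.3, 0.6], [1, 1, 0, 0, 1]), 4))  # 0.8333
```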