Anna Braghetto, Francesca Marturano, Marta Paiusco, Marco Baiesi, Andrea Bettinelli.
Abstract
In this study, we tested and compared radiomics- and deep learning-based approaches on the public LUNG1 dataset for the prediction of 2-year overall survival (OS) in non-small cell lung cancer patients. Radiomic features were extracted from the gross tumor volume using PyRadiomics, while deep features were extracted from bi-dimensional tumor slices by a convolutional autoencoder. Both radiomic and deep features were fed to 24 different pipelines formed by the combination of four feature selection/reduction methods and six classifiers. Direct classification through convolutional neural networks (CNNs) was also performed. Each approach was investigated with and without the inclusion of clinical parameters. The maximum area under the receiver operating characteristic curve (AUC) on the test set improved from 0.59, obtained for the baseline clinical model, to 0.67 ± 0.03, 0.63 ± 0.03, and 0.67 ± 0.02 for models based on radiomic features, deep features, and their combination, respectively, and to 0.64 ± 0.04 for direct CNN classification. Despite the high number of pipelines and approaches tested, results were comparable and in line with previous works, confirming that it is challenging to extract further imaging-based information from the LUNG1 dataset for the prediction of 2-year OS.
Year: 2022 PMID: 35986072 PMCID: PMC9391464 DOI: 10.1038/s41598-022-18085-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
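The feature-based approach described in the abstract (feature selection/reduction followed by a classifier, scored by AUC on a held-out test set) can be sketched with scikit-learn. This is a minimal illustration, not the authors' code: synthetic data stands in for the LUNG1 radiomic features and 2-year OS labels, and the specific selector/classifier pairing (SelectFromModel + random forest, both named in the paper's figures) is just one of the 24 tested combinations.

```python
# One feature-based pipeline from the study design, sketched on
# synthetic stand-in data: embedded feature selection (SelectFromModel)
# followed by a random-forest classifier, evaluated by test-set AUC.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic surrogate for a radiomic feature matrix and binary 2-year OS labels
X, y = make_classification(n_samples=300, n_features=100,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

pipe = Pipeline([
    ("select", SelectFromModel(RandomForestClassifier(n_estimators=100,
                                                      random_state=0))),
    ("clf", RandomForestClassifier(n_estimators=200, random_state=0)),
])
pipe.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, pipe.predict_proba(X_te)[:, 1])
print(f"test AUC: {auc:.2f}")
```

Swapping the two steps for other selectors (PCA, feature agglomeration) and classifiers (SVM, bagging, XGBoost, neural network, k-NN) yields the full pipeline grid evaluated in the study.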
Summary of previous studies based on the LUNG1 public dataset.
| Work | Aim | Approach | Lung dataset | Methods | Results | Conclusions | RQS (%) |
|---|---|---|---|---|---|---|---|
| Parmar et al., 2015 | Prediction of 2-year OS | Radiomic features | • LUNG1 for training • LUNG2 for validation | Different feature selection methods and ML classifiers | Highest average AUC for the Wilcoxon test-based feature selection method (AUC = 0.65) and a random forest classifier (AUC = 0.66) | The choice of classification method is the most dominant source of performance variation | 31 |
| Parmar et al., 2015 | Investigation of the clinical relevance of radiomic clusters | Radiomic features | • LUNG1 for training • LUNG2 for validation | • Cluster analysis • Cox Proportional Hazards model on cluster centroids | • All lung clusters were significantly associated with survival • AUC = 0.64 for tumour histology and tumour stage prediction | Clustering and prognostic characteristics of radiomic features are cancer-specific | 22 |
| Wu et al., 2016 | Classification of tumour histologic subtypes | Radiomic features | • LUNG1 for training • LUNG2 for validation | Different feature selection methods and ML classifiers | Naive Bayes classifier combined with ReliefF achieved the highest AUC of 0.72 | Radiomic features show a significant association with lung tumour histology | 28 |
| Lambrecht et al., 2017 | Classification of T-stage, overall stage, N-stage, M-stage, and histology; prediction of survival time | Radiomic features | LUNG1 for training and validation | • K-means clustering • Random Forest and Neural Networks | Neural networks achieved the highest ACC of 63.9% | Results are highly dependent on the choice of the clinical outcome to predict | 33 |
| Chaddad et al., 2017 | Prediction of the survival outcome for different cancer subtype and stage groups | Radiomic features | LUNG1 for training and validation | Random Forest classifier | Highest AUC of 0.76 for the TNM | Radiomic features can be used as indicators of survival for large-cell carcinoma patients with primary tumour size and no lymph-node metastasis | 39 |
| Haarburger et al., 2018 | Prediction of survival outcome | • Radiomic & deep features + ML • Direct CNN prediction | LUNG1 for training and validation | • Cox Proportional Hazards model • CNN hazard model | • C-index of 0.623 for the model fitted with selected radiomic and CNN features • C-index of 0.585 for direct CNN hazard prediction | Cox models with radiomic and deep features outperform CNNs with concatenated radiomic features | 31 |
| Shi et al., 2019 | Prediction of 2-year OS and survival outcome with the Aerts radiomic signature | Radiomic features | • LUNG1 for training • LUNG2 for validation | Multivariable logistic regression and Cox Proportional Hazards model | AUC of 0.61 and Harrell C-index of 0.58 on the LUNG2 dataset | External validation of radiomic models can be done with decentralized data, without exchanging patients' sensitive data | 28 |
| Welch et al., 2019 | Prediction of survival outcome depending on several factors, using the Aerts radiomic signature | Radiomic features | • LUNG1 for training • H&N1 for validation | Cox Proportional Hazards model | • C-index of 0.64 on the H&N1 external dataset • Tumour volume was as prognostic as the radiomic signature in H&N1 (C-index = 0.64) | The radiomic signature was a surrogate for tumour volume | 28 |
| Haarburger et al., 2020 | Testing of the Aerts radiomic signature for different GTV delineations | Radiomic features | LUNG1 for training and validation | • ICC for feature stability • Cox Proportional Hazards model | • 28.7% of all features had an ICC < 0.9 • C-indices of Cox models varied between 0.57 and 0.58 | Features are subject to higher (GLRLM and GLSZM) and lower (shape, GLCM, and NGTDM) variance across delineations | 25 |
| Ubaldi et al., 2021 | Classification of tumour histology and overall stage (I or II) | Radiomic features | • LUNG1 for training and validation on a private dataset and vice versa • Merging of the two datasets | Different feature selection methods and ML classifiers | • AUC = 0.72 for histology classification with merged datasets • AUC = 0.84 when training on LUNG1 and testing on another dataset | Histology classification improved when considering subjects with overall stages I and II, hence reducing the heterogeneity of the sample | 31 |
ACC accuracy, AUC area under the receiver operating characteristic curve, C-index concordance index, CNN convolutional neural network, GLCM grey level co-occurrence matrix, GLRLM grey level run length matrix, GLSZM grey level size zone matrix, ICC intra-class correlation coefficient, ML machine learning, NGTDM neighboring grey tone difference matrix, OS overall survival, RQS radiomic quality score.
Figure 1. Representative slices of three LUNG1 patients with superimposed delineation of the GTV (viewing window: [−1000, 400] HU).
Figure 2. Flow chart of the analysis for the two classification approaches, also showing the number of tested models for each approach. A total of 168 different pipelines were tested for the feature-based approach (clinical, radiomic, and deep features, alone and in combination), while 4 architectures were tested for the CNN-based approach.
Figure 3. Results for the clinical feature-based models. (Left panel) Average AUCs on the five training splits. (Right panel) Average AUCs on the test splits. SFM SelectFromModel, PCA principal component analysis, CLUSTER feature agglomeration through clustering, SVM support vector machines, BAG bagging, RF random forest, XGB extreme gradient boosting, NNET neural network, NN k-nearest neighbours.
Figure 4. Results for the radiomic feature-based models without the use of clinical data. (Left panel) Average AUCs on the five training splits. (Right panel) Average AUCs on the test splits. SFM SelectFromModel, PCA principal component analysis, CLUSTER feature agglomeration through clustering, SVM support vector machines, BAG bagging, RF random forest, XGB extreme gradient boosting, NNET neural network, NN k-nearest neighbours.
Figure 5. Results for the deep feature-based models without clinical data. (Left panel) Average AUCs on the five training splits. (Right panel) Average AUCs on the test splits. SFM SelectFromModel, PCA principal component analysis, CLUSTER feature agglomeration through clustering, SVM support vector machines, BAG bagging, RF random forest, XGB extreme gradient boosting, NNET neural network, NN k-nearest neighbours.
Figure 6. Results for the radiomic and deep feature-based models without including clinical data. (Left panel) Average AUCs on the five training splits. (Right panel) Average AUCs on the test splits. SFM SelectFromModel, PCA principal component analysis, CLUSTER feature agglomeration through clustering, SVM support vector machines, BAG bagging, RF random forest, XGB extreme gradient boosting, NNET neural network, NN k-nearest neighbours.
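The selector/classifier abbreviations defined in the figure legends can be mapped to concrete scikit-learn estimators as a sketch of the 24-pipeline grid. Several details here are assumptions, not the authors' implementation: the fourth reduction step is taken as a "passthrough" (no reduction), GradientBoostingClassifier stands in for XGBoost to avoid a non-stdlib dependency, and all hyperparameters are illustrative.

```python
# Hypothetical sklearn realization of the selector x classifier grid
# named in Figures 3-6. "NONE" and the XGB stand-in are assumptions.
from sklearn.cluster import FeatureAgglomeration
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

reducers = {
    "NONE": "passthrough",  # assumed fourth method (no reduction)
    "SFM": SelectFromModel(LogisticRegression(penalty="l1",
                                              solver="liblinear")),
    "PCA": PCA(n_components=10),
    "CLUSTER": FeatureAgglomeration(n_clusters=10),
}
classifiers = {
    "SVM": SVC(probability=True),
    "BAG": BaggingClassifier(),
    "RF": RandomForestClassifier(),
    "XGB": GradientBoostingClassifier(),  # stand-in for xgboost
    "NNET": MLPClassifier(max_iter=500),
    "NN": KNeighborsClassifier(),
}

# 4 reducers x 6 classifiers = 24 pipelines
grid = {f"{r}+{c}": Pipeline([("reduce", red), ("clf", clf)])
        for r, red in reducers.items()
        for c, clf in classifiers.items()}
print(len(grid))  # 24

# Fit one pipeline on synthetic stand-in data to show the interface
X, y = make_classification(n_samples=120, n_features=30, random_state=0)
grid["PCA+RF"].fit(X, y)
print(grid["PCA+RF"].predict_proba(X).shape)  # (120, 2)
```

In the study, each of these 24 pipelines was then evaluated on every feature set, producing the AUC panels shown in Figures 3 through 6.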