Lingming Yu, Guangyu Tao, Lei Zhu, Gang Wang, Ziming Li, Jianding Ye, Qunhui Chen.
Abstract
PURPOSE: To explore imaging biomarkers that can be used for diagnosis and prediction of pathologic stage in non-small cell lung cancer (NSCLC) using multiple machine learning algorithms based on CT image feature analysis.
Keywords: Computed tomography (CT); Machine learning algorithm; Non-small cell lung cancer (NSCLC); Radiomics
Year: 2019 PMID: 31101024 PMCID: PMC6525347 DOI: 10.1186/s12885-019-5646-9
Source DB: PubMed Journal: BMC Cancer ISSN: 1471-2407 Impact factor: 4.430
Patient and tumor characteristics in the training and validation sets
| Characteristic | NSCLC | TCGA-LUAD | TCGA-LUSC | P value |
|---|---|---|---|---|
| Sex | | | | |
| Male | 58 | 9 | 19 | 0.033 |
| Female | 29 | 15 | 15 | |
| Overall stage | | | | |
| IA | 14 | 5 | 3 | 0.513 |
| IB | 28 | 5 | 10 | |
| IIA | 5 | 2 | 5 | |
| IIB | 22 | 3 | 9 | |
| IIIA | 10 | 7 | 4 | |
| IIIB | 3 | 1 | 2 | |
| IV | 5 | 1 | 1 | |
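The record does not say which statistical test produced the values in the last column. As a rough check, the sketch below assumes a Pearson chi-square test of independence on the sex counts from the table; the function name `pearson_chi_square` is hypothetical and only the standard library is used.

```python
import math

# Sex counts copied from the table above (columns: NSCLC, TCGA-LUAD, TCGA-LUSC).
observed = [
    [58, 9, 19],   # male
    [29, 15, 15],  # female
]

def pearson_chi_square(table):
    """Return the Pearson chi-square statistic and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of sex and cohort.
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

chi2, dof = pearson_chi_square(observed)

# Assumption: this closed form only holds for exactly 2 degrees of freedom,
# where the chi-square survival function reduces to exp(-x / 2).
assert dof == 2
p_value = math.exp(-chi2 / 2)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.3f}")
```

Under this assumed test the p-value comes out at about 0.033, which is consistent with the value listed for sex, although that alone does not confirm the authors' method.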
Fig. 1 Performance assessment of the prediction model in the training and testing sets. a The imbalanced class distribution of NSCLC samples. b The final class distribution of NSCLC samples after equilibrium processing (class balancing). c Confusion matrix comparing the actual and predicted pathologic stages in the NSCLC cohort. d Receiver operating characteristic (ROC) curve analysis for the prediction of pathologic stage in the NSCLC cohort. For each stage, the reference group is the patients at all other stages. e Average precision score of the prediction model in the NSCLC cohort, micro-averaged over all classes: AP = 0.60. f Extension of the precision-recall curve to the multi-class setting in the NSCLC cohort
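The confusion matrix and the one-vs-rest ROC analysis described in this caption can be illustrated with a minimal sketch. The labels and scores below are made-up toy values, not study data, and the helper names `confusion_matrix` and `one_vs_rest_auc` are hypothetical; the AUC is computed from pairwise ranking, which equals the area under the ROC curve.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def one_vs_rest_auc(y_true, scores, positive):
    """AUC for `positive` against all other classes pooled together.

    Computed as the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("both groups need at least one sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical values).
labels = ["I", "II", "III"]
y_true = ["I", "I", "II", "II", "III", "III"]
y_pred = ["I", "II", "II", "II", "III", "I"]
score_stage_one = [0.8, 0.4, 0.3, 0.2, 0.1, 0.6]  # model score for stage I

cm = confusion_matrix(y_true, y_pred, labels)
auc_stage_one = one_vs_rest_auc(y_true, score_stage_one, "I")
print(cm)
print(auc_stage_one)
```

Repeating `one_vs_rest_auc` once per stage gives one ROC summary per class, with all remaining stages acting as the reference group.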
Feature importance
| Feature | Importance |
|---|---|
| wavelet-HHH_firstorder_RootMeanSquared | 0.007136 |
| log-sigma-2-0-mm-3D_firstorder_RootMeanSquared | 0.006829 |
| wavelet-HHL_glcm_InverseVariance | 0.006782 |
| wavelet-HHL_glcm_Idn | 0.006155 |
| wavelet-HHL_firstorder_Variance | 0.005531 |
| wavelet-HHL_glszm_SmallAreaHighGrayLevelEmphasis | 0.00533 |
| wavelet-HHL_glcm_InverseVariance | 0.005291 |
| wavelet-HHL_glcm_Imc1 | 0.005072 |
| wavelet-HHL_glrlm_LongRunLowGrayLevelEmphasis | 0.005063 |
| wavelet-HHL_glcm_Idmn | 0.004948 |
| wavelet-HHL_glrlm_GrayLevelVariance | 0.004713 |
| wavelet-HHL_glszm_LargeAreaLowGrayLevelEmphasis | 0.004235 |
| wavelet-HHL_glcm_Idm | 0.004189 |
| wavelet-HHL_glrlm_ShortRunHighGrayLevelEmphasis | 0.004122 |
| wavelet-LLL_glrlm_LongRunHighGrayLevelEmphasis | 0.003965 |
| wavelet-HLH_glcm_JointEnergy | 0.003955 |
| wavelet-HHL_gldm_LargeDependenceEmphasis | 0.003925 |
| original_glszm_ZoneVariance | 0.003886 |
| log-sigma-2-0-mm-3D_glcm_ClusterProminence | 0.003725 |
| wavelet-HHL_firstorder_Median | 0.003717 |
| wavelet-HHL_gldm_SmallDependenceHighGrayLevelEmphasis | 0.003683 |
| wavelet-HHL_glrlm_LongRunHighGrayLevelEmphasis | 0.003615 |
| wavelet-HHL_glcm_DifferenceVariance | 0.003579 |
| log-sigma-4-0-mm-3D_glszm_GrayLevelNonUniformity | 0.003525 |
| wavelet-LLH_firstorder_RootMeanSquared | 0.003449 |
| wavelet-LLL_glszm_SizeZoneNonUniformityNormalized | 0.003391 |
| wavelet-HLL_glszm_GrayLevelVariance | 0.003327 |
| log-sigma-4-0-mm-3D_glrlm_ShortRunEmphasis | 0.003289 |
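The feature names in this table appear to follow an `imagetype_featureclass_featurename` pattern, as used by some radiomics toolkits; that reading is an assumption, since the record does not name the extraction software. Under that assumption, a short sketch can group importances by image filter to see which filter dominates. Only a handful of rows from the table are included, and `split_feature_name` is a hypothetical helper.

```python
from collections import defaultdict

# A few rows copied from the feature importance table above.
importances = {
    "wavelet-HHH_firstorder_RootMeanSquared": 0.007136,
    "log-sigma-2-0-mm-3D_firstorder_RootMeanSquared": 0.006829,
    "wavelet-HHL_glcm_InverseVariance": 0.006782,
    "wavelet-HHL_glcm_Idn": 0.006155,
    "original_glszm_ZoneVariance": 0.003886,
}

def split_feature_name(name):
    """Split a name into (image type, feature class, feature).

    Assumes exactly two underscores per name, which holds for every
    row in the table above.
    """
    image_type, feature_class, feature = name.split("_")
    return image_type, feature_class, feature

# Sum importances per image type (filter).
by_image_type = defaultdict(float)
for name, value in importances.items():
    image_type, _, _ = split_feature_name(name)
    by_image_type[image_type] += value

ranked = sorted(by_image_type.items(), key=lambda kv: kv[1], reverse=True)
for image_type, total in ranked:
    print(f"{image_type}: {total:.6f}")
```

Running the same aggregation over the full table would show how strongly the wavelet-HHL features, which make up most of the rows, contribute in total.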
Fig. 2 Performance assessment of the prediction model in the training and testing sets in the binarized predictive scenario. a The imbalanced class distribution of NSCLC samples. b The final class distribution of NSCLC samples after equilibrium processing (class balancing). c Receiver operating characteristic (ROC) curve analysis for the prediction of pathologic stage in the NSCLC cohort. d Confusion matrix comparing the actual and predicted results in the NSCLC cohort. e Precision-recall curve in the NSCLC cohort. f Average precision score of the prediction model in the NSCLC cohort
Fig. 3 Performance assessment of the prediction model in the validation sets. a The class distribution of samples in the LUAD dataset. b The class distribution of samples in the LUSC dataset. c and d Confusion matrices comparing the actual and predicted results in LUAD (c) and LUSC (d). e and f Receiver operating characteristic (ROC) curve analysis for the prediction of pathologic stage in LUAD (e) and LUSC (f). For each stage, the reference group is the patients at all other stages
Fig. 4 Performance assessment of the prediction model in the validation sets in the binarized predictive scenario. a and c Confusion matrices comparing the actual and predicted results in LUAD (top) and LUSC (bottom). b and d Receiver operating characteristic (ROC) curve analysis for the prediction of pathologic stage in LUAD (top) and LUSC (bottom). e Average precision score of the prediction model in LUAD. f Average precision score of the prediction model in LUSC
Fig. 5 Performance assessment of the prediction model in the validation sets in terms of precision and recall. a Average precision score of the prediction model in LUAD, micro-averaged over all classes: AP = 0.84. b Extension of the precision-recall curve to the multi-class setting in LUAD. For each stage, the reference group is the patients at all other stages. c Average precision score of the prediction model in LUSC, micro-averaged over all classes: AP = 0.62. d Extension of the precision-recall curve to the multi-class setting in LUSC. For each stage, the reference group is the patients at all other stages
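The micro-averaged average precision (AP) quoted in the captions pools every (sample, class) pair into one binary problem before computing AP. The sketch below shows one common way to do this, assuming the step-wise AP definition (mean of the precision at each true-positive rank); the data are made-up toy values and the function names are hypothetical, so it does not reproduce the reported scores.

```python
def average_precision(is_positive, scores):
    """Step-wise AP: mean precision at the rank of each positive item."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(is_positive)
    if n_pos == 0:
        raise ValueError("need at least one positive item")
    true_positives = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if is_positive[i]:
            true_positives += 1
            total += true_positives / rank  # precision at this rank
    return total / n_pos

def micro_average_precision(y_true, score_rows, n_classes):
    """Flatten one-hot labels and per-class scores, then compute AP once."""
    flat_labels = [int(y == c) for y in y_true for c in range(n_classes)]
    flat_scores = [s for row in score_rows for s in row]
    return average_precision(flat_labels, flat_scores)

# Toy example (hypothetical values): 3 samples, 3 classes.
y_true = [0, 1, 2]
score_rows = [
    [0.70, 0.20, 0.10],
    [0.30, 0.50, 0.20],
    [0.40, 0.35, 0.25],
]
micro_ap = micro_average_precision(y_true, score_rows, n_classes=3)
print(round(micro_ap, 4))
```

Because every pair is pooled, classes with more samples weigh more heavily in the micro-average than they would in a per-class (macro) average.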