Hongkai Wang, Zongwei Zhou, Yingci Li, Zhonghua Chen, Peiou Lu, Wenzhi Wang, Wanyu Liu, Lijuan Yu.
Abstract
BACKGROUND: This study compared one state-of-the-art deep learning method and four classical machine learning methods for classifying mediastinal lymph node metastasis of non-small cell lung cancer (NSCLC) from 18F-FDG PET/CT images. A second objective was to compare the discriminative power of the recently popular PET/CT texture features with that of widely used diagnostic features such as tumor size, CT value, SUV, image contrast, and intensity standard deviation. The four classical machine learning methods were random forests, support vector machines, adaptive boosting, and artificial neural networks. The deep learning method was a convolutional neural network (CNN). The five methods were evaluated on 1397 lymph nodes collected from PET/CT images of 168 patients, with the corresponding pathology analysis results as the gold standard. The comparison used 10 repetitions of 10-fold cross-validation, with sensitivity, specificity, accuracy (ACC), and area under the ROC curve (AUC) as criteria. For each classical method, different input features were compared to select the optimal feature set. Using the optimal feature set, the classical methods were then compared with the CNN and with human doctors from our institute.
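The evaluation protocol described in the abstract (repeated 10-fold cross-validation scored by sensitivity, specificity, and accuracy) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and stratifying the folds by class is an assumption, since the abstract does not say how the folds were drawn.

```python
import random


def repeated_stratified_folds(labels, n_folds=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Assumption: folds are stratified so each keeps roughly the same
    benign/malignant ratio; the source does not state this.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        folds = [[] for _ in range(n_folds)]
        for cls in sorted(set(labels)):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            # Deal the samples of each class round-robin across the folds.
            for k, i in enumerate(idx):
                folds[k % n_folds].append(i)
        for fold, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            yield repeat, fold, train_idx, sorted(test_idx)


def sensitivity_specificity_accuracy(y_true, y_pred):
    """Compute the three criteria; label 1 = malignant, 0 = benign.

    Assumes both classes are present in y_true (no zero-division guard).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


# Class counts taken from the characteristics table: 1270 benign, 127 malignant.
labels = [0] * 1270 + [1] * 127
splits = list(repeated_stratified_folds(labels))
```

With 10 repeats of 10 folds, a classifier is trained and tested 100 times, and the reported means and confidence intervals are taken over those runs.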
Keywords: Computer-aided diagnosis; Deep learning; Machine learning; Non-small cell lung cancer; Positron-emission tomography
Year: 2017 PMID: 28130689 PMCID: PMC5272853 DOI: 10.1186/s13550-017-0260-9
Source DB: PubMed Journal: EJNMMI Res ISSN: 2191-219X Impact factor: 3.138
Patient and lymph node characteristics
| Characteristic | Value |
|---|---|
| Number of patients (male/female/total) | 91/77/168 |
| Patient age in years (min/max/median) | 38/81/61 |
| Number of lymph nodes (benign/malignant/total) | 1270/127/1397 |
| Lymph node short-axis diameter (≤2/≤4/≤7/≤10/>10 mm) | 306/816/246/23/6 |
The image features used in this study. In the “image modality” column, “PET/CT” means the feature is calculated for both PET and CT
| Feature | Image modality | Spatial dimension | Definition |
|---|---|---|---|
| Short-axis diameter | PET/CT | 1D | Diagnostic feature, maximum short diameter of the axial section |
| Area | PET/CT | 2D | Diagnostic feature, area of the axial section |
| Volume | PET/CT | 3D | Diagnostic feature, volume of the lymph node |
| CTmean | CT | 2D/3D | Diagnostic feature, mean CT value inside the lymph node |
| CTcontrast | CT | 2D/3D | Diagnostic feature, the difference between CTmean and the mean CT value of a 2-mm-thick tissue layer surrounding the lymph node. |
| SUVmean | PET | 2D/3D | Diagnostic feature, mean SUV inside the lymph node |
| SUVmax | PET | 2D/3D | Diagnostic feature, max SUV inside the lymph node |
| SUVstd | PET | 2D/3D | Diagnostic feature, standard deviation of SUV inside the lymph node |
| 1st-order texture features | PET/CT | 3D | Six texture features calculated based on the pixel intensity histogram, see the supplementary material of [ |
| 2nd-order texture features | PET/CT | 2D | Nineteen texture features calculated based on gray-level co-occurrence matrix (GLCM), see the supplementary material of [ |
| High-order texture features | PET/CT | 3D | Five texture features calculated based on neighborhood gray-tone difference matrix (NGTDM) and 11 texture features calculated based on gray-level zone size matrix (GLZSM), see the supplementary material of [ |
In the “spatial dimension” column, “2D/3D” means the feature is calculated for both 2D and 3D images
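The intensity-based diagnostic features in the table reduce to simple statistics over voxel values. The sketch below assumes the lymph node and its 2-mm surrounding tissue layer have already been segmented and flattened into plain lists of voxel values. The function name, argument names, and the use of the population standard deviation are assumptions made for illustration, not details given in the source.

```python
from statistics import mean, pstdev


def diagnostic_features(suv_inside, ct_inside, ct_shell):
    """Compute intensity-based diagnostic features for one lymph node.

    suv_inside: SUV values of voxels inside the node (PET)
    ct_inside:  CT values of voxels inside the node
    ct_shell:   CT values of the 2-mm-thick tissue layer around the node
    """
    ct_mean = mean(ct_inside)
    return {
        "SUVmean": mean(suv_inside),
        "SUVmax": max(suv_inside),
        # Assumption: population standard deviation; the source does not
        # say whether the sample or population form was used.
        "SUVstd": pstdev(suv_inside),
        "CTmean": ct_mean,
        # CTcontrast = CTmean minus the mean CT value of the surrounding layer.
        "CTcontrast": ct_mean - mean(ct_shell),
    }


# Toy voxel values, invented purely to exercise the function.
features = diagnostic_features(
    suv_inside=[2.0, 4.0, 6.0],
    ct_inside=[40.0, 60.0],
    ct_shell=[20.0, 30.0],
)
```

Passing the voxels of a single axial slice or of the whole node would give the 2D or 3D variant of each feature, respectively.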
Performance values of the machine learning methods and human doctors
For each classical method, the results of each feature set are listed. The row of the best feature set is marked with a gray background
Fig. 1 Comparison between different feature sets of the four classical machine learning methods, based on mean AUCs and mean ACCs of the 10 times 10-fold cross-validation. The error bars indicate 95% confidence intervals. The p values between different feature sets are plotted as bridges and stars, where two stars mean p < 0.05 after both Bonferroni and FDR corrections, and one star means p < 0.05 only after FDR correction
Fig. 2 Comparison between different machine learning methods and the human doctors, based on mean AUCs and mean ACCs of the 10 times 10-fold cross-validation. The error bars indicate 95% confidence intervals. The p values between different methods are plotted as bridges and stars, where two stars mean p < 0.05 after both Bonferroni and FDR corrections, and one star means p < 0.05 only after FDR correction. The human doctors have no AUC value
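The star notation in the captions of Figs. 1 and 2 combines two multiple-comparison corrections. The sketch below shows one way to reproduce it. It assumes the FDR correction is the Benjamini-Hochberg procedure, which the source does not name; the function names and the example p values are illustrative only.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """FDR-adjusted p values (Benjamini-Hochberg, assumed here)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def stars(p_values, alpha=0.05):
    """Return '**' if significant after both corrections, '*' if FDR only."""
    bonf = bonferroni(p_values)
    fdr = benjamini_hochberg(p_values)
    marks = []
    for b, f in zip(bonf, fdr):
        if b < alpha and f < alpha:
            marks.append("**")
        elif f < alpha:
            marks.append("*")
        else:
            marks.append("")
    return marks


# Invented raw p values for four pairwise comparisons.
raw_p = [0.01, 0.04, 0.03, 0.005]
marks = stars(raw_p)
```

Bonferroni is the stricter of the two, so a comparison can earn one star (FDR only) without earning two.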
Fig. 3 The average ROC curves of the different machine learning methods. The black dot is the performance point of the human doctors
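An ROC curve such as those in Fig. 3 is traced by sweeping a decision threshold over the classifier's scores, and the AUC is the area under that curve. The sketch below builds a single curve and integrates it with the trapezoidal rule. It does not reproduce the averaging across cross-validation runs, and the function names and toy data are assumptions for illustration.

```python
def roc_curve_points(y_true, scores):
    """Return (false positive rate, true positive rate) points.

    Label 1 = malignant, 0 = benign; a node is called malignant when its
    score is at or above the threshold. Assumes both classes are present.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy labels and scores, invented to exercise the functions.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = roc_curve_points(y_true, scores)
auc = auc_trapezoid(points)
```

A reader who gives only a binary decision yields a single sensitivity and specificity pair, that is, one point at (1 − specificity, sensitivity) rather than a curve. This is consistent with the human doctors appearing as a dot and having no AUC value.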