Mehdi Astaraki, Guang Yang, Yousuf Zakko, Iuliana Toma-Dasu, Örjan Smedby, Chunliang Wang.
Abstract
OBJECTIVES: Both radiomics and deep learning methods have shown great promise in predicting lesion malignancy in various image-based oncology studies. However, it is still unclear which method to choose for a specific clinical problem given the access to the same amount of training data. In this study, we try to compare the performance of a series of carefully selected conventional radiomics methods, end-to-end deep learning models, and deep-feature based radiomics pipelines for pulmonary nodule malignancy prediction on an open database that consists of 1297 manually delineated lung nodules.Entities:
Keywords: benign-malignant classification; deep classifier; lung cancer prediction; lung nodule; radiomics
Year: 2021 PMID: 34976794 PMCID: PMC8718670 DOI: 10.3389/fonc.2021.737368
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 6.244
Figure 1. Illustration of (A) benign and (B) malignant pulmonary nodules in chest LDCT scans. The manually identified nodules are highlighted with yellow contours. The examples show that benign and malignant pulmonary nodules can present similar visual characteristics. The cropped patches around the nodules (context images) provide the relative location of the nodules with respect to nearby structures.
Figure 2. Graphical demonstration of the study pipeline. To predict lung nodule malignancy, three modules were studied. In the CNN module (red), deep networks were trained with context and target nodule images separately and simultaneously. In the radiomics module (blue), handcrafted features were extracted from both target and context nodule images to train the learning algorithms. In the hybrid module (green), the extracted radiomic features were combined with the learned deep features to form the hybrid feature sets.
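For readers who want to reproduce the radiomics module, the sketch below shows how handcrafted features could be extracted from a delineated nodule with PyRadiomics. The file names and default extractor settings are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of handcrafted radiomic feature extraction with PyRadiomics.
# File paths and extractor settings are illustrative assumptions, not the
# configuration used in the paper.
from radiomics import featureextractor

# The default extractor computes shape, first-order, and texture (GLCM,
# GLRLM, ...) features from the image region covered by the mask.
extractor = featureextractor.RadiomicsFeatureExtractor()

# 'nodule_ct.nii.gz' is a CT patch and 'nodule_mask.nii.gz' the manual
# delineation (target image); a dilated mask would yield a "context" variant.
features = extractor.execute("nodule_ct.nii.gz", "nodule_mask.nii.gz")

# Keep only the numeric feature values (drop diagnostic metadata).
feature_vector = {k: v for k, v in features.items()
                  if not k.startswith("diagnostics")}
print(len(feature_vector), "features extracted")
```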
The prediction power of the radiomic features extracted from target nodule images with different learning algorithms and feature selection methods over the balanced dataset.
| Learning Algorithm | FS 1 | FS 2 | FS 3 | FS 4 | FS 5 | FS 6 | FS 7 | FS 8 |
|---|---|---|---|---|---|---|---|---|
| Adab | – | – | – | 0.671 ± 0.019 | 0.531 ± 0.037 | – | – | – |
| DT | 0.723 ± 0.011 | 0.733 ± 0.027 | 0.703 ± 0.019 | 0.711 ± 0.028 | 0.642 ± 0.013 | 0.523 ± 0.024 | 0.712 ± 0.026 | 0.730 ± 0.032 |
| RF | 0.871 ± 0.008 | 0.849 ± 0.025 | 0.846 ± 0.028 | 0.856 ± 0.023 | – | 0.517 ± 0.031 | 0.862 ± 0.026 | 0.891 ± 0.011 |
| KNN | 0.850 ± 0.016 | 0.846 ± 0.016 | 0.807 ± 0.036 | 0.833 ± 0.017 | 0.735 ± 0.021 | 0.671 ± 0.089 | 0.846 ± 0.017 | 0.870 ± 0.023 |
| SVM | 0.777 ± 0.011 | 0.774 ± 0.029 | 0.752 ± 0.025 | 0.775 ± 0.027 | 0.751 ± 0.020 | 0.522 ± 0.040 | 0.775 ± 0.028 | 0.802 ± 0.008 |
| LDA | 0.655 ± 0.045 | 0.680 ± 0.032 | 0.785 ± 0.017 | 0.750 ± 0.027 | 0.741 ± 0.031 | 0.735 ± 0.018 | 0.771 ± 0.028 | 0.796 ± 0.011 |
| QDA | 0.778 ± 0.172 | 0.696 ± 0.181 | 0.747 ± 0.016 | 0.738 ± 0.024 | 0.753 ± 0.031 | – | 0.752 ± 0.026 | 0.865 ± 0.006 |
| Naive | 0.763 ± 0.006 | 0.759 ± 0.023 | 0.742 ± 0.030 | 0.731 ± 0.022 | 0.756 ± 0.034 | 0.583 ± 0.046 | 0.739 ± 0.024 | 0.808 ± 0.010 |
FS 1–FS 8 denote the eight feature selection methods evaluated; –, value not available in this record. For each feature selection method, the highest value is marked in bold in the original article.
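The grid of learning algorithms and feature selection methods in the table can be reproduced with a pipeline like the sketch below. Since this record does not name the eight feature selection methods, the ANOVA F-test stands in for one of them, and the hyperparameters are scikit-learn defaults rather than the authors' settings.

```python
# Sketch of the radiomics evaluation grid: feature selection + classical
# learners scored by AUROC under repeated stratified cross-validation.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.naive_bayes import GaussianNB

# Learner set mirroring the table rows.
learners = {
    "Adab": AdaBoostClassifier(),
    "DT": DecisionTreeClassifier(),
    "RF": RandomForestClassifier(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(probability=True),
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "Naive": GaussianNB(),
}

def evaluate(X, y, k=50):
    """Return mean +/- std AUROC per learner over repeated stratified CV."""
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
    results = {}
    for name, clf in learners.items():
        pipe = Pipeline([
            ("scale", StandardScaler()),
            ("select", SelectKBest(f_classif, k=k)),  # one possible FS method
            ("clf", clf),
        ])
        scores = cross_val_score(pipe, X, y, scoring="roc_auc", cv=cv)
        results[name] = (scores.mean(), scores.std())
    return results
```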
The prediction power of the joint context and target radiomic features with different learning algorithms and feature selection methods over the balanced dataset.
| Learning Algorithm | FS 1 | FS 2 | FS 3 | FS 4 | FS 5 | FS 6 | FS 7 | FS 8 |
|---|---|---|---|---|---|---|---|---|
| Adab | – | – | – | 0.570 ± 0.043 | 0.676 ± 0.005 | 0.876 ± 0.014 | – | – |
| DT | 0.739 ± 0.032 | 0.699 ± 0.018 | 0.728 ± 0.011 | 0.720 ± 0.014 | 0.568 ± 0.030 | 0.594 ± 0.020 | 0.702 ± 0.016 | 0.772 ± 0.013 |
| RF | 0.897 ± 0.016 | 0.858 ± 0.032 | 0.877 ± 0.019 | 0.865 ± 0.028 | 0.620 ± 0.045 | 0.621 ± 0.021 | – | 0.910 ± 0.008 |
| KNN | 0.872 ± 0.014 | 0.844 ± 0.014 | 0.804 ± 0.013 | 0.860 ± 0.007 | 0.650 ± 0.018 | 0.604 ± 0.029 | 0.848 ± 0.012 | 0.816 ± 0.028 |
| SVM | 0.756 ± 0.023 | 0.711 ± 0.019 | 0.709 ± 0.035 | 0.722 ± 0.022 | 0.625 ± 0.038 | 0.574 ± 0.574 | 0.724 ± 0.022 | 0.818 ± 0.022 |
| LDA | 0.711 ± 0.027 | 0.726 ± 0.012 | 0.802 ± 0.023 | 0.759 ± 0.011 | 0.730 ± 0.019 | 0.647 ± 0.032 | 0.768 ± 0.012 | 0.827 ± 0.020 |
| QDA | 0.862 ± 0.014 | 0.711 ± 0.015 | 0.827 ± 0.028 | 0.766 ± 0.020 | – | – | 0.744 ± 0.010 | 0.887 ± 0.015 |
| Naive | 0.783 ± 0.023 | 0.702 ± 0.020 | 0.741 ± 0.019 | 0.731 ± 0.024 | 0.628 ± 0.020 | 0.544 ± 0.013 | 0.736 ± 0.021 | 0.825 ± 0.021 |
FS 1–FS 8 denote the eight feature selection methods evaluated; –, value not available in this record. For each feature selection method, the highest value is marked in bold in the original article.
The prediction power of the radiomic features extracted from context nodule images with different learning algorithms and feature selection methods over the balanced dataset.
| Learning Algorithm | FS 1 | FS 2 | FS 3 | FS 4 | FS 5 | FS 6 | FS 7 | FS 8 |
|---|---|---|---|---|---|---|---|---|
| Adab | – | – | – | 0.580 ± 0.017 | 0.643 ± 0.078 | – | – | – |
| DT | 0.718 ± 0.011 | 0.697 ± 0.025 | 0.695 ± 0.015 | 0.702 ± 0.032 | 0.571 ± 0.021 | 0.550 ± 0.039 | 0.697 ± 0.031 | 0.744 ± 0.027 |
| RF | 0.881 ± 0.008 | 0.843 ± 0.024 | 0.855 ± 0.009 | 0.864 ± 0.011 | 0.645 ± 0.025 | 0.613 ± 0.044 | 0.845 ± 0.007 | 0.901 ± 0.014 |
| KNN | 0.852 ± 0.007 | 0.824 ± 0.019 | 0.811 ± 0.010 | 0.843 ± 0.021 | 0.625 ± 0.015 | 0.590 ± 0.023 | 0.827 ± 0.019 | 0.779 ± 0.029 |
| SVM | 0.777 ± 0.012 | 0.757 ± 0.010 | 0.689 ± 0.010 | 0.716 ± 0.008 | 0.685 ± 0.012 | 0.571 ± 0.020 | 0.715 ± 0.009 | 0.817 ± 0.023 |
| LDA | 0.682 ± 0.040 | 0.727 ± 0.018 | 0.774 ± 0.014 | 0.758 ± 0.017 | 0.743 ± 0.017 | 0.746 ± 0.022 | 0.751 ± 0.016 | 0.842 ± 0.027 |
| QDA | 0.841 ± 0.013 | 0.705 ± 0.033 | 0.777 ± 0.032 | 0.751 ± 0.025 | – | – | 0.739 ± 0.024 | 0.872 ± 0.010 |
| Naive | 0.767 ± 0.014 | 0.690 ± 0.006 | 0.757 ± 0.029 | 0.745 ± 0.009 | 0.682 ± 0.010 | 0.609 ± 0.038 | 0.728 ± 0.020 | 0.820 ± 0.013 |
FS 1–FS 8 denote the eight feature selection methods evaluated; –, value not available in this record. For each feature selection method, the highest value is marked in bold in the original article.
The prediction power of the deep learning-based analyses.
| Feature Type | End-to-End Model (AUROC) | Deep-Feature-Based Model (AUROC) |
|---|---|---|
| Target nodule | 0.801 [0.777, 0.824] | 0.906 [0.890, 0.921] |
| Context nodule | 0.806 [0.788, 0.827] | 0.927 [0.912, 0.940] |
| Combined | 0.824 [0.798, 0.837] | 0.936 [0.921, 0.950] |
The combined model refers to a dual-pathway network fed with context and target nodule images simultaneously. Lower and upper limits of the 95% confidence intervals are given in square brackets.
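A minimal sketch of what a dual-pathway network of the kind described above might look like in PyTorch is given below; the ResNet-18 backbone, patch size, and fusion by concatenation are assumptions for illustration, not the authors' published architecture. The `features` method also shows how penultimate-layer activations can be reused as "deep features" for a downstream classical classifier.

```python
# Illustrative dual-pathway CNN: one branch per input (target / context),
# fused before the classification head. Backbone and sizes are assumptions.
import torch
import torch.nn as nn
from torchvision import models

class DualPathwayNet(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Two independent encoders; fc layers stripped to expose 512-d features.
        self.target_branch = models.resnet18(weights=None)
        self.context_branch = models.resnet18(weights=None)
        self.target_branch.fc = nn.Identity()
        self.context_branch.fc = nn.Identity()
        self.head = nn.Linear(512 * 2, num_classes)

    def features(self, target_img, context_img):
        # Concatenated penultimate activations: usable as "deep features"
        # for a classical learner (cf. the table above).
        return torch.cat([self.target_branch(target_img),
                          self.context_branch(context_img)], dim=1)

    def forward(self, target_img, context_img):
        return self.head(self.features(target_img, context_img))

# Example: 3-channel 64x64 patches for both pathways (sizes are placeholders).
net = DualPathwayNet()
logits = net(torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64))
print(logits.shape)  # torch.Size([4, 2])
```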
Figure 3. The prediction power of the hybrid model: combinations of deep and radiomic features.
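A minimal sketch of how a hybrid feature set could be assembled, assuming simple concatenation of radiomic and deep feature matrices; the array shapes, random data, and random-forest learner are placeholders for illustration.

```python
# Sketch: concatenate handcrafted radiomic features with learned deep
# features, then score a classical learner by cross-validated AUROC.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
radiomic_feats = rng.normal(size=(1297, 100))   # e.g., PyRadiomics output
deep_feats = rng.normal(size=(1297, 1024))      # e.g., penultimate CNN layer
y = rng.integers(0, 2, size=1297)               # benign (0) / malignant (1)

hybrid = np.hstack([radiomic_feats, deep_feats])
auc = cross_val_score(RandomForestClassifier(), hybrid, y,
                      scoring="roc_auc", cv=5)
print(f"AUROC: {auc.mean():.3f} ± {auc.std():.3f}")
```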
Figure 4. Effect of training size on the prediction power of the hybrid feature sets.
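The training-size experiment of Figure 4 can be emulated with scikit-learn's `learning_curve`; the subset fractions, model, and placeholder data below are illustrative assumptions.

```python
# Sketch: validation AUROC as a function of training-set size.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

rng = np.random.default_rng(0)
X = rng.normal(size=(1297, 200))      # placeholder feature matrix
y = rng.integers(0, 2, size=1297)     # placeholder labels

sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), scoring="roc_auc", cv=5)
for n, s in zip(sizes, val_scores.mean(axis=1)):
    print(f"train size {n:4d}: validation AUROC {s:.3f}")
```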
Comparing the prediction power of the employed methods.
| | Context | Best Radiomics Model | Best End-to-End Deep Learning Model | Best Deep-Feature-Based Model | Best Hybrid Model |
|---|---|---|---|---|---|
| Before | No | 0.792 ± 0.025 | 0.801 [0.777, 0.824] | 0.753 [0.743, 0.775] | 0.817 ± 0.032 |
| After | No | 0.911 ± 0.016 | – | 0.906 [0.890, 0.921] | 0.914 ± 0.015 |
| Before | Yes | 0.777 ± 0.017 | 0.806 [0.788, 0.827] | 0.761 [0.736, 0.779] | 0.780 ± 0.022 |
| After | Yes | 0.916 ± 0.011 | 0.824 [0.798, 0.837] | 0.927 [0.912, 0.940] | 0.929 ± 0.013 |
Note that the "best end-to-end deep learning model" column presents the performance of two single-pathway models trained with target and context nodule images separately and one dual-pathway model trained with both target and context images simultaneously. Lower and upper limits of the 95% confidence intervals are given in square brackets; –, not available.
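The bracketed intervals can be obtained with a percentile bootstrap over the test-set predictions; a minimal sketch, assuming 2000 resamples, is given below (the authors' exact resampling scheme is not stated in this record).

```python
# Sketch: 95% percentile-bootstrap confidence interval for AUROC.
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))  # sample with replacement
        if len(np.unique(y_true[idx])) < 2:              # need both classes
            continue
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return roc_auc_score(y_true, y_score), (lo, hi)
```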