Jeong Woo Son, Ji Young Hong, Yoon Kim, Woo Jin Kim, Dae-Yong Shin, Hyun-Soo Choi, So Hyeon Bak, Kyoung Min Moon.
Abstract
Early detection of lung nodules is essential for preventing lung cancer. However, the number of radiologists who can diagnose lung nodules is limited, and diagnosis requires considerable time and effort. To address this problem, researchers are investigating the automation of lung nodule detection with deep learning. However, deep learning requires large amounts of data, which can be difficult to collect. Data collection should therefore be optimized to facilitate experiments at the start of lung nodule detection studies. We collected chest computed tomography scans from 515 patients with lung nodules at three hospitals, together with high-quality lung nodule annotations reviewed by radiologists. We conducted several experiments using the collected datasets and publicly available data from LUNA16. The object detection model YOLOX was used in the lung nodule detection experiments. Training the model on the collected data gave similar or better performance than training it on the large LUNA16 dataset. We also show that transfer learning from weights pre-trained on open data is very useful when it is difficult to collect large amounts of data. Otherwise, good performance can be expected once data from more than 100 patients are available. This study offers valuable insights for guiding data collection in future lung nodule studies.
Keywords: computed tomography; deep learning; lung nodule; nodule detection; publicly available data; radiologist; transfer learning
Year: 2022 PMID: 35804946 PMCID: PMC9265117 DOI: 10.3390/cancers14133174
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.575
GNAH clinical data.
| Clinical Data | Total | Malignancy | Benign |
|---|---|---|---|
| Age, yr (min-max) | 67.5 (21–87) | 68.3 (32–87) | 66.1 (21–87) |
| Sex, male (%) | 181 (63.1) | 110 (59.5) | 71 (69.6) |
| Pathological diagnosis, n (%) | |||
| NSCLC | 145 (50.5) | 145 (78.4) | 0 |
| Adenocarcinoma | 107 (37.3) | 107 (57.8) | |
| Squamous cell carcinoma | 33 (11.5) | 33 (17.8) | |
| Adenosquamous carcinoma | 2 (0.7) | 2 (1.1) | |
| NSCLC | 3 (1.0) | 3 (1.6) | |
| SCLC | 9 (3.1) | 9 (4.9) | 0 |
| Other type of lung malignancy | 7 (2.4) | 7 (3.8) | 0 |
| Malignancy other than in the lung | 24 (8.4) | 24 (13.0) | 0 |
| Negative for malignancy | 58 (20.2) | 0 | 58 (56.9) |
| NA | 44 (15.3) | 0 | 44 (43.1) |
| Methods for pathological diagnosis | |||
| PCNA | 105 (36.6) | 57 (30.8) | 48 (47.1) |
| Surgical operation | 77 (26.8) | 71 (38.4) | 6 (5.9) |
| Bronchoscopy | 4 (1.4) | 4 (2.2) | 0 |
| EBUS-TBNA | 2 (0.7) | 2 (1.1) | 0 |
| Contrast-enhanced Chest CT (≥2 years) | |||
| No growth | 35 (12.2) | 0 | 35 (34.3) |
| Disappearance | 9 (3.1) | 0 | 9 (8.8) |
NSCLC, non-small cell lung cancer; SCLC, small cell lung cancer; NA, not applicable; PCNA, percutaneous needle aspiration; EBUS-TBNA, endobronchial ultrasound-guided transbronchial needle aspiration.
KNUH clinical data.
| Clinical Data | Total | Malignancy | Benign |
|---|---|---|---|
| Age, yr (min-max) | 67.2 (31–88) | 69.0 (46–88) | 60.3 (31–85) |
| Sex, male (%) | 116 (64.1) | 90 (62.1) | 26 (72.2) |
| Pathologic diagnosis, n (%) | |||
| NSCLC | 118 (65.2) | 118 (81.4) | 0 |
| Adenocarcinoma | 87 (48.1) | 87 (73.3) | |
| Squamous cell carcinoma | 25 (13.8) | 25 (21.1) | |
| Large cell endocrine carcinoma | 3 (1.7) | 3 (2.5) | |
| NSCLC | 3 (1.7) | 3 (2.5) | |
| SCLC | 5 (2.8) | 5 (3.4) | 0 |
| Other type of lung malignancy | 6 (3.3) | 6 (4.1) | 0 |
| Malignancy other than lung | 16 (8.8) | 16 (11.0) | 0 |
| Negative for malignancy | 33 (18.2) | 0 | 33 (91.7) |
| NA | 3 (1.7) | 0 | 3 (8.3) |
| Methods for pathologic diagnosis | |||
| PCNA | 92 (50.8) | 70 (48.3) | 22 (61.1) |
| Surgical operation | 86 (47.5) | 75 (51.7) | 11 (30.6) |
| Contrast-enhanced Chest CT (≥2 years) | |||
| No growth | 3 (1.7) | | 3 (8.3) |
| Disappearance | 0 | | 0 |
HSHH clinical data.
| Clinical Data | Total | Malignancy | Benign |
|---|---|---|---|
| Age, yr (min-max) | 68.0 (35–88) | 70.4 (50–88) | 63.6 (35–87) |
| Sex, male (%) | 28 (63.1) | 18 (60.0) | 10 (58.8) |
| Pathologic diagnosis, n (%) | |||
| NSCLC | 24 (51.1) | 24 (80.0) | 0 |
| Adenocarcinoma | 17 (36.2) | 17 (56.7) | |
| Squamous cell carcinoma | 7 (14.9) | 7 (23.3) | |
| SCLC | 2 (4.3) | 2 (6.7) | 0 |
| Other type of lung malignancy | 3 (6.4) | 3 (10.0) | 0 |
| Malignancy | 1 (2.1) | 1 (3.3) | 0 |
| Negative for malignancy | 17 (36.2) | 0 | 17 (100.0) |
| Methods for pathologic diagnosis | |||
| PCNA | 19 (40.4) | 13 (43.3) | 6 (35.3) |
| Bronchoscopy | 16 (34.0) | 9 (30.0) | 7 (41.2) |
| Transbronchial lung biopsy | 5 (10.6) | 1 (3.3) | 4 (23.5) |
| Surgical operation | 3 (6.4) | 3 (10.0) | 0 |
| EBUS-TBNA | 1 (2.1) | 1 (3.3) | 0 |
| Unknown | 3 (6.4) | 2 (1.1) | 0 |
Chest CT protocols at Gangneung Asan Hospital (GNAH), Kangwon National University Hospital (KNUH), and Hallym Sacred Heart Hospital (HSHH).
| CT Protocols | GNAH Dataset | KNUH Dataset | HSHH Dataset |
|---|---|---|---|
| Name | SIEMENS/SOMATOM Definition Edge 2 | SIEMENS/SOMATOM Definition & Definition Flash | SIEMENS/SOMATOM Flash (128 ch) |
| kVp/mAs | 120/35 | 120/110 | 120/35 |
| kernel | B41f medium | B41f | B40f medium |
| slice/gap (mm) | 5 | 3 | 3 |
Figure 1. Ground truth sample. (a–d) LUNA16; (e–h) Private data.
Figure 2. Illustration of dataset configuration.
Figure 3. Example of data augmentation applied to lung nodule CT images. (a) Original; (b) Flip; (c) Mosaic; (d) Random Affine; (e) MixUp.
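As a rough illustration of the simplest augmentation listed for Figure 3, the sketch below flips an image horizontally and mirrors its bounding boxes so the annotations stay aligned with the nodules. The function name `hflip`, the nested-list image, and the (x1, y1, x2, y2) pixel-edge box convention are assumptions made for this example, not details of the study's pipeline.

```python
def hflip(image, boxes):
    """Flip an image left-to-right and mirror its boxes.

    image: list of rows, each row a list of pixel values (hypothetical layout).
    boxes: list of (x1, y1, x2, y2) in pixel-edge coordinates, x2 exclusive.
    """
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # Mirroring swaps the roles of the left and right edges of each box.
    flipped_boxes = [(width - x2, y1, width - x1, y2) for (x1, y1, x2, y2) in boxes]
    return flipped_image, flipped_boxes


# Toy 2x4 "slice" with a bright one-pixel-wide column at the right edge.
toy_image = [[0, 0, 0, 9],
             [0, 0, 0, 9]]
toy_boxes = [(3, 0, 4, 2)]
flipped_image, flipped_boxes = hflip(toy_image, toy_boxes)
```

Flipping twice returns the original image and boxes, which is a convenient sanity check for any box-aware augmentation.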
Figure 4. Detailed architecture of YOLOX. Cls, Class; Reg, Regression; IOU, intersection over union; conv, convolution.
Dataset configurations used in the experiments.
| Dataset | Training | Validation | Test |
|---|---|---|---|
| (A) | LUNA16 (subsets 1–8) | LUNA16 | GNAH (20%) |
| (B) | LUNA16 pre-train (100%) | GNAH (20%) | |
| (C) | GNAH (60%) | GNAH (20%) | |
| (D) | KNUH (60%) | KNUH (20%) | |
| (E) | GNAH (60%), KNUH (60%) | GNAH (20%) |
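The 60% and 20% fractions in the table suggest a 60/20/20 partition of each hospital's data. The sketch below shows one way such a split could be made at the patient level, so that no patient contributes scans to more than one subset. The function name `split_patients`, the fixed seed, the patient-level grouping, and the placeholder patient IDs are all assumptions for illustration; the source does not describe the actual splitting procedure.

```python
import random


def split_patients(patient_ids, train_pct=60, val_pct=20, seed=0):
    """Shuffle unique patient IDs and cut them into train/validation/test lists.

    Percentages are integers so the cut points use exact integer arithmetic;
    whatever remains after the training and validation cuts becomes the test set.
    """
    ids = sorted(set(patient_ids))          # deterministic starting order
    random.Random(seed).shuffle(ids)        # reproducible shuffle
    n_train = len(ids) * train_pct // 100
    n_val = len(ids) * val_pct // 100
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test


# Placeholder IDs, not real patient identifiers.
patient_ids = [f"P{i:03d}" for i in range(10)]
train_ids, val_ids, test_ids = split_patients(patient_ids)
```

Splitting by patient rather than by slice avoids leaking near-identical neighbouring CT slices from the same patient into both training and test sets.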
Figure 5. Experimental settings.
Performance Comparison of Collected Private Data and Open Data.
| Training | Test | AP10 | AP50 | CPM |
|---|---|---|---|---|
| LUNA16 | GNAH | 0.8590 (+0.0) | 0.5482 (+0.0) | 0.8886 (+0.0) |
| | KNUH | 0.9151 (+0.0) | 0.4499 (+0.0) | 0.9340 (+0.0) |
| | HSHH | 0.7262 (+0.0) | 0.1111 (+0.0) | 0.7628 (+0.0) |
| GNAH | GNAH | 0.8674 (+0.0084) | 0.8160 (+0.2678) | 0.8967 (+0.0081) |
| | KNUH | 0.8548 (−0.0603) | 0.8228 (+0.3729) | 0.8844 (−0.0496) |
| | HSHH | 0.7878 (+0.0616) | 0.6270 (+0.5159) | 0.8114 (+0.0486) |
| KNUH | GNAH | 0.8455 (−0.0135) | 0.7996 (+0.2514) | 0.8717 (−0.0169) |
| | KNUH | 0.9167 (+0.0016) | 0.8817 (+0.4318) | 0.9395 (+0.0055) |
| | HSHH | 0.8307 (+0.1045) | 0.7729 (+0.6618) | 0.8561 (+0.0933) |
| GNAH | GNAH | 0.8960 (+0.0370) | 0.8450 (+0.2968) | 0.9290 (+0.0404) |
| | KNUH | 0.9525 (+0.0374) | 0.9341 (+0.4842) | 0.9660 (+0.0320) |
| | HSHH | 0.8391 (+0.1129) | 0.7935 (+0.6824) | 0.8645 (+0.1017) |
AP, average precision; CPM, competition performance metric. Values in parentheses are differences from the LUNA16-trained model on the same test set.
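In the LUNA16 challenge, CPM is the mean sensitivity at seven operating points on the FROC curve: 1/8, 1/4, 1/2, 1, 2, 4, and 8 false positives per scan. The sketch below computes it from a sampled FROC curve using linear interpolation. The helper names and the FROC values are hypothetical and chosen only to make the arithmetic visible; they are not results from this study, and the interpolation scheme is an assumption.

```python
CPM_FP_RATES = (0.125, 0.25, 0.5, 1, 2, 4, 8)  # false positives per scan


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing.

    Values outside the sampled range are clamped to the nearest endpoint.
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1:
            y0, y1 = ys[i], ys[i + 1]
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def cpm(fp_per_scan, sensitivity):
    """Mean sensitivity at the seven predefined false-positive rates."""
    values = [interpolate(rate, fp_per_scan, sensitivity) for rate in CPM_FP_RATES]
    return sum(values) / len(values)


# Hypothetical FROC samples for illustration only.
froc_fp = [0.125, 0.25, 0.5, 1, 2, 4, 8]
froc_sens = [0.70, 0.78, 0.84, 0.88, 0.91, 0.93, 0.94]
cpm_score = cpm(froc_fp, froc_sens)
```

Because the seven rates span very few to many false positives per scan, CPM rewards detectors that stay sensitive even at strict operating points.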
Figure 6. Sample images overlaying the ground truth with the predictions of the models trained on LUNA16 and on private data. (a) Case where the predictions of both the LUNA16-trained model and the private-data-trained model are very similar to the ground truth; (b,c) Cases where the box predicted by the LUNA16-trained model is significantly larger than the ground truth; (d) Case where the LUNA16-trained model did not predict any lung nodule. Red (solid line) = ground truth; Blue (dashed line) = private data; Green (dotted line) = LUNA16.
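The oversized boxes in panels (b,c) offer one plausible reading of why the LUNA16-trained model scores well on AP10 but poorly on AP50. Assuming AP10 and AP50 denote average precision at IoU thresholds of 0.10 and 0.50 (the usual convention, though not stated in this excerpt), a prediction that is centred correctly but too large can count as a hit at the loose threshold and a miss at the strict one. The box coordinates below are invented for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical 20x20 px ground-truth nodule and a 40x40 px prediction
# sharing the same centre, i.e. a box twice as wide and twice as tall.
ground_truth = (100, 100, 120, 120)
oversized_prediction = (90, 90, 130, 130)

overlap = iou(ground_truth, oversized_prediction)
hit_at_iou_10 = overlap >= 0.10
hit_at_iou_50 = overlap >= 0.50
```

Here the overlap is 400 / 1600 = 0.25, so the prediction passes the 0.10 threshold and fails the 0.50 threshold.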
Figure 7. Comparison of the performance of the models with transfer learning from open data and without transfer learning.
Figure A1. Experiment 2.
Figure 8. CPM results for models trained with incremental growth of GNAH and KNUH combined data. (a) YOLOX; (b) EfficientDet.
Figure A2. Experiment 3—YOLOX.
Figure A3. Experiment 3—EfficientDet.