Jaeil Kim¹, Hye Jung Kim², Chanho Kim¹, Jin Hwa Lee³, Keum Won Kim⁴, Young Mi Park⁵, Hye Won Kim⁶, So Yeon Ki⁷, You Me Kim⁸, Won Hwa Kim⁹.
Abstract
Conventional deep learning (DL) algorithms require full supervision in the form of region of interest (ROI) annotation, which is laborious and often biased. We aimed to develop a weakly-supervised DL algorithm that diagnoses breast cancer on ultrasound (US) without image annotation. Weakly-supervised DL algorithms were implemented with three networks (VGG16, ResNet34, and GoogLeNet) and trained using 1000 unannotated US images (500 benign and 500 malignant masses). Two sets of 200 images (100 benign and 100 malignant masses each) were used as the internal and external validation sets. For comparison with fully-supervised algorithms, ROI annotation was performed both manually and automatically. Diagnostic performance was measured as the area under the receiver operating characteristic curve (AUC). Using class activation maps (CAMs), we determined how accurately the weakly-supervised DL algorithms localized the breast masses. In the internal validation set, the weakly-supervised DL algorithms achieved excellent diagnostic performance, with AUC values of 0.92–0.96, which were not statistically different (all Ps > 0.05) from those of the fully-supervised DL algorithms with either manual or automated ROI annotation (AUC, 0.92–0.96). In the external validation set, the weakly-supervised DL algorithms achieved AUC values of 0.86–0.90, which were either not statistically different from (Ps > 0.05) or higher than (P = 0.04, VGG16 vs. automated ROI annotation) those of the fully-supervised DL algorithms (AUC, 0.84–0.92). In both the internal and external validation sets, the weakly-supervised algorithms localized 100% of malignant masses, except for ResNet34 in the external set (98%). The weakly-supervised DL algorithms developed in the present study were feasible for US diagnosis of breast cancer, with well-performing localization and differential diagnosis.
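The abstract reports discrimination as the AUC but does not show how it is obtained from the networks' outputs. As a minimal sketch (not the authors' code), the empirical AUC can be computed from per-image probabilities of malignancy with the Mann–Whitney formulation: the fraction of malignant/benign pairs in which the malignant image scores higher, with ties counted as one half. The function name `auc_mann_whitney` and the six scores below are hypothetical illustrations, not study data.

```python
from typing import Sequence


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Empirical ROC AUC: probability that a random malignant case (label 1)
    scores higher than a random benign case (label 0); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical probabilities of malignancy for six masses (not study data).
pom = [0.95, 0.80, 0.40, 0.30, 0.60, 0.10]
truth = [1, 1, 1, 0, 0, 0]
print(round(auc_mann_whitney(pom, truth), 3))  # 0.889 (8 of 9 pairs ordered correctly)
```

The quadratic pairwise loop is chosen for clarity; for 200-image validation sets it is fast enough, and a rank-based formula gives the same value.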
Year: 2021 PMID: 34934144 PMCID: PMC8692405 DOI: 10.1038/s41598-021-03806-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Overview of the data acquisition.
Figure 2. Overview of the weakly-supervised and fully-supervised deep learning (DL) algorithms for breast mass classification and localization. The weakly-supervised DL algorithm does not require annotation of the lesion's region of interest (ROI), whereas the fully-supervised DL algorithm requires tumor segmentation (manual or automated) and ROI cropping before the images are fed to the classifiers. For the weakly-supervised DL algorithm, a class activation map (CAM) is generated to visualize the region detected by the algorithm, using a global average pooling (GAP) layer added after the final convolutional layer.
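The caption describes the CAM only in words. The sketch below is a minimal pure-Python illustration, assuming the standard CAM formulation (a GAP layer followed by a single linear classification layer); it shows how the map for one class is a weighted sum of the last convolutional feature maps, using that class's linear-layer weights. The toy feature maps and the weights `w_malignant` are hypothetical; a real VGG16, ResNet34, or GoogLeNet backbone has hundreds of channels, and the CAM is usually upsampled to the image size before display.

```python
from typing import List

# One H x W channel from the last convolutional layer.
FeatureMap = List[List[float]]


def global_average_pool(features: List[FeatureMap]) -> List[float]:
    """GAP: reduce each channel to the mean of its spatial activations."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in features
    ]


def class_score(features: List[FeatureMap], class_weights: List[float], bias: float = 0.0) -> float:
    """Pre-softmax score for one class: linear layer applied to the GAP vector."""
    pooled = global_average_pool(features)
    return sum(w * f for w, f in zip(class_weights, pooled)) + bias


def class_activation_map(features: List[FeatureMap], class_weights: List[float]) -> FeatureMap:
    """CAM for one class: channels summed with that class's linear-layer weights."""
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for weight, fmap in zip(class_weights, features):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * fmap[i][j]
    return cam


# Toy example: two 2 x 2 channels and hypothetical "malignant" weights.
features = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
w_malignant = [3.0, 0.5]

cam = class_activation_map(features, w_malignant)
print(cam)  # [[3.0, 0.0], [0.0, 1.0]]

# GAP and the linear layer are both linear, so the class score equals the CAM's mean (plus bias).
mean_cam = sum(sum(row) for row in cam) / 4
print(class_score(features, w_malignant), mean_cam)  # 1.0 1.0
```

The last two lines show why no pixel-level labels are needed: the same weights that produce the image-level score also spread that score over spatial positions.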
Baseline characteristics of the datasets.
| | Training | Internal validation | External validation |
|---|---|---|---|
| Patients | 818 | 167 | 125 |
| Images | 1000 | 200 | 200 |
| Patients with malignant masses | 500 | 100 | 88 |
| Images with malignant masses | 500 | 100 | 100 |
| Patients with benign masses | 346 | 68 | 39 |
| Images with benign masses | 500 | 100 | 100 |
| Age | | | |
| < 30 years | 16 (2%) | 3 (2%) | 12 (6%) |
| 30–50 years | 431 (53%) | 93 (56%) | 91 (46%) |
| ≥ 50 years | 371 (45%) | 71 (43%) | 97 (49%) |
| Tissue composition | | | |
| Homogeneous, fat | 20 (2%) | 5 (3%) | 14 (11%) |
| Homogeneous, fibroglandular | 956 (96%) | 191 (96%) | 69 (55%) |
| Heterogeneous | 24 (2%) | 4 (2%) | 42 (34%) |
| Mass size | | | |
| < 2 cm | 754 (75%) | 149 (75%) | 163 (82%) |
| 2–5 cm | 239 (24%) | 50 (25%) | 37 (19%) |
| ≥ 5 cm | 7 (1%) | 1 (1%) | 0 |
| US vendor | | | |
| Philips | 1000 (100%) | 200 (100%) | 16 (8%) |
| GE | 0 | 0 | 169 (85%) |
| Siemens | 0 | 0 | 15 (8%) |

\*Patient-level; \*\*image-level.
Diagnostic performance metrics of weakly-supervised and fully-supervised deep learning algorithms in internal validation set.
| | Weakly-supervised | Fully-supervised, manual ROI | Fully-supervised, automated ROI | P, weakly-supervised vs. manual | P, weakly-supervised vs. automated |
|---|---|---|---|---|---|
| VGG16 | 0.96 (0.92, 0.98) | 0.96 (0.93, 0.98) | 0.96 (0.92, 0.98) | 0.72 | 0.96 |
| ResNet34 | 0.92 (0.88, 0.96) | 0.94 (0.89, 0.97) | 0.92 (0.87, 0.95) | 0.57 | 0.76 |
| GoogLeNet | 0.94 (0.90, 0.97) | 0.96 (0.92, 0.98) | 0.95 (0.91, 0.98) | 0.30 | 0.65 |
| VGG16 | 87% | 85% | 87% | 0.77 | 1.00 |
| ResNet34 | 82% | 89% | 79% | 0.12 | 0.63 |
| GoogLeNet | 87% | 87% | 87% | 1.00 | 1.00 |
| VGG16 | 91% | 91% | 94% | 1.00 | 0.45 |
| ResNet34 | 91% | 90% | 92% | 1.00 | 1.00 |
| GoogLeNet | 94% | 92% | 92% | 0.69 | 0.69 |
95% confidence intervals in parentheses.
Diagnostic performance metrics of weakly-supervised and fully-supervised deep learning algorithms in external validation set.
| | Weakly-supervised | Fully-supervised, manual ROI | Fully-supervised, automated ROI | P, weakly-supervised vs. manual | P, weakly-supervised vs. automated |
|---|---|---|---|---|---|
| VGG16 | 0.89 (0.84, 0.93) | 0.91 (0.86, 0.95) | 0.85 (0.79, 0.89) | 0.28 | 0.04 |
| ResNet34 | 0.86 (0.81, 0.91) | 0.89 (0.84, 0.93) | 0.84 (0.78, 0.88) | 0.31 | 0.32 |
| GoogLeNet | 0.90 (0.85, 0.94) | 0.92 (0.87, 0.95) | 0.87 (0.82, 0.92) | 0.32 | 0.19 |
| VGG16 | 91% | 85% | 89% | 0.45 | 0.73 |
| ResNet34 | 78% | 89% | 81% | < 0.001 | 0.66 |
| GoogLeNet | 88% | 87% | 87% | 0.51 | 1.00 |
| VGG16 | 72% | 91% | 52% | 0.85 | < 0.001 |
| ResNet34 | 80% | 90% | 69% | < 0.05 | 0.07 |
| GoogLeNet | 76% | 92% | 63% | 0.85 | 0.04 |
95% confidence intervals in parentheses.
Figure 3. Classification results with class activation maps (CAMs) of the weakly-supervised deep learning (DL) algorithms on the external validation set. Examples of true-positive (A), false-negative (B), false-positive (C), and true-negative (D) results are shown for each network (VGG16, ResNet34, and GoogLeNet). (A) Ultrasound images show a 17-mm irregular, spiculated invasive ductal carcinoma, which was predicted as malignant with a probability of malignancy (POM) of 1.00, 1.00, and 0.999 by VGG16, ResNet34, and GoogLeNet, respectively. (B) Ultrasound images show an 11-mm oval, circumscribed, isoechoic mucinous carcinoma, which was predicted as benign with POMs of 0.007, 0.000, and 0.000, respectively. (C) Ultrasound images show a 29-mm oval, hypoechoic mass with macrocalcifications considered benign (unchanged during a 46-month follow-up period), which was predicted as malignant with POMs of 1.000, 0.994, and 1.000, respectively. (D) Ultrasound images show a 6-mm oval, circumscribed mass considered benign (unchanged during a 55-month follow-up period), which was predicted as benign with POMs of 0.434, 0.006, and 0.006, respectively.
Metrics for discriminative localization of benign and malignant breast masses in the weakly-supervised deep learning algorithm.
| | Internal validation, benign | Internal validation, malignant | Internal validation, P | External validation, benign | External validation, malignant | External validation, P |
|---|---|---|---|---|---|---|
| VGG16 | | | 1.00 | | | 1.00 |
| Correct | 99 | 100 | | 99 | 100 | |
| Incorrect | 1 | 0 | | 1 | 0 | |
| ResNet34 | | | 1.00 | | | 0.68 |
| Correct | 99 | 100 | | 96 | 98 | |
| Incorrect | 1 | 0 | | 4 | 2 | |
| GoogLeNet | | | N/A | | | 0.25 |
| Correct | 100 | 100 | | 97 | 100 | |
| Incorrect | 0 | 0 | | 3 | 0 | |
N/A = not applicable. For discriminative localization, we created binary images by applying a threshold of 0.3 to each class activation map (CAM) and compared them with the manual annotations. Discriminative localization was regarded as correct when the segmented area overlapped with the manually annotated area.
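The localization rule in the footnote can be sketched in a few lines. This is a hedged illustration, not the authors' pipeline: it assumes the CAM is min-max normalized to [0, 1] before the 0.3 threshold is applied, that pixels at or above the threshold count as foreground, and that the CAM has already been resized to the annotation grid; the source states none of these details. The helper names (`normalize`, `binarize`, `localization_correct`) and the 3 x 3 grids are hypothetical.

```python
from typing import List

Grid = List[List[float]]
Mask = List[List[bool]]


def normalize(cam: Grid) -> Grid:
    """Min-max scale a CAM to [0, 1] (an assumed step; the source does not say)."""
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    if hi == lo:
        return [[0.0 for _ in row] for row in cam]
    return [[(v - lo) / (hi - lo) for v in row] for row in cam]


def binarize(cam: Grid, threshold: float = 0.3) -> Mask:
    """Binary image: True where the normalized activation is at least the threshold."""
    return [[v >= threshold for v in row] for row in normalize(cam)]


def localization_correct(cam: Grid, manual_mask: Mask, threshold: float = 0.3) -> bool:
    """Correct if any pixel of the thresholded CAM falls inside the manual annotation."""
    binary = binarize(cam, threshold)
    return any(
        b and m
        for b_row, m_row in zip(binary, manual_mask)
        for b, m in zip(b_row, m_row)
    )


# Hypothetical 3 x 3 CAM, already resized to the annotation grid.
cam = [
    [0.1, 0.9, 0.2],
    [0.0, 0.5, 0.1],
    [0.0, 0.0, 0.0],
]
lesion = [
    [False, False, False],
    [False, True, False],
    [False, False, False],
]
elsewhere = [
    [False, False, False],
    [False, False, False],
    [True, False, False],
]

print(binarize(cam))  # [[False, True, False], [False, True, False], [False, False, False]]
print(localization_correct(cam, lesion))     # True
print(localization_correct(cam, elsewhere))  # False
```

Any overlap is enough under this rule, so a CAM that also lights up tissue outside the mass still counts as correct; a stricter criterion such as an intersection-over-union cutoff would be a different design choice.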