Deep Learning to Improve Breast Cancer Detection on Screening Mammography
Li Shen, Laurie R. Margolies, Joseph H. Rothstein, Eugene Fluder, Russell McBride, Weiva Sieh.
Abstract
The rapid development of deep learning, a family of machine learning techniques, has spurred much interest in its application to medical imaging problems. Here, we develop a deep learning algorithm that can accurately detect breast cancer on screening mammograms using an "end-to-end" training approach that efficiently leverages training datasets with either complete clinical annotation or only the cancer status (label) of the whole image. In this approach, lesion annotations are required only in the initial training stage, and subsequent stages require only image-level labels, eliminating the reliance on rarely available lesion annotations. Our all convolutional network method for classifying screening mammograms attained excellent performance in comparison with previous methods. On an independent test set of digitized film mammograms from the Digital Database for Screening Mammography (CBIS-DDSM), the best single model achieved a per-image AUC of 0.88, and four-model averaging improved the AUC to 0.91 (sensitivity: 86.1%, specificity: 80.1%). On an independent test set of full-field digital mammography (FFDM) images from the INbreast database, the best single model achieved a per-image AUC of 0.95, and four-model averaging improved the AUC to 0.98 (sensitivity: 86.7%, specificity: 96.1%). We also demonstrate that a whole image classifier trained using our end-to-end approach on the CBIS-DDSM digitized film mammograms can be transferred to INbreast FFDM images using only a subset of the INbreast data for fine-tuning and without further reliance on the availability of lesion annotations. These findings show that automatic deep learning methods can be readily trained to attain high accuracy on heterogeneous mammography platforms, and hold tremendous promise for improving clinical tools to reduce false positive and false negative screening mammography results. Code and model available at: https://github.com/lishen/end2end-all-conv.
Year: 2019 PMID: 31467326 PMCID: PMC6715802 DOI: 10.1038/s41598-019-48995-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Converting a patch classifier to an end-to-end trainable whole-image classifier using an all-convolutional design. The function f was first trained on patches and then refined on whole images. We evaluated whether removing the heatmap improved information flow from the bottom layers of the patch classifier to the top convolutional layers in the whole-image classifier. The magnifying glass shows an enlarged version of the heatmap. This figure is best viewed in color.
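The conversion in Figure 1 can be sketched in plain Python: a patch scorer f is applied at every location of a larger image, producing a grid of scores (the heatmap), which is then reduced to a single whole-image score. Everything below is illustrative — `patch_score` is a stand-in for the trained CNN, and global max pooling is one simple aggregation choice, not necessarily the paper's final design.

```python
def patch_score(patch):
    """Stand-in for the trained patch classifier f: mean intensity as a score."""
    return sum(sum(row) for row in patch) / (len(patch) * len(patch[0]))

def whole_image_score(image, patch_size, stride):
    """Apply f convolutionally over the image, then max-pool the score grid."""
    h, w = len(image), len(image[0])
    heatmap = []
    for i in range(0, h - patch_size + 1, stride):
        row = []
        for j in range(0, w - patch_size + 1, stride):
            patch = [r[j:j + patch_size] for r in image[i:i + patch_size]]
            row.append(patch_score(patch))
        heatmap.append(row)
    # Global max pooling: the most suspicious patch drives the image-level score.
    return max(max(row) for row in heatmap), heatmap

image = [[0.0] * 8 for _ in range(8)]
image[5][6] = 1.0  # a single bright "lesion" pixel
score, heatmap = whole_image_score(image, patch_size=4, stride=2)
```

Because every layer is convolutional, the same weights trained on small patches can be run on a full mammogram; only the top layers that reduce the heatmap to one score need to be learned from image-level labels.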
Accuracy of the Resnet50 and VGG16 patch classifiers on the independent test set.
| Model | Pretrained | Patch set | Accuracy | #Epochs |
|---|---|---|---|---|
| Resnet50 | N | S1 | 0.97 [0.96, 0.98] | 198 |
| Resnet50 | Y | S1 | 0.99 [0.98, 1.00] | 99 |
| Resnet50 | N | S10 | 0.63 [0.62, 0.64] | 24 |
| Resnet50 | Y | S10 | 0.89 [0.88, 0.90] | 39 |
| Resnet50 | Y | S1g | 0.76 [0.74, 0.79] | 84 |
| VGG16 | Y | S10 | 0.84 [0.83, 0.85] | 25 |
#Epochs indicates the epoch when the highest accuracy was reached in the validation set.
Figure 2. Confusion matrix analysis of 5-class patch classification for Resnet50 (a) and VGG16 (b) on the S10 test set. The matrices are normalized so that each row sums to one. This figure is best viewed in color.
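The row normalization described in the Figure 2 caption divides each row of the raw count matrix by its sum, so entry (i, j) becomes the fraction of true-class-i patches predicted as class j. A minimal sketch, with made-up counts:

```python
def normalize_rows(confusion):
    """Convert a count confusion matrix to per-true-class fractions."""
    return [[c / sum(row) for c in row] for row in confusion]

# Hypothetical 3-class counts (rows = true class, columns = predicted class).
counts = [
    [80, 10, 10],
    [5, 90, 5],
    [20, 20, 60],
]
normalized = normalize_rows(counts)  # each row now sums to 1
```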
Per-image AUCs of whole-image classifiers built on Resnet50 patch classifiers, on the independent test set.
| Patch set | Block 1 | Block 2 | AUC [95% CI] | A-AUC [95% CI] | #Epochs |
|---|---|---|---|---|---|
| S1 | [512-512-2048] × 1 | [512-512-2048] × 1 | 0.63 [0.58, 0.67] | NA | 35 |
| S1g | [512-512-2048] × 1 | [512-512-2048] × 1 | 0.83 [0.79, 0.86] | NA | 38 |
| S10 | [512-512-2048] × 1 | [512-512-2048] × 1 | 0.85 [0.82, 0.88] | 0.86 [0.83, 0.89] | 20 |
| S10 | [256-256-256] × 1 | [128-128-128] × 1 | 0.84 [0.81, 0.87] | 0.86 [0.82, 0.89] | 25 |
| S10 | [512-512-1024] × 2 | [512-512-1024] × 2 | 0.80 [0.76, 0.84] | NA | 47 |
| S10 | [64-64-256] × 2 | [128-128-512] × 2 | 0.81 [0.77, 0.85] | NA | 41 |

Variants that retain the heatmap; columns give the pooling size and the sizes of the two layers on top of the heatmap:

| Pool size | Layer 1 | Layer 2 | AUC [95% CI] | A-AUC [95% CI] | #Epochs |
|---|---|---|---|---|---|
| 5 × 5 | 64 | 32 | 0.74 [0.69, 0.78] | NA | 28 |
| 2 × 2 | 512 | 256 | 0.72 [0.67, 0.76] | NA | 47 |
| 1 × 1 | 2048 | 1024 | 0.65 [0.60, 0.69] | NA | 43 |
#Epochs indicates the epoch when the highest AUC was reached in the validation set.
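The AUC [95% CI] entries can be reproduced in spirit with a rank-based AUC plus bootstrap resampling over images. This is a common recipe, not necessarily the authors' exact interval procedure, and the toy labels and scores are invented:

```python
import random

def auc(labels, scores):
    """Mann-Whitney form of the AUC: probability that a random positive
    outscores a random negative (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap over images; resamples without both classes are skipped."""
    rng = random.Random(seed)
    idx = range(len(labels))
    stats = []
    for _ in range(n_boot):
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        ss = [scores[i] for i in sample]
        if 0 < sum(ys) < len(ys):
            stats.append(auc(ys, ss))
    stats.sort()
    return (stats[int(alpha / 2 * len(stats))],
            stats[int((1 - alpha / 2) * len(stats)) - 1])

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
point = auc(labels, scores)  # 0.75: three of four positive/negative pairs are ranked correctly
low, high = bootstrap_ci(labels, scores)
```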
Per-image AUCs of whole-image classifiers built on VGG16 patch classifiers, on the independent test set.
| Patch set | Block 1 | Block 2 | AUC [95% CI] | A-AUC [95% CI] | #Epochs |
|---|---|---|---|---|---|
| S10 | 512 × 3 | 512 × 3 | 0.81 [0.77, 0.84] | 0.82 [0.78, 0.85] | 91 |
| S10 | 128 × 1 | 64 × 1 | 0.84 [0.80, 0.87] | 0.86 [0.82, 0.89] | 142 |

Variants that retain the heatmap; columns give the pooling size and the sizes of the two layers on top of the heatmap:

| Pool size | Layer 1 | Layer 2 | AUC [95% CI] | A-AUC [95% CI] | #Epochs |
|---|---|---|---|---|---|
| 5 × 5 | 64 | 32 | 0.71 [0.66, 0.75] | NA | 26 |
| 2 × 2 | 512 | 256 | 0.68 [0.63, 0.73] | NA | 27 |
| 1 × 1 | 2048 | 1024 | 0.70 [0.65, 0.74] | NA | 50 |
#Epochs indicates the epoch when the highest AUC was reached in the validation set.
Figure 3. ROC curves for the four best individual models and the ensemble model on the CBIS-DDSM (a) and INbreast (b) test sets. This figure is best viewed in color.
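The four-model averaging reported in the abstract and plotted in Figure 3 amounts to averaging the per-image scores of the individual models before computing the ROC curve. A minimal sketch; the toy scores are made up:

```python
def ensemble_scores(model_scores):
    """Average per-image scores across models.
    model_scores: one list of per-image scores per model."""
    n_images = len(model_scores[0])
    return [sum(m[i] for m in model_scores) / len(model_scores)
            for i in range(n_images)]

# Hypothetical scores from four models on three images.
four_models = [
    [0.9, 0.2, 0.7],
    [0.8, 0.3, 0.6],
    [0.7, 0.1, 0.9],
    [0.6, 0.4, 0.8],
]
avg = ensemble_scores(four_models)  # approximately [0.75, 0.25, 0.75]
```

Averaging tends to cancel each model's uncorrelated errors, which is consistent with the ensemble AUC exceeding every single-model AUC in the results above.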
Figure 4. Saliency maps of TP (a), FP (b), and FN (c) image classifications. The outlines represent the regions of interest annotated by the radiologist and biopsy-confirmed to contain either malignant (blue) or benign (green) tissue. The red dots represent the gradients of the input image with respect to the cancer class output. The gradients were rescaled to be within [0, 1], and a low cutoff of 0.06 was used to remove background noise. Heatmaps (d) show the four non-background classes for the input image in (a). The colors of the heatmaps represent the activation values after ReLU. This figure is best viewed in color.
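The rescale-and-cutoff step described in the Figure 4 caption (min-max rescaling to [0, 1], then zeroing values below 0.06) is easy to sketch. The function name and the gradient values below are ours, for illustration only:

```python
def rescale_and_threshold(grads, cutoff=0.06):
    """Min-max rescale a 2-D gradient map to [0, 1] and zero out values
    below the cutoff to suppress background noise."""
    flat = [g for row in grads for g in row]
    g_min, g_max = min(flat), max(flat)
    span = (g_max - g_min) or 1.0  # guard against a constant map
    return [[(g - g_min) / span if (g - g_min) / span >= cutoff else 0.0
             for g in row] for row in grads]

grads = [[-0.2, 0.0], [0.3, 0.8]]
sal = rescale_and_threshold(grads)  # strongest gradient maps to 1.0; weakest is zeroed
```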
Figure 5. Representative examples of a digitized film mammogram from CBIS-DDSM and a digital mammogram from INbreast.
Transfer learning efficiency with different training set sizes assessed by the per-image AUC on the INbreast test set.
| #Patients | #Images | Resnet-Resnet | Resnet-VGG | VGG-VGG | VGG-Resnet |
|---|---|---|---|---|---|
| 20 | 79 | 0.92 | 0.88 | 0.87 | 0.89 |
| 30 | 117 | 0.93 | 0.94 | 0.93 | 0.90 |
| 40 | 159 | 0.93 | 0.95 | 0.93 | 0.93 |
| 50 | 199 | 0.94 | 0.95 | 0.94 | 0.93 |
| 60 | 239 | 0.95 | 0.95 | 0.95 | 0.94 |
| 72 (All) | 280 | 0.95 | 0.95 | 0.95 | 0.95 |
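The fine-tuning subsets above are drawn at the patient level (note the #Patients column), so all of a patient's images stay together and cannot leak between the fine-tuning and test sets. A sketch of patient-level sampling; the patient-to-image mapping below is made up:

```python
import random

def sample_patients(images_by_patient, n_patients, seed=0):
    """Pick n_patients patients and return all of their images together."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(images_by_patient), n_patients)
    return [img for p in chosen for img in images_by_patient[p]]

# Hypothetical mapping: 72 patients with two views each (real INbreast
# patients contribute varying numbers of images).
images_by_patient = {f"p{i:02d}": [f"p{i:02d}_L", f"p{i:02d}_R"]
                     for i in range(72)}
subset = sample_patients(images_by_patient, 20)  # images for 20 patients
```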