Qiyuan Hu, Heather M Whitney, Maryellen L Giger.
Abstract
Multiparametric magnetic resonance imaging (mpMRI) has been shown to improve radiologists' performance in the clinical diagnosis of breast cancer. This machine learning study develops a deep transfer learning computer-aided diagnosis (CADx) methodology to diagnose breast cancer using mpMRI. The retrospective study included clinical MR images of 927 unique lesions from 616 women. Each MR study included a dynamic contrast-enhanced (DCE)-MRI sequence and a T2-weighted (T2w) MRI sequence. A pretrained convolutional neural network (CNN) was used to extract features from the DCE and T2w sequences, and support vector machine classifiers were trained on the CNN features to distinguish between benign and malignant lesions. Three methods that integrate the sequences at different levels (image fusion, feature fusion, and classifier fusion) were investigated. Classification performance was evaluated using the receiver operating characteristic (ROC) curve and compared using the DeLong test. The single-sequence classifiers yielded areas under the ROC curves (AUCs) [95% confidence intervals] of AUC_DCE = 0.85 [0.82, 0.88] and AUC_T2w = 0.78 [0.75, 0.81]. The multiparametric schemes yielded AUC_ImageFusion = 0.85 [0.82, 0.88], AUC_FeatureFusion = 0.87 [0.84, 0.89], and AUC_ClassifierFusion = 0.86 [0.83, 0.88]. The feature fusion method performed statistically significantly better than DCE alone (P < 0.001). In conclusion, the proposed deep transfer learning CADx method for mpMRI may improve diagnostic performance by reducing the false positive rate and improving the positive predictive value in breast imaging interpretation.
Year: 2020 PMID: 32601367 PMCID: PMC7324398 DOI: 10.1038/s41598-020-67441-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Distribution of slice thickness and in-plane resolution of the dynamic contrast-enhanced (DCE) sequences and T2-weighted (T2w) sequences in the multiparametric MRI database.
Clinical characteristics of the dataset.
| Group | Characteristic | Value |
|---|---|---|
| All lesions | Benign/malignant prevalence | Benign: 199 (21.5) |
| | | Malignant: 728 (78.5) |
| Patients | Age (years): mean ± standard deviation | 55.0 ± 12.8 |
| | | Unknown: 97 |
| Benign lesions | Lesion size (mm) | Mean: 8.86 |
| | | Median: 7.33 |
| | | Range: 3.38–42.8 |
| | Lesion subtypes | Fibroadenoma: 60 (30.2) |
| | | Columnar change: 15 (7.5) |
| | | Papilloma: 13 (6.5) |
| | | Parenchyma tissue: 12 (6.0) |
| | | Fibrotic tissue: 10 (5.0) |
| | | Hyperplasia: 8 (4.0) |
| | | Cystic change: 6 (3.0) |
| | | Fat necrosis: 5 (2.5) |
| | | Other: 27 (13.6) |
| | | Unknown: 43 (21.6) |
| Malignant lesions | Lesion size (mm) | Mean: 17.9 |
| | | Median: 14.9 |
| | | Range: 3.37–73.7 |
| | Lesion subtypes | IDC: 147 (20.2) |
| | | DCIS: 120 (16.5) |
| | | IDC + DCIS: 359 (49.3) |
| | | ILC: 31 (4.3) |
| | | ILC mixed: 26 (3.6) |
| | | Other: 33 (4.5) |
| | | Unknown: 12 (1.6) |
| | Estrogen receptor status | Positive: 410 (56.3) |
| | | Negative: 128 (17.6) |
| | | Unknown: 190 (26.1) |
| | Progesterone receptor status | Positive: 352 (48.4) |
| | | Negative: 184 (25.3) |
| | | Unknown: 192 (26.4) |
| | HER-2 status | Positive: 87 (12.0) |
| | | Negative: 404 (55.5) |
| | | Equivocal: 5 (0.7) |
| | | Unknown: 232 (31.9) |
Numbers in parentheses are percentages. Patient age is summarized on a patient basis, and lesion information (malignancy status and subtypes) is summarized on a lesion basis.
For some subjects, only the decade of age was available (e.g., "60s") as part of the patient information deidentification process. In these cases, the middle of the decade was used to calculate the mean subject age. Lesion size is measured by the effective diameter, i.e., the diameter of a sphere with the same volume as the lesion.
IDC, invasive ductal carcinoma; DCIS, ductal carcinoma in situ; ILC, invasive lobular carcinoma; HER-2, human epidermal growth factor receptor 2.
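The effective diameter defined in the table footnote follows from the sphere volume formula V = (π/6)·d³. The sketch below is only an illustration of that relationship; the function names are hypothetical and the lesion volume is assumed to be available in cubic millimetres.

```python
import math


def effective_diameter_mm(volume_mm3: float) -> float:
    """Diameter (mm) of the sphere whose volume equals the lesion volume."""
    if volume_mm3 < 0:
        raise ValueError("volume must be non-negative")
    # V = (pi / 6) * d**3  =>  d = (6 * V / pi) ** (1/3)
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)


def sphere_volume_mm3(diameter_mm: float) -> float:
    """Inverse relation, used here only to check the round trip."""
    return math.pi / 6.0 * diameter_mm ** 3


# Round trip with the mean malignant lesion size reported in the table.
volume = sphere_volume_mm3(17.9)
print(round(effective_diameter_mm(volume), 2))
```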
Figure 2. Lesion classification pipeline based on diagnostic images. Information from dynamic contrast-enhanced (DCE) and T2-weighted (T2w) MRI sequences is incorporated in three different ways: image fusion (fusing DCE and T2w images to create an RGB composite image), feature fusion (merging convolutional neural network features extracted from DCE and T2w as the support vector machine (SVM) classifier input), and classifier fusion (aggregating the probability of malignancy output from the DCE and T2w classifiers via soft voting). MIP, maximum intensity projection; ROI, region of interest; ROC, receiver operating characteristic.
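Two of the fusion levels in the pipeline can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, "merging" features is assumed to mean concatenating the two feature vectors, and soft voting is taken as the mean of the two probability of malignancy scores, which is how the Bland–Altman caption in this document describes the classifier fusion score.

```python
def feature_fusion(dce_features, t2w_features):
    """Assumed merge rule: concatenate the per-lesion CNN feature vectors.

    The result would serve as the input vector of a single classifier.
    """
    return list(dce_features) + list(t2w_features)


def classifier_fusion(pm_dce, pm_t2w):
    """Soft voting: average the two probability-of-malignancy scores."""
    return 0.5 * (pm_dce + pm_t2w)


# Hypothetical values for one lesion.
fused_vector = feature_fusion([0.12, 0.80], [0.33, 0.05, 0.41])
fused_score = classifier_fusion(0.9, 0.5)
print(len(fused_vector), fused_score)
```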
Figure 3. Example input images. A dynamic contrast-enhanced (DCE)-MRI transverse second post-contrast subtraction maximum intensity projection (MIP) and a T2-weighted (T2w)-MRI transverse center slice are shown with their corresponding regions of interest (ROIs) extracted. The RGB fusion ROI is created by inputting the DCE ROI into the red channel and the T2w ROI into the green channel.
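The channel assignment described in the caption can be illustrated as follows. This is a sketch under stated assumptions: ROIs are represented as equally sized 2D lists of intensities, the function name is hypothetical, and the blue channel is assumed to be left at zero because the caption does not say how it is filled.

```python
def rgb_fusion(dce_roi, t2w_roi):
    """Build an RGB ROI with DCE in red and T2w in green.

    The blue channel is set to 0 here; that choice is an assumption.
    """
    same_rows = len(dce_roi) == len(t2w_roi)
    if not same_rows or any(len(r) != len(g) for r, g in zip(dce_roi, t2w_roi)):
        raise ValueError("DCE and T2w ROIs must have the same shape")
    return [
        [(red, green, 0) for red, green in zip(dce_row, t2w_row)]
        for dce_row, t2w_row in zip(dce_roi, t2w_roi)
    ]


# Tiny 2 x 2 example with made-up 8-bit intensities.
fused = rgb_fusion([[10, 20], [30, 40]], [[5, 6], [7, 8]])
print(fused[0][1])
```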
Figure 4. Fitted binormal receiver operating characteristic (ROC) curves for two single-sequence and three mpMRI classifiers using (i) convolutional neural network (CNN) features extracted from dynamic contrast-enhanced (DCE) subtraction maximum intensity projections (MIPs), (ii) CNN features extracted from T2-weighted (T2w) center slices, (iii) CNN features extracted from DCE and T2w fusion images, (iv) ensemble of features extracted from DCE and T2w images, and (v) probability of malignancy outputs from the DCE MIP and T2w classifiers aggregated via soft voting. The legend gives the area under the ROC curve (AUC) with standard error (SE) for each classifier scheme. T2w images were rescaled to match the in-plane resolution of their corresponding DCE sequences, but image registration was not performed.
Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and area under the receiver operating characteristic curve (AUC) along with the 95% confidence interval (CI) for AUC for each classifier.
| Classifier | DCE | T2w | Image fusion | Feature fusion | Classifier fusion |
|---|---|---|---|---|---|
| AUC [95% CI] | 0.85 [0.82, 0.88] | 0.78 [0.75, 0.81] | 0.85 [0.82, 0.88] | 0.87 [0.84, 0.89] | 0.86 [0.83, 0.88] |
| Sensitivity (%) | 75.9 | 69.8 | 76.5 | 77.9 | 77.6 |
| Specificity (%) | 76.5 | 72.7 | 77.1 | 78.5 | 77.1 |
| PPV (%) | 89.7 | 87.3 | 90.0 | 90.7 | 90.1 |
| NPV (%) | 54.2 | 47.3 | 55.0 | 56.9 | 56.2 |
Sensitivity and specificity presented are for the optimal operating point determined using a metric for cut-off value that minimizes m = (1 − sensitivity) + (1 − specificity).
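The cut-off rule in the footnote can be sketched as a search over candidate thresholds. This is an illustration under assumptions rather than the study's code: the function name is hypothetical, labels are coded 1 for malignant and 0 for benign, a lesion is called malignant when its score is at or above the threshold, and the candidate thresholds are the observed scores.

```python
def optimal_operating_point(scores, labels):
    """Return (threshold, sensitivity, specificity) minimizing
    m = (1 - sensitivity) + (1 - specificity)."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one benign and one malignant case")

    best = None
    for threshold in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        true_neg = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
        sensitivity = true_pos / positives
        specificity = true_neg / negatives
        m = (1 - sensitivity) + (1 - specificity)
        # Keep the first threshold that reaches the smallest m.
        if best is None or m < best[0]:
            best = (m, threshold, sensitivity, specificity)
    return best[1], best[2], best[3]


# Made-up scores for four lesions (two benign, two malignant).
print(optimal_operating_point([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))
```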
Performance comparison for the five classifiers.
| Classifier | DCE MIP | T2w center slice |
|---|---|---|
| Image fusion | 95% CI ∆AUC = [−0.01, 0.02] | 95% CI ∆AUC = [0.05, 0.09] |
| Feature fusion | 95% CI ∆AUC = [0.01, 0.03] | 95% CI ∆AUC = [0.06, 0.11] |
| Classifier fusion | 95% CI ∆AUC = [−0.00, 0.02] | 95% CI ∆AUC = [0.06, 0.09] |
The classifier names are shown in the first row (single-parametric) and first column (multiparametric). P-value and 95% confidence interval (CI) of the difference in area under the receiver operating characteristic curves (AUCs) for each comparison are presented in the table, where each multiparametric classifier was compared with each single-parametric classifier using the DeLong test. P-values were corrected for multiple comparisons using Bonferroni–Holm corrections. Asterisks denote significance (P < 0.05) after accounting for multiple comparisons.
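The Bonferroni–Holm step-down correction mentioned in the footnote can be sketched as below. The function name is hypothetical and the p-values in the example are invented for illustration; they are not the study's values.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Invented p-values for three comparisons.
adjusted = holm_adjust([0.01, 0.04, 0.03])
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```

A comparison would then be marked significant when its adjusted p-value falls below 0.05, which matches the threshold stated for the asterisks.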
Figure 5. A diagonal classifier agreement plot between the T2-weighted (T2w) and dynamic contrast-enhanced (DCE) single-sequence classifiers. The x-axis and y-axis denote the probability of malignancy (PM) scores predicted by the DCE classifier and the T2w classifier, respectively. Each point represents a lesion for which predictions were made. Points along or near the diagonal from bottom left to top right indicate high classifier agreement; points far from the diagonal indicate low agreement. Examples of lesions on which the two classifiers were in extreme agreement/disagreement are also included.
Figure 6. Bland–Altman plot illustrating classifier agreement between the dynamic contrast-enhanced (DCE) maximum intensity projection and T2-weighted (T2w)-based single-sequence classifiers. The y-axis shows the difference between the support vector machine output scores (predicted posterior probabilities of malignancy) of the two classifiers; the x-axis shows the mean of the two classifiers' outputs, which is also the probability of malignancy score calculated in the classifier fusion method.
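The quantities plotted in a Bland–Altman analysis can be computed as in the sketch below. Several details are assumptions not stated in the caption: the function name is hypothetical, the difference is taken as DCE minus T2w, and the limits of agreement use the conventional mean difference ± 1.96 standard deviations.

```python
import statistics


def bland_altman(pm_dce, pm_t2w):
    """Return per-lesion means and differences, the bias, and the limits of agreement."""
    if len(pm_dce) != len(pm_t2w) or len(pm_dce) < 2:
        raise ValueError("need two equally long score lists with at least 2 lesions")
    means = [(a + b) / 2 for a, b in zip(pm_dce, pm_t2w)]  # x-axis values
    diffs = [a - b for a, b in zip(pm_dce, pm_t2w)]        # y-axis values (assumed DCE - T2w)
    bias = statistics.mean(diffs)
    spread = statistics.stdev(diffs)
    limits = (bias - 1.96 * spread, bias + 1.96 * spread)
    return means, diffs, bias, limits


# Made-up probability-of-malignancy scores for three lesions.
means, diffs, bias, limits = bland_altman([0.9, 0.6, 0.2], [0.7, 0.6, 0.4])
print(means, bias, limits)
```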