Mio Adachi1, Tomoyuki Fujioka2, Mio Mori2, Kazunori Kubota2,3, Yuka Kikuchi2, Wu Xiaotong2, Jun Oyama2, Koichiro Kimura2, Goshi Oda1, Tsuyoshi Nakagawa1, Hiroyuki Uetake1, Ukihide Tateishi2.
Abstract
We aimed to evaluate an artificial intelligence (AI) system that can detect and diagnose lesions on maximum intensity projections (MIPs) of dynamic contrast-enhanced (DCE) breast magnetic resonance imaging (MRI). We retrospectively gathered MIPs of DCE breast MRI for training and validation data from 30 and 7 normal individuals, 49 and 20 benign cases, and 135 and 45 malignant cases, respectively. Breast lesions were indicated with a bounding box and labeled as benign or malignant by a radiologist, and the AI system was trained with RetinaNet to detect lesions and calculate their possibility of malignancy. The AI system was evaluated on a test set of 13 normal, 20 benign, and 52 malignant cases. Four human readers also scored these test data, with and without the assistance of the AI system, for the possibility of malignancy in each breast. Using a cutoff value of 2%, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) were 0.926, 0.828, and 0.925 for the AI system; 0.847, 0.841, and 0.884 for human readers without AI; and 0.889, 0.823, and 0.899 for human readers with AI, respectively. The AI system showed better diagnostic performance than the human readers (p = 0.002), and the AUC of human readers was significantly higher with than without the assistance of the AI system (p = 0.039). Our AI system showed high performance in detecting and diagnosing lesions in MIPs of DCE breast MRI and increased the diagnostic performance of human readers.
Keywords: artificial intelligence; breast imaging; convolutional neural network; deep learning; magnetic resonance imaging; object detection
Year: 2020 PMID: 32443922 PMCID: PMC7277981 DOI: 10.3390/diagnostics10050330
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
Histopathology of benign and malignant lesions.
| Training: Benign | Training: Malignant | Validation: Benign | Validation: Malignant | Test: Benign | Test: Malignant |
|---|---|---|---|---|---|
| Fibroadenoma 8 | Ductal Carcinoma In Situ 22 | Fibroadenoma 4 | Ductal Carcinoma In Situ 2 | Fibroadenoma 4 | Ductal Carcinoma In Situ 3 |
| Papilloma 9 | Invasive Ductal Carcinoma 91 | Papilloma 3 | Invasive Ductal Carcinoma 29 | Papilloma 3 | Invasive Ductal Carcinoma 38 |
| Mastopathy 4 | Mucinous Carcinoma 3 | Mastopathy 3 | Mucinous Carcinoma 2 | Mastopathy 3 | Mucinous Carcinoma 1 |
| Benign Phyllodes Tumor 2 | Invasive Lobular Carcinoma 7 | Non-Specific Benign Lesion 2 | Invasive Lobular Carcinoma 5 | Non-Specific Benign Lesion 2 | Invasive Lobular Carcinoma 1 |
| Non-Specific Benign Lesion 4 | Apocrine Carcinoma 2 | Not Known * 8 | Apocrine Carcinoma 1 | Not Known * 8 | Apocrine Carcinoma 1 |
| Not Known * 22 | Malignant Phyllodes Tumor 1 | | Malignant Phyllodes Tumor 1 | | Malignant Phyllodes Tumor 2 |
| | Unclassifiable 9 | | Unclassifiable 5 | | Unclassifiable 6 |
*: Clinically diagnosed by observation, **: Diagnosed by fine needle aspiration.
Figure 1. Maximum intensity projection of dynamic contrast-enhanced breast magnetic resonance images. Representative images of labeled normal (a), benign (b), and malignant (c) breast lesions.
Figure 2. The RetinaNet architecture uses a Feature Pyramid Network [21] backbone on top of a feedforward ResNet architecture [22] (a) to generate a rich, multi-scale convolutional feature pyramid (b). To this backbone RetinaNet attaches two subnetworks: one for classifying anchor boxes (c) and one for regressing from anchor boxes to ground-truth object boxes (d). (Reprinted and adapted with permission from ICCV 2017 [20].)
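To give a sense of scale for the two subnetworks in the caption above, the sketch below counts how many anchor boxes a RetinaNet-style pyramid would produce. The 512 × 512 input size is a hypothetical value, and the pyramid levels (P3 to P7, strides 8 to 128) and 9 anchors per location follow the defaults of the original RetinaNet paper; none of these settings are confirmed for this study.

```python
import math

def anchors_per_level(image_size, strides=(8, 16, 32, 64, 128), anchors_per_location=9):
    """Count anchors on each pyramid level for a square input image.

    Assumed defaults (not taken from the study): levels P3-P7 and
    9 anchors (3 scales x 3 aspect ratios) at every feature-map location.
    """
    counts = {}
    for level, stride in zip(range(3, 3 + len(strides)), strides):
        side = math.ceil(image_size / stride)  # feature-map width and height
        counts[f"P{level}"] = side * side * anchors_per_location
    return counts

counts = anchors_per_level(512)   # 512 is a hypothetical input size
total_anchors = sum(counts.values())
print(counts, total_anchors)
```

Under these assumptions the classification subnetwork scores every one of these anchors per class, and the regression subnetwork predicts four box offsets for each.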
Figure 3. Learning curves showing data loss plotted against epochs. Training beyond 150 epochs resulted in low data loss.
Characteristics of patients and images.
| Dataset | Characteristic | Normal | Benign | Malignant |
|---|---|---|---|---|
| Training | Patients (n) | 30 | 49 | 135 |
| | Age (years), range | 38–72 | 38–74 | 26–86 |
| | Age (years), mean ± SD | 52.9 ± 11.1 | 46.3 ± 11.0 | 58.6 ± 12.8 |
| | Breasts (n) | 201 a | 88 | 139 |
| | Mass/Non-mass (n) | | 77/11 | 114/25 |
| | Size at MRI (mm), range | | 3–105 | 6–95 |
| | Size at MRI (mm), mean ± SD | | 14.9 ± 15.0 | 23.6 ± 16.9 |
| Validation | Patients (n) | 7 | 20 | 45 |
| | Age (years), range | 40–54 | 28–79 | 26–78 |
| | Age (years), mean ± SD | 46.0 ± 5.6 | 50.0 ± 12.5 | 55.6 ± 13.1 |
| | Breasts (n) | 64 a | 30 | 50 |
| | Mass/Non-mass (n) | | 22/8 | 38/12 |
| | Size at MRI (mm), range | | 3–62 | 6–123 |
| | Size at MRI (mm), mean ± SD | | 17.5 ± 15.2 | 35.0 ± 26.8 |
| Test | Patients (n) | 13 | 20 | 52 |
| | Age (years), range | 21–77 | 20–79 | 30–85 |
| | Age (years), mean ± SD | 47.2 ± 12.6 | 47.2 ± 11.1 | 58.3 ± 13.6 |
| | Breasts (n) | 92 a | 24 | 54 |
| | Mass/Non-mass (n) | | 19/5 | 47/7 |
| | Size at MRI (mm), range | | 5–33 | 7–106 |
| | Size at MRI (mm), mean ± SD | | 13.6 ± 8.8 | 25.6 ± 22.0 |
SD: Standard deviation. a: “Normal breasts” is the total number of bilateral breasts of normal patients and contralateral normal breasts of benign and malignant patients.
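The "normal breasts" counts in footnote a can be reproduced from the patient and breast counts in the table. The short check below is a sketch that assumes every patient contributed two breasts to the MIP; the function name is illustrative.

```python
def normal_breasts(patients, lesion_breasts):
    """Normal breasts = all breasts (2 per patient, an assumption)
    minus the breasts that carry a benign or malignant lesion."""
    return 2 * sum(patients) - sum(lesion_breasts)

# (normal, benign, malignant) patients and (benign, malignant) breasts per dataset,
# copied from the table above.
datasets = {
    "training":   {"patients": (30, 49, 135), "lesion_breasts": (88, 139)},
    "validation": {"patients": (7, 20, 45),   "lesion_breasts": (30, 50)},
    "test":       {"patients": (13, 20, 52),  "lesion_breasts": (24, 54)},
}

normal_counts = {name: normal_breasts(**d) for name, d in datasets.items()}
print(normal_counts)
```

The computed values match the 201, 64, and 92 normal breasts reported for the training, validation, and test sets.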
Diagnostic performance of human readers and the AI system.
| Reader | Sensitivity | Specificity | AUC (95% CI) |
|---|---|---|---|
| AI | 0.926 | 0.828 | 0.925 (0.878–0.971) |
| Reader 1 | 0.833 | 0.836 | 0.872 (0.811–0.933) |
| Reader 1 with AI system | 0.889 | 0.802 | 0.893 (0.837–0.948) |
| Reader 2 | 0.889 | 0.776 | 0.904 (0.849–0.959) |
| Reader 2 with AI system | 0.907 | 0.759 | 0.915 (0.863–0.967) |
| Reader 3 | 0.833 | 0.853 | 0.876 (0.816–0.936) |
| Reader 3 with AI system | 0.874 | 0.844 | 0.893 (0.847–0.956) |
| Reader 4 | 0.862 | 0.833 | 0.887 (0.829–0.945) |
| Reader 4 with AI system | 0.889 | 0.845 | 0.902 (0.847–0.956) |
| All Human Readers | 0.847 | 0.841 | 0.884 (0.854–0.920) |
| All Human Readers with AI system | 0.889 | 0.823 | 0.899 (0.872–0.929) |
Cutoff value was defined as 2%. AI: artificial intelligence; AUC: area under the receiver operating characteristic curve; CI: confidence interval.
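As an illustration of how the three metrics in this table relate to per-breast malignancy scores, the sketch below computes sensitivity and specificity at a 2% cutoff and a rank-based AUC. The scores and labels are made-up example values, not study data, and treating a score equal to the cutoff as positive is an assumption; the source does not state how ties at 2% were handled.

```python
def sensitivity_specificity(scores, labels, cutoff=2.0):
    """Scores are possibilities of malignancy in percent; labels are 1 (malignant) or 0.
    A breast is called positive when score >= cutoff (assumed convention)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def auc(scores, labels):
    """AUC as the probability that a malignant breast outscores a
    non-malignant one, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scores for ten breasts (not taken from the study).
scores = [0, 0, 1, 5, 60, 0, 30, 90, 99.9, 1]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

sens, spec = sensitivity_specificity(scores, labels)
area = auc(scores, labels)
print(sens, spec, area)
```

Sensitivity and specificity depend on the chosen cutoff, whereas the AUC summarizes ranking quality across all cutoffs, which is why the table can show a reader gaining sensitivity while losing some specificity at 2%.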
Figure 4. The ROC curve for the AI system compared with that of each human reader (a–d) and all four human readers (e). The AUCs of the human readers without the AI system were 0.872–0.904, and those of each human reader with the AI system were 0.893–0.915. The AUCs of the AI system, all four human readers without the AI system, and all four human readers with the AI system were 0.925, 0.884, and 0.899, respectively. The AI system showed better diagnostic performance compared with that of all four human readers without the AI system (p = 0.002), and the AUC of all four human readers using the AI system was significantly higher than that without the AI system (p = 0.039).
Statistical Analysis of the AUCs of Human Readers and AI.
| Reader without AI System | p-Value vs. AI System | p-Value vs. Reader with AI System |
|---|---|---|
| Reader 1 | 0.038 | 0.203 |
| Reader 2 | 0.414 | 0.200 |
| Reader 3 | 0.076 | 0.393 |
| Reader 4 | 0.143 | 0.203 |
| Readers | 0.002 | 0.039 |
The p-value was calculated by comparing AUCs. AI: artificial intelligence; AUC: area under the receiver operating characteristic curve.
Figure 5. True-negative (a) and true-positive cases (b, c) diagnosed by the AI system (left, original image; right, image diagnosed by AI system). The AI system did not respond to normal breasts (a). The AI system correctly detected and diagnosed invasive ductal carcinoma (IDC) of the right breast (b) and invasive ductal carcinoma of bilateral breasts (c).
Pathological features of false-positive and false-negative cases diagnosed by AI.
| No | Size (mm)/Mass or Non-Mass | Possibility of Malignancy (%): AI | Reader 1 | Reader 2 | Reader 3 | Reader 4 | Pathology |
|---|---|---|---|---|---|---|---|
| False Positive | |||||||
| 1 | 27 mm/mass | 66.2 | 10 | 0 | 0 | 30 | Observation |
| 2 | No findings | 57.9 | 0 | 10 | 0 | 0 | Normal |
| 3 | No findings | 90.0 | 0 | 0 | 1 | 0 | Normal |
| 4 | 19 mm/mass | 99.9 | 80 | 60 | 10 | 90 | Fibroadenoma |
| 5 | 9 mm/mass | 83.5 | 2 | 20 | 0 | 1 | IDP |
| 6 | 9 mm/mass | 97.7 | 60 | 20 | 0 | 10 | Observation |
| 7 | 14 mm/mass | 99.9 | 95 | 40 | 5 | 95 | Fibroadenoma |
| False Negative | |||||||
| 1 | 10 mm/mass | 0 | 0 | 50 | 0 | 90 | IDC |
| 2 | 11 mm/mass | 0 | 0 | 0 | 0 | 0 | IDC |
| 3 | 17 mm/non-mass | 0 | 0 | 0 | 0 | 0 | DCIS |
| 4 | 7 mm/mass | 0 | 50 | 10 | 1 | 20 | IDC |
AI: artificial intelligence; DCIS: ductal carcinoma in situ; IDC: invasive ductal carcinoma; IDP: intraductal papilloma.
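The four false-negative cases listed above are consistent with the AI sensitivity reported in the performance table. The check below is a sketch that assumes each false-negative row corresponds to one of the 54 malignant breasts in the test set.

```python
# Counts taken from the tables: 54 malignant breasts in the test set,
# 4 false-negative cases for the AI system (one breast each, an assumption).
malignant_breasts = 54
false_negatives = 4

true_positives = malignant_breasts - false_negatives
ai_sensitivity = round(true_positives / malignant_breasts, 3)
print(ai_sensitivity)
```

The result, 50 of 54 detected, rounds to the reported sensitivity of 0.926.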
Figure 6. False-positive cases (a, b) and false-negative case (c) diagnosed by the AI system (left, original image; right, image diagnosed by the AI system). The AI system mistakenly detected normal breasts (a) and fibroadenoma (b) and diagnosed them as malignant. The AI system failed to detect invasive ductal carcinoma near the right axilla. This may have been mistaken for an axillary lymph node (c).