Andrea Duggento1, Marco Aiello2, Carlo Cavaliere2, Giuseppe L Cascella3,4, Davide Cascella5, Giovanni Conte5, Maria Guerrisi1, Nicola Toschi1,6,7.
Abstract
Breast cancer is one of the most common cancers in women, with more than 1,300,000 cases and 450,000 deaths each year worldwide. Recent studies have shown that early detection, along with suitable treatment, could significantly reduce breast cancer death rates in the long term. X-ray mammography is still the instrument of choice in breast cancer screening. However, the false-positive and false-negative rates commonly achieved by radiologists are extremely difficult to estimate and control, although some authors have estimated figures of up to 20% of total diagnoses or more. The introduction of novel artificial intelligence (AI) technologies applied to the diagnosis and, possibly, prognosis of breast cancer could revolutionize the management of breast cancer patients by assisting the radiologist in clinical image interpretation. Lately, a breakthrough in the AI field has been brought about by the introduction of deep learning techniques in general and of convolutional neural networks (CNNs) in particular. Such techniques require no a priori feature space definition from the operator and are able to achieve classification performance which can even surpass that of human experts. In this paper, we design and validate an ad hoc CNN architecture specialized in breast lesion classification from imaging data only. We explore a total of 260 model architectures in a train-validation-test split in order to propose a model selection criterion which places the emphasis on reducing false negatives while still retaining acceptable accuracy. We achieve an area under the receiver operating characteristic curve of 0.785 (accuracy 71.19%) on the test set, demonstrating how an ad hoc random initialization architecture can and should be fine-tuned to a specific problem, especially in biomedical applications.
Year: 2019 PMID: 31249497 PMCID: PMC6556299 DOI: 10.1155/2019/5982834
Source DB: PubMed Journal: Contrast Media Mol Imaging ISSN: 1555-4309 Impact factor: 3.161
Summary of the annotations available for each image in the CBIS-DDSM dataset. As all these annotations are derived from the image, none of these features were used as input to our classifier.
| Patient_id | Anonymous alphanumeric code |
| Breast_density | 4 (153), 2 (757), 3 (449), 1 (337) |
| Left or right breast | Left (817), right (879) |
| Image view | CC (784), MLO (912) |
| Abnormality id | 1 (1570), 2 (84), 4 (10), 3 (28), 5 (2), 6 (2) (integer index used to label multiple lesions within the same image) |
| Abnormality type | Mass (1696) |
| Mass shape | Irregular (526), round (169), lobulated (399), oval (423), architectural_distortion (158), asymmetric_breast_tissue (26), lymph_node (45) |
| Mass margins | Focal_asymmetric_density (27), n/a (4), spiculated (407), circumscribed (455), ill_defined (472), obscured (308), microlobulated (143), n/a (60) |
| Assessment | 5 (374), 4 (702), 0 (162), 3 (364), 2 (91), 1 (3) |
| Pathology | Malignant (784), benign (771), benign_without_callback (141) |
| Subtlety | 5 (687), 4 (453), 2 (141), 3 (358), 1 (55), 0 (2) |
Figure 1. Workflow of our method. The original training set provided by CBIS-DDSM is further divided into a new “training set” and a “validation set.” The new training set is used to fit the model parameters, and the validation set is used to validate and compare the performance of each model on an unbiased set of images. The final model is chosen according to its performance on the validation set, and its performance is then quantified in an unbiased manner on the test set. Overall, the split was as follows: training set (1158 images), validation set (160 images), and test set (378 images).
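The split described in this caption can be sketched in a few lines. This is only an illustration under stated assumptions: the caption gives the set sizes but not how the validation images were drawn, so the image-level random shuffle, the seed, the function name `split_train_validation`, and the placeholder ids are all hypothetical.

```python
import random


def split_train_validation(image_ids, n_validation, seed=0):
    """Shuffle the ids reproducibly and hold out `n_validation` of them.

    Assumption: a plain image-level random split; the source does not say
    whether the split was stratified or grouped by patient.
    """
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    return ids[n_validation:], ids[:n_validation]


# Sizes taken from the caption; the ids themselves are placeholders.
original_training_ids = [f"img_{i:04d}" for i in range(1158 + 160)]
train_ids, val_ids = split_train_validation(original_training_ids, n_validation=160)

print(len(train_ids), len(val_ids))  # 1158 160
```

The 378 test images are left untouched by this step, so the test set plays no part in fitting or model selection.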
Figure 2. Example whole raw images and the ROIs extracted from them, which are then passed to image augmentation.
Figure 3. Example of a batch of 16 images from the training set. The ROI from which each image has been generated has been randomly rescaled (independently over the two axes), rotated by a random angle, randomly flipped, and resampled to fit into a pixel frame with aspect ratio 1. Any remaining area not filled by the image is padded with an array of pixels drawn from the edge of the image.
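The augmentation steps listed in this caption can be sketched with an inverse coordinate mapping: each pixel of the square output frame is mapped back into the ROI, and coordinates that fall outside the ROI are clamped to its border, which reproduces edge padding. This is a minimal pure-Python sketch, not the authors' implementation; the scale range (0.8 to 1.2), the nearest-neighbour resampling, the 50% flip probability, and the name `augment_roi` are assumptions made for illustration.

```python
import math
import random


def augment_roi(roi, out_size, rng):
    """Return an out_size x out_size augmented copy of `roi` (a 2D list)."""
    h, w = len(roi), len(roi[0])
    sx = rng.uniform(0.8, 1.2)               # assumed per-axis scale range
    sy = rng.uniform(0.8, 1.2)
    theta = rng.uniform(0.0, 2.0 * math.pi)  # random rotation angle
    flip = rng.random() < 0.5                # assumed flip probability
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Before rescaling, the longer ROI side spans the whole output frame.
    base = max(h, w) / out_size

    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            # Output pixel centre, relative to the frame centre, in ROI units.
            y = (i + 0.5 - out_size / 2) * base
            x = (j + 0.5 - out_size / 2) * base
            # Undo the rotation, then the per-axis scaling, then the flip.
            xr = cos_t * x + sin_t * y
            yr = -sin_t * x + cos_t * y
            xs, ys = xr / sx, yr / sy
            if flip:
                xs = -xs
            # Clamp to the ROI so outside areas repeat the nearest edge pixel.
            src_j = min(max(math.floor(xs + w / 2), 0), w - 1)
            src_i = min(max(math.floor(ys + h / 2), 0), h - 1)
            row.append(roi[src_i][src_j])
        out.append(row)
    return out


rng = random.Random(0)
toy_roi = [[r * 10 + c for c in range(6)] for r in range(4)]  # 4x6 placeholder ROI
batch = [augment_roi(toy_roi, out_size=8, rng=rng) for _ in range(16)]
print(len(batch), len(batch[0]), len(batch[0][0]))  # 16 8 8
```

A production pipeline would normally use an interpolating resampler from an image library rather than nearest-neighbour lookup; the point here is only the order of the geometric steps and the edge padding.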
Figure 4. Overall architecture of the model (adapted from [28]).
Figure 5. (a) Receiver operating characteristic (ROC) curves for a subsample of the architectures tested on the validation set (AUCs obtained on the validation set are shown in the legend). (b) ROC curves of our two best performing models (model 1: selected according to AUC on the validation set; model 2: selected according to the F2 statistic on the validation set) when evaluated on the test set.
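For readers who want to see what ranking architectures by validation AUC involves, the sketch below computes the AUC as the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign one (ties count one half). The labels, scores, and architecture names are toy placeholders, not values from the paper, and `roc_auc` is a hypothetical helper rather than the authors' code.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count positive/negative pairs ranked correctly; ties contribute 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy validation labels (1 = malignant) and scores for two hypothetical models.
validation_labels = [0, 0, 1, 1]
candidate_scores = {
    "arch_a": [0.10, 0.40, 0.35, 0.80],
    "arch_b": [0.60, 0.20, 0.30, 0.50],
}

aucs = {name: roc_auc(validation_labels, s) for name, s in candidate_scores.items()}
best_by_auc = max(aucs, key=aucs.get)
print(aucs, best_by_auc)
```

Selecting by the F2 statistic instead would replace `roc_auc` with a thresholded metric that weights recall more heavily than precision, which is how a second model with fewer false negatives can emerge from the same pool of candidates.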
Figure 6. Example images that are easy to classify: (a) image of a benign lesion that is easily categorized as a benign lesion (score 2.2 × 10⁻⁹ from model 1 on a scale from 0 to 1); (b) image of a malignant lesion that is easily categorized as a malignant lesion (score 1.0 from model 1 on a scale from 0 to 1).
Figure 7. Example images that are very difficult to classify: (a) image of a benign lesion that is falsely categorized as a malignant lesion (score 0.99992 from model 1 on a scale from 0 to 1); (b) image of a malignant lesion that is falsely categorized as a benign lesion (score 0.0133 from model 1 on a scale from 0 to 1).
Performance statistics for our best performing models as evaluated on the test set.
| Model | Accuracy | PPV (precision) | FDR | TPR (recall, sensitivity) | FNR (miss rate) | FPR (fall-out) | TNR (specificity) | F1 score | F2 score | F0.5 score |
| Model 1 | 71.19% | 59.80% | 40.20% | 84.40% | 15.60% | 37.56% | 62.44% | 70.00% | 77.98% | 63.50% |
| Model 2 | 55.93% | 47.40% | 52.60% | 97.16% | 2.84% | 71.36% | 28.64% | 63.72% | 80.30% | 52.81% |
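As a check on the table above, the F-beta score can be recomputed from the reported precision and recall using the standard definition F_β = (1 + β²)·P·R / (β²·P + R). With β = 1, 2, and 0.5, this reproduces the last three values in each row to two decimal places. The sketch below uses only figures from the table; the helper name `f_beta` is an illustrative choice.

```python
def f_beta(precision, recall, beta):
    """Standard F-beta score; beta > 1 weights recall more than precision."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# (precision, recall) pairs as reported in the two rows of the table.
reported = {
    "first row": (0.5980, 0.8440),
    "second row": (0.4740, 0.9716),
}

f_scores = {
    name: [round(100 * f_beta(p, r, beta), 2) for beta in (1, 2, 0.5)]
    for name, (p, r) in reported.items()
}
for name, values in f_scores.items():
    print(name, values)
```

Because F2 weights recall four times as heavily as precision in the denominator, the second row scores higher on F2 despite its lower accuracy, which matches the stated aim of reducing false negatives.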