
Dataset of breast mammography images with masses.

Mei-Ling Huang, Ting-Yu Lin.

Abstract

Among cancers, breast cancer is the second most common cause of death in women. Early detection and early treatment reduce breast cancer mortality. Mammography plays an important role in breast cancer screening because it can detect early breast masses or calcification regions. One drawback of mammography is that masses are more difficult to find in extremely dense breast tissue. We selected 106 breast mammography images with masses from the INbreast database. Through data augmentation, the number of images was increased to 7632. We then applied Convolutional Neural Network (CNN) models, including AlexNet, DenseNet, and ShuffleNet, to classify these images.
© 2020 Published by Elsevier Inc.


Keywords:  Breast density; Breast mammography images; Breast mass; Contrast limited adaptive histogram equalization; Data augmentation

Year:  2020        PMID: 32642525      PMCID: PMC7334406          DOI: 10.1016/j.dib.2020.105928

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Specifications Table

Value of the Data

Breast density affects the diagnosis of breast cancer. The dataset combines four breast densities with benign or malignant status to form eight groups of breast mammography images. The dataset can help physicians with early detection and treatment to reduce breast cancer mortality. The number of images in the dataset is increased through data augmentation, which allows a model to learn from images in different situations and at different angles so that it can classify new images more accurately. Different machine learning and deep learning algorithms can be used to model the data and predict the classification results.

Data description

The mammography images in the INbreast database were originally collected at Centro Hospitalar de S. Joao [CHSJ], Breast center, Porto. The database covers data collected from Aug. 2008 to July 2010 and contains 115 cases with a total of 410 images [1]. Among them, 90 cases were women with disease in both breasts. Four types of breast diseases are recorded in the database: Mass, Calcification, Asymmetries, and Distortions. The images have two views, craniocaudal (CC) and mediolateral oblique (MLO), and breast density is divided into four categories according to BI-RADS standards [2]: Entirely fat (Density 1), Scattered fibroglandular densities (Density 2), Heterogeneously dense (Density 3), and Extremely dense (Density 4). Images were saved in DICOM format in one of two sizes, 3328 × 4084 or 2560 × 3328 pixels [2]. Among the 410 mammograms in the INbreast database, the 106 images showing breast masses were selected for this study. Through data augmentation, the number of breast mammography images was increased to 7632. Fig. 1 presents examples of breast mammography images with masses for the four density categories with benign or malignant status: (a) Density 1 with breast mass Benign; (b) Density 1 with breast mass Malignant; (c) Density 2 with breast mass Benign; (d) Density 2 with breast mass Malignant; (e) Density 3 with breast mass Benign; (f) Density 3 with breast mass Malignant; (g) Density 4 with breast mass Benign; (h) Density 4 with breast mass Malignant. Compared with benign masses, malignant masses have more irregular shapes.
Fig. 1

Breast Masses for four density categories with benign or malignant status.


Experimental design, materials, and methods

Data collection

Each image was labeled with its corresponding breast density. The original images in the INbreast database are DICOM files, which we converted to PNG files using Matlab R2019a [3]. Combining the four breast density categories with benign or malignant status gives 8 categories in our classification task:

(a) Density1+Benign: breast density is 1 and breast mass is benign
(b) Density1+Malignant: breast density is 1 and breast mass is malignant
(c) Density2+Benign: breast density is 2 and breast mass is benign
(d) Density2+Malignant: breast density is 2 and breast mass is malignant
(e) Density3+Benign: breast density is 3 and breast mass is benign
(f) Density3+Malignant: breast density is 3 and breast mass is malignant
(g) Density4+Benign: breast density is 4 and breast mass is benign
(h) Density4+Malignant: breast density is 4 and breast mass is malignant

Table 1 displays the number of images selected from the INbreast dataset for each breast density with benign or malignant class labels. The numbers of images in categories (a) through (h) are 12, 30, 4, 32, 13, 8, 6, and 1, respectively.
Table 1

Number of images for breast density with benign and malignant class labels.

| | Category | Number |
|---|---|---|
| (a) | Density1+Benign | 12 |
| (b) | Density1+Malignant | 30 |
| (c) | Density2+Benign | 4 |
| (d) | Density2+Malignant | 32 |
| (e) | Density3+Benign | 13 |
| (f) | Density3+Malignant | 8 |
| (g) | Density4+Benign | 6 |
| (h) | Density4+Malignant | 1 |
| | Total | 106 |
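The eight class labels and their counts can be written down compactly. The following is a minimal sketch, not the authors' code; the variable names are illustrative, and the counts are copied from Table 1.

```python
from itertools import product

DENSITIES = (1, 2, 3, 4)
STATUSES = ("Benign", "Malignant")

# Labels in the same (a)-(h) order as Table 1:
# Density1+Benign, Density1+Malignant, Density2+Benign, ...
labels = [f"Density{d}+{s}" for d, s in product(DENSITIES, STATUSES)]

# Image counts per category, copied from Table 1.
counts = dict(zip(labels, [12, 30, 4, 32, 13, 8, 6, 1]))

total = sum(counts.values())
print(total)  # number of mass images selected from INbreast
```

The counts are strongly imbalanced: the largest category has 32 images and the smallest has 1.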

Pre-processing

Contrast limited adaptive histogram equalization (CLAHE) was applied to the original 106 images as a preprocessing step. Fig. 2 presents examples of breast mammography images with masses for the eight categories after CLAHE processing: (a) Density 1 with breast mass Benign; (b) Density 1 with breast mass Malignant; (c) Density 2 with breast mass Benign; (d) Density 2 with breast mass Malignant; (e) Density 3 with breast mass Benign; (f) Density 3 with breast mass Malignant; (g) Density 4 with breast mass Benign; (h) Density 4 with breast mass Malignant. Comparing Fig. 2 with Fig. 1, the mass location is clearer after CLAHE processing than in the original image. With the 106 original images and the 106 CLAHE-processed images, there are 106 × 2 = 212 images in total. Table 2 presents the number of images in the training and testing sets for the 8 categories after CLAHE processing.
Fig. 2

The image after CLAHE processing.

Table 2

Number of images before image augmentation.

| | Category | All | Training | Testing |
|---|---|---|---|---|
| 1 | Density1+Benign | 24 | 19 | 5 |
| 2 | Density1+Malignant | 60 | 48 | 12 |
| 3 | Density2+Benign | 8 | 6 | 2 |
| 4 | Density2+Malignant | 64 | 51 | 13 |
| 5 | Density3+Benign | 26 | 21 | 5 |
| 6 | Density3+Malignant | 16 | 13 | 3 |
| 7 | Density4+Benign | 12 | 10 | 2 |
| 8 | Density4+Malignant | 2 | 2 | 0 |
| | Total | 212 | 170 | 42 |
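A quick consistency check on Table 2 is sketched below with illustrative names. It confirms that training and testing counts add up to the per-category totals and reports the overall training share. The training count for Density3+Benign is taken as 21, the value consistent with both its row total (26) and the training column total (170).

```python
# (all, training, testing) per category, from Table 2.
table2 = {
    "Density1+Benign": (24, 19, 5),
    "Density1+Malignant": (60, 48, 12),
    "Density2+Benign": (8, 6, 2),
    "Density2+Malignant": (64, 51, 13),
    "Density3+Benign": (26, 21, 5),  # 21 is implied by the row and column totals
    "Density3+Malignant": (16, 13, 3),
    "Density4+Benign": (12, 10, 2),
    "Density4+Malignant": (2, 2, 0),
}

# Every row must satisfy training + testing == all.
for name, (n_all, n_train, n_test) in table2.items():
    assert n_train + n_test == n_all, name

# Column totals: (all, training, testing).
totals = tuple(sum(column) for column in zip(*table2.values()))
train_share = totals[1] / totals[0]

# Categories with no images in the testing set.
empty_test = [name for name, (_, _, n_test) in table2.items() if n_test == 0]
```

The training set holds roughly 80% of the images, and the testing set contains no Density4+Malignant images.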

Data augmentation

In addition to CLAHE, we further performed data augmentation with multi-angle rotation (θ = 30°, 60°, 90°, 120°, 150°, 180°, 210°, 240°, 270°, 300°, 330°), and then flipped the original image and the 11 rotated images horizontally and vertically. This not only increases the number of samples but also helps reduce overfitting. Fig. 3 shows an example of the original image, the rotated images, and the horizontally and vertically flipped images.
Fig. 3

Example of original image and images after data augmentation.
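The rotation-and-flip scheme can be sketched with Pillow as follows. This is an assumed implementation for illustration only: the article does not state the rotation direction, the interpolation method, or whether the canvas is expanded, so Pillow's defaults are used here, and the input is a blank stand-in image.

```python
from PIL import Image, ImageOps

# 0 degrees (the unrotated image) plus the 11 rotations 30, 60, ..., 330.
ANGLES = range(0, 360, 30)


def augment(image):
    """Return 36 variants: 12 orientations x (as-is, horizontal flip, vertical flip)."""
    variants = []
    for angle in ANGLES:
        # Image.rotate keeps the original canvas size by default (expand=False).
        rotated = image if angle == 0 else image.rotate(angle)
        variants.append(rotated)
        variants.append(ImageOps.mirror(rotated))  # horizontal (left-right) flip
        variants.append(ImageOps.flip(rotated))    # vertical (top-bottom) flip
    return variants


# Blank grayscale image used as a stand-in for a mammogram (hypothetical data).
sample = Image.new("L", (64, 80), color=128)
augmented = augment(sample)
print(len(augmented))
```

Each input image yields 12 × 3 = 36 output images under this scheme.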

The number of images after image augmentation is 7632. The numbers of images in the training and testing sets for the 8 categories after image augmentation are shown in Table 3.
Table 3

Number of images after image augmentation.

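| | Category | All | Training | Testing |
|---|---|---|---|---|
| 1 | Density1+Benign | 864 | 691 | 173 |
| 2 | Density1+Malignant | 2160 | 1728 | 432 |
| 3 | Density2+Benign | 288 | 230 | 58 |
| 4 | Density2+Malignant | 2304 | 1843 | 461 |
| 5 | Density3+Benign | 936 | 749 | 187 |
| 6 | Density3+Malignant | 576 | 461 | 115 |
| 7 | Density4+Benign | 432 | 346 | 86 |
| 8 | Density4+Malignant | 72 | 58 | 14 |
| | Total | 7632 | 6106 | 1526 |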
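The totals in Table 3 follow from the augmentation scheme by simple arithmetic. The sketch below uses illustrative variable names and the per-category counts from Table 2. Each image appears in 12 orientations (the original plus 11 rotations), and each orientation is kept as-is, flipped horizontally, and flipped vertically.

```python
n_rotations = 11
orientations = 1 + n_rotations          # original + 11 rotated copies = 12
variants_per_image = orientations * 3   # as-is, horizontal flip, vertical flip = 36

# "All" column of Table 2 (original + CLAHE images per category).
before = {
    "Density1+Benign": 24,
    "Density1+Malignant": 60,
    "Density2+Benign": 8,
    "Density2+Malignant": 64,
    "Density3+Benign": 26,
    "Density3+Malignant": 16,
    "Density4+Benign": 12,
    "Density4+Malignant": 2,
}

after = {name: n * variants_per_image for name, n in before.items()}
total_after = sum(after.values())
print(variants_per_image, total_after)
```

The per-category products reproduce the "All" column of Table 3, for example 24 × 36 = 864 for Density1+Benign, and the total is 212 × 36 = 7632.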
The dataset in this study was built for use with convolutional neural networks, including AlexNet, DenseNet, and ShuffleNet, for the classification of benign and malignant mammograms. Because different CNN models require different input sizes, we resized the original images from 3328 × 4084 and 2560 × 3328 pixels to 224 × 224 pixels for ShuffleNet and DenseNet, and to 227 × 227 pixels for AlexNet.
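The per-model resizing can be sketched with Pillow as below. The function name and the lookup table are illustrative, the interpolation method is left at Pillow's default because the article does not specify one, and the input is a blank stand-in image.

```python
from PIL import Image

# Input sizes (width, height) stated in the article for each model.
INPUT_SIZES = {
    "AlexNet": (227, 227),
    "DenseNet": (224, 224),
    "ShuffleNet": (224, 224),
}


def resize_for_model(image, model_name):
    """Resize an image to the square input size expected by the given model."""
    return image.resize(INPUT_SIZES[model_name])


# Blank grayscale image with one of the two source sizes (hypothetical data).
sample = Image.new("L", (2560, 3328))
alexnet_input = resize_for_model(sample, "AlexNet")
shufflenet_input = resize_for_model(sample, "ShuffleNet")
```

Resizing a non-square mammogram directly to a square changes its aspect ratio; whether the authors padded or cropped before resizing is not stated.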

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.
Subject: Medicine and Dentistry
Specific subject area: Radiology and imaging
Type of data: Raw and analyzed
How data were acquired: The data was obtained from Breast center in CHSJ, Porto.
Data format: PNG
Parameters for data collection: Among 410 mammograms in INbreast database, 106 images were breast mass and were selected in this study.
Description of data collection: Through data augmentation, the number of breast mammography images was increased to 7632 in this study.
Data source location: Centro Hospitalar de S. Joao [CHSJ], Breast center, Porto
Data accessibility: http://dx.doi.org/10.17632/x7bvzv6cvr.1