| Literature DB >> 33052172 |
Mehedi Masud, Amr E. Eldin Rashed, M. Shamim Hossain.
Abstract
Breast cancer is the most prevalent cancer in the world, affecting millions of women each year, and it causes the largest number of cancer deaths among women. In recent years, researchers have proposed various convolutional neural network (CNN) models to facilitate the diagnosis of breast cancer, and CNNs have shown promising results in classifying cancers from image datasets. However, no standard model can yet claim to be the best, because large datasets for training and validating such models are unavailable. Researchers are therefore leveraging transfer learning, using pre-trained models, trained on millions of diverse images, as feature extractors. With this motivation, this paper evaluates eight fine-tuned pre-trained models on the classification of breast cancer from ultrasound images. We also propose a shallow custom convolutional neural network that outperforms the pre-trained models on several performance metrics. The proposed model achieves 100% accuracy and an AUC of 1.0, whereas the best pre-trained model achieves 92% accuracy and an AUC of 0.972. To avoid bias, the model is trained using fivefold cross-validation. Moreover, the proposed model trains faster than the pre-trained models and requires a small number of trainable parameters. Grad-CAM heat-map visualization also shows how well the proposed model extracts the features needed to classify breast cancers. © Springer-Verlag London Ltd., part of Springer Nature 2020.
Keywords: Breast cancer; Convolutional neural network; Deep learning; Transfer learning
Year: 2020 PMID: 33052172 PMCID: PMC7545025 DOI: 10.1007/s00521-020-05394-5
Source DB: PubMed Journal: Neural Comput Appl ISSN: 0941-0643 Impact factor: 5.102
Datasets and distribution of classes
| Datasets | Benign | Malignant | Normal | Total samples |
|---|---|---|---|---|
| Dataset 1 | 100 | 150 | – | 250 |
| Dataset 2 | 437 | 210 | 133 | 780 |
| Total | 537 | 360 | 133 | 1030 |
Fig. 1 Sample ultrasound images of different cases in the two datasets
Data distribution in fivefolds after applying fivefold cross validation
| Fold | Train: Benign | Train: Malignant | Train: Normal | Test: Benign | Test: Malignant | Test: Normal |
|---|---|---|---|---|---|---|
| Fold 1 | 429 | 288 | 107 | 108 | 72 | 26 |
| Fold 2 | 429 | 288 | 107 | 108 | 72 | 26 |
| Fold 3 | 430 | 288 | 106 | 107 | 72 | 27 |
| Fold 4 | 430 | 288 | 106 | 107 | 72 | 27 |
| Fold 5 | 430 | 288 | 106 | 107 | 72 | 27 |
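The per-fold counts above follow from splitting each class almost evenly across the five folds. A minimal sketch of that bookkeeping (pure Python, round-robin assignment of remainders; not the authors' code, so which folds receive the extra samples may differ from the table):

```python
# Class sizes are taken from the combined datasets above:
# 537 benign, 360 malignant, 133 normal.
def stratified_fold_sizes(class_counts, k=5):
    """Return the per-fold test-set size for each class.

    Each class is divided into k near-equal parts; the remainder
    is spread one sample at a time over the first folds.
    """
    folds = [{c: 0 for c in class_counts} for _ in range(k)]
    for cls, n in class_counts.items():
        base, extra = divmod(n, k)
        for i in range(k):
            folds[i][cls] = base + (1 if i < extra else 0)
    return folds

sizes = stratified_fold_sizes({"benign": 537, "malignant": 360, "normal": 133})
```

Summing any class over the five folds recovers the totals (537, 360, 133), and the per-fold sizes match the table: 107–108 benign, exactly 72 malignant, and 26–27 normal test samples per fold.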
Pre-trained models and their image input size
| Pre-Trained models | Number of convolutional layers | Input size |
|---|---|---|
| AlexNet | 8 | 227 × 227 |
| DarkNet19 | 19 | 256 × 256 |
| GoogleNet | 22 | 224 × 224 |
| MobileNet-v2 | 53 | 224 × 224 |
| ResNet18 | 18 | 224 × 224 |
| ResNet50 | 50 | 224 × 224 |
| VGG16 | 16 | 224 × 224 |
| Xception | 71 | 299 × 299 |
Fig. 2 The architecture of the custom model
Training parameters of the pre-trained models
| Training options | Value |
|---|---|
| Initial learning rate | 1 × 10⁻⁴ |
| Learning rate schedule | Constant or piecewise |
| Mini-batch size | 8 |
| Shuffle | Every epoch |
| Optimizer | Adam, RMSprop, SGDM |
| Cross-validation folds | 5 |
| Max epochs | 20 |
| Execution environment | GPU |
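The "piecewise" schedule in the table drops the learning rate by a constant factor at fixed epoch intervals. A sketch of that rule (the drop interval and factor below are illustrative assumptions, not values reported in the paper):

```python
def piecewise_lr(initial_lr, epoch, drop_every=10, factor=0.1):
    """Piecewise-constant schedule: scale the rate down by `factor`
    every `drop_every` epochs (interval and factor are illustrative)."""
    return initial_lr * factor ** (epoch // drop_every)

# With the reported initial rate of 1e-4 over the 20 training epochs:
rates = [piecewise_lr(1e-4, e) for e in range(20)]
```

With these assumed parameters the rate stays at 1 × 10⁻⁴ for epochs 0–9 and drops to 1 × 10⁻⁵ for epochs 10–19; the "constant" option simply keeps `factor = 1`.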
Evaluation results of the pre-trained models and the custom model
| Models | Optimizer | AUC | Accuracy | Sensitivity | Specificity | Precision | Recall |
|---|---|---|---|---|---|---|---|
| AlexNet | SGDM | 0.967 | 89.71 | 90.88 | 88.44 | 89.54 | 90.88 |
| | Adam | 0.943 | 84.85 | 85.85 | 83.77 | 85.21 | 85.85 |
| | RMSprop | 0.919 | 82.62 | 81.38 | 83.98 | 84.69 | 81.38 |
| DarkNet19 | SGDM | 0.948 | 87.18 | 88.27 | 86 | 87.29 | 88.27 |
| | Adam | 0.958 | 88.35 | 88.27 | 88.44 | 89.27 | 88.27 |
| | RMSprop | 0.959 | 88.93 | 89.57 | 88.24 | 89.24 | 89.57 |
| GoogleNet | SGDM | 0.959 | 88.93 | 91.25 | 86.41 | 87.97 | 91.25 |
| | Adam | 0.966 | 90.49 | 93.11 | 87.63 | 89.13 | 93.11 |
| | RMSprop | 0.96 | 91.26 | 93.85 | 88.44 | 89.84 | 93.85 |
| MobileNet | SGDM | 0.944 | 85.73 | 85.1 | 86.41 | 87.21 | 85.1 |
| | Adam | 0.969 | 90.19 | 89.01 | 91.48 | 91.92 | 89.01 |
| | RMSprop | 0.961 | 90.68 | 90.32 | 91.08 | 91.68 | 90.32 |
| ResNet18 | SGDM | 0.948 | 87.67 | 89.94 | 85.19 | 86.87 | 89.94 |
| | Adam | 0.967 | 90 | 90.13 | 89.86 | 90.64 | 90.13 |
| | RMSprop | 0.97 | 91.26 | 90.32 | 92.29 | 92.73 | 90.32 |
| ResNet50 | SGDM | 0.952 | 87.77 | 83.46 | 88.41 | 51.63 | 83.46 |
| | Adam | 0.971 | 92.04 | 93.23 | 91.86 | 62.94 | 93.23 |
| | RMSprop | 0.971 | 91.84 | 90.23 | 92.08 | 62.83 | 90.23 |
| VGG16 | SGDM | 0.965 | 89.9 | 91.99 | 87.63 | 89.01 | 91.99 |
| | Adam | 0.972 | 89.9 | 92.92 | 86.61 | 88.32 | 92.92 |
| | RMSprop | 0.955 | 90.29 | 92.92 | 87.42 | 88.95 | 92.92 |
| Xception | SGDM | 0.924 | 83.11 | 83.8 | 82.35 | 83.8 | 83.8 |
| | Adam | 0.968 | 90.68 | 89.94 | 91.48 | 92 | 89.94 |
| | RMSprop | 0.968 | 91.17 | 91.43 | 90.87 | 91.6 | 91.43 |
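The accuracy, sensitivity, specificity, and precision columns above all derive from confusion-matrix counts. A minimal sketch of those definitions (the counts in the usage line are made up for illustration, not taken from the paper):

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics from true/false positive/negative counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy":    (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # identical to recall
        "specificity": tn / (tn + fp),
        "precision":   tp / (tp + fp),
    }

m = binary_metrics(tp=90, fp=10, tn=80, fn=20)  # illustrative counts only
```

Note that sensitivity and recall are the same quantity, which is why those two columns agree row for row in the table above.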
Comparison results of the best pre-trained models and the custom model
| Performance metric | Pre-trained: score | Pre-trained: model | Pre-trained: optimizer | Custom: score | Custom: optimizer |
|---|---|---|---|---|---|
| AUC | 0.972 | VGG16 | Adam | 1.0 | Adam, RMSprop |
| Accuracy | 92.04 | ResNet50 | Adam | 100 | Adam |
| Sensitivity | 93.85 | GoogleNet | RMSprop | 100 | SGDM, Adam, RMSprop |
| Specificity | 92.29 | ResNet18 | RMSprop | 100 | Adam |
| Precision | 92.73 | ResNet50 | RMSprop | 100 | Adam |
| Recall | 93.85 | GoogleNet | RMSprop | 100 | Adam |
Fig. 3 Confusion matrices of different pre-trained models and the custom model
Comparison of classification results (the custom model (Adam optimizer) and best pre-trained models)
| Models | Benign (537 original cases) | Malignant (360 original cases) | Normal (133 original cases) |
|---|---|---|---|
| Custom model | 537 | 360 | 133 |
| GoogleNet | 504 | 319 | 117 |
| ResNet18 | 485 | 331 | 124 |
| ResNet50 | 494 | 330 | 124 |
| VGG16 | 499 | 310 | 121 |
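The correct-classification counts above are consistent with the accuracy figures reported elsewhere in this record: dividing each model's total correct predictions by the 1030 samples recovers them. A quick check:

```python
# Correct predictions per class (benign, malignant, normal), from the table above.
correct = {
    "Custom model": (537, 360, 133),
    "GoogleNet":    (504, 319, 117),
    "ResNet18":     (485, 331, 124),
    "ResNet50":     (494, 330, 124),
    "VGG16":        (499, 310, 121),
}
TOTAL = 537 + 360 + 133  # 1030 samples overall

accuracy = {m: round(100 * sum(c) / TOTAL, 1) for m, c in correct.items()}
```

These match the accuracy values in the performance-comparison table below, e.g. 92.0 for ResNet50 and 90.3 for VGG16, with the custom model at exactly 100.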
Comparison of the pre-trained models and the custom model (accuracy, prediction time, and parameters)
| Model | Accuracy | Prediction time (s) | Number of parameters |
|---|---|---|---|
| Custom model | 100 | 0.00495 | 2,985,263 |
| AlexNet | 89.7 | 0.04564 | 56,880,515 |
| DarkNet19 | 88.9 | 0.06824 | 20,838,179 |
| GoogleNet | 91.3 | 0.018441 | 5,976,627 |
| MobileNet | 90.7 | 0.245589 | 4,233,133 |
| ResNet50 | 92 | 0.02194 | 23,487,619 |
| ResNet18 | 91.3 | 0.007419 | 11,173,251 |
| VGG16 | 90.3 | 0.04667 | ≈1.36 × 10⁸ |
| Xception | 91.2 | 0.13845 | 20,808,883 |
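Parameter counts like those in the table come from summing per-layer weights and biases. A sketch of the two standard formulas (the layer shapes in the usage line are illustrative and are not the paper's architecture):

```python
def conv2d_params(in_ch, out_ch, kernel):
    """A conv layer has kernel*kernel*in_ch weights per filter,
    out_ch filters, and one bias per filter."""
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(in_units, out_units):
    """A fully connected layer has one weight per input-output pair
    plus one bias per output unit."""
    return in_units * out_units + out_units

# e.g. a 3x3 convolution from 3 RGB channels to 64 filters:
n = conv2d_params(3, 64, 3)
```

Summing these per-layer counts over an architecture yields totals like the custom model's 2,985,263, which is why a shallow network needs an order of magnitude fewer parameters than VGG16's roughly 1.36 × 10⁸.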
Fig. 4 Performance comparison of different pre-trained models and the custom model
Fig. 5 Accuracy and loss of the custom model during the training phase
Fig. 6 Heat map visualization of some images using the custom model
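The heat maps in Fig. 6 come from Grad-CAM, which weights each feature map of the last convolutional layer by its spatially averaged gradient and passes the weighted sum through a ReLU. A minimal NumPy sketch of that weighting step (array shapes are illustrative; a real implementation also needs a backward pass through the network to obtain the gradients):

```python
import numpy as np

def grad_cam_map(feature_maps, gradients):
    """feature_maps, gradients: (H, W, C) arrays from the last conv layer."""
    weights = gradients.mean(axis=(0, 1))                          # alpha_k: one weight per channel
    cam = np.maximum((feature_maps * weights).sum(axis=-1), 0.0)   # ReLU of the weighted sum
    if cam.max() > 0:
        cam = cam / cam.max()                                      # normalize to [0, 1] for display
    return cam
```

The resulting (H, W) map is upsampled to the input-image size and overlaid as a heat map, highlighting the regions the model relied on for its prediction.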