Hager Saleh, Sara F. Abd-El Ghany, Hashem Alyami, Wael Alosaimi.
Abstract
Breast cancer is a dangerous disease with high morbidity and mortality rates. One of the most important aspects of breast cancer treatment is obtaining an accurate diagnosis. Machine-learning (ML) and deep-learning (DL) techniques can help doctors make diagnosis decisions. This paper proposes an optimized deep recurrent neural network (RNN) model based on an RNN and the Keras-Tuner optimization technique for breast cancer diagnosis. The optimized deep RNN consists of an input layer, five hidden layers, five dropout layers, and an output layer. In each hidden layer, we optimized the number of neurons and the rate of the dropout layer. Three feature-selection methods were used to select the most important features from the database. Five regular ML models, namely decision tree (DT), support vector machine (SVM), random forest (RF), naive Bayes (NB), and K-nearest neighbors (KNN), were compared with the optimized deep RNN. The regular ML models and the optimized deep RNN were applied to the selected features. The results showed that the optimized deep RNN with the features selected by the univariate method achieved the highest CV and testing performance compared to the other models.
Year: 2022 PMID: 35345799 PMCID: PMC8957426 DOI: 10.1155/2022/1820777
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1. The main steps of the proposed system for predicting BC.
The breast cancer data set description.
| Sl. no. | Attribute | Abbreviation | Description |
|---|---|---|---|
| 2 | Diagnosis | Diagnosis | The identification of breast tissue (M = malignant, B = benign) |
| 3 | radius_mean | ra_m | Mean of distances from the centre to points on the perimeter |
| 4 | texture_mean | tex_m | Standard deviation of gray-scale values |
| 5 | perimeter_mean | per_m | Mean perimeter of the tumor |
| 6 | area_mean | ar_m | _ |
| 7 | smoothness_mean | smo_m | Mean of local variation in radius lengths |
| 8 | compactness_mean | com_m | Mean of perimeter²/area − 1.0 |
| 9 | concavity_mean | con_m | Mean severity of concave portions of the contour |
| 10 | Concave points_mean | Con_po_m | Mean number of concave portions of the contour |
| 11 | symmetry_mean | sym_m | _ |
| 12 | fractal_dimension_mean | fra_dim_m | Mean of "coastline approximation" − 1 |
| 13 | radius_se | ra_s | Standard error of the mean of distances from the centre to points on the perimeter |
| 14 | texture_se | te_s | Standard error of the standard deviation of gray-scale values |
| 15 | perimeter_se | pe_s | _ |
| 16 | area_se | ar_s | _ |
| 17 | smoothness_se | smo_s | Standard error of local variation in radius lengths |
| 18 | compactness_se | com_s | Standard error of perimeter²/area − 1.0 |
| 19 | concavity_se | con_s | Standard error of the severity of concave portions of the contour |
| 20 | Concave points_se | Con_po_s | Standard error of the number of concave portions of the contour |
| 21 | symmetry_se | sym_s | _ |
| 22 | fractal_dimension_se | fra_dim_s | Standard error of "coastline approximation" − 1 |
| 23 | radius_worst | rad_w | "Worst" (largest) mean of distances from the centre to points on the perimeter |
| 24 | texture_worst | tex_w | "Worst" (largest) standard deviation of gray-scale values |
| 25 | perimeter_worst | per_w | _ |
| 26 | area_worst | ar_w | _ |
| 27 | smoothness_worst | smo_w | "Worst" (largest) local variation in radius lengths |
| 28 | compactness_worst | com_w | "Worst" (largest) perimeter²/area − 1.0 |
| 29 | concavity_worst | con_w | "Worst" (largest) severity of concave portions of the contour |
| 30 | Concave points_worst | Con_po_w | "Worst" (largest) number of concave portions of the contour |
| 31 | symmetry_worst | sym_w | _ |
| 32 | fractal_dimension_worst | fra_di_w | "Worst" (largest) "coastline approximation" − 1 |
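Several attributes in the table are derived quantities; compactness, for instance, is perimeter²/area − 1.0, and each measurement appears three times as its mean, standard error, and "worst" (largest) value across the nuclei in one image. A minimal sketch of that derivation, using hypothetical measurement values rather than rows from the actual data set:

```python
import statistics

# Hypothetical per-nucleus measurements (illustrative, not from the data set).
perimeters = [82.6, 85.1, 80.9]
areas = [512.3, 540.8, 498.7]

# compactness = perimeter^2 / area - 1.0, computed per nucleus.
compactness = [p * p / a - 1.0 for p, a in zip(perimeters, areas)]

# The data set stores the mean (com_m), standard error (com_s),
# and "worst", i.e. largest, value (com_w) of each such measurement.
com_m = statistics.mean(compactness)
com_s = statistics.stdev(compactness) / len(compactness) ** 0.5
com_w = max(compactness)
```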
Figure 2. The architecture of the optimized deep RNN model.
The hyperparameter search ranges for the optimized deep RNN.
| Hyperparameters | Values |
|---|---|
| Dropout rate | 0.1–0.9 |
| Number of neurons | 50–700 |
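The table gives only the search ranges, not the tuning procedure itself. A stdlib-only sketch of the random-search idea behind a tuner such as Keras-Tuner, with a dummy `score` function standing in for training and cross-validating the RNN (all names and numbers here are illustrative assumptions):

```python
import random

random.seed(0)

# Search space from the table: 50-700 neurons, dropout rate 0.1-0.9,
# sampled for the input layer plus the five hidden layers.
def sample_config(n_layers=6):
    return {
        "neurons": [random.randrange(50, 701, 10) for _ in range(n_layers)],
        "dropout": [round(random.uniform(0.1, 0.9), 1) for _ in range(n_layers)],
    }

# Dummy objective standing in for cross-validated accuracy of a trained model.
def score(config):
    return sum(config["neurons"]) / 4200 - sum(config["dropout"]) / 6

# Random search: sample trial configurations and keep the best-scoring one,
# which is essentially what a RandomSearch-style tuner does.
best = max((sample_config() for _ in range(20)), key=score)
```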
Figure 3. Correlation matrix for the breast cancer data set.
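Each cell of such a matrix is typically a Pearson correlation coefficient; features that correlate almost perfectly with one another (e.g. radius and perimeter) are redundant, and correlation-based selection keeps only one of each such pair. A minimal sketch, assuming Pearson correlation:

```python
# Pearson correlation coefficient between two equal-length feature columns.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)
```

For two columns where one is a scaled copy of the other, `pearson` returns 1.0, which is why only one member of such a pair survives correlation-based selection.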
The performance of the regular ML models and the DL model with the features selected by the correlation matrix.

| Approach | Model | AC (CV) | PR (CV) | RE (CV) | FM (CV) | AC (test) | PR (test) | RE (test) | FM (test) |
|---|---|---|---|---|---|---|---|---|---|
| Regular ML | DT | 94.4 | 94.98 | 94.51 | 94.56 | 92.11 | 92.11 | 92.11 | 92.1 |
| Regular ML | KNN | 89.85 | 90.63 | 89.85 | 89.51 | 86.67 | 87.4 | 86.67 | 86.17 |
| Regular ML | NB | 81.84 | 82.38 | 81.84 | 81.01 | 83.68 | 84.33 | 83.68 | 83.0 |
| Regular ML | SVM | 94.73 | 94.94 | 94.73 | 94.66 | 93.86 | 93.85 | 93.86 | 93.84 |
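The column abbreviations AC, PR, RE, and FM denote accuracy, precision, recall, and F-measure. A minimal sketch of the binary-classification definitions, treating malignant as the positive class (the paper may use weighted averaging across classes, which can differ on imbalanced data):

```python
# AC/PR/RE/FM for a binary task; labels are 1 (malignant) and 0 (benign).
def classification_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    ac = (tp + tn) / len(y_true)                       # accuracy
    pr = tp / (tp + fp) if tp + fp else 0.0            # precision
    re = tp / (tp + fn) if tp + fn else 0.0            # recall
    fm = 2 * pr * re / (pr + re) if pr + re else 0.0   # F-measure
    return ac, pr, re, fm
```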
The number of neurons and dropout rate in each layer of the optimized deep RNN for the features selected by the correlation matrix.

| Layer | Number of neurons | Dropout rate |
|---|---|---|
| Input layer | 190 | 0.8 |
| Hidden layer 1 | 470 | 0.4 |
| Hidden layer 2 | 90 | 0.7 |
| Hidden layer 3 | 630 | 0.4 |
| Hidden layer 4 | 370 | 0.4 |
| Hidden layer 5 | 270 | 0.4 |
The scores of the features from the univariate feature-selection method.

| Feature | Score |
|---|---|
| ar_m | 53,991.66 |
| ar_s | 8758.505 |
| tex_m | 93.89751 |
| con_w | 39.51692 |
| con_m | 19.71235 |
| sym_w | 1.298861 |
| con_s | 1.044718 |
| smo_w | 0.397366 |
| sym_m | 0.25738 |
| fra_di_w | 0.231522 |
| smo_m | 0.149899 |
| tex_s | 0.009794 |
| fra_dim_s | 0.006371 |
| smo_s | 0.003266 |
| sym_s | 8.04E−05 |
| fra_di_m | 7.43E−05 |
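The paper does not state which univariate scoring function produced these values, but scores of this kind are typically one-way ANOVA F statistics (as used by scikit-learn's `f_classif`). A pure-Python sketch under that assumption:

```python
# One-way ANOVA F statistic for one feature column against class labels:
# the ratio of between-class to within-class variance. Large values mean
# the feature separates the classes well, as with ar_m in the table above.
def f_score(values, labels):
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values()
    )
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

Two well-separated groups, such as `[1, 2, 3]` versus `[10, 11, 12]`, give a large F value; overlapping groups give a value near zero.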
The performance of the regular ML models and the DL model with the features selected by the univariate method.

| Approach | Model | AC (CV) | PR (CV) | RE (CV) | FM (CV) | AC (test) | PR (test) | RE (test) | FM (test) |
|---|---|---|---|---|---|---|---|---|---|
| Regular ML | DT | 95.17 | 95.11 | 94.73 | 94.72 | 89.04 | 89.32 | 89.04 | 89.1 |
| Regular ML | KNN | 89.85 | 90.63 | 89.85 | 89.51 | 86.67 | 87.4 | 86.67 | 86.17 |
| Regular ML | NB | 80.74 | 81.22 | 80.74 | 79.85 | 83.51 | 84.09 | 83.51 | 82.84 |
| Regular ML | SVM | 93.85 | 94.07 | 93.85 | 93.78 | 93.86 | 93.85 | 93.86 | 93.84 |
The number of neurons and dropout rate in each layer of the optimized deep RNN for the features selected by the univariate method.

| Layer | Number of neurons | Dropout rate |
|---|---|---|
| Input layer | 550 | 0.3 |
| Hidden layer 1 | 230 | 0.9 |
| Hidden layer 2 | 390 | 0.3 |
| Hidden layer 3 | 490 | 0.9 |
| Hidden layer 4 | 170 | 0.9 |
| Hidden layer 5 | 330 | 0.4 |
Figure 4. The ranking of features produced by recursive feature elimination (RFE).
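RFE repeatedly fits a model and discards the currently least important feature until the desired number remains. A simplified sketch with a hypothetical static `importance` function; a real RFE refits the model after every elimination, so importances can change between rounds:

```python
# Recursive feature elimination sketch: repeatedly drop the feature the
# scoring model ranks lowest. `importance` stands in for a fitted model's
# coefficient magnitudes or feature importances (hypothetical here).
def rfe(features, importance, n_keep):
    eliminated = []  # elimination order, least important first
    remaining = list(features)
    while len(remaining) > n_keep:
        worst = min(remaining, key=importance)
        remaining.remove(worst)
        eliminated.append(worst)
    return remaining, eliminated

# Illustrative importances, not values from the paper.
weights = {"ar_m": 5.0, "tex_m": 2.5, "sym_s": 0.1, "smo_s": 0.4}
kept, dropped = rfe(list(weights), weights.get, n_keep=2)
```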
The performance of the regular ML models and the DL model with the features selected by RFE.

| Approach | Model | AC (CV) | PR (CV) | RE (CV) | FM (CV) | AC (test) | PR (test) | RE (test) | FM (test) |
|---|---|---|---|---|---|---|---|---|---|
| Regular ML | DT | 94.24 | 94.48 | 94.24 | 94.4 | 88.82 | 89.14 | 88.82 | 88.89 |
| Regular ML | KNN | 89.85 | 90.63 | 89.85 | 89.51 | 86.67 | 87.4 | 86.67 | 86.17 |
| Regular ML | NB | 80.74 | 81.22 | 80.74 | 79.85 | 83.51 | 84.09 | 83.51 | _ |
| Regular ML | SVM | 93.74 | 93.98 | 93.74 | 93.68 | 93.86 | 93.85 | 93.86 | 93.84 |
The number of neurons and dropout rate in each layer of the optimized deep RNN for the features selected by RFE.

| Layer | Number of neurons | Dropout rate |
|---|---|---|
| Input layer | 190 | 0.8 |
| Hidden layer 1 | 470 | 0.4 |
| Hidden layer 2 | 90 | 0.7 |
| Hidden layer 3 | 630 | 0.4 |
| Hidden layer 4 | 370 | 0.5 |
| Hidden layer 5 | 270 | 0.4 |
Figure 5. CV results for the optimized deep RNN.
Figure 6. The testing results for the optimized deep RNN.