Dina A Ragab, Maha Sharkas, Omneya Attallah.
Abstract
Breast cancer is one of the major health issues across the world. In this study, a new computer-aided detection (CAD) system is introduced. First, the mammogram images were enhanced to increase the contrast. Second, the pectoral muscle was eliminated and the breast was suppressed from the mammogram. Afterward, statistical features were extracted. Next, k-nearest neighbor (k-NN) and decision tree classifiers were used to classify the normal and abnormal lesions. Moreover, a multiple classifier system (MCS) was constructed, as it usually improves the classification results. The MCS has two structures: cascaded and parallel. Finally, two wrapper feature selection (FS) approaches were applied to identify the features that influence classification accuracy. Two data sets, (1) the Mammographic Image Analysis Society digital mammogram database (MIAS) and (2) the digital mammography dream challenge, were combined to test the proposed CAD system. The highest accuracy achieved with the proposed CAD system before FS was 99.7%, using Adaboosting of the J48 decision tree classifier. The highest accuracy after FS was 100%, which was achieved with the k-NN classifier. Moreover, the area under the curve (AUC) of the receiver operating characteristic (ROC) curve was equal to 1.0. The results showed that the proposed CAD system was able to accurately classify normal and abnormal lesions in mammogram samples.
Keywords: feature selection; computer-aided detection; decision trees; k-nearest neighbor; pectoral muscle removal; statistical features
Year: 2019 PMID: 31717809 PMCID: PMC6963468 DOI: 10.3390/diagnostics9040165
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
Figure 1. The block diagram of the computer-aided detection (CAD) system used.
Figure 2. (a) Original abnormal mass case extracted from the Mammographic Image Analysis Society (MIAS) dataset [13] and (b) the image enhanced using the contrast-limited adaptive histogram equalization (CLAHE) method.
Figure 3. (a) Enhanced abnormal mass case extracted from the MIAS dataset [13] and (b) the image after artifact suppression.
Figure 4. (a) Enhanced abnormal mass case extracted from the MIAS dataset [13] and (b) the image after pectoral muscle removal.
The number of samples used for each dataset.
| Dataset | Normal | Abnormal | Total |
|---|---|---|---|
| MIAS | 480 | 372 | 852 |
| The digital mammography dream challenge | 300 | 272 | 572 |
| The combination of the two datasets | 800 | 508 | 1308 |
Classification results of the four individual classifiers and the multiple classifier systems (MCS) constructed with all the four classifiers and their ensembles for the MIAS dataset.
| Classifier | Accuracy | AUC | Sensitivity | Specificity | Precision | F1 Score |
|---|---|---|---|---|---|---|
| k-NN | 88.9% | 0.951 | 0.884 | 0.895 | 0.896 | 0.89 |
| Adaboosting k-NN | 88.9% | 0.951 | 0.884 | 0.895 | 0.896 | 0.89 |
| Bagged k-NN | 86.7% | 0.935 | 0.836 | 0.894 | 0.903 | 0.869 |
| J48 DT | 98.2% | 0.994 | 0.981 | 0.99 | 0.99 | 0.986 |
| Adaboosting J48 DT | 100% | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Bagged J48 DT | 97.7% | 0.996 | 0.952 | 0.99 | 0.99 | 0.971 |
| RF DT | 100% | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Adaboosting RF DT | 99.6% | 1.000 | 0.991 | 1.000 | 1.000 | 0.996 |
| Bagged RF DT | 98.4% | 0.999 | 0.971 | 0.99 | 0.99 | 0.981 |
| RT DT | 99.5% | 0.995 | 0.99 | 0.99 | 0.99 | 0.99 |
| Adaboosting RT DT | 99.4% | 0.994 | 0.99 | 0.99 | 0.99 | 0.99 |
| Bagged RT DT | 98.7% | 1.000 | 0.971 | 0.99 | 0.99 | 0.981 |
| k-NN + J48 DT + RF DT + RT DT | 99.4% | 1.000 | 0.981 | 0.99 | 0.99 | 0.986 |
| Adaboosting (k-NN + J48 DT + RF DT + RT DT) | 99.8% | 1.000 | 1.000 | 0.991 | 0.99 | 0.995 |
| Bagging (k-NN + J48 DT + RF DT + RT DT) | 98.7% | 0.998 | 0.961 | 0.98 | 0.98 | 0.971 |
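The sensitivity, specificity, precision, and F1 score columns above follow the usual confusion-matrix definitions. The following is a minimal sketch of those definitions, assuming that abnormal is treated as the positive class (the source does not state this). The counts in the example are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    precision: float
    f1: float


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> BinaryMetrics:
    """Compute standard binary classification metrics from confusion-matrix counts."""
    sensitivity = _ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)   # true negative rate
    precision = _ratio(tp, tp + fp)     # positive predictive value
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    accuracy = _ratio(tp + tn, tp + fn + fp + tn)
    return BinaryMetrics(accuracy, sensitivity, specificity, precision, f1)


# Hypothetical counts: 100 abnormal and 100 normal test samples.
metrics = binary_metrics(tp=90, fn=10, fp=5, tn=95)
print(metrics)
```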
Classification results before and after feature selection using best first search (stepwise forward and backward search strategy) for the MIAS dataset.
| Classifier | Number of Features | Accuracy | AUC | Sensitivity | Specificity | Precision | F1 Score |
|---|---|---|---|---|---|---|---|
| k-NN | 8 | 88.9% | 0.951 | 0.884 | 0.895 | 0.896 | 0.89 |
| Wrapper k-NN | 5 | 99.5% | 1.000 | 1.000 | 0.992 | 0.992 | 0.996 |
| J48 DT | 8 | 98.2% | 0.994 | 0.981 | 0.99 | 0.99 | 0.986 |
| Wrapper J48 DT | 6 | 98.5% | 0.994 | 0.98 | 0.993 | 0.98 | 0.98 |
| RF DT | 8 | 100% | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Wrapper RF DT | 7 | 99.6% | 1.000 | 0.991 | 1.000 | 1.000 | 0.996 |
| RT DT | 8 | 99.5% | 0.995 | 0.99 | 0.99 | 0.99 | 0.99 |
| Wrapper RT DT | 6 | 100% | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
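In a wrapper approach, each candidate feature subset is scored by training and evaluating the classifier itself. The sketch below illustrates the idea with greedy forward selection, which is a simplification of the best first strategy named above (best first also allows backward steps and backtracking). The leave-one-out 1-NN scorer, the function names, and the toy data are assumptions made for illustration and are not the authors' implementation.

```python
import math


def loo_accuracy_1nn(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier
    restricted to the given feature columns."""
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = math.inf, None
        for j, xj in enumerate(X):
            if i == j:
                continue  # the held-out sample cannot be its own neighbor
            dist = sum((xi[f] - xj[f]) ** 2 for f in features)
            if dist < best_dist:
                best_dist, best_label = dist, y[j]
        correct += best_label == y[i]
    return correct / len(X)


def forward_wrapper_selection(X, y, n_features):
    """Add one feature at a time, keeping the addition that most improves
    the wrapped classifier's score; stop when no addition helps."""
    selected, best_score = [], 0.0
    remaining = set(range(n_features))
    while remaining:
        scored = [(loo_accuracy_1nn(X, y, selected + [f]), f)
                  for f in sorted(remaining)]
        # Highest score wins; ties go to the lower feature index.
        score, feature = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best_score:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Toy data: column 0 separates the classes, column 1 is large-scale noise.
X = [[0.0, 50.0], [0.1, -40.0], [0.2, 30.0],
     [1.0, 45.0], [1.1, -35.0], [1.2, 25.0]]
y = [0, 0, 0, 1, 1, 1]
selected, score = forward_wrapper_selection(X, y, n_features=2)
print(selected, score)
```

On this toy data the noisy column dominates the Euclidean distance, which is one reason why removing a feature can raise k-NN accuracy.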
Classification results before and after feature selection using the random search method for the MIAS dataset.
| Classifier | Number of Features | Accuracy | AUC | Sensitivity | Specificity | Precision | F1 Score |
|---|---|---|---|---|---|---|---|
| k-NN | 8 | 88.9% | 0.951 | 0.884 | 0.895 | 0.896 | 0.89 |
| Wrapper k-NN | 7 | 99.5% | 1.000 | 1.000 | 0.991 | 0.99 | 0.995 |
| J48 DT | 8 | 98.2% | 0.994 | 0.981 | 0.99 | 0.99 | 0.986 |
| Wrapper J48 DT | 8 | 98.8% | 0.994 | 0.981 | 0.99 | 0.99 | 0.986 |
| RF DT | 8 | 100% | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Wrapper RF DT | 7 | 99.6% | 1.000 | 0.991 | 1.000 | 1.000 | 0.996 |
| RT DT | 8 | 99.5% | 0.995 | 0.99 | 0.99 | 0.99 | 0.99 |
| Wrapper RT DT | 7 | 100% | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
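Random search explores the space of feature subsets by sampling instead of stepping through neighboring subsets. A minimal sketch follows, assuming a leave-one-out 1-NN scorer as the wrapped classifier, a fixed number of trials, and a fixed seed; these choices, the function names, and the toy data are illustrative assumptions, since the source does not describe the search settings.

```python
import math
import random


def loo_accuracy_1nn(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier
    restricted to the given feature columns."""
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = math.inf, None
        for j, xj in enumerate(X):
            if i == j:
                continue
            dist = sum((xi[f] - xj[f]) ** 2 for f in features)
            if dist < best_dist:
                best_dist, best_label = dist, y[j]
        correct += best_label == y[i]
    return correct / len(X)


def random_search_selection(X, y, n_features, n_trials=50, seed=0):
    """Sample random non-empty feature subsets and keep the best-scoring one.
    The full feature set is the starting baseline, so the returned score
    is never lower than the score obtained with all features."""
    rng = random.Random(seed)
    best_subset = list(range(n_features))
    best_score = loo_accuracy_1nn(X, y, best_subset)
    for _ in range(n_trials):
        subset = [f for f in range(n_features) if rng.random() < 0.5]
        if not subset:
            continue  # an empty subset cannot be evaluated
        score = loo_accuracy_1nn(X, y, subset)
        smaller_tie = score == best_score and len(subset) < len(best_subset)
        if score > best_score or smaller_tie:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy data: column 0 is informative, columns 1 and 2 are noise.
X = [[0.0, 50.0, 7.0], [0.1, -40.0, -3.0], [0.2, 30.0, 9.0],
     [1.0, 45.0, 8.0], [1.1, -35.0, -2.0], [1.2, 25.0, 10.0]]
y = [0, 0, 0, 1, 1, 1]
subset, score = random_search_selection(X, y, n_features=3)
print(subset, score)
```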
Classification results of the four individual classifiers and the MCS constructed with all four classifiers and their ensembles for the digital mammography dream challenge dataset.
| Classifier | Accuracy | AUC | Sensitivity | Specificity | Precision | F1 Score |
|---|---|---|---|---|---|---|
| k-NN | 95.9% | 0.99 | 1.000 | 0.929 | 0.924 | 0.961 |
| Adaboosting k-NN | 95% | 0.99 | 1.000 | 0.929 | 0.924 | 0.961 |
| Bagged k-NN | 94.4% | 0.98 | 1.000 | 0.855 | 0.83 | 0.908 |
| J48 DT | 88% | 0.922 | 0.991 | 0.817 | 0.777 | 0.872 |
| Adaboosting J48 DT | 96.3% | 1.000 | 1.000 | 0.935 | 0.93 | 0.964 |
| Bagged J48 DT | 94.5% | 0.996 | 1.000 | 0.906 | 0.896 | 0.946 |
| RF DT | 98% | 1.000 | 1.000 | 0.962 | 0.96 | 0.98 |
| Adaboosting RF DT | 97.2% | 1.000 | 1.000 | 0.958 | 0.956 | 0.978 |
| Bagged RF DT | 95.8% | 1.000 | 1.000 | 0.926 | 0.92 | 0.959 |
| RT DT | 93.8% | 0.942 | 1.000 | 0.893 | 0.88 | 0.937 |
| Adaboosting RT DT | 94.5% | 0.948 | 0.996 | 0.909 | 0.9 | 0.946 |
| Bagged RT DT | 96.5% | 0.999 | 1.000 | 0.938 | 0.933 | 0.966 |
| k-NN + J48 DT + RF DT + RT DT | 96.3% | 1.000 | 1.000 | 0.935 | 0.93 | 0.964 |
| Adaboosting (k-NN + J48 DT + RF DT + RT DT) | 97.3% | 1.000 | 1.000 | 0.944 | 0.94 | 0.97 |
| Bagging (k-NN + J48 DT + RF DT + RT DT) | 96.6% | 1.000 | 1.000 | 0.938 | 0.933 | 0.966 |
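The rows labeled "k-NN + J48 DT + RF DT + RT DT" refer to an MCS that combines the four classifiers. One common way to fuse classifiers in a parallel structure is majority voting, sketched below. The source does not state which combination rule was used, so the voting rule, the tie-breaking choice, and the predictions shown are assumptions for illustration only.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Fuse per-classifier label lists (one list per classifier, all of the
    same length) into one label per sample by majority vote. Ties are
    resolved in favor of the earliest-listed classifier among tied labels."""
    fused = []
    for votes in zip(*predictions_per_classifier):
        counts = Counter(votes)
        top = max(counts.values())
        fused.append(next(v for v in votes if counts[v] == top))
    return fused


# Hypothetical predictions from four classifiers on four samples.
knn = ["abnormal", "normal", "normal", "abnormal"]
j48 = ["abnormal", "abnormal", "normal", "normal"]
rf = ["abnormal", "abnormal", "normal", "normal"]
rt = ["normal", "abnormal", "normal", "abnormal"]

fused = majority_vote([knn, j48, rf, rt])
print(fused)
```

With an even number of classifiers, ties are possible (the last sample above), so some tie-breaking rule is needed.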
Classification results before and after feature selection using best first search (stepwise forward and backward search strategy) for the digital mammography dream challenge dataset.
| Classifier | Number of Features | Accuracy | AUC | Sensitivity | Specificity | Precision | F1 Score |
|---|---|---|---|---|---|---|---|
| k-NN | 8 | 96% | 0.99 | 1.000 | 0.929 | 0.924 | 0.961 |
| Wrapper k-NN | 7 | 96.3% | 0.991 | 1.000 | 0.935 | 0.93 | 0.964 |
| J48 DT | 8 | 87.9% | 0.922 | 1.000 | 0.929 | 0.924 | 0.961 |
| Wrapper J48 DT | 7 | 90.73% | 0.949 | 0.83 | 0.993 | 0.83 | 0.83 |
| RF DT | 8 | 98% | 1.000 | 1.000 | 0.962 | 0.96 | 0.98 |
| Wrapper RF DT | 7 | 98.6% | 1.000 | 1.000 | 0.974 | 0.973 | 0.987 |
| RT DT | 8 | 93.88% | 0.942 | 1.000 | 0.893 | 0.88 | 0.937 |
| Wrapper RT DT | 3 | 96.1% | 0.963 | 1.000 | 0.926 | 0.92 | 0.959 |
Classification results before and after feature selection using the random search method for the digital mammography dream challenge dataset.
| Classifier | Number of Selected Features | Accuracy | AUC | Sensitivity | Specificity | Precision | F1 Score |
|---|---|---|---|---|---|---|---|
| k-NN | 8 | 96% | 0.99 | 1.000 | 0.929 | 0.924 | 0.961 |
| Wrapper k-NN | 5 | 96.85% | 1.000 | 1.000 | 0.944 | 0.94 | 0.97 |
| J48 DT | 8 | 88% | 0.922 | 1.000 | 0.929 | 0.924 | 0.961 |
| Wrapper J48 DT | 4 | 95.1% | 0.96 | 1.000 | 0.915 | 0.907 | 0.952 |
| RF DT | 8 | 98% | 1.000 | 1.000 | 0.962 | 0.96 | 0.98 |
| Wrapper RF DT | 8 | 98% | 1.000 | 1.000 | 0.962 | 0.96 | 0.98 |
| RT DT | 8 | 93.88% | 0.942 | 1.000 | 0.893 | 0.88 | 0.937 |
| Wrapper RT DT | 6 | 95% | 0.951 | 0.996 | 0.915 | 0.907 | 0.95 |
The classification results of the four individual classifiers and the MCS constructed with all the four classifiers and their ensembles for the combination of the two datasets.
| Classifier | Accuracy | AUC | Sensitivity | Specificity | Precision | F1 Score |
|---|---|---|---|---|---|---|
| k-NN | 90% | 0.964 | 0.875 | 0.907 | 0.91 | 0.893 |
| Adaboosting k-NN | 88% | 0.934 | 0.854 | 0.894 | 0.899 | 0.876 |
| Bagged k-NN | 90% | 0.964 | 0.875 | 0.907 | 0.91 | 0.893 |
| J48 DT | 93.88% | 0.985 | 0.89 | 0.976 | 0.978 | 0.932 |
| Adaboosting J48 DT | 97.4% | 0.996 | 0.966 | 0.98 | 0.98 | 0.973 |
| Bagged J48 DT | 99.7% | 1.000 | 0.993 | 1.000 | 1.000 | 0.997 |
| RF DT | 98.7% | 0.99 | 0.988 | 0.987 | 0.986 | 0.987 |
| Adaboosting RF DT | 98.5% | 0.99 | 0.985 | 0.985 | 0.985 | 0.985 |
| Bagged RF DT | 99.2% | 0.99 | 0.993 | 0.994 | 0.994 | 0.994 |
| RT DT | 97.1% | 0.97 | 0.98 | 0.966 | 0.965 | 0.973 |
| Adaboosting RT DT | 98.7% | 0.99 | 0.988 | 0.987 | 0.986 | 0.987 |
| Bagged RT DT | 97.8% | 0.98 | 0.992 | 0.97 | 0.969 | 0.981 |
| k-NN + J48 DT + RF DT + RT DT | 97.8% | 0.99 | 0.98 | 0.979 | 0.978 | 0.979 |
| Adaboosting (k-NN + J48 DT + RF DT + RT DT) | 99.5% | 1.000 | 1.000 | 0.994 | 0.993 | 0.997 |
| Bagging (k-NN + J48 DT + RF DT + RT DT) | 98.1% | 0.99 | 0.982 | 0.982 | 0.981 | 0.982 |
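The AUC column reports the area under the ROC curve. For reference, the AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, which gives the short pairwise computation sketched below. The scores and labels are hypothetical, and treating abnormal as the positive class (label 1) is an assumption.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores; label 1 = abnormal, label 0 = normal.
auc_mixed = roc_auc([0.9, 0.8, 0.7, 0.4, 0.3, 0.2], [1, 1, 0, 1, 0, 0])
auc_perfect = roc_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
print(auc_mixed, auc_perfect)
```

An AUC of 1.000 therefore means that every abnormal sample was scored above every normal sample in the evaluated set.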
Classification results before and after feature selection using best first search (stepwise forward and backward search strategy) for the combination of the two datasets.
| Classifier | Number of Features | Accuracy | AUC | Sensitivity | Specificity | Precision | F1 Score |
|---|---|---|---|---|---|---|---|
| k-NN | 8 | 90% | 0.964 | 0.875 | 0.907 | 0.91 | 0.893 |
| Wrapper k-NN | 2 | 100% | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| J48 DT | 8 | 93.88% | 0.985 | 0.89 | 0.976 | 0.978 | 0.932 |
| Wrapper J48 DT | 7 | 98.85% | 1.000 | 0.994 | 0.986 | 0.985 | 0.99 |
| RF DT | 8 | 98.7% | 0.99 | 0.988 | 0.987 | 0.986 | 0.987 |
| Wrapper RF DT | 7 | 99.3% | 1.000 | 0.989 | 0.996 | 0.996 | 0.993 |
| RT DT | 8 | 97.1% | 0.97 | 0.98 | 0.966 | 0.965 | 0.973 |
| Wrapper RT DT | 7 | 99.4% | 0.995 | 1.000 | 0.991 | 0.99 | 0.995 |
Classification results before and after feature selection using the random search method for the combination of the two datasets.
| Classifier | Number of Features | Accuracy | AUC | Sensitivity | Specificity | Precision | F1 Score |
|---|---|---|---|---|---|---|---|
| k-NN | 8 | 90% | 0.964 | 0.875 | 0.907 | 0.91 | 0.893 |
| Wrapper k-NN | 6 | 99.7% | 0.995 | 1.000 | 0.996 | 0.995 | 0.998 |
| J48 DT | 8 | 93.88% | 0.985 | 0.89 | 0.976 | 0.978 | 0.932 |
| Wrapper J48 DT | 8 | 93.88% | 0.985 | 0.883 | 0.976 | 0.978 | 0.929 |
| RF DT | 8 | 98.7% | 0.99 | 0.988 | 0.987 | 0.986 | 0.987 |
| Wrapper RF DT | 7 | 99.3% | 1.000 | 0.989 | 0.996 | 0.996 | 0.993 |
| RT DT | 8 | 97.1% | 0.97 | 0.98 | 0.966 | 0.965 | 0.973 |
| Wrapper RT DT | 6 | 99.4% | 0.99 | 0.996 | 1.000 | 1.000 | 0.998 |
Classification results for different breast classification methods using different classifiers.
| Reference | Contribution | Data Set | Accuracy | AUC |
|---|---|---|---|---|
| Fu et al. | Features were extracted from both the spatial domain and spectral domain, then selected by sequential forward search algorithm, and classified by SVM and GRNN | Nijmegen | - | 0.98 |
| El Toukhy et al. | Wavelet for feature extraction and SVM to classify normal and abnormal | MIAS | 95.84% | - |
| Beura et al. | Grey level co-occurrence matrix (GLCM) and DWT to extract the texture features, features were selected using the filter methods, namely two-sample t-test and F-test, and classified by BPN | MIAS | 98% | - |
| Pawar and Talbar | Features were extracted using the wavelet co-occurrence, selected using the wrapper method, and classified by the fuzzy classifier | MIAS | 89.47% | - |
| Phadke and Rege | Local and global feature extraction and SVM to classify normal and abnormal | MIAS | 93.1% | - |
| Mohanty et al. | Contourlet transform feature extraction, features were selected using wrapper forest optimization algorithm, and Naïve Bayes classifier for classifying normal and abnormal lesions | MIAS | 97.86% | - |
| The proposed system | Statistical feature extraction and Adaboosting J48 DT | MIAS | 100% | 1.000 |
| The proposed system | Statistical feature extraction and RF DT | MIAS | 100% | 1.000 |
| The proposed system | Statistical feature extraction, best first FS, and wrapper RT DT | MIAS | 100% | 1.000 |
| The proposed system | Statistical feature extraction, random search FS, and wrapper RT DT | MIAS | 100% | 1.000 |
| The proposed system | Statistical feature extraction and RF DT | Dream Challenge | 98% | 1.000 |
| The proposed system | Statistical feature extraction, best first FS, and wrapper RF DT | Dream Challenge | 98.07% | 1.000 |
| The proposed system | Statistical feature extraction, random search FS, and wrapper RF DT | Dream Challenge | 98% | 1.000 |
| The proposed system | Statistical feature extraction and Adaboosting J48 DT | MIAS and Dream Challenge | 99.7% | 1.000 |
| The proposed system | Statistical feature extraction, random search FS, and wrapper k-NN | MIAS and Dream Challenge | 99.7% | 0.995 |
| The proposed system | Statistical feature extraction, best first FS, and wrapper k-NN | MIAS and Dream Challenge | 100% | 1.000 |
Names of the unselected features for each FS method for the breast cancer datasets.
| Classifier | Number of Selected Features | Names of Unselected Features |
|---|---|---|
| MIAS dataset, best first search | | |
| Wrapper k-NN | 5 | Mean, RMS, and range |
| Wrapper J48 DT | 6 | Mean and range |
| Wrapper RF DT | 7 | Range |
| Wrapper RT DT | 6 | Mean and range |
| MIAS dataset, random search | | |
| Wrapper k-NN | 7 | Range |
| Wrapper J48 DT | 7 | Range |
| Wrapper RF DT | 7 | Range |
| Wrapper RT DT | 7 | Range |
| Digital mammography dream challenge dataset, best first search | | |
| Wrapper k-NN | 7 | Range |
| Wrapper J48 DT | 7 | Range |
| Wrapper RF DT | 7 | Range |
| Wrapper RT DT | 3 | Entropy, variance, standard deviation, RMS, and range |
| Digital mammography dream challenge dataset, random search | | |
| Wrapper k-NN | 5 | Entropy, variance, and range |
| Wrapper J48 DT | 4 | Entropy, mean, maximum, and range |
| Wrapper RF DT | 8 | - |
| Wrapper RT DT | 6 | Mean and standard deviation |
| Combination of the two datasets, best first search | | |
| Wrapper k-NN | 2 | Entropy, variance, standard deviation, minimum, maximum, and range |
| Wrapper J48 DT | 7 | Range |
| Wrapper RF DT | 7 | Range |
| Wrapper RT DT | 7 | Range |
| Combination of the two datasets, random search | | |
| Wrapper k-NN | 6 | Maximum and range |
| Wrapper J48 DT | 8 | - |
| Wrapper RF DT | 7 | Range |
| Wrapper RT DT | 6 | Entropy and range |
Figure 5. The minimum feature values versus the standard deviation feature values for the first 10 samples of images and their rotated versions.
Figure 6. The variance feature values versus the minimum feature values for the second 10 samples of images and their rotated versions.
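The eight statistical features named in this section (entropy, mean, standard deviation, minimum, maximum, variance, RMS, and range) can all be computed from the pixel intensities of an image region. The sketch below is one possible formulation, assuming Shannon entropy of the gray-level histogram and population (not sample) variance; the source does not give the exact formulas, and the tiny image is hypothetical. It also shows that an exact 90° rotation, which only rearranges pixels, leaves these global statistics unchanged; rotations by other angles that involve interpolation, padding, or cropping alter the pixel values and can therefore change the features.

```python
import math
from collections import Counter


def statistical_features(image):
    """Compute eight intensity statistics from a 2D list of gray levels."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n  # population variance
    counts = Counter(pixels)  # gray-level histogram
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "entropy": entropy,
        "mean": mean,
        "standard_deviation": math.sqrt(variance),
        "minimum": min(pixels),
        "maximum": max(pixels),
        "variance": variance,
        "rms": math.sqrt(sum(p * p for p in pixels) / n),
        "range": max(pixels) - min(pixels),
    }


def rotate90(image):
    """Rotate a 2D list by 90 degrees clockwise without interpolation."""
    return [list(row) for row in zip(*image[::-1])]


# Hypothetical 2 x 3 image region.
image = [[10, 20, 30],
         [40, 50, 60]]
rotated = rotate90(image)

original_features = statistical_features(image)
rotated_features = statistical_features(rotated)
print(original_features)
```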
Statistical analysis for each feature of the breast cancer datasets.
| Feature | Minimum | Maximum | Mean | Standard Deviation |
|---|---|---|---|---|
| MIAS dataset | | | | |
| Entropy | 3.27 | 5.13 | 4.32 | 0.33 |
| Mean | 62.11 | 122.29 | 96.73 | 10.08 |
| Standard Deviation | 15.23 | 34.48 | 19.34 | 2.41 |
| Minimum | 28.07 | 81.23 | 58.84 | 9.84 |
| Maximum | 110.52 | 174.14 | 141.43 | 10.89 |
| Variance | 375.05 | 2078.91 | 656.39 | 176.06 |
| RMS | 68.78 | 126.47 | 100.36 | 9.70 |
| Range | 65.25 | 126.47 | 87.03 | 11.76 |
| Digital mammography dream challenge dataset | | | | |
| Entropy | 2.92 | 5.81 | 4.79 | 0.51 |
| Mean | 47.5 | 100.79 | 76.57 | 9.17 |
| Standard Deviation | 25.46 | 48.35 | 36.64 | 4.69 |
| Minimum | 5.4 | 25.4 | 12.2 | 4.02 |
| Maximum | 111.82 | 195.9 | 161.71 | 16.814 |
| Variance | 966.37 | 3029.92 | 1899.42 | 394.69 |
| RMS | 56.21 | 111.7 | 86.46 | 9.87 |
| Range | 56.21 | 183.56 | 145.68 | 21.67 |
| Combination of the two datasets | | | | |
| Entropy | 2.92 | 5.81 | 4.47 | 0.45 |
| Mean | 47.5 | 122.29 | 90.17 | 13.19 |
| Standard Deviation | 15.33 | 48.35 | 24.72 | 8.49 |
| Minimum | 5.66 | 81.23 | 44.01 | 22.63 |
| Maximum | 110.52 | 196.44 | 147.66 | 16 |
| Variance | 366.64 | 3029.92 | 1035.48 | 613.02 |
| RMS | 56.21 | 126.47 | 95.73 | 11.41 |
| Range | 56.21 | 186.36 | 101.66 | 29.2 |
Figure 7. The histograms of the feature values of the MIAS dataset.
Figure 8. The histograms of the feature values of the digital mammography dream challenge dataset.
Figure 9. The histograms of the feature values of the combination of the MIAS and digital mammography dream challenge datasets.