Harmandeep Singh, Vipul Sharma, Damanpreet Singh.
Abstract
This paper presents a comparative analysis of how well various texture and geometric features discriminate breast masses on mammograms. An improved machine learning-based framework was developed for the study. The proposed system was tested on 106 full-field digital mammography images from the INbreast dataset, containing a total of 115 breast mass lesions. Individual feature models and various combinations of the computed texture and geometric features were assessed by their contribution to classification accuracy. Four state-of-the-art filter-based feature selection algorithms (Relief-F, Pearson correlation coefficient, neighborhood component analysis, and term variance) were each used to select the 20 most discriminative features. Relief-F outperformed the other feature selection algorithms, yielding 85.2% accuracy, 82.0% sensitivity, and 88.0% specificity. Further simulations then narrowed these 20 Relief-F features down to the nine most discriminative ones. Six state-of-the-art machine learning classifiers, namely k-nearest neighbor (k-NN), support vector machine, decision tree, Naive Bayes, random forest, and ensemble tree, were compared; the best results (accuracy = 90.4%, sensitivity = 92.0%, specificity = 88.0%) were obtained with the k-NN classifier using k = 5 neighbors and squared inverse distance weighting. The key finding is the identification of the nine most discriminative features out of a pool of 125 texture and geometric features: FD26 (Fourier descriptor), Euler number, solidity, mean, FD14, FD13, periodicity, skewness, and contrast.
These results indicate that the selected nine features can be used for the classification of breast masses in mammograms.
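The best-performing classifier in the abstract, k-NN with k = 5 and squared inverse distance weighting, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `knn_predict`, the Euclidean distance, the `eps` guard, and the toy two-dimensional feature vectors are all assumptions made for illustration.

```python
import math

def knn_predict(train_X, train_y, query, k=5, eps=1e-12):
    """Classify `query` by a k-NN vote in which each neighbour counts 1 / distance**2."""
    # Euclidean distance from the query to every training sample (assumed metric).
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    dists.sort(key=lambda pair: pair[0])
    votes = {}
    for d, label in dists[:k]:
        # Squared inverse distance weight; eps avoids division by zero on exact matches.
        votes[label] = votes.get(label, 0.0) + 1.0 / (d * d + eps)
    return max(votes, key=votes.get)

# Made-up 2-D feature vectors, not INbreast data (0 = benign, 1 = malignant).
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = [0, 0, 0, 1, 1, 1]
prediction = knn_predict(train_X, train_y, (0.8, 0.9), k=5)
```

With this weighting, the two distant benign neighbours that fall inside the k = 5 window contribute far less to the vote than the three close malignant ones, which is the practical difference from an unweighted majority vote.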
Keywords: Breast cancer; Classification; Machine learning; Mammography
Year: 2022 PMID: 35018506 PMCID: PMC8752652 DOI: 10.1186/s42492-021-00100-1
Source DB: PubMed Journal: Vis Comput Ind Biomed Art ISSN: 2524-4442
Fig. 1 The schematic diagram of the proposed CAD system
Detailed description of mammographic image dataset [22]
| Total number of images included | Images containing a single mass | Images containing double masses | Images containing three masses | Total number of mass lesions | Number of benign masses | Number of malignant masses |
|---|---|---|---|---|---|---|
| 106 | 98 | 7 | 1 | 115 | 52 | 63 |
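The counts in this table can be cross-checked with a few lines of arithmetic. The snippet below is only a consistency check on the numbers printed in the table; the variable names are illustrative.

```python
# Counts copied from the dataset table: masses per image -> number of such images.
images_by_mass_count = {1: 98, 2: 7, 3: 1}

total_images = sum(images_by_mass_count.values())
total_lesions = sum(n_masses * n_images
                    for n_masses, n_images in images_by_mass_count.items())

# Class split as printed in the table.
benign, malignant = 52, 63
```

The 98 single-mass, 7 double-mass, and 1 triple-mass images account for 98 + 14 + 3 = 115 lesions, matching the stated total.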
Fig. 2 Sample of mammogram images containing benign and malignant masses. (a) Single benign mass; (b) Single malignant mass; (c) Two benign masses; (d) Two malignant masses [22]
Fig. 3 Snapshots for describing the steps used in the extraction of mass lesions
Features extracted using different texture models
| Model | Extracted features |
|---|---|
| SGLCM [F1-F14] | ‘ASM’, ‘Contrast’, ‘Correlation’, ‘Sum_Squares’, ‘Inverse_Diff_Moment’, ‘Sum_Average’, ‘Sum_Variance’, ‘Sum_Entropy’, ‘Entropy’, ‘Diff_Variance’, ‘Diff_Entropy’, ‘Info_Measure1’, ‘Info_Measure2’, ‘Max_Corr_Coff’ |
| Gray level difference statistics (GLDS) [F15-F19] | ‘Homogeneity’, ‘Contrast’, ‘Mean’, ‘Energy’, ‘Entropy’ |
| First order statistical (FOS) [F20-F23] | ‘Mean’, ‘Variance’, ‘Skewness’, ‘Kurtosis’ |
| Statistical feature matrix (SFM) [F24-F27] | ‘Coarseness’, ‘Contrast’, ‘Periodicity’, ‘Roughness’ |
| Laws’ texture energy measures (LTEM) [F28-F41] | ‘EE’, ‘SS’, ‘WW’, ‘RR’, ‘EL’, ‘SL’, ‘WL’, ‘RL’, ‘SE’, ‘WE’, ‘RE’, ‘WS’, ‘RS’, ‘RW’ |
| Fractal [F42-F43] | ‘H1’, ‘H2’ |
| Fourier power spectrum (FPS) [F44-F45] | ‘Sr’, ‘Stheta’ |
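As an illustration of the simplest model in this table, the four first order statistical (FOS) features can be computed directly from the gray levels inside a mass region. This is a minimal sketch that assumes population (biased) moments and non-excess kurtosis; the exact definitions used in the paper are not given in this text, and `first_order_stats` is a hypothetical helper name.

```python
import math

def first_order_stats(pixels):
    """Return (mean, variance, skewness, kurtosis) of a flat list of gray levels."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n  # population variance
    std = math.sqrt(variance)
    if std == 0:
        # A constant region has no spread; higher moments are set to 0 by convention here.
        return mean, 0.0, 0.0, 0.0
    skewness = sum(((p - mean) / std) ** 3 for p in pixels) / n
    kurtosis = sum(((p - mean) / std) ** 4 for p in pixels) / n  # 3.0 for a Gaussian
    return mean, variance, skewness, kurtosis

# Made-up gray levels, for illustration only.
fos_mean, fos_var, fos_skew, fos_kurt = first_order_stats([1, 2, 3, 4, 5])
```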
Fig. 4 Masks employed in the extraction of Laws’ texture energy measures [30]
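The masks themselves are not reproduced in this text. As a sketch of how such masks are commonly built, the snippet below forms 5×5 masks as outer products of the standard length-5 Laws kernels (level, edge, spot, wave, ripple). Using these particular kernels, and reading each two-letter feature name as a vertical kernel followed by a horizontal kernel, are assumptions rather than details confirmed by the source.

```python
# Standard 1-D Laws kernels of length 5 (assumed; Fig. 4 is not visible here).
KERNELS = {
    "L": [1, 4, 6, 4, 1],     # level
    "E": [-1, -2, 0, 2, 1],   # edge
    "S": [-1, 0, 2, 0, -1],   # spot
    "W": [-1, 2, 0, -2, 1],   # wave
    "R": [1, -4, 6, -4, 1],   # ripple
}

def laws_mask(first, second):
    """5x5 mask: outer product of vertical kernel `first` and horizontal kernel `second`."""
    col, row = KERNELS[first], KERNELS[second]
    return [[c * r for r in row] for c in col]

# The 14 LTEM feature names listed in the texture feature table (F28-F41).
NAMES = ["EE", "SS", "WW", "RR", "EL", "SL", "WL", "RL",
         "SE", "WE", "RE", "WS", "RS", "RW"]
masks = {name: laws_mask(name[0], name[1]) for name in NAMES}
```

Every one of these 14 masks contains at least one zero-sum kernel, so each mask sums to zero and responds to local structure rather than to overall brightness.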
Various shape and margin features
| Features models | Feature index |
|---|---|
| Shape features | F46-F58 (area, major axis length, minor axis length, eccentricity, orientation, convex area, filled area, Euler number, equiv. diameter, solidity, extent, perimeter, perimeter cirratio) |
| Zernike moments | F59-F73 |
| Fourier descriptors | F74-F125 |
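Fourier descriptors dominate the later feature rankings, so a brief sketch of how they are typically derived from a mass boundary may help. The version below treats boundary points as complex numbers, takes a plain discrete Fourier transform, and normalizes for translation, scale, and rotation. The normalization and the function name `fourier_descriptors` are assumptions; the source does not state how its 52 descriptors (F74-F125) were normalized.

```python
import cmath
import math

def fourier_descriptors(boundary, n_desc):
    """Return `n_desc` normalized descriptor magnitudes for a closed boundary."""
    z = [complex(x, y) for x, y in boundary]
    n = len(z)
    # Plain O(n^2) discrete Fourier transform of the complex boundary signal.
    coeffs = [sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / n
              for k in range(n)]
    # coeffs[0] is the centroid (dropped for translation invariance);
    # dividing by |coeffs[1]| removes scale; magnitudes ignore rotation and start point.
    scale = abs(coeffs[1])
    return [abs(c) / scale for c in coeffs[2:2 + n_desc]]

# A made-up square boundary traced counterclockwise, and a scaled, shifted copy.
square = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
moved = [(3 * x + 10, 3 * y - 4) for x, y in square]
fd_square = fourier_descriptors(square, 4)
fd_moved = fourier_descriptors(moved, 4)
```

Both boundaries yield the same descriptors up to floating-point error, which is the property that makes such features usable across lesions of different size and position.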
Classification results were obtained for six different state-of-the-art classifiers using a set of all 125 features
| Classifier | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| k-NN | 76.0 | 73.0 | 80.0 |
| SVM | | | |
| DT | 68.7 | 70.8 | 66.0 |
| NB | 72.2 | 64.6 | 82.0 |
| RF | 73.1 | 80.0 | 64.0 |
| ET | 72.2 | 73.8 | 70.0 |
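The three metrics reported in these tables follow from a binary confusion matrix. The sketch below assumes that malignant is the positive class, which is the usual convention but is not stated in this text; the counts in the example are hypothetical and not taken from the paper.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                  # fraction of positives detected
    specificity = tn / (tn + fp)                  # fraction of negatives correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)    # fraction of all cases classified correctly
    return accuracy, sensitivity, specificity

# Hypothetical counts, for illustration only.
accuracy, sensitivity, specificity = classification_metrics(tp=45, fn=5, tn=40, fp=10)
```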
Classification results for various texture models
| Features | Fine k-NN accuracy (%) | Fine k-NN sensitivity (%) | Fine k-NN specificity (%) | Medium k-NN accuracy (%) | Medium k-NN sensitivity (%) | Medium k-NN specificity (%) | Cosine k-NN accuracy (%) | Cosine k-NN sensitivity (%) | Cosine k-NN specificity (%) | Cubic k-NN accuracy (%) | Cubic k-NN sensitivity (%) | Cubic k-NN specificity (%) | Weighted k-NN accuracy (%) | Weighted k-NN sensitivity (%) | Weighted k-NN specificity (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SGLCM | 54.0 | 58.0 | 49.0 | 59.0 | 53.0 | 68.0 | 56.0 | 60.0 | 51.0 | 56.0 | 51.0 | 64.0 | 57.0 | 62.0 | 50.0 |
| GLDS | 53.0 | 60.0 | 44.0 | 47.0 | 46.0 | 48.0 | 61.0 | 59.0 | 63.0 | 48.0 | 48.0 | 48.0 | 54.0 | 67.0 | 36.0 |
| FOS | 61.0 | 61.0 | 62.0 | 63.0 | 58.0 | 68.0 | 62.0 | 68.0 | 55.0 | 63.0 | 62.0 | 65.0 | 63.0 | 67.0 | 57.0 |
| SFM | 75.0 | 82.0 | 66.0 | 67.0 | 69.0 | 65.0 | 67.0 | 68.0 | 67.0 | 70.0 | 72.0 | 66.0 | 75.0 | 83.0 | 64.0 |
| LTEM | 56.0 | 60.0 | 51.0 | 61.0 | 69.0 | 51.0 | 63.0 | 68.0 | 59.0 | 60.0 | 67.0 | 52.0 | 64.0 | 77.0 | 48.0 |
| Fractal | 75.0 | 82.0 | 66.0 | 60.0 | 76.0 | 41.0 | 55.0 | 65.0 | 43.0 | 61.0 | 77.0 | 42.0 | 71.0 | 86.0 | 53.0 |
| FPS | 54.0 | 55.0 | 52.0 | 53.0 | 59.0 | 46.0 | 56.0 | 60.0 | 51.0 | 54.0 | 59.0 | 48.0 | 54.0 | 64.0 | 42.0 |
Classification results for various geometry feature models
| Features | Fine k-NN accuracy (%) | Fine k-NN sensitivity (%) | Fine k-NN specificity (%) | Medium k-NN accuracy (%) | Medium k-NN sensitivity (%) | Medium k-NN specificity (%) | Cosine k-NN accuracy (%) | Cosine k-NN sensitivity (%) | Cosine k-NN specificity (%) | Cubic k-NN accuracy (%) | Cubic k-NN sensitivity (%) | Cubic k-NN specificity (%) | Weighted k-NN accuracy (%) | Weighted k-NN sensitivity (%) | Weighted k-NN specificity (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SHAPE | 57.0 | 57.0 | 58.0 | 54.0 | 56.0 | 52.0 | 54.0 | 56.0 | 50.0 | 54.0 | 57.0 | 51.0 | 57.0 | 59.0 | 55.0 |
| ZM | 59.0 | 62.0 | 56.0 | 54.0 | 51.0 | 56.0 | 53.0 | 51.0 | 56.0 | 54.0 | 50.0 | 58.0 | 58.0 | 62.0 | 53.0 |
| FD | 74.0 | 72.0 | 77.0 | 72.0 | 62.0 | 85.0 | 71.0 | 62.0 | 85.0 | 71.0 | 60.0 | 85.0 | 72.0 | 66.0 | 80.0 |
Classification results for three sets of features including all textures, all geometric features, and a combination of texture and geometry
| Features | Fine k-NN accuracy (%) | Fine k-NN sensitivity (%) | Fine k-NN specificity (%) | Medium k-NN accuracy (%) | Medium k-NN sensitivity (%) | Medium k-NN specificity (%) | Cosine k-NN accuracy (%) | Cosine k-NN sensitivity (%) | Cosine k-NN specificity (%) | Cubic k-NN accuracy (%) | Cubic k-NN sensitivity (%) | Cubic k-NN specificity (%) | Weighted k-NN accuracy (%) | Weighted k-NN sensitivity (%) | Weighted k-NN specificity (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Texture (T) | 72.0 | 78.0 | 64.0 | 64.0 | 63.0 | 66.0 | 67.0 | 74.0 | 58.0 | 63.0 | 68.0 | 58.0 | 72.0 | 77.0 | 66.0 |
| Geometry (G) | 72.0 | 72.0 | 72.0 | 72.0 | 60.0 | 86.0 | 75.0 | 70.0 | 82.0 | 72.0 | 61.0 | 86.0 | 73.0 | 67.0 | 81.0 |
| Combined (T&G) | 73.0 | 67.0 | 82.0 | 71.0 | 59.0 | 88.0 | 76.0 | 73.0 | 80.0 | 72.0 | 59.0 | 89.0 | 75.0 | 67.0 | 84.0 |
Rank-wise lists of the top 20 most discriminative features selected by four different feature selection techniques
| Rank | Relief-F index | Relief-F feature name | Pearson correlation coefficient index | Pearson correlation coefficient feature name | Neighbourhood component analysis index | Neighbourhood component analysis feature name | Term variance index | Term variance feature name |
|---|---|---|---|---|---|---|---|---|
| 1 | F99 | FD26 | F86 | FD13 | F1 | ASM | F45 | Stheta |
| 2 | F53 | Euler number | F99 | FD26 | F2 | Contrast | F44 | Sr |
| 3 | F55 | Solidity | F87 | FD14 | F3 | Correlation | F51 | Convex area |
| 4 | F56 | Extent | F82 | FD9 | F4 | Sum squares | F52 | Filled area |
| 5 | F4 | Sum squares | F83 | FD10 | F5 | Inverse diff moment | F46 | Area |
| 6 | F20 | Mean | F85 | FD12 | F6 | Sum average | F57 | Perimeter |
| 7 | F87 | FD14 | F53 | Euler number | F7 | Sum variance | F48 | Minor axis length |
| 8 | F6 | Sum average | F92 | FD19 | F8 | Sum entropy | F47 | Major axis length |
| 9 | F7 | Sum variance | F84 | FD11 | F9 | Entropy | F54 | Equiv diameter |
| 10 | F86 | FD13 | F95 | FD22 | F10 | Diff variance | F21 | Variance |
| 11 | F66 | ZM(3,-1) | F90 | FD17 | F11 | Diff entropy | F53 | Euler number |
| 12 | F67 | ZM(3,1) | F33 | SL | F12 | Info measure1 | F50 | Orientation |
| 13 | F26 | Periodicity | F74 | FD1 | F13 | Info measure2 | F7 | Sum variance |
| 14 | F82 | FD9 | F97 | FD24 | F14 | Max Corr. Coff | F20 | Mean |
| 15 | F43 | H2 | F98 | FD25 | F15 | Homogeneity | F24 | Coarseness |
| 16 | F21 | Variance | F124 | FD51 | F16 | Contrast | F4 | Sum squares |
| 17 | F22 | Skewness | F79 | FD6 | F17 | Mean | F16 | Contrast |
| 18 | F25 | Contrast | F43 | H2 | F18 | Energy | F25 | Contrast |
| 19 | F8 | Sum entropy | F81 | FD8 | F19 | Entropy | F6 | Sum average |
| 20 | F9 | Entropy | F80 | FD7 | F20 | Mean | F23 | Kurtosis |
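Two of the four filter methods in this table are simple enough to sketch in a few lines: ranking by absolute Pearson correlation with the class label, and ranking by term variance. The code below is an illustrative sketch with made-up feature columns, not the authors' implementation; the helper names are hypothetical.

```python
import math

def pearson_score(feature, labels):
    """Absolute Pearson correlation between one feature column and the class labels."""
    n = len(feature)
    mf, ml = sum(feature) / n, sum(labels) / n
    cov = sum((f - mf) * (l - ml) for f, l in zip(feature, labels))
    sf = math.sqrt(sum((f - mf) ** 2 for f in feature))
    sl = math.sqrt(sum((l - ml) ** 2 for l in labels))
    return abs(cov / (sf * sl)) if sf > 0 and sl > 0 else 0.0

def term_variance(feature):
    """Population variance of one feature column; the labels are not used."""
    n = len(feature)
    m = sum(feature) / n
    return sum((f - m) ** 2 for f in feature) / n

def rank_features(columns, score):
    """Return feature names sorted from highest to lowest score."""
    return sorted(columns, key=lambda name: score(columns[name]), reverse=True)

# Made-up data: "a" tracks the label, "b" is large-scale noise, "c" is constant.
labels = [0, 0, 1, 1]
columns = {"a": [0.1, 0.2, 0.9, 1.0], "b": [100, -100, 100, -100], "c": [0.5] * 4}
by_pearson = rank_features(columns, lambda col: pearson_score(col, labels))
by_variance = rank_features(columns, term_variance)
```

Term variance is unsupervised and depends on each feature's scale, so it places the large-valued noise column first in this toy example. That behaviour is consistent with the term variance column above being led by spectral and area-type features.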
Classification results obtained using the top 20 features selected by four different feature selection algorithms
| Feature selection | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| Relief-F | 85.2 | 82.0 | 88.0 |
| Pearson correlation coefficient | 80.9 | 80.0 | 82.0 |
| Neighbourhood component analysis | 75.7 | 75.0 | 76.0 |
| Term variance | 60.0 | 66.0 | 52.0 |
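Relief-F, the best-performing selector here, scores a feature by how much it differs between a sample and its nearest neighbours of other classes (misses) versus its nearest neighbours of the same class (hits). The following is a minimal sketch under stated assumptions: it visits every sample instead of drawing a random subset, uses range-scaled Manhattan distance, and takes `k` as a free parameter. The source does not give the settings the authors used.

```python
def relieff_weights(X, y, k=3):
    """Return one Relief-F style weight per feature (higher = more discriminative)."""
    n, d = len(X), len(X[0])
    # Per-feature ranges, used to scale differences into [0, 1].
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    classes = sorted(set(y))
    priors = {c: y.count(c) / n for c in classes}
    for i in range(n):
        for c in classes:
            # k nearest neighbours of sample i within class c, excluding i itself.
            idx = sorted((m for m in range(n) if m != i and y[m] == c),
                         key=lambda m: dist(X[i], X[m]))[:k]
            if not idx:
                continue
            # Hits lower the weight; misses raise it, scaled by the class prior.
            factor = -1.0 if c == y[i] else priors[c] / (1.0 - priors[y[i]])
            for j in range(d):
                weights[j] += factor * sum(diff(j, X[i], X[m]) for m in idx) / (len(idx) * n)
    return weights

# Made-up data: feature 0 separates the classes, feature 1 is noise.
X = [(0.0, 0.3), (0.1, 0.9), (0.2, 0.1), (0.8, 0.2), (0.9, 0.8), (1.0, 0.4)]
y = [0, 0, 0, 1, 1, 1]
weights = relieff_weights(X, y, k=2)
```

Sorting features by these weights in descending order and keeping the first 20 would give a ranked list of the kind shown in the Relief-F column.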
Rank-wise lists of the top 20 most discriminative textures and the top 20 most discriminative geometric features selected by the Relief-F feature selection algorithm
| Rank | Texture index | Texture feature name | Texture feature model | Geometry index | Geometry feature name | Geometry feature model |
|---|---|---|---|---|---|---|
| 1 | F42 | H1 | FRACTAL | F99 | FD26 | Fourier descriptor |
| 2 | F43 | H2 | FRACTAL | F103 | FD30 | Fourier descriptor |
| 3 | F33 | SL | LTEM | F82 | FD9 | Fourier descriptor |
| 4 | F27 | Roughness | SFM | F84 | FD11 | Fourier descriptor |
| 5 | F7 | Sum variance | SGLCM | F66 | ZM(3,-1) | Zernike moments |
| 6 | F26 | Periodicity | SFM | F67 | ZM(3,1) | Zernike moments |
| 7 | F25 | Contrast | SFM | F53 | Euler number | Shape descriptor |
| 8 | F4 | Sum squares | SGLCM | F87 | FD14 | Fourier descriptor |
| 9 | F20 | Mean | FOS | F86 | FD13 | Fourier descriptor |
| 10 | F6 | Sum average | SGLCM | F97 | FD24 | Fourier descriptor |
| 11 | F24 | Coarseness | SFM | F70 | ZM(4,-2) | Zernike moments |
| 12 | F32 | EL | LTEM | F72 | ZM(4,2) | Zernike moments |
| 13 | F16 | Contrast | GLDS | F74 | FD1 | Fourier descriptor |
| 14 | F15 | Homogeneity | GLDS | F92 | FD19 | Fourier descriptor |
| 15 | F19 | Entropy | GLDS | F55 | Solidity | Shape descriptor |
| 16 | F17 | Mean | GLDS | F56 | Extent | Shape descriptor |
| 17 | F37 | WE | LTEM | F100 | FD27 | Fourier descriptor |
| 18 | F22 | Skewness | FOS | F119 | FD46 | Fourier descriptor |
| 19 | F23 | Kurtosis | FOS | F115 | FD42 | Fourier descriptor |
| 20 | F9 | Entropy | SGLCM | F104 | FD31 | Fourier descriptor |
Classification results obtained by employing the top 20 texture, top 20 geometric, and top 20 combined texture and geometric features with six different state-of-the-art classifiers
| Classifier | Feature set | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|
| k-NN | Texture features (T) | 73.9 | 83.1 | 62.0 |
| k-NN | Geometry features (G) | 77.4 | 76.0 | 78.0 |
| k-NN | Combined (T&G) | | | |
| SVM | Texture features (T) | 71.3 | 73.8 | 68.0 |
| SVM | Geometry features (G) | 72.2 | 73.8 | 70.0 |
| SVM | Combined (T&G) | | | |
| DT | Texture features (T) | 68.9 | 69.2 | 68.0 |
| DT | Geometry features (G) | 71.2 | 67.6 | 76.0 |
| DT | Combined (T&G) | | | |
| NB | Texture features (T) | 67.9 | 64.6 | 72.0 |
| NB | Geometry features (G) | 70.4 | 72.3 | 68.0 |
| NB | Combined (T&G) | | | |
| RF | Texture features (T) | 73.9 | 84.6 | 60.0 |
| RF | Geometry features (G) | 74.1 | 75.4 | 72.0 |
| RF | Combined (T&G) | | | |
| ET | Texture features (T) | 74.4 | 83.0 | 64.0 |
| ET | Geometry features (G) | 76.3 | 81.5 | 70.0 |
| ET | Combined (T&G) | | | |
Classification performances of the top 20 features selected by Relief-F when employed individually
| Feature name | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| FD26 | 67.0 | 71.0 | 62.0 |
| Euler number | 58.0 | 62.0 | 54.0 |
| Solidity | 56.0 | 62.0 | 48.0 |
| Extent | 56.0 | 62.0 | 48.0 |
| Sum squares | 50.0 | 58.0 | 38.0 |
| Mean | 47.0 | 52.0 | 38.0 |
| FD14 | 62.0 | 62.0 | 62.0 |
| Sum average | 52.0 | 58.0 | 44.0 |
| Sum variance | 60.0 | 66.0 | 52.0 |
| FD13 | 62.0 | 69.0 | 52.0 |
| ZM(3,-1) | 44.0 | 52.0 | 34.0 |
| ZM(3,1) | 44.0 | 52.0 | 34.0 |
| Periodicity | 50.0 | 57.0 | 42.0 |
| FD9 | 61.0 | 66.0 | 56.0 |
| H2 | 65.0 | 69.0 | 60.0 |
| Variance (FOS) | 47.0 | 54.0 | 38.0 |
| Skewness (FOS) | 51.0 | 52.0 | 50.0 |
| Contrast (SFM) | 43.0 | 51.0 | 34.0 |
| Sum entropy | 58.0 | 66.0 | 48.0 |
| Entropy | 61.0 | 69.0 | 50.0 |
Fig. 5 Classification performance (accuracy, sensitivity, and specificity) of Relief-F method versus the number of selected features
Experimentally selected top nine most discriminative features out of 20 features selected by Relief-F method
| Rank | Feature index | Feature name | Feature model |
|---|---|---|---|
| 1 | F99 | FD26 | Fourier descriptor |
| 2 | F53 | Euler number | Shape descriptor |
| 3 | F55 | Solidity | Shape descriptor |
| 6 | F20 | Mean | FOS |
| 7 | F87 | FD14 | Fourier descriptor |
| 10 | F86 | FD13 | Fourier descriptor |
| 13 | F26 | Periodicity | SFM |
| 17 | F22 | Skewness (FOS) | FOS |
| 18 | F25 | Contrast (SFM) | SFM |
Classification results obtained for six different state-of-the-art classifiers using a set of nine most discriminative features
| Classifier | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| k-NN | 90.4 | 92.0 | 88.0 |
| SVM | 86.1 | 87.7 | 84.0 |
| DT | 73.0 | 80.0 | 64.0 |
| NB | 74.8 | 70.8 | 80.0 |
| RF | 78.4 | 78.4 | 78.0 |
| ET | 81.7 | 84.6 | 78.0 |
Comparison with previous work
| Reference | Dataset used | Feature selection | Classifier used | Accuracy (%) |
|---|---|---|---|---|
| Hans et al. | INbreast | Opposition-based Harris Hawks Optimization | k-NN | 78.8 |
| Present study | INbreast | Relief-F | k-NN | 90.4 |