Annarita Fanizzi1, Teresa M A Basile2,3, Liliana Losurdo4, Roberto Bellotti2,3, Ubaldo Bottigli5, Rosalba Dentamaro1, Vittorio Didonna1, Alfonso Fausto6, Raffaella Massafra1, Marco Moschetta7, Ondina Popescu1, Pasquale Tamborra1, Sabina Tangaro3, Daniele La Forgia1.
Abstract
BACKGROUND: Screening programs use mammography as the primary diagnostic tool for detecting breast cancer at an early stage. The diagnosis of some lesions, such as microcalcifications, remains difficult for radiologists today. In this paper, we proposed an automatic binary model for discriminating tissue in digital mammograms as a support tool for radiologists. In particular, we compared the contribution of different feature selection methods in terms of learning performance and selected features.
Keywords: Computer-aided diagnosis; Digital mammograms; Feature selection; Haar wavelet transform; Microcalcifications; Minimum eigenvalue algorithm; Random forest; SURF
Year: 2020 PMID: 32164532 PMCID: PMC7069158 DOI: 10.1186/s12859-020-3358-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Haar decomposition of an image. a One- and b two-level decomposition
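The one-level decomposition of panel a can be sketched with plain NumPy; a minimal sketch assuming the usual averaging/differencing Haar filters (`haar2d` is an illustrative helper name, and sub-band naming conventions vary between implementations):

```python
import numpy as np

def haar2d(img):
    """One level of the 2D Haar wavelet transform (illustrative sketch).

    Returns the four sub-bands LL (approximation), LH, HL and HH
    (detail bands). Applying the transform again to LL yields the
    second decomposition level, as in panel b of the figure.
    """
    a = img.astype(float)
    # Pairwise averages/differences along columns (axis 1)...
    lo = (a[:, 0::2] + a[:, 1::2]) / 2.0
    hi = (a[:, 0::2] - a[:, 1::2]) / 2.0
    # ...then along rows (axis 0).
    LL = (lo[0::2, :] + lo[1::2, :]) / 2.0
    LH = (lo[0::2, :] - lo[1::2, :]) / 2.0
    HL = (hi[0::2, :] + hi[1::2, :]) / 2.0
    HH = (hi[0::2, :] - hi[1::2, :]) / 2.0
    return LL, LH, HL, HH
```

On a constant image the three detail bands are zero and LL reproduces the constant; two levels applied to a ROI give the eight sub-ROIs mentioned in Fig. 4 (four bands per level).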
Fig. 2 Scale-space representation. a Traditional approach, with serial downsampling of the image. b SURF approach, in which the filter is upscaled in parallel while the image is left unchanged [29]
Fig. 3 Shi-Tomasi score. In the (λ1, λ2) space, the point is considered a corner only when both λ1 and λ2 are above a minimum value λ (green hatched area). The white and gray areas represent the conditions in which the point is marked as an edge and a flat region, respectively
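The score in the figure can be computed directly from the eigenvalues of the local structure tensor; a minimal NumPy sketch, where the window size and gradient operator are illustrative choices rather than the paper's settings:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def shi_tomasi_response(img, win=3):
    """Minimum-eigenvalue (Shi-Tomasi) corner response map.

    Builds the 2x2 structure tensor M from image gradients summed
    over a win x win window and returns min(lambda1, lambda2) per
    window position. Both eigenvalues above a threshold lambda
    mark a corner; one large eigenvalue an edge; two small ones a
    flat region.
    """
    a = img.astype(float)
    gy, gx = np.gradient(a)

    def box_sum(x):
        # Windowed sum of a tensor entry over all win x win patches.
        return sliding_window_view(x, (win, win)).sum(axis=(-1, -2))

    sxx = box_sum(gx * gx)
    syy = box_sum(gy * gy)
    sxy = box_sum(gx * gy)
    # Closed-form smaller eigenvalue of [[sxx, sxy], [sxy, syy]].
    return (sxx + syy) / 2.0 - np.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
```

OpenCV exposes the same criterion as the MinEigen detector through `cv2.goodFeaturesToTrack` with `useHarrisDetector=False`.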
Fig. 4 Example of feature set extraction from an original ROI containing microcalcifications. The Statistical Features set is obtained from the eight sub-ROIs produced by the Haar wavelet decomposition, while the Interest Point and Corner sets are formed by counting the points and corners of interest extracted by the SURF and MinEigen algorithms, respectively
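The feature names appearing in the tables below (Mean, Variance, Skewness, Kurtosis, Entropy, RelSmoothness per sub-band) suggest standard first-order statistics. A hedged sketch, assuming common definitions; the histogram bin count and the relative-smoothness form 1 − 1/(1 + σ²) are assumptions, not taken from the record:

```python
import numpy as np

def subband_stats(band, bins=64):
    """First-order statistics for one wavelet sub-ROI (illustrative).

    Returns mean, variance, skewness, kurtosis, histogram entropy
    and relative smoothness, using textbook definitions of each.
    """
    x = band.astype(float).ravel()
    mu, var = x.mean(), x.var()
    sd = np.sqrt(var) if var > 0 else 1.0  # guard for flat patches
    skew = np.mean(((x - mu) / sd) ** 3)
    kurt = np.mean(((x - mu) / sd) ** 4)
    hist, _ = np.histogram(x, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    entropy = -np.sum(p * np.log2(p))
    rel_smooth = 1.0 - 1.0 / (1.0 + var)
    return {"Mean": mu, "Variance": var, "Skewness": skew,
            "Kurtosis": kurt, "Entropy": entropy,
            "RelSmoothness": rel_smooth}
```

Computed on each of the eight sub-ROIs, this yields the Statistical Features set; the counts of SURF interest points and MinEigen corners complete the feature vector.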
Fig. 5 Flow-chart of the proposed model. In the first phase, a set of features is extracted from each ROI; then the feature selection step is performed; finally, the RF classifier is trained to solve the binary problems normal vs abnormal and benign vs malignant
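The selection-plus-classification stage can be illustrated with scikit-learn; a minimal sketch in which synthetic data stands in for the real ROI feature matrix and impurity-based importances serve as the embedded ranking (the record does not give the actual RF settings):

```python
# Embedded feature selection: rank features with the random
# forest's own importances, keep the top k, retrain, and score
# with 10-fold cross-validation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))           # 30 candidate features (synthetic)
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # labels driven by 2 features

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranking = np.argsort(rf.feature_importances_)[::-1]  # best first

k = 5
X_top = X[:, ranking[:k]]
acc = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=0),
                      X_top, y, cv=10).mean()
```

Repeating this for increasing k and many cross-validation rounds produces curves like those in Fig. 6.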
Fig. 6 Median accuracy (%) as the number of features grows. The median value is calculated over 100 rounds of 10-fold cross-validation for increasing numbers of features used to train the proposed model to classify ROIs into a normal/abnormal and b benign/malignant. Two feature selection approaches are compared: the embedded (red line) and filter (blue line) methods
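For the filter branch of Fig. 6, any classifier-independent ranking applies; the record does not name the criterion used, so the ANOVA F-score via scikit-learn's `SelectKBest` is only a stand-in:

```python
# Filter feature selection: score each feature against the labels
# independently of any classifier, then keep the k best.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 30))   # synthetic feature matrix
y = (X[:, 2] > 0).astype(int)    # labels driven by feature 2

selector = SelectKBest(f_classif, k=6).fit(X, y)
kept = np.flatnonzero(selector.get_support())  # indices of retained features
```

Unlike the embedded approach, the ranking here is fixed before any classifier is trained, which is why the two methods can select different feature subsets, as the table below shows.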
Significant features on BCDR database
| Normal/Abnormal: Embedded M. (k ≤ 2) | freq (%) | Normal/Abnormal: Filter M. (k ≤ 6) | freq (%) | Benign/Malignant: Embedded M. (k ≤ 10) | freq (%) | Benign/Malignant: Filter M. (k ≤ 26) | freq (%) |
|---|---|---|---|---|---|---|---|
| # Interest Points | 100 | # Interest Points | 100 | Variance _LL2 | 100 | Variance _LL1 | 100 |
| # Interest Corners | 100 | Kurtosis _HL2 | 99.80 | # Interest Corners | 100 | Skewness _LL1 | 100 |
| | | # Interest Corners | 99.10 | Variance _LL1 | 99.90 | Entropy _LL1 | 100 |
| | | Kurtosis _HL1 | 97.80 | RelSmoothness _LL2 | 99.90 | RelSmoothness _LL1 | 100 |
| | | Kurtosis _LH1 | 76.40 | RelSmoothness _LL1 | 99.60 | Entropy _HL1 | 100 |
| | | Kurtosis _LH2 | 61.90 | # Interest Points | 91.30 | Entropy _HH1 | 100 |
| | | Variance _LH2 | 24.80 | Variance _HH1 | 77.70 | Kurtosis _HH1 | 100 |
| | | RelSmoothness _LH2 | 21.90 | RelSmoothness _HH1 | 77.40 | Variance _LL2 | 100 |
| | | | | Entropy _HH1 | 58.90 | Skewness _LL2 | 100 |
| | | | | Entropy _HL1 | 44.80 | Entropy _LL2 | 100 |
| | | | | Mean _HH1 | 41.20 | RelSmoothness _LL2 | 100 |
| | | | | | | Kurtosis _LH2 | 100 |
| | | | | | | Kurtosis _HL2 | 100 |
| | | | | | | Kurtosis _HH2 | 100 |
| | | | | | | # Interest Points | 100 |
| | | | | | | # Interest Corners | 100 |
| | | | | | | Entropy _LH1 | 99.20 |
| | | | | | | Entropy _LH2 | 98.60 |
| | | | | | | Entropy _HH2 | 97.80 |
| | | | | | | Kurtosis _HL1 | 97.10 |
| | | | | | | RelSmoothness _HH1 | 96.10 |
| | | | | | | Variance _HH1 | 88.80 |
| | | | | | | Skewness _HL2 | 76.30 |
| | | | | | | Mean _LL1 | 59.00 |
The features whose occurrence in the first k positions of the rankings defined by the filter and embedded methods is significantly different from chance (null-model test, p-value ≤0.05) are reported. k is the number of features that maximizes the accuracy on the normal vs abnormal and benign vs malignant classification problems
Best classification performance on BCDR database
| | Normal/Abnormal | Benign/Malignant |
|---|---|---|
| AUC | 98.16 (97.87−98.48)∗∗ | 92.08 (91.61−92.58) |
| Accuracy | 97.31 (96.92−97.31)∗∗ | 88.46 (87.69−89.23) |
| Sensitivity | 94.62 (93.85−94.62) | 89.09 (87.27−90.91) |
| Specificity | 100 (100−100)∗∗ | 88.00 (86.67−89.33) |
| AUC | 98.67 (98.57−98.76) | 92.13 (91.66−92.78) |
| Accuracy | 96.92 (96.54−96.92) | 87.69 (86.92−89.23) |
| Sensitivity | 93.85 (93.85−94.62) | 89.09 (87.27−90.91) |
| Specificity | 99.23 (99.23−100) | 87.33 (85.33−89.33) |
The classification performances corresponding to the best result over the 100 rounds of 10-fold cross-validation, for an increasing number of selected features, are summarized; the upper and lower four-row blocks report the two feature selection techniques. We tested the significance of the difference between the performance measures obtained with the two techniques on the same classification problem. Statistical significance is measured with the Wilcoxon-Mann-Whitney test: ** p-value <0.01 (Bonferroni correction)
Benign vs Malignant microcalcifications: accuracy (Acc) and Area Under the Curve (AUC) performances
| Method | No. ROIs | Feature type | Classifier | Acc (%) | AUC (%) |
|---|---|---|---|---|---|
| Chen et al. (2015) | 300 | topological features | kNN | 85 | 91 |
| Ren et al. (2012) | 295 | statistical features | kNN | 82 | 86 |
| Khehra et al. (2013) | 380 | statistical, shape and textural features | LS-SVM | 89 | 89 |
| Strange et al. (2014) | 300 | mereotopological features | Barcodes | 80 | 82 |
| Hu et al. (2017) | 150 | textural features | ELM | - | 92 |