Maha M Alshammari, Afnan Almuhanna, Jamal Alhiyafi.
Abstract
A tumor is an abnormal tissue classified as either benign or malignant. Breast tumors are among the most common tumors in women. Radiologists use mammograms to identify and classify breast tumors, a process that is time-consuming and error-prone because of the complexity of the tumor. In this study, we applied machine learning-based techniques to assist the radiologist in reading mammogram images and classifying the tumor within a reasonable time. We extracted several features from the region of interest (ROI) in the mammogram, which the radiologist had manually annotated. These features were fed into a classification engine to train and build the classification models of the proposed structure. Following standard model evaluation schemes, we evaluated the accuracy of the proposed system on a dataset that the model had not previously seen. We found that various factors could affect performance, and we mitigated them after experimenting with the possible alternatives. Finally, this study recommends the optimized Support Vector Machine or Naïve Bayes, which achieved 100% accuracy after integrating the feature selection and hyper-parameter optimization schemes.
Keywords: K-nearest neighbor; Naïve Bayes; benign; breast cancer; classification; decision tree; discriminant analysis; machine learning; malignant; support vector machine
Year: 2021 PMID: 35009746 PMCID: PMC8749541 DOI: 10.3390/s22010203
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Mammography-based CAD system.
Figure 2. Schematic diagram for the proposed ML-based CAD system.
Figure 3. Breast before and after drawing the tumor border.
Figure 4. (A) Binary image; (B) ROI; (C) original image.
Extracted features.
| Based on | Feature | Description | Equation number |
|---|---|---|---|
| Density | Mean | The mean is one of the most basic notions in experimental sciences and is used in many everyday applications. | (1), (2) |
| Density | SD | The standard deviation measures the degree of dispersion and variation of the image. | (3), (4) |
| Density | Histogram | Represents the distribution of the density values of the ROI as graphical bars. | (5), (6), (7) |
| Shape | ROI compactness | A circularity measure for shapes in digital images that relates the shape's perimeter to its area. | (8) |
| Shape | Kurtosis | Measures how heavy the tails are compared with a normal distribution, indicating whether the data look flat or not. | (9) |
| Texture | Solidity | The ratio between the pixels of the ROI area and the pixels of its convex body, given by Equation (10). | (10) |
| Texture | Correlation | Measures the correlation between pairs of pixels. | (11) |
| Texture | Variance | The square of the SD. | (12) |
| Texture | Uniformity | Measures how uniform the density in the ROI's histogram is, using the squared normalized histogram values. | (13) |
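The equation bodies for these features are not reproduced in this record, so the following is only a minimal sketch using common textbook definitions of several of the listed features (mean, SD, variance, normalized histogram, uniformity, kurtosis, and compactness). The function names `first_order_features` and `compactness`, the `levels` parameter, and the specific formulas are assumptions made for illustration; they are not confirmed to match Equations (1) to (13) of the paper.

```python
import math
from collections import Counter


def first_order_features(pixels, levels=256):
    """Gray-level statistics of an ROI, using textbook definitions (assumed)."""
    n = len(pixels)
    mean = sum(pixels) / n
    # Population variance; the SD is its square root.
    variance = sum((p - mean) ** 2 for p in pixels) / n
    sd = math.sqrt(variance)
    # Normalized histogram: fraction of ROI pixels at each gray level
    # in range(levels); values outside that range are not counted.
    counts = Counter(pixels)
    hist = [counts.get(g, 0) / n for g in range(levels)]
    # Uniformity as the sum of squared normalized histogram values.
    uniformity = sum(h * h for h in hist)
    # Kurtosis as the fourth standardized moment (3.0 for a normal distribution).
    if variance > 0:
        kurtosis = sum((p - mean) ** 4 for p in pixels) / (n * variance ** 2)
    else:
        kurtosis = float("nan")
    return {
        "mean": mean,
        "sd": sd,
        "variance": variance,
        "histogram": hist,
        "uniformity": uniformity,
        "kurtosis": kurtosis,
    }


def compactness(perimeter, area):
    """One common circularity measure, P^2 / (4*pi*A); 1.0 for a perfect circle."""
    return perimeter ** 2 / (4 * math.pi * area)


# Toy ROI with four pixels and four gray levels; not data from the study.
features = first_order_features([0, 0, 2, 2], levels=4)
print(features["mean"], features["sd"], features["uniformity"])
```

Other definitions of compactness are in use (for example the inverse ratio), so the formula should be checked against the paper's Equation (8) before any values are compared with its results.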
Figure 5. Samples of the extracted features from a few cases.
Figure 6. Experiment 1 results.
Modified KNN parameters.
| Properties | Value |
|---|---|
| Number of neighbors | 5 |
| Distance | Euclidean |
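As an illustration of what these two parameters mean, here is a minimal pure-Python sketch of a K-nearest-neighbor classifier with five neighbors and Euclidean distance. The function name `knn_predict` and the toy two-feature data are hypothetical; the study's own tooling and feature vectors are not shown in this record.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    # math.dist returns the Euclidean distance between two points (Python 3.8+).
    nearest = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy feature vectors and labels for illustration only; not the study's data.
train_X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
train_y = ["benign"] * 4 + ["malignant"] * 4

print(knn_predict(train_X, train_y, (0.5, 0.5)))
print(knn_predict(train_X, train_y, (5.5, 5.5)))
```

With two classes, an odd number of neighbors such as 5 rules out tied votes.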
Figure 7. Testing set selection.
Figure 8. Models’ performance with the dataset splitting criteria at 60% training and 40% testing.
Figure 9. Models’ performance with the dataset splitting criteria at 70% training and 30% testing.
Figure 10. Models’ performance with the dataset splitting criteria at 75% training and 25% testing.
Figure 11. Models’ performance with the dataset splitting criteria at 80% training and 20% testing.
Figure 12. Models’ performance with a random selection of the training set.
Figure 13. Models’ performance with manual selection of the training set.
Figure 14. Optimized method.
Figure 15. Proving the test results.
Figure 16. DT optimization.
Figure 17. KNN optimization.
Figure 18. SVM optimization.
Figure 19. DA optimization.
Figure 20. Recommended model.
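The splitting criteria named in the captions of Figures 8 to 11 (60/40, 70/30, 75/25, and 80/20) can be sketched as a simple hold-out split. This is a minimal illustration under stated assumptions: the helper `split_dataset`, the fixed `seed`, the shuffling step, and the stand-in case IDs are all hypothetical, and the record does not say how the study assigned cases to each subset for these ratios.

```python
import random


def split_dataset(samples, train_fraction, seed=0):
    """Shuffle a copy of `samples` and cut it into (train, test) lists."""
    shuffled = list(samples)
    # Seeded generator so the same split can be reproduced (an assumption).
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


cases = list(range(20))  # stand-in case IDs, not the study's dataset
for fraction in (0.60, 0.70, 0.75, 0.80):
    train, test = split_dataset(cases, fraction)
    print(f"{fraction:.0%} training: {len(train)} train / {len(test)} test cases")
```

Keeping the test cases out of training and of any hyper-parameter search is what makes the reported accuracy an estimate on unseen data.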