Sadia Safdar¹, Muhammad Rizwan¹, Thippa Reddy Gadekallu², Abdul Rehman Javed³, Mohammad Khalid Imam Rahmani⁴, Khurram Jawad⁴, Surbhi Bhatia⁵
Abstract
Breast cancer is one of the most widespread diseases among women worldwide. It has the second-highest mortality rate among women, especially in European countries. It occurs when malignant (cancerous) lumps begin to grow in breast cells. Accurate and early diagnosis can help increase survival rates. A computer-aided detection (CAD) system is needed to help radiologists differentiate between normal and abnormal cell growth. This research consists of two parts. The first part briefly reviews different imaging modalities, such as ultrasound, histopathology, and mammography, drawing on publications from a wide range of research databases. The second part evaluates different machine learning techniques used to estimate breast cancer recurrence rates. The first step is preprocessing, which includes removing missing values and data noise and transforming the data. The dataset is then split into 60% for training and 40% for testing. We focus on minimizing type I errors (false-positive rate, FPR) and type II errors (false-negative rate, FNR) to improve accuracy and sensitivity. Our proposed model uses machine learning techniques such as support vector machine (SVM), logistic regression (LR), and K-nearest neighbor (KNN) to achieve better accuracy in breast cancer classification. The best model attains an accuracy of 97.7%, with an FPR of 0.01, an FNR of 0.03, and an area under the ROC curve (AUC) of 0.99. The results show that our proposed model successfully classifies breast tumors while overcoming limitations of previous research. Finally, we conclude with future trends and challenges in classification and segmentation for breast cancer detection.
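The preprocessing, 60/40 split, and FPR/FNR definitions described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code. The source does not say which transformation was applied, so min-max scaling is an assumed stand-in. Noise filtering is omitted. Labels are assumed to be 1 for malignant and 0 for benign. All function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical helpers. Min-max scaling is an ASSUMED stand-in for the
# unspecified "transformation" step; noise filtering is not shown.


def drop_missing(rows: Sequence[Sequence[Optional[float]]],
                 labels: Sequence[int]) -> Tuple[List[List[float]], List[int]]:
    """Keep only samples whose features are all present (None marks a missing value)."""
    kept_x: List[List[float]] = []
    kept_y: List[int] = []
    for row, label in zip(rows, labels):
        if all(v is not None for v in row):
            kept_x.append([float(v) for v in row])
            kept_y.append(label)
    return kept_x, kept_y


def fit_min_max(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature minimum and maximum, meant to be computed on the training split only."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(rows, lo, hi) -> List[List[float]]:
    """Rescale each feature to [0, 1] with the given statistics; constant features map to 0."""
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in rows]


def split_60_40(x, y, seed: int = 0):
    """Shuffle once, then use the first 60% of samples for training and the rest for testing."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    cut = round(0.6 * len(idx))
    tr, te = idx[:cut], idx[cut:]
    return [x[i] for i in tr], [y[i] for i in tr], [x[i] for i in te], [y[i] for i in te]


def error_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Confusion-matrix rates with 1 = malignant (positive) and 0 = benign (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "fpr": fp / (fp + tn) if fp + tn else 0.0,          # type I error rate
        "fnr": fn / (fn + tp) if fn + tp else 0.0,          # type II error rate
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # equals 1 - FNR
    }
```

For example, `error_rates([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])` gives an accuracy of 0.6, an FPR of 0.5 (one of two benign cases flagged as malignant), and an FNR of 1/3 (one of three malignant cases missed). Fitting the scaling statistics on the training split only keeps information from the test split out of training.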
Keywords: K-nearest neighbor (KNN); breast cancer; computer-aided detection (CAD); deep learning; machine learning; support vector machine (SVM)
Year: 2022 PMID: 35626290 PMCID: PMC9140096 DOI: 10.3390/diagnostics12051134
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
Figure 1. WHO statistics of reported cases and casualties worldwide by cancer type.
Existing techniques on ultrasound image datasets.
| Ref. | Disease | Dataset Source | Dataset Type | Dataset Description | Tools | Techniques | Accuracy |
|---|---|---|---|---|---|---|---|
| [ | Breast cancer | Seoul National University Hospital, Severance Hospital and Samsung Medical Center | Ultrasound images | 164 images | DL-CAD software, DL-CAD based quantitative features | Feature extraction | sensitivity 95% |
| [ | Malignant | Not given | Ultrasound images | 250 images | ML, ANN, BPNN | Segmentation, ROI | 95.4% |
| [ | Breast cancer | University Hospital, Amman, Jordan | Ultrasound images | Dataset 1: 380 images; dataset 2: 163 images | CNN, SVM classifiers | Feature extraction, classification | 96.1%; CONV features: 94.2% |
| [ | Benign or malignant | Not given | Ultrasound images | 151 images | CAD system, SVM, CNN | Segmentation, feature extraction, classification | SVM 80.0%, RF 77.78%, CNN 85.42% |
Existing techniques on histopathological image datasets.
| Ref. | Disease | Dataset Source | Dataset Type | Dataset Description | Tools | Techniques | Accuracy |
|---|---|---|---|---|---|---|---|
| [ | Breast cancer | ICIAR 2018 | Histopathological images | 1568 images, 249 Bioimaging 2015, 400 ICIAR2018 | DNN, CNN, RNN, LSTM | Segmentation, feature extraction, classification | 90.5% for 4-class classification task |
| [ | Breast cancer | Open source | Histopathological images | BACH2018 (400 images), Bioimaging 2015 (249 images), Extended Bioimaging 2015 (1319 images) | CNN, RNN | Classification (K-fold) | Single Model 97.5%, Ensemble Model 97.5%, CNN 82.1% |
| [ | Breast cancer | ImageNet dataset, ICIAR, ISBI, ICPR, MICCAI | Histopathological images | 3771 images | RNN, CNN, SVM, NVIDIA GPUs | Classification | 91.3% for the 4-class classification task |
| [ | Breast cancer | Anatomy and Cytopathology Lab, Brazil | Histopathological images | 7909 images | DNN, GCN, softmax classifier | Binary classification | 99.44% and 99.01% |
| [ | Breast cancer | Wisconsin UCI | Histopathological images | 92 images | DNN, RNN | Binary classification | DNN gave better results |
| [ | Breast cancer | Not given | Histopathological images | 400 images | CNN ML, DL, IHC-Net, Naïve Bayes, SVM and RFD | Segmentation, feature extraction, classification | (98.24%), Ensemble classifier 98.41% F-score and 97.66% |
Existing techniques on mammographic image datasets.
| Ref. | Disease | Dataset Source | Dataset Type | Dataset Description | Tools | Techniques | Accuracy |
|---|---|---|---|---|---|---|---|
| [ | Breast cancer | Massachusetts General Hospital | Mammographic images | DDSM 2500 images | FFNN, GLCM, GLRLM, DFO | Segmentation, feature extraction, classification | 90%, FFNN 98% |
| [ | Breast cancer | Database for Mastology Research (DMR) | Mammographic images | 208 images | RFM, AlexNet, GoogLeNet, ResNet-18, VGG-16, VGG-19 | Segmentation, feature extraction, classification | 78.16%, 73.3–81.07% |
| [ | Breast cancer | US Chinese hospital | Mammographic images | DDSM, OMI-DB | CNN, MIL | Classification | Not given |
| [ | Breast cancer | Open source | Mammographic images | DDSM (2620 cases), CBIS-DDSM (1644 images) | DCNN AlexNet, DCNN SVM | Segmentation, feature extraction, ROI | SVM 87.2%, AUC 94% |
| [ | Breast cancer | Not given | Mammographic images | FFDM, DDSM 14,860 images | CNN AlexNet, ImageNet | Classification | 95% |
| [ | Breast cancer | Private | Mammographic images | 400 images | DL, SVM, softmax function, sigmoid function | Segmentation, classification | SVM showed better results than DL |
| [ | Breast cancer | Mammographic Image Analysis Society (MIAS) database | Mammographic images | 322 images | HDF, OKFCA, OKFC algorithm, fuzzy | Segmentation | MFKFCS produces 80.42% |
Figure 2. Proposed breast cancer classification model.
Comparison of existing bio-imaging studies.
| Reference | Bioimaging Type | Methodology | Accuracy |
|---|---|---|---|
| [ | Ultrasound images | DL-CAD | 95% |
| [ | Ultrasound images | ML, ANN, BPNN | 95.4% |
| [ | Ultrasound images | CNN, SVM | 96.1% |
| [ | Ultrasound images | SVM, RF, CNN | SVM 80.0%, RF 77.78%, CNN 85.42% |
| [ | Histopathological images | DNN, CNN, RNN | 90.5% |
| [ | Histopathological images | RNN, CNN | 97.5%, 82.1% |
| [ | Histopathological images | RNN, CNN, SVM | 91.3% |
| [ | Mammographic images | RFM, AlexNet | 78% |
| [ | Mammographic images | DCNN AlexNet, DCNN SVM | 94% |
| [ | Mammographic images | CNN AlexNet, ImageNet | 95% |
| [ | Mammographic images | HFD, OK-FCA, OKFC, Fuzzy | 80.42% |
| This paper | Mammographic images | SVM, KNN, Logistic regression | 97.7% |
Figure 3. Results of K-nearest neighbor. (a) Confusion matrix; (b) AUC of KNN.
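Figures 3, 4, and 5 each report an AUC. The source does not describe how it was computed. As a minimal sketch, the AUC can be obtained directly from classifier scores as the probability that a randomly chosen malignant sample scores higher than a randomly chosen benign one, with ties counted as one half. This pairwise form equals the area under the ROC curve. The function name `roc_auc` and the 1 = malignant label convention are assumptions.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC: P(score of positive > score of negative), ties = 1/2.

    Hypothetical helper; labels are assumed to be 1 = malignant, 0 = benign.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, because three of the four malignant/benign pairs are ranked correctly. The double loop is quadratic in the number of samples, which is fine for illustration. A sort-based rank computation scales better.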
Accuracy of KNN model.
| KNN Model | Accuracy | Prediction Speed | Training Time |
|---|---|---|---|
| Fine | 94.6% | 2500 obs/s | 2.9811 s |
| Medium | 96.3% | 1500 obs/s | 3.6813 s |
| Coarse | 92.8% | 1600 obs/s | 3.9217 s |
| Cosine | 96.1% | 1800 obs/s | 4.9151 s |
| Cubic | 95.8% | 320 obs/s | 10.718 s |
| Weighted | 97.0% | 2500 obs/s | 6.1157 s |
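The KNN variants in the table differ only in the number of neighbors, the distance metric, and how votes are weighted. The variant names match the preset classifiers in MATLAB's Classification Learner app. The source does not confirm this, so the settings below are assumed: Fine uses k = 1, Medium k = 10, and Coarse k = 100, all with Euclidean distance. Cosine, Cubic, and Weighted use k = 10 with cosine distance, Minkowski distance with exponent 3, and inverse-squared-distance vote weights, respectively. This plain-Python sketch mirrors those assumed settings. `PRESETS` and `knn_predict` are hypothetical names.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cubic(a: Vector, b: Vector) -> float:
    # Minkowski distance with exponent p = 3
    return sum(abs(x - y) ** 3 for x, y in zip(a, b)) ** (1.0 / 3.0)


def cosine(a: Vector, b: Vector) -> float:
    # 1 - cosine similarity; a zero vector is treated as orthogonal (distance 1)
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (na * nb)


# Hypothetical presets, ASSUMED to follow MATLAB Classification Learner defaults:
# name -> (k, distance function, use inverse-squared-distance weights)
PRESETS: Dict[str, Tuple[int, Callable[[Vector, Vector], float], bool]] = {
    "fine": (1, euclidean, False),
    "medium": (10, euclidean, False),
    "coarse": (100, euclidean, False),
    "cosine": (10, cosine, False),
    "cubic": (10, cubic, False),
    "weighted": (10, euclidean, True),
}


def knn_predict(train_x, train_y, query: Vector, k: int,
                dist: Callable[[Vector, Vector], float], weighted: bool = False) -> int:
    """Majority (or distance-weighted) vote among the k nearest training samples."""
    nearest = sorted((dist(x, query), y) for x, y in zip(train_x, train_y))[:k]
    votes: Dict[int, float] = defaultdict(float)
    for d, label in nearest:
        # Weighted KNN: closer neighbors count more; epsilon avoids division by zero
        votes[label] += 1.0 / (d * d + 1e-12) if weighted else 1.0
    return max(votes, key=votes.get)
```

Usage: `k, dist, w = PRESETS["weighted"]; knn_predict(train_x, train_y, query, k, dist, w)`. In this sketch, moving from one table row to another changes only the preset tuple. Weighting matters when a few very close neighbors are outnumbered by more distant ones of the other class.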
Figure 4. Results of logistic regression. (a) Confusion matrix; (b) AUC of LR.
Accuracy of logistic regression model.
| Logistic Regression Model | Accuracy | Prediction Speed | Training Time |
|---|---|---|---|
| Logistic regression | 94.0% | 2400 obs/s | 52.778 s |
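Logistic regression models the probability of malignancy as the logistic sigmoid of a linear score, p = σ(w·x + b), and is fitted by minimizing the average log-loss. The sketch below uses plain batch gradient descent. The source does not report the solver or its settings, so the learning rate, epoch count, and 0.5 decision threshold are illustrative assumptions. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in exp for large |z|)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 1000) -> Tuple[List[float], float]:
    """Batch gradient descent on mean log-loss; lr and epochs are assumed, not from the source."""
    n_features = len(x[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(x)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(x, y):
            # Gradient of log-loss w.r.t. the score is (predicted probability - label)
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_proba(w: Sequence[float], b: float, xi: Sequence[float]) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)


def predict(w: Sequence[float], b: float, xi: Sequence[float], threshold: float = 0.5) -> int:
    # 1 = malignant, 0 = benign (assumed label convention)
    return int(predict_proba(w, b, xi) >= threshold)
```

Lowering `threshold` below 0.5 trades a higher FPR for a lower FNR. That is one way to act on the type I/type II error trade-off the abstract emphasizes.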
Figure 5. Results of support vector machine. (a) Confusion matrix; (b) AUC of SVM.
Figure 6. Result plots of support vector machine. (a) Parallel coordinates plot of SVM; (b) Scatter plot of SVM.
Accuracy of SVM model.
| SVM Model | Accuracy | Prediction Speed | Training Time |
|---|---|---|---|
| Linear | 97.5% | 2000 obs/s | 3.5090 s |
| Quadratic | 97.7% | 3700 obs/s | 2.4081 s |
| Cubic | 97.7% | 2300 obs/s | 4.7405 s |
| Fine Gaussian | 77.7% | 1900 obs/s | 6.0672 s |
| Medium Gaussian | 97.4% | 3500 obs/s | 6.4526 s |
| Coarse Gaussian | 95.3% | 3700 obs/s | 6.7769 s |
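The SVM variants in the table differ in the kernel K(x, z). The source does not confirm that the names follow MATLAB's Classification Learner presets, but if they do, the settings are as follows. Linear uses the dot product. Quadratic and Cubic use polynomial kernels of degree 2 and 3. The three Gaussian presets use an RBF kernel whose scale s is √P/4 (Fine), √P (Medium), or 4√P (Coarse), where P is the number of predictors. A smaller scale makes the kernel more local. The sketch shows only the kernels and the dual-form decision function f(x) = Σᵢ αᵢ yᵢ K(xᵢ, x) + b. The coefficients αᵢ and b would come from an SVM solver, which is not shown. The polynomial form (1 + x·z)^d and all function names are assumptions.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def polynomial_kernel(degree: int) -> Kernel:
    # ASSUMED form (1 + a.b)^degree: degree 2 = "Quadratic", degree 3 = "Cubic"
    return lambda a, b: (1.0 + linear_kernel(a, b)) ** degree


def gaussian_kernel(scale: float) -> Kernel:
    # exp(-||a - b||^2 / scale^2); a smaller scale gives a more local kernel
    def k(a: Vector, b: Vector) -> float:
        sq = sum((x - y) ** 2 for x, y in zip(a, b))
        return math.exp(-sq / scale ** 2)
    return k


def svm_presets(n_predictors: int) -> Dict[str, Kernel]:
    """Hypothetical kernel table, ASSUMED to mirror MATLAB Classification Learner presets."""
    s = math.sqrt(n_predictors)
    return {
        "linear": linear_kernel,
        "quadratic": polynomial_kernel(2),
        "cubic": polynomial_kernel(3),
        "fine_gaussian": gaussian_kernel(s / 4),
        "medium_gaussian": gaussian_kernel(s),
        "coarse_gaussian": gaussian_kernel(s * 4),
    }


def decision_function(support_x, support_y, alphas, bias: float,
                      kernel: Kernel, query: Vector) -> float:
    """Dual SVM score f(x) = sum_i alpha_i * y_i * K(x_i, x) + b, with y_i in {-1, +1}."""
    return sum(a * y * kernel(sx, query)
               for sx, y, a in zip(support_x, support_y, alphas)) + bias
```

The sign of `decision_function(...)` gives the predicted class. In this sketch only the `kernel` argument changes between table rows, although in practice the trained αᵢ and b also differ for each kernel.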