| Literature DB >> 35884503 |
Jesus A Basurto-Hurtado1,2, Irving A Cruz-Albarran1,2, Manuel Toledano-Ayala3, Mario Alberto Ibarra-Manzano4, Luis A Morales-Hernandez1, Carlos A Perez-Ramirez2.
Abstract
Breast cancer is one of the main causes of death among women worldwide, accounting for 16% of the malignant lesions diagnosed globally. It is therefore of paramount importance to diagnose these lesions at the earliest possible stage, when the chances of survival are highest. While several works cover selected topics in this area, none presents a complete panorama, that is, from image generation to image interpretation. This work presents a comprehensive state-of-the-art review of the image generation and processing techniques used to detect breast cancer, in which potential candidates for image generation and processing are presented and discussed. Novel methodologies should consider the adroit integration of artificial intelligence concepts and categorical data to generate modern alternatives that offer the accuracy, precision, and reliability needed to mitigate misclassifications.
Keywords: artificial intelligence; breast cancer; image processing; magnetic resonance; mammography; thermography; ultrasound
Year: 2022 PMID: 35884503 PMCID: PMC9322973 DOI: 10.3390/cancers14143442
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.575
Figure 1. BC detection using image processing strategies.
Figure 2. Mammography procedure.
Figure 3. Ultrasound procedure.
Figure 4. BMRI procedure.
Summary of the breast image generation technologies used.
| Imaging Technique | Advantages | Disadvantages | Recommended Population | Some Types of Cancer Detected | Sensitivity and/or Specificity |
|---|---|---|---|---|---|
| Mammography | 1. Equipment is widely available worldwide. | 1. The rate of both false positives and false negatives increases, since it cannot determine whether the masses are benign. | Women older than 40 years with low-density breasts and an average risk of developing the disease. | 1. Ductal carcinoma in situ. | Sensitivity up to 85%. |
| Ultrasound | 1. Can be used in patients who are young or have dense breasts. | 1. Calcifications may not be detected. | Women with heterogeneously or extremely dense breast tissue. | 1. Ductal carcinoma in situ. | Sensitivity ranging between 40–75% in younger high-risk women. |
| Magnetic Resonance Imaging | 1. Effective for detecting suspicious masses in high-risk populations. | 1. Equipment is only available in specialized hospitals. | 1. Women who may carry mutations in the ATM, BRCA1, BRCA2, CHEK2, PALB2, PTEN, or TP53 genes. | 1. Ductal carcinoma in situ. | Sensitivity ranging from 83 to 100%. |
Figure 5. Artificial Neural Network.
Figure 6. Convolutional Neural Network.
Summary of the image classification algorithms used.
| Type of Classifier | Classifier | Advantages | Disadvantages | Number of Samples Analyzed | Performance Metrics |
|---|---|---|---|---|---|
| Unsupervised | K-means | 1. Easy and fast to implement. 2. Fast computation (only the distance to the centroids is required). | 1. The initial value of the centroids influences the performance. 2. Samples must be presented in an organized and normalized way. 3. The centroid distance between classes might induce misclassifications. | Some works have used the Wisconsin Breast Cancer Dataset with 569 instances. | Accuracy: up to 92%. Specificity: up to 99%. Sensitivity: up to 100%. |
| Unsupervised | Hierarchical Clustering | 1. No distance measurement is required; similarity measures can be employed. 2. Easy to implement. | 1. Large datasets increase the time complexity of delivering a result. 2. Outliers degrade the classifier performance. 3. Normalization of the sample values is required. | 117 images are analyzed. | Accuracy: 88.0%. Specificity: 89.3%. Sensitivity: 85.7%. |
| Supervised | Decision Trees | 1. Its construction does not impose any probabilistic distribution on the data. 2. Can deal with large datasets. 3. Easy to understand. | 1. Can become too complex if the training data are not carefully chosen. 2. Performance decreases if several classes exist in the data. | Some works have analyzed from 283 samples. | Accuracy: up to 89%. Specificity: up to 89%. Sensitivity: up to 90%. |
| Supervised | Random Forest | 1. Non-linear relationships between the features are well handled. 2. Outliers do not degrade the classifier performance. 3. Noisy measurements do not affect the accuracy. | 1. Training time increases with the number of trees generated. 2. Classifier complexity increases with the number of trees that must be evaluated. | Several works have used from 59 images; other authors have used ten different datasets, the smallest with 155 images and the largest with 569 images. | Accuracy: up to 80%. Specificity: up to 80%. Sensitivity: up to 90%. |
| Supervised | AdaBoost | 1. Base classifiers only need an accuracy greater than 50%. 2. Base classifiers can come from different domains (spatial, frequency, among others). | 1. Noise can degrade the classifier performance, as the weight assigned to each weak classifier is increased to reduce the error. 2. Sensitive to the base classifiers employed. | Some works have used from 1062 images. | Accuracy: up to 90%. Specificity: up to 90%. Sensitivity: up to 90%. |
| Supervised | Support Vector Machines | 1. Can deal with high-dimensional data (features). 2. Robust against outliers. 3. Overfitting is reduced by the training process. | 1. Accuracy is kernel-dependent. 2. Large datasets are not handled well. 3. Overlapping classes and noise degrade the accuracy. 4. Uncertainty cannot be incorporated. | Some authors have used from 207 images. | Accuracy: up to 90%. Specificity: up to 90%. Sensitivity: up to 90%. |
| Supervised | Artificial Neural Networks | 1. Can deal with highly non-linear relationships. 2. Can deal with noisy data. 3. Uncertainty can be incorporated. 4. Fine-tuning can be done using different activation functions. | 1. High-dimensional data might cause instabilities in the training algorithms. 2. Prone to overfitting. 3. Selecting the number of neurons can be troublesome. | Other authors have used 111 images. | Accuracy: up to 95%. Sensitivity: up to 100%. Specificity: up to 90%. |
| Supervised | Convolutional Neural Networks | 1. Can process the image without any preprocessing stage. 2. Perform the feature extraction task automatically. 3. Moderately noisy images can be handled properly. | 1. Require a large dataset to avoid overfitting. 2. Require a high computational load for training. | Some authors have used from 87 images. | Accuracy: up to 99%. Sensitivity: up to 99%. Specificity: up to 99.6%. |
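The Wisconsin Breast Cancer Dataset (569 instances) cited for several classifiers in the table above ships with scikit-learn, so a supervised classifier such as an SVM can be sketched in a few lines. This is an illustrative sketch on that public dataset, not a reproduction of any reviewed work's pipeline; the train/test split and kernel choice are assumptions.

```python
# Illustrative sketch: an RBF-kernel SVM on the Wisconsin Breast Cancer
# Dataset (569 samples, 30 features). As the table notes, accuracy is
# kernel dependent; the split and hyperparameters here are assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

# Standardize features first: SVMs are sensitive to feature scale.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)
print(f"Accuracy: {accuracy_score(y_test, clf.predict(X_test)):.3f}")
```

On this dataset the scaled RBF SVM typically lands in the 90%+ accuracy range reported in the table.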
Figure 7. Electromagnetic spectrum.
Figure 8. Proposed experimental setup for breast thermal image acquisition.
Summary of breast lesion detection using infrared thermography.
| Authors | Number of Patients | IR System | Features | Classification | Accuracy (%) | Room Temperature (°C) | Acclimation Time (min) |
|---|---|---|---|---|---|---|---|
| Ekici and Jawzal | 140 | FLIR SC-620 | Bio-data, image analysis, and image statistics | CNNs optimized by a Bayes algorithm | 98.95 | 17–24 | 15 |
| AlFayez et al. | Public dataset DMR-IR | — | Geometrical and textural features | Extreme Learning Machine (ELM) and Multilayer Perceptron (MLP) | ELM: 100 | — | — |
| Rani et al. | 60 | FLIR T650SC | Temperature and intensity | SVM with radial basis function kernel | 83.22 | 20–24 | 15 |
| Saxena et al. | 32 | FLIR A320 | ROI thermal features | Cut-off value | 88 | 22 ± 0.5 | Not specified |
| Tello-Mijares | 63 | FLIR SC-620 | Shape, colour, texture, and left–right breast relation | CNN | 100 | 20–22 | 15 |
| Garduño-Ramón et al. | 454 | FLIR A300 | Temperature | Temperature difference | 79.60 | 18–22 | 15 |
| Raghavendra et al. | 50 | Thermo TVS200 | Student's t-test-selected features | Decision Tree | 98 | 20–22 | 15 |
| Lashkari et al. | 67 | Thermoteknix VisIR 640 | 23 features, including statistical, morphological, frequency-domain, histogram, and Gray Level Co-occurrence Matrix features | AdaBoost, SVM, kNN, Naive Bayes, PNN | 85.33 and 87.42 | 18–23 | Ice test: 20 |
| Francis et al. | 22 | med2000™ IRIS | Statistical and texture features extracted from thermograms in the curvelet domain | SVM | 90.91 | 25 | 15 |
| Milosevic et al. | 40 images | VARIOSCAN 3021 ST | Texture measures derived from the Gray Level Co-occurrence Matrix | K-Nearest Neighbor | 92.5 | 20–23 | A few minutes |
| Araujo et al. | 50 | FLIR | Thermal interval for each breast | Linear discriminant classifier, minimum distance classifier, and Parzen window | — | 24–28 | At least 10 |
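Several entries in the table (e.g., the temperature-difference and cut-off-value classifiers) reduce to a left/right asymmetry rule over thermogram temperatures. A minimal sketch of such a rule follows; the 0.5 °C threshold and the synthetic thermograms are assumptions for illustration, not values taken from any reviewed work.

```python
import numpy as np

# Illustrative asymmetry rule in the spirit of the temperature-difference
# classifiers summarized above. Threshold and data are assumptions.
def asymmetry_flag(left_temps, right_temps, threshold_c=0.5):
    """Flag a left/right thermogram pair as suspicious when the mean
    temperature difference exceeds `threshold_c` (degrees Celsius)."""
    diff = abs(float(np.mean(left_temps)) - float(np.mean(right_temps)))
    return diff > threshold_c, diff

# Synthetic 64x64 thermograms around 33 degC, with a simulated
# hyperthermic region added to the left breast.
rng = np.random.default_rng(0)
left = 33.0 + rng.normal(0.0, 0.3, (64, 64))
right = 33.0 + rng.normal(0.0, 0.3, (64, 64))
left[16:48, 16:48] += 3.0  # hot quadrant: raises the mean by ~0.75 degC

suspicious, diff = asymmetry_flag(left, right)
print(suspicious, round(diff, 2))
```

In practice such thresholds are tuned per protocol (camera, room temperature, acclimation time), which is why the table records those acquisition conditions.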
Figure 9. Autoencoder structure.
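The autoencoder of Figure 9 compresses the input through a low-dimensional bottleneck and reconstructs it at the output. A minimal sketch follows, using scikit-learn's MLPRegressor trained to reproduce its own input; the layer sizes and synthetic data are illustrative assumptions, not a deep-learning implementation from the reviewed works.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Minimal autoencoder sketch: an MLP trained to reproduce its own input
# through a narrow hidden (bottleneck) layer, as in Figure 9.
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 2))        # 2 underlying factors
X = latent @ rng.normal(size=(2, 10))     # embedded in 10 dimensions

# Linear autoencoder with a 2-unit bottleneck; target equals input.
ae = MLPRegressor(hidden_layer_sizes=(2,), activation="identity",
                  solver="lbfgs", max_iter=2000, random_state=0)
ae.fit(X, X)
recon_error = float(np.mean((ae.predict(X) - X) ** 2))
print(round(recon_error, 4))
```

Because the synthetic data has exactly two underlying factors, the 2-unit bottleneck can reconstruct it with low error; real breast images would require convolutional encoders and decoders.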