Carlos E Galván-Tejada, Laura A Zanella-Calzada, Jorge I Galván-Tejada, José M Celaya-Padilla, Hamurabi Gamboa-Rosales, Idalia Garza-Veloz, Margarita L Martinez-Fierro.
Abstract
Breast cancer is an important global health problem and the most common type of cancer among women. Late diagnosis significantly decreases patient survival, whereas early detection with mammography has been shown to be an important tool for increasing the survival rate. The purpose of this paper is to obtain a multivariate model that classifies tumor lesions as benign or malignant. The model uses a computer-assisted diagnosis (CAD) approach in which a genetic algorithm is applied to mammography image features in training and test datasets. A multivariate search was conducted to obtain predictive models with different approaches, in order to compare and validate the results. The multivariate models were constructed using Random Forest, Nearest Centroid, and K-Nearest Neighbor (K-NN) strategies as cost functions in a genetic algorithm applied to the features in the Breast Cancer Digital Repository (BCDR) public databases. Results suggest that the two texture descriptor features obtained in the multivariate model have a similar or better capability to predict the outcome than the multivariate model composed of all the features, according to their fitness value. This model can help reduce the workload of radiologists and provide a second opinion in the classification of tumor lesions.
Keywords: CAD; breast cancer; genetic algorithm; machine learning algorithms; mammography descriptors; mammography image features; multivariate model
Year: 2017 PMID: 28216571 PMCID: PMC5373018 DOI: 10.3390/diagnostics7010009
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
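The abstract describes a genetic algorithm that searches over feature subsets. Each subset is scored with a classifier that serves as the cost (fitness) function. The figures refer to the Galgo algorithm, and the Python sketch below is not Galgo's implementation. It is a minimal illustration built on stated assumptions:

- a chromosome is a fixed-size set of feature indices;
- fitness is the resubstitution accuracy of a nearest-centroid classifier;
- selection is simple elitism.

The names `nearest_centroid_accuracy` and `evolve`, all parameter values, and the synthetic data are hypothetical.

```python
import math
import random


def nearest_centroid_accuracy(X, y, features):
    """Resubstitution accuracy of a nearest-centroid classifier using only `features`."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Per-class mean of each selected feature column.
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    correct = 0
    for x, label in zip(X, y):
        point = [x[f] for f in features]
        # Predict the class whose centroid is closest in Euclidean distance.
        pred = min(centroids, key=lambda c: math.dist(point, centroids[c]))
        correct += pred == label
    return correct / len(y)


def evolve(X, y, n_features, chrom_size=2, pop_size=20, generations=30,
           mutation_rate=0.2, seed=0):
    """Toy GA (hypothetical): each chromosome is a sorted tuple of distinct feature indices."""
    rng = random.Random(seed)
    cache = {}

    def fitness(chrom):
        # Cache fitness so identical chromosomes are not re-evaluated.
        if chrom not in cache:
            cache[chrom] = nearest_centroid_accuracy(X, y, chrom)
        return cache[chrom]

    def crossover(a, b):
        # The child draws its features from the union of both parents' features.
        return tuple(sorted(rng.sample(sorted(set(a) | set(b)), chrom_size)))

    def mutate(chrom):
        genes = list(chrom)
        if rng.random() < mutation_rate:
            unused = [f for f in range(n_features) if f not in genes]
            if unused:
                genes[rng.randrange(chrom_size)] = rng.choice(unused)
        return tuple(sorted(genes))

    population = [tuple(sorted(rng.sample(range(n_features), chrom_size)))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Elitist selection: keep the fitter half, refill with offspring.
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b)))
        population = elite + children
    best = max(population, key=fitness)
    return best, fitness(best)


# Synthetic data (hypothetical): features 1 and 4 separate the classes, the rest are noise.
data_rng = random.Random(1)
X, y = [], []
for i in range(60):
    label = i % 2
    row = [data_rng.gauss(0, 1) for _ in range(6)]
    row[1] += 3 * label
    row[4] -= 3 * label
    X.append(row)
    y.append(label)

best, acc = evolve(X, y, n_features=6)
print(best, acc)
```

Resubstitution accuracy is optimistic. A more realistic fitness would score each subset on held-out data or by cross-validation, in line with the training/test split mentioned in the abstract.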
Figure 1. Model obtained from the Galgo algorithm with the Random Forest (RF) method.
Figure 2. Gene rank stability graph obtained from the Galgo algorithm with the RF method.
Figure 3. Gene behavior through the Galgo algorithm with the RF method.
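A gene rank stability graph tracks how often each feature appears across many GA solutions (a feature is a "gene" in Galgo's terminology). It also shows whether that ranking settles as more solutions are collected. Below is a hedged sketch of that bookkeeping, assuming each solution is the best chromosome of an independent run. The feature names `f1` to `f4`, the helpers `rank_features` and `rank_history`, and the solutions list are hypothetical. Galgo's own plot may be computed differently.

```python
from collections import Counter


def rank_features(solutions):
    """Rank features by how many selected chromosomes contain them."""
    counts = Counter(f for chrom in solutions for f in set(chrom))
    # Most frequent first; ties broken alphabetically for a deterministic order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def rank_history(solutions, top=3):
    """Top-`top` features after each additional solution; a stable rank stops changing."""
    return [[f for f, _ in rank_features(solutions[: i + 1])[:top]]
            for i in range(len(solutions))]


# Hypothetical best chromosomes from four independent GA runs.
solutions = [("f1", "f4"), ("f1", "f2"), ("f4", "f1"), ("f3", "f4")]
print(rank_features(solutions))  # [('f1', 3), ('f4', 3), ('f2', 1), ('f3', 1)]
print(rank_history(solutions)[-1])  # ['f1', 'f4', 'f2']
```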
Odds ratio (OR) values from the model obtained by the RF method.
| Features | Odds Ratio | 2.5% (95% CI lower) | 97.5% (95% CI upper) |
|---|---|---|---|
|  | 6.35 × 10 | 6.85 × 10 | 5.19 × 10 |
|  | 7.33 × 10 | 6.445 | 8.86 × 10 |
|  | 1.002 | 1.001 | 1.002 |
|  | 1.16 × 10 | 1.71 × 10 | 1.32 × 10 |
|  | 8.95 × 10 | 7.61 × 10 | 1.35 × 10 |
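The source does not state how the odds ratios were estimated. Assume they come from a logistic regression. Then each OR is exp(β) for a fitted coefficient β, and a Wald 95% interval is exp(β ± 1.96·SE). Its bounds correspond to the 2.5% and 97.5% columns.

The sketch below computes these quantities. The function name `odds_ratio_ci` and the example coefficient are hypothetical. Statistical software may report profile-likelihood intervals instead (R's `confint()` on a `glm` fit does), which generally differ slightly from Wald bounds.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(beta, se, level=0.95):
    """Odds ratio and Wald confidence bounds from a logistic-regression coefficient.

    beta: fitted log-odds coefficient; se: its standard error.
    Returns (OR, lower, upper); for level=0.95 these are the 2.5% and 97.5% limits.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient, not a value from the tables.
or_, lower, upper = odds_ratio_ci(beta=math.log(2.0), se=0.1)
print(round(or_, 3), round(lower, 3), round(upper, 3))
```

Because the interval is symmetric on the log scale, the OR is the geometric mean of its two bounds.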
Figure 4. Model obtained from the Galgo algorithm with the Nearest Centroid (NC) method.
Figure 5. Gene rank stability graph obtained from the Galgo algorithm with the NC method.
Figure 6. Gene behavior through the Galgo algorithm with the NC method.
Odds ratio (OR) values from the model obtained by the NC method.
| Features | OR | 2.5% (95% CI lower) | 97.5% (95% CI upper) |
|---|---|---|---|
|  | 3.99 × 10 | 5.48 × 10 | 7.16 × 10 |
|  | 1.093 | 1.23 × 10 | 7.36 × 10 |
|  | 1.20 × 10 | 2.78 × 10 | 4.69 × 10 |
Figure 7. Model obtained from the Galgo algorithm with the K-Nearest Neighbors (K-NN) method.
Figure 8. Frequency graph obtained from the Galgo algorithm with the K-NN method.
Figure 9. Gene behavior through the Galgo algorithm with the K-NN method.
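When K-NN serves as the cost function, each candidate feature subset is scored by how well a K-NN classifier performs on it. Below is a minimal sketch, assuming leave-one-out accuracy as the score and Euclidean distance. The function `knn_loo_accuracy`, the value `k=3`, and the toy data are hypothetical choices, not the settings used in the study.

```python
import math
from collections import Counter


def knn_loo_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of a k-NN classifier using only the selected feature columns."""
    correct = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        query = [xi[f] for f in features]
        # Distances from the held-out sample to every other sample.
        neighbors = sorted(
            (math.dist(query, [xj[f] for f in features]), yj)
            for j, (xj, yj) in enumerate(zip(X, y))
            if j != i
        )
        # Majority vote among the k closest samples (odd k avoids ties with two classes).
        votes = Counter(label for _, label in neighbors[:k])
        correct += votes.most_common(1)[0][0] == yi
    return correct / len(y)


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 3.0], [0.2, -1.0], [0.1, 0.5], [5.0, 0.2], [5.2, 2.5], [4.9, -0.7]]
y = [0, 0, 0, 1, 1, 1]
print(knn_loo_accuracy(X, y, features=[0]))  # 1.0
```

Leaving each sample out of its own neighbor search keeps the score from trivially rewarding memorization, which would happen with k=1 on resubstitution.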
Odds ratio (OR) values from the model obtained by the K-NN method.
| Features | OR | 2.5% (95% CI lower) | 97.5% (95% CI upper) |
|---|---|---|---|
|  | 9.79 × 10 | 5.56 × 10 | 1.88 × 10 |
|  | 2.11 × 10 | 2.07 × 10 | 3.64 × 10 |
|  | 7.61 × 10 | 8.25 × 10 | 5.28 × 10 |
|  | 4.58 × 10 | 6.858 | 6.36 × 10 |
|  | 6.35 × 10 | 6.85 × 10 | 5.19 × 10 |
Comparison of false positives and false negatives among the three cost functions: RF, K-NN, and NC.
| Cost Function | False Positives | False Negatives |
|---|---|---|
| RF | 10 | 5 |
| K-NN | 8 | 19 |
| NC | 13 | 23 |
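The table reports raw error counts only. Without the numbers of benign and malignant cases evaluated, sensitivity and specificity cannot be derived. The totals can still be compared directly: RF makes 15 errors, K-NN 27, and NC 36. The sketch below shows how such counts are obtained from predictions and reproduces those totals. The label strings and the helper `error_counts` are hypothetical.

```python
def error_counts(y_true, y_pred, positive="malignant"):
    """Count false positives and false negatives for a binary benign/malignant task."""
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return fp, fn


# FP/FN counts as reported in the comparison table.
reported = {"RF": (10, 5), "K-NN": (8, 19), "NC": (13, 23)}
totals = {name: fp + fn for name, (fp, fn) in reported.items()}
print(totals)  # {'RF': 15, 'K-NN': 27, 'NC': 36}
print(min(totals, key=totals.get))  # RF

# Hypothetical predictions: one false positive and one false negative.
y_true = ["malignant", "benign", "malignant", "benign"]
y_pred = ["malignant", "malignant", "benign", "benign"]
print(error_counts(y_true, y_pred))  # (1, 1)
```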