Hadi Lotfnezhad Afshar, Nasrollah Jabbari, Hamid Reza Khalkhali, Omid Esnaashari.
Abstract
BACKGROUND: Low breast cancer survival rates in less developed countries are a critical problem. Machine learning techniques can predict cancer survival with high accuracy, but missing data are the most important limitation on using these techniques to their full potential. Multiple imputation (MI) was implemented and analyzed in detail to impute the missing data of a breast cancer dataset.
Keywords: Breast neoplasms; Imputation; Machine learning; Observer variation; Survival
Year: 2021 PMID: 34178808 PMCID: PMC8214598 DOI: 10.18502/ijph.v50i3.5606
Source DB: PubMed Journal: Iran J Public Health ISSN: 2251-6085 Impact factor: 1.429
Numbers and percentages of missing data in dataset variables
| Variable | Number missing | Percentage missing (%) |
| Age | 0 | 0 |
| Primary Site | 15 | 1.8 |
| Histology | 9 | 1.1 |
| Tumor Size | 9 | 1.1 |
| Metastases | 12 | 1.5 |
| Stage | 13 | 1.6 |
| Behavior | 12 | 1.5 |
| Grade | 19 | 2.3 |
| Positive regional node | 16 | 1.9 |
| Removed regional node | 22 | 2.7 |
| Surgery | 3 | 0.4 |
| Radiation | 0 | 0 |
| Her2 | 173 | 21.1 |
| ER | 66 | 8.1 |
| PR | 66 | 8.1 |
| Survival | 0 | 0 |
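A summary like the table above can be produced by counting the missing entries per variable and dividing by the number of records. The following is a minimal sketch under stated assumptions: records are plain dictionaries, `None` marks a missing value, and the function name `missing_summary` and the toy records are hypothetical illustrations, not the study data.

```python
def missing_summary(records, variables):
    """Return {variable: (n_missing, pct_missing)}; None marks a missing value."""
    n = len(records)
    summary = {}
    for var in variables:
        # Count records where this variable is absent or None.
        n_missing = sum(1 for r in records if r.get(var) is None)
        pct = round(100.0 * n_missing / n, 1) if n else 0.0
        summary[var] = (n_missing, pct)
    return summary


# Hypothetical toy records, not taken from the study dataset.
toy_records = [
    {"Age": 52, "Grade": 2, "Her2": None},
    {"Age": 47, "Grade": None, "Her2": None},
    {"Age": 63, "Grade": 3, "Her2": "positive"},
    {"Age": 58, "Grade": 1, "Her2": "negative"},
]

summary = missing_summary(toy_records, ["Age", "Grade", "Her2"])
for var, (count, pct) in summary.items():
    print(f"| {var} | {count} | {pct} |")
```

Percentages are rounded to one decimal place to match the precision used in the table.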
The performance of C5.0
| LD | 79.03 | 88.17 | 51.61 | 41.57 | 0.7910 |
| MI dataset 1 | 84.66 | 90.68 | 68.89 | 60.82 | 0.8667 |
| MI dataset 2 | 81.6 | 88.98 | 62.22 | 52.65 | 0.8142 |
| MI dataset 3 | 85.28 | 91.53 | 68.89 | 62.12 | 0.8456 |
| MI dataset 4 | 85.28 | 94.92 | 60 | 59.85 | 0.8093 |
| MI dataset 5 | 85.28 | 94.92 | 60 | 59.58 | 0.8491 |
LD (listwise deletion): the dataset from which records with missing data have been removed.
MI (multiple imputation): the dataset in which missing data have been imputed.
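The difference between the two strategies defined above can be sketched in a few lines. This is a simplified illustration, not the authors' procedure: the source does not state which imputation model was used, so the sketch substitutes random hot-deck draws (each missing value is filled with a randomly chosen observed value of the same variable). The function names, the toy records, and the fixed seed are assumptions; `m=5` mirrors the five MI datasets reported in the tables.

```python
import random


def listwise_deletion(records):
    """Keep only records that have no missing (None) values."""
    return [r for r in records if all(v is not None for v in r.values())]


def hot_deck_impute(records, rng):
    """Fill each missing value with a random draw from that variable's observed values.

    Assumes every variable has at least one observed value.
    """
    observed = {}
    for r in records:
        for key, value in r.items():
            if value is not None:
                observed.setdefault(key, []).append(value)
    return [
        {k: (v if v is not None else rng.choice(observed[k])) for k, v in r.items()}
        for r in records
    ]


def multiple_imputation(records, m=5, seed=0):
    """Return m completed copies of the data, each imputed with different random draws."""
    rng = random.Random(seed)
    return [hot_deck_impute(records, rng) for _ in range(m)]


# Hypothetical toy records, not taken from the study dataset.
toy_records = [
    {"Grade": 2, "Her2": None, "Survival": "alive"},
    {"Grade": None, "Her2": "negative", "Survival": "dead"},
    {"Grade": 3, "Her2": "positive", "Survival": "alive"},
    {"Grade": 1, "Her2": "negative", "Survival": "alive"},
]

ld_dataset = listwise_deletion(toy_records)
mi_datasets = multiple_imputation(toy_records, m=5)
print(len(ld_dataset), "records kept by listwise deletion")
print(len(mi_datasets), "imputed datasets, each with", len(mi_datasets[0]), "records")
```

Listwise deletion shrinks the dataset, whereas each imputed copy keeps every record; a classifier can then be trained on each copy separately, which is how per-dataset results such as "MI dataset 1" to "MI dataset 5" arise.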
The most important rules extracted from datasets
| LD dataset | 1. If cancer is | 19 | 0 |
| | 2. If cancer is | 48 | 6 |
| | 3. If | 21 | 6 |
| MI dataset 1 | 1. If cancer is | 52 | 2 |
| | 2. If cancer is | 61 | 15 |
| | 3. If | 19 | 2 |
| MI dataset 2 | 1. If cancer is | 99 | 20 |
| | 2. If cancer is | 28 | 1 |
| | 3. If | 10 | 0 |
| MI dataset 3 | 1. If cancer is | 97 | 18 |
| | 2. If cancer is | 29 | 1 |
| | 3. If | 22 | 6 |
| MI dataset 4 | 1. If cancer is | 48 | 3 |
| | 2. If cancer is | 60 | 12 |
| | 3. If | 20 | 4 |
| MI dataset 5 | 1. If cancer is | 42 | 3 |
| | 2. If cancer is | 22 | 2 |
| | 3. If cancer is | 56 | 13 |