Hadi Lotfnezhad Afshar, Maryam Ahmadi, Masoud Roudbari, Farahnaz Sadoughi.
Abstract
The collection of large volumes of medical data has given the medical research community an opportunity to develop prediction models for survival. Medical researchers who seek to discover and extract hidden patterns and relationships among a large number of variables use knowledge discovery in databases (KDD) to predict the outcome of a disease. This study was conducted to develop predictive models and to discover relationships between certain predictor variables and survival in the context of breast cancer. The study was cross-sectional. After data preparation, data on 22,763 female patients (mean age 59.4 years) stored in the Surveillance, Epidemiology, and End Results (SEER) breast cancer dataset were analyzed anonymously. IBM SPSS Statistics 16, Access 2003 and Excel 2003 were used for data preparation, and IBM SPSS Modeler 14.2 was used for model design. The Support Vector Machine (SVM) model outperformed the other models (Bayes Net and CHAID) in predicting breast cancer survival. The analysis showed that the SVM model detected ten important predictor variables that contributed most to the prediction of breast cancer survival. Among these, behavior of the tumor was identified as the most important variable and stage of malignancy as the least important. In the current study, applying the knowledge discovery method to the breast cancer dataset predicted the survival condition of breast cancer patients with high confidence and identified the most important variables involved in breast cancer survival.
Year: 2015 PMID: 25946945 PMCID: PMC4802184 DOI: 10.5539/gjhs.v7n4p392
Source DB: PubMed Journal: Glob J Health Sci ISSN: 1916-9736
The missing values of predictor variables
| Variables | Frequency | Percentage |
|---|---|---|
| Race | 85 | 0.5 |
| Marital status | 674 | 3.8 |
| Primary site code | 4196 | 23.8 |
| Histology | 0 | 0 |
| Behavior | 0 | 0 |
| Grade | 3033 | 17.2 |
| Extension of tumor | 384 | 2.2 |
| Lymph node involvement | 647 | 3.7 |
| Radiation | 131 | 0.7 |
| Stage | 388 | 2.2 |
| Site specific surgery code | 68 | 0.4 |
| ERStatus | 0 | 0 |
| PRStatus | 0 | 0 |
| Age | 0 | 0 |
| Tumor size | 2930 | 17.9 |
| Number of positive nodes | 102 | 0.6 |
| Number of nodes | 96 | 0.5 |
| Number of primaries | 0 | 0 |
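A tally of this kind can be sketched in a few lines of Python. This is only an illustration of counting missing values per variable and expressing them as a percentage of all records; the authors report using SPSS, Access and Excel for data preparation, and the function name `missing_summary`, the use of `None` to mark a missing value, and the toy records below are all assumptions, not SEER data.

```python
def missing_summary(records, variables):
    """Return {variable: (missing_count, missing_percent)} for a list of dict records.

    A value counts as missing when the key is absent or its value is None
    (an assumption made for this sketch; real datasets use various codes).
    """
    n = len(records)
    summary = {}
    for var in variables:
        missing = sum(1 for r in records if r.get(var) is None)
        percent = round(100.0 * missing / n, 1) if n else 0.0
        summary[var] = (missing, percent)
    return summary


# Hypothetical toy records, for illustration only.
toy = [
    {"Grade": 2, "Tumor size": 15},
    {"Grade": None, "Tumor size": 22},
    {"Grade": 3, "Tumor size": None},
    {"Grade": None, "Tumor size": 8},
]

for name, (count, percent) in missing_summary(toy, ["Grade", "Tumor size"]).items():
    print(f"{name}: {count} missing ({percent}%)")
```

The denominator here is the total number of records passed in; which record count underlies the percentages reported in the table is not stated in this excerpt.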
Predictor variables
| Variable | Number of categories (categorical) or mean (continuous) | SD | Range |
|---|---|---|---|
| Race | 18 | | |
| Marital status | 5 | | |
| Primary site code | 8 | | |
| Histology | 55 | | |
| Behavior | 2 | | |
| Grade | 4 | | |
| Extension of tumor | 8 | | |
| Lymph node involvement | 9 | | |
| Radiation | 8 | | |
| Stage | 4 | | |
| Site specific surgery code | 40 | | |
| ERStatus | 4 | | |
| PRStatus | 4 | | |
| Age | 59.4 | 13.5 | 17-103 |
| Tumor size | 18.3 | 18.2 | 0-555 |
| Number of positive nodes | 1.1 | 3 | 0-45 |
| Number of nodes | 6.2 | 7.6 | 0-90 |
| Number of primaries | 1.3 | 0.6 | 1-6 |
SD: standard deviation.
The comparison of data mining models in the prediction of breast cancer survival
| Model | Sensitivity | Specificity | Accuracy | Adjusted propensity score |
|---|---|---|---|---|
| SVM | 97.7% | 95.6% | 96.7% | 0.977 |
| Bayes Net | 81.8% | 86.1% | 83.9% | 0.880 |
| CHAID | 82.2% | 82.7% | 82.4% | 0.829 |
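For readers unfamiliar with the first three columns, the sketch below shows how sensitivity, specificity and accuracy are conventionally derived from the four cells of a binary confusion matrix. The function name `classification_metrics` and the counts are hypothetical; the excerpt does not give the underlying confusion matrices or say which survival outcome was treated as the positive class, and the adjusted propensity score is an output of the modelling software that is not reproduced here.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp: positives predicted positive    fn: positives predicted negative
    tn: negatives predicted negative    fp: negatives predicted positive
    """
    sensitivity = tp / (tp + fn)                 # share of actual positives found
    specificity = tn / (tn + fp)                 # share of actual negatives found
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # share of all cases classified correctly
    return sensitivity, specificity, accuracy


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
sens, spec, acc = classification_metrics(tp=90, fn=10, tn=80, fp=20)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} accuracy={acc:.1%}")
```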
Figure 1. The relative importance of predictor variables identified by SVM in predicting breast cancer survival