Houriyeh Ehtemam1, Mitra Montazeri2,3, Reza Khajouei4, Raziyeh Hosseini5, Ali Nemati6, Vahid Maazed6.
Abstract
BACKGROUND: Breast cancer is one of the most common cancers and has a high mortality rate among women. Early diagnosis and accurate prognosis of breast cancer can considerably reduce this mortality. Consequently, intelligent systems that can predict and diagnose this cancer at an early stage are being developed to reduce mortality among women.
Keywords: Breast cancer; Data mining models; Diagnosis; Ductal and lobular
Year: 2017 PMID: 29167776 PMCID: PMC5696697
Source DB: PubMed Journal: Iran J Public Health ISSN: 2251-6085 Impact factor: 1.429
Fig. 1:Steps of knowledge discovery in databases with data mining process (16)
Fig. 2:Flow chart of proposed method
Comparison of the two types of breast cancer (ductal and lobular), with the corresponding P-value

| Group | Category | N | Observed prop. | Test prop. | P-value |
|---|---|---|---|---|---|
| Group 1 | Ductal | 198 | .95 | .50 | .000 |
| Group 2 | Lobular | 10 | .05 | | |
| Total | | 208 | 1.00 | | |
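The table reads like an SPSS-style binomial test: the observed proportion of ductal cases (.95) is tested against a reference proportion of .50, yielding a P-value reported as .000. A minimal sketch of that calculation, using only the counts from the table (the exact test procedure is an assumption; the paper does not state which test was used):

```python
from math import comb

# Counts taken from the table above
n_ductal, n_lobular = 198, 10
n = n_ductal + n_lobular

observed_prop = n_ductal / n  # ~0.95, matching the table

# Exact two-sided binomial test against a test proportion of 0.5
# (the symmetric case: double the upper-tail probability)
upper_tail = sum(comb(n, i) for i in range(n_ductal, n + 1)) / 2 ** n
p_value = min(1.0, 2 * upper_tail)

print(round(observed_prop, 2), p_value < 0.001)  # 0.95 True
```

The P-value is far below any conventional threshold, which is why it prints as .000 in the table.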
Grouping of nominal risk factors of breast cancer

| Variable | Category | Frequency | Percent | Valid percent | Cumulative percent |
|---|---|---|---|---|---|
| ER | Positive | 137 | 65.2 | 65.2 | 66.2 |
| | Negative | 71 | 33.8 | 33.8 | 100.0 |
| | Total | 208 | 100.0 | 100.0 | - |
| PR | Positive | 148 | 70.5 | 70.5 | 71.4 |
| | Negative | 60 | 28.6 | 28.6 | 100.0 |
| | Total | 208 | 100.0 | 100.0 | - |
| MS | Married | 196 | 93.3 | 94.2 | 94.2 |
| | Single | 12 | 5.7 | 5.8 | 100.0 |
| | Total | 208 | 99.0 | 100.0 | - |
| ET | Yes | 179 | 86.1 | 86.1 | 86.1 |
| | No | 29 | 13.9 | 13.9 | 100.0 |
| | Total | 208 | 100.0 | 100.0 | - |
| HBCUOCFDR | Yes | 15 | 7.2 | 7.2 | 7.2 |
| | No | 193 | 92.8 | 92.8 | 100.0 |
| | Total | 208 | 100.0 | 100.0 | - |
| HBCUOCSDR | Yes | 17 | 8.2 | 8.2 | 8.2 |
| | No | 191 | 91.8 | 91.8 | 100.0 |
| | Total | 208 | 100.0 | 100.0 | - |
| HOCFDR | Yes | 19 | 9.1 | 9.1 | 9.1 |
| | No | 189 | 90.9 | 90.9 | 100.0 |
| | Total | 208 | 100.0 | 100.0 | - |
| HOCSDR | Yes | 18 | 8.7 | 8.7 | 8.7 |
| | No | 190 | 91.3 | 91.3 | 100.0 |
| | Total | 208 | 100.0 | 100.0 | - |
| Parity | 0–5 | 179 | 85.2 | 86.1 | 86.1 |
| | 6–11 | 27 | 12.9 | 13.0 | 99.0 |
| | 12–17 | 2 | 1.0 | 1.0 | 100.0 |
| | Total | 208 | 99.0 | 100.0 | - |
| TC | Ductal | 198 | 95.2 | 95.2 | 95.2 |
| | Lobular | 10 | 4.8 | 4.8 | 100.0 |
| | Total | 208 | 100.0 | 100.0 | - |
Percent represents the percentage of all cases, including missing data, in each category.
Valid percent is based only on the non-missing cases.
Cumulative percent makes it easier to compare different categories.
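The three columns can be derived directly from raw counts. A sketch using the MS (marital status) row, and assuming 2 missing cases (i.e., 210 records in total, which reproduces the Percent values in the table):

```python
# SPSS-style Percent / Valid percent / Cumulative percent from raw counts.
# Counts are the MS row of the table; the 2 missing cases are an assumption
# inferred from the 99.0 total Percent.
counts = {"Married": 196, "Single": 12}
n_missing = 2

n_total = sum(counts.values()) + n_missing   # all cases, incl. missing
n_valid = sum(counts.values())               # non-missing cases only

rows, cumulative = [], 0.0
for category, freq in counts.items():
    percent = 100 * freq / n_total        # share of ALL cases
    valid_percent = 100 * freq / n_valid  # share of non-missing cases
    cumulative += valid_percent           # running total of valid percent
    rows.append((category, round(percent, 1),
                 round(valid_percent, 1), round(cumulative, 1)))

print(rows)  # [('Married', 93.3, 94.2, 94.2), ('Single', 5.7, 5.8, 100.0)]
```

The output matches the MS row exactly, including the discrepancy between Percent (93.3) and Valid percent (94.2) caused by the missing cases.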
Precision and accuracy of each model

| No. | Model | Precision (%) | Accuracy (%) |
|---|---|---|---|
| 1 | Bayes-Net | 95.67 | 95.70 |
| 2 | Naïve-Bayes | 91.83 | 95.00 |
| 3 | Naïve-Bayes-Updateable | 91.83 | 95.00 |
| 4 | Logistic | 90.86 | 95.00 |
| 5 | Multilayer-Perceptron | 91.83 | 95.50 |
| 6 | RBF-Network | 94.23 | 95.10 |
| 7 | Simple-Logistic | 95.19 | 95.20 |
| 8 | Sequential-Minimal-Optimization | 95.19 | 95.20 |
| 9 | Voted-Perceptron | 95.19 | 95.20 |
| 10 | Instance-Based-Learning | 90.86 | 95.00 |
| 11 | IBK | 90.38 | 95.00 |
| 12 | K-Star | 91.82 | 95.00 |
| 13 | Locally-Weighted-Learning | 94.71 | 95.20 |
| 14 | AdaBoost-M1 | 95.19 | 95.20 |
| 15 | Attribute-Selected-Classifier | 95.19 | 95.20 |
| 16 | Bagging | 95.19 | 95.20 |
| 17 | Classification-Via-Clustering | 69.23 | 94.70 |
| 18 | Classification-Via-Regression | 94.71 | 95.20 |
| 19 | Cross-Validation-Parameter-Selection | 95.19 | 95.20 |
| 20 | Dagging | 95.19 | 95.20 |
| 21 | Decorate | 95.19 | 95.20 |
| 22 | Ensembles-of-Nested-Dichotomies | 95.19 | 95.20 |
| 23 | Ensemble-Selection | 95.19 | 95.20 |
| 24 | Filtered-Classifier | 95.19 | 95.20 |
| 25 | Grading | 95.19 | 95.20 |
| 26 | Logit-Boost | 95.19 | 95.20 |
| 27 | Multi-Boost-AB | 95.19 | 95.20 |
| 28 | Multi-Class-Classifier | 90.86 | 95.00 |
| 29 | Multi-Scheme | 95.19 | 95.20 |
| 30 | Ordinal-Class-Classifier | 95.19 | 95.20 |
| 31 | Raced-Incremental-Logit-Boost | 95.19 | 95.20 |
| 32 | Random-Committee | 94.23 | 95.10 |
| 33 | Random-Sub-Space | 95.19 | 95.20 |
| 34 | Rotation-Forest | 95.19 | 95.20 |
| 35 | Stacking | 95.19 | 95.20 |
| 36 | Stacking-C | 95.19 | 95.20 |
| 37 | Threshold-Selector | 94.23 | 95.10 |
| 38 | Vote | 95.19 | 95.20 |
| 39 | Hyper-Pipes | 95.19 | 95.20 |
| 40 | Voting-Feature-Intervals | 74.52 | 95.60 |
| 41 | Conjunctive-Rule | 95.19 | 95.20 |
| 42 | Decision-Table | 95.19 | 95.20 |
| 43 | Decision-Table-Naïve-Bayes | 95.19 | 95.20 |
| 44 | J-Repeated-Incremental-Pruning | 95.19 | 95.20 |
| 45 | Non-Nested-Generalized-Exemplars | 92.79 | 95.10 |
| 46 | One-R | 95.19 | 95.20 |
| 47 | PART | 94.23 | 95.10 |
| 48 | Ridor | 95.19 | 95.20 |
| 49 | Zero-R | 95.19 | 95.20 |
| 50 | Alternating-Decision-Tree | 95.19 | 95.20 |
| 51 | Best-First-Tree | 95.19 | 95.20 |
| 52 | Decision-Stump | 95.19 | 95.20 |
| 53 | Functional-Trees | 94.71 | 95.20 |
| 54 | J48 | 95.19 | 95.20 |
| 55 | J48-graft | 95.19 | 95.20 |
| 56 | LAD-Tree | 91.35 | 95.20 |
| 57 | NB-Tree | 95.19 | 95.60 |
| 58 | Random-Forest | 93.75 | 95.10 |
| 59 | Random-Tree | 90.38 | 95.40 |
| 60 | REP-Tree | 95.19 | 95.20 |
| 61 | Simple-Cart | 95.24 | 95.20 |
| 62 | Class-Balanced-Nested-Dichotomies | 95.19 | 95.20 |
| 63 | Data-Near-Balanced-ND | 95.19 | 95.20 |
| 64 | Nested-Dichotomies | 95.19 | 95.20 |
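A sketch of how accuracy and per-class precision are computed from a confusion matrix. Note that with 198 of 208 cases being ductal, even a majority-class baseline (cf. the Zero-R row) reaches 198/208 ≈ 95.19% accuracy, which plausibly explains why so many models cluster at that figure. The matrix below is illustrative, not taken from the paper:

```python
# Accuracy and per-class precision from a 2x2 confusion matrix,
# where cm[i][j] = count of class-i instances predicted as class j.
def accuracy(cm):
    correct = sum(cm[i][i] for i in range(len(cm)))  # diagonal = hits
    total = sum(sum(row) for row in cm)
    return correct / total

def precision(cm, cls):
    # fraction of predictions for `cls` that were actually `cls`
    predicted = sum(cm[i][cls] for i in range(len(cm)))
    return cm[cls][cls] / predicted if predicted else 0.0

# Majority-class baseline: predict everything as class 0 (ductal)
baseline = [[198, 0], [10, 0]]
print(round(100 * accuracy(baseline), 2))  # 95.19
```

Because the dataset is so imbalanced, accuracy alone barely discriminates between models, which is why precision and the ROC curves below carry more information.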
Fig. 3:ROC curve of four best models in WEKA software (BN, MP, NB-Tree, and RT)
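The area under an ROC curve such as those in Fig. 3 equals the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative one. A minimal sketch of that rank-based computation; the scores below are made up purely for illustration:

```python
# AUC via the rank (Mann-Whitney) formulation: count, over all
# positive/negative pairs, how often the positive case scores higher,
# with ties counting as half a win.
def auc(pos_scores, neg_scores):
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

pos = [0.9, 0.8, 0.75, 0.6]  # hypothetical scores for positive cases
neg = [0.7, 0.4, 0.3, 0.2]   # hypothetical scores for negative cases
print(auc(pos, neg))  # 0.9375
```

An AUC of 1.0 means perfect separation of the two classes, while 0.5 is no better than chance; curves closer to the top-left corner of the ROC plot correspond to higher AUC.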