Reza Rabiei, Seyed Mohammad Ayyoubzadeh, Solmaz Sohrabei, Marzieh Esmaeili, Alireza Atashi.
Abstract
Background: Breast cancer is one of the most common cancers in women and has various clinical, lifestyle, social, and economic risk factors. Machine learning can potentially predict breast cancer from patterns hidden in data. Objective: This study aimed to predict breast cancer with different machine-learning approaches using demographic, laboratory, and mammographic data. Material and…
Keywords: Artificial Intelligence; Breast Cancer; Computing Methodologies; Genetic Algorithm; Machine Learning
Year: 2022 PMID: 35698545 PMCID: PMC9175124 DOI: 10.31661/jbpe.v0i0.2109-1403
Source DB: PubMed Journal: J Biomed Phys Eng ISSN: 2251-7200
Relevant features for breast cancer prediction
| Feature name | Description | Type | Values |
|---|---|---|---|
| Age | Age at diagnosis | Demographic | <100 years |
| Age.menop | Age at menopause | Demographic | 38-65 years |
| First pregnancy | Age at first pregnancy | Demographic | 13-42 years |
| Age.menarch | Age at menarche | Demographic | 11-18 years |
| BMI | Body mass index | Demographic | Underweight (below 18.5)=0, Normal (18.5-24.9)=1, Overweight (25.0-29.9)=2, Obese (30.0 and above)=3 |
| Lactation | Duration of breastfeeding | Demographic | 0-96 months |
| Physical Activity | Regular physical activity | Demographic | Yes=1, No=0 |
| Education | Level of education | Demographic | Illiterate=1, Primary=2, High school=3, University=4 |
| Life event stress | Stressful life events | Demographic | None=0, Death of father=1, Family problems=2, Death of mother=3, Death of child=4, Death of husband=5, Divorce=6 |
| Smoking | Smoking status | Demographic | Yes=1, No=0 |
| Marital | Marital status | Demographic | Single=0, Other=1 |
| Duration Ocp.used | Duration of oral contraceptive pill (OCP) use | Laboratory | 0-120 months |
| Duration HRT used | Duration of hormone replacement therapy (HRT) use | Laboratory | 0-120 months |
| Personal.Other.Cancer | Personal history of other cancers | Laboratory | None=0, Ovary=1, Endometrium=2, Colon=3, Meningioma=4, Lymphoma=5 |
| Family.BC | Family history of breast cancer | Laboratory | Yes=1, No=0 |
| Exposure X-ray | Chest X-ray exposure | Laboratory | Negative=0, Positive=1 |
| Vitamin D3 | Serum vitamin D level | Laboratory | Deficiency (<10 ng/mL)=0, Insufficiency (10-30 ng/mL)=1, Sufficient (30-100 ng/mL)=2, Overdose (>100 ng/mL)=3 |
| Biopsy | Biopsy pathology | Laboratory | No malignancy detected=0, Lobular carcinoma in situ=1, Ductal carcinoma in situ=2, Ductal carcinoma in situ=3, Invasive lobular carcinoma=4, Medullary=5, Microinvasion=6 |
| Hysterectomy | History of hysterectomy | Laboratory | Yes=1, No=0 |
| Personal.BC | Personal history of breast cancer | Laboratory | No=0, Yes=1, Surgery=2, RT (radiotherapy)=3 |
| Breast density | Screening | Mammography | Fatty tissue=0, Glandular and fibrous tissue=1, Dense=2, Heterogeneously dense or extremely dense=3 |
| Micro lobulated | Screening | Mammography | None=0, Fibroadenoma=1, Papilloma=2, Phyllodes tumor=3, DCIS=4, IDC=5, ILC=6, Lactating and tubular adenomas=7 |
| Circumscribed | Screening | Mammography | None=0, Cysts=1, Complicated cyst=2, Clustered microcysts=3, Solid mass=4 |
| Micro calcification, Macro calcification | Screening | Mammography | Probably benign (punctate)=1, Intermediate concern (coarse heterogeneous, amorphous)=2, Higher probability of malignancy (fine pleomorphic, fine linear/branching)=3 |
| Class | Breast cancer diagnosis (target) | | Benign=0, Malignant=1 |
DCIS: Ductal carcinoma in situ, IDC: Invasive ductal carcinoma, ILC: Invasive lobular carcinoma
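Several features in the table are continuous measurements binned into ordinal codes. The sketch below shows how two of them (BMI and vitamin D) might be encoded. It is not the authors' preprocessing code; it assumes vitamin D is reported in ng/mL, and where the table gives a shared endpoint to two bins (for example 30 in "10-30" and "30-100"), it assumes the lower bin is half-open.

```python
# Ordinal encodings for two rows of the feature table (a sketch, not the
# authors' preprocessing). Where two bins share an endpoint, the value is
# assigned to the upper bin by assumption.

def encode_bmi(bmi: float) -> int:
    """Map body mass index (kg/m^2) to the table's 0-3 code."""
    if bmi < 18.5:
        return 0  # underweight
    if bmi < 25.0:
        return 1  # normal (18.5-24.9)
    if bmi < 30.0:
        return 2  # overweight (25.0-29.9)
    return 3  # obese (30.0 and above)


def encode_vitamin_d(level_ng_ml: float) -> int:
    """Map a serum vitamin D level (assumed ng/mL) to the table's 0-3 code."""
    if level_ng_ml < 10:
        return 0  # deficiency
    if level_ng_ml < 30:
        return 1  # insufficiency
    if level_ng_ml <= 100:
        return 2  # sufficient
    return 3  # overdose


if __name__ == "__main__":
    print(encode_bmi(27.3), encode_vitamin_d(22.0))  # prints "2 1"
```

Encoding the bins as ordered integers, rather than one-hot columns, keeps the ordering information that tree-based models such as Random Forest and Gradient Boosting can exploit.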
Figure 1. Block diagram of the methods
Confusion matrix of a binary classifier
| | Predicted negative | Predicted positive |
|---|---|---|
| Actual negative | TN | FP |
| Actual positive | FN | TP |
TN: True Negative, FN: False Negative, FP: False Positive, TP: True Positive
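The four cells of the confusion matrix are all that is needed to compute the sensitivity, specificity, and accuracy reported for the models. The following is a minimal sketch, assuming labels are coded as in the feature table (1 = malignant, 0 = benign); the function names are illustrative, not from the paper.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Tally TN, FP, FN, TP for binary labels (1 = malignant, 0 = benign)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"TN": 0, "FP": 0, "FN": 0, "TP": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1:
            counts["TP" if predicted == 1 else "FN"] += 1
        else:
            counts["FP" if predicted == 1 else "TN"] += 1
    return counts


def summary_metrics(c: Dict[str, int]) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy derived from the four cells."""
    return {
        "sensitivity": c["TP"] / (c["TP"] + c["FN"]),  # true positive rate
        "specificity": c["TN"] / (c["TN"] + c["FP"]),  # true negative rate
        "accuracy": (c["TP"] + c["TN"]) / sum(c.values()),
    }


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
    cells = confusion_counts(y_true, y_pred)
    print(cells)  # {'TN': 4, 'FP': 2, 'FN': 1, 'TP': 3}
    print(summary_metrics(cells))
```

In this toy example, sensitivity is 3/4 = 0.75, specificity is 4/6 ≈ 0.667, and accuracy is 7/10 = 0.70.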
Figure 2. Feature weights in breast cancer prediction
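This excerpt does not state how the feature weights in Figure 2 were obtained. One common, model-agnostic way to estimate such weights is permutation importance: shuffle one feature column and measure how much accuracy drops. The sketch below uses that swapped-in technique, which is not necessarily the authors' method; `model` is a hypothetical callable mapping a feature row to a predicted class.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], int]  # hypothetical: row -> predicted class


def accuracy(model: Model, X: List[List[float]], y: Sequence[int]) -> float:
    """Fraction of rows whose predicted class matches the label."""
    return sum(1 for row, label in zip(X, y) if model(row) == label) / len(y)


def permutation_importance(model: Model, X: List[List[float]], y: Sequence[int],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when a single feature column is shuffled."""
    rng = random.Random(seed)  # seeded for reproducible shuffles
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row: Sequence[float]) -> int:
    """Toy classifier that only looks at feature 0."""
    return 1 if row[0] > 0.5 else 0


if __name__ == "__main__":
    X = [[0.1, 5.0], [0.9, 3.0], [0.2, 7.0], [0.8, 1.0]]
    y = [0, 1, 0, 1]
    # Feature 1 is ignored by the toy model, so its importance is exactly 0.
    print(permutation_importance(toy_model, X, y))
```

A feature the model never uses receives zero importance, while a feature the model relies on receives a positive score whenever shuffling it breaks correct predictions.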
Performance comparison of the breast cancer prediction models
| Models | Features | AUC | Sensitivity (%) | Specificity (%) | Accuracy (%) |
|---|---|---|---|---|---|
| Random Forest | Demographics | 0.53 | 93 | 83 | 79 |
| | Demographics + Mammography | 0.53 | 95 | 83 | 80 |
| Gradient Boosting | Demographics | 0.59 | 63 | 87 | 62 |
| | Demographics + Mammography | 0.59 | 82 | 86 | 74 |
| Multi-Layer Perceptron | Demographics | 0.56 | 78 | 85 | 71 |
| | Demographics + Mammography | 0.56 | 82 | 84 | 73 |
AUC: Area under the ROC curve, ROC: Receiver operating characteristic
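Accuracy is a prevalence-weighted mean of sensitivity and specificity: accuracy = sensitivity × p + specificity × (1 - p), where p is the share of positive (malignant) cases. Reported accuracy must therefore lie between sensitivity and specificity, which gives a quick consistency check for rows like those in the table. This sketch assumes values are entered as fractions, and the tolerance of 0.005 is an assumption meant to absorb rounding to whole percentages.

```python
from typing import Optional


def accuracy_bounds_ok(sensitivity: float, specificity: float,
                       accuracy: float, tol: float = 0.005) -> bool:
    """Accuracy is a weighted mean of sensitivity and specificity, so it must
    lie between them (tol absorbs rounding of reported values)."""
    low, high = sorted((sensitivity, specificity))
    return low - tol <= accuracy <= high + tol


def implied_prevalence(sensitivity: float, specificity: float,
                       accuracy: float) -> Optional[float]:
    """Solve accuracy = sens * p + spec * (1 - p) for the positive share p."""
    if sensitivity == specificity:
        return None  # p cannot be recovered when the two rates coincide
    return (accuracy - specificity) / (sensitivity - specificity)


if __name__ == "__main__":
    print(accuracy_bounds_ok(0.75, 4 / 6, 0.70))  # prints True
    print(implied_prevalence(0.75, 4 / 6, 0.70))  # about 0.4
```

For instance, entering the Random Forest (Demographics) row as fractions, `accuracy_bounds_ok(0.93, 0.83, 0.79)` returns `False`, because 0.79 is below both 0.83 and 0.93.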
Figure 3. Receiver operating characteristic (ROC) curves of the models
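An ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) as the decision threshold on a model's score is lowered. A minimal sketch of how the curve's points can be computed from predicted scores, assuming a case is called positive when its score is at least the threshold:

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs from sweeping the threshold down over distinct scores;
    a case is called positive when its score >= threshold."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    points = [(0.0, 0.0)]  # threshold above every score: nothing called positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
        points.append((fp / n_neg, tp / n_pos))
    return points


if __name__ == "__main__":
    print(roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    # prints [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```

A curve hugging the diagonal from (0, 0) to (1, 1) means the scores rank malignant and benign cases no better than chance.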
Area under the receiver operating characteristic (ROC) curve
| Model | Area |
|---|---|
| GBT | 0.59 |
| MLP | 0.56 |
| RF | 0.53 |
GBT: Gradient Boosting Tree, MLP: Multi-Layer Perceptron, RF: Random Forest
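The area under the ROC curve equals the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign case (ties counted as one half), which is the Mann-Whitney U statistic rescaled to [0, 1]. An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The sketch below computes AUC directly from that definition; the quadratic pairwise loop is chosen for clarity, not speed.

```python
from typing import Sequence


def auc_mann_whitney(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as one half (equivalent to the Mann-Whitney U statistic)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # tie counts half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(auc_mann_whitney([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

This pairwise value matches the trapezoidal area under the ROC curve traced over all distinct score thresholds.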