Sam Khozama, Ali Mahmoud Mayya.
Abstract
OBJECTIVE: Early prediction of breast cancer is a central problem in medicine. Many studies have introduced approaches that facilitate early prediction and estimate future occurrence from periodic mammography tests. In the current research, we introduce a novel machine learning tool for the early prediction of breast cancer.
Keywords: Cancer prediction; Machine Learning; breast cancer; risk factors
Year: 2021 PMID: 34837911 PMCID: PMC9068177 DOI: 10.31557/APJCP.2021.22.11.3543
Source DB: PubMed Journal: Asian Pac J Cancer Prev ISSN: 1513-7368
Dataset Risk factors Description
| No. | Risk Factor | Description |
|---|---|---|
| 1 | Menopause | Pre=0(23.47%), Post or age>55=1(68.65%), Unknown=9(7.6%) |
| 2 | Age group | Group1=35-39(1.79%); Group2=40-44(12.1%); Group3=45-49(16.18%); Group4=50-54(17.9%); Group5=55-59(13.96%); Group6=60-64(11.1%); Group7=65-69(9.69%); Group8=70-74(8.49%); Group9=75-79(6.06%); Group10=80-84(2.91%). |
| 3 | Density | Breast density: Almost entirely fatty: 1(6.19%), Scattered fibro-glandular densities:2(32.69%), |
| 4 | Race | 1=white(72.63%); 2=Asian/Pacific Islander(4.3%); 3=black(5.08%); 4=Native American |
| 5 | Hispanic | No:0(73.1%) Yes:1(6.58%), Unknown:9(20.3%) |
| 6 | BMI | 1=10-24.99(21.27%); 2=25-29.99(13.6%); 3=30-34.99(6.05%); 4=35 or more(3.25%); |
| 7 | Age at first birth (agefirst) | 0=Age<30(30.18%); 1=Age 30 or greater(5.9%); 2=Nulliparous(8.41%); 9=unknown(55.51%) |
| 8 | Number of first degree relatives with breast cancer (nrelbc) | 0=zero(71.81%); 1=one(12.36%); 2=2 or more(0.65%); 9=unknown(15.18%) |
| 9 | Previous breast procedure (brstproc) | 0=no(71.97%); 1=yes(17.57%); 9=unknown(10.46%) |
| 10 | Result of last mammogram before the index mammogram (lastmamm) | 0=negative(75.22%); 1=false positive(1.42%); 9=unknown(23.36%) |
| 11 | Surgical menopause | 0=natural(30%); 1=surgical(17.86%); 9=unknown or not menopausal(52.14%) |
| 12 | Hormone therapy | 0=no(30.47%); 1= yes(28.56%); 9=unknown (40.97%) |
| 13 | Count | Frequency of each record in the dataset |
Figure 1. Proposed System Methodology
Figure 2. Proposed Breast Cancer Prediction Tool
Balancing Approaches and Their Corresponding Classes Percentages
| Balancing method | Majority class sample number | Majority class percentage | Minor class sample number | Minor class percentage |
|---|---|---|---|---|
| Oversampling | 271,355 | 85.36% | 46,525 | 14.64% |
| Down-sampling | 77,000 | 89.22% | 9,305 | 10.78% |
| Mixed | 225,562 | 82.90% | 46,525 | 17.10% |
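The class percentages in this table follow directly from the sample counts. A minimal sketch that recomputes them is given here as a consistency check; the names `counts` and `class_shares` are illustrative and not taken from the paper.

```python
# Class counts copied from the balancing table: (majority, minority).
counts = {
    "Oversampling": (271_355, 46_525),
    "Down-sampling": (77_000, 9_305),
    "Mixed": (225_562, 46_525),
}


def class_shares(majority, minority):
    """Return (majority %, minority %), each rounded to two decimals."""
    total = majority + minority
    return round(100 * majority / total, 2), round(100 * minority / total, 2)


shares = {name: class_shares(*pair) for name, pair in counts.items()}

for name, (major_pct, minor_pct) in shares.items():
    print(f"{name}: majority {major_pct:.2f}%, minority {minor_pct:.2f}%")
```

The recomputed shares agree with the tabulated percentages to two decimals.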
Breast Cancer Risk Factors DOI Based on the Medical Questionnaire
| No. | Risk Factor | High | Median | Low | DOIQ |
|---|---|---|---|---|---|
| 1 | Menopause | 30% | 47.50% | 22.50% | 0.37 |
| 2 | Age group | 27.50% | 62.50% | 10.00% | 0.415 |
| 3 | Density | 25% | 45.00% | 30.00% | 0.33 |
| 4 | Race | 25% | 40.00% | 35.00% | 0.31 |
| 5 | Hispanic | 19.40% | 16.70% | 63.90% | 0.183 |
| 6 | BMI | 25.60% | 38.50% | 35.90% | 0.307 |
| 7 | agefirst | 27.50% | 45.00% | 27.50% | 0.345 |
| 8 | nrelbc | 56.40% | 25.60% | 17.90% | 0.44 |
| 9 | brstproc | 34.20% | 23.70% | 42.10% | 0.30 |
| 10 | lastmamm | 34.20% | 32.10% | 33.70% | 0.33 |
| 11 | Surgical menopause | 7.70% | 30.80% | 61.50% | 0.169 |
| 12 | Hormone therapy | 42.50% | 37.50% | 20% | 0.405 |
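This excerpt does not state how DOIQ is derived from the High, Median, and Low response shares. The tabulated values are, however, consistent with a weighted sum using weights 0.6, 0.4, and 0 respectively. The sketch below checks that reading; the weights are inferred from the numbers, not quoted from the authors, and the names `W_HIGH`, `W_MEDIAN`, `W_LOW`, and `doiq` are hypothetical.

```python
# Inferred weights (an assumption, not stated in the source).
W_HIGH, W_MEDIAN, W_LOW = 0.6, 0.4, 0.0

# (High %, Median %, Low %, tabulated DOIQ) per risk factor.
questionnaire = {
    "Menopause": (30.0, 47.5, 22.5, 0.37),
    "Age group": (27.5, 62.5, 10.0, 0.415),
    "Density": (25.0, 45.0, 30.0, 0.33),
    "Race": (25.0, 40.0, 35.0, 0.31),
    "Hispanic": (19.4, 16.7, 63.9, 0.183),
    "BMI": (25.6, 38.5, 35.9, 0.307),
    "agefirst": (27.5, 45.0, 27.5, 0.345),
    "nrelbc": (56.4, 25.6, 17.9, 0.44),
    "brstproc": (34.2, 23.7, 42.1, 0.30),
    "lastmamm": (34.2, 32.1, 33.7, 0.33),
    "Surgical menopause": (7.7, 30.8, 61.5, 0.169),
    "Hormone therapy": (42.5, 37.5, 20.0, 0.405),
}


def doiq(high, median, low):
    """Weighted sum of the response shares, scaled from percent to [0, 1]."""
    return (W_HIGH * high + W_MEDIAN * median + W_LOW * low) / 100


# Largest absolute difference between the recomputed and tabulated DOIQ.
max_gap = max(abs(doiq(h, m, l) - reported)
              for h, m, l, reported in questionnaire.values())
print(f"largest deviation from the table: {max_gap:.4f}")
```

Under this assumption every row is reproduced to within rounding or truncation of the reported value.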
Breast Cancer Risk Factors DOI Based on the Medical Reports
| No. | Risk Factor | Marks under Essential† (1-4) and Secondary (1-4) | DOIR |
|---|---|---|---|
| 1 | Menopause | 1, 1, 1, 1 | 0.3 |
| 2 | Age group | 1, 1, 1, 1 | 0.9 |
| 3 | Density | 1, 1, 1, 1 | 0.5 |
| 4 | Race | 1, 1, 1 | 0.675 |
| 5 | Hispanic | 1, 1 | 0.25 |
| 6 | BMI | 1, 1, 1, 1 | 0.3 |
| 7 | agefirst | 1, 1, 1, 1 | 0.5 |
| 8 | nrelbc | 1, 1, 1, 1 | 0.7 |
| 9 | brstproc | 1, 1 | 0.05 |
| 10 | Lastmamm | none | - |
| 11 | Surgical menopause | 1 | 0.025 |
| 12 | Hormone therapy | 1, 1, 1, 1 | 0.5 |
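The table does not preserve which marks fall under the Essential columns and which under the Secondary columns. The DOIR values are consistent with each Essential mark contributing 0.225 and each Secondary mark 0.025. The sketch below uses that inferred rule to recover a split for each row; both weights and the helper `split_marks` are assumptions for illustration, not the authors' stated method.

```python
# Inferred per-mark contributions (an assumption, not stated in the source).
W_ESSENTIAL, W_SECONDARY = 0.225, 0.025

# (number of marks in the row, tabulated DOIR) per risk factor.
reports = {
    "Menopause": (4, 0.3),
    "Age group": (4, 0.9),
    "Density": (4, 0.5),
    "Race": (3, 0.675),
    "Hispanic": (2, 0.25),
    "BMI": (4, 0.3),
    "agefirst": (4, 0.5),
    "nrelbc": (4, 0.7),
    "brstproc": (2, 0.05),
    "Surgical menopause": (1, 0.025),
    "Hormone therapy": (4, 0.5),
}


def split_marks(marks, doir):
    """Return (essential, secondary) counts reproducing doir, or None."""
    for essential in range(marks + 1):
        secondary = marks - essential
        value = W_ESSENTIAL * essential + W_SECONDARY * secondary
        if abs(value - doir) < 1e-9:
            return essential, secondary
    return None


splits = {name: split_marks(*row) for name, row in reports.items()}

for name, split in splits.items():
    print(f"{name}: essential/secondary = {split}")
```

Every row with marks admits exactly one split under this rule, which is weak evidence that the inferred weights match the intended scoring.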
The DOIf of the Breast Cancer Risk Factors
| No. | Risk Factor | DOIf | Weight order† | STW |
|---|---|---|---|---|
| 1 | Menopause | 0.335 | 7 | 1 |
| 2 | Age group | 0.6575 | 1 | 4 |
| 3 | Density | 0.415 | 6 | 1 |
| 4 | Race | 0.4925 | 3 | 3 |
| 5 | Hispanic | 0.2165 | 9 | 1 |
| 6 | BMI | 0.3035 | 8 | 1 |
| 7 | agefirst | 0.4225 | 5 | 2 |
| 8 | nrelbc | 0.57 | 2 | 3 |
| 9 | brstproc | 0.175 | 10 | 1 |
| 10 | lastmamm | 0.165 | 11 | 1 |
| 11 | Surgical menopause | 0.097 | 12 | 1 |
| 12 | Hormone therapy | 0.4525 | 4 | 3 |
†Numbers 1-12 indicate the weight order.
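The DOIf values appear to be the arithmetic mean of DOIQ and DOIR from the two preceding tables, with the weight order given by ranking DOIf in descending order. The sketch below checks this reading; treating the missing DOIR of lastmamm ("-") as 0 is an assumption, and the averaging rule is inferred from the numbers rather than stated in this excerpt.

```python
# DOIQ (questionnaire) and DOIR (medical reports) copied from the tables.
doiq = {
    "Menopause": 0.37, "Age group": 0.415, "Density": 0.33, "Race": 0.31,
    "Hispanic": 0.183, "BMI": 0.307, "agefirst": 0.345, "nrelbc": 0.44,
    "brstproc": 0.30, "lastmamm": 0.33, "Surgical menopause": 0.169,
    "Hormone therapy": 0.405,
}
doir = {
    "Menopause": 0.3, "Age group": 0.9, "Density": 0.5, "Race": 0.675,
    "Hispanic": 0.25, "BMI": 0.3, "agefirst": 0.5, "nrelbc": 0.7,
    "brstproc": 0.05,
    "lastmamm": 0.0,  # reported as "-"; assumed to count as zero
    "Surgical menopause": 0.025, "Hormone therapy": 0.5,
}

# Inferred rule: DOIf is the mean of the two degrees of importance.
doif = {name: round((doiq[name] + doir[name]) / 2, 4) for name in doiq}

# Weight order: 1 for the largest DOIf, 12 for the smallest.
ordered = sorted(doif, key=doif.get, reverse=True)
rank = {name: position for position, name in enumerate(ordered, start=1)}

for name in ordered:
    print(f"{rank[name]:>2}  {name:<20} {doif[name]:.4f}")
```

Under these assumptions, Age group ranks first (0.6575) and Surgical menopause last (0.097).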
Evaluation of the Risk Estimation Model Using the Weighted and Non-Weighted Version of the Risk Factors
| Data status | Majority class FNR | Minor class FNR | Majority class FDR | Minor class FDR | Overall Validation Accuracy | Training Time |
|---|---|---|---|---|---|---|
| With Weighting | 3.30% | 10.50% | 1.80% | 17.50% | 95.70% | 38.65 |
| Without Weighting | 8.30% | 28.10% | 5.00% | 40.10% | 88.80% | 41.09 |
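The table reports a false negative rate (FNR) and a false discovery rate (FDR) for each class. This excerpt does not define them; assuming the standard definitions, FNR = FN / (TP + FN) and FDR = FP / (TP + FP) with the class in question treated as positive, a minimal sketch of the per-class computation looks as follows. The confusion matrix is made up for illustration and is not data from the paper.

```python
def per_class_rates(confusion):
    """Compute FNR and FDR for each class.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    size = len(confusion)
    rates = []
    for cls in range(size):
        tp = confusion[cls][cls]
        fn = sum(confusion[cls]) - tp                         # missed members of cls
        fp = sum(confusion[r][cls] for r in range(size)) - tp  # wrongly labelled cls
        rates.append({"FNR": fn / (tp + fn), "FDR": fp / (tp + fp)})
    return rates


def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical counts: row 0 = majority class, row 1 = minor class.
example = [[90, 10],
           [5, 45]]

rates = per_class_rates(example)
overall = accuracy(example)
print(rates, overall)
```

With an imbalanced dataset the minor-class FNR and FDR are usually the more informative figures, which fits the way the table separates them from overall accuracy.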
Evaluation of the Risk Estimation Model Using Different Selections of the Weighted Risk Factor
| Deleted risk factor | Majority class FNR | Minor class FNR | Majority class FDR | Minor class FDR | Overall Validation Accuracy | Training Time |
|---|---|---|---|---|---|---|
| Age | 3.50% | 16.40% | 2.80% | 19.60% | 94.60% | 36.05 |
| Race | 6.40% | 21.60% | 3.80% | 32.10% | 91.40% | 43.23 |
| Nrelbc | 4.90% | 16.10% | 2.80% | 25.40% | 93.50% | 39.09 |
| Hormone Therapy | 4.00% | 13.10% | 2.30% | 21.20% | 94.70% | 38.10 |
| surgmeno | 3.90% | 13.10% | 2.30% | 20.80% | 94.70% | 38.20 |
| lastmamm | 4.70% | 15.30% | 2.70% | 24.40% | 93.80% | 40.32 |
| brstproc | 4.30% | 13.70% | 2.40% | 22.60% | 94.30% | 37.12 |
| agefirst | 4.80% | 17.40% | 3.00% | 25.50% | 93.30% | 41.37 |
| bmi | 4.90% | 16.60% | 2.90% | 25.70% | 93.40% | 39.70 |
| Hispanic | 4.90% | 16.50% | 2.90% | 25.60% | 93.40% | 41.37 |
| Density | 4.40% | 15.10% | 2.60% | 23.00% | 94.10% | 40.20 |
| menopause | 3.50% | 11.30% | 2.00% | 18.80% | 95.30% | 39.20 |
| Age & Race | 6.60% | 33.50% | 5.80% | 36.60% | 89.50% | 36.07 |
| Race & Nrelbc | 7.80% | 30.50% | 5.40% | 39.50% | 88.90% | 36.20 |
| Nrelbc & Age & Race | 5.50% | 50.50% | 8.40% | 39.40% | 87.90% | 33.59 |
| menopause & brstproc & surgmeno | 5.40% | 20.90% | 3.70% | 28.50% | 92.30% | 33.70 |
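One way to read the single-factor rows of this table is as an ablation: the accuracy lost when a factor is deleted indicates how much the model relies on it. The sketch below ranks the factors by that loss, taking the 95.70% accuracy of the "With Weighting" row in the preceding evaluation table as the reference. Using that row as the baseline is an assumption, and the variable names are illustrative.

```python
# Reference accuracy (%) with all weighted risk factors; assumed baseline.
BASELINE_ACCURACY = 95.70

# Overall validation accuracy (%) after deleting a single risk factor.
single_deletions = {
    "Age": 94.60, "Race": 91.40, "Nrelbc": 93.50, "Hormone Therapy": 94.70,
    "surgmeno": 94.70, "lastmamm": 93.80, "brstproc": 94.30,
    "agefirst": 93.30, "bmi": 93.40, "Hispanic": 93.40,
    "Density": 94.10, "menopause": 95.30,
}

# Accuracy drop in percentage points caused by each deletion.
drops = {name: round(BASELINE_ACCURACY - acc, 2)
         for name, acc in single_deletions.items()}

# Largest drop first: the factors the model depends on most.
ranked = sorted(drops.items(), key=lambda item: item[1], reverse=True)

for name, drop in ranked:
    print(f"{name:<16} -{drop:.2f} points")
```

On these figures, deleting Race costs the most accuracy (4.3 points) and deleting menopause the least (0.4 points).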
The Effect of Down-Weighting the Weak-Impact Risk Factors on the Performance of Risk Estimation Model on the Oversampled Risk Database
| Down-scaling | Majority class FNR | Minor class FNR | Majority class FDR | Minor class FDR | Overall Validation Accuracy |
|---|---|---|---|---|---|
| menopause=0.5 | 3.10% | 10.80% | 1.90% | 16.60% | 95.80% |
| menopause=0.5, brstproc=0.2 | 3.10% | 10.70% | 1.90% | 17.00% | 95.80% |
| menopause=0.5, brstproc=0.2, lastmamm=0.2 | 3.20% | 11.50% | 2.00% | 17.50% | 95.60% |
| brstproc=0.2, lastmamm=0.3, surgmeno=0.2 | 3.30% | 10.10% | 1.80% | 17.60% | 95.70% |
| menopause=0.5, brstproc=0.2, lastmamm=0.3, surgmeno=0.2 | 3.10% | 11.00% | 1.90% | 17.00% | 95.70% |
| menopause=0.5, Density=0.3, brstproc=0.2, lastmamm=0.3, surgmeno=0.2 | 3.10% | 10.70% | 1.90% | 16.60% | 95.80% |
Figure 3. Effect of Scaling Risk Factors on the Performance of the Risk Estimation Model on Three Different Balanced Databases