Wen-Cai Liu1,2, Ming-Xuan Li2, Shi-Nan Wu2, Wei-Lai Tong1,3, An-An Li1,3, Bo-Lin Sun1,3, Zhi-Li Liu1,3, Jia-Ming Liu1,3.
Abstract
Breast cancer (BC) is the most common malignant tumor in women, and breast infiltrating ductal carcinoma (IDC) accounts for about 80% of all BC cases. BC patients with bone metastases (BM) are more likely to have a poor prognosis and reduced quality of life, so early attention to patients at high risk of BM is important. This study aimed to develop a machine learning-based model to predict the risk of BM in patients with IDC. Six machine learning algorithms, namely Logistic regression (LR), Naive Bayes classifiers (NBC), Decision tree (DT), Random Forest (RF), Gradient Boosting Machine (GBM), and Extreme gradient boosting (XGB), were used to build prediction models. The XGB model offered the best predictive performance among the six models in the internal and external validation sets (AUC: 0.888, accuracy: 0.803, sensitivity: 0.801, and specificity: 0.837). Finally, an XGB model-based web predictor was developed to predict the risk of BM in IDC patients, which may help physicians make personalized clinical decisions and treatment plans for IDC patients.
Keywords: bone metastases; breast cancer; infiltrating ductal carcinoma; machine learning; prediction
Year: 2022 PMID: 35875050 PMCID: PMC9298922 DOI: 10.3389/fpubh.2022.922510
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
Figure 1. Flow diagram of the study population selected from the Surveillance, Epidemiology, and End Results (SEER) database and the First Affiliated Hospital of Nanchang University. According to the inclusion and exclusion criteria, a total of 311,408 patients from SEER were included in this study and randomly split into training and internal test sets in a 7:3 ratio. Data from the First Affiliated Hospital of Nanchang University (n = 1,243) served as an external test set.
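The 7:3 random split described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_train_test`, the fixed seed, and the use of a plain (unstratified) shuffle are all assumptions, since the source does not say how the split was implemented.

```python
import random


def split_train_test(n_records, train_fraction=0.7, seed=0):
    """Shuffle record indices and cut them into training and test index lists.

    Hypothetical helper: the name, seed, and unstratified shuffle are assumptions.
    """
    indices = list(range(n_records))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(indices)
    n_train = int(n_records * train_fraction)  # round down to a whole record
    return indices[:n_train], indices[n_train:]


# 311,408 SEER patients, as reported in the Figure 1 caption.
train_idx, test_idx = split_train_test(311_408)
print(len(train_idx), len(test_idx))
```

Every record lands in exactly one of the two lists, so the test set never overlaps the data used for fitting.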
Clinical and pathological characteristics of study population.
| Characteristic | Overall (n = 311,408) | NBM (n = 303,634) | BM (n = 7,774) |
|---|---|---|---|
| Age (years) | | | |
| <50 | 65,967 (21.2%) | 64,129 (21.1%) | 1,838 (23.6%) |
| ≥50 | 245,441 (78.8%) | 239,505 (78.9%) | 5,936 (76.4%) |
| Sex | | | |
| Female | 308,805 (99.2%) | 301,162 (99.2%) | 7,643 (98.3%) |
| Male | 2,603 (0.8%) | 2,472 (0.8%) | 131 (1.7%) |
| Race | | | |
| American Indian/Alaska Native | 1,848 (0.6%) | 1,809 (0.6%) | 39 (0.5%) |
| Asian or Pacific Islander | 28,929 (9.3%) | 28,312 (9.3%) | 617 (7.9%) |
| Black | 35,011 (11.2%) | 33,749 (11.1%) | 1,262 (16.2%) |
| White | 245,620 (78.9%) | 239,764 (79.0%) | 5,856 (75.3%) |
| Grade | | | |
| Grade I (well differentiated) | 65,791 (21.1%) | 65,252 (21.5%) | 539 (6.9%) |
| Grade II (moderately differentiated) | 132,463 (42.5%) | 128,913 (42.5%) | 3,550 (45.7%) |
| Grade III (poorly differentiated) | 112,515 (36.1%) | 108,858 (35.9%) | 3,657 (47.0%) |
| Grade IV (undifferentiated) | 639 (0.2%) | 611 (0.2%) | 28 (0.4%) |
| Subtype | | | |
| HR-/HER2- (triple negative) | 38,740 (12.4%) | 37,927 (12.5%) | 813 (10.5%) |
| HR-/HER2+ (HER2 enriched) | 15,803 (5.1%) | 15,246 (5.0%) | 557 (7.2%) |
| HR+/HER2- (Luminal A) | 219,700 (70.6%) | 214,764 (70.7%) | 4,936 (63.5%) |
| HR+/HER2+ (Luminal B) | 37,165 (11.9%) | 35,697 (11.8%) | 1,468 (18.9%) |
| T stage | | | |
| T1 | 191,204 (61.4%) | 190,153 (62.6%) | 1,051 (13.5%) |
| T2 | 93,067 (29.9%) | 90,289 (29.7%) | 2,778 (35.7%) |
| T3 | 15,307 (4.9%) | 14,031 (4.6%) | 1,276 (16.4%) |
| T4 | 11,830 (3.8%) | 9,161 (3.0%) | 2,669 (34.3%) |
| N stage | | | |
| N0 | 215,120 (69.1%) | 213,308 (70.3%) | 1,812 (23.3%) |
| N1 | 72,080 (23.1%) | 68,326 (22.5%) | 3,754 (48.3%) |
| N2 | 15,459 (5.0%) | 14,400 (4.7%) | 1,059 (13.6%) |
| N3 | 8,749 (2.8%) | 7,600 (2.5%) | 1,149 (14.8%) |
| Laterality | | | |
| Left | 157,489 (50.6%) | 153,472 (50.5%) | 4,017 (51.7%) |
| Right | 153,919 (49.4%) | 150,162 (49.5%) | 3,757 (48.3%) |
| Marital status | | | |
| Married | 261,707 (84.0%) | 255,829 (84.3%) | 5,878 (75.6%) |
| Unmarried | 49,701 (16.0%) | 47,805 (15.7%) | 1,896 (24.4%) |
BM, Bone metastasis; NBM, No bone metastasis; HR, hormone receptor; HER2, human epidermal growth factor receptor 2.
Clinical and pathological characteristics of training set and test set.
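| Characteristic | Training set: NBM (n = 212,567) | Training set: BM (n = 5,418) | Internal test set: NBM (n = 91,067) | Internal test set: BM (n = 2,356) | External test set: NBM (n = 1,068) | External test set: BM (n = 175) |
|---|---|---|---|---|---|---|
| Age (years) | | | | | | |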
| <50 | 45,017 (21.2) | 1,270 (23.2) | 19,112 (21.0) | 568 (24.1) | 358 (33.5) | 69 (39.4) |
| ≥50 | 167,550 (78.8) | 4,148 (76.8) | 71,955 (79.0) | 1,788 (75.9) | 710 (66.5) | 106 (60.6) |
| Sex | | | | | | |
| Female | 210,814 (99.2) | 5,322 (98.2) | 90,348 (99.2) | 2,321 (98.5) | 1,064 (99.6) | 173 (98.9) |
| Male | 1,753 (0.8) | 96 (1.8) | 719 (0.8) | 35 (1.5) | 4 (0.4) | 2 (1.1) |
| Race | | | | | | |
| American Indian/Alaska Native | 1,267 (0.6) | 24 (0.4) | 542 (0.6) | 15 (0.6) | 0 (0.0) | 0 (0.0) |
| Asian or Pacific Islander | 19,981 (9.4) | 415 (7.7) | 8,331 (9.1) | 202 (8.6) | 1,068 (100.0) | 175 (100.0) |
| Black | 23,645 (11.1) | 888 (16.4) | 10,104 (11.1) | 374 (15.9) | 0 (0.0) | 0 (0.0) |
| White | 167,674 (78.9) | 4,091 (75.5) | 72,090 (79.2) | 1,765 (74.9) | 0 (0.0) | 0 (0.0) |
| Grade | | | | | | |
| Grade I (well differentiated) | 45,558 (21.4) | 385 (7.1) | 19,694 (21.6) | 154 (6.5) | 202 (18.9) | 11 (6.3) |
| Grade II (moderately differentiated) | 90,360 (42.5) | 2,461 (45.4) | 38,553 (42.3) | 1,090 (46.3) | 494 (46.3) | 82 (46.9) |
| Grade III (poorly differentiated) | 76,227 (35.9) | 2,555 (47.2) | 32,631 (35.8) | 1,102 (46.8) | 369 (34.6) | 81 (46.3) |
| Grade IV (undifferentiated) | 422 (0.2) | 18 (0.3) | 189 (0.2) | 10 (0.4) | 3 (0.3) | 1 (0.6) |
| Subtype | | | | | | |
| HR-/HER2- (triple negative) | 26,586 (12.5) | 569 (10.5) | 11,341 (12.5) | 244 (10.4) | 95 (8.9) | 9 (5.1) |
| HR-/HER2+ (HER2 enriched) | 10,737 (5.1) | 377 (7.0) | 4,509 (5.0) | 180 (7.6) | 73 (6.8) | 16 (9.1) |
| HR+/HER2- (Luminal A) | 150,391 (70.7) | 3,453 (63.7) | 64,373 (70.7) | 1,483 (62.9) | 765 (71.6) | 118 (67.4) |
| HR+/HER2+ (Luminal B) | 24,853 (11.7) | 1,019 (18.8) | 10,844 (11.9) | 449 (19.1) | 135 (12.6) | 32 (18.3) |
| T stage | | | | | | |
| T1 | 132,936 (62.5) | 750 (13.8) | 57,217 (62.8) | 301 (12.8) | 639 (59.8) | 12 (6.9) |
| T2 | 63,445 (29.8) | 1,929 (35.6) | 26,844 (29.5) | 849 (36.0) | 350 (32.8) | 62 (35.4) |
| T3 | 9,829 (4.6) | 866 (16.0) | 4,202 (4.6) | 410 (17.4) | 52 (4.9) | 33 (18.9) |
| T4 | 6,357 (3.0) | 1,873 (34.6) | 2,804 (3.1) | 796 (33.8) | 27 (2.5) | 68 (38.9) |
| N stage | | | | | | |
| N0 | 149,389 (70.3) | 1,261 (23.3) | 63,919 (70.2) | 551 (23.4) | 746 (69.9) | 41 (23.4) |
| N1 | 47,801 (22.5) | 2,630 (48.5) | 20,525 (22.5) | 1,124 (47.7) | 241 (22.6) | 79 (45.1) |
| N2 | 10,019 (4.7) | 722 (13.3) | 4,381 (4.8) | 337 (14.3) | 60 (5.6) | 30 (17.1) |
| N3 | 5,358 (2.5) | 805 (14.9) | 2,242 (2.5) | 344 (14.6) | 21 (2.0) | 25 (14.3) |
| Laterality | | | | | | |
| Left | 107,409 (50.5) | 2,763 (51.0) | 46,063 (50.6) | 1,254 (53.2) | 523 (49.0) | 95 (54.3) |
| Right | 105,158 (49.5) | 2,655 (49.0) | 45,004 (49.4) | 1,102 (46.8) | 545 (51.0) | 80 (45.7) |
| Marital status | | | | | | |
| Married | 179,137 (84.3) | 4,079 (75.3) | 76,692 (84.2) | 1,799 (76.4) | 915 (85.7) | 133 (76.0) |
| Unmarried | 33,430 (15.7) | 1,339 (24.7) | 14,375 (15.8) | 557 (23.6) | 153 (14.3) | 42 (24.0) |
BM, Bone metastasis; NBM, No bone metastasis; HR, hormone receptor; HER2, human epidermal growth factor receptor 2.
Univariate analysis and multivariate logistic regression analysis of variables.
| Variable | Univariate test statistic | Univariate P | Multivariate OR (95% CI) | Multivariate P |
|---|---|---|---|---|
| Age (years) | 12.612 | <0.001 | | |
| <50 | | | Reference | |
| ≥50 | | | 1.122 (1.053–1.196) | <0.001 |
| Sex | 56.360 | <0.001 | | |
| Female | | | Reference | |
| Male | | | 1.185 (0.946–1.485) | 0.140 |
| Race | 157.012 | <0.001 | | |
| American Indian/Alaska Native | | | Reference | |
| Asian or Pacific Islander | | | 1.179 (0.764–1.818) | 0.456 |
| Black | | | 1.641 (1.071–2.516) | 0.023 |
| White | | | 1.550 (1.016–2.365) | 0.042 |
| Grade | 716.162 | <0.001 | | |
| Grade I (well differentiated) | | | Reference | |
| Grade II (moderately differentiated) | | | 1.546 (1.380–1.732) | <0.001 |
| Grade III (poorly differentiated) | | | 1.201 (1.066–1.353) | 0.003 |
| Grade IV (undifferentiated) | | | 1.314 (0.786–2.197) | 0.298 |
| Subtype | 317.014 | <0.001 | | |
| HR-/HER2- (triple negative) | | | Reference | |
| HR-/HER2+ (HER2 enriched) | | | 1.314 (1.143–1.510) | <0.001 |
| HR+/HER2- (Luminal A) | | | 1.767 (1.600–1.951) | <0.001 |
| HR+/HER2+ (Luminal B) | | | 2.003 (1.793–2.239) | <0.001 |
| T stage | 17,446.197 | <0.001 | | |
| T1 | | | Reference | |
| T2 | | | 3.379 (3.464–4.149) | <0.001 |
| T3 | | | 8.810 (7.896–9.830) | <0.001 |
| T4 | | | 27.233 (24.635–30.104) | <0.001 |
| N stage | 6,878.970 | <0.001 | | |
| N0 | | | Reference | |
| N1 | | | 3.061 (2.843–3.295) | <0.001 |
| N2 | | | 2.645 (2.387–2.932) | <0.001 |
| N3 | | | 4.390 (3.953–4.875) | <0.001 |
| Laterality | 0.461 | 0.497 | | |
| Left | | | - | - |
| Right | | | - | - |
| Marital status | 318.307 | <0.001 | | |
| Married | | | Reference | |
| Unmarried | | | 1.363 (1.272–1.460) | <0.001 |
BM, Bone metastasis; NBM, No bone metastasis; HR, hormone receptor; HER2, human epidermal growth factor receptor 2.
P < 0.05 was considered statistically significant.
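To make the odds ratios above more concrete, the following sketch computes a crude (unadjusted) odds ratio with a Wald 95% confidence interval from a 2x2 table. It uses the whole-cohort T4 and T1 counts from the first characteristics table. The function name `crude_odds_ratio` is hypothetical, and this is not the multivariate model reported here: the published odds ratios are adjusted for the other variables, and the data set they were fitted on is not stated in this excerpt.

```python
import math


def crude_odds_ratio(exposed_cases, exposed_noncases, ref_cases, ref_noncases, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval from a 2x2 table.

    Hypothetical helper for illustration; z = 1.96 gives an approximate 95% CI.
    """
    odds_ratio = (exposed_cases * ref_noncases) / (exposed_noncases * ref_cases)
    # Standard error of log(OR): square root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(
        1 / exposed_cases + 1 / exposed_noncases + 1 / ref_cases + 1 / ref_noncases
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# T4 versus T1 (reference): BM and NBM counts per stage in the whole SEER cohort.
or_t4, ci_low, ci_high = crude_odds_ratio(2_669, 9_161, 1_051, 190_153)
print(f"crude OR = {or_t4:.1f} (95% CI {ci_low:.1f}-{ci_high:.1f})")
```

The crude estimate comes out at roughly 53, well above the adjusted value of 27.233 for T4. A gap in that direction is plausible because the multivariate model also accounts for N stage, grade, and the other covariates.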
Figure 2. Results of correlation analysis between all variables.
Figure 3. Relative feature importance of different models. The plot shows the ranking of the relative importance of features in all models.
Figure 4. Ten-fold cross-validation results of different machine learning models in the training set. DT, Decision tree; LR, Logistic regression; GBM, Gradient Boosting Machine; NBC, Naive Bayes classification; RF, Random Forest; XGB, Extreme gradient boosting.
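As a rough illustration of the ten-fold cross-validation named in the Figure 4 caption, the sketch below generates the ten train/validation index partitions. The helper `k_fold_indices`, the seed, and the unstratified shuffle are assumptions; the source does not describe how folds were drawn or which library was used.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) pairs for k-fold cross-validation.

    Hypothetical helper: plain shuffled folds, with no stratification by outcome.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Toy run on 1,000 records: each record is used for validation exactly once.
for fold_number, (train, validation) in enumerate(k_fold_indices(1_000), start=1):
    print(fold_number, len(train), len(validation))
```

A model would be fitted on each `train` list and scored on the matching `validation` list, with the ten scores then summarized per algorithm.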
Figure 6. Prediction performances of different machine learning models. (A) Internal validation of different machine learning models. (B) External validation of different machine learning models.
Comparison of prediction performances among different models for bone metastasis.
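| Model | Internal test set: AUC | Internal test set: Accuracy | Internal test set: Sensitivity | Internal test set: Specificity | External test set: AUC | External test set: Accuracy | External test set: Sensitivity | External test set: Specificity |
|---|---|---|---|---|---|---|---|---|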
| LR | 0.839 | 0.795 | 0.796 | 0.761 | 0.878 | 0.777 | 0.768 | 0.834 |
| NBC | 0.839 | 0.822 | 0.825 | 0.686 | 0.863 | 0.836 | 0.823 | 0.761 |
| DT | 0.831 | 0.658 | 0.662 | 0.836 | 0.863 | 0.660 | 0.713 | 0.849 |
| RF | 0.847 | 0.765 | 0.764 | 0.780 | 0.862 | 0.780 | 0.771 | 0.834 |
| GBM | 0.850 | 0.783 | 0.784 | 0.777 | 0.880 | 0.787 | 0.779 | 0.834 |
| XGB | 0.857 | 0.787 | 0.787 | 0.791 | 0.888 | 0.803 | 0.801 | 0.837 |
DT, Decision tree; LR, Logistic regression; GBM, Gradient Boosting Machine; NBC, Naive Bayes classification; RF, Random Forest; XGB, Extreme gradient boosting.
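The four metrics compared above (AUC, accuracy, sensitivity, specificity) can be computed from predicted scores as in the following sketch. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration; the source does not state which threshold or software produced the reported values.

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, and specificity after thresholding predicted scores."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of BM patients correctly flagged
        "specificity": tn / (tn + fp),  # share of NBM patients correctly cleared
    }


def auc_rank(y_true, y_score):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in sample size, so only for small examples.
    """
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): 1 = bone metastasis, 0 = no bone metastasis.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1]
print(classification_metrics(labels, scores))
print(round(auc_rank(labels, scores), 3))
```

AUC does not depend on the threshold, whereas accuracy, sensitivity, and specificity all shift when the cut-off moves. This is one reason a model can rank first on AUC without leading on every other column.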
Figure 7. The web calculator for predicting bone metastases in breast infiltrating ductal carcinoma patients.