Wen-Cai Liu1,2, Ming-Xuan Li2, Wen-Xing Qian3, Zhi-Wen Luo1,4, Wei-Jie Liao1,4, Zhi-Li Liu1,4, Jia-Ming Liu1,4.
Abstract
OBJECTIVE: This study aimed to develop and validate a machine learning model for predicting bone metastases (BM) in prostate cancer (PCa) patients.
Keywords: SEER; bone metastasis; machine learning; prediction model; prostate cancer
Year: 2021 PMID: 34849027 PMCID: PMC8627242 DOI: 10.2147/CMAR.S330591
Source DB: PubMed Journal: Cancer Manag Res ISSN: 1179-1322 Impact factor: 3.989
Figure 1. Flow diagram of the study population selected from the Surveillance, Epidemiology, and End Results (SEER) database and the First Affiliated Hospital of Nanchang University. According to the inclusion and exclusion criteria, a total of 207,137 SEER patients were included in this study and were randomly split into training and internal test sets in a 7:3 ratio. Data from the First Affiliated Hospital of Nanchang University served as an external test set.
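The 7:3 random split described in the caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_train_test`, the fixed seed, and the choice to round the test-set size up are all assumptions. With that rounding, the sizes happen to match the column totals of the training/test table (140,260 + 4735 = 144,995 and 60,152 + 1990 = 62,142).

```python
import math
import random

def split_train_test(n_records, test_fraction=0.3, seed=0):
    """Shuffle record indices and cut them into training and test index lists."""
    indices = list(range(n_records))
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is reproducible
    rng.shuffle(indices)
    # Assumption: the test-set size is rounded up; the caption only says "7:3".
    n_test = math.ceil(n_records * test_fraction)
    return indices[n_test:], indices[:n_test]

# 207,137 is the SEER cohort size given in the caption.
train_idx, test_idx = split_train_test(207_137)
print(len(train_idx), len(test_idx))  # 144995 62142
```

Splitting indices rather than records keeps the sketch independent of how the patient data are stored.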
Clinical and Pathological Characteristics of Study Population
| Variables | ALL | NBM | BM |
|---|---|---|---|
| | N=207,137 | N=200,412 | N=6725 |
| ≤39 | 100 (<0.1%) | 97 (<0.1%) | 3 (<0.1%) |
| 40–49 | 5902 (2.8%) | 5760 (2.9%) | 142 (2.1%) |
| 50–59 | 46,532 (22.5%) | 45,478 (22.7%) | 1054 (15.7%) |
| 60–69 | 92,231 (44.5%) | 89,924 (44.9%) | 2307 (34.3%) |
| ≥70 | 62,372 (30.1%) | 59,153 (29.5%) | 3219 (47.9%) |
| American Indian/Alaska Native | 813 (0.4%) | 778 (0.4%) | 35 (0.5%) |
| Asian or Pacific Islander | 10,032 (4.8%) | 9673 (4.8%) | 359 (5.3%) |
| Black | 33,822 (16.3%) | 32,574 (16.3%) | 1248 (18.6%) |
| White | 162,470 (78.4%) | 157,387 (78.5%) | 5083 (75.6%) |
| Grade I | 20,130 (9.7%) | 20,071 (10.0%) | 59 (0.9%) |
| Grade II | 254 (0.1%) | 212 (0.1%) | 42 (0.6%) |
| Grade III | 102,729 (49.6%) | 96,633 (48.2%) | 6096 (90.6%) |
| Grade IV | 84,024 (40.6%) | 83,496 (41.7%) | 528 (7.9%) |
| T1 | 87,738 (42.4%) | 85,198 (42.5%) | 2540 (37.8%) |
| T2 | 90,974 (43.9%) | 88,612 (44.2%) | 2362 (35.1%) |
| T3 | 26,494 (12.8%) | 25,520 (12.7%) | 974 (14.5%) |
| T4 | 1931 (0.9%) | 1082 (0.5%) | 849 (12.6%) |
| N0 | 199,991 (96.6%) | 195,282 (97.4%) | 4709 (70.0%) |
| N1 | 7146 (3.4%) | 5130 (2.6%) | 2016 (30.0%) |
| PSA | 6.5 [4.8;10.4]* | 6.4 [4.8;9.9]* | 77.4 [20.6;98.0]* |
| ≤6 | 82,688 (39.9%) | 82,508 (41.2%) | 180 (2.7%) |
| 7 | 83,693 (40.4%) | 82,688 (41.3%) | 1005 (14.9%) |
| 8 | 21,937 (10.6%) | 20,265 (10.1%) | 1672 (24.9%) |
| ≥9 | 18,819 (9.1%) | 14,951 (7.5%) | 3868 (57.5%) |
| Married | 181,913 (87.8%) | 176,389 (88.0%) | 5524 (82.1%) |
| Unmarried | 25,224 (12.2%) | 24,023 (12.0%) | 1201 (17.9%) |
Note: *Median [interquartile range, IQR].
Abbreviations: BM, bone metastasis; NBM, no bone metastasis; PSA, prostate specific antigen.
Clinical and Pathological Characteristics of Training Set and Test Set
| Variables | Training Set, NBM (%) (n=140,260) | Training Set, BM (%) (n=4735) | Internal Test Set, NBM (%) (n=60,152) | Internal Test Set, BM (%) (n=1990) | External Test Set, NBM (%) (n=527) | External Test Set, BM (%) (n=117) |
|---|---|---|---|---|---|---|
| ≤39 | 67 (<0.1) | 2 (<0.1) | 30 (0.1) | 1 (<0.1) | 0 (0.0) | 0 (0.0) |
| 40–49 | 4028 (2.8) | 101 (2.1) | 1732 (2.9) | 41 (2.1) | 8 (1.5) | 2 (1.7) |
| 50–59 | 31,862 (22.7) | 762 (16.1) | 13,616 (22.6) | 292 (14.7) | 91 (17.3) | 19 (16.2) |
| 60–69 | 62,724 (44.7) | 1626 (34.3) | 27,200 (45.2) | 681 (34.2) | 211 (40.0) | 27 (23.1) |
| ≥70 | 41,579 (29.6) | 2244 (47.4) | 17,574 (29.2) | 975 (49.0) | 217 (41.1) | 69 (59.0) |
| American Indian/Alaska Native | 557 (0.4) | 24 (0.5) | 221 (0.4) | 11 (0.6) | 0 (0) | 0 (0) |
| Asian or Pacific Islander | 6720 (4.8) | 246 (5.2) | 2953 (4.9) | 113 (5.7) | 527 (100) | 117 (100) |
| Black | 22,724 (16.2) | 868 (18.3) | 9850 (16.4) | 380 (19.1) | 0 (0) | 0 (0) |
| White | 110,259 (78.6) | 3597 (76.0) | 47,128 (78.3) | 1486 (74.7) | 0 (0) | 0 (0) |
| Grade I | 14,050 (10.0) | 39 (0.8) | 6021 (10.0) | 20 (1.0) | 10 (1.9) | 2 (1.7) |
| Grade II | 152 (0.1) | 29 (0.6) | 60 (0.1) | 13 (0.7) | 1 (0.2) | 1 (0.9) |
| Grade III | 67,619 (48.2) | 4291 (90.6) | 29,014 (48.2) | 1805 (90.7) | 310 (58.8) | 108 (92.3) |
| Grade IV | 58,439 (41.7) | 376 (7.9) | 25,057 (41.7) | 152 (7.6) | 206 (39.1) | 6 (5.1) |
| T1 | 59,594 (42.5) | 1773 (37.4) | 25,604 (42.6) | 767 (38.5) | 292 (55.4) | 41 (35.0) |
| T2 | 62,041 (44.2) | 1675 (35.4) | 26,571 (44.2) | 687 (34.5) | 193 (36.6) | 34 (29.1) |
| T3 | 17,844 (12.7) | 684 (14.4) | 7676 (12.8) | 290 (14.6) | 41 (7.8) | 26 (22.2) |
| T4 | 781 (0.6) | 603 (12.7) | 301 (0.5) | 246 (12.4) | 1 (0.2) | 16 (13.7) |
| N0 | 136,654 (97.4) | 3325 (70.2) | 58,628 (97.5) | 1384 (69.5) | 521 (98.9) | 85 (72.6) |
| N1 | 3606 (2.5) | 1410 (29.8) | 1524 (2.5) | 606 (30.5) | 6 (1.1) | 32 (27.4) |
| PSA | 6.4 (5.1)* | 77.0 (77.9)* | 6.4 (5.1)* | 79.6 (76.3)* | 7.4 (6.8)* | 98.0 (62.9)* |
| ≤6 | 57,814 (41.2) | 124 (2.6) | 24,694 (41.1) | 56 (2.8) | 224 (42.5) | 3 (2.6) |
| 7 | 57,791 (41.2) | 772 (15.2) | 24,897 (41.4) | 283 (14.2) | 195 (37.0) | 14 (12.0) |
| 8 | 14,120 (10.1) | 1153 (24.4) | 6145 (10.2) | 519 (26.1) | 64 (12.1) | 27 (23.1) |
| ≥9 | 10,535 (7.5) | 2736 (57.8) | 4416 (7.3) | 1132 (56.9) | 44 (8.3) | 73 (62.4) |
| Married | 123,372 (88.0) | 3925 (82.9) | 53,017 (88.1) | 1599 (80.4) | 489 (92.8) | 15 (12.8) |
| Unmarried | 16,888 (12.0) | 810 (17.1) | 7135 (11.9) | 391 (19.6) | 38 (7.2) | 102 (87.2) |
Note: *Median (interquartile range, IQR).
Abbreviations: BM, bone metastasis; NBM, no bone metastasis; PSA, prostate specific antigen.
Univariate Analysis and Multivariate Logistic Regression Analysis of Variables
| Variables | Univariate Analysis (P value) | Multivariate Logistic Analysis (odds ratio, 95% CI) | P value |
|---|---|---|---|
| Age | <0.001* | | |
| ≤39 | | Reference | |
| 40–49 | | 0.555 (0.076–4.060) | 0.562 |
| 50–59 | | 0.508 (0.071–3.661) | 0.502 |
| 60–69 | | 0.518 (0.072–3.370) | 0.514 |
| ≥70 | | 0.632 (0.088–4.555) | 0.650 |
| Race | <0.001* | | |
| American Indian/Alaska Native | | Reference | |
| Asian or Pacific Islander | | 0.938 (0.548–1.603) | 0.814 |
| Black | | 0.896 (0.533–1.506) | 0.677 |
| White | | 1.214 (0.727–2.030) | 0.459 |
| Grade | <0.001* | | |
| Grade I | | Reference | |
| Grade II | | 0.709 (0.368–1.365) | 0.303 |
| Grade III | | 0.773 (0.519–1.150) | 0.204 |
| Grade IV | | 0.840 (0.576–1.226) | 0.367 |
| T stage | <0.001* | | |
| T1 | | Reference | |
| T2 | | 0.917 (0.843–0.997) | 0.043* |
| T3 | | 0.511 (0.456–0.572) | <0.001* |
| T4 | | 2.015 (2.641–3.231) | <0.001* |
| N stage | <0.001* | | |
| N0 | | Reference | |
| N1 | | 2.921 (2.641–3.231) | <0.001* |
| Gleason score | <0.001* | | |
| ≤6 | | Reference | |
| 7 | | 4.502 (3.563–5.689) | <0.001* |
| 8 | | 15.828 (12.238–20.472) | <0.001* |
| ≥9 | | 32.566 (25.243–42.014) | <0.001* |
| PSA | <0.001* | 1.039 (1.038–1.040) | <0.001* |
| Marital status | <0.001* | | |
| Unmarried | | Reference | |
| Married | <0.001 | 0.965 (0.869–1.072) | 0.511 |
Note: *P < 0.05.
Abbreviations: BM, bone metastasis; NBM, no bone metastasis; PSA, prostate specific antigen.
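As a reading aid for the multivariate column, the sketch below shows how an odds ratio and its interval relate to a logistic regression coefficient. It assumes the intervals are Wald-type 95% confidence intervals, exp(β ± 1.96·SE), which the table itself does not state. Under that assumption the geometric mean of the two bounds reproduces the odds ratio. The function names are hypothetical.

```python
import math

def wald_summary(odds_ratio, ci_lower, ci_upper, z=1.96):
    """Back out the log-odds coefficient and its standard error from an OR and CI."""
    beta = math.log(odds_ratio)
    # A Wald interval is symmetric on the log scale, so its width is 2 * z * SE.
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)
    implied_or = math.sqrt(ci_lower * ci_upper)  # geometric mean of the bounds
    return beta, se, implied_or

def is_consistent(odds_ratio, ci_lower, ci_upper, rel_tol=0.01):
    """True if the OR lies inside its interval and near the bounds' geometric mean."""
    implied_or = math.sqrt(ci_lower * ci_upper)
    inside = ci_lower <= odds_ratio <= ci_upper
    return inside and math.isclose(odds_ratio, implied_or, rel_tol=rel_tol)

# Values copied from the N1 and T4 rows of the table, as printed.
print(wald_summary(2.921, 2.641, 3.231))
print(is_consistent(2.921, 2.641, 3.231))  # True
print(is_consistent(2.015, 2.641, 3.231))  # False
```

The N1 row passes this check. The T4 row, as printed, does not: its point estimate falls outside an interval identical to the N1 interval.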
Figure 2. Results of correlation analysis between all variables. The heat map shows the correlation between the variables.
Figure 3. Feature importance of different models. The plot shows the ranking of the relative importance of features in all models.
Comparison of Prediction Performance of Different Models for Bone Metastasis
| | Models | AUC | Accuracy | Sensitivity (Recall Rate) | Specificity |
|---|---|---|---|---|---|
| | | 0.938 | 0.833 | 0.883 | 0.831 |
| | | 0.903 | 0.849 | 0.867 | 0.848 |
| | | 0.947 | 0.876 | 0.898 | 0.875 |
| | | 0.941 | 0.880 | 0.885 | 0.879 |
| | | 0.950 | 0.879 | 0.902 | 0.879 |
| | | 0.955 | 0.881 | 0.905 | 0.880 |
| | | 0.944 | 0.877 | 0.846 | 0.884 |
| | | 0.905 | 0.849 | 0.867 | 0.848 |
| | | 0.950 | 0.874 | 0.906 | 0.867 |
| | | 0.934 | 0.869 | 0.914 | 0.859 |
| | | 0.949 | 0.874 | 0.880 | 0.873 |
| | | 0.962 | 0.884 | 0.906 | 0.879 |
Abbreviations: DT, Decision tree; LR, Logistic regression; MLP, Multilayer Perceptron; NBC, Naive Bayes classification; RF, Random Forest; XGB, eXtreme gradient boosting.
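The accuracy, sensitivity, and specificity columns are linked through the confusion matrix. The sketch below is an illustration with hypothetical function names, not the authors' evaluation code. It also shows why accuracy sits close to specificity when BM cases are rare: accuracy is the prevalence-weighted average of sensitivity and specificity. The worked value assumes that the first row of the table refers to the internal test set (1990 BM and 60,152 NBM patients); the row labels did not survive in the table.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # recall among true BM cases
        "specificity": tn / (tn + fp),  # recall among true NBM cases
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy as the prevalence-weighted average of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1 - prevalence)

# Assumption: first table row = internal test set (BM n=1990, NBM n=60,152).
prevalence = 1990 / (1990 + 60152)
print(round(accuracy_from_rates(0.883, 0.831, prevalence), 3))  # 0.833
```

With a BM prevalence of about 3%, roughly 97% of the weight falls on specificity, which is consistent with the near-identical accuracy and specificity values in the table.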
Figure 4. Ten-fold cross-validation results of different machine learning models in the training set.
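Ten-fold cross-validation partitions the training set into ten folds and uses each fold once for validation while fitting on the other nine. The sketch below generates such index splits with plain shuffling. The caption does not say how the folds were built or whether they were stratified by outcome, so the function name, the seed, and the unstratified scheme are assumptions.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training, validation) index lists for plain, unstratified k-fold CV."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed is an assumption
    # Deal the shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

# 144,995 = 140,260 NBM + 4735 BM patients in the training set.
for training, validation in k_fold_indices(144_995):
    print(len(training), len(validation))
```

Each patient index appears in exactly one validation fold, so every record contributes to validation once and to training nine times.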
Figure 5. ROC curves of the different machine learning models in the internal and external test sets.
Figure 6. Prediction performances of different models.
Figure 7. Prediction results of the different models. The heat map shows the predicted results of all models versus the actual situation in the internal and external test sets. Each column in the heat map represents the models’ predicted results of bone metastases for all patients in the dataset. Dark colors represent bone metastasis cases and light colors represent cases without bone metastasis.
Figure 8. The machine learning model-based web predictor for predicting bone metastases in prostate cancer patients.