Wen-Cai Liu, Zhi-Qiang Li, Zhi-Wen Luo, Wei-Jie Liao, Zhi-Li Liu, Jia-Ming Liu.
Abstract
OBJECTIVES: This study aimed to establish a machine learning model to predict bone metastasis (BM) in patients with newly diagnosed thyroid cancer (TC).
Keywords: SEER; bone metastasis; machine learning; random forest; thyroid cancer
Year: 2021 PMID: 33709570 PMCID: PMC8026946 DOI: 10.1002/cam4.3776
Source DB: PubMed Journal: Cancer Med ISSN: 2045-7634 Impact factor: 4.452
FIGURE 1 Flow diagram of the study population selected from the Surveillance, Epidemiology, and End Results (SEER) database. Based on the inclusion and exclusion criteria, 17,138 patients were included in this study.
FIGURE 2 (A) Area under the curve (AUC) values as ntree iterates from 1 to 500 in the improved random forest model. (B) Ten‐fold cross‐validation of the improved random forest model.
Clinical and pathological characteristics of training set and test set
| Variables | Training set: NBM (n = 11,885) | Training set: BM (n = 112) | Test set: NBM (n = 5087) | Test set: BM (n = 54) | p value |
|---|---|---|---|---|---|
| Age | | | | | 0.498 |
| <50 | 5779 (48.6) | 20 (17.9) | 2510 (49.3) | 4 (7.4) | |
| ≥50 | 6106 (51.4) | 92 (82.1) | 2577 (50.7) | 50 (92.6) | |
| Sex | | | | | 0.988 |
| Male | 2996 (25.2) | 53 (47.3) | 1281 (25.2) | 25 (46.3) | |
| Female | 8889 (74.8) | 59 (52.7) | 3806 (74.8) | 29 (53.7) | |
| Race | | | | | 0.386 |
| Black | 859 (7.2) | 19 (17.0) | 383 (7.5) | 10 (18.5) | |
| Other | 1403 (11.8) | 9 (8.0) | 627 (12.3) | 8 (14.8) | |
| White | 9626 (81.0) | 84 (75.0) | 4077 (80.1) | 36 (66.7) | |
| Grade | | | | | 0.709 |
| Grade I | 9373 (78.9) | 33 (29.5) | 4054 (79.7) | 11 (20.4) | |
| Grade II | 1700 (14.3) | 15 (13.4) | 708 (13.9) | 7 (13.0) | |
| Grade III | 365 (3.1) | 13 (11.6) | 149 (2.9) | 14 (25.9) | |
| Grade IV | 447 (3.8) | 51 (45.5) | 176 (3.5) | 22 (40.7) | |
| Histology | | | | | 0.316 |
| ATC | 339 (2.9) | 44 (39.3) | 125 (2.5) | 17 (31.5) | |
| FTC | 747 (6.3) | 26 (23.2) | 266 (5.2) | 10 (18.5) | |
| MTC | 93 (0.8) | 2 (1.8) | 45 (0.9) | 2 (3.7) | |
| PTC | 10,706 (90.1) | 40 (35.7) | 4651 (91.4) | 25 (46.3) | |
| T stage | | | | | 0.237 |
| T0 | 5 (0.0) | 1 (0.9) | 1 (0.0) | 0 | |
| T1 | 6571 (55.3) | 8 (7.1) | 2901 (57.0) | 7 (13.0) | |
| T2 | 2030 (17.1) | 13 (11.6) | 802 (15.8) | 2 (3.7) | |
| T3 | 2469 (20.8) | 28 (25.0) | 1068 (21.0) | 13 (24.1) | |
| T4 | 810 (6.8) | 62 (55.4) | 315 (6.2) | 32 (59.3) | |
| N stage | | | | | 0.736 |
| N0 | 8799 (74.0) | 53 (47.3) | 3783 (74.4) | 23 (42.6) | |
| N1 | 3086 (26.0) | 59 (52.7) | 1304 (25.6) | 31 (57.4) | |
| Laterality | | | | | 0.816 |
| Unilateral | 11,815 (99.4) | 111 (99.1) | 5042 (99.1) | 53 (98.1) | |
| Bilateral | 70 (0.6) | 1 (0.9) | 45 (0.9) | 1 (1.9) | |
| Insurance status | | | | | 0.921 |
| Insured | 11,607 (97.7) | 110 (98.2) | 4979 (97.9) | 53 (98.1) | |
| Uninsured | 278 (2.3) | 2 (1.8) | 108 (2.1) | 1 (1.9) | |
| Marital status | | | | | 0.926 |
| Married | 9215 (77.5) | 92 (82.1) | 3901 (76.7) | 48 (88.9) | |
| Unmarried | 2670 (22.5) | 20 (17.9) | 1186 (23.3) | 6 (11.1) | |
Abbreviations: ATC, anaplastic thyroid cancer; BM, bone metastasis; FTC, follicular thyroid cancer; MTC, medullary thyroid cancer; NBM, no bone metastasis; PTC, papillary thyroid cancer.
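The p values in this table compare the training and test sets for each variable. The record does not say which test produced them, so the following is only a minimal sketch that assumes a Pearson chi-square test without continuity correction, applied to the Sex row. The helper name `chi2_2x2` is hypothetical, and the counts are the NBM and BM cells summed within each set. Under that assumption the result comes out close to the reported 0.988.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Sex row: NBM + BM counts within each set (assumption: the test ignores BM status).
train_male, train_female = 2996 + 53, 8889 + 59
test_male, test_female = 1281 + 25, 3806 + 29

chi2, p = chi2_2x2(train_male, train_female, test_male, test_female)
print(f"chi-square = {chi2:.5f}, p = {p:.3f}")
```

Variables with more than two levels (for example Grade or T stage) would need the general r x c form of the test, which this sketch does not cover.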
FIGURE 3 Results of Pearson correlation analysis between all variables. The heat map shows the correlation between the variables.
Multivariable logistic regression model with enter variable selection
| Variables | OR (95% CI) | p value |
|---|---|---|
| Age | ||
| <50 | Reference | |
| ≥50 | 2.045 (1.181–3.543) | 0.011 |
| Sex | ||
| Male | Reference | |
| Female | 0.611 (0.411–0.908) | 0.015 |
| Race | ||
| Black | Reference | |
| Other | 0.380 (0.219–0.658) | <0.001 |
| White | 1.296 (0.638–2.633) | 0.473 |
| Grade | ||
| Grade I | Reference | |
| Grade II | 1.079 (0.370–3.146) | 0.89 |
| Grade III | 1.713 (0.563–5.218) | 0.343 |
| Grade IV | 3.318 (1.143–9.707) | 0.029 |
| Histology | ||
| ATC | Reference | |
| FTC | 2.458 (1.028–5.879) | 0.043 |
| MTC | 0.928 (0.203–4.242) | 0.923 |
| PTC | 0.141 (0.079–0.250) | <0.001 |
| T stage | ||
| T0 | Reference | |
| T1 | 0.210 (0.018–2.416) | 0.211 |
| T2 | 2.024 (0.948–4.319) | 0.068 |
| T3 | 3.090 (1.253–7.616) | 0.014 |
| T4 | 8.804 (3.214–24.114) | <0.001 |
| N stage | ||
| N0 | Reference | |
| N1 | 1.935 (1.219–3.072) | 0.005 |
| Laterality | ||
| Unilateral | Reference | |
| Bilateral | 1.287 (0.166–9.987) | 0.809 |
| Insurance status | ||
| Insured | Reference | |
| Uninsured | 0.700 (0.161–3.047) | 0.634 |
| Marital status | ||
| Married | Reference | |
| Unmarried | 0.995 (0.586–1.689) | 0.985 |
Abbreviations: ATC, anaplastic thyroid cancer; FTC, follicular thyroid cancer; MTC, medullary thyroid cancer; PTC, papillary thyroid cancer.
p < 0.05 was considered statistically significant.
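The odds ratios, confidence intervals, and p values in this table are linked through the logistic regression coefficients. As a rough illustration, the sketch below assumes Wald-type intervals (the record does not state how the intervals were computed) and recovers the log-odds coefficient, its standard error, and a two-sided p value from a reported OR and 95% CI. The function name `wald_p_from_or` is hypothetical.

```python
import math
from statistics import NormalDist


def wald_p_from_or(odds_ratio: float, ci_low: float, ci_high: float,
                   level: float = 0.95) -> tuple[float, float, float, float]:
    """Back out beta, SE, z, and a two-sided p value from an OR and its CI.

    Assumes a Wald interval: exp(beta +/- z_crit * SE).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% CI
    beta = math.log(odds_ratio)                    # coefficient on the log-odds scale
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return beta, se, z, p_value


# Age >= 50 versus < 50, taken from the table: OR 2.045 (1.181-3.543).
beta, se, z, p = wald_p_from_or(2.045, 1.181, 3.543)
print(f"beta = {beta:.3f}, SE = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

For the age row this lands close to the reported p value of 0.011, which suggests the Wald assumption is reasonable here; rows with very wide intervals may agree less well because the reported bounds are rounded.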
FIGURE 4 Feature importance derived from the random forest model. The plot shows the relative importance of the variables in the random forest model.
Comparison of the prediction performance of different models for BM
| Models | AUC | Accuracy | Recall rate (sensitivity) | Specificity |
|---|---|---|---|---|
| Initial | ||||
| LR1 | 0.791 | 0.743 | 0.741 | 0.742 |
| RF1 | 0.908 | 0.877 | 0.796 | 0.878 |
| Improved | ||||
| Ada | 0.886 | 0.887 | 0.812 | 0.888 |
| DT | 0.853 | 0.817 | 0.833 | 0.816 |
| LR2 | 0.822 | 0.708 | 0.833 | 0.707 |
| NBC | 0.910 | 0.871 | 0.852 | 0.871 |
| RF2 | 0.917 | 0.904 | 0.833 | 0.905 |
| SVM | 0.752 | 0.739 | 0.685 | 0.740 |
Abbreviations: Ada, AdaBoost classifier; AUC, area under the curve; DT, decision tree; LR1, initial logistic regression; LR2, improved logistic regression; NBC, naive Bayes classification; RF1, initial random forest; RF2, improved random forest; SVM, support vector machine.
FIGURE 5 (A) Receiver operating characteristic (ROC) curves of the initial random forest (RF1) model and the initial logistic regression (LR1) model. (B) ROC curves of the different improved machine learning models.
Prediction results of the improved random forest model
| Predicted | Actual BM | Actual NBM |
|---|---|---|
| BM | 45 (TP) | 486 (FP) |
| NBM | 9 (FN) | 4601 (TN) |
Abbreviations: BM, bone metastasis; FN, false negative cases; FP, false positive cases; NBM, no bone metastasis; TN, true negative cases; TP, true positive cases.
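The summary metrics reported for the improved random forest model can be checked against these counts. The sketch below uses the standard definitions of sensitivity, specificity, and accuracy. The positive and negative predictive values are not reported in the record and are derived here only for illustration.

```python
# Confusion-matrix counts for the improved random forest model (test set),
# copied from the table above.
tp, fp, fn, tn = 45, 486, 9, 4601

sensitivity = tp / (tp + fn)                 # recall: share of BM cases detected
specificity = tn / (tn + fp)                 # share of NBM cases correctly ruled out
accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of all predictions that are correct
ppv = tp / (tp + fp)                         # positive predictive value (derived, not reported)
npv = tn / (tn + fn)                         # negative predictive value (derived, not reported)

metrics = {
    "sensitivity": sensitivity,
    "specificity": specificity,
    "accuracy": accuracy,
    "PPV": ppv,
    "NPV": npv,
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and accuracy round to the 0.833 and 0.904 listed for RF2, and specificity falls within 0.001 of the listed 0.905. The low PPV reflects how rare BM is in the test set (54 of 5141 patients), so most positive predictions are false positives even at high specificity.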