Zhangheng Huang, Chuan Hu, Changxing Chi, Zhe Jiang, Yuexin Tong, Chengliang Zhao.
Abstract
Non-small-cell lung cancer (NSCLC) patients often develop bone metastases (BM), and overall survival for these patients is usually poor. However, a highly accurate model for predicting the survival of NSCLC patients with BM is still lacking. Here, we aimed to establish an artificial intelligence model for predicting the 1-year survival rate of NSCLC patients with BM by using extreme gradient boosting (XGBoost), a scalable machine learning algorithm. We selected NSCLC patients with BM diagnosed between 2010 and 2015 from the Surveillance, Epidemiology, and End Results (SEER) database. In total, 5973 cases were enrolled and divided into training (n = 4183) and validation (n = 1790) sets. XGBoost, random forest, support vector machine, and logistic regression algorithms were used to generate predictive models. Receiver operating characteristic (ROC) curves were used to evaluate and compare the predictive performance of each model. Tumor size, age, race, sex, primary site, histological subtype, grade, laterality, T stage, N stage, surgery, radiotherapy, chemotherapy, distant metastases to other sites (lung, brain, and liver), and marital status were selected to construct all predictive models. The XGBoost model was more accurate than the other models in both the training and validation sets. Our data suggest that, among the models compared, the XGBoost model is the most precise and personalized tool for predicting the 1-year survival rate of NSCLC patients with BM. This model can help clinicians design more rational and effective therapeutic strategies.
Year: 2020 PMID: 32685470 PMCID: PMC7338972 DOI: 10.1155/2020/3462363
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Demographic and clinicopathologic features of 5973 NSCLC patients with BM in the SEER database.
| Variables | Training set (n = 4183) | Validation set (n = 1790) | Test statistic | P value |
|---|---|---|---|---|
| Age (mean ± SD) | 66.41 ± 11.01 | 66.51 ± 10.92 | 0.336 | 0.737 |
| Size (mean ± SD) | 51.79 ± 34.44 | 52.20 ± 33.81 | 0.423 | 0.673 |
| Race | | | 1.191 | 0.551 |
| Black | 519 (12.4%) | 221 (12.3%) | | |
| Other | 392 (9.4%) | 184 (10.3%) | | |
| White | 3272 (78.2%) | 1385 (77.4%) | | |
| Sex | | | 0.146 | 0.702 |
| Female | 1789 (42.8%) | 756 (42.2%) | | |
| Male | 2394 (57.2%) | 1034 (57.8%) | | |
| Primary site | | | 0.350 | 0.950 |
| Main bronchus | 173 (4.1%) | 79 (4.4%) | | |
| Overlapping lesion of lung | 31 (0.7%) | 12 (0.7%) | | |
| Lung, NOS | 182 (4.4%) | 76 (4.2%) | | |
| Lobe | 3797 (90.8%) | 1623 (90.7%) | | |
| Histologic type | | | 6.010 | 0.050 |
| ADC | 2688 (64.3%) | 1163 (65.0%) | | |
| Others | 558 (13.3%) | 269 (15.0%) | | |
| SCC | 937 (22.4%) | 358 (20.0%) | | |
| Grade | | | 2.367 | 0.500 |
| I | 217 (5.2%) | 86 (4.8%) | | |
| II | 1259 (30.1%) | 536 (29.9%) | | |
| III | 2596 (62.1%) | 1131 (63.2%) | | |
| IV | 111 (2.7%) | 37 (2.1%) | | |
| Laterality | | | 0.277 | 0.599 |
| Left - origin of primary | 1750 (41.8%) | 762 (42.6%) | | |
| Right - origin of primary | 2433 (58.2%) | 1028 (57.4%) | | |
| T stage | | | 5.837 | 0.120 |
| T1 | 446 (10.7%) | 184 (10.3%) | | |
| T2 | 1168 (27.9%) | 555 (31.0%) | | |
| T3 | 1139 (27.2%) | 469 (26.2%) | | |
| T4 | 1430 (34.2%) | 582 (32.5%) | | |
| N stage | | | 3.715 | 0.294 |
| N0 | 887 (21.2%) | 343 (19.2%) | | |
| N1 | 373 (8.9%) | 170 (9.5%) | | |
| N2 | 2039 (48.7%) | 902 (50.4%) | | |
| N3 | 884 (21.1%) | 375 (20.9%) | | |
| M stage | | | 0.916 | 0.339 |
| M1a | 110 (2.6%) | 55 (3.1%) | | |
| M1b | 4073 (97.4%) | 1735 (96.9%) | | |
| Radiotherapy | | | 1.270 | 0.260 |
| No | 1734 (41.5%) | 714 (39.9%) | | |
| Yes | 2449 (58.5%) | 1076 (60.1%) | | |
| Chemotherapy | | | 0.099 | 0.753 |
| No | 1490 (35.6%) | 630 (35.2%) | | |
| Yes | 2693 (64.4%) | 1160 (64.8%) | | |
| Surgery | | | 1.452 | 0.228 |
| No | 4013 (95.9%) | 1729 (96.6%) | | |
| Yes | 170 (4.1%) | 61 (3.4%) | | |
| Brain metastasis | | | 2.261 | 0.133 |
| No | 3239 (77.4%) | 1354 (75.6%) | | |
| Yes | 944 (22.6%) | 436 (24.4%) | | |
| Liver metastasis | | | 0.002 | 0.960 |
| No | 3330 (79.6%) | 1426 (79.7%) | | |
| Yes | 853 (20.4%) | 364 (20.3%) | | |
| Lung metastasis | | | 0.576 | 0.448 |
| No | 2999 (71.7%) | 1266 (70.7%) | | |
| Yes | 1184 (28.3%) | 524 (29.3%) | | |
| Insurance status | | | 1.036 | 0.309 |
| Insured | 4059 (97.0%) | 1728 (96.5%) | | |
| Uninsured | 124 (3.0%) | 62 (3.5%) | | |
| Marital status | | | 0.099 | 0.753 |
| Married | 2412 (57.7%) | 1040 (58.1%) | | |
| Unmarried | 1771 (42.3%) | 750 (41.9%) | | |
NSCLC: non-small-cell lung cancer; BM: bone metastasis; ADC: adenocarcinoma; SCC: squamous cell carcinoma.
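The table does not name the tests behind its statistic and P columns. As a minimal sketch, assuming the two-category rows were compared with a Pearson chi-square test without continuity correction (an assumption, not something the source states), the Sex row can be checked with the standard library alone. The function name `chi_square_2x2` is illustrative.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]], with the p-value for 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Sex row: training set (female, male) vs validation set (female, male).
chi2_sex, p_sex = chi_square_2x2(1789, 2394, 756, 1034)
print(f"chi2 = {chi2_sex:.3f}, p = {p_sex:.3f}")  # chi2 = 0.146, p = 0.702
```

Under that assumption the result agrees with the tabulated 0.146 and 0.702. Rows with more than two categories need more degrees of freedom and a general chi-square distribution function, which this sketch does not cover.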
Univariate analysis and multivariate logistic analysis based on all variables for 1-year survival (training cohort).
| Characteristics | Univariate analysis: P | Multivariate logistic analysis: HR (95% CI) | Multivariate logistic analysis: P |
|---|---|---|---|
| Age | <0.001 | 1.015 (1.008–1.022) | <0.001 |
| Size | <0.001 | 1.008 (1.004–1.011) | <0.001 |
| Race | | | |
| Black | <0.001 | Reference | |
| Other | | 0.423 (0.311–0.575) | <0.001 |
| White | | 1.073 (0.851–1.352) | 0.552 |
| Sex | | | |
| Female | <0.001 | Reference | |
| Male | | 1.265 (1.088–1.470) | <0.05 |
| Primary site | | | |
| Main bronchus | <0.05 | | |
| Overlapping lesion of lung | | | |
| Lung, NOS | | | |
| Lobe | | | |
| Histologic type | | | |
| ADC | <0.001 | Reference | |
| Others | | 1.746 (1.351–2.255) | <0.001 |
| SCC | | 1.524 (1.246–1.865) | <0.001 |
| Grade | | | |
| I | <0.001 | Reference | |
| II | | 1.037 (0.751–1.433) | 0.824 |
| III | | 1.653 (1.206–2.266) | <0.05 |
| IV | | 2.454 (1.292–4.659) | <0.05 |
| Laterality | | | |
| Left - origin of primary | 0.752 | | |
| Right - origin of primary | | | |
| T stage | | | |
| T1 | <0.001 | | |
| T2 | | | |
| T3 | | | |
| T4 | | | |
| N stage | | | |
| N0 | <0.001 | Reference | |
| N1 | | 1.198 (0.901–1.592) | 0.215 |
| N2 | | 1.636 (1.351–1.982) | <0.001 |
| N3 | | 1.816 (1.443–2.284) | <0.001 |
| M stage | | | |
| M1a | 0.791 | | |
| M1b | | | |
| Radiotherapy | | | |
| No | 0.162 | | |
| Yes | | | |
| Surgery | | | |
| No | <0.001 | Reference | |
| Yes | | 0.438 (0.311–0.617) | <0.001 |
| Chemotherapy | | | |
| No | <0.001 | Reference | |
| Yes | | 0.211 (0.174–0.256) | <0.001 |
| Brain metastasis | | | |
| No | 0.866 | | |
| Yes | | | |
| Liver metastasis | | | |
| No | <0.001 | Reference | |
| Yes | | 1.948 (1.589–2.388) | <0.001 |
| Lung metastasis | | | |
| No | 0.949 | | |
| Yes | | | |
| Insurance status | | | |
| Insured | 0.732 | | |
| Uninsured | | | |
| Marital status | | | |
| Married | <0.001 | | |
| Unmarried | | | |
ADC: adenocarcinoma; SCC: squamous cell carcinoma.
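In a logistic model, a ratio with its 95% CI is conventionally obtained by exponentiating a coefficient and its Wald interval; the table labels these ratios HR, and the source does not describe how the intervals were computed. The sketch below assumes the Wald construction and shows the round trip using the liver metastasis row. The coefficient and standard error are back-calculated for illustration and are not reported in the source; both function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def ratio_with_ci(beta, se, z=Z_95):
    """Exponentiate a logistic coefficient and its Wald interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def back_out_coefficient(ratio, lower, upper, z=Z_95):
    """Recover an approximate coefficient and standard error from a
    reported ratio and CI, assuming a Wald interval on the log scale."""
    beta = math.log(ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return beta, se

# Liver metastasis (Yes vs No), as reported: 1.948 (1.589-2.388).
beta, se = back_out_coefficient(1.948, 1.589, 2.388)
ratio, lower, upper = ratio_with_ci(beta, se)
print(f"beta = {beta:.3f}, se = {se:.3f}")
print(f"ratio = {ratio:.3f} ({lower:.3f}-{upper:.3f})")
```

A reported interval that is roughly symmetric around the estimate on the log scale, as here, is consistent with a Wald construction, though rounding in the published values limits how precisely the standard error can be recovered.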
Figure 1. Nomogram to predict the 1-year survival of NSCLC patients with BM.
Figure 2. Heatmap of pairwise correlations. Dark blue marks statistically significant positive correlations, light yellow marks statistically significant inverse correlations, and light blue marks correlations that are not statistically significant.
Figure 3. Feature importance from the XGBoost model. The bar plot shows the number of times each of the top features was selected.
Figure 4. ROC curves showing the predictions of the four models: XGBoost, SVM, RF, and logistic regression. (a) The training set; (b) the internal validation set.
Figure 5. Comparison of prediction accuracy between the XGBoost model and independent prognostic factors. (a) The training set; (b) the validation set.
Figure 6. ROC curve of the XGBoost model in predicting the 1-year survival of NSCLC patients with BM in the external validation set.
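ROC curves such as those in Figures 4 and 6 are commonly summarized by the area under the curve (AUC), which equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes AUC that way for two sets of scores. The labels and scores are toy values invented for illustration, not data from the study, and `roc_auc` is a hypothetical helper rather than the authors' code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case scores higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: 1 = event within 1 year, 0 = no event (invented values).
labels = [1, 1, 1, 0, 0, 0]
model_a_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
model_b_scores = [0.6, 0.4, 0.5, 0.7, 0.3, 0.5]

auc_a = roc_auc(labels, model_a_scores)
auc_b = roc_auc(labels, model_b_scores)
print(f"model A AUC = {auc_a:.3f}")  # 0.889
print(f"model B AUC = {auc_b:.3f}")  # 0.500
```

The pairwise loop is quadratic in the number of cases, which is fine for a small illustration; a rank-based computation would be the usual choice for a cohort of several thousand patients.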