Jian Tao, Ling Wang, Liyu Zhang, Zheyun Gu, Xiaodan Zhou.
Abstract
The prognosis of multiple myeloma (MM) is poorer in white American patients than in black American patients. This study aimed to predict death among white patients with MM using data from the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) database. A total of 28,912 white MM patients were included. The data were randomly divided into a training set and a test set at a ratio of 7 : 3. A random forest with 5-fold cross-validation was used to develop the prediction model, and model performance was assessed by the area under the curve (AUC) with its 95% confidence interval (CI). Compared with the survival group, patients in the death group were older, had higher proportions of distant metastasis, bone marrow as the disease site, and radiotherapy, and had a lower proportion receiving chemotherapy (all P < 0.001). The AUC of the random forest model was 0.741 (95% CI, 0.740–0.741) in the training set and 0.703 (95% CI, 0.703–0.704) in the testing set. By comparison, the AUC of an age-based model was 0.688 (95% CI, 0.688–0.689) in the testing set. The DeLong test indicated that the random forest model predicted death better than the age-based model (Z = 7.023, P < 0.001). Further validation in subgroups defined by age and marital status showed that the random forest model was robust across these subgroups. The random forest model performed well in predicting the risk of death among white patients with MM.
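The abstract does not say how the 5-fold cross-validation was organized (for example, whether it served for hyperparameter tuning or whether folds were stratified by outcome). The sketch below shows plain, unstratified k-fold index generation, assuming the folds are drawn from the 20,238-patient training set; the helper name `kfold_indices` and the seed are illustrative choices, not taken from the paper.

```python
import random

def kfold_indices(n_samples, k=5, seed=2022):
    """Yield (train_idx, val_idx) pairs for plain, unstratified k-fold CV.

    Each index appears in exactly one validation fold. The default seed is
    an arbitrary illustrative value, not one reported in the paper.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Deal shuffled indices round-robin so fold sizes differ by at most one
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

# Validation-fold sizes for a 20,238-patient training set
print([len(val) for _, val in kfold_indices(20238)])  # [4048, 4048, 4048, 4047, 4047]
```

A stratified variant would shuffle deaths and survivors separately and deal each list round-robin, keeping the death proportion roughly equal across folds.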
Year: 2022 PMID: 36016680 PMCID: PMC9398791 DOI: 10.1155/2022/3050199
Source DB: PubMed Journal: Evid Based Complement Alternat Med ISSN: 1741-427X Impact factor: 2.650
Figure 1. Flow chart of patient screening. All white patients with MM included in the study were randomly divided into a training set and a test set at a ratio of 7 : 3 for model development and validation, respectively.
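Figure 1 describes a random 7 : 3 split of the 28,912 patients. The paper does not report a random seed or whether the split was stratified by outcome, so the following is a plain shuffled split under those assumptions; `train_test_split` here is a hypothetical stdlib helper, not scikit-learn's function of the same name. Rounding the training share down yields 20,238 training patients, which agrees with the training-set counts in the table below (9437 alive + 10801 dead).

```python
import math
import random

def train_test_split(records, train_frac=0.7, seed=2022):
    """Randomly split records into training and test lists (simple, unstratified).

    The seed is an illustrative value; the paper does not report one.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # Round the training share down, e.g. floor(0.7 * 28912) = 20238
    n_train = math.floor(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

train, test = train_test_split(range(28912))
print(len(train), len(test))  # 20238 8674
```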
Characteristics of included patients in the training set.
| Variables | Total (n = 20238) |
|---|---|
| Age, years, mean ± SD | 67.28 ± 12.05 |
| Gender, n (%) | |
| Female | 8640 (42.69) |
| Male | 11598 (57.31) |
| Marital status, n (%) | |
| Married | 12878 (63.63) |
| Single | 2420 (11.96) |
| Widowed | 2950 (14.58) |
| Others | 1990 (9.83) |
| Number of malignant tumors in situ, mean ± SD | 1.08 ± 0.30 |
| Number of benign tumors, n (%) | |
| 0 | 20131 (99.47) |
| >0 | 107 (0.53) |
| Metastasis, n (%) | |
| Distant | 19155 (94.65) |
| Others | 1083 (5.35) |
| Disease site, n (%) | |
| Bone marrow | 19001 (93.89) |
| Others | 1237 (6.11) |
| Chemotherapy, n (%) | |
| No | 8055 (39.80) |
| Yes | 12183 (60.20) |
| Radiotherapy, n (%) | |
| No | 4412 (21.80) |
| Yes | 15826 (78.20) |
| Survival status, n (%) | |
| Alive | 9437 (46.63) |
| Dead | 10801 (53.37) |
| Overall survival, months, M (Q1, Q3) | 28.00 (11.00, 55.00) |
Univariate analysis comparing the survival group and the death group in the training set.
| Variables | Total (n = 20238) | Survival group (n = 9437) | Death group (n = 10801) | P |
|---|---|---|---|---|
| Age, years, mean ± SD | 67.28 ± 12.05 | 62.97 ± 11.52 | 71.04 ± 11.21 | <0.001 |
| Gender, n (%) | | | | 0.751 |
| Female | 8640 (42.69) | 4040 (42.81) | 4600 (42.59) | |
| Male | 11598 (57.31) | 5397 (57.19) | 6201 (57.41) | |
| Marital status, n (%) | | | | <0.001 |
| Married | 12878 (63.63) | 6595 (69.88) | 6283 (58.17) | |
| Single | 2420 (11.96) | 1182 (12.53) | 1238 (11.46) | |
| Widowed | 2950 (14.58) | 759 (8.04) | 2191 (20.29) | |
| Others | 1990 (9.83) | 901 (9.55) | 1089 (10.08) | |
| Number of malignant tumors in situ, mean ± SD | 1.08 ± 0.30 | 1.08 ± 0.30 | 1.08 ± 0.30 | 0.643 |
| Number of benign tumors, n (%) | | | | 0.574 |
| 0 | 20131 (99.47) | 9390 (99.50) | 10741 (99.44) | |
| >0 | 107 (0.53) | 47 (0.50) | 60 (0.56) | |
| Metastasis, n (%) | | | | <0.001 |
| Distant | 19155 (94.65) | 8722 (92.42) | 10433 (96.59) | |
| Others | 1083 (5.35) | 715 (7.58) | 368 (3.41) | |
| Disease site, n (%) | | | | <0.001 |
| Bone marrow | 19001 (93.89) | 8652 (91.68) | 10349 (95.82) | |
| Others | 1237 (6.11) | 785 (8.32) | 452 (4.18) | |
| Chemotherapy, n (%) | | | | <0.001 |
| No | 8055 (39.80) | 3480 (36.88) | 4575 (42.36) | |
| Yes | 12183 (60.20) | 5957 (63.12) | 6226 (57.64) | |
| Radiotherapy, n (%) | | | | <0.001 |
| No | 4412 (21.80) | 2177 (23.07) | 2235 (20.69) | |
| Yes | 15826 (78.20) | 7260 (76.93) | 8566 (79.31) | |
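The table does not name the tests behind its P values. As a hedged check, a Pearson chi-square test without continuity correction on the 2 × 2 gender counts and benign-tumor counts gives P ≈ 0.751 and P ≈ 0.574, matching the reported values; this suggests, but does not confirm, that this test was used for categorical variables. The test for age is likewise unstated; the sketch uses a Welch statistic computed from the group means and SDs with a normal approximation, which is reasonable at these sample sizes. The function names `chi2_2x2` and `welch_z` are illustrative.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Rows are the two categories; columns are the survival and death groups.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))

def welch_z(mean1, sd1, n1, mean2, sd2, n2):
    """Welch statistic from summary data with a two-sided normal-approximation P."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t = (mean2 - mean1) / se
    return t, math.erfc(abs(t) / math.sqrt(2))

# Counts from the table: rows are categories, columns are (survival, death)
_, p_gender = chi2_2x2(4040, 4600, 5397, 6201)   # female, male
_, p_benign = chi2_2x2(9390, 10741, 47, 60)      # 0, >0 benign tumors
t_age, p_age = welch_z(62.97, 11.52, 9437, 71.04, 11.21, 10801)
print(p_gender, p_benign)  # approximately 0.751 and 0.574
print(t_age, p_age)        # t is about 50; P is far below 0.001
```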
Figure 2. Variable importance of the random forest model for predicting the risk of death in white patients with multiple myeloma (MM). Variable importance indicates which variables contributed most to the final model.
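The caption does not specify which importance measure was plotted; random forest implementations commonly report either impurity-based importance or permutation importance. A minimal sketch of permutation importance follows: shuffle one feature column at a time and record the mean drop in a score. The toy data and `toy_model` are hypothetical stand-ins for the fitted forest and the SEER variables, and accuracy is used only to keep the example short.

```python
import random

def permutation_importance(predict, X, y, score, n_repeats=5, seed=0):
    """Mean drop in `score` when each feature column is randomly shuffled.

    predict: callable taking a list of feature rows and returning predictions
    X: list of feature rows (each a list); y: list of true labels
    score: callable(y_true, y_pred) -> float, where higher is better
    """
    rng = random.Random(seed)
    baseline = score(y, predict(X))
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[col] for row in X]
            rng.shuffle(column)  # break the link between this feature and the outcome
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, column)]
            drops.append(baseline - score(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy data: feature 0 is age, feature 1 is an irrelevant code
ages = list(range(50, 90, 2))
X = [[age, i % 3] for i, age in enumerate(ages)]
y = [1 if age >= 70 else 0 for age in ages]

def toy_model(rows):
    # Stand-in for a fitted random forest: predicts death when age >= 70
    return [1 if row[0] >= 70 else 0 for row in rows]

imp = permutation_importance(toy_model, X, y, accuracy)
print(imp)  # the ignored feature 1 gets exactly 0.0; age gets a positive value
```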
The performance of the random forest model.
| Group | Parameter (95% CI) | All-variable model | Age-based model |
|---|---|---|---|
| Training set | AUC | 0.741 (0.740–0.741) | 0.697 (0.697–0.698) |
| | Accuracy | 0.673 (0.667–0.680) | 0.641 (0.635–0.648) |
| | Sensitivity | 0.612 (0.603–0.621) | 0.533 (0.523–0.542) |
| | Specificity | 0.744 (0.735–0.752) | 0.765 (0.757–0.774) |
| | PPV | 0.732 (0.723–0.741) | 0.722 (0.712–0.732) |
| | NPV | 0.626 (0.617–0.635) | 0.589 (0.580–0.597) |
| Testing set | AUC | 0.703 (0.703–0.704) | 0.688 (0.688–0.689) |
| | Accuracy | 0.641 (0.631–0.651) | 0.636 (0.626–0.646) |
| | Sensitivity | 0.591 (0.576–0.605) | 0.533 (0.518–0.547) |
| | Specificity | 0.700 (0.686–0.714) | 0.754 (0.741–0.768) |
| | PPV | 0.694 (0.680–0.708) | 0.714 (0.699–0.729) |
| | NPV | 0.597 (0.583–0.611) | 0.583 (0.570–0.597) |
Note: CI: confidence interval; AUC: area under the curve; PPV: positive predictive value; NPV: negative predictive value.
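The five threshold metrics in the table all derive from one 2 × 2 confusion matrix, so accuracy, PPV and NPV are fixed once sensitivity, specificity and the death prevalence are known. The sketch below uses that relation as an internal consistency check on the training-set rows, taking the prevalence 10801/20238 from the characteristics table; the helper names are illustrative, and small gaps reflect rounding of the reported sensitivity and specificity.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, PPV and NPV from a 2x2 confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # share of deaths predicted as deaths
        "specificity": tn / (tn + fp),  # share of survivors predicted as survivors
        "ppv": tp / (tp + fp),          # share of predicted deaths that died
        "npv": tn / (tn + fn),          # share of predicted survivors that survived
    }

def implied_metrics(sensitivity, specificity, prevalence):
    """Accuracy, PPV and NPV implied by sensitivity, specificity and prevalence.

    Cell proportions stand in for counts; the ratios are scale-invariant.
    """
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    m = confusion_metrics(tp, fp, tn, fn)
    return m["accuracy"], m["ppv"], m["npv"]

prev = 10801 / 20238  # deaths in the training set
print(implied_metrics(0.612, 0.744, prev))  # all-variable model, about (0.674, 0.732, 0.626)
print(implied_metrics(0.533, 0.765, prev))  # age-based model, about (0.641, 0.722, 0.589)
```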
Figure 3. Performance and evaluation of the random forest model in the training set. (a) Receiver operating characteristic (ROC) curves; (b) calibration curves.
Figure 4. Performance and evaluation of the random forest model in the testing set. (a) Receiver operating characteristic (ROC) curves; (b) calibration curves.
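The abstract compares the two ROC curves with the DeLong test (Z = 7.023) but does not show the computation. Below is a minimal sketch of the standard DeLong et al. (1988) test for two correlated AUCs measured on the same patients, written in its naive pairwise form for clarity; its cost is proportional to deaths × survivors, so rank-based implementations are preferable at the scale of the testing set. The toy scores are hypothetical.

```python
import math
from statistics import NormalDist

def _kernel(pos_score, neg_score):
    # Mann-Whitney kernel: 1 if the death case scores higher, 0.5 on ties
    if pos_score > neg_score:
        return 1.0
    return 0.5 if pos_score == neg_score else 0.0

def _sample_cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def delong_test(scores_a, scores_b, labels):
    """Two-sided DeLong test for two correlated AUCs on the same patients.

    labels: 1 = death, 0 = survival. Returns (auc_a, auc_b, z, p).
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    m, n = len(pos), len(neg)
    parts = []
    for s in (scores_a, scores_b):
        v10 = [sum(_kernel(s[i], s[j]) for j in neg) / n for i in pos]  # per death
        v01 = [sum(_kernel(s[i], s[j]) for i in pos) / m for j in neg]  # per survivor
        parts.append((sum(v10) / m, v10, v01))
    (auc_a, v10_a, v01_a), (auc_b, v10_b, v01_b) = parts
    var_a = _sample_cov(v10_a, v10_a) / m + _sample_cov(v01_a, v01_a) / n
    var_b = _sample_cov(v10_b, v10_b) / m + _sample_cov(v01_b, v01_b) / n
    cov_ab = _sample_cov(v10_a, v10_b) / m + _sample_cov(v01_a, v01_b) / n
    z = (auc_a - auc_b) / math.sqrt(var_a + var_b - 2 * cov_ab)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return auc_a, auc_b, z, p

# Hypothetical toy scores for eight patients (first four died)
labels = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
model_b = [0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 0.1]
print(delong_test(model_a, model_b, labels))  # AUCs 1.0 and 0.5625
```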
The performance of the random forest models in age and marital status subgroups.
| Subgroup | Parameter (95% CI) | All-variable models |
|---|---|---|
| | AUC | 0.681 (0.681–0.682) |
| | Accuracy | 0.647 (0.634–0.660) |
| | Sensitivity | 0.725 (0.710–0.741) |
| | Specificity | 0.511 (0.489–0.533) |
| | PPV | 0.721 (0.705–0.736) |
| | NPV | 0.517 (0.495–0.540) |
| | | |
| | AUC | 0.614 (0.613–0.614) |
| | Accuracy | 0.621 (0.604–0.637) |
| | Sensitivity | 0.030 (0.021–0.039) |
| | Specificity | 0.986 (0.981–0.991) |
| | PPV | 0.565 (0.448–0.682) |
| | NPV | 0.622 (0.605–0.638) |
| | | |
| | AUC | 0.662 (0.661–0.663) |
| | Accuracy | 0.583 (0.553–0.613) |
| | Sensitivity | 0.367 (0.326–0.408) |
| | Specificity | 0.817 (0.783–0.851) |
| | PPV | 0.684 (0.630–0.738) |
| | NPV | 0.544 (0.508–0.580) |
| | | |
| | AUC | 0.693 (0.693–0.693) |
| | Accuracy | 0.636 (0.623–0.648) |
| | Sensitivity | 0.461 (0.442–0.480) |
| | Specificity | 0.807 (0.793–0.822) |
| | PPV | 0.701 (0.680–0.723) |
| | NPV | 0.604 (0.588–0.620) |
| | | |
| | AUC | 0.695 (0.694–0.696) |
| | Accuracy | 0.724 (0.699–0.748) |
| | Sensitivity | 0.886 (0.865–0.907) |
| | Specificity | 0.305 (0.257–0.353) |
| | PPV | 0.767 (0.741–0.792) |
| | NPV | 0.509 (0.442–0.577) |
| | | |
| | AUC | 0.642 (0.641–0.644) |
| | Accuracy | 0.583 (0.550–0.615) |
| | Sensitivity | 0.433 (0.389–0.477) |
| | Specificity | 0.756 (0.715–0.797) |
| | PPV | 0.673 (0.621–0.725) |
| | NPV | 0.535 (0.494–0.575) |
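Subgroup validation amounts to recomputing the same metrics within each stratum of age or marital status. The sketch below computes the AUC per subgroup using the rank-based Mann-Whitney formulation, with average ranks for tied scores. The subgroup labels and scores in the example are hypothetical, since the table does not preserve which subgroup each block belongs to.

```python
from collections import defaultdict

def rank_auc(scores, labels):
    """AUC via the Mann-Whitney U statistic, using average ranks for ties."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one death and one survivor")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a block of tied scores
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

def auc_by_subgroup(scores, labels, groups):
    """Compute the AUC separately within each subgroup label."""
    buckets = defaultdict(lambda: ([], []))
    for s, y, g in zip(scores, labels, groups):
        buckets[g][0].append(s)
        buckets[g][1].append(y)
    return {g: rank_auc(s, y) for g, (s, y) in buckets.items()}

# Hypothetical example with two marital-status subgroups
scores = [0.9, 0.1, 0.8, 0.2, 0.3, 0.7]
labels = [1, 0, 1, 0, 1, 0]
groups = ["married", "married", "married", "married", "single", "single"]
print(auc_by_subgroup(scores, labels, groups))  # {'married': 1.0, 'single': 0.0}
```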