Ziwei Zheng, Yuanyu Chen, Yongzhong Yang, Rui Meng, Zhikang Si, Xuelin Wang, Hui Wang, Jianhui Wu.
Abstract
The dark and humid environment of underground coal mines has a detrimental effect on workers' skeletal health. Optimal risk prediction models can protect the skeletal health of coal miners by identifying those at risk of abnormal bone density as early as possible. A total of 3695 male underground workers who attended an occupational health physical examination at a coal mine in Hebei, China, from July to August 2018 were included in this study. The predictor variables were identified through single-factor analysis and literature review. Three prediction models, Logistic Regression, CNN and XG Boost, were developed and their prediction performance was evaluated. On the training set, the sensitivity of the Logistic Regression, XG Boost and CNN models was 74.687%, 82.058% and 70.620%; the specificity was 80.986%, 89.448% and 91.866%; the F1 scores were 0.618, 0.919 and 0.740; the Brier scores were 0.153, 0.040 and 0.156; and the Calibration-in-the-large values were 0.104, 0.020 and 0.076, respectively. XG Boost outperformed the other two models. Similar results were obtained for the test set and validation set. A pairwise comparison of the area under the ROC curve (AUC) of the three models showed that the XG Boost model had the best prediction performance. The XG Boost model had high application value and outperformed the CNN and Logistic Regression models in prediction.
Keywords: XG Boost; bone density abnormalities; male underground coal mine workers
Year: 2022 PMID: 35954527 PMCID: PMC9368504 DOI: 10.3390/ijerph19159165
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 4.614
Model evaluation indexes.
| Indicators | Meaning |
|---|---|
| Sensitivity | Among study participants who actually had abnormal bone mineral density (BMD), the percentage correctly classified as abnormal by the risk prediction model. |
| Specificity | Among study participants who did not actually have abnormal BMD, the percentage correctly classified as normal by the risk prediction model. |
| Youden index | Sensitivity + specificity − 1; summarizes the model's overall ability to correctly identify participants with and without abnormal BMD. |
| F1 score | The harmonic mean of precision and recall, used to evaluate the comprehensive performance of the model. |
| AUC | Area under the ROC curve. |
| Brier score | A quantitative score of model calibration: the mean squared difference between predicted probabilities and observed outcomes. It ranges from 0 to 1, and a model that predicts 0.5 for every participant scores 0.25; the smaller the value, the better the calibration of the model. |
| Log loss | The negative mean log-likelihood of the observed outcomes under the model's predicted probabilities; the smaller the value, the smaller the error between the true and predicted values. |
| Calibration-in-the-large | The intercept of the calibration curve. |
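As a rough illustration of how several of these indicators can be computed, the following minimal Python sketch works from true labels and predicted probabilities. The function name `evaluate`, the 0/1 label coding (1 = abnormal BMD), the 0.5 classification threshold and the toy data are all assumptions made for this example; they are not taken from the study, which codes the outcome as 1 = normal and 2 = abnormal.

```python
import math

def evaluate(y_true, y_prob, threshold=0.5):
    """Compute threshold-based and probability-based indicators.

    Assumes y_true uses 0/1 labels (1 = abnormal BMD, an illustrative coding)
    and that both classes and both predicted classes are present.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    sensitivity = tp / (tp + fn)          # recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)

    n = len(y_true)
    # Brier score: mean squared difference between probability and outcome
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / n
    # Log loss: clip probabilities so log(0) never occurs
    eps = 1e-15
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    log_loss = -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for t, p in zip(y_true, clipped)
    ) / n

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1,
        "f1": f1,
        "brier": brier,
        "log_loss": log_loss,
    }

# Toy data for illustration only (not study data)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.2, 0.1, 0.6, 0.3, 0.7]
result = evaluate(y_true, y_prob)
```

AUC and Calibration-in-the-large are left out of the sketch because they require a ranking computation and a fitted calibration curve, respectively.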
General characteristics of male coal mine workers.
| General Information | Category | Number | Abnormal Bone Mineral Density: Number | Abnormal Bone Mineral Density: Prevalence Rate (%) | χ2/H(K) | P |
|---|---|---|---|---|---|---|
| Age | <30 | 419 | 30 | 7.160 | 447.518 * | <0.001 |
| | 30~ | 1682 | 303 | 18.014 | | |
| | 40~ | 945 | 354 | 37.460 | | |
| | 50~ | 649 | 357 | 55.008 | | |
| Education level | Junior secondary school or lower | 1647 | 507 | 30.783 | 14.956 | 0.001 |
| | High school and secondary school | 1083 | 308 | 28.440 | | |
| | College and above | 965 | 229 | 23.731 | | |
| BMI (kg/m2) | ≤23.9 | 1321 | 516 | 39.061 | 180.677 * | <0.001 |
| | 24.0~ | 1470 | 411 | 27.959 | | |
| | 28.0~ | 904 | 117 | 12.942 | | |
| Marital Status | Unmarried | 126 | 24 | 19.048 | 6.312 | 0.043 |
| | Married | 3445 | 980 | 28.447 | | |
| | Other | 124 | 40 | 32.258 | | |
| Family per capita monthly income (Yuan) | <1000 | 432 | 124 | 28.704 | 4.526 * | 0.104 |
| | 1000~ | 2947 | 847 | 28.741 | | |
| | 3000~ | 316 | 73 | 23.101 | | |
| Hypertension | No | 2379 | 648 | 27.238 | 3.402 | 0.065 |
| | Yes | 1316 | 396 | 30.091 | | |
| Diabetes | No | 3531 | 990 | 28.037 | 1.848 | 0.174 |
| | Yes | 164 | 54 | 32.927 | | |
| Dyslipidemia | No | 2685 | 748 | 27.858 | 0.760 | 0.383 |
| | Yes | 1010 | 296 | 29.307 | | |
| Fracture | No | 2817 | 694 | 24.636 | 76.564 | <0.001 |
| | Yes | 878 | 350 | 39.863 | | |
| Smoking status | No smoking | 1460 | 301 | 20.616 | 69.733 | <0.001 |
| | Quit smoking | 245 | 78 | 31.837 | | |
| | Smoking | 1990 | 665 | 33.417 | | |
| Drinking status | No drinking | 724 | 148 | 20.442 | 32.053 | <0.001 |
| | Alcohol withdrawal | 164 | 37 | 22.561 | | |
| | Drinking | 2807 | 859 | 30.602 | | |
| Exercise | No | 1574 | 501 | 31.830 | 17.291 | <0.001 |
| | Yes | 2121 | 543 | 25.601 | | |
| Sleep time (h) | <7 | 1099 | 419 | 38.126 | 136.080 * | <0.001 |
| | 7~ | 1236 | 387 | 31.311 | | |
| | 8~ | 1360 | 238 | 17.500 | | |
* The K-W test was used for ordinal data.
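The prevalence rates and the χ2 values for nominal variables in this table can be sanity-checked from the counts alone. The sketch below assumes the reported χ2 is a Pearson chi-square without continuity correction (the record does not say which variant was used) and recomputes it for the Hypertension rows; `pearson_chi2` is a hypothetical helper written for this example.

```python
def pearson_chi2(table):
    """Pearson chi-square statistic for an r x c table of observed counts.

    No continuity correction is applied (an assumption about the source).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Hypertension rows from the table, as [abnormal, normal] counts
hypertension = [
    [648, 2379 - 648],   # No hypertension
    [396, 1316 - 396],   # Hypertension
]
chi2_hypertension = pearson_chi2(hypertension)

# Prevalence rate (%) = abnormal / number * 100
prevalence_no = 100 * 648 / 2379
```

Under that assumption, `chi2_hypertension` should land close to the reported 3.402 and `prevalence_no` close to the reported 27.238.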
Analysis of occupational exposure characteristics of male underground coal mine workers.
| General Information | Category | Number | Abnormal Bone Mineral Density: Number | Abnormal Bone Mineral Density: Prevalence Rate (%) | χ2/H(K) | P |
|---|---|---|---|---|---|---|
| Working ages | <10 | 1089 | 216 | 19.835 | 122.264 * | <0.001 |
| | 10~ | 1652 | 434 | 26.271 | | |
| | 20~ | 539 | 216 | 40.074 | | |
| | 30~ | 415 | 178 | 42.892 | | |
| Shift situations | Never | 1341 | 204 | 15.213 | 254.064 | <0.001 |
| | Once | 547 | 114 | 20.841 | | |
| | Now | 1807 | 726 | 40.177 | | |
| Shift length | 0 | 1341 | 204 | 15.213 | 228.767 * | <0.001 |
| | <10 | 1098 | 384 | 34.973 | | |
| | 10~ | 867 | 262 | 30.219 | | |
| | 20~ | 229 | 111 | 48.472 | | |
| | 30~ | 160 | 83 | 51.875 | | |
| High intensity work | No | 1502 | 356 | 23.702 | 25.876 | <0.001 |
| | Yes | 2193 | 688 | 31.373 | | |
| Medium intensity work | No | 437 | 70 | 16.018 | 36.606 | <0.001 |
| | Yes | 3258 | 974 | 29.896 | | |
* The K-W test was used for ordinal data.
Assignment table for variables.
| Variable | Variable Meaning | Assignment Method |
|---|---|---|
| Y | Bone mineral density | 1 = normal, 2 = abnormal |
| X1 | Age | 1 = <30; 2 = 30~; 3 = 40~; 4 = ≥50 |
| X2 | Educational level | 1 = Junior secondary school or lower; 2 = High school and secondary school; 3 = College and above |
| X3 | BMI (kg/m2) | 1 = ≤23.9; 2 = 24.0~; 3 = ≥28.0 |
| X4 | Marital Status | 1 = Unmarried; 2 = Married; 3 = Other |
| X5 | Hypertension | 1 = No; 2 = Yes |
| X6 | Diabetes | 1 = No; 2 = Yes |
| X7 | Fracture | 1 = No; 2 = Yes |
| X8 | Smoking status | 1 = No smoking; 2 = Quit smoking; 3 = smoking |
| X9 | Drinking status | 1 = No drinking; 2 = Alcohol withdrawal; 3 = Drinking |
| X10 | Exercise | 1 = No; 2 = Yes |
| X11 | Sleep time (h) | 1 = <7; 2 = 7~; 3 = ≥8 |
| X12 | Working age | 1 = <10; 2 = 10~; 3 = 20~; 4 = ≥30 |
| X13 | Shift situation | 1 = Never; 2 = Once; 3 = Now |
| X14 | Shift length | 1 = 0; 2 = <10; 3 = 10~; 4 = 20~; 5 = ≥30 |
| X15 | High intensity work | 1 = No, 2 = Yes |
| X16 | Medium intensity work | 1 = No, 2 = Yes |
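To make the assignment scheme concrete, here is a small Python sketch that maps raw age and BMI values to the ordinal codes used for X1 and X3. It assumes that a label such as "30~" means "from 30 up to, but not including, the next cut-off"; the record does not spell this out, and the function names `encode_age` and `encode_bmi` are hypothetical.

```python
def encode_age(age_years):
    """Map age in years to the X1 code (assumed bins: <30, 30-39, 40-49, >=50)."""
    if age_years < 30:
        return 1
    if age_years < 40:
        return 2
    if age_years < 50:
        return 3
    return 4

def encode_bmi(bmi):
    """Map BMI (kg/m2) to the X3 code (assumed bins: <=23.9, 24.0-27.9, >=28.0)."""
    if bmi < 24.0:
        return 1
    if bmi < 28.0:
        return 2
    return 3

# Example: a 42-year-old worker with a BMI of 26.3
codes = (encode_age(42), encode_bmi(26.3))
```

The same pattern would extend to the other ordinal variables (sleep time, working age, shift length) once their cut-offs are confirmed.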
Multicollinearity of independent variables.
| Variable | Tolerance | VIF |
|---|---|---|
| Age | 0.402 | 2.489 |
| Educational level | 0.806 | 1.241 |
| BMI (kg/m2) | 0.879 | 1.138 |
| Marital Status | 0.945 | 1.058 |
| Hypertension | 0.899 | 1.112 |
| Diabetes | 0.937 | 1.067 |
| Fracture | 0.985 | 1.016 |
| Smoking status | 0.938 | 1.066 |
| Drinking status | 0.962 | 1.039 |
| Exercise | 0.932 | 1.073 |
| Sleep time (h) | 0.930 | 1.075 |
| Working age | 0.331 | 3.021 |
| Shift situation | 0.324 | 3.090 |
| Shift length | 0.279 | 3.589 |
| High intensity work | 0.890 | 1.124 |
| Medium intensity work | 0.908 | 1.101 |
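Tolerance and VIF are reciprocals of each other: tolerance is 1 − R² from regressing one predictor on the others, and VIF = 1 / tolerance. The short sketch below recomputes VIF from the rounded tolerances of the four most collinear variables in the table; small gaps against the reported VIFs are expected because the tolerances are rounded to three decimals. The helper name `vif_from_tolerance` is an assumption for this example.

```python
def vif_from_tolerance(tolerance):
    """Variance inflation factor as the reciprocal of tolerance (1 - R^2)."""
    return 1.0 / tolerance

# (tolerance, reported VIF) pairs copied from the table
reported = {
    "Age": (0.402, 2.489),
    "Working age": (0.331, 3.021),
    "Shift situation": (0.324, 3.090),
    "Shift length": (0.279, 3.589),
}

# Largest absolute gap between recomputed and reported VIF
max_gap = max(abs(vif_from_tolerance(t) - v) for t, v in reported.values())

# Largest reported VIF among these variables
largest_vif = max(v for _, v in reported.values())
```

All reported VIFs are well below 10, a commonly used rule-of-thumb threshold for problematic multicollinearity.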
Logistic regression analysis of the influencing factors of abnormal bone mineral density.
| Variable | B | S.E. | Wald | P | OR | 95% CI for OR: Lower | 95% CI for OR: Upper |
|---|---|---|---|---|---|---|---|
| Age | | | | | | | |
| <30 | | | 353.284 | <0.001 | | | |
| 30~ | 2.272 | 0.261 | 75.697 | <0.001 | 9.699 | 5.814 | 16.182 |
| 40~ | 4.145 | 0.299 | 191.881 | <0.001 | 63.096 | 35.101 | 113.419 |
| ≥50 | 5.779 | 0.349 | 273.603 | <0.001 | 323.510 | 163.112 | 641.635 |
| Education level | | | | | | | |
| College and above | | | 32.054 | <0.001 | | | |
| High school and secondary school | 0.222 | 0.117 | 3.594 | 0.058 | 1.249 | 0.993 | 1.571 |
| Junior secondary school or lower | 0.775 | 0.137 | 32.003 | <0.001 | 2.171 | 1.660 | 2.841 |
| BMI (kg/m2) | | | | | | | |
| ≤23.9 | | | 353.668 | <0.001 | | | |
| 24.0~ | −1.784 | 0.124 | 207.232 | <0.001 | 0.168 | 0.132 | 0.214 |
| ≥28.0 | −2.978 | 0.165 | 324.298 | <0.001 | 0.051 | 0.037 | 0.070 |
| Hypertension | 0.243 | 0.105 | 5.327 | 0.021 | 1.275 | 1.037 | 1.567 |
| Diabetes | 0.502 | 0.225 | 4.990 | 0.025 | 1.652 | 1.064 | 2.567 |
| Fractures | 0.736 | 0.109 | 45.982 | <0.001 | 2.087 | 1.687 | 2.582 |
| Smoking status | | | | | | | |
| No smoking | | | 37.743 | <0.001 | | | |
| Quit smoking | 0.601 | 0.194 | 9.620 | 0.002 | 1.825 | 1.248 | 2.668 |
| Smoking | 0.646 | 0.107 | 36.522 | <0.001 | 1.908 | 1.547 | 2.353 |
| Drinking status | | | | | | | |
| No drinking | | | 43.725 | <0.001 | | | |
| Alcohol withdrawal | −0.136 | 0.277 | 0.243 | 0.622 | 0.872 | 0.507 | 1.501 |
| Drinking | 0.780 | 0.132 | 34.883 | <0.001 | 2.182 | 1.684 | 2.827 |
| Exercise | −0.322 | 0.100 | 10.294 | 0.001 | 0.725 | 0.595 | 0.882 |
| Sleep time (h) | | | | | | | |
| <7 | | | 89.013 | <0.001 | | | |
| 7~ | −0.242 | 0.114 | 4.506 | 0.034 | 0.785 | 0.628 | 0.982 |
| ≥8 | −1.159 | 0.128 | 82.117 | <0.001 | 0.314 | 0.244 | 0.403 |
| Shift situation | | | | | | | |
| Never | | | 181.498 | <0.001 | | | |
| Once | −0.663 | 0.319 | 4.314 | 0.038 | 0.516 | 0.276 | 0.963 |
| Now | 1.356 | 0.271 | 25.079 | <0.001 | 3.879 | 2.282 | 6.593 |
| High intensity work | 0.600 | 0.107 | 31.590 | <0.001 | 1.822 | 1.478 | 2.245 |
| Medium intensity work | 1.020 | 0.176 | 33.715 | <0.001 | 2.774 | 1.966 | 3.915 |
| Constant quantity | −6.318 | 0.430 | 215.395 | <0.001 | 0.002 | - | - |
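The columns of this table are linked by standard formulas: OR = exp(B), the 95% CI is exp(B ± 1.96 × S.E.), and the Wald statistic is (B / S.E.)². The following sketch applies them to the Hypertension row. The function name `odds_ratio_summary` is hypothetical, and because B and S.E. are rounded to three decimals in the table, the recomputed values only approximate the reported ones.

```python
import math

def odds_ratio_summary(b, se, z_crit=1.96):
    """Derive OR, its 95% CI and the Wald statistic from a coefficient and its SE."""
    return {
        "or": math.exp(b),
        "lower": math.exp(b - z_crit * se),
        "upper": math.exp(b + z_crit * se),
        "wald": (b / se) ** 2,
    }

# Hypertension row: B = 0.243, S.E. = 0.105
hypertension = odds_ratio_summary(0.243, 0.105)
```

For this row the sketch should give an OR near 1.275 with a CI near 1.037 to 1.567, in line with the table; the Wald value differs slightly from the reported 5.327 because of input rounding.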
Evaluation of three risk models.
| Evaluation Indicator | Training Set: Logistic | Training Set: XG Boost | Training Set: CNN | Test Set: Logistic | Test Set: XG Boost | Test Set: CNN | Validation Set: Logistic | Validation Set: XG Boost | Validation Set: CNN |
|---|---|---|---|---|---|---|---|---|---|
| Sensitivity (%) | 74.687 | 82.058 | 70.620 | 71.749 | 76.555 | 68.447 | 73.529 | 76.724 | 68.750 |
| Specificity (%) | 80.986 | 89.448 | 91.866 | 80.814 | 88.302 | 76.923 | 77.239 | 85.827 | 74.818 |
| Youden index | 0.557 | 0.715 | 0.625 | 0.526 | 0.649 | 0.454 | 0.508 | 0.626 | 0.436 |
| F1 Score | 0.618 | 0.919 | 0.740 | 0.631 | 0.753 | 0.571 | 0.583 | 0.787 | 0.600 |
| AUC (95% CI) | 0.778 | 0.858 (0.839~0.876) | 0.812 (0.792~0.833) | 0.763 (0.723~0.802) | 0.824 (0.787~0.861) | 0.727 (0.685~0.769) | 0.754 (0.696~0.811) | 0.813 (0.762~0.864) | 0.718 (0.656~0.779) |
| Brier Score | 0.153 | 0.040 | 0.156 | 0.333 | 0.107 | 0.172 | 0.153 | 0.040 | 0.156 |
| Log Loss | 0.540 | 0.147 | 0.492 | 1.124 | 0.358 | 0.538 | 0.540 | 0.147 | 0.494 |
| Calibration-in-the-large | 0.104 | 0.020 | 0.076 | 0.104 | 0.019 | 0.071 | 0.146 | 0.019 | 0.077 |
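The Youden index row can be cross-checked against the sensitivity and specificity rows, since Youden index = sensitivity + specificity − 1 (with both expressed as proportions). The sketch below does this for all nine model and data set combinations; the dictionary layout and the name `youden` are choices made for this example.

```python
def youden(sens_pct, spec_pct):
    """Youden index from sensitivity and specificity given in percent."""
    return sens_pct / 100 + spec_pct / 100 - 1

# (sensitivity %, specificity %, reported Youden index) copied from the table
reported = {
    ("training", "Logistic"): (74.687, 80.986, 0.557),
    ("training", "XG Boost"): (82.058, 89.448, 0.715),
    ("training", "CNN"): (70.620, 91.866, 0.625),
    ("test", "Logistic"): (71.749, 80.814, 0.526),
    ("test", "XG Boost"): (76.555, 88.302, 0.649),
    ("test", "CNN"): (68.447, 76.923, 0.454),
    ("validation", "Logistic"): (73.529, 77.239, 0.508),
    ("validation", "XG Boost"): (76.724, 85.827, 0.626),
    ("validation", "CNN"): (68.750, 74.818, 0.436),
}

# Absolute gap between recomputed and reported Youden index per cell
gaps = {key: abs(youden(s, sp) - j) for key, (s, sp, j) in reported.items()}
max_gap = max(gaps.values())

# Model with the highest reported Youden index on the validation set
best_validation = max(
    (key for key in reported if key[0] == "validation"),
    key=lambda key: reported[key][2],
)[1]
```

Every recomputed value agrees with the table to within rounding, and XG Boost has the highest Youden index in each data set.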
AUC comparison of three models.
| Data Set | Model | Difference Value of AUC | SE | 95% CI: Lower | 95% CI: Upper | χ2 | P |
|---|---|---|---|---|---|---|---|
| training set | Logistic and XG Boost | 0.071 | 0.008 | 0.055 | 0.087 | 8.715 | 0.001 |
| | Logistic and CNN | 0.019 | 0.007 | 0.006 | 0.032 | 2.886 | 0.004 |
| | XG Boost and CNN | 0.052 | 0.008 | 0.036 | 0.068 | 6.256 | 0.001 |
| test set | Logistic and XG Boost | 0.074 | 0.016 | 0.042 | 0.106 | 4.545 | <0.001 |
| | Logistic and CNN | 0.022 | 0.019 | −0.016 | 0.060 | 1.154 | 0.248 |
| | XG Boost and CNN | 0.096 | 0.020 | 0.058 | 0.135 | 4.923 | <0.001 |
| validation set | Logistic and XG Boost | 0.039 | 0.025 | −0.009 | 0.088 | 1.580 | 0.114 |
| | Logistic and CNN | 0.047 | 0.022 | 0.003 | 0.090 | 2.110 | 0.035 |
| | XG Boost and CNN | 0.086 | 0.020 | 0.047 | 0.125 | 4.281 | <0.001 |
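The confidence intervals and P values in this table appear consistent with a normal approximation: CI = difference ± 1.96 × SE, and a two-sided P value taken from the standard normal distribution for the reported test statistic. That reading is an inference from the numbers, not something the record states, and the column is labelled χ2 in the source. The sketch below, with hypothetical helper names, reproduces a few entries under that assumption.

```python
import math

def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def normal_ci(diff, se, z_crit=1.96):
    """Normal-approximation 95% confidence interval for an AUC difference."""
    return diff - z_crit * se, diff + z_crit * se

# Test set, Logistic and CNN: difference 0.022, SE 0.019, statistic 1.154
p_test_log_cnn = two_sided_p(1.154)
ci_test_log_cnn = normal_ci(0.022, 0.019)

# Validation set rows: statistics 1.580 and 2.110
p_val_log_xgb = two_sided_p(1.580)
p_val_log_cnn = two_sided_p(2.110)
```

Under this assumption the recomputed P values fall close to the reported 0.248, 0.114 and 0.035, and the interval for the test-set Logistic and CNN comparison is close to the reported −0.016 to 0.060.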
Figure 1. ROC curves of the three models: (a) Training set; (b) Test set; (c) Validation set.
Figure 2. Calibration curves of the three models: (a) Training set; (b) Test set; (c) Validation set.