Hui Xu, Wei Pan, Meng Xin, Wulin Pan, Cheng Hu, Dai Wanqiang, Ge Huang.
Abstract
The Healthy China Strategy sets practical requirements for residents' health levels, but in reality many factors can affect health. To clarify which factors have the greatest impact on residents' health, this paper uses China's provincial panel data from 2011 to 2018, selects 17 characteristic variables at three levels (economy, environment, and society), and applies the XGBoost and random forest algorithms with recursive feature elimination to determine the influential variables. The results show that at the economic level, the number of industrial enterprises above designated size, industrial added value, population density, and per capita GDP have a greater impact on residents' health. At the environmental level, coal consumption, energy consumption, total wastewater discharge, and solid waste discharge have a greater impact on residents' health. Therefore, the Chinese government should formulate targeted measures at both the economic and environmental levels, which is of great significance to realizing the Healthy China Strategy.
Keywords: economic factors; environmental factors; machine learning; residents' health; social factors
Year: 2022 PMID: 35774578 PMCID: PMC9237364 DOI: 10.3389/fpubh.2022.896635
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
Corresponding codes of characteristic variables.
| Characteristic variable | Code |
|---|---|
| Imports and exports as a percentage of GDP | x1 |
| The level of urbanization | x2 |
| Number of industrial enterprises above designated size | x3 |
| Industrial value added | x4 |
| Population density | x5 |
| Carbon trading rights | x6 |
| Coal consumption | x7 |
| Energy consumption | x8 |
| Annual per capita health expenditure | x9 |
| The number of years of education per capita | x10 |
| Ratio of the number of higher education | x11 |
| Total wastewater discharge | x12 |
| Total SO2 emissions | x13 |
| Total NO emissions | x14 |
| Total particulate emissions | x15 |
| Total solid waste emissions | x16 |
| GDP per capita | x17 |
Extract and integrate literature and data related to health problems.
Classification of characteristic variables.
| Dimension | Characteristic variables |
|---|---|
| Economic dimension | x1, x2, x3, x4, x5, x17 |
| Environmental dimension | x6, x7, x8, x12, x13, x14, x15, x16 |
| Social dimension | x9, x10, x11 |
Extract and integrate literature and data related to health problems.
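The grouping in the classification table can be written down as a small lookup structure. The following is a minimal sketch: the names `DIMENSIONS` and `dimension_of` are illustrative and do not come from the paper, and only the variable codes and dimension labels are taken from the table.

```python
# Variable groupings copied from the classification table.
# DIMENSIONS and dimension_of are hypothetical names used for illustration.
DIMENSIONS = {
    "economic": ["x1", "x2", "x3", "x4", "x5", "x17"],
    "environmental": ["x6", "x7", "x8", "x12", "x13", "x14", "x15", "x16"],
    "social": ["x9", "x10", "x11"],
}


def dimension_of(code):
    """Return the dimension name for a variable code such as 'x7'."""
    for name, codes in DIMENSIONS.items():
        if code in codes:
            return name
    raise KeyError(code)


# Collect every code and sort numerically (x2 before x10).
all_codes = sorted(
    (c for codes in DIMENSIONS.values() for c in codes),
    key=lambda c: int(c[1:]),
)

# Each of x1..x17 should appear exactly once across the three dimensions.
covers_all = all_codes == [f"x{i}" for i in range(1, 18)]
```

A check like `covers_all` is a cheap way to confirm that the three dimensions partition the 17 variables without gaps or duplicates.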
XG-boost parameter values.
| Parameter | Value |
|---|---|
| gamma | 5 |
| max_depth | 7 |
| subsample | 0.5 |
| eta | 0.1 |
| nthread | −1 |
| num_round | 100 |
Parameter tuning during XGBoost model building.
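As a rough illustration of how the tabulated settings might be passed to the `xgboost` Python package, consider the sketch below. It is not the authors' code. The function name `fit_xgb` and the regression objective are assumptions, since the record does not state the objective or the training script; only the parameter values come from the table.

```python
# Values copied from the XG-boost parameter table.
XGB_PARAMS = {
    "gamma": 5,
    "max_depth": 7,
    "subsample": 0.5,
    "eta": 0.1,
    "nthread": -1,
}
NUM_ROUND = 100  # number of boosting rounds (num_round in the table)


def fit_xgb(X, y):
    """Train a booster with the tabulated settings.

    Hypothetical helper: X is a feature matrix (for example a NumPy array
    with one column per characteristic variable) and y is the target.
    Requires the xgboost package.
    """
    import xgboost as xgb  # lazy import so the settings can be inspected without it

    dtrain = xgb.DMatrix(X, label=y)
    # Assumption: a squared-error regression objective; the source does not say.
    params = dict(XGB_PARAMS, objective="reg:squarederror")
    return xgb.train(params, dtrain, num_boost_round=NUM_ROUND)
```

Keeping the settings in a plain dictionary makes it straightforward to compare them against the table or to vary one value at a time during tuning.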
Figure 1. Feature ranking of XG-boost.
XG-boost's five-fold cross-validation results.
| Sample | Metric | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 |
|---|---|---|---|---|---|---|
| Training samples (80%) | Root mean square error (RMSE) | 0.41944 | 0.420202 | 0.426395 | 0.426798 | 0.424173 |
| Test samples (20%) | RMSE | 0.57756 | 0.573808 | 0.5055197 | 0.516337 | 0.514387 |
Output of fitting results of XGBoost model in this study.
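To make the fold-level numbers easier to read, the sketch below defines RMSE in the usual way and averages the five XGBoost training and test values from the table. The variable and function names are illustrative; only the ten RMSE values come from the source.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Per-fold RMSE values copied from the XGBoost results table.
xgb_train_rmse = [0.41944, 0.420202, 0.426395, 0.426798, 0.424173]
xgb_test_rmse = [0.57756, 0.573808, 0.5055197, 0.516337, 0.514387]

mean_train = sum(xgb_train_rmse) / len(xgb_train_rmse)
mean_test = sum(xgb_test_rmse) / len(xgb_test_rmse)

# A positive gap means the model fits the training folds better than the test folds.
gap = mean_test - mean_train

print(f"mean train RMSE: {mean_train:.4f}")
print(f"mean test RMSE:  {mean_test:.4f}")
print(f"test - train:    {gap:.4f}")
```

With the tabulated values, the mean training RMSE is about 0.423 and the mean test RMSE about 0.538, a gap of roughly 0.114.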
Figure 2. Error variation of XGBoost model over 100 iterations.
Figure 3. Comparison of XGBoost model predicted value and actual value.
Random forest parameter values.
| Parameter | Value |
|---|---|
| n_estimators | 100 |
| max_depth | 3 |
| random_state | 10 |
| min_samples_split | 2 |
| min_samples_leaf | 1 |
| criterion | "mse" |
Parameter tuning during Random forest model building.
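The random forest settings, combined with the recursive feature elimination mentioned in the abstract, could be sketched with scikit-learn as follows. This is an illustration under assumptions, not the authors' implementation: `rank_features` and `n_keep` are hypothetical names, and the criterion is written as `"squared_error"` because recent scikit-learn releases use that name for what older releases called `"mse"`.

```python
# Values copied from the random forest parameter table.
RF_PARAMS = {
    "n_estimators": 100,
    "max_depth": 3,
    "random_state": 10,
    "min_samples_split": 2,
    "min_samples_leaf": 1,
    # The table lists "mse"; newer scikit-learn releases name it "squared_error".
    "criterion": "squared_error",
}


def rank_features(X, y, n_keep=1):
    """Rank the columns of X by recursive feature elimination.

    Hypothetical helper: repeatedly fits the forest and drops the least
    important feature until n_keep remain. Requires scikit-learn.
    """
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.feature_selection import RFE

    selector = RFE(
        RandomForestRegressor(**RF_PARAMS),
        n_features_to_select=n_keep,
        step=1,  # eliminate one feature per iteration
    )
    selector.fit(X, y)
    # ranking_ assigns 1 to retained features; larger numbers were dropped earlier.
    return selector.ranking_
```

Setting `n_keep=1` yields a complete ordering of the features, which is one way a feature ranking like the one in the figures could be produced.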
Figure 4. Feature ranking of random forest.
Random forest's five-fold cross-validation results.
| Sample | Metric | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 |
|---|---|---|---|---|---|---|
| Training samples (80%) | RMSE | 0.464821 | 0.491444 | 0.481829 | 0.4803921 | 0.470389 |
| Test samples (20%) | RMSE | 0.536169 | 0.563798 | 0.549059 | 0.542879 | 0.535904 |
Output of fitting results of Random forest model in this study.
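The two results tables can be compared side by side by averaging the per-fold RMSE values. The sketch below does this with the standard library only. All numbers are copied from the two tables, while the names and the comparison itself are illustrative additions rather than an analysis reported in the source.

```python
from statistics import mean, stdev

# Per-fold RMSE values copied from the random forest results table.
rf_train_rmse = [0.464821, 0.491444, 0.481829, 0.4803921, 0.470389]
rf_test_rmse = [0.536169, 0.563798, 0.549059, 0.542879, 0.535904]

# Per-fold RMSE values copied from the XGBoost results table.
xgb_train_rmse = [0.41944, 0.420202, 0.426395, 0.426798, 0.424173]
xgb_test_rmse = [0.57756, 0.573808, 0.5055197, 0.516337, 0.514387]

rf_train_mean, rf_test_mean = mean(rf_train_rmse), mean(rf_test_rmse)
xgb_train_mean, xgb_test_mean = mean(xgb_train_rmse), mean(xgb_test_rmse)

# Train/test gap: a larger gap suggests a stronger fit to the training folds.
rf_gap = rf_test_mean - rf_train_mean
xgb_gap = xgb_test_mean - xgb_train_mean

# Fold-to-fold spread of the test error (sample standard deviation).
rf_test_sd = stdev(rf_test_rmse)
xgb_test_sd = stdev(xgb_test_rmse)

for label, te, gap, sd in [
    ("random forest", rf_test_mean, rf_gap, rf_test_sd),
    ("XGBoost", xgb_test_mean, xgb_gap, xgb_test_sd),
]:
    print(f"{label}: mean test RMSE {te:.4f}, gap {gap:.4f}, test SD {sd:.4f}")
```

On these values the mean test RMSE is about 0.546 for random forest and 0.538 for XGBoost, while the train/test gap is smaller for random forest (about 0.068 versus 0.114).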
Figure 5. Error variation of random forest model over 100 iterations.
Figure 6. Comparison of random forest model predicted value and actual value.