Markus Viljanen1, Lotta Meijerink2, Laurens Zwakhals2, Jan van de Kassteele2.
Abstract
BACKGROUND: Local policymakers require information about public health, housing and well-being for small geographical areas. A municipality can, for example, use this information to organize targeted activities aimed at improving the well-being of its residents. Surveys are often used to gather data, but many neighborhoods have only a few respondents or none at all. In that case, estimates of the status of the local population made directly from survey responses are likely to be unreliable.
Keywords: Extreme gradient boosting; Health and welfare; Machine learning; Small area estimation
Year: 2022 PMID: 35668432 PMCID: PMC9169293 DOI: 10.1186/s12942-022-00304-5
Source DB: PubMed Journal: Int J Health Geogr ISSN: 1476-072X Impact factor: 5.310
Data sets used in this study
| Survey | Year | Respondents | Outcomes | Type | Population | Features |
|---|---|---|---|---|---|---|
| HeMo | 2020 | 539,895 | 34 | Binary | 13,845,474 | 14 |
| WoON | 2018 | 67,523 | 8 | Continuous | 13,510,237 | 22 |
| Noise | 2016 | 202,065 | 10 | Binary | 10,366,070 | 21 |
Subset of health-related indicators from the Public health monitor (‘HeMo’)
| Health-related indicator | Original name | Description |
|---|---|---|
| Drinker | lfala217 | Drank alcohol in the past 12 months |
| Drinker_over6gd | lfalaa213 | Drinks over 6 glasses of alcohol per day |
| Drinker_heavy | lfala213 | Drinks at least 6 (M) or 4 (F) glasses per day at least once a week |
| Drinker_excess | lfals231 | Drinks alcohol in excess of recommendation |
| Drinker_excess_old | lfals230 | Drinks alcohol in excess of recommendation (old definition) |
| Drinker_under1gd | lfals232 | Drinks under 1 glass of alcohol per day |
| Weight_overweight | aggws204 | Overweight, BMI 25 or higher |
| Weight_obese | aggws205 | Obese, BMI 30 or higher |
| Weight_underweight | aggws206 | Underweight, BMI 18.5 or lower |
| Weight_healthy | aggws207 | Healthy weight, BMI 18.5–25 |
| Weight_overweight_moderate | aggws208 | Moderately overweight, BMI 25–30 |
| Smoker | lfrka205 | Smoker |
| Smoker_past | lfrka206 | Smoked in the past |
| Smoker_never | lfrka207 | Has never smoked |
| Health_reportgood | klgga208 | Considers own health as good |
| Illness_longterm | calga260 | Has an illness of duration 6 months or longer |
| Health_limited | calga264 | Is limited in daily life by problems with health |
| Health_limited_severe | calga265 | Is seriously limited in daily life by problems with health |
| Illness_longterm_limited | calga267 | Is limited in daily life by problems with health 6 months or longer |
| Disability_hearing | lgbps203 | Has a hearing disability (great difficulty with 1 of 2 OECD items) |
| Disability_vision | lgbps204 | Has a vision disability (great difficulty with 1 of 2 OECD items) |
| Disability_mobility | lgbps205 | Has a mobility disability (great difficulty with 1 of 3 OECD items) |
| Disability_any | lgbps209 | Has a hearing, vision, or mobility disability (1 of 7 OECD items) |
| Feels_lifecontrol | ggrls203 | Feels moderate or much control over their own life |
| Anxietydepression_moderate | ggada202 | Moderate or high risk of anxiety disorder or depression |
| Anxietydepression_high | ggada203 | High risk of anxiety disorder or depression |
| Exercise_guideline | ki_rlbew2017 | Complies with the 2017 exercise guideline |
| Exercise | kisporter | Core indicator of actual exercise |
| Lonely | ggees217 | Is lonely |
| Lonely_severe | ggees209 | Is seriously lonely |
| Lonely_emotional | ggees218 | Is emotionally lonely |
| Lonely_social | ggees219 | Is socially lonely |
| Volunteer | mmvwa201 | Does volunteer work |
| Difficultyfinancial_12m | mmika201 | In the past 12 months experienced difficulty with household income |
| Caregiver_informal | mcmzgs203 | Caregiver (at least 3 months and/or at least 8 hours a week) |
| Much_stress | ggsts16 | Experienced a lot of stress in the past 4 weeks |
| Severe_noise_disturb | woghba218 | Experiences severe noise disturbance by neighbors |
| Walk_to_work | wwlmwk | (Partly) walks to work at least 1 day a week |
| Bike_to_work | wwfmwk | (Partly) bikes to work at least 1 day a week |
| Walk_or_bike_to_work | wwfmwk, wwlmwk | (Partly) bikes or walks to work at least 1 day a week |
Perceived living quality ratings from the Housing survey (‘WoON’)
| Living quality rating | Original name | Description |
|---|---|---|
| Social cohesion | Cohesie | GSB indicator of social quality |
| Satisfaction house | Twoning | Satisfaction with the current home |
| Satisfaction surroundings | Twoonomg | Satisfaction with the surroundings |
| Satisfaction region | Tevrstr | Satisfaction with the region where you live |
| Bothered by neighborhood | Tvervele | It’s annoying to live in this neighborhood |
| At home in neighborhood | Brtthuis | I feel at home in this neighborhood |
| Afraid in neighborhood | Brtveilig | Afraid of being harassed or robbed in this neighborhood |
Experienced noise disturbance subset of the Public health monitor (‘Noise’)
| Noise disturbance indicator | Original name | Description |
|---|---|---|
| Road_medhigh_gt50 | WOGHBA202 | Moderate or serious noise nuisance from road traffic |
| Road_high_gt50 | WOGHBA203 | Serious noise nuisance from road traffic |
| Road_medhigh_sm50 | WOGHBA205 | Moderate or serious noise nuisance from road traffic |
| Road_high_sm50 | WOGHBA206 | Serious noise nuisance from road traffic |
| Road_medhigh | WOGHBA202/205 | Moderate or serious noise nuisance from road traffic |
| Road_high | WOGHBA203/206 | Serious noise nuisance from road traffic |
| Rail_medhigh | WOGHBA208 | Moderate or serious noise nuisance from train traffic |
| Rail_high | WOGHBA209 | Serious noise nuisance from train traffic |
| Air_medhigh | WOGHBA211 | Moderate or serious noise nuisance from air traffic |
| Air_high | WOGHBA212 | Serious noise nuisance from air traffic |
Estimated noise level (dB) from the RIVM noise dispersion model
| Estimated noise level | Original name | Description |
|---|---|---|
| Air | lden_air | Noise from air traffic |
| Rail | lden_rail | Noise from railway traffic |
| Road | lden_wegv | Noise from road traffic |
| Road_gw | lden_wegv_gw | Noise from municipal roads |
| Road_pwrw | lden_wegv_pw, lden_wegv_rw | Noise from secondary and main motorways |
Summary of features used for prediction in HeMo and WoON
| Feature | Categories/median (min.–max.) | % missing |
|---|---|---|
| Age | 50 (18–108) | 0.0 |
| Sex | Male | 0.0 |
| | Female | |
| Ethnicity | Netherlands | 0.0 |
| | Morocco | |
| | Turkey | |
| | Suriname | |
| | Netherlands Antilles | |
| | Other non-western | |
| | Other western | |
| Marital status | Single | 0.6 |
| | Married | |
| | Divorced | |
| | Widowed | |
| Education (see “Description of Dutch education levels”) | Basis | 37.8 |
| | VMBObk | |
| | VMBOgt | |
| | MBO23 | |
| | MBO4 | |
| | HAVO-VWO | |
| | HBO-WO-BAC | |
| | HBO-WO-M/PhD | |
| Household type | Single person household | 0.0 |
| | Unmarried without children | |
| | Married without children | |
| | Unmarried with children | |
| | Married with children | |
| | Single parent family | |
| | Other | |
| Household size | 2 (1–10) | 0.0 |
| Household income source | Wage | 2.4 |
| | Wage director/shareholder | |
| | Self-employed | |
| | Unemployment benefit | |
| | Social assistance benefit | |
| | Disability benefit | |
| | Old-age pension | |
| | Other benefit | |
| | Student loan | |
| | Property income | |
| Home ownership | Homeowner | 2.0 |
| | Rental no allowance | |
| | Rental with allowance | |
| Household income (percentile) | 63 (1–100) | 2.4 |
| Household assets (percentile) | 56 (1–100) | 2.4 |
| Neighborhood address density | 71 (1–100) | 0.0 |
| X-coordinate | 140,348 (13,666–277,711) | 0.0 |
| Y-coordinate | 453,926 (306,922–611,538) | 0.0 |
Additional features used for prediction in WoON
| Feature | Median (min.–max.) | % missing |
|---|---|---|
| % in the neighborhood of | ||
| Uninhabited houses | 3 (0–100) | 0.2 |
| Single-family houses | 78 (0–100) | 0.2 |
| Owner-occupied houses | 62 (0–100) | 0.2 |
| Social rental houses | 26 (0–100) | 0.2 |
| Houses built before 2000 | 91 (0–100) | 0.2 |
| Distance (m) from home to closest | ||
| Forest | 1463 (0–16,271) | 0.0 |
| Backwater | 2080 (0–24,852) | 0.0 |
| Public green | 280 (0–10,139) | 0.0 |
Fig. 1. Illustration of small area estimation at neighbourhood level in the Netherlands. Prevalence of “drank alcohol in the past 12 months” based on survey responses (raw estimates), XGBoost model predictions for the population (model estimates), and the percentage point difference of two models (XGBoost vs. STAR). XGBoost is based on X and Y-coordinates
XGBoost vs. STAR: percentage point difference in predicted prevalence
| Difference in prevalence | [0,0.025) | [0.025,0.05) | [0.05,0.075) | [0.075,0.1) | [0.1,0.125) |
|---|---|---|---|---|---|
| Total neighbourhoods (%) | 92.07 | 7.29 | 0.57 | 0.05 | 0.02 |
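A table of this kind could be tabulated along the following lines. This is a minimal Python sketch, assuming the per-neighbourhood predicted prevalences of both models are available as proportions; the function name `bin_shares` and the four example values are hypothetical and not taken from the study.

```python
from collections import Counter

def bin_shares(xgb, star, width=0.025):
    """Share (%) of neighbourhoods per bin of |xgb - star|.

    Bin k covers the half-open interval [k * width, (k + 1) * width).
    """
    counts = Counter(int(abs(a - b) // width) for a, b in zip(xgb, star))
    n = len(xgb)
    return {k: 100.0 * c / n for k, c in sorted(counts.items())}

# Hypothetical predicted prevalences for four neighbourhoods.
xgb_pred = [0.80, 0.75, 0.60, 0.90]
star_pred = [0.79, 0.78, 0.60, 0.84]

shares = bin_shares(xgb_pred, star_pred)
for k, pct in shares.items():
    print(f"[{k * 0.025:.3f}, {(k + 1) * 0.025:.3f}): {pct:.2f}%")
```

Half-open bins mirror the interval notation in the table header, so a difference of exactly 0.025 falls into the second bin.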
Different accuracy metrics for “drinker” indicator
| Model | Accuracy | AUC | MSE | NLL | Time |
|---|---|---|---|---|---|
| Null | 0.801 | 0.500 | 0.160 | 0.500 | |
| STAR | 0.814 | 0.737 | 0.138 | 0.437 | 28 min |
| XGBoost0_xy | 0.815 | 0.742 | 0.137 | 0.434 | 6 s |
| XGBoost_xy | | | | | 3 min |
| XGBoost0_ogc | 0.815 | 0.742 | 0.137 | 0.434 | 17 s |
| XGBoost_ogc | | | | | 6 min |
Bold values indicate the best result, not statistical significance
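The four metrics in the table can be sketched in plain Python as follows. Several points here are assumptions rather than statements from the source: that accuracy uses a 0.5 threshold, that MSE is the mean squared difference between the 0/1 outcome and the predicted probability, that NLL is the mean binary negative log-likelihood with natural logarithms, and that the null model predicts the overall prevalence for everyone. The synthetic sample of 801 drinkers in 1,000 respondents is hypothetical, chosen only because it matches the null accuracy of 0.801.

```python
import math

def accuracy(y, p, threshold=0.5):
    """Fraction of cases where the thresholded probability matches the label."""
    return sum((pi >= threshold) == bool(yi) for yi, pi in zip(y, p)) / len(y)

def mse(y, p):
    """Mean squared error between 0/1 labels and predicted probabilities."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)

def nll(y, p, eps=1e-12):
    """Mean binary negative log-likelihood (natural log), clipped for stability."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)
        total -= yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return total / len(y)

def auc(y, p):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical sample: 801 drinkers and 199 non-drinkers.
y = [1] * 801 + [0] * 199
# Assumed null model: predict the overall prevalence for every respondent.
p_null = [sum(y) / len(y)] * len(y)

null_metrics = {
    "accuracy": accuracy(y, p_null),
    "auc": auc(y, p_null),
    "mse": mse(y, p_null),
    "nll": nll(y, p_null),
}
print(null_metrics)
```

Under these assumptions the constant prediction gives an accuracy of 0.801, an AUC of 0.5, an MSE of about 0.159 and an NLL of about 0.499, which is close to the Null row of the table.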
Fig. 2. ROC and calibration curves. The ROC curve measures discrimination by plotting the false positive and true positive rates at different threshold values. The calibration curve shows how well the predicted probabilities match the true probabilities: the mean of predicted probabilities is calculated at different quantiles of true probabilities with the diagonal line indicating a perfect fit
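The ROC construction described in the caption can be sketched as follows: each distinct predicted probability is used as a threshold, and the false positive and true positive rates are recorded at that threshold. The function name `roc_points` and the five example predictions are hypothetical; this is an illustration of the general technique, not the code used to draw the figure.

```python
def roc_points(y, p):
    """Return (false positive rate, true positive rate) pairs.

    One point per distinct threshold, from the strictest to the loosest,
    preceded by the origin (nothing classified as positive).
    """
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(p), reverse=True):
        tp = sum(1 for yi, pi in zip(y, p) if pi >= t and yi == 1)
        fp = sum(1 for yi, pi in zip(y, p) if pi >= t and yi == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

# Hypothetical labels and predicted probabilities.
y_true = [1, 1, 0, 1, 0]
p_pred = [0.9, 0.8, 0.7, 0.6, 0.2]

points = roc_points(y_true, p_pred)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The quadratic loop is kept for readability; for survey-sized data a single pass over predictions sorted by score would be the usual choice.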
Fig. 3. XGBoost vs. STAR predictions. We compare the predicted prevalences in each neighbourhood from the XGBoost and STAR models on a scatter plot (left). We also calculate the MSE stratified over individuals that belong to the same neighbourhood size quantile (right)
Comparison of models by MSE over all health-related indicators. XGBoost oblique coordinates vs. STAR: correlation between predictions (corr), MSE reduction in percentages (pred), training and test prediction time reduction in percentages (time)
| Indicator | Nullmodel | XGB_xy | XGB_ogc | STAR | Corr | Pred | Time |
|---|---|---|---|---|---|---|---|
| Drinker | 0.1597 | 0.1377 | 0.95 | 0.94 | 78.05 | ||
| Drinker_over6gd | 0.0510 | 0.0480 | 0.93 | 0.42 | 87.55 | ||
| Drinker_heavy | 0.0670 | 0.0645 | 0.91 | 0.47 | 89.94 | ||
| Drinker_excess | 0.1464 | 0.1409 | 0.89 | 0.50 | 79.73 | ||
| Drinker_excess_old | 0.0623 | 0.0611 | 0.85 | 0.16 | 85.96 | ||
| Drinker_under1gd | 0.2470 | 0.2106 | 0.2117 | 0.96 | 0.57 | 73.94 | |
| Weight_overweight | 0.2493 | 0.2265 | 0.2277 | 0.96 | 0.53 | 84.71 | |
| Weight_obese | 0.1296 | 0.1240 | 0.94 | 0.40 | 84.80 | ||
| Weight_underweight | 0.0135 | 0.0133 | 0.90 | 0.00 | 93.04 | ||
| Weight_healthy | 0.2484 | 0.2288 | 0.2299 | 0.95 | 0.48 | 85.35 | |
| Weight_overweight_moderate | 0.2340 | 0.2247 | 0.95 | 0.22 | 92.95 | ||
| Smoker | 0.1133 | 0.1048 | 0.1054 | 0.95 | 0.57 | 85.48 | |
| Smoker_past | 0.2441 | 0.2110 | 0.98 | 0.38 | 87.72 | ||
| Smoker_never | 0.2467 | 0.2162 | 0.96 | 0.69 | 80.51 | ||
| Health_reportgood | 0.1785 | 0.1592 | 0.97 | 0.57 | 87.26 | ||
| Illness_longterm | 0.2322 | 0.2126 | 0.97 | 0.42 | 90.56 | ||
| Health_limited | 0.2288 | 0.1994 | 0.97 | 0.65 | 88.88 | ||
| Health_limited_severe | 0.0508 | 0.0482 | 0.94 | 0.83 | 90.42 | ||
| Illness_longterm_limit | 0.2216 | 0.1932 | 0.1945 | 0.97 | 0.67 | 87.97 | |
| Disability_hearing | 0.0457 | 0.0435 | 0.93 | 0.23 | 89.39 | ||
| Disability_vision | 0.0495 | 0.0472 | 0.95 | 0.21 | 89.99 | ||
| Disability_mobility | 0.0985 | 0.0801 | 0.98 | 0.62 | 86.37 | ||
| Disability_any | 0.1382 | 0.1159 | 0.98 | 0.52 | 85.34 | ||
| Feels_lifecontrol | 0.0892 | 0.0842 | 0.95 | 0.48 | 87.39 | ||
| Anxietydepression_mod | 0.2415 | 0.2229 | 0.2234 | 0.97 | 0.27 | 88.44 | |
| Anxietydepression_high | 0.0442 | 0.0422 | 0.95 | 0.47 | 90.18 | ||
| Exercise_guideline | 0.2483 | 0.2293 | 0.2300 | 0.94 | 0.35 | 80.00 | |
| Exercise_weekly | 0.2488 | 0.2170 | 0.95 | 0.97 | 84.73 | ||
| Lonely | 0.2471 | 0.2306 | 0.96 | 0.43 | 86.80 | ||
| Lonely_severe | 0.0841 | 0.0802 | 0.95 | 0.37 | 89.13 | ||
| Lonely_emotional | 0.2500 | 0.2305 | 0.2315 | 0.97 | 0.43 | 90.67 | |
| Lonely_social | 0.2415 | 0.2311 | 0.2317 | 0.94 | 0.26 | 88.60 | |
| Volunteer | 0.2051 | 0.1915 | 0.94 | 0.78 | 83.09 | ||
| Difficultyfinancial_12 | 0.0777 | 0.0676 | 0.95 | 2.07 | 86.25 | ||
| Caregiver_informal | 0.1318 | 0.1240 | 0.95 | 0.32 | 87.21 | ||
| Much_stress | 0.1129 | 0.1048 | 0.97 | 0.29 | 91.62 | ||
| Severe_noise_disturb | 0.0584 | 0.0574 | 0.95 | 0.35 | 96.10 | ||
| Walk_to_work | 0.1359 | 0.1257 | 0.95 | 0.48 | 91.47 | ||
| Bike_to_work | 0.1878 | 0.1602 | 0.1614 | 0.96 | 0.81 | 79.63 | |
| Walk_or_bike_work | 0.2222 | 0.1824 | 0.1840 | 0.97 | 0.92 | 84.52 |
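The “Pred” and “Time” columns are described as reductions in percentages. A minimal sketch of one plausible reading, assuming the reduction is measured relative to the STAR value, is given below; the exact formula is an assumption, and `pct_reduction` is a hypothetical helper name. The timing example uses the rounded training times from the “drinker” metrics table (STAR 28 min, XGBoost_ogc 6 min), while the two MSE values are hypothetical.

```python
def pct_reduction(baseline, new):
    """Relative reduction of `new` versus `baseline`, in percent.

    Positive values mean `new` is smaller (better) than `baseline`.
    """
    return 100.0 * (baseline - new) / baseline

# Rounded timings from the "drinker" metrics table: STAR 28 min, XGBoost_ogc 6 min.
time_reduction = pct_reduction(28.0, 6.0)

# Hypothetical MSE values, for illustration only.
mse_reduction = pct_reduction(0.2000, 0.1990)

print(f"time reduction: {time_reduction:.2f}%")
print(f"MSE reduction:  {mse_reduction:.2f}%")
```

With the rounded timings this gives roughly 78.6%, in the vicinity of the 78.05 listed for the “Drinker” row. A negative value, as in one row of the living quality table, would mean the comparison model had the lower MSE.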
Comparison of models by MSE over all living quality ratings
| Indicator | Nullmodel | XGB_xy | XGB_ogc | STAR | Corr | Pred | Time |
|---|---|---|---|---|---|---|---|
| Afraid_ngbh | 0.7266 | 0.6667 | 0.6676 | 0.88 | 0.31 | 99.1 | |
| Social_cohesion | 2.8648 | 2.5505 | 2.5615 | 0.91 | 0.55 | 99.03 | |
| Satisfied_region | 0.4222 | 0.4018 | 0.4001 | 0.83 | − 0.15 | 99.11 | |
| Annoyed_w_ngbh | 0.5648 | 0.5182 | 0.5203 | 0.88 | 0.4 | 99.24 | |
| Attached_to_ngbh | 1.1237 | 1.0082 | 1.0173 | 0.88 | 0.89 | 98.97 | |
| Satisfied_house | 0.6503 | 0.5389 | 0.5458 | 0.91 | 1.26 | 99.2 | |
| Satisfied_surroundings | 0.6608 | 0.6056 | 0.6054 | 0.89 | 0.13 | 99.01 | |
| At_home_in_ngbh | 0.6382 | 0.6011 | 0.6047 | 0.85 | 0.63 | 99.09 |
Comparison of models by MSE over all noise disturbance indicators. Each row pairs an indicator with the relevant noise measurement, separated by ‘-’
| Indicator | Nullmodel | XGB_xy | XGB_ogc | STAR | Corr | Pred | Time |
|---|---|---|---|---|---|---|---|
| Road_high-road | 0.0665 | 0.94 | 0.00 | 95.76 | |||
| Rail_high-rail | 0.0114 | 0.91 | 0.00 | 96.62 | |||
| Air_high-air | 0.0357 | 0.0314 | 0.96 | 0.00 | 95.88 | ||
| Road_medhigh-road | 0.2234 | 0.2060 | 0.2058 | 0.96 | 0.10 | 93.73 | |
| Rail_medhigh-rail | 0.0780 | 0.0671 | 0.0671 | 0.96 | 0.60 | 93.31 | |
| Air_medhigh-air | 0.1593 | 0.1295 | 0.98 | 0.00 | 93.42 | ||
| Road_high_gt50-road_pwrw | 0.0328 | 0.91 | 0.00 | 96.43 | |||
| Road_medhigh_gt50-road_pwrw | 0.1557 | 0.1465 | 0.1467 | 0.95 | 0.34 | 94.11 | |
| Road_high_sm50-road_gw | 0.0519 | 0.0502 | 0.92 | 0.20 | 96.73 | ||
| Road_medhigh_sm50-road_gw | 0.2045 | 0.1918 | 0.94 | 0.00 | 93.85 | ||
| Road_high_gt50-road | 0.0328 | 0.90 | 0.00 | 95.94 | |||
| Road_medhigh_gt50-road | 0.1557 | 0.1472 | 0.1467 | 0.93 | 0.14 | 92.50 | |
| Road_high_sm50-road | 0.0519 | 0.92 | 0.00 | 96.73 | |||
| Road_medhigh_sm50-road | 0.2045 | 0.1914 | 0.95 | 0.00 | 94.70 |
Fig. 4. STAR model terms for the “drinker” indicator. Because a different model is fitted for each of the 25 GGD regions, we calculate the average term value over these models for a given feature value. Although interpretability is seen as a strength of statistical models, interpreting such a complex model is non-trivial. The STAR model terms appear to have an interpretation similar to that of the XGBoost SHAP values (Fig. 5)
Fig. 5. XGBoost SHAP values for the “drinker” indicator. Because SHAP values explain each individual’s prediction as a sum of the contributions of their features, we calculate the average SHAP value of these individuals for a given feature value. These have intuitive interpretations. Positive contributions to drinking are: age in the early 20s, being male, being divorced, and higher socioeconomic status. Negative contributions are: being retired, being female, ethnic backgrounds where Islam is the main religion, larger household size, and income and assets around the lowest 25%
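The averaging step described in the caption can be sketched as a simple group mean: per-individual SHAP values for one feature are grouped by that feature's value and averaged. The sketch assumes the SHAP values have already been computed elsewhere; the function name `mean_shap_by_value` and the four example values are hypothetical.

```python
from collections import defaultdict

def mean_shap_by_value(feature_values, shap_values):
    """Average the per-individual SHAP values of one feature by feature value."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, contribution in zip(feature_values, shap_values):
        sums[value] += contribution
        counts[value] += 1
    return {value: sums[value] / counts[value] for value in sums}

# Hypothetical SHAP contributions of the "Sex" feature for four individuals.
sex = ["Male", "Female", "Male", "Female"]
shap_sex = [0.2, -0.1, 0.4, -0.3]

avg = mean_shap_by_value(sex, shap_sex)
for value, mean_contribution in avg.items():
    print(f"{value}: {mean_contribution:+.2f}")
```

For a continuous feature such as age, the same function applies once the values are rounded or binned, so that each group contains enough individuals for a stable average.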
Description of Dutch education levels
| Education level | Description |
|---|---|
| Basis | Primary education |
| VMBObk | Lower secondary education (predominantly practical) |
| VMBOgt | Lower secondary education (predominantly theoretical) |
| MBO23 | Post-secondary vocational education (lower levels) |
| MBO4 | Post-secondary vocational education (highest level) |
| HAVO-VWO | Higher secondary education |
| HBO-WO-BAC | Undergraduate degree |
| HBO-WO-M/PhD | Graduate/doctoral degree |