Lauren P Grant, Chris Gennings, Edmond P Wickham, Derek Chapman, Shumei Sun, David C Wheeler.
Abstract
In public health research, it has been well established that geographic location plays an important role in influencing health outcomes. In recent years, there has been an increased emphasis on the impact of neighborhood or contextual factors as potential risk factors for childhood obesity. Some neighborhood factors relevant to childhood obesity include access to food sources, access to recreational facilities, neighborhood safety, and socioeconomic status (SES) variables. It is common for neighborhood or area-level variables to be available at multiple spatial scales (SS) or geographic units, such as the census block group and census tract, and selection of the spatial scale for area-level variables can be considered as a model selection problem. In this paper, we model the variation in body mass index (BMI) in a study of pediatric patients of the Virginia Commonwealth University (VCU) Medical Center, while considering the selection of spatial scale for a set of neighborhood-level variables available at multiple spatial scales using four recently proposed spatial scale selection algorithms: SS forward stepwise regression, SS incremental forward stagewise regression, SS least angle regression (LARS), and SS lasso. For pediatric BMI, we found evidence of significant positive associations with visit age and black race at the individual level, percent Hispanic white at the census block group level, percent Hispanic black at the census tract level, and percent vacant housing at the census tract level. We also found significant negative associations with population density at the census tract level, median household income at the census tract level, percent renter at the census tract level, and exercise equipment expenditures at the census block group level. The SS algorithms selected covariates at different spatial scales, producing better goodness-of-fit in comparison to traditional models, where all area-level covariates were modeled at the same scale. 
These findings underscore the importance of considering spatial scale when performing model selection.
Keywords: body mass index; lasso; model selection; obesity; spatial scale
Year: 2018 PMID: 29518029 PMCID: PMC5877018 DOI: 10.3390/ijerph15030473
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
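The abstract frames the choice of spatial scale as a model selection problem. As a rough illustration only, the sketch below shows one way a scale-aware forward stepwise search could be organized: every (variable, scale) pair is a candidate, the pair that lowers an OLS-based AIC the most enters the model, and the remaining scales of that variable are then dropped from the candidate pool. The function names `ols_aic` and `ss_forward_stepwise`, the AIC-based stopping rule, the one-scale-per-variable rule, and the synthetic data are all assumptions of this sketch; the abstract does not spell out the authors' algorithm.

```python
import numpy as np


def ols_aic(X, y):
    """Gaussian OLS AIC up to an additive constant: n*log(RSS/n) + 2*k."""
    n = len(y)
    # Always include an intercept; X may have zero columns (null model).
    design = np.column_stack([np.ones(n), X]) if X.shape[1] else np.ones((n, 1))
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + 2 * design.shape[1]


def ss_forward_stepwise(columns, y):
    """Greedy forward selection over (variable, scale) pairs.

    `columns` maps (variable, scale) -> 1-D array of covariate values.
    Once a variable enters at one scale, its other scales are no longer
    candidates (an assumption of this sketch).
    """
    n = len(y)
    selected = []
    best_aic = ols_aic(np.empty((n, 0)), y)
    while True:
        taken = {var for var, _ in selected}
        candidates = [key for key in columns if key[0] not in taken]
        trials = []
        for key in candidates:
            X = np.column_stack([columns[k] for k in selected + [key]])
            trials.append((ols_aic(X, y), key))
        if not trials:
            break
        aic, key = min(trials)
        if aic >= best_aic:  # stop when no candidate improves the AIC
            break
        selected.append(key)
        best_aic = aic
    return selected, best_aic


# Synthetic demonstration (hypothetical data, not the study data):
# the outcome depends on the tract-level version of one variable.
rng = np.random.default_rng(0)
n = 400
income_ct = rng.normal(size=n)
demo_columns = {
    ("MEDHINC", "CT"): income_ct,
    ("MEDHINC", "CBG"): income_ct + rng.normal(scale=1.0, size=n),
    ("PARKDENS", "CT"): rng.normal(size=n),
    ("PARKDENS", "CBG"): rng.normal(size=n),
}
demo_y = -1.0 * income_ct + rng.normal(scale=0.5, size=n)
demo_selected, demo_aic = ss_forward_stepwise(demo_columns, demo_y)
print(demo_selected)
```

Because the synthetic outcome is generated from the tract-level column, the search should pick that scale first and never admit the block-group version of the same variable. The stagewise, LARS, and lasso variants named in the abstract would replace the greedy AIC step with their own update rules.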
Table 1. Variables available for spatial scale selection algorithms. The horizontal dashed line separates the individual-level variables and the neighborhood-level variables available at more than one spatial scale. CBG: census block group, CBK: census block, CT: census tract, VCU: Virginia Commonwealth University.
| Variable Number | Name | Description |
|---|---|---|
| 1 | VisitAge | Age at visit in years |
| 2 | Male | Indicator variable for male |
| 3 | Black | Indicator variable for black |
| 4 | MCVdist | Distance to VCU Medical Center (miles) |
| 5–7 | POPDENS | Population density (people/square mile) in 2010 CBK/CBG/CT |
| 8–9 | PBLACK | Percent of population that is black in CBG/CT |
| 10–11 | PHWHITE | Percent of population that is Hispanic white in CBG/CT |
| 12–13 | PHBLACK | Percent of population that is Hispanic black in CBG/CT |
| 14–15 | MEDHINC | Median household income in CBG/CT |
| 16–17 | PRENTER | Percent of households that are rented in CBG/CT |
| 18–19 | PVACANT | Percent of households that are vacant in CBG/CT |
| 20–21 | CRMCYTOTC | Total crime index in CBG/CT |
| 22–23 | CRMCYPERC | Crimes against persons index in CBG/CT |
| 24–25 | CRMCYPROC | Property crime index in CBG/CT |
| 26–27 | PARKDENS | Park density in CBG/CT |
| 28–29 | RESTDENS | Restaurant density in CBG/CT |
| 30–31 | EX_EQ | Expenditures per capita spent on sports/exercise equipment in CBG/CT |
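The figure captions identify coefficient paths by the variable numbers in this table, so it can help to expand each numbered range into one entry per scale. The sketch below does that, assuming that numbers within a range follow the order in which the scales are listed (CBK, then CBG, then CT); the table does not state this ordering explicitly, and the names `INDIVIDUAL`, `AREA_LEVEL`, and `number_variables` are illustrative.

```python
# Individual-level variables take numbers 1-4; each area-level variable
# takes one number per spatial scale at which it is available.
INDIVIDUAL = ["VisitAge", "Male", "Black", "MCVdist"]

TWO_SCALE = [
    "PBLACK", "PHWHITE", "PHBLACK", "MEDHINC", "PRENTER", "PVACANT",
    "CRMCYTOTC", "CRMCYPERC", "CRMCYPROC", "PARKDENS", "RESTDENS", "EX_EQ",
]
AREA_LEVEL = [("POPDENS", ["CBK", "CBG", "CT"])] + [
    (name, ["CBG", "CT"]) for name in TWO_SCALE
]


def number_variables():
    """Return {variable number: (name, scale)}; scale is None for individual-level."""
    numbered = {}
    next_id = 1
    for name in INDIVIDUAL:
        numbered[next_id] = (name, None)
        next_id += 1
    for name, scales in AREA_LEVEL:
        # Assumed ordering: scales are numbered in the order listed in the table.
        for scale in scales:
            numbered[next_id] = (name, scale)
            next_id += 1
    return numbered


variables = number_variables()
print(len(variables))
print(variables[5], variables[14], variables[31])
```

Under that ordering assumption the expansion yields 31 numbered columns, matching the 1 to 31 range in the table: 4 individual-level variables plus 13 distinct area-level variables spread over their available scales.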
Figure 1. Average body mass index (BMI) z-score (BMIZ) by census tract among study patients in the Richmond metropolitan statistical area.
Figure 2. Coefficient paths for spatial scale forward stepwise regression to explain BMI z-scores. The scale at which each covariate entered the model is indicated by the legend. The numbers on the right-hand side of the figure are the variable numbers listed in Table 1.
Figure 3. Coefficient paths for spatial scale incremental forward stagewise regression to explain BMI z-scores. The scale at which each covariate entered the model is indicated by the legend. The numbers on the right-hand side of the figure are the variable numbers listed in Table 1.
Figure 4. Coefficient paths for spatial scale least angle regression (LARS)/lasso to explain BMI z-scores. The scale at which each covariate entered the model is indicated by the legend. The numbers on the right-hand side of the figure are the variable numbers listed in Table 1. The dotted vertical line indicates the chosen model that had the minimum ordinary least squares (OLS)-based Akaike’s information criterion (AIC).
Number of covariates selected at the individual-level and at each spatial scale (SS) by SS forward stepwise, forward stagewise, and LARS/lasso models. The last row lists the total number of possible variables at each data scale.
| Model | Individual-Level | Area-Level: CBK | Area-Level: CBG | Area-Level: CT | Number Selected |
|---|---|---|---|---|---|
| SS Stepwise | 3 | 0 | 2 | 7 | 12 |
| SS Stagewise | 3 | 0 | 3 | 5 | 11 |
| SS LARS/Lasso | 3 | 0 | 3 | 7 | 13 |
| No. of available variables | 4 | 13 (area-level, across all scales) | | | 17 |
Standardized coefficient estimates from spatial scale (SS) forward stepwise, forward stagewise, LARS, and lasso models of BMIZ. The blank cells indicate variables not selected for a particular model. The horizontal dashed line separates the individual-level variables and the neighborhood-level variables considered at multiple spatial scales. The suffix CT or CBG indicates the spatial scale selected for the variable.
| Explanatory Variable | SS Stepwise | SS Stagewise | SS LARS/Lasso |
|---|---|---|---|
| Visit Age | 0.080 (*) | 0.071 (*) | 0.077 (*) |
| Black | 0.043 (*) | 0.030 (*) | 0.039 (*) |
| Distance to Medical Center | 0.013 (+) | 0.011 (*) | 0.013 (+) |
| Population Density_CT | −0.033 (*) | −0.035 (*) | −0.034 (*) |
| % Hispanic White_CBG | | 0.023 (*) | 0.029 (*) |
| % Hispanic White_CT | 0.034 (*) | | |
| % Hispanic Black_CT | 0.022 (*) | 0.011 (*) | 0.020 (*) |
| Median Household Income_CT | −0.058 (*) | −0.035 (*) | −0.052 (*) |
| % Renter_CT | −0.035 (*) | | −0.023 (*) |
| % Vacant_CT | 0.016 (*) | 0.010 (*) | 0.013 (+) |
| Personal Crime Index_CBG | −0.021 (*) | −0.004 | −0.010 |
| Property Crime Index_CT | | −0.006 | −0.010 |
| Park Density_CT | 0.011 (+) | | 0.008 (+) |
| Exercise Equipment_CBG | −0.040 (*) | −0.038 (*) | −0.035 (*) |
Notes: Values marked with (*) have a p-value < 0.05, and values marked with (+) have an associated p-value < 0.1 (when covariates selected by the SS stagewise and SS LARS/lasso algorithms were plugged into OLS regression models).
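The scale suffixes on the selected covariates can be tallied to cross-check the counts reported in the table of covariates selected per scale. The sketch below is a small consistency check, not part of the authors' analysis; the covariate lists are transcribed from the per-algorithm coefficient tables in this record, and the names `AREA_SELECTED` and `tally` are illustrative.

```python
from collections import Counter

# Visit Age, Black, and Distance to Medical Center are selected by all three algorithms.
INDIVIDUAL_SELECTED = 3

# Area-level covariates per algorithm; the suffix is the selected spatial scale.
AREA_SELECTED = {
    "SS Stepwise": [
        "Population Density_CT", "% Hispanic White_CT", "% Hispanic Black_CT",
        "Median Household Income_CT", "% Renter_CT", "% Vacant_CT",
        "Personal Crime Index_CBG", "Park Density_CT", "Exercise Equipment_CBG",
    ],
    "SS Stagewise": [
        "Population Density_CT", "% Hispanic White_CBG", "% Hispanic Black_CT",
        "Median Household Income_CT", "% Vacant_CT", "Personal Crime Index_CBG",
        "Property Crime Index_CT", "Exercise Equipment_CBG",
    ],
    "SS LARS/Lasso": [
        "Population Density_CT", "% Hispanic White_CBG", "% Hispanic Black_CT",
        "Median Household Income_CT", "% Renter_CT", "% Vacant_CT",
        "Personal Crime Index_CBG", "Property Crime Index_CT",
        "Park Density_CT", "Exercise Equipment_CBG",
    ],
}


def tally(labels):
    """Count selected covariates per scale and add the individual-level ones."""
    counts = Counter(label.rsplit("_", 1)[1] for label in labels)
    return {
        "CBG": counts["CBG"],
        "CT": counts["CT"],
        "total": INDIVIDUAL_SELECTED + len(labels),
    }


for algorithm, labels in AREA_SELECTED.items():
    print(algorithm, tally(labels))
```

The tallies agree with the reported counts: 2 CBG and 7 CT covariates (12 in total) for SS stepwise, 3 and 5 (11) for SS stagewise, and 3 and 7 (13) for SS LARS/lasso.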
OLS-based Akaike’s information criterion comparisons across spatial scale (SS) forward stepwise, forward stagewise, and LARS/lasso models.
| Model | SS Stepwise | SS Stagewise | SS LARS/Lasso |
|---|---|---|---|
| Model 1: CBG | 77,666 | 77,673 | 77,668 |
| Model 2: CT | 77,658 | 77,668 | 77,657 |
| Model 3: CBG and CT | 77,648 | 77,657 | 77,647 |
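The goodness-of-fit claim in the abstract rests on these AIC differences, which are easier to read as gains relative to the mixed-scale model. The sketch below simply subtracts the Model 3 AIC from each single-scale AIC, using the values in the table; the dictionary layout and the name `aic_gain` are illustrative choices.

```python
# OLS-based AIC values from the table; "SS" is Model 3 (CBG and CT mixed).
AIC = {
    "SS Stepwise": {"CBG": 77666, "CT": 77658, "SS": 77648},
    "SS Stagewise": {"CBG": 77673, "CT": 77668, "SS": 77657},
    "SS LARS/Lasso": {"CBG": 77668, "CT": 77657, "SS": 77647},
}


def aic_gain(row):
    """AIC of each single-scale model minus the mixed-scale model (positive = SS fits better)."""
    return {scale: row[scale] - row["SS"] for scale in ("CBG", "CT")}


for algorithm, row in AIC.items():
    print(algorithm, aic_gain(row))
```

For every algorithm the mixed-scale model has the lowest AIC, by 16 to 21 points against the all-CBG model and by 10 to 11 points against the all-CT model.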
OLS coefficient estimates for three models based on covariates that were selected by the spatial scale (SS) forward stepwise model of BMI z-score: (1) when constraining all selected area-level covariates to enter at the CBG level; (2) when constraining all selected area-level covariates to enter at the CT level; and (3) when allowing all selected area-level covariates to enter at the model-selected spatial scale. The horizontal dashed line separates the individual-level variables and the area-level variables.
| Explanatory Variable | CBG | CT | SS |
|---|---|---|---|
| Visit Age | 0.080 | 0.080 | 0.080 |
| Black | 0.045 | 0.044 | 0.043 |
| Distance to Medical Center | 0.017 | 0.014 | 0.013 |
| Population Density | −0.033 | −0.033 | −0.033 t |
| % Hispanic White | 0.035 | 0.035 | 0.034 t |
| % Hispanic Black | 0.020 | 0.021 | 0.022 t |
| Median Household Income | −0.043 | −0.063 | −0.058 t |
| % Renter | −0.028 | −0.035 | −0.035 t |
| % Vacant | 0.017 | 0.015 | 0.016 t |
| Personal Crime Index | −0.020 | −0.017 | −0.021 b |
| Park Density | 0.007 | 0.011 | 0.011 t |
| Exercise Equipment | −0.047 | −0.032 | −0.040 b |
Notes: Values marked with b denote area-level covariates selected at the CBG level, and values marked with t denote area-level covariates selected at the CT level.
OLS coefficient estimates for three models based on covariates that were selected by the spatial scale (SS) forward stagewise model of BMI z-score: (1) when constraining all selected area-level covariates to enter at the CBG level; (2) when constraining all selected area-level covariates to enter at the CT level; and (3) when allowing all selected area-level covariates to enter at the model-selected spatial scale. The horizontal dashed line separates the individual-level variables and the area-level variables.
| Variable | CBG | CT | SS |
|---|---|---|---|
| Visit Age | 0.081 | 0.081 | 0.081 |
| Black | 0.044 | 0.042 | 0.042 |
| Distance to Medical Center | 0.017 | 0.014 | 0.014 |
| Population Density | −0.043 | −0.048 | −0.047 t |
| % Hispanic White | 0.035 | 0.032 | 0.031 b |
| % Hispanic Black | 0.014 | 0.016 | 0.019 t |
| Median Household Income | −0.028 | −0.046 | −0.044 t |
| % Vacant | 0.018 | 0.017 | 0.017 t |
| Personal Crime Index | −0.019 | −0.010 | −0.014 b |
| Property Crime Index | −0.001 | −0.011 | −0.009 t |
| Exercise Equipment | −0.050 | −0.032 | −0.036 b |
Notes: Values marked with b denote area-level covariates selected at the CBG level, and values marked with t denote area-level covariates selected at the CT level.
OLS coefficient estimates for three models based on covariates that were selected by the spatial scale (SS) LARS/lasso model of BMI z-score: (1) when constraining all selected area-level covariates to enter at the CBG level; (2) when constraining all selected area-level covariates to enter at the CT level; and (3) when allowing all selected area-level covariates to enter at the model-selected spatial scale. The horizontal dashed line separates the individual-level variables and the area-level variables.
| Explanatory Variable | CBG | CT | SS |
|---|---|---|---|
| Visit Age | 0.080 | 0.080 | 0.081 |
| Black | 0.045 | 0.044 | 0.043 |
| Distance to Medical Center | 0.017 | 0.014 | 0.014 |
| Population Density | −0.033 | −0.033 | −0.033 t |
| % Hispanic White | 0.035 | 0.034 | 0.033 b |
| % Hispanic Black | 0.020 | 0.022 | 0.025 t |
| Median Household Income | −0.044 | −0.065 | −0.061 t |
| % Renter | −0.028 | −0.037 | −0.035 t |
| % Vacant | 0.017 | 0.014 | 0.015 t |
| Personal Crime Index | −0.018 | −0.007 | −0.013 b |
| Property Crime Index | −0.002 | −0.014 | −0.012 t |
| Park Density | 0.007 | 0.011 | 0.012 t |
| Exercise Equipment | −0.047 | −0.029 | −0.034 b |
Notes: Values marked with b denote area-level covariates selected at the CBG level, and values marked with t denote area-level covariates selected at the CT level.
Standardized coefficient estimates from the spatial scale (SS) forward stepwise model of BMI z-scores with no random effect (RE) in comparison to standardized coefficient estimates from three mixed models with a RE at the census block group (CBG), the census tract (CT), and the CBG and CT. The mixed models were fit using the covariates that were chosen by the SS stepwise algorithm. The horizontal dashed line separates the individual-level variables and the neighborhood-level variables considered at multiple spatial scales.
| Explanatory Variable | No RE | RE at CBG | RE at CT | RE at CBG and CT |
|---|---|---|---|---|
| Visit Age | 0.080 (*) | 0.081 (*) | 0.081 (*) | 0.081 (*) |
| Black | 0.043 (*) | 0.042 (*) | 0.042 (*) | 0.042 (*) |
| Distance to Medical Center | 0.013 (+) | 0.014 (+) | 0.013 (+) | 0.014 (+) |
| Population Density_CT | −0.033 (*) | −0.028 (*) | −0.028 (*) | −0.027 (*) |
| % Hispanic White_CT | 0.034 (*) | 0.032 (*) | 0.032 (*) | 0.032 (*) |
| % Hispanic Black_CT | 0.022 (*) | 0.016 (+) | 0.019 (*) | 0.017 (+) |
| Median Household Income_CT | −0.058 (*) | −0.052 (*) | −0.053 (*) | −0.052 (*) |
| % Renter_CT | −0.035 (*) | −0.031 (*) | −0.030 (*) | −0.030 (*) |
| % Vacant_CT | 0.016 (*) | 0.016 (+) | 0.018 (*) | 0.017 (*) |
| Personal Crime Index_CBG | −0.021 (*) | −0.021 (*) | −0.024 (*) | −0.022 (*) |
| Park Density_CT | 0.011 (+) | 0.010 | 0.010 | 0.009 |
| Exercise Equipment_CBG | −0.040 (*) | −0.046 (*) | −0.043 (*) | −0.046 (*) |
Notes: Values marked with (*) have a p-value < 0.05, and values marked with (+) have an associated p-value < 0.1.
Standardized coefficient estimates from the spatial scale (SS) forward stagewise model of BMI z-scores with no random effect (RE) in comparison to standardized coefficient estimates from three mixed models with a RE at the census block group (CBG), the census tract (CT), and the CBG and CT. The mixed models were fit using the covariates that were chosen by the SS stagewise algorithm. The horizontal dashed line separates the individual-level variables and the neighborhood-level variables considered at multiple spatial scales.
| Explanatory Variable | No RE | RE at CBG | RE at CT | RE at CBG and CT |
|---|---|---|---|---|
| Visit Age | 0.071 (*) | 0.081 (*) | 0.081 (*) | 0.081 (*) |
| Black | 0.030 (*) | 0.041 (*) | 0.041 (*) | 0.041 (*) |
| Distance to Medical Center | 0.011 (*) | 0.015 (+) | 0.014 (+) | 0.014 (+) |
| Population Density_CT | −0.035 (*) | −0.039 (*) | −0.038 (*) | −0.037 (*) |
| % Hispanic White_CBG | 0.023 (*) | 0.029 (*) | 0.030 (*) | 0.029 (*) |
| % Hispanic Black_CT | 0.011 (*) | 0.013 | 0.016 (+) | 0.013 |
| Median Household Income_CT | −0.035 (*) | −0.040 (*) | −0.042 (*) | −0.040 (*) |
| % Vacant_CT | 0.010 (*) | 0.017 (*) | 0.019 (*) | 0.018 (*) |
| Personal Crime Index_CBG | −0.004 | −0.015 | −0.018 | −0.016 |
| Property Crime Index_CT | −0.006 | −0.010 | −0.009 | −0.009 |
| Exercise Equipment_CBG | −0.038 (*) | −0.044 (*) | −0.040 (*) | −0.044 (*) |
Notes: Values marked with (*) have a p-value < 0.05, and values marked with (+) have an associated p-value < 0.1. For the SS stagewise model with no RE, covariates selected by the SS stagewise algorithm were plugged into an OLS regression model to obtain estimated p-values.
Standardized coefficient estimates from the spatial scale (SS) LARS/lasso model of BMI z-scores with no random effect (RE) in comparison to standardized coefficient estimates from three mixed models with a RE at the census block group (CBG), the census tract (CT), and the CBG and CT. The mixed models were fit using the covariates that were chosen by the SS LARS/lasso algorithm. The horizontal dashed line separates the individual-level variables and the neighborhood-level variables considered at multiple spatial scales.
| Explanatory Variable | No RE | RE at CBG | RE at CT | RE at CBG and CT |
|---|---|---|---|---|
| Visit Age | 0.077 (*) | 0.081 (*) | 0.081 (*) | 0.081 (*) |
| Black | 0.039 (*) | 0.042 (*) | 0.042 (*) | 0.042 (*) |
| Distance to Medical Center | 0.013 (+) | 0.014 (+) | 0.014 (+) | 0.014 (+) |
| Population Density_CT | −0.034 (*) | −0.028 (*) | −0.028 (*) | −0.027 (*) |
| % Hispanic White_CBG | 0.029 (*) | 0.031 (*) | 0.031 (*) | 0.031 (*) |
| % Hispanic Black_CT | 0.020 (*) | 0.020 (*) | 0.022 (*) | 0.020 (*) |
| Median Household Income_CT | −0.052 (*) | −0.055 (*) | −0.056 (*) | −0.055 (*) |
| % Renter_CT | −0.023 (*) | −0.031 (*) | −0.030 (*) | −0.030 (*) |
| % Vacant_CT | 0.013 (+) | 0.015 (+) | 0.016 (*) | 0.015 (+) |
| Personal Crime Index_CBG | −0.010 | −0.013 | −0.016 | −0.014 |
| Property Crime Index_CT | −0.010 | −0.012 | −0.011 | −0.012 |
| Park Density_CT | 0.008 (+) | 0.010 | 0.010 | 0.010 |
| Exercise Equipment_CBG | −0.035 (*) | −0.041 (*) | −0.038 (*) | −0.041 (*) |
Notes: Values marked with (*) have a p-value < 0.05, and values marked with (+) have an associated p-value < 0.1. For the SS LARS/lasso model with no RE, covariates selected by the SS LARS/lasso algorithm were plugged into an OLS regression model to obtain estimated p-values.
AIC comparisons among spatial scale (SS) forward stepwise, forward stagewise, and LARS/lasso models with varying random effects (RE): no RE, RE at CBG, RE at CT, and RE at CBG and CT.
| Random Effect Structure | SS Stepwise | SS Stagewise | SS LARS/Lasso |
|---|---|---|---|
| No RE | 77,648 | 77,657 | 77,647 |
| RE at CBG | 77,636 | 77,640 | 77,635 |
| RE at CT | 77,640 | 77,643 | 77,639 |
| RE at CBG and CT | 77,637 | 77,640 | 77,638 |
Standardized coefficient estimates when the covariate male, select interaction terms, and covariates selected by the spatial scale (SS) forward stepwise, forward stagewise, and LARS/lasso algorithms were plugged into OLS regression models of BMI z-scores. The blank cells indicate variables not selected for a particular model. The horizontal dashed line separates the individual-level variables and the neighborhood-level variables considered at multiple spatial scales.
| Explanatory Variable | SS Stepwise | SS Stagewise | SS LARS/Lasso |
|---|---|---|---|
| Intercept | 0.584 (*) | 0.587 (*) | 0.584 (*) |
| Visit Age | 0.161 (*) | 0.163 (*) | 0.161 (*) |
| Male | −0.006 | −0.006 | −0.006 |
| Black | 0.084 (*) | 0.085 (*) | 0.085 (*) |
| Distance to Medical Center | 0.023 | 0.023 | 0.023 |
| Population Density_CT | −0.038 (*) | −0.052 (*) | −0.038 (*) |
| % Hispanic White_CBG | | 0.038 (*) | 0.041 (*) |
| % Hispanic White_CT | 0.040 (*) | | |
| % Hispanic Black_CT | 0.039 (*) | 0.035 (*) | 0.042 (*) |
| Median Household Income_CT | −0.070 (*) | −0.053 (+) | −0.075 (*) |
| % Renter_CT | −0.034 | | −0.035 |
| % Vacant_CT | 0.043 (*) | 0.043 (*) | 0.041 (*) |
| Personal Crime Index_CBG | −0.039 (*) | −0.017 | −0.024 |
| Property Crime Index_CT | | −0.021 | −0.021 |
| Park Density_CT | 0.010 | | 0.011 |
| Exercise Equipment_CBG | −0.078 (*) | −0.071 (*) | −0.071 (*) |
| Male:MHI_CT | 0.097 (*) | 0.097 (*) | 0.097 (*) |
| Black:Population Density_CT | −0.073 (*) | −0.062 (*) | −0.073 (*) |
| Black:Park Density_CT | 0.079 (*) | | 0.078 (*) |
| Black:Exercise Equipment_CBG | 0.167 (*) | 0.185 (*) | 0.167 (*) |
Notes: Values marked with (*) have a p-value < 0.05, and values marked with (+) have an associated p-value < 0.1.
AIC comparisons among SS forward stepwise-, SS forward stagewise-, and SS LARS/lasso-based models with interactions and varying random effects (RE): no RE, RE at CBG, RE at CT, and RE at CBG and CT.
| Random Effect Structure | SS Stepwise | SS Stagewise | SS LARS/Lasso |
|---|---|---|---|
| No RE | 77,128 | 77,137 | 77,127 |
| RE at CBG | 77,123 | 77,131 | 77,123 |
| RE at CT | 77,126 | 77,133 | 77,125 |
| RE at CBG and CT | 77,125 | 77,132 | 77,124 |
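The two random-effect AIC tables can be read together: which RE structure gives the lowest AIC for each algorithm, and how much the AIC falls once the interaction terms are added. The sketch below tabulates both from the reported values. It assumes the models with and without interactions were fit to the same observations, which this record does not state, and the names `NO_INTERACTIONS`, `WITH_INTERACTIONS`, and `best_structures` are illustrative.

```python
# AIC values from the two random-effect comparison tables.
NO_INTERACTIONS = {
    "SS Stepwise": {"No RE": 77648, "RE at CBG": 77636, "RE at CT": 77640, "RE at CBG and CT": 77637},
    "SS Stagewise": {"No RE": 77657, "RE at CBG": 77640, "RE at CT": 77643, "RE at CBG and CT": 77640},
    "SS LARS/Lasso": {"No RE": 77647, "RE at CBG": 77635, "RE at CT": 77639, "RE at CBG and CT": 77638},
}
WITH_INTERACTIONS = {
    "SS Stepwise": {"No RE": 77128, "RE at CBG": 77123, "RE at CT": 77126, "RE at CBG and CT": 77125},
    "SS Stagewise": {"No RE": 77137, "RE at CBG": 77131, "RE at CT": 77133, "RE at CBG and CT": 77132},
    "SS LARS/Lasso": {"No RE": 77127, "RE at CBG": 77123, "RE at CT": 77125, "RE at CBG and CT": 77124},
}


def best_structures(row):
    """Return every RE structure that attains the minimum AIC (ties are kept)."""
    lowest = min(row.values())
    return [name for name, value in row.items() if value == lowest]


for algorithm in NO_INTERACTIONS:
    drop = NO_INTERACTIONS[algorithm]["No RE"] - WITH_INTERACTIONS[algorithm]["No RE"]
    print(
        algorithm,
        "| best RE without interactions:", best_structures(NO_INTERACTIONS[algorithm]),
        "| best RE with interactions:", best_structures(WITH_INTERACTIONS[algorithm]),
        "| AIC drop from interactions (no RE):", drop,
    )
```

In these tables a CBG-level random effect is always among the lowest-AIC structures, and adding the interaction terms lowers the no-RE AIC by 520 for each of the three algorithms.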