Lauren P Grant, Chris Gennings, David C Wheeler.
Abstract
Environmental factors and socioeconomic status variables used in regression models to explain environmental chemical exposures or health outcomes are, in practice, often all modeled at the same buffer distance or spatial scale. In this paper, we present four model selection algorithms that select the best spatial scale for each buffer-based or area-level covariate. Contamination of drinking water by nitrate is a growing problem in agricultural areas of the United States, as ingested nitrate can lead to the endogenous formation of N-nitroso compounds, which are potent carcinogens. We applied our methods to model nitrate levels in private wells in Iowa. We found that environmental variables were selected at different spatial scales and that a model allowing spatial scale to vary across covariates provided the best goodness of fit. Our methods can be applied to investigate the association between environmental risk factors available at multiple spatial scales or buffer distances and measures of disease, including cancers.
Keywords: cancer risk factors; environment; model selection; nitrate; spatial scale
Year: 2015 PMID: 25983543 PMCID: PMC4413908 DOI: 10.4137/CIN.S17302
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
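The idea in the abstract, letting each area-level covariate enter the model at its own spatial scale, can be illustrated with a minimal forward stepwise sketch. This is not the authors' implementation: the function names (`ols_rss`, `aic`, `ss_forward_stepwise`), the AIC-based stopping rule, the restriction to one scale per covariate, and the synthetic data are all assumptions made for illustration.

```python
import math
import random

def ols_rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Forward elimination with partial pivoting.
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for c in range(k, p + 1):
                A[r][c] -= f * A[k][c]
    # Back substitution.
    b = [0.0] * p
    for k in reversed(range(p)):
        b[k] = (A[k][p] - sum(A[k][c] * b[c] for c in range(k + 1, p))) / A[k][k]
    return sum((y[i] - sum(X[i][j] * b[j] for j in range(p))) ** 2 for i in range(n))

def aic(rss, n, n_params):
    """Gaussian AIC up to an additive constant."""
    return n * math.log(rss / n) + 2 * n_params

def ss_forward_stepwise(candidates, y):
    """candidates: {name: {scale: column}}. Returns [(name, scale), ...].

    Assumption: a covariate enters at most once, at the scale that lowers
    AIC the most; selection stops when no candidate lowers AIC.
    """
    n = len(y)
    selected, cols = [], []
    best = aic(ols_rss([], y), n, 1)  # intercept-only model
    while True:
        trial = None
        for name, scales in candidates.items():
            if any(name == chosen for chosen, _ in selected):
                continue
            for scale, col in scales.items():
                score = aic(ols_rss(cols + [col], y), n, len(cols) + 2)
                if score < best and (trial is None or score < trial[0]):
                    trial = (score, name, scale, col)
        if trial is None:
            return selected
        best = trial[0]
        selected.append((trial[1], trial[2]))
        cols.append(trial[3])

# Synthetic data (hypothetical): the outcome depends on Sand at 1 km
# and on Slope at 500 m; the other scale of each is a noisy copy.
random.seed(0)
n = 200
sand_1km = [random.gauss(0, 1) for _ in range(n)]
sand_500m = [v + random.gauss(0, 1) for v in sand_1km]
slope_500m = [random.gauss(0, 1) for _ in range(n)]
slope_1km = [v + random.gauss(0, 1) for v in slope_500m]
y = [2.0 * a - 1.5 * b + random.gauss(0, 0.5) for a, b in zip(sand_1km, slope_500m)]

candidates = {
    "Sand": {"500m": sand_500m, "1km": sand_1km},
    "Slope": {"500m": slope_500m, "1km": slope_1km},
}
selected = ss_forward_stepwise(candidates, y)
print(selected)
```

Because every scale of every remaining covariate competes at each step, the scale is chosen jointly with the variable rather than fixed in advance.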
Variable definitions for the variables considered in the spatial scale forward stepwise, forward stagewise, LARS, and lasso models. The horizontal dashed line separates the individual-level variables and the area-based variables available at more than one buffer distance. Any variable that falls below the dashed line has a suffix indicating the associated spatial scale.
| VARIABLE NO. | NAME | DESCRIPTION |
|---|---|---|
| 1 | Latitude | Latitude value of well location (degrees) |
| 2 | Longitude | Longitude value of well location (degrees) |
| 3 | SampleYr | Well sample year |
| 4 | Well_Depth | Depth of measurement well (ft) |
| 5 | Elevation | Land-surface elevation at well point (ft) |
| 6 | Bdrk_Dpth | Depth (ft) to bedrock at well point |
| 7 | Bdrk_Flag | Flag indicating if well is within or above bedrock. 0 = Above bedrock; 1 = Within bedrock |
| 8 | NearAFO_Dist | Distance to nearest AFO (Animal Feeding Operation) facility (m) |
| 9 | NearAFO_Type_1 | Type of nearest AFO facility: Open Feedlot |
| 10 | NearAFO_Type_2 | Type of nearest AFO facility: Confined/Open (ie, mixed) |
| 11 | NearAFO_AnimalUnits | Total Animal Units at the nearest AFO facility |
| 12 | Count_10 kmConfmnts | Number of confinement-only AFOs within 10 km of the well point |
| 13 | Count_10 kmFeedlots | Number of feedlot-only AFOs within 10 km of the well point |
| 14 | Count_10 kmMixed | Number of mixed-only AFOs within 10 km of the well point |
| 15 | Count_10 kmHogs | Number of hog facilities within 10 km of the well point |
| 16 | precip | Estimated mean annual precipitation at well point for the time period 1981–2010 (millimeters times 100) |
| 17 | mintemp | Estimated mean annual minimum temperature at well point for the time period 1981–2010 (°C times 100) |
| 18 | maxtemp | Estimated mean annual maximum temperature at well point for the time period 1981–2010 (°C times 100) |
| 19 | SinkholeDist_m | Distance from well point to nearest sinkhole point (m) |
| 20 | K | Average horizontal hydraulic conductivity of all glacial deposits at well point (ft/day) |
| 21 | AvgK | Average horizontal hydraulic conductivity of all glacial deposits within a 4 × 4-mile square around the well point (ft/day) |
| 22 | Kz | Average vertical hydraulic conductivity of all glacial deposits at the well point (ft/day) |
| 23 | AvgKz | Average vertical hydraulic conductivity of all glacial deposits within a 4 × 4-mile square around the well point (ft/day) |
| 24 | Trans | Transmissivity of all glacial deposits at the well point (ft²/day) |
| 25 | AvgTrans | Average transmissivity of all glacial deposits within a 4 × 4-mile square around the well point (ft²/day) |
| 26 | MaxKz | Maximum kz within the 4 × 4-mile square around the well point (ft/day) |
| 27 | KKzT_Logs | Number of USGS water well logs within a 4 × 4-mile square around the well point (count) |
| 28–29 | Sand | Average percent sand within a 500-m/1-km buffer |
| 30–31 | Silt | Average percent silt within a 500-m/1-km buffer |
| 32–33 | Clay | Average percent clay within a 500-m/1-km buffer |
| 34–35 | OM | Average percent organic matter within a 500-m/1-km buffer |
| 36–37 | Db033 | Average bulk density at 1/3 bar within a 500-m/1-km buffer (g/cm³) |
| 38–39 | Dbovendry | Average oven dry bulk density at 1/3 bar within a 500-m/1-km buffer (g/cm³) |
| 40–41 | Ksat | Average saturated hydraulic conductivity within a 500-m/1-km buffer (μm/s) |
| 42–43 | AWC | Average available water capacity within a 500-m/1-km buffer (cm H2O/cm soil) |
| 44–45 | H2O15 | Average water content at 15 bar within a 500-m/1-km buffer (percent by weight) |
| 46–47 | AASHTOGr | Average AASHTO group classification within a 500-m/1-km buffer |
| 48–49 | Kw | Average K factor for whole soil within a 500-m/1-km buffer |
| 50–51 | Kf | Average K factor for rock free soil within a 500-m/1-km buffer |
| 52–53 | CaCO3 | Average calcium carbonate within a 500-m/1-km buffer (percent by weight) |
| 54–55 | CEC7 | Average cation-exchange capacity within a 500-m/1-km buffer (milliequivalents per 100 g) |
| 56–57 | pHH2O | Average pH (1 to 1 water) within a 500-m/1-km buffer |
| 58–59 | Slope | Average percent slope within a 500-m/1-km buffer |
| 60–61 | SlopeLength | Average slope length within a 500-m/1-km buffer (ft) |
| 62–63 | Runoff | Average runoff potential within a 500-m/1-km buffer (Scale: 1–6; negligible to very high) |
| 64–65 | T | Average soil loss tolerance within a 500-m/1-km buffer (tons/acre/year) |
| 66–67 | WEI | Average wind erodibility index within a 500-m/1-km buffer |
| 68–69 | Aspect | Average aspect (direction the surface of the soil faces) within a 500-m/1-km buffer (degrees) |
| 70–71 | MAP | Average mean annual precipitation within a 500-m/1-km buffer (mm) |
| 72–73 | FrostFDays | Average number of frost free days per year within a 500-m/1-km buffer |
| 74–75 | FrostAction | Average degree of frost action within a 500-m/1-km buffer (Scale: 0–3; none to high) |
| 76–77 | CorrosionCon | Average risk of concrete corrosion within a 500-m/1-km buffer (Scale: 1–3; low to high) |
| 78–79 | CorrosionSt | Average risk of steel corrosion within a 500-m/1-km buffer (Scale: 1–3; low to high) |
| 80–81 | IACSR | Average Iowa corn suitability rating within a 500-m/1-km buffer (Scale: 0–100) |
| 82–83 | WaterDepth | Average depth to water within a 500-m/1-km buffer (cm) |
| 84–85 | FloodingFreq | Average flooding frequency within a 500-m/1-km buffer (Scale: 0–4, none to very frequent) |
| 86–87 | PondingFreq | Average ponding frequency within a 500-m/1-km buffer (%) |
| 88–89 | DrainClass | Average drainage classification within a 500-m/1-km buffer (Scale: 1–7, very poorly drained to excessively drained) |
| 90–91 | FarmClass | Percent “not prime farmland” within a 500-m/1-km buffer |
| 92–93 | HELWater | Percent “not highly water erodable land” within a 500-m/1-km buffer |
| 94–95 | HELWind | Percent “not highly wind erodable land” within a 500-m/1-km buffer |
| 96–97 | Basements | Percent “very limited and somewhat limited” basement limitations within a 500-m/1-km buffer |
| 98–99 | SewageLag | Percent “very limited and somewhat limited” sewage lagoon limitations within a 500-m/1-km buffer |
| 100–101 | Trails | Percent “very limited and somewhat limited” path and trail limitations within a 500-m/1-km buffer |
| 102–103 | HydricClas | Percent “all hydric and partially hydric” hydric classifications within a 500-m/1-km buffer |
| 104–105 | TileDrn_USGS | Mean “estimated percent tile drainage on agricultural lands” within a 500-m/1-km buffer |
| 106–107 | TileDrn_IADNR | Mean “estimated percent tile drainage” within a 500-m/1-km buffer |
| 108–109 | PopDen90 | Mean population density within a 500-m/1-km buffer derived from U.S. Census 1990 (persons per km²) |
| 110–111 | PopDen00 | Mean population density within a 500-m/1-km buffer derived from U.S. Census 2000 (persons per km²) |
| 112–113 | Recharge | Estimated mean annual natural ground-water recharge within a 500-m/1-km buffer (millimeters per year) |
| 114–115 | FnGrn_Logs | Number of well logs within a 4 × 4-mile/6 × 6-mile square around the well point used to generate an interpolated total fine-grain thickness grid |
Figure 1. Coefficient paths for spatial scale forward stepwise regression to explain log nitrate concentration in drinking wells in Iowa. The legend indicates the spatial scale at which each variable entered the model.
Figure 2. Coefficient paths for spatial scale incremental forward stagewise regression to explain log nitrate concentration in drinking wells in Iowa. The legend indicates the spatial scale at which each variable entered the model.
Figure 3. Coefficient paths for spatial scale LARS to explain log nitrate concentration in drinking wells in Iowa. The legend indicates the spatial scale at which each variable entered the model. The dotted vertical line indicates the chosen model, which had the minimum OLS-based AIC.
Figure 4. Coefficient paths for spatial scale lasso to explain log nitrate concentration in drinking wells in Iowa. The legend indicates the spatial scale at which each variable entered the model. The dotted vertical line indicates the chosen model, which had the minimum OLS-based AIC.
Estimated coefficients from spatial scale (SS) forward stepwise, forward stagewise, LARS, and lasso models. The blank cells indicate variables not selected for a particular model. The horizontal dashed line separates the individual-level variables and the area-based variables considered at multiple spatial scales.
| VARIABLE NO. | EXPLANATORY VARIABLE | SS-STEPWISE | SS-STAGEWISE | SS-LARS | SS-LASSO |
|---|---|---|---|---|---|
| 1 | Latitude | −0.069 (*) | −0.015 (*) | −0.027 (*) | −0.026 (*) |
| 4 | Well_Depth | −0.243 (*) | −0.242 (*) | −0.240 (*) | −0.241 (*) |
| 5 | Elevation | 0.107 (*) | 0.055 (*) | 0.075 (*) | 0.074 (*) |
| 6 | Bdrk_Dpth | −0.129 (*) | −0.093 (*) | −0.109 (*) | −0.108 (*) |
| 7 | Bdrk_Flag | −0.080 (*) | −0.065 (*) | −0.072 (*) | −0.072 (*) |
| 11 | NearAFO_AnimalUnits | 0.003 | 0.006 | 0.006 | |
| 12 | Count_10 kmConfmnts | 0.001 | |||
| 13 | Count_10 kmFeedlots | 0.013 | 0.008 | 0.007 | |
| 14 | Count_10 kmMixed | 0.026 (*) | 0.020 (*) | 0.024 (*) | 0.024 (*) |
| 15 | Count_10 kmHogs | 0.026 (*) | 0.006 (*) | 0.014 (*) | 0.016 (*) |
| 19 | SinkholeDist_m | 0.244 (*) | 0.214 (*) | 0.260 (*) | 0.259 (*) |
| 21 | AvgK | −0.091 (*) | −0.021 (*) | −0.056 (*) | −0.056 (*) |
| 22 | Kz | 0.067 (*) | 0.008 | 0.032 | 0.032 |
| 23 | AvgKz | 0.017 (+) | 0.016 (*) | 0.015 (*) | |
| 25 | AvgTrans | 0.084 (*) | 0.055 (*) | 0.069 (*) | 0.070 (*) |
| 31 | Silt_1 km | 0.050 (*) | 0.038 (+) | ||
| 32 | Clay_500 m | 0.013 (+) | 0.012 (+) | ||
| 35 | OM_1 km | −0.079 (*) | −0.023 (+) | −0.057 (*) | −0.047 (*) |
| 39 | Dbovendry_1 km | 0.024 (*) | 0.025 (*) | 0.028 (*) | |
| 41 | Ksat_1 km | 0.020 (*) | 0.046 (*) | 0.041 (*) | |
| 42 | AWC_500 m | 0.019 (*) | 0.018 (*) | ||
| 46 | AASHTOGr_500 m | 0.032 | 0.020 | 0.020 | |
| 47 | AASHTOGr_1 km | 0.012 | |||
| 49 | Kw_1 km | −0.004 | |||
| 52 | CaCO3_500 m | −0.141 (*) | |||
| 53 | CaCO3_1 km | −0.099 (*) | −0.108 (*) | −0.110 (*) | |
| 55 | CEC7_1 km | −0.017 (*) | −0.047 (*) | −0.053 (*) | |
| 56 | pHH2O_500 m | −0.002 | −0.030 (*) | −0.026 (*) | |
| 58 | Slope_500 m | −0.019 (*) | −0.007 (+) | −0.009 (+) | |
| 60 | SlopeLength_500 m | 0.009 | −0.004 | ||
| 63 | Runoff_1 km | 0.021 | 0.011 | ||
| 65 | T_1 km | −0.126 (*) | −0.100 (*) | −0.112 (*) | −0.110 (*) |
| 66 | WEI_500 m | 0.100 (*) | 0.042 (*) | 0.076 (*) | 0.077 (*) |
| 71 | MAP_1 km | −0.009 | −0.009 (+) | ||
| 74 | FrostAction_500 m | 0.052 (*) | 0.027 (*) | 0.020 | 0.025 |
| 76 | CorrosionCon_500 m | 0.066 (*) | |||
| 77 | CorrosionCon_1 km | 0.050 (*) | 0.064 (*) | 0.058 (*) | |
| 78 | CorrosionSt_500 m | 0.007 | |||
| 81 | IACSR_1 km | −0.003 | |||
| 82 | WaterDepth_500 m | 0.005 | −0.003 | ||
| 83 | WaterDepth_1 km | −0.013 | |||
| 85 | FloodingFreq_1 km | 0.013 | |||
| 86 | PondingFreq_500 m | 0.029 (*) | 0.005 (+) | 0.011 | 0.009 |
| 89 | DrainClass_1 km | 0.078 (*) | 0.098 (*) | 0.066 (+) | 0.041 |
| 90 | FarmClass_500 m | −0.073 (*) | −0.048 (*) | −0.065 (*) | −0.061 (*) |
| 94 | HELWind_500 m | −0.001 | −0.010 | −0.009 (+) | |
| 97 | Basements_1 km | 0.003 | |||
| 99 | SewageLag_1 km | 0.012 | 0.007 (+) | 0.005 | |
| 100 | Trails_500 m | −0.017 | |||
| 101 | Trails_1 km | −0.024 | |||
| 107 | TileDrn_IADNR_1 km | −0.059 | 0.049 (+) | ||
| 108 | PopDen90_500 m | 0.033 (*) | 0.029 (*) | ||
| 111 | PopDen00_1 km | −0.026 (*) | −0.020 (*) | −0.051 (*) | −0.047 (*) |
| 112 | Recharge_500 m | 0.219 (*) | 0.158 (*) | 0.186 (*) | 0.189 (*) |
| 115 | FnGrn_Logs_6 mi | −0.055 (*) | −0.043 (*) | −0.047 (*) | −0.046 (*) |
Notes: Values marked with (*) have a P-value <0.05, and values marked with (+) have an associated P-value <0.1 (when covariates selected from the SS-Stagewise, SS-LARS, and SS-Lasso algorithms are plugged into OLS regression models).
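The note above says that significance markers come from plugging the selected covariates into an OLS regression. A minimal sketch of such a post-selection refit follows. It is an illustration under assumptions, not the authors' code: the helper names (`invert`, `ols_refit`, `flag`) and the synthetic data are hypothetical, and p-values use a normal approximation to the t distribution, which is reasonable only for large samples.

```python
import math
import random
from statistics import NormalDist

def invert(M):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    p = len(M)
    A = [row[:] + [1.0 if i == j else 0.0 for j in range(p)]
         for i, row in enumerate(M)]
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        d = A[k][k]
        A[k] = [v / d for v in A[k]]
        for r in range(p):
            if r != k:
                f = A[r][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    return [row[p:] for row in A]

def ols_refit(columns, y):
    """Return (coefficients, two-sided p-values), intercept first."""
    n = len(y)
    X = [[1.0] + [c[i] for c in columns] for i in range(n)]
    p = len(X[0])
    XtX = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
           for r in range(p)]
    Xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(p)]
    inv = invert(XtX)
    beta = [sum(inv[r][c] * Xty[c] for c in range(p)) for r in range(p)]
    rss = sum((y[i] - sum(X[i][j] * beta[j] for j in range(p))) ** 2
              for i in range(n))
    sigma2 = rss / (n - p)  # residual variance estimate
    pvals = []
    for j in range(p):
        z = abs(beta[j]) / math.sqrt(sigma2 * inv[j][j])
        pvals.append(2 * (1 - NormalDist().cdf(z)))  # normal approximation
    return beta, pvals

def flag(p):
    """Marker convention from the table note: (*) for P < 0.05, (+) for P < 0.1."""
    return "(*)" if p < 0.05 else "(+)" if p < 0.1 else ""

# Synthetic example (hypothetical): x1 has a real effect, x2 is pure noise.
random.seed(1)
n = 500
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y = [0.5 * a + random.gauss(0, 1) for a in x1]

beta, pvals = ols_refit([x1, x2], y)
for name, b, p in zip(["Intercept", "x1", "x2"], beta, pvals):
    print(f"{name}: {b:.3f} {flag(p)}")
```

P-values from a refit on covariates chosen using the same data do not account for the selection step, so they are best read as descriptive markers.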
Number of variables selected at each spatial scale for spatial scale (SS) forward stepwise, forward stagewise, LARS, and lasso models. The last row gives the total number of possible variables at each spatial scale.
| | INDIVIDUAL-LEVEL | AREA-LEVEL: 500 m | AREA-LEVEL: 1 km | AREA-LEVEL: 4 mi | AREA-LEVEL: 6 mi | NUMBER OF VARIABLES SELECTED |
|---|---|---|---|---|---|---|
| SS-Stepwise | 11 | 7 | 7 | 0 | 1 | 26 |
| SS-Stagewise | 15 | 11 | 12 | 0 | 1 | 39 |
| SS-LARS | 14 | 14 | 17 | 0 | 1 | 46 |
| SS-Lasso | 14 | 14 | 13 | 0 | 1 | 42 |
| Number of available variables | 27 | 43 (each at 500 m and 1 km) | | 1 (at 4 mi and 6 mi) | | 71 |
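As a quick consistency check on the table above, the per-scale counts for each model should add up to its reported total. The sketch below copies the numbers from the table. Reading the last row as 27 individual-level variables, 43 variables offered at 500 m and 1 km, and 1 variable offered at 4 mi and 6 mi is an interpretation of the flattened layout, not something the source states explicitly.

```python
# Counts per model: [individual-level, 500 m, 1 km, 4 mi, 6 mi], reported total.
selected_counts = {
    "SS-Stepwise": ([11, 7, 7, 0, 1], 26),
    "SS-Stagewise": ([15, 11, 12, 0, 1], 39),
    "SS-LARS": ([14, 14, 17, 0, 1], 46),
    "SS-Lasso": ([14, 14, 13, 0, 1], 42),
}

# True where the per-scale counts sum to the reported total.
row_checks = {
    model: sum(counts) == total
    for model, (counts, total) in selected_counts.items()
}

# Assumed reading of the last row: distinct variables, counted once
# regardless of how many spatial scales they are offered at.
available_distinct = 27 + 43 + 1

print(row_checks)
print(available_distinct)
```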
Number of shared significant variables with the same sign and spatial scale and total number of shared variables for spatial scale (SS) forward stepwise, forward stagewise, LARS, and lasso models. The frequency of shared significant variables with the same sign and spatial scale is given along with the total number of shared variables in parentheses.
| | INDIVIDUAL-LEVEL | AREA-LEVEL: 500 m | AREA-LEVEL: 1 km | AREA-LEVEL: 4 mi | AREA-LEVEL: 6 mi |
|---|---|---|---|---|---|
| SS-Stepwise, SS-Stagewise, SS-LARS, SS-Lasso | 10 | 3 | 2 | 0 | 1 |
| No. of shared variables | (11) | (5) | (4) | (0) | (1) |
| SS-Stepwise, SS-Stagewise, SS-LARS | 10 | 3 | 2 | 0 | 1 |
| No. of shared variables | (11) | (5) | (4) | (0) | (1) |
| SS-Stepwise, SS-Stagewise, SS-Lasso | 10 | 3 | 2 | 0 | 1 |
| No. of shared variables | (11) | (5) | (4) | (0) | (1) |
| SS-Stepwise, SS-LARS, SS-Lasso | 10 | 3 | 3 | 0 | 1 |
| No. of shared variables | (11) | (5) | (5) | (0) | (1) |
| SS-Stagewise, SS-LARS, SS-Lasso | 10 | 3 | 7 | 0 | 1 |
| No. of shared variables | (14) | (9) | (9) | (0) | (1) |
| SS-Stepwise, SS-Stagewise | 10 | 4 | 3 | 0 | 1 |
| No. of shared variables | (11) | (5) | (4) | (0) | (1) |
| SS-Stepwise, SS-LARS | 10 | 3 | 3 | 0 | 1 |
| No. of shared variables | (11) | (5) | (5) | (0) | (1) |
| SS-Stepwise, SS-Lasso | 10 | 3 | 3 | 0 | 1 |
| No. of shared variables | (11) | (5) | (5) | (0) | (1) |
| SS-Stagewise, SS-LARS | 10 | 3 | 7 | 0 | 1 |
| No. of shared variables | (14) | (10) | (10) | (0) | (1) |
| SS-Stagewise, SS-Lasso | 10 | 3 | 7 | 0 | 1 |
| No. of shared variables | (14) | (9) | (9) | (0) | (1) |
| SS-LARS, SS-Lasso | 11 | 6 | 8 | 0 | 1 |
| No. of shared variables | (14) | (13) | (13) | (0) | (1) |
| Significant but with different signs | 0 | 0 | 0 | 0 | 0 |
| Significant but with different SS | – | 2 | 0 | 0 | 0 |
Notes: Variables with a P-value <0.05 are considered significant (when covariates selected from the SS-Stagewise, SS-LARS, and SS-Lasso algorithms are plugged into OLS regression models).
The rows for variables that are significant but differ in sign or spatial scale compare SS-Stepwise with SS-Stagewise, SS-LARS, and SS-Lasso.
OLS-based Akaike information criterion (AIC) comparisons across spatial scale (SS) forward stepwise, forward stagewise, LARS, and lasso models.
| | SS-STEPWISE | SS-STAGEWISE | SS-LARS | SS-LASSO |
|---|---|---|---|---|
| Model 1: Smallest SS available | 28,193.57 | 28,196.73 | 28,183.15 | 28,178.17 |
| Model 2: Largest SS available | 28,144.05 | 28,143.04 | 28,133.79 | 28,131.90 |
| Model 3: Model-selected SS | 28,130.90 | 28,135.15 | 28,100.19 | 28,096.65 |
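Lower AIC indicates better fit after penalizing model size, so the table can be read as AIC improvements of the model-selected scales (Model 3) over the fixed-scale alternatives. The short sketch below computes those differences from the tabulated values; the dictionary layout and key names are illustrative choices only.

```python
# OLS-based AIC values copied from the table above.
aic_table = {
    "SS-Stepwise": {"smallest": 28193.57, "largest": 28144.05, "selected": 28130.90},
    "SS-Stagewise": {"smallest": 28196.73, "largest": 28143.04, "selected": 28135.15},
    "SS-LARS": {"smallest": 28183.15, "largest": 28133.79, "selected": 28100.19},
    "SS-Lasso": {"smallest": 28178.17, "largest": 28131.90, "selected": 28096.65},
}

# AIC reduction achieved by letting the algorithm choose the spatial scale.
gains = {
    name: {
        "vs_smallest": round(m["smallest"] - m["selected"], 2),
        "vs_largest": round(m["largest"] - m["selected"], 2),
    }
    for name, m in aic_table.items()
}

# Algorithm whose model-selected-scale model has the lowest AIC overall.
best_algorithm = min(aic_table, key=lambda k: aic_table[k]["selected"])

for name, g in gains.items():
    print(name, g)
print("Lowest AIC with model-selected scales:", best_algorithm)
```

For every algorithm the ordering is the same: model-selected scales give the lowest AIC, followed by the largest available scale, then the smallest.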