Ian R Waite, Jonathan G Kennen, Jason T May, Larry R Brown, Thomas F Cuffney, Kimberly A Jones, James L Orlando.
Abstract
We developed independent predictive disturbance models for a full regional data set and four individual ecoregions (Full Region vs. Individual Ecoregion models) to evaluate effects of spatial scale on the assessment of human landscape modification, on predicted response of stream biota, and the effect of other possible confounding factors, such as watershed size and elevation, on model performance. We selected macroinvertebrate sampling sites for model development (n = 591) and validation (n = 467) that met strict screening criteria from four proximal ecoregions in the northeastern U.S.: North Central Appalachians, Ridge and Valley, Northeastern Highlands, and Northern Piedmont. Models were developed using boosted regression tree (BRT) techniques for four macroinvertebrate metrics; results were compared among ecoregions and metrics. Comparing within a region but across the four macroinvertebrate metrics, the average richness of tolerant taxa (RichTOL) had the highest R2 for BRT models. Across the four metrics, final BRT models had between four and seven explanatory variables and always included a variable related to urbanization (e.g., population density, percent urban, or percent manmade channels), and either a measure of hydrologic runoff (e.g., minimum April, average December, or maximum monthly runoff) and/or a natural landscape factor (e.g., riparian slope, precipitation, and elevation), or a measure of riparian disturbance. Contrary to our expectations, Full Region models explained nearly as much variance in the macroinvertebrate data as Individual Ecoregion models, and taking into account watershed size or elevation did not appear to improve model performance. As a result, it may be advantageous for bioassessment programs to develop large regional models as a preliminary assessment of overall disturbance conditions as long as the range in natural landscape variability is not excessive.
Year: 2014 PMID: 24675770 PMCID: PMC3968005 DOI: 10.1371/journal.pone.0090944
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Description, variable code and definition of explanatory environmental (landscape, riparian and habitat) and response (invertebrate metrics) variables used for model development.
| Description | Variable Code | Definition |
| Explanatory Variables | ||
| Watershed Variables | ||
| Percent Agricultural Landuse | Ag | Percent watershed area in agricultural landuse (NLCD 2001 categories 81 and 82) |
| Percent Urban Landuse | Urban | Percent watershed area in urban landuse (NLCD 2001 categories 21, 22, 23, and 24) |
| Sum of Percent Ag + Urban | Ag+Urb | Sum of percent watershed area in urban (NLCD 2001 categories 21, 22, 23, and 24) and agricultural (NLCD 81, 82) landuse |
| Percent Forest | Forest | Percent watershed area in forest landuse (NLCD 2001 categories 41, 42, 43) |
| Percent Wetland | Wetland | Percent watershed area in wetland land cover (NLCD 2001 categories 90 and 95) |
| Road Density | Rd.Density | Road density in watershed = Road length (km)/watershed area (km2) |
| Mean Population Density | Pop.Density | Watershed mean population density based on 2000 census (#/km2) |
| Dam Density | Dam Density | Density of dams in watershed = Number of dams/watershed area (km2) |
| Percent Manmade Channel | ManMadeChan | Percentage of linear water features in stream buffer which are manmade |
| Riparian Variables | ||
| Percent Agricultural Landuse | Rip.AG | Percent buffer area in agricultural landuse (NLCD 2001 categories 81 and 82) in riparian buffer |
| Percent Urban Landuse | Rip.Urban | Percent buffer area in urban landuse (NLCD 2001 categories 21, 22, 23, and 24) in riparian buffer |
| Sum of Percent Ag + Urban | Rip.Ag+Urb | Sum of percent buffer area in urban (NLCD 2001 categories 21, 22, 23, and 24) and agricultural (NLCD 81, 82) landuse in riparian buffer |
| Percent Forest | Rip.Forest | Percent buffer area in forest landuse (NLCD 2001 categories 41, 42, 43) in riparian buffer |
| Mean Tree Canopy Cover | Rip.Canopy | Percent canopy cover (NLCD 2001 Percent Tree Canopy dataset; 30 m pixel) in riparian buffer |
| Road Density | Rip.Rd.Dens | Road density in buffer = Road length (km)/riparian buffer area (km2) |
| Mean Population Density | Rip.Pop.Dens | Buffer area mean population density based on 2000 census (#/km2) |
| Natural Landscape Variables | ||
| Watershed Mean Elevation | Mn.Elev | Mean watershed elevation (m) |
| Watershed Mean Slope Percent | Slope | Mean percent watershed slope (%) |
| Mean Annual Precipitation | Mn.Ann.Precip | Mean annual precipitation (mm) |
| Riparian Mean Slope Percent | Rip.Slope | Mean percent riparian buffer slope (%) |
| Riparian Maximum Elevation | Rip.Max.Elev | Maximum riparian buffer elevation (m) |
| Soil Infiltration Rate B | Soil Infiltration B | Area of stream buffer having soils with moderate infiltration rates (From NRCS, STATSGO database) (km2) |
| Soil Infiltration Rate C | Soil Infiltration C | Area of stream buffer having soils with slow infiltration rates (From NRCS, STATSGO database) (km2) |
| Soil Infiltration Rate D | Soil Infiltration D | Area of stream buffer having soils with very slow infiltration rates (From NRCS, STATSGO database) (km2) |
| Hydrologic Runoff Variables | ||
| Average Monthly Coefficient of Variation | Ave_MonthCV | Coefficient of variation of average monthly runoff values for 2001 |
| Maximum Monthly Runoff | Max_Monthly | Maximum monthly runoff for 2001 (mm) |
| Maximum Monthly Coefficient of Variation | Max_MonthCV | Coefficient of variation of maximum monthly runoff values for 2001 |
| Maximum Runoff for January Months | Max_January | Maximum runoff for January 2001 (mm) |
| Average March Runoff | Ave_March | Average runoff for March 2001 (mm) |
| Maximum March Runoff | Max_March | Maximum runoff for March 2001 (mm) |
| Minimum Runoff for April | Min_April | Minimum runoff for April 2001 (mm) |
| Maximum Runoff for April | Max_April | Maximum runoff for April 2001 (mm) |
| Average Spring Runoff | Ave_Spring | Average runoff for April and May 2001 (mm) |
| Maximum Runoff for May | Max_May | Maximum runoff for May 2001 (mm) |
| Maximum Runoff for July | Ave_July | Maximum runoff for July 2001 (mm) |
| Maximum Runoff for November | Max_Nov | Maximum runoff for November 2001 (mm) |
| Average December Runoff | Ave_Dec | Average runoff for December 2001 (mm) |
| Response Variables: Invertebrate Metrics | ||
| Ephemeroptera, Plecoptera, and Trichoptera Richness | EPTR | Richness composed of Mayflies, Stoneflies and Caddisflies for a sample |
| Average Tolerance of Taxa in a Sample | RichTOL | Average USEPA tolerance values for sample based on richness |
| Intolerant Richness | INTOL_RICH | Number of intolerant taxa (USEPA tolerance values of 0 – 4) |
| Noninsect Richness | NonInsectR | Total richness composed of noninsects |
All variables listed were initially considered for inclusion in boosted regression tree (BRT) models. NLCD–National Land Cover Database, NRCS–Natural Resources Conservation Service, STATSGO–State Soil Geographic database.
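The land-use percentages and density variables defined above are simple ratios. The following is a minimal sketch of how they could be computed from per-class areas; the function names, the example areas, and the road and dam counts are hypothetical and are not taken from the study's processing workflow.

```python
# NLCD 2001 class codes grouped as in the variable definitions above.
NLCD_GROUPS = {
    "Urban": (21, 22, 23, 24),
    "Ag": (81, 82),
    "Forest": (41, 42, 43),
    "Wetland": (90, 95),
}


def landuse_percentages(area_by_class):
    """Return percent of total area in each land-use group, plus Ag+Urb."""
    total = sum(area_by_class.values())
    if total <= 0:
        raise ValueError("total area must be positive")
    pct = {
        name: 100.0 * sum(area_by_class.get(code, 0.0) for code in codes) / total
        for name, codes in NLCD_GROUPS.items()
    }
    pct["Ag+Urb"] = pct["Ag"] + pct["Urban"]
    return pct


def density(amount, area_km2):
    """Road length (km) or dam count divided by area (km2)."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return amount / area_km2


# Hypothetical 20 km2 watershed: areas in km2 keyed by NLCD class code.
example_areas = {21: 2.0, 22: 1.0, 41: 10.0, 43: 2.0, 81: 3.0, 82: 1.0, 90: 1.0}
pct = landuse_percentages(example_areas)
road_density = density(34.0, 20.0)  # km of road per km2
dam_density = density(3, 20.0)      # dams per km2
```

The same functions apply to the riparian variables if the areas passed in are those of the riparian buffer rather than the whole watershed.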
Summary of select environmental variables by region, numeric variables presented as average values, range in parentheses.*
| Region | Urban (%) | Agriculture (%) | Population Density (#/km2) | Manmade Channels (%) | Ave_Spring Runoff (mm) | Mean Annual Precipitation (mm) | Mean Slope (%) |
| Full Region (n = 591) | (0 – 94) | (0 – 97) | (0 – 13300) | (0 – 40) | (31 – 116) | (900 – 1610) | (0.4 – 18.8) |
| North Central Appalachians (n = 167) | (0 – 37) | (0 – 50) | (0 – 390) | (0 – 17) | (52 – 87) | (900 – 1450) | (1.7 – 18.8) |
| Ridge and Valley (n = 152) | (0 – 88) | (0 – 75) | (0 – 2100) | (0 – 40) | (34 – 87) | (1010 – 1330) | (1.6 – 13.0) |
| Northeastern Highlands (n = 139) | (0 – 56) | (0 – 57) | (0 – 1800) | (0 – 34) | (44 – 116) | (990 – 1610) | (3.3 – 17.0) |
| Northern Piedmont (n = 133) | (1 – 94) | (0 – 97) | (70 – 13300) | (0 – 32) | (31 – 58) | (1060 – 1350) | (0.4 – 8.3) |
*Ave_Spring equal to the average runoff for the spring months (April-May).
Figure 1. Map showing the stream sites used for model development (stars) and model validation (triangles).
Sites used for model development (stars) (n = 591) and model validation (triangles) (n = 467) are evenly spread across the four ecoregions used in this study (N.C. Appalachian, Ridge and Valley, N.E. Highlands and N. Piedmont).
Comparison of model evaluation statistics for boosted regression tree models (BRT) for four macroinvertebrate metrics for development (develop) and validation (valid) data sets at two spatial scales (full region and four ecoregions), number of variables in final model in parentheses.
| Metric | Model Data Set | Model Statistic | Full Region | North Central Appalachians | Ridge and Valley | Northeastern Highlands | Northern Piedmont |
| | | | n = 591 | n = 167 | n = 152 | n = 139 | n = 133 |
| EPTR | Develop | R2 | 0.63 (6) | 0.54 (4) | 0.65 (5) | 0.70 (4) | 0.76 (4) |
| EPTR | Develop | CV R2 | 0.46 | 0.27 | 0.30 | 0.38 | 0.50 |
| EPTR | Valid | R2 | 0.63 | 0.68 | 0.82 | 0.57 | 0.78 |
| EPTR | Valid | CV R2 | 0.41 | 0.08 | 0.39 | 0.28 | 0.48 |
| RichTOL | Develop | R2 | 0.77 (5) | 0.67 (5) | 0.81 (4) | 0.80 (4) | 0.84 (4) |
| RichTOL | Develop | CV R2 | 0.64 | 0.42 | 0.51 | 0.56 | 0.65 |
| RichTOL | Valid | R2 | 0.73 | 0.64 | 0.64 | 0.68 | 0.82 |
| RichTOL | Valid | CV R2 | 0.59 | 0.23 | 0.34 | 0.44 | 0.62 |
| INTOL_RICH | Develop | R2 | 0.66 (5) | 0.60 (7) | 0.65 (5) | 0.67 (4) | 0.72 (6) |
| INTOL_RICH | Develop | CV R2 | 0.49 | 0.27 | 0.33 | 0.43 | 0.42 |
| INTOL_RICH | Valid | R2 | 0.66 | 0.70 | 0.67 | 0.59 | 0.70 |
| INTOL_RICH | Valid | CV R2 | 0.48 | 0.26 | 0.42 | 0.30 | 0.39 |
| NonInsectR | Develop | R2 | 0.68 (4) | 0.66 (5) | 0.69 (4) | 0.75 (5) | 0.72 (4) |
| NonInsectR | Develop | CV R2 | 0.56 | 0.34 | 0.49 | 0.43 | 0.48 |
| NonInsectR | Valid | R2 | 0.67 | 0.31 | 0.80 | 0.70 | 0.80 |
| NonInsectR | Valid | CV R2 | 0.53 | 0.02 | 0.57 | 0.39 | 0.62 |
Validation models run with the same variables as the final development model.*
*R2–adjusted R-squared, CV R2-Cross validation R2; EPTR–Total taxa richness of Ephemeroptera (mayflies), Plecoptera (stoneflies) and Trichoptera (caddisflies); RichTOL-Average tolerance of all taxa; INTOL_RICH-Richness of intolerant taxa; NonInsectR-Noninsect taxa richness.
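To make the difference between the R2 and CV R2 rows concrete, the sketch below computes an in-sample R2, an adjusted R2, and a k-fold cross-validated R2 in plain Python. Several pieces are assumptions for illustration only: the adjusted R2 formula is the standard one and may differ from what the authors' software reports, the cross-validated R2 here pools out-of-fold predictions (other definitions exist), the data are made up, and an ordinary least-squares line stands in for a boosted regression tree.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_sites, n_predictors):
    """Standard adjustment of R2 for sample size and number of predictors."""
    return 1.0 - (1.0 - r2) * (n_sites - 1) / (n_sites - n_predictors - 1)


def fit_line(xs, ys):
    """Stand-in model (least-squares line), NOT a boosted regression tree."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def cv_r_squared(xs, ys, fit, k):
    """Pool predictions made on held-out folds, then score them with R2."""
    held_out = [None] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            held_out[i] = model(xs[i])
    return r_squared(ys, held_out)


# Made-up predictor and response values for eight hypothetical sites.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.2, 5.8, 7.1]

full_model = fit_line(xs, ys)
in_sample_r2 = r_squared(ys, [full_model(x) for x in xs])
in_sample_adj_r2 = adjusted_r_squared(in_sample_r2, n_sites=len(xs), n_predictors=1)
cross_validated_r2 = cv_r_squared(xs, ys, fit_line, k=4)
```

Because each held-out prediction comes from a model that never saw that site, the cross-validated value cannot exceed the in-sample value for this stand-in model, which mirrors the pattern of CV R2 falling below R2 throughout the table.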
Comparison of explanatory variables for boosted regression trees (BRT) models for four macroinvertebrate metrics at two spatial scales (Full Region and four Individual Ecoregions); variables are presented in descending order of variable relative importance (VRI) in each model.*
| Invertebrate Metric | Full Region (n = 591) | VRI | North Central Appalachians (n = 167) | VRI | Ridge and Valley (n = 152) | VRI | Northeastern Highlands (n = 139) | VRI | Northern Piedmont (n = 133) | VRI |
| EPT Richness (EPTR) | Urban | 34 | Min_April | 29 | Rip.Forest | 32 | Urban | 51 | Urban | 46 |
| | Rip.Slope | 19 | Rip.Pop.Density | 27 | Ag+Urban | 21 | Wetland | 17 | Mean Elevation | 26 |
| | Ave_Dec | 14 | Rip.MeanCanopy | 25 | Pop.Density | 18 | ManMadeChan | 16 | Rip.Forest | 18 |
| | Rip.MeanCanopy | 12 | Road Density | 20 | Max_MonthCV | 17 | Rip.Forest | 16 | ManMadeChan | 10 |
| | Rip.Forest | 12 | | | Urban | 12 | | | | |
| | ManMadeChan | 9 | | | | | | | | |
| Average Tolerance of all Taxa (RichTOL) | Pop.Density | 31 | Rip.Forest | 36 | ManMadeChan | 33 | Ave_March | 48 | Mean Elevation | 51 |
| | ManMadeChan | 25 | Rip.Pop.Density | 17 | Rip.Forest | 29 | Rip.Wetland | 27 | ManMadeChan | 22 |
| | Rip.Slope | 16 | Rip.Ag | 17 | Max_Nov | 20 | Urban | 15 | Urban | 15 |
| | Rip.Forest | 14 | Pop.Density | 16 | Pop.Density | 18 | Road Density | 10 | Rip.Forest | 12 |
| | Ave_Dec | 14 | Soil Infiltration D | 14 | | | | | | |
| Richness of Intolerant Taxa (INTOL_RICH) | Rip.Forest | 34 | Rip.Forest | 22 | Rip.Forest | 31 | Urban | 43 | Urban | 28 |
| | Pop.Density | 23 | Rip.Slope | 18 | Pop.Density | 24 | Ave_July | 25 | Rip.Max.Elev | 20 |
| | Forest | 17 | Soil Infiltration D | 14 | Urban | 16 | Rip.Forest | 19 | Mean Slope | 18 |
| | Rip.Max.Elev | 17 | Min_April | 13 | Rip.MeanCanopy | 16 | ManMadeChan | 14 | Soil Infiltration B | 14 |
| | Max_Monthly | 10 | Rip.Ag | 13 | Mean Slope | 13 | | | Rip.Wetland | 11 |
| | | | Rip.Pop.Density | 12 | | | | | Pop.Density | 10 |
| | | | Road Density | 9 | | | | | | |
| Noninsect Richness (NonInsectR) | Ave_March | 41 | Rip.Forest | 25 | Max_March | 44 | Max_January | 24 | Max_April | 36 |
| | ManMadeChan | 26 | Pop.Density | 24 | Max_May | 24 | Ave_March | 24 | Rip.Max.Elev | 27 |
| | Mean Elevation | 17 | ManMadeChan | 22 | Soil Infiltration C | 21 | Rip.Wetland | 23 | Soil Infiltration C | 20 |
| | Rip.Slope | 16 | Dam Density | 15 | ManMadeChan | 11 | Rip.Slope | 18 | ManMadeChan | 17 |
| | | | Rip.Pop.Density | 14 | | | Pop.Density | 12 | | |
*EPTR –Total taxa richness of Ephemeroptera, Plecoptera and Trichoptera; B–Soils with a moderate infiltration rate, C–Soils with a slow infiltration rate, D–Soils with a very slow infiltration rate, Rip –Riparian. See Table 1 for variable definitions.
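The VRI values within each model sum to roughly 100, which is what one would expect if raw importance scores are rescaled to percentages. The sketch below shows that rescaling and checks it against the Full Region EPTR column of the table; the raw scores and the function name are hypothetical, and the rescaling is an assumption about how VRI is reported rather than a description of the authors' software.

```python
def relative_importance(raw_scores):
    """Rescale raw importance scores to percentages, sorted high to low."""
    total = sum(raw_scores.values())
    if total <= 0:
        raise ValueError("importance scores must sum to a positive value")
    ranked = sorted(raw_scores.items(), key=lambda item: item[1], reverse=True)
    return {name: 100.0 * score / total for name, score in ranked}


# Hypothetical raw scores for a three-variable model.
raw = {"Rip.Slope": 2.0, "Urban": 6.0, "Ave_Dec": 2.0}
vri = relative_importance(raw)

# Reported VRI values for the Full Region EPTR model (from the table above).
full_region_eptr = {
    "Urban": 34,
    "Rip.Slope": 19,
    "Ave_Dec": 14,
    "Rip.MeanCanopy": 12,
    "Rip.Forest": 12,
    "ManMadeChan": 9,
}
full_region_total = sum(full_region_eptr.values())
```

Some columns in the table total 99 or 101 rather than exactly 100, which is consistent with rounding each value to a whole number.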
Figure 2. Partial dependency plots for variables in BRT model for RichTOL for the Full Region.
Boosted regression tree partial dependency plots show the response form of average taxa tolerance (y-axis = fitted function of RichTOL) based on the effect of individual explanatory variables with the response of all other variables removed (development data set). Shown in order of model importance: (A) population density (#/km2), (B) percent manmade channels, (C) riparian slope and (D) percent riparian forest, model R2 = 0.77. The relative contribution of each explanatory variable is reported in parentheses. Refer to Table 1 for variable definitions. The top two variables for the Full Region model, both urbanization variables, showed potential threshold-type responses, and the third variable, riparian slope, is a natural factor that is likely acting as an urban surrogate. The final variable, a measure of riparian disturbance, follows a more linear response and, along with urbanization variables, was a common explanatory variable in most models among the different regions.
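A partial dependency curve of the kind described in this caption can be approximated by brute force: fix one variable at each value on a grid, leave every other variable as observed, and average the model's predictions. The sketch below illustrates that idea only. The `toy_predict` function, its threshold at 100 people/km2, and the site values are all hypothetical, and BRT software may compute and center these curves differently.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original site data is untouched
            modified[feature] = value  # override only the variable of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_predict(row):
    """Hypothetical stand-in for a fitted model of average taxa tolerance."""
    step = 1.5 if row["Pop.Density"] > 100 else 0.0  # threshold-type response
    return 4.0 + step - 0.01 * row["Rip.Forest"]     # linear riparian forest effect


# Hypothetical sites with population density (#/km2) and riparian forest (%).
sites = [
    {"Pop.Density": 10, "Rip.Forest": 80},
    {"Pop.Density": 250, "Rip.Forest": 60},
    {"Pop.Density": 900, "Rip.Forest": 20},
]
grid = [0, 50, 150, 500]
pop_density_curve = partial_dependence(toy_predict, sites, "Pop.Density", grid)
```

In this toy case the curve is flat below the threshold and jumps by 1.5 above it, which is the shape a threshold-type response would produce in such a plot.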
Figure 3. Partial dependency plots for variables in BRT model for RichTOL for North Central Appalachian Region.
Boosted regression tree partial dependency plots show the response form of average taxa tolerance (y-axis = fitted function of RichTOL) based on the effect of individual explanatory variables with the response of all other variables removed (development data set). Shown in order of model importance: (A) percent riparian forest, (B) riparian population density (#/km2), (C) percent riparian agriculture and (D) population density (#/km2). The relative contribution of each explanatory variable is reported in parentheses. Refer to Table 1 for variable definitions. Three of the four variables can be interpreted as disturbance variables: two directly assess urban land use (population density), and the third, riparian forest, measures the amount of disturbance in the riparian zone and was the top variable modeled. However, this region had the shortest disturbance gradient and the lowest modeled R2 (0.67), although the model was still relatively strong.
Figure 4. Partial dependency plots for variables in BRT model for RichTOL for Ridge and Valley Region.
Boosted regression tree partial dependency plots show the response form of average taxa tolerance (y-axis = fitted function of RichTOL) based on the effect of individual explanatory variables with the response of all other variables removed (development data set). Shown in order of model importance: (A) percent manmade channels, (B) percent riparian forests, (C) maximum November runoff (mm) and (D) population density (#/km2), model R2 = 0.81. The relative contribution of each explanatory variable is reported in parentheses. Refer to Table 1 for variable definitions. Three of the four variables measure the effects of disturbance: two measure the response to urban land use and the third measures disturbance in the riparian zone due to either agriculture or urbanization. The fourth variable shows the response due to maximum November runoff.
Figure 5. Partial dependency plots for variables in BRT model for RichTOL for NE Highlands Region.
Boosted regression tree partial dependency plots show the response form of average taxa tolerance (y-axis = fitted function of RichTOL) based on the effect of individual explanatory variables with the response of all other variables removed (development data set). Shown in order of model importance: (A) average March runoff (mm), (B) percent riparian wetlands, (C) percent urban and (D) road density (km/km2), model R2 = 0.80. The relative contribution of each explanatory variable is reported in parentheses. Refer to Table 1 for variable definitions. All four explanatory variables modeled can be interpreted as an urbanization land use effect. Average March runoff is expected to increase with urbanization because of higher imperviousness, and we believe percent riparian wetlands is acting as a surrogate for urbanization: wetlands are more common in lower elevation valleys, where urban development is also more common. The last two variables measure urban land use directly.
Figure 6. Partial dependency plots for variables in BRT model for RichTOL for Northern Piedmont Region.
Boosted regression tree partial dependency plots show the response form of average taxa tolerance (y-axis = fitted function of RichTOL) based on the effect of individual explanatory variables with the response of all other variables removed (development data set). Shown in order of model importance: (A) mean elevation (m), (B) percent manmade channels, (C) percent urban and (D) percent riparian forest, model R2 = 0.84. The relative contribution of each explanatory variable is reported in parentheses. Refer to Table 1 for variable definitions. This was the only model that had a natural factor as the top explanatory variable; however, we believe elevation is acting as a strong surrogate for urbanization in this region, though the relationship is likely more complex than a strict one-to-one surrogate. The other three variables all measure urban land use disturbance, directly or indirectly, and show a relatively strong potential threshold-type response.
Figure 7. Interaction of manmade streams and mean elevation on RichTOL for Northern Piedmont BRT model.
Boosted regression tree partial dependency plot shows the response form of average taxa tolerance (y-axis = fitted function of RichTOL) based on the effect of the interaction of two individual explanatory variables along the response variable (all other variable responses removed). There is a relatively strong interaction acting on RichTOL at low values of mean elevation and high values of percent manmade streams, which results in high values of tolerant taxa. This is a common pattern: higher urbanization occurs in the lower elevation valleys.
Figure 8. Interaction of riparian wetlands and average March runoff (mm) on RichTOL for N.E. Highlands BRT model.
Boosted regression tree partial dependency plot shows the response form of average taxa tolerance (y-axis = fitted function of RichTOL) based on the effect of the interaction of two individual explanatory variables along the response variable (all other variable responses removed). There is a relatively large interaction at high values of average March runoff when there are also high values of percent riparian wetland thus resulting in higher values of tolerant taxa (RichTOL) than would be expected. We believe that high values of riparian wetland are acting as a surrogate for high values of percent urban land use.
Figure 9. Observed versus predicted plots for BRT models for development (left) and validation (right) data sets.
The observed versus predicted plots are based on the boosted regression models developed for average taxa tolerance (RichTOL) for five models: Full Region and four individual ecoregions (N.C. Appalachian, Ridge and Valley, N.E. Highlands, and N. Piedmont). The Full Region and N. Piedmont plots fall relatively close to the 1∶1 line for both the development and validation models, indicating a good predictive fit with only slight bias at high and low values of RichTOL. The other regions in general showed more scatter, and the N.C. Appalachian region, which had the lowest modeled R2, had the shortest disturbance gradient (narrow range of RichTOL values) compared to the other regions.
Comparison of model evaluation statistics for four macroinvertebrate metrics for three watershed size classes, number of variables in final model in parentheses.*
| Macroinvertebrates | Model Type | Model Statistic | Full Region | WS Size Class 1 (8 – 27 km2) | WS Size Class 2 (>27 to <66 km2) | WS Size Class 3 (>66 to <777 km2) |
| | | | n = 591 | n = 282 | n = 188 | n = 121 |
| EPT Richness (EPTR) | BRT | R2 | 0.63 (6) | 0.64 (5) | 0.69 (4) | 0.65 (4) |
| EPT Richness (EPTR) | BRT | RMSE | 2.34 | 2.68 | 2.30 | 2.18 |
| Average Tolerance of all Taxa (RichTOL) | BRT | R2 | 0.77 (5) | 0.78 (4) | 0.83 (4) | 0.68 (4) |
| Average Tolerance of all Taxa (RichTOL) | BRT | RMSE | 0.53 | 0.58 | 0.46 | 0.39 |
| Richness of Intolerant Taxa (INTOL_RICH) | BRT | R2 | 0.66 (5) | 0.68 (4) | 0.67 (4) | 0.63 (4) |
| Richness of Intolerant Taxa (INTOL_RICH) | BRT | RMSE | 2.36 | 2.55 | 2.12 | 2.09 |
| Noninsect Richness (NonInsectR) | BRT | R2 | 0.68 (4) | 0.77 (4) | 0.73 (4) | 0.72 (4) |
| Noninsect Richness (NonInsectR) | BRT | RMSE | 1.39 | 1.28 | 1.25 | 2.09 |
*BRT – Boosted Regression Trees. R2–adjusted R-squared, CV R2—cross-validation R2, EPTR–Total taxa richness of Ephemeroptera, Plecoptera and Trichoptera.
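RMSE in the table above is expressed in the units of each metric (taxa counts for the richness metrics, tolerance units for RichTOL). A minimal sketch of the usual root-mean-square error calculation follows; the observed and predicted values are made up, and whether the authors computed RMSE exactly this way is an assumption.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up richness values for three hypothetical sites.
observed_richness = [4, 6, 8]
predicted_richness = [5, 6, 6]
example_rmse = rmse(observed_richness, predicted_richness)
```

Because RMSE depends on the scale and range of the response, values are comparable across size classes for the same metric but not directly across different metrics.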