Alison P Appling, Jordan S Read, Luke A Winslow, Maite Arroita, Emily S Bernhardt, Natalie A Griffiths, Robert O Hall, Judson W Harvey, James B Heffernan, Emily H Stanley, Edward G Stets, Charles B Yackulic.
Abstract
A national-scale quantification of metabolic energy flow in streams and rivers can improve understanding of the temporal dynamics of in-stream activity, links between energy cycling and ecosystem services, and the effects of human activities on aquatic metabolism. The two dominant terms in aquatic metabolism, gross primary production (GPP) and aerobic respiration (ER), have recently become practical to estimate for many sites due to improved modeling approaches and the availability of requisite model inputs in public datasets. We assembled inputs from the U.S. Geological Survey and National Aeronautics and Space Administration for October 2007 to January 2017. We then ran models to estimate daily GPP, ER, and the gas exchange rate coefficient for 356 streams and rivers across the continental United States. We also gathered potential explanatory variables and spatial information for cross-referencing this dataset with other datasets of watershed characteristics. This dataset offers a first national assessment of many-day time series of metabolic rates for up to 9 years per site, with a total of 490,907 site-days of estimates.
Year: 2018 PMID: 30532078 PMCID: PMC6289110 DOI: 10.1038/sdata.2018.292
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Table 1. Data items included in this data release.
| ID | Title | Description | Format |
|---|---|---|---|
| 1 | Site data | Site identifiers, details, and quality indicators | Table with 1 row per site (tab-delimited file) |
| 2 | Spatial data | Site coordinates (2a) and catchment boundaries (2b) | 1 shapefile for all coordinates and 1 for all catchments (.shp, .shx, .dbf, and .prj files) |
| 3 | Timeseries data | Data on water quality and quantity, collected or computed from outside sources | Tables with one row per time series observation (1 tab-delimited file per site-variable combination, 1 zip file per site) |
| 4 | Model inputs | Data formatted for use in estimating metabolism | Tables of prepared time series inputs (1 tab-delimited file per site, in 1 zip file per site) |
| 5 | Model configurations | Model specifications used to estimate metabolism | Table with 1 row per model (1 tab-delimited file, compressed into zip file) |
| 6 | Model outputs | Complete fits from metabolism estimation models | Text and 4 tables for each model (tab-delimited files, 1 zip file per model) |
| 7 | Model diagnostics | Key diagnostics and overall assessments of model performance | Table with 1 row per model (1 tab-delimited file, compressed into zip file) |
| 8 | Metabolism estimates and predictors | Daily metabolism estimates and potential predictor variables to support further exploration | Table with 1 row per site-date combination (1 tab-delimited file, compressed into zip file) |
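As a rough illustration of working with this layout (tab-delimited tables packed into zip files), the following Python sketch reads one table out of a zip archive using only the standard library. The member name `daily_predictions.tsv` and the column names are hypothetical placeholders, not the file names or headers of the actual release; the in-memory archive exists only so the sketch runs without downloading anything.

```python
import csv
import io
import zipfile


def read_tsv_from_zip(zip_source, member_name):
    """Return the rows of one tab-delimited member of a zip archive as dicts."""
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member_name) as raw:
            # Decode the binary stream as text and parse tab-separated columns.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text, delimiter="\t"))


def build_demo_zip():
    """Build a tiny in-memory archive (hypothetical names and values)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "daily_predictions.tsv",  # hypothetical member name
            "site_name\tdate\tGPP\tER\n"
            "site_A\t2012-06-01\t2.1\t-3.4\n"
            "site_A\t2012-06-02\t2.4\t-3.1\n",
        )
    buffer.seek(0)
    return buffer


rows = read_tsv_from_zip(build_demo_zip(), "daily_predictions.tsv")
print(len(rows), rows[0]["GPP"])
```

For the real files, `zip_source` would be a path to a downloaded archive, and the member name would be taken from `ZipFile.namelist()` rather than assumed.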
Figure 1. Inputs and workflow to generate metabolism estimates and supporting datasets.
Inputs are either exogenous (dark orange plaque shapes) or encapsulate the authors’ configuration decisions (gray trapezoids). Data processing steps leverage several R packages and other tools (blue rounded rectangles); specifics of these steps are documented in the text. Data products included in this release (yellow rectangles) are organized into 8 final items (superscripts, corresponding to IDs in Table 1).
Figure 2. Sites included in this data publication.
Sites that met the initial site selection criteria but did not have sufficient data to be modeled are gray triangles. Sites with sufficient data for modeling are filled circles, colored according to the number of dates for which estimates were produced (3296 days is 9.02 years).
Data sources for boundaries of the catchments contributing to sites in this data release.
| Description | Number of Basins |
|---|---|
| EPA BASINS | 262 |
| USGS StreamStats | 54 |
| USGS GAGES-II | 27 |
| Falcone | 11 |
| Wieczorek 2012 | 9 |
| USGS National Map Viewer | 7 |
| Nakagaki | 1 |
Definitions and provenance of timeseries variables downloaded from external databases.
| Variable Name | Description (Units) | Source Database | Parameter Code |
|---|---|---|---|
| disch_nwis | Discharge (ft³ s⁻¹) | NWIS | 00060 |
| doobs_nwis | Dissolved oxygen concentration (mg O₂ L⁻¹) | NWIS | 00300 |
| wtr_nwis | Water temperature (°C) | NWIS | 00010 |
| baro_nldas | Surface pressure (Pa) | NLDAS | pressfc |
| baro_gldas | Surface air pressure (Pa) | GLDAS | psurf_f_inst |
| sw_nldas | Downwards shortwave radiation flux, surface (W m⁻²) | NLDAS | dswrfsfc |
| sw_gldas | Downward shortwave radiation flux, surface (W m⁻²) | GLDAS | SWdown_f_tavg |
Definitions and provenance of calculated timeseries variables.
Sources include other variables from this table, DateTimes of the [O₂] data, hydraulic geometry coefficients from the cited sources, and site data (altitude, latitude, longitude). Where Sources are “X or Y”, the source ending in _nldas was preferred over _gldas, and _calcDischHarvey over _calcDischRaymond, whenever available.
| Variable Name | Description (Units) | Sources | Equation or streamMetabolizer function |
|---|---|---|---|
| baro_calcElev | Surface pressure (Pa) | altitude | calc_air_pressure() |
| depth_calcDischHarvey | Stream depth (m) | | |
| depth_calcDischRaymond | Stream depth (m) | | |
| dischdaily_calcDMean | Daily average discharge (m³ s⁻¹) | disch_nwis | Daily mean (4am-3:59am) |
| doamp_calcDAmp | Daily amplitude in percent O₂ saturation (%) | dopsat_calcObsSat | Daily range (4am-3:59am) |
| dopsat_calcObsSat | Percent O₂ saturation (%) | doobs_nwis, dosat_calcGGbts | 100 × doobs_nwis / dosat_calcGGbts |
| dosat_calcGGbconst | [O₂] at saturation (mg O₂ L⁻¹) | baro_calcElev | calc_DO_sat() |
| dosat_calcGGbts | [O₂] at saturation (mg O₂ L⁻¹) | baro_nldas or baro_gldas | calc_DO_sat() |
| par_calcLat | Photosynthetic photon flux density, PPFD (µmol m⁻² s⁻¹) | suntime_calcLon, latitude | calc_light() |
| par_calcLatSw | PPFD (µmol m⁻² s⁻¹) | par_calcLat, par_calcSw | calc_light_merged() |
| par_calcSw | PPFD (µmol m⁻² s⁻¹) | sw_nldas or sw_gldas | convert_SW_to_PAR() |
| sitedate_calcLon | Solar noon of the date (unitless) | DateTime | convert_UTC_to_solartime() |
| sitetime_calcLon | Mean solar time (unitless) | DateTime, longitude | convert_UTC_to_solartime() |
| suntime_calcLon | Apparent solar time (unitless) | DateTime, coordinates | convert_UTC_to_solartime() |
| swdaily_calcDMean | Daily average downwards shortwave radiation flux (W m⁻²) | sw_nldas or sw_gldas | Daily mean (4am-3:59am) |
| veloc_calcDischHarvey | Stream velocity (m s⁻¹) | | |
| veloc_calcDischRaymond | Stream velocity (m s⁻¹) | | |
| velocdaily_calcDMean | Daily average velocity (m s⁻¹) | veloc_calcDischHarvey or veloc_calcDischRaymond | Daily mean (4am-3:59am) |
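Two of the calculations in this table are simple enough to sketch directly: percent O₂ saturation (100 × observed / saturation concentration) and daily summaries over a 4am-3:59am window. The Python below is a minimal illustration, not the streamMetabolizer implementation. It assumes that timestamps are already in local time, that each window is labeled by the calendar date on which it begins, and that "daily range" means maximum minus minimum; the function names and example values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def percent_saturation(do_obs, do_sat):
    """Percent O2 saturation: 100 * observed [O2] / [O2] at saturation."""
    return 100.0 * do_obs / do_sat


def metabolism_day(timestamp):
    """Date label of the 4am-3:59am window that contains `timestamp`.

    Assumption: the window is labeled by the date on which it starts.
    """
    return (timestamp - timedelta(hours=4)).date()


def daily_mean_and_range(records):
    """records: iterable of (datetime, value). Returns {date: (mean, range)}."""
    groups = defaultdict(list)
    for timestamp, value in records:
        groups[metabolism_day(timestamp)].append(value)
    return {
        day: (sum(vals) / len(vals), max(vals) - min(vals))
        for day, vals in groups.items()
    }


# Hypothetical observations: (time, observed mg O2/L, saturation mg O2/L).
obs = [
    (datetime(2015, 7, 1, 3, 30), 7.2, 9.0),  # falls in the 30 June window
    (datetime(2015, 7, 1, 4, 0), 7.0, 9.0),
    (datetime(2015, 7, 1, 15, 0), 9.9, 9.0),
    (datetime(2015, 7, 2, 3, 59), 8.1, 9.0),
]
psat = [(t, percent_saturation(o, s)) for t, o, s in obs]
summary = daily_mean_and_range(psat)
for day in sorted(summary):
    print(day, summary[day])
```

Shifting each timestamp back by four hours before taking the date is one simple way to realize a window that does not start at midnight; the first observation above therefore lands in the previous day's group.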
Figure 3. Temporal distribution of metabolism estimates at each site.
Each site forms a row, and horizontal line segments represent periods of continuous daily metabolism estimates. Colors give density of estimates, ranging from 17 to 365 daily estimates per year. For the purpose of this figure, sites were considered “seasonal” if the number of metabolism estimates in January was fewer than 1/24 the total number of estimates at a site (112 of 356 sites meet this criterion).
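The "seasonal" rule in this caption can be written as a short check. The sketch below is only an illustration of the stated criterion (January estimates fewer than 1/24 of all estimates at a site); the function name, the handling of sites with no estimates, and the example date ranges are assumptions.

```python
from datetime import date, timedelta


def is_seasonal(estimate_dates):
    """True if January estimates are fewer than 1/24 of all estimates."""
    total = len(estimate_dates)
    if total == 0:
        return False  # assumption: a site with no estimates is not classified
    january = sum(1 for d in estimate_dates if d.month == 1)
    # Compare january / total < 1 / 24 without floating-point division.
    return january * 24 < total


def date_range(start, end):
    """All dates from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


# Hypothetical sites: one sampled April-October only, one sampled all year.
summer_only = date_range(date(2014, 4, 1), date(2014, 10, 31))
year_round = date_range(date(2014, 1, 1), date(2014, 12, 31))
print(is_seasonal(summer_only), is_seasonal(year_round))
```

Because January spans roughly 1/12 of a year, the 1/24 threshold corresponds to about half of an even monthly share.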
Counts of sites by distance to nearest structure, for the 333 modeled sites with catchment information.
Column names give the lower bound on the distance, as a percentile of each site’s daily reach lengths, to the nearest structure of each type. For example, the nearest canal or ditch is located between the 0th and 50th percentile of reach lengths at 61 modeled sites; the nearest canal or ditch is beyond the 95th percentile at 246 sites; and 64 sites have no known structure closer than the 95th percentile of their reach lengths.
| Structure | P0 | P50 | P80 | P95 |
|---|---|---|---|---|
| Canal/ditch | 61 | 13 | 13 | 246 |
| Dam | 169 | 29 | 30 | 105 |
| NPDES | 130 | 38 | 27 | 138 |
| Any | 210 | 32 | 27 | 64 |
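The binning behind these counts can be sketched as follows, assuming each site has a series of daily reach lengths and a single distance to its nearest structure. This is a hedged illustration only: the function name, the percentile interpolation method, the treatment of distances that fall exactly on a boundary, and all example numbers are assumptions rather than the authors' procedure.

```python
import statistics
from collections import Counter


def distance_bin(distance_m, daily_reach_lengths_m):
    """Label a distance by the lower-bound percentile of the reach lengths."""
    # 99 cut points; index 49 is the 50th percentile, 79 the 80th, 94 the 95th.
    cuts = statistics.quantiles(daily_reach_lengths_m, n=100, method="inclusive")
    p50, p80, p95 = cuts[49], cuts[79], cuts[94]
    if distance_m < p50:
        return "P0"
    if distance_m < p80:
        return "P50"
    if distance_m < p95:
        return "P80"
    return "P95"


# Hypothetical daily reach lengths (m) for one site: 1000, 2000, ..., 10000.
reach_lengths = list(range(1000, 11000, 1000))

# Hypothetical distances (m) from several sites to their nearest dam,
# all evaluated against the same reach lengths to keep the example short.
dam_distances = [3000, 6000, 9000, 20000, 500]
counts = Counter(distance_bin(d, reach_lengths) for d in dam_distances)
print(dict(counts))
```

A tally like `counts`, computed per structure type across all sites, would give one row of the table.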
Model output assessment criteria and counts of models meeting each criterion.
Meeting any of the criteria in the Low column earns a model Low confidence, and meeting all criteria in the High column is required to earn High confidence. Parentheses in the table body contain the number of models meeting each criterion.
| Measure | Low | Medium | High |
|---|---|---|---|
| max( | >1.2 (49) | n.a. | <1.2 (384) |
| | >50 (7) | 15–50 (52) | <15 (374) |
| Negative GPP (%) | >50 (5) | 25–50 (4) | <25 (424) |
| Positive ER (%) | >50 (17) | 25–50 (35) | <25 (381) |
| Overall confidence | 71 | 63 | 299 |
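The combination rule stated for this table (any Low criterion gives Low; all High criteria are required for High; otherwise Medium) can be sketched in a few lines. The thresholds are copied from the table, but two row labels are incomplete in this copy, so the parameter names `max_stat` and `unnamed_measure` are placeholders. Treating a value that sits exactly on a threshold as Medium is also an assumption.

```python
def grade(value, low_above, high_below):
    """Grade one measure: 'Low' above low_above, 'High' below high_below."""
    if value > low_above:
        return "Low"
    if value < high_below:
        return "High"
    return "Medium"  # assumption: boundary values fall in the middle band


def overall_confidence(max_stat, unnamed_measure, neg_gpp_pct, pos_er_pct):
    """Combine the four per-measure grades into one overall confidence."""
    grades = [
        grade(max_stat, 1.2, 1.2),        # row label truncated in source
        grade(unnamed_measure, 50, 15),   # row label missing in source
        grade(neg_gpp_pct, 50, 25),       # Negative GPP (%)
        grade(pos_er_pct, 50, 25),        # Positive ER (%)
    ]
    if "Low" in grades:
        return "Low"
    if all(g == "High" for g in grades):
        return "High"
    return "Medium"


# Hypothetical models.
print(overall_confidence(1.05, 10, 5, 5))   # every measure in the High band
print(overall_confidence(1.05, 30, 5, 5))   # one measure in the Medium band
print(overall_confidence(1.30, 10, 5, 5))   # one measure in the Low band
```

Checking for any Low grade before checking for all High grades mirrors the order of the rule in the table note.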