
Nitrate variability in groundwater of North Carolina using monitoring and private well data models.

Kyle P Messier, Evan Kane, Rick Bolich, Marc L Serre.

Abstract

Nitrate (NO3-) is a widespread contaminant of groundwater and surface water across the United States that has deleterious effects to human and ecological health. This study develops a model for predicting point-level groundwater NO3- at a state scale for monitoring wells and private wells of North Carolina. A land use regression (LUR) model selection procedure is developed for determining nonlinear model explanatory variables when they are known to be correlated. Bayesian Maximum Entropy (BME) is used to integrate the LUR model to create a LUR-BME model of spatial/temporal varying groundwater NO3- concentrations. LUR-BME results in a leave-one-out cross-validation r2 of 0.74 and 0.33 for monitoring and private wells, effectively predicting within spatial covariance ranges. Results show significant differences in the spatial distribution of groundwater NO3- contamination in monitoring versus private wells; high NO3- concentrations in the southeastern plains of North Carolina; and wastewater treatment residuals and swine confined animal feeding operations as local sources of NO3- in monitoring wells. Results are of interest to agencies that regulate drinking water sources or monitor health outcomes from ingestion of drinking water. Lastly, LUR-BME model estimates can be integrated into surface water models for more accurate management of nonpoint sources of nitrogen.

Year:  2014        PMID: 25148521      PMCID: PMC4165464          DOI: 10.1021/es502725f

Source DB:  PubMed          Journal:  Environ Sci Technol        ISSN: 0013-936X            Impact factor:   9.028


Introduction

Nitrate (NO3–) is a widespread contaminant of groundwater and surface water across the United States that has deleterious effects on human and ecological health.[1,2] The maximum contaminant level of 10 mg/L established by the U.S. Environmental Protection Agency was based on the prevention of methemoglobinemia in infants;[3] moreover, there is concern about many cancer types[4−6] and about exposures at lower concentrations.[7] Excessive NO3– inputs into the environment can result in adverse changes to ecosystems such as eutrophication and harmful algal blooms.[8−10] Protection of drinking water sources is mandated by the Safe Drinking Water Act; however, private well drinking water is unregulated, in contrast to public water systems.[11] In North Carolina, where more than 1/4 of the population relies on private wells for drinking water,[12] quantifying potential exposures is important to protect public health. Monitoring programs such as the U.S. Geological Survey’s (USGS) National Water Quality Assessment (NAWQA) Program[13] and the NC Division of Water Resources (NC DWR) ambient monitoring program[14] are effective because they use consistent sampling and analytical methods, yet these water quality monitoring data are spatially and temporally sparse. Land use regression (LUR)[15−21] is a proven method that complements monitoring programs and provides an effective means for water quality exposure assessments. Previous studies have related land use characteristics to NO3– contamination in surface waters[22−25] and groundwater.
Additionally, regression-based methods have been implemented for estimating loading to surface waters.[21,23,24] Groundwater discharge to streams (baseflow) accounts for roughly two-thirds of annual streamflow in the Coastal Plains region of North Carolina[26] and may be contributing excess nutrient loads in streams;[27] however, current surface water models do not directly account for this large source of NO3– from baseflow. For linear regression models, traditional statistical methods to select predictor variables include forward, backward, and stepwise selection. These methods can lead to erroneous models with high multicollinearity when the candidate variables are related. However, for LUR model studies, model selection methods have been modified to accommodate the potential high multicollinearity from selection variables that differ only by a hyperparameter.[16,19] Additionally, lasso[28] and elastic net[29] regression are potential methods for selecting linear LUR models, but to the authors’ knowledge they have not been employed for LUR model selection. For nonlinear regression, methods for model selection based on a large candidate variable space include stepwise logistic regression[30,31] and regression tree analysis, which approximates nonlinear relationships;[32,33] still, for continuous variable outcomes with nonlinear models, less rigorous methods for model selection have been developed.
The number of candidate variables is generally consolidated to a tractable number through expert knowledge or single variable regression, and then various combinations of models are tested until one finds the best model in terms of a validation statistic like R2 or Akaike Information Criterion (AIC).[15,21,24] The advanced geostatistical method of Bayesian Maximum Entropy (BME) has also been shown to successfully estimate groundwater quality contaminants.[19,34] An advantage of BME is its ability to quantify spatial and temporal variability, which is then used in the estimation process at unmonitored locations. BME, like all geostatistical methods, is data driven and can only provide reliable estimates within the vicinity of measured values. However, BME utilizes Bayesian epistemic knowledge blending to combine multiple sources of data, which has been successfully demonstrated with the incorporation of deterministic mean trend functions into BME for groundwater.[19] Local spatial and temporal variability have led previous studies to reduce NO3– variability with a combination of spatial smoothing and temporal averaging.[15,35,36] For instance, Nolan and Hitt spatially smoothed NO3– by taking watershed averages over their study time period, based on watersheds with an average size of approximately 2000 square kilometers. They not only helped elucidate trends and potential explanatory variables, but they were able to explain a large percentage of the variability of spatially smoothed NO3–, with an r2 of 0.80 for shallow aquifer NO3– and 0.77 for deep aquifer NO3–. However, this advantage of reducing groundwater NO3– variance is also a limitation because factors affecting spatially smoothed and temporally averaged NO3– might not affect point-level NO3–, and vice versa.
Furthermore, since groundwater NO3– contains significant local variability, the need to provide local estimates of its variability naturally follows. Models developed for predicting spatially smoothed and temporally averaged NO3– will likely not be successful in predicting observed, point-level NO3–. The objectives of this study are to (1) develop a novel nonlinear regression model for spatial point-level and time-averaged groundwater NO3– concentrations in monitoring and private wells of North Carolina, (2) produce the first space/time estimates of groundwater NO3– concentrations across a large study domain by integrating LUR models into the BME framework, and (3) compare space/time NO3– concentration models to the current standard of spatially averaged NO3– concentration models. Two nonlinear models, whose form is adopted from Nolan and Hitt[15] with components that represent NO3– sources, attenuation, and transport, are created and selected with a new model selection framework for nonlinear regression models with correlated explanatory variables. We then integrate the LUR models into the BME framework to model space/time point-level NO3–. Results are of interest to agencies that regulate drinking water sources or that monitor health outcomes from ingestion of drinking water. Additionally, the results can provide guidance on factors affecting the point-level variability of groundwater NO3– and new resources for more accurate management of NO3– loads.

Materials and Methods

Nitrate Data

NO3– data across North Carolina are obtained from three data sources (Supporting Information (SI) Figure S1), which are detailed as follows. The North Carolina Division of Water Resources (NC-DWR) collects data near select permitted, dedicated wastewater treatment residual (WTR) application fields via monitoring wells. The second source is U.S. Geological Survey (USGS) data obtained through the National Water Information System (NWIS). Well depth information is not linked directly to each monitoring well, although a subset of well depth information is available. Based on the subset with depth information, the wells have a mean depth of 33 feet with a standard deviation of 32 feet. Together, the NC-DWR and USGS data represent shallow aquifer monitoring wells (n = 12 322), which hereafter will be referred to as “monitoring well data.” The last data set of groundwater NO3– comes from private well data collected by the North Carolina Department of Health and Human Services (NC-DHHS). Groundwater NO3– was obtained and address geocoded using the same process outlined in Messier et al.[19] Well depth information is not linked to water quality measurements, but a separate database on private well construction contains well depths. The mean depth is 95 feet with a standard deviation of 109 feet. These data will hereafter be referred to as “private well data” and are assumed to represent a deeper aquifer model of groundwater NO3– (n = 22 067). The median NO3– concentrations for the NC-DWR, USGS, and private well data are 1.30, 0.10, and 0.62 mg/L, respectively. The means are 4.61, 6.14, and 1.66 mg/L, respectively. The percentages of observations above the detection limit are 79.7, 61.4, and 30.6, respectively. Additional basic statistics for the data set are available in the SI (Table S1).

Spatial and Temporal Observation Scales

In this work we develop models for NO3– at three observation scales. The finest scale corresponds to the space/time point-level NO3– data, that is, NO3– data as it is sampled. An intermediate observation scale corresponds to the time-averaged data, whereby NO3– at each well is averaged over time. The time-averaged data provide point-level spatial resolution, but no time variability. Finally, the coarsest observation scale corresponds to the spatially smoothed/time-averaged data, which were obtained by spatially smoothing the time-averaged data using a 25 km exponential kernel function. We choose 25 km as it is approximately the average size of watersheds in many NAWQA groundwater studies.[15,37] While previous works over large study domains have developed models for spatially smoothed/time-averaged NO3– data, very few models, if any, have been developed for point-level NO3– data over large study domains. Our work therefore fills that knowledge gap.
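The two coarser observation scales can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the kernel weight exp(-3d/range) with normalization by the sum of weights is an assumed form, since the text only states that a 25 km exponential kernel was used.

```python
import math
from collections import defaultdict

def time_average(samples):
    """Average repeated measurements per well.

    samples: iterable of (well_id, value) pairs; returns {well_id: mean value}.
    """
    groups = defaultdict(list)
    for well_id, value in samples:
        groups[well_id].append(value)
    return {well: sum(vals) / len(vals) for well, vals in groups.items()}

def kernel_smooth(wells, range_m=25_000.0):
    """Spatially smooth time-averaged values with an exponential kernel.

    wells: list of (x_m, y_m, value) tuples in projected coordinates (meters).
    The weight exp(-3 d / range_m) is an assumption made for this sketch.
    """
    smoothed = []
    for xi, yi, _ in wells:
        weights = [math.exp(-3.0 * math.hypot(xi - xj, yi - yj) / range_m)
                   for xj, yj, _ in wells]
        weighted_sum = sum(w * v for w, (_, _, v) in zip(weights, wells))
        smoothed.append(weighted_sum / sum(weights))
    return smoothed
```

Whether the values passed in are raw or log-transformed concentrations is left to the caller; the sketch only shows how point-level samples collapse first in time and then in space.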

Maximum Likelihood Estimation of Nitrate Distributions

Our notation denotes a single random variable $Z$ with a capital letter, its realization $z$ in lower case, and vectors and matrices in bold face, for example, $\mathbf{Z} = [Z_1,\ldots,Z_n]$ and $\mathbf{z} = [z_1,\ldots,z_n]$. Due to the high percentage of nondetect (left-censored) data in both the monitoring well and private well databases, maximum likelihood estimation (MLE) is used to estimate the parameters of the monitoring well and private well NO3– distributions,[38] which are assumed to be log-normal. MLE can directly account for the nondetect values by modifying the likelihood equation, with the censored observations given by the cumulative distribution function (CDF) evaluated at the detection limit. The MLE equation then becomes[38]

$$L(\mu,\sigma) = \prod_{i \in D} f_{\mu,\sigma}(z_i) \prod_{j \in C} F_{\mu,\sigma}(\log t_j)$$

where $D$ indexes the observations above the detection limit and $C$ the nondetects, $f_{\mu,\sigma}(z)$ denotes the normal probability distribution function (PDF) of log-transformed (natural log) point-level NO3–, $z$, with mean and standard deviation parameters $\mu$ and $\sigma$, and $F_{\mu,\sigma}(\log t)$ denotes the CDF of the distribution taken at the log of the detection limit $t$. The estimated distributions are used to quantify the extent of contamination in monitoring and private wells and to handle nondetect data. For the regression analysis, the log-NO3– concentration of a measurement below detection limit $t$ is assigned a value equal to the mean of the normal distribution $N(\mu,\sigma)$ truncated above $\log(t)$, whereas the geostatistical analysis can handle the full truncated normal distribution.[19]
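As a rough illustration of the censored likelihood described above, the following Python sketch evaluates the log-likelihood with the normal PDF for detected values and the normal CDF at the log detection limit for nondetects, and computes the truncated-normal mean used as the fill-in value for regression. The function names, the coarse grid search, and the sample concentrations are all hypothetical; a real analysis would likely use a numerical optimizer instead of a grid.

```python
import math
from statistics import NormalDist

def censored_loglik(mu, sigma, log_detects, log_limits):
    """Log-likelihood of log-concentrations under Normal(mu, sigma).

    log_detects: log values observed above the detection limit (PDF terms).
    log_limits:  log detection limit of each nondetect (CDF terms).
    """
    dist = NormalDist(mu, sigma)
    loglik = sum(math.log(dist.pdf(z)) for z in log_detects)
    loglik += sum(math.log(dist.cdf(t)) for t in log_limits)
    return loglik

def fit_by_grid(log_detects, log_limits, mus, sigmas):
    """Pick the (mu, sigma) grid point with the highest log-likelihood."""
    return max(((m, s) for m in mus for s in sigmas),
               key=lambda p: censored_loglik(p[0], p[1], log_detects, log_limits))

def truncated_mean_below(mu, sigma, log_limit):
    """Mean of Normal(mu, sigma) restricted to values below log_limit."""
    std = NormalDist()
    alpha = (log_limit - mu) / sigma
    return mu - sigma * std.pdf(alpha) / std.cdf(alpha)

# Hypothetical data: four detects (mg/L) and three nondetects at 0.5 mg/L.
log_detects = [math.log(v) for v in (1.3, 4.6, 0.8, 12.0)]
log_limits = [math.log(0.5)] * 3
mus = [i / 10 for i in range(-20, 21)]
sigmas = [i / 10 for i in range(5, 31)]
mu_hat, sigma_hat = fit_by_grid(log_detects, log_limits, mus, sigmas)
fill_value = truncated_mean_below(mu_hat, sigma_hat, math.log(0.5))
```

The fill-in value always lies below the log detection limit, which is the property the regression step relies on.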

Spatial Explanatory Variables

Spatial explanatory variables representing possible groundwater NO3– sources, attenuation, and transport factors were constructed prior to model development. Potential variables are summarized below with details available in the SI (Table S2). All of the explanatory variables have an inherent spatial distance parameter such as circular buffer radius or exponential decay range, which hereinafter is referred to as the hyperparameter. Each variable is calculated with multiple hyperparameter values since the optimal distance is unknown a priori. In the final model selection process, a maximum of one hyperparameter value is allowed to be selected from each variable to avoid multicollinearity and effectively optimize the hyperparameter. The following variables adopted from Nolan and Hitt[15] are NO3– sources calculated as kg-NO3–/yr/ha within a circular buffer: farm fertilizer, nonfarm fertilizer, manure, and NO3– atmospheric deposition. Each National Landcover Database (NLCD) category is calculated as a percent within a circular buffer. On-site wastewater treatment plant variables, septic density and average nitrate loading, are created following the methods of Pradhan et al.[39] The following point sources are calculated as the sum of exponentially decaying contributions:[19] wastewater treatment residual field application sites (WTR), swine confined animal feeding operations (CAFOs), poultry CAFOs, cattle farms, and wastewater treatment plants (WWTP). Mean slope in degrees and topographic wetness index[40] (TWI) are calculated within circular buffers. Water withdrawals in cubic meters per second are calculated using USGS water use estimates.[12] Lastly, population density is calculated within circular buffers from U.S. Census block data assuming an even distribution of population per census block.
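A point-source variable of the kind described above can be sketched as follows. The decay form magnitude × exp(-3d/range) is an assumption (a common convention in which the contribution falls to about 5% at the range); the text only says the contributions decay exponentially. The function names and the idea of returning one value per candidate range are illustrative.

```python
import math

def decay_sum(well_xy, sources, decay_range_m):
    """Sum of exponentially decaying contributions from point sources.

    well_xy: (x, y) of the well in meters.
    sources: iterable of (x, y, magnitude), e.g. animal counts at a CAFO.
    decay_range_m: the hyperparameter (exponential decay range).
    """
    total = 0.0
    for x, y, magnitude in sources:
        distance = math.hypot(x - well_xy[0], y - well_xy[1])
        # Assumed decay form: contribution drops to ~5% at decay_range_m.
        total += magnitude * math.exp(-3.0 * distance / decay_range_m)
    return total

def candidate_variables(well_xy, sources, ranges_m):
    """One candidate value per hyperparameter, since the best range is unknown."""
    return {a: decay_sum(well_xy, sources, a) for a in ranges_m}
```

Model selection would later keep at most one entry of the returned dictionary per variable, which is how the hyperparameter ends up being optimized.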

Nonlinear Regression Model Selection

In order to develop a LUR model for NO3– we adopt a nonlinear multivariable model similar to that implemented by the ground water vulnerability assessment (GWAVA),[15] which is also similar to its surface water counterpart, SPAtially Referenced Regression On Watershed attributes (SPARROW).[21,23,24] We partition explanatory variables into source, attenuation, and transport terms. Following Nolan and Hitt,[15] the nonlinear multivariable model is constructed as follows:

$$z_i = \beta_0 + \left[\sum_k \beta_k Y^{(s)}_{k,i}(\lambda_k)\right] \exp\left(\sum_l \gamma_l Y^{(a)}_{l,i}(\lambda_l)\right) \exp\left(\sum_m \delta_m Y^{(t)}_{m,i}(\lambda_m)\right) + \varepsilon_i$$

where $z_i$ is the log-transform of NO3– concentration at point $i$, $\beta_0$ is the intercept, $Y^{(s)}_{k,i}(\lambda_k)$ is the $k$-th source predictor variable at point $i$ with hyperparameter value $\lambda_k$, $\beta_k$ is its source regression coefficient, $Y^{(a)}_{l,i}(\lambda_l)$ is the $l$-th attenuation predictor variable at point $i$ with hyperparameter value $\lambda_l$, $\gamma_l$ is its attenuation regression coefficient, $Y^{(t)}_{m,i}(\lambda_m)$ is the $m$-th transport predictor variable with hyperparameter value $\lambda_m$, $\delta_m$ is its transport regression coefficient, and $\varepsilon_i$ is an error term. The model contains an additive, linear submodel for sources, and multiplicative exponential terms for the attenuation and transport variables that act directly on the source terms.[15] For example, $Y^{(s)}_{k,i}(\lambda_k)$ may be equal to a land cover variable or a point source variable. The attenuation variables, $Y^{(a)}_{l,i}(\lambda_l)$, physically represent areas that are associated with removing NO3– from groundwater, such as wetlands and histosol soil. The transport variables, $Y^{(t)}_{m,i}(\lambda_m)$, may be equal to any variable that affects the movement of NO3– in the groundwater, such as soil permeability and average slope. The attenuation variable coefficients, $\gamma_l$, are constrained to be negative, allowing them only to decrease NO3– concentrations, while the transport variable coefficients, $\delta_m$, are unconstrained, allowing variables to increase or decrease NO3– concentrations.
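To make the model structure concrete, here is a minimal Python sketch of a prediction function. It assumes the form "intercept plus a linear sum of sources, multiplied by the exponential of the attenuation and transport sums", which is inferred from the verbal description above; the function name and argument layout are hypothetical.

```python
import math

def predict_log_no3(beta0, betas, sources, gammas, attenuation, deltas, transport):
    """Predicted log-NO3 at one point under the assumed model form.

    betas/sources:      source coefficients and source variable values.
    gammas/attenuation: attenuation coefficients (expected <= 0) and values.
    deltas/transport:   transport coefficients (any sign) and values.
    """
    source_term = sum(b * y for b, y in zip(betas, sources))
    exponent = sum(g * y for g, y in zip(gammas, attenuation))
    exponent += sum(d * y for d, y in zip(deltas, transport))
    # Attenuation and transport act multiplicatively on the source sum.
    return beta0 + source_term * math.exp(exponent)
```

With a negative attenuation coefficient the exponential factor is below one, so the variable can only shrink the source contribution, which mirrors the sign constraint described in the text.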
We developed a nonlinear regression model selection technique that accommodates variables that differ only by a hyperparameter and can be adapted for various nonlinear model forms. Our model selection procedure is essentially a nonlinear extension of A Distance Decay REgression Selection Strategy (ADDRESS),[16] since to the authors’ knowledge there is no regression selection strategy for nonlinear LUR. We implement constrained forward nonlinear regression with hyperparameter optimization (CFN-RHO), whose simple algorithm is as follows (SI Figure S2):
1. Initialization: Linear regression is performed on all candidate variables to obtain the initial values for the nonlinear model fitting.
2. Candidate Variables: In the first iteration, the candidate variables consist of the source variables only. In the second iteration, candidate variables consist of attenuation and transport variables only. This is done so as to obtain an initial model with at least one source and one attenuation or transport variable. In every iteration afterward the candidate variables can be any variable.
3. Nonlinear Regression: Nonlinear regression is performed by adding each candidate variable to the current model one at a time. Note that candidate variables are added according to their predetermined place in the nonlinear model (i.e., source variables are in the linear submodel; attenuation and transport variables are in the exponential submodel).
4. Variable Selection: The variable that results in the highest R-squared (lowest AIC is also an option), while keeping all variables in the model statistically significant (p-value < 0.05), is selected and added to the model. R-squared ties beyond the thousandth decimal place are settled by the lowest p-value.
5. Hyperparameter Optimization: The rest of the candidate variables that differ from the selected variable by only a hyperparameter are removed from the candidate variable pool, effectively optimizing the hyperparameter value.
6. Selection Criteria: The new model must increase R-squared by more than a user-defined selection criterion, such as a one percent increase. If the model passes the selection criterion, then the iterative process continues to step 2. If it does not, then the algorithm ends, with the final model being the (i−1)-th model since the last variable did not pass the selection criterion.
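The selection loop above can be sketched in Python as follows. This is a simplified illustration rather than the authors' code: the nonlinear fit is abstracted into a user-supplied `fit` callback (hypothetical) that returns the model R-squared and the largest p-value among its variables, the initialization step is assumed to happen inside that callback, and the "one percent increase" is interpreted as an absolute R-squared gain of 0.01.

```python
def cfn_rho(candidates, fit, min_gain=0.01, alpha=0.05):
    """Constrained forward selection with hyperparameter pruning (sketch).

    candidates: list of (name, kind, hyperparameter) tuples, where kind is
                "source", "attenuation", or "transport".
    fit:        callable(model) -> (r_squared, max_p_value); assumed to wrap
                the nonlinear regression, including its initial values.
    """
    model, pool, best_r2, iteration = [], list(candidates), 0.0, 0
    while pool:
        iteration += 1
        if iteration == 1:      # first pass: sources only
            eligible = [c for c in pool if c[1] == "source"]
        elif iteration == 2:    # second pass: attenuation/transport only
            eligible = [c for c in pool if c[1] != "source"]
        else:
            eligible = pool
        scored = []
        for cand in eligible:
            r2, max_p = fit(model + [cand])
            if max_p < alpha:   # every variable must stay significant
                scored.append((round(r2, 3), -max_p, r2, cand))
        if not scored:
            break
        # Highest R-squared wins; ties at three decimals go to the lowest p.
        _, _, r2, chosen = max(scored, key=lambda s: (s[0], s[1]))
        if r2 - best_r2 < min_gain:
            break               # last variable failed the selection criterion
        model.append(chosen)
        best_r2 = r2
        # Drop same-named variables that differ only by hyperparameter.
        pool = [c for c in pool if c[0] != chosen[0]]
    return model, best_r2

# Toy stand-in for the nonlinear fit (hypothetical R-squared contributions).
def toy_fit(model):
    gains = {("fertilizer", 250): 0.20, ("fertilizer", 1000): 0.15,
             ("wetlands", 5000): 0.10, ("slope", 25000): 0.004}
    return sum(gains[(name, hyper)] for name, _, hyper in model), 0.01

toy_candidates = [("fertilizer", "source", 250), ("fertilizer", "source", 1000),
                  ("wetlands", "attenuation", 5000), ("slope", "transport", 25000)]
selected, r2_final = cfn_rho(toy_candidates, toy_fit)
```

In the toy run the 250 m fertilizer buffer is chosen first, which removes the 1000 m version from the pool; slope is rejected because it adds less than the minimum gain.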

BME Estimation Framework for Space/Time Mapping Analysis

To improve estimation accuracy, we integrate the time-averaged LUR results into the Bayesian Maximum Entropy (BME) method of modern spatiotemporal geostatistics.[41,42] BME is a space/time geostatistical estimation framework grounded in epistemic principles that reduces to the space/time simple, ordinary, and universal Kriging methods as its linear limiting case when considering a limited, Gaussian, knowledge base, while also allowing the flexibility to process a wide variety of additional knowledge bases (physical laws, empirical relationships, non-Gaussian distributions, hard and soft data, etc.). We only provide the fundamental BME equations for mapping NO3–; the reader is referred to other works for more detailed derivations of BME equations[41,43] and LUR integration into BME.[19] Let $Z(\mathbf{p})$ be the space/time random field (S/TRF) describing the distribution of groundwater log-NO3– across space and time, where $\mathbf{p} = (\mathbf{s}, t)$, $\mathbf{s}$ is the space coordinate, and $t$ is time. The knowledge available is organized into the general knowledge base (G-KB) about the space/time trend and variability (e.g., mean, covariance) of NO3– across the study domain, and the site-specific knowledge base (S-KB) corresponding to the hard and soft data $\mathbf{z}_d$ available at a set of specific space/time points $\mathbf{p}_d$. First, we define the transformation of the log-NO3– data $\mathbf{z}_d$ at locations $\mathbf{p}_d$ as

$$\mathbf{x}_d = \mathbf{z}_d - o(\mathbf{p}_d)$$

where $o(\mathbf{p})$ may be any deterministic offset that can be mathematically calculated at any space/time coordinate $\mathbf{p}$. We then define $X(\mathbf{p})$ as a homogeneous/stationary S/TRF representing the variability and uncertainty associated with the transformed data $\mathbf{x}_d$, that is, such that $\mathbf{x}_d$ is a realization of $X(\mathbf{p})$. Finally, we let $Z(\mathbf{p}) = X(\mathbf{p}) + o(\mathbf{p})$ be the S/TRF representing groundwater log-NO3–.
In this study, we consider two choices for $o(\mathbf{p})$: (1) a constant value determined by the MLE mean, resulting in a purely BME model, and (2) the LUR estimate at $\mathbf{p}$ from CFN-RHO, resulting in a LUR-BME model. The G-KB for the S/TRF $X(\mathbf{p})$ describes its local space/time trends and dependencies. In this work, the general knowledge consists of the space/time mean trend function $m(\mathbf{p}) = E[X(\mathbf{p})]$ and the covariance function $C(\mathbf{p},\mathbf{p}') = E[[X(\mathbf{p}) - m(\mathbf{p})][X(\mathbf{p}') - m(\mathbf{p}')]]$ of the S/TRF $X(\mathbf{p})$. The S-KB consists of hard data and soft data: the hard data, $\mathbf{x}_h = \mathbf{z}_h - o(\mathbf{p}_h)$, are at data points where $\mathbf{z}_h$ is observed over the detection limit, and the soft data, $\mathbf{x}_s$, are at locations $\mathbf{p}_s$ where NO3– is observed below the detection limit. Following Messier et al.,[19] the BME soft data for log-NO3– are modeled as a Gaussian distribution truncated above the log of the detection limit. The overall knowledge bases considered consist of $G = \{m(\mathbf{p}), C(\mathbf{p},\mathbf{p}')\}$ and $S = \{f_S(\cdot), \mathbf{x}_h\}$. In this case the BME set of equations reduces to

$$f_K(x_k) = A^{-1} \int d\mathbf{x}_s \, f_S(\mathbf{x}_s) \, f_G(\mathbf{x}_h, \mathbf{x}_s, x_k)$$

where $f_K(x_k)$ is the BME posterior PDF for the offset-removed log-NO3– ($x_k$) at some unmonitored estimation point $\mathbf{p}_k$, $f_G(\mathbf{x}_h, \mathbf{x}_s, x_k)$ is the (maximum entropy) multivariate Gaussian PDF for $(\mathbf{x}_h, \mathbf{x}_s, x_k)$ with mean and variance-covariance given by the G-KB, $f_S(\mathbf{x}_s)$ is the truncated Gaussian PDF of $\mathbf{x}_s$, and $A^{-1}$ is a normalization constant. After the BME analysis is conducted, $o(\mathbf{p}_k)$ is added back to obtain log-NO3– concentrations.
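The offset workflow (remove the offset, estimate the residual field, add the offset back) can be illustrated with the simplest Gaussian limiting case: a single hard datum and a zero-mean residual field, which is a simple-kriging-style estimate and not the full BME integral with soft data. The exponential covariance with a factor of 3 in the exponent, the function names, and the numeric values are assumptions made for this sketch.

```python
import math

def exp_cov(r_m, sill, range_m):
    """Assumed exponential covariance of the residual field X at lag r_m."""
    return sill * math.exp(-3.0 * r_m / range_m)

def estimate_one_hard_datum(z_obs, offset_obs, offset_est, r_m, sill, range_m):
    """Estimate log-NO3 at a point r_m away from one hard observation.

    z_obs:      observed log-NO3 at the data point.
    offset_obs: offset o(p) at the data point (e.g. an LUR estimate).
    offset_est: offset o(p) at the estimation point.
    Returns (estimated log-NO3, estimation variance of the residual).
    """
    x_obs = z_obs - offset_obs                    # remove the offset
    weight = exp_cov(r_m, sill, range_m) / sill   # single-datum kriging weight
    x_est = weight * x_obs                        # zero-mean conditional mean
    variance = sill * (1.0 - weight ** 2)
    return x_est + offset_est, variance           # add the offset back
```

At zero lag the estimate reproduces the observation with zero variance, and far beyond the covariance range it falls back to the offset alone with variance close to the sill, which is why predictions are only well constrained near measured wells.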

Validation Statistics

The robustness of CFN-RHO is tested with a 10-fold cross-validation procedure. In 10-fold cross-validation, the data are randomly partitioned into 10 equal-size subsamples. A single subsample is retained as the validation data for testing the model, and the remaining nine subsamples are used as training data. Each of the 10 subsamples is used exactly once as the validation data. Similar variable selections (which may differ only by hyperparameter) across subsamples demonstrate model selection robustness. Models are compared with a leave-one-out cross-validation (LOOCV) mean squared error (MSE) and R-squared. Spatially smoothed/time-averaged NO3– and time-averaged NO3– models are also tested on how well they predict at the smaller observation scales. In LOOCV, each log-NO3– value $Z_j$ is removed one at a time and re-estimated using the given model based only on the remaining data. Let $Z_j^{*(k)}$ be the re-estimate for method $k$; then $\mathrm{MSE}^{(k)} = \frac{1}{n}\sum_{j=1}^{n}\left(Z_j^{*(k)} - Z_j\right)^2$, and the cross-validation R-squared is $R^2\left(Z_j, Z_j^{*(k)}\right)$, computed between the observed and re-estimated values.
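A leave-one-out loop with the two validation statistics might look like the Python sketch below. The R-squared is computed here as the squared Pearson correlation between observed and re-estimated values, which is an assumption about the definition; the `estimator` callback and the nearest-neighbor example are hypothetical stand-ins for the LUR, BME, or LUR-BME estimators.

```python
def mse(observed, estimated):
    """Mean squared error between observed values and re-estimates."""
    return sum((e - o) ** 2 for o, e in zip(observed, estimated)) / len(observed)

def r_squared(observed, estimated):
    """Squared Pearson correlation (assumed definition of the CV R-squared)."""
    n = len(observed)
    mean_o, mean_e = sum(observed) / n, sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov * cov / (var_o * var_e)

def loocv(data, estimator):
    """Re-estimate each value from all the others.

    data:      list of (location, value) pairs.
    estimator: callable(training_data, location) -> estimate.
    """
    return [estimator(data[:i] + data[i + 1:], loc)
            for i, (loc, _) in enumerate(data)]

# Hypothetical 1-D example with a nearest-neighbor estimator.
def nearest_neighbor(training, location):
    return min(training, key=lambda item: abs(item[0] - location))[1]

toy = [(0, 0.0), (1, 1.0), (2, 2.0), (3, 3.0)]
re_estimates = loocv(toy, nearest_neighbor)
observed = [value for _, value in toy]
```

Swapping `nearest_neighbor` for any other estimator with the same signature yields directly comparable MSE and R-squared values across methods.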

Results

Nitrate Concentrations

The MLE of the statewide monitoring concentrations resulted in a geometric mean and standard deviation of the log-normal distribution of 0.62 and 14 mg/L, respectively (SI Figure S3). MLE for private wells resulted in a geometric mean and standard deviation of 0.45 and 5.1 mg/L (SI Figure S3).

Spatially Smoothed/Time-Averaged Nitrate

Cross-validation of the 25 km spatially smoothed/time-averaged NO3– LUR model results in an r2 of 0.69 and 0.68 for monitoring and private wells, respectively (Table 1), which is of similar magnitude to the current literature.[15] However, as expected, the LUR model calibrated for spatially smoothed/time-averaged NO3– underperforms and does progressively worse (top row, moving left to right in Table 1) as it predicts time-averaged NO3– and point-level NO3–, with lower r2 and higher RMSE. The variables selected for this model via CFN-RHO are available in the SI (Table S3).
Table 1

Leave-One-Out Cross-Validation Statistics Comparing Four Estimation Methods That Predict Spatially Smoothed/Time-Averaged NO3– Concentrations, Time-Averaged NO3– Concentrations, and Point-Level Observed NO3– Concentrations (a)

| method | statistic | spatially smoothed/time-averaged NO3–, MW (n = 951) | spatially smoothed/time-averaged NO3–, PW (n = 18,664) | time-averaged NO3–, MW (n = 951) | time-averaged NO3–, PW (n = 18,664) | point-level NO3–, MW (n = 12,300) | point-level NO3–, PW (n = 22,062) |
|---|---|---|---|---|---|---|---|
| spatially smoothed/time-averaged LUR | r2 | 0.69 | 0.68 | 0.27 | 0.08 | 0.15 | 0.08 |
| spatially smoothed/time-averaged LUR | RMSE | 0.895 | 0.293 | 2.23 | 1.19 | 2.40 | 1.27 |
| time-averaged LUR | r2 | | | 0.37 | 0.09 | 0.23 | 0.09 |
| time-averaged LUR | RMSE | | | 2.08 | 1.19 | 2.28 | 1.27 |
| space/time BME | r2 | | | | | 0.70 | 0.25 |
| space/time BME | RMSE | | | | | 1.39 | 1.23 |
| space/time LUR-BME | r2 | | | | | 0.74 | 0.33 |
| space/time LUR-BME | RMSE | | | | | 1.27 | 1.08 |

(a) Each method was used to predict only at scales equal to or more refined than its calibration scale. MW = Monitoring Well model. PW = Private Well model. n = number of observations at that scale. Time averaging results in fewer observations. RMSE = root mean squared error. Units of NO3– concentration = mg/L.

10-fold cross-validation of spatially smoothed/time-averaged NO3– LUR models was done to demonstrate the stability of CFN-RHO (SI Tables S4, S5). All variables were selected 7 and 10 out of 10 iterations for the monitoring and private well models, respectively.

Time-Averaged Nitrate

The LUR variables selected through CFN-RHO for time-averaged NO3– observed at monitoring wells and private wells are shown in Table 2. The LUR calibrated to predict time-averaged NO3– obtains an r2 of 0.37 and 0.09 for monitoring wells and private wells, respectively (Table 1, second row). Moreover, the LUR model predicts point-level NO3– with an r2 of 0.23 and 0.09 for monitoring and private wells, respectively. LUR maps are available in SI Figure S4.
Table 2

Nonlinear Regression Model Variables Selected via CFN-RHO and Parameter Estimates for Time-Averaged NO3– Monitoring Well (MW) and Private Well (PW) Models

| variable (unit) | model | variable range | coefficient estimate | standard error |
|---|---|---|---|---|
| Constant | MW | n/a | –3.71 | 0.191 |
| Constant | PW | n/a | –1.570 | 0.0382 |
| Source Variables | | | | |
| manure (a) | not stated | 250 m | 0.0759 | 0.0317 |
| wastewater treatment residuals (WTR) (b) | MW | 5 km | 0.245 | 0.0274 |
| farm fertilizer (a) | MW | 250 m | 0.132 | 0.0193 |
| farm fertilizer (a) | PW | 250 m | 0.0432 | 0.0025 |
| swine CAFOs (c) | MW | 2 km | 0.117 | 0.0218 |
| swine lagoons (b) | PW | 6 km | 0.1079 | 0.0146 |
| developed low (d) | not stated | 250 m | 0.112 | 0.0214 |
| developed (all combined) (d) | not stated | 100 m | 0.0112 | 7.08e-4 |
| atmospheric deposition (a) | MW | 250 m | 0.477 | 0.129 |
| atmospheric deposition (a) | PW | 25 km | 2.94e-11 | 2.53e-10 |
| Attenuation and Transport Variables | | | | |
| forest (all combined) (d) | not stated | 2 km | –0.0064 | 0.00281 |
| deciduous forest (d) | PW | 4 km | –0.0151 | 0.00127 |
| herbaceous wetlands (d) | not stated | 5 km | –0.531 | 0.079 |
| histosol (d) | MW | 25 km | –0.0427 | 0.0111 |
| histosol (d) | PW | 25 km | –0.106 | 0.0126 |
| hydrologic soil group D (d) | not stated | 500 m | –0.012 | 0.0010 |
| slope (e) | not stated | 25 km | –0.074 | 0.0261 |

All variables are significant with p-value < 0.025. Variable units: (a) kg-NO3–/yr/ha; (b) dimensionless; (c) 100 pigs; (d) percent; (e) degrees. A variable listed for only one model is not a variable in the other model.

10-fold cross-validation of time-averaged NO3– LUR models was conducted (SI Tables S6, S7). All variables selected in the monitoring well model are selected in at least six iterations of the 10-fold cross-validation runs. The majority of variables in the private well model were also stable; however, swine lagoons and deciduous forest were only selected 2 and 0 out of 10 times, respectively. In both models, when a variable is not selected in the 10-fold cross-validation it is likely due to other variables that capture similar source, attenuation, or transport processes (e.g., forest instead of deciduous forest, swine CAFOs instead of swine lagoons).

Point-Level Nitrate

We modeled the space/time covariance of the LUR offset-removed log-NO3– S/TRF, $X(\mathbf{p})$, using a two-component, space/time nonseparable, exponential covariance model following Messier et al.:[19]

$$c_X(r,\tau) = c_1 \exp\left(-\frac{3r}{a_{r1}}\right)\exp\left(-\frac{3\tau}{a_{\tau 1}}\right) + c_2 \exp\left(-\frac{3r}{a_{r2}}\right)\exp\left(-\frac{3\tau}{a_{\tau 2}}\right)$$

where $r$ is the spatial lag, $\tau$ is the temporal lag, and $c_1$ = 0.67 (log-mg/L)2, $a_{r1}$ = 93 m, $a_{\tau 1}$ = 15 days, $c_2$ = 3.6 (log-mg/L)2, $a_{r2}$ = 1750 m, $a_{\tau 2}$ = 15840 days for monitoring wells (SI Figure S5), and a one-component, space/time exponential covariance model for private wells, where $c_1$ = 0.76 (log-mg/L)2, $a_{r1}$ = 1181 m, $a_{\tau 1}$ = 8640 days (SI Figure S6). The LUR-BME model, which integrates the time-averaged LUR as the offset, best predicts space/time point-level NO3– concentrations, with an r2 of 0.74 and 0.33 (Table 1) for monitoring and private wells, respectively. However, the LUR-BME predictions have a large variance at locations farther than the covariance model spatial range. Figure 1 maps the point-level NO3– concentrations estimated by LUR-BME for 1 day during the study period for both monitoring and private well models. These are the first results to show that there is a 4-fold improvement in predicting point-level NO3– when the LUR-BME method is used in comparison to previous studies that use models for spatially smoothed/time-averaged NO3–, and a five percent improvement in r2 when integrating a LUR model into the BME framework over purely BME. A link to a movie of LUR-BME maps is available in the SI.
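The fitted covariance parameters can be plugged into a small Python function. The functional form, a sum of components c·exp(-3r/a_r)·exp(-3τ/a_τ), is an assumption based on the description of an exponential space/time model; the parameter values are the ones reported in the text, and the names are illustrative.

```python
import math

def st_covariance(r_m, tau_days, components):
    """Space/time covariance at spatial lag r_m (m) and temporal lag tau_days.

    components: list of (sill, spatial_range_m, temporal_range_days).
    Assumed form per component: sill * exp(-3 r / a_r) * exp(-3 tau / a_t).
    """
    return sum(sill * math.exp(-3.0 * r_m / a_r) * math.exp(-3.0 * tau_days / a_t)
               for sill, a_r, a_t in components)

# Parameters reported in the text, sills in (log-mg/L)^2.
MONITORING = [(0.67, 93.0, 15.0), (3.6, 1750.0, 15840.0)]
PRIVATE = [(0.76, 1181.0, 8640.0)]

total_sill_mw = st_covariance(0.0, 0.0, MONITORING)       # c1 + c2
cov_pw_at_range = st_covariance(1181.0, 0.0, PRIVATE)     # ~5% of the sill
```

Under this form the monitoring well covariance has one short-lived, short-range component and one persistent component that dominates beyond roughly 100 m and a few weeks.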
Figure 1

Comparison of LUR-BME results between the monitoring well (left of gray bar) model and private well (right of gray bar) model NO3– concentrations. The extent rectangles show zoomed-in portions of the state and are identical areas for both models. Extent (B) shows geometric mean predictions and then geometric standard deviation.

Discussion

Groundwater Nitrate Maps

This study presents a LUR model for point-level NO3– in North Carolina that elucidates processes affecting its local variability, and then utilizes the strengths of BME to create the first LUR-BME model of groundwater nitrate’s spatial/temporal distribution, including prediction uncertainty. The first major finding is that the LUR-BME model for monitoring wells, assumed to represent surficial aquifers (Figure 1, SI Movie S1), shows groundwater NO3– that is highly variable, with many areas predicted above the current standard of 10 mg/L. In contrast, the private well results (Figure 1) depict widespread, low-level NO3– concentrations, which is consistent with the current physical understanding in which sources tend to pollute the surficial aquifer, and NO3– is then transported over time to the deeper drinking-water supply aquifers, where concentrations are lower. This finding is significant because studies have demonstrated potentially significant health effects at concentrations as low as 2.5 mg/L.[4−7] Additionally, concentrations of NO3– could impact ecological function since there are potentially large reserves in deeper aquifers that can discharge to surface waters.[27] The standard deviation maps (Figure 1) demonstrate the importance of NC-DWR and USGS monitoring wells and private well testing because areas within the spatial covariance range are well characterized, whereas those outside are less reliable. The second major finding is that the LUR-BME maps (Figure 1) show that groundwater NO3– in monitoring wells is elevated in the southeastern plains of North Carolina (SI Figure S7) due to the larger amount of NO3– sources and the lack of subsurface attenuation factors (SI Movie S2) that are present in the coastal plain region.
This corroborates the findings of Nolan and Hitt,[15] which also show spatially smoothed/time-averaged NO3– to be the highest in the southeastern plains of North Carolina. Our study expands on that finding with point-level results showing significant point-level variability within regional trends. Additional concerns arise since groundwater flow in the southeastern plains contributes significantly to surface water flow.[27] Our LUR-BME model can be used with surface water models to quantify the contribution of groundwater NO3– to surface water contamination. The methods in this study provide estimates at a finer resolution and down to smaller NO3– values than Nolan and Hitt,[15] resulting in new findings. Nolan and Hitt[15] generally show greater concentrations than the LUR-BME model, potentially because their model used significantly less training data and averaged NO3– over watersheds. Our LUR-BME models benefit from the large amount of monitoring (n = 12 322) and private well (n = 22 067) data, whereas they used 2306 and 2490 observations across the U.S. for their shallow and drinking water models, respectively. LUR-BME benefits from the exactitude property of BME; thus, our model results are in 100% agreement with the observations at monitoring locations. In contrast, when our observed data are compared with Nolan and Hitt[15] by grouping results according to the bins of Figure 1, Nolan and Hitt[15] overpredict 48% and 59% of the time for monitoring and private wells, respectively (SI Figures S8 and S9). As a result of the finer resolution of our maps and their improved ability to predict low-level NO3–, our results lead to a significant new finding about the extent of areas with low-level contamination. Our results show private well concentrations are greater than 0.25 mg/L while monitoring well concentrations are less than 0.25 mg/L in 30.6% of North Carolina’s area, compared to 2.6% for Nolan and Hitt[15] (SI Tables S8 and S9). 
Likewise, our results show monitoring and private wells are both above or both below 0.25 mg/L at the same location in 68% of North Carolina, compared to 91% for Nolan and Hitt.[15] Hence, whereas the Nolan and Hitt[15] results suggest that the geographical extent of low-level contamination of the drinking water aquifer is limited to that of the shallow aquifer, which is consistent with downward transport of NO3– contamination, our LUR-BME models show that the geographical extent of contamination of the drinking water aquifer in fact extends over a much larger area than that of the shallow aquifer. This major new finding provides new evidence that, in addition to downward transport, there is also significant outward transport of groundwater NO3– in the drinking water aquifer to areas outside the range of sources. This is especially significant because it indicates that the deeper aquifers are acting as a reservoir that is not only deeper, but also wider than the reservoir formed by the shallow aquifers.
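
The map comparison described above rests on three simple summaries: how often one model places a well in a higher concentration bin than observed, and in what share of the state the monitoring and private well maps fall on the same or opposite sides of 0.25 mg/L. The following Python sketch shows one way such summaries could be computed. It is an illustration only and not the authors' code: the bin edges are hypothetical (the actual bins of Figure 1 are not listed in this section), the function names are invented for the example, and grid cells are assumed to have equal area so that a cell count stands in for an area fraction.

```python
from bisect import bisect_right

# Hypothetical bin edges in mg/L; the actual bins of Figure 1 are not given here.
BIN_EDGES = [0.25, 1.0, 3.0, 10.0]


def to_bin(conc, edges=BIN_EDGES):
    """Return the index of the concentration bin that conc falls into."""
    return bisect_right(edges, conc)


def overprediction_rate(observed, predicted, edges=BIN_EDGES):
    """Fraction of wells whose predicted bin is higher than the observed bin."""
    pairs = list(zip(observed, predicted))
    over = sum(1 for obs, pred in pairs if to_bin(pred, edges) > to_bin(obs, edges))
    return over / len(pairs)


def agreement_fraction(monitoring, private, threshold=0.25):
    """Fraction of equal-area cells where both maps are on the same side of threshold."""
    same = sum(1 for m, p in zip(monitoring, private)
               if (m > threshold) == (p > threshold))
    return same / len(monitoring)


def private_only_fraction(monitoring, private, threshold=0.25):
    """Fraction of cells where the private well map exceeds threshold but the monitoring map does not."""
    hits = sum(1 for m, p in zip(monitoring, private)
               if p > threshold and m <= threshold)
    return hits / len(monitoring)


# Toy values (mg/L), purely illustrative and not study data.
observed = [0.1, 0.5, 2.0, 12.0]
predicted = [0.5, 0.6, 1.5, 5.0]
monitoring_map = [0.1, 0.5, 0.1, 2.0]
private_map = [0.3, 0.6, 0.1, 0.1]

print(overprediction_rate(observed, predicted))
print(agreement_fraction(monitoring_map, private_map))
print(private_only_fraction(monitoring_map, private_map))
```

Comparing bins rather than raw concentrations mirrors the grouping "according to the bins of Figure 1" and makes the overprediction rate insensitive to small numerical differences within a bin.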

LUR Variable Interpretations

Variables selected through CFN-RHO show processes influencing monitoring well and private well NO3– concentrations. Interpretations of regression source parameters are based on the nonlinear model formulation: since NO3– was log-transformed and the nonlinear model has multiplicative interaction, the percent increase of the geometric mean of NO3– is the exponential of the source coefficient multiplied by the result of the attenuation and transport terms held at their mean values. For instance, in the monitoring well model, the change in the geometric mean of NO3– in mg/L for every 1 kg/yr/ha of farm fertilizer is exp(0.132 × 0.456) = 1.06, corresponding to an increase of approximately 5%, where 0.456 is the exponential of the mean attenuation and transport variables multiplied by their coefficients. For the private well model, the change in the geometric mean of NO3– for every 1 kg/yr/ha of farm fertilizer is exp(0.0432 × 0.4636) = 1.02, corresponding to an increase of approximately 2%. Every other source coefficient interpretation for time-averaged NO3– is provided in the SI. Comparing the variables selected for the spatially smoothed/time-averaged NO3– LUR and the time-averaged NO3– LUR helps elucidate the effects that spatial scale has on groundwater NO3– concentrations. The variable hyperparameters selected by CFN-RHO help elucidate potential scales at which the variables affect groundwater NO3– concentrations. For example, the short buffer range of developed low likely captures the small size of single-family housing yards and their associated fertilizer applications. The monitoring well model WTR has an exponential decay range of 5 km. A possible explanation for this medium range is the volatilization of NO3– into the air, which can then be transported over longer distances than by subsurface transport mechanisms alone. 
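
The source-coefficient interpretation given earlier in this paragraph can be reproduced with a few lines of Python. The sketch below is an illustration rather than the authors' code: the function names are invented for the example, and the only inputs are the rounded coefficients and mean attenuation/transport terms printed in the text. With those rounded values the monitoring well ratio evaluates to about 1.062, slightly above the 5% quoted, which presumably reflects rounding of the published values.

```python
import math


def geometric_mean_ratio(beta_source, mean_attenuation_term, delta_source=1.0):
    """Multiplicative change in geometric-mean NO3- for a change in one source variable.

    beta_source: regression coefficient of the source variable
    mean_attenuation_term: exponential of the attenuation/transport terms at their means
    delta_source: change in the source variable (1.0 = one unit, e.g. 1 kg/yr/ha)
    """
    return math.exp(beta_source * mean_attenuation_term * delta_source)


def percent_increase(ratio):
    """Convert a multiplicative ratio into a percent increase."""
    return (ratio - 1.0) * 100.0


# Values as printed in the text (rounded).
monitoring_ratio = geometric_mean_ratio(0.132, 0.456)
private_ratio = geometric_mean_ratio(0.0432, 0.4636)

print(f"monitoring wells: ratio {monitoring_ratio:.3f}, {percent_increase(monitoring_ratio):.1f}% increase")
print(f"private wells:    ratio {private_ratio:.3f}, {percent_increase(private_ratio):.1f}% increase")
```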
Long buffer ranges for attenuation and transport variables such as percent histosol soil and mean slope represent variables with larger, regional-scale effects. The third major finding is that both wastewater treatment residuals (WTR) and swine CAFOs were selected as local sources of groundwater NO3– contamination, which to our knowledge have not previously been identified as sources in multivariable models that included regional sources. To help aid state-wide policy decisions concerning regional versus local sources, Figure 2 shows the elasticity of LUR-predicted sources in monitoring wells, that is, the percent change in the geometric mean of groundwater NO3– within an area in response to a percent decrease in a LUR model source, given that all other sources remain at current levels. Reductions in farm fertilizer and atmospheric deposition result in the greatest decrease in groundwater NO3– state-wide (Figure 2A). Reducing WTR (Figure 2B) and swine CAFOs (Figure 2C) within 1 km of the source leads to significant reductions in groundwater NO3– in the local area surrounding the sources, demonstrating the importance of these sources for local-area NO3– variability.
Figure 2

Elasticity curves for monitoring well sources. Y-axis is the percent decrease in a source and the X-axis is the percent decrease in geometric mean, for (A) state-wide, (B) within 1 km of wastewater treatment residuals, and (C) within 1 km of swine CAFOs.
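
The idea behind an elasticity curve can be sketched for a single location. The Python example below is a simplified illustration under stated assumptions, not the procedure used to produce Figure 2: it assumes the log-transformed model is linear in the source variable once the attenuation and transport terms are fixed at their mean, it treats one location instead of averaging over an area, and the current source level of 10 kg/yr/ha is a hypothetical value chosen for the example. The coefficient 0.132 and attenuation term 0.456 are the farm fertilizer values quoted in the text for the monitoring well model.

```python
import math


def percent_decrease_gm(beta, attenuation, source_level, pct_source_cut):
    """Percent decrease in geometric-mean NO3- when one source is cut by pct_source_cut percent.

    All other sources are held at current levels, so only this source's term changes.
    """
    delta = -source_level * pct_source_cut / 100.0  # change in the source variable
    ratio = math.exp(beta * attenuation * delta)    # multiplicative change in the geometric mean
    return (1.0 - ratio) * 100.0


def elasticity_curve(beta, attenuation, source_level, cuts=range(0, 101, 10)):
    """Return (percent source cut, percent decrease in geometric mean) pairs."""
    return [(cut, percent_decrease_gm(beta, attenuation, source_level, cut))
            for cut in cuts]


# source_level is a hypothetical current fertilizer loading in kg/yr/ha.
curve = elasticity_curve(beta=0.132, attenuation=0.456, source_level=10.0)
for cut, decrease in curve:
    print(f"{cut:3d}% source cut -> {decrease:5.1f}% lower geometric mean")
```

Because the response is exponential in the source level, equal percentage cuts yield diminishing additional reductions, and locations with higher current loadings respond more strongly to the same percentage cut.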

Recommendations and Limitations

This work represents the first step in the development of modeling observed NO3– over large domains without averaging. In previous studies, spatial averaging is utilized because it provides results at the domain (state, regional, or national) desired for policy-making decisions and sheds light on processes influencing groundwater NO3–. We demonstrated that a LUR at the point level in space is currently limited in terms of model predictive capability, but when integrated into the BME framework, the improved model can estimate within the spatial covariance range similarly to LUR models for spatially smoothed/time-averaged groundwater NO3– concentrations. Potential explanatory variables that can explain the remaining variability in the point-level LUR will require primary data collection. For instance, we found WTR to be a significant variable even though we used only the location of fields. If records of the timing and amounts of WTR applications were improved, then the modeling of temporal variability in monitoring wells near WTR application fields could be improved.[44] Similarly, a parcel-level query of farm fertilizer application practices could distinguish farms that use NO3– fertilizers efficiently from farms that apply excessively or with poor timing. For private wells, the short spatial autocorrelation range may be due to differences in the effectiveness of on-site wastewater treatment systems or in residential fertilizer use. Additionally, we note that the fact that a candidate variable was not selected via CFN-RHO does not necessarily indicate that it has no effect on groundwater NO3– concentrations in surficial or confined drinking-water aquifers of North Carolina. Many factors, both statistical and physical, can affect the selection, such as correlation between candidate variables and local hydrogeologic conditions being overwhelmed by larger-scale trends. This study lacked well depth for the majority of monitoring and private wells. 
The monitoring and private well models clearly demonstrate a difference in concentrations based on depth, so well depth data could quantify this more explicitly, as opposed to categorically as done in this study. Furthermore, pumping rate information was not available for the private well data set; thus, the effect of local pumping could not be quantified. The USGS water use report[12] has information on domestic-use water withdrawals; however, it is at the county scale, based on county populations, and cannot be downscaled like the agricultural water withdrawals variable, so it was not included as a candidate variable. Additionally, the detection limit of 1 mg/L for the private well data is high, and lowering that detection limit would improve the ability of the model to delineate areas with low-level contamination that may act as a reservoir for surface water NO3– recharge. The high detection limit is also potentially responsible for the lower r2 in the private well LUR model for time-averaged nitrate because it results in a low dependent-variable variance. Predictions of the private well LUR model for time-averaged nitrate are likely biased toward the detection limit; however, the LUR-BME model for private wells likely avoids this bias due to the exactitude property along with the good spatial coverage of private well data across North Carolina. Moreover, greater uncertainty in attenuation processes in deeper aquifers is likely contributing to the lower r2. In conclusion, a LUR model with a novel model selection procedure can elucidate important predictors of point-level groundwater NO3– in North Carolina monitoring and private wells. The methods are translatable to other study areas in the United States. LUR-BME models can be used to predict spatially/temporally varying groundwater NO3– and provide uncertainty assessments. 
Further research should integrate groundwater NO3– results into surface water models to determine the extent of groundwater’s contribution to surface water contamination. Lastly, results will be useful in identifying localities of elevated NO3– for increased monitoring.
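
The limitation noted above, that a high detection limit lowers the variance of the dependent variable, can be illustrated with synthetic data. The Python sketch below is a hedged illustration and not an analysis of the study data: the lognormal parameters are arbitrary, and the handling of nondetects (recording every value below 1 mg/L at the limit itself) is an assumption made for the example, since this section does not state how nondetects were treated.

```python
import math
import random
import statistics

DETECTION_LIMIT = 1.0  # mg/L, as stated for the private well data


def censor(values, limit=DETECTION_LIMIT):
    """Record every value below the detection limit at the limit itself (assumed handling)."""
    return [max(v, limit) for v in values]


def log_variance(values):
    """Population variance of the natural-log-transformed concentrations."""
    return statistics.pvariance([math.log(v) for v in values])


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Synthetic lognormal concentrations in mg/L (illustrative only, not study data).
true_conc = [rng.lognormvariate(-0.5, 1.0) for _ in range(1000)]
recorded = censor(true_conc)

var_true = log_variance(true_conc)
var_recorded = log_variance(recorded)
share_censored = sum(v < DETECTION_LIMIT for v in true_conc) / len(true_conc)

print(f"share of samples below the detection limit: {share_censored:.2f}")
print(f"log-variance before censoring: {var_true:.3f}")
print(f"log-variance after censoring:  {var_recorded:.3f}")
```

Clamping values at the limit can only shrink the spread of the log-transformed concentrations, so a regression has less variance to explain and its fitted values are pulled toward the limit.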