Emma Krasovich, Peiley Lau, Jeanette Tseng, Julia Longmate, Kendon Bell, Solomon Hsiang.
Abstract
Water quality monitoring can inform policies that address pollution; however, inconsistent measurement and reporting practices render many observations incomparable across bodies of water, thereby impeding efforts to characterize spatial patterns and long-term trends in pollution. Here, we harmonized 9.2 million publicly available monitor readings from 226 distinct water monitoring authorities spanning the entirety of the Mississippi/Atchafalaya River Basin (MARB) in the United States. We created the Standardized Nitrogen and Phosphorus Dataset (SNAPD), a novel dataset of 4.8 million standardized observations for nitrogen- and phosphorus-containing compounds from 107 thousand sites during 1980–2018. To the best of our knowledge, this dataset represents the largest record of these pollutants in a single river network where measurements can be compared across time and space. We addressed numerous well-documented issues associated with the reporting and interpretation of these water quality data, heretofore unaddressed at this scale, and our approach to water quality data processing can be applied to other nutrient compounds and regions.
Year: 2022 PMID: 36030259 PMCID: PMC9420138 DOI: 10.1038/s41597-022-01650-6
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
Fig. 1 Mississippi/Atchafalaya River Basin and river network[26,27].
Summary of the raw data from the Water Quality Portal (WQP) for the 31 selected nutrient compounds associated with nonpoint source (NPS) pollution within the MARB during 1980–2018.
| Nutrient basis | Nutrient name | # of sites | # of samples |
|---|---|---|---|
| Nitrogen-based nutrient compounds | Ammonia* | 25,623 | 390,088 |
| | Ammonia as NH3 | 6,078 | 57,319 |
| | Ammonia N* | 30,620 | 634,013 |
| | Ammonia N as N | 4,907 | 135,833 |
| | Ammonium as NH4 | 314 | 1,688 |
| | Kjeldahl N | 70,670 | 1,526,988 |
| | Nitrate* | 48,300 | 793,219 |
| | Nitrate as N | 4,825 | 51,778 |
| | Nitrogen | 12,446 | 248,357 |
| | Nitrogen, mixed forms | 21,813 | 412,803 |
| | Nitrogen nutrient | 4,001 | 75,226 |
| | Organic Nitrogen | 22,624 | 307,805 |
| | Total Ammonia* | 30,157 | 698,963 |
| | Total Kjeldahl N | 790 | 5,142 |
| | Total Kjeldahl N (Organic N plus Ammonia) | 3,196 | 19,864 |
| | Total Nitrogen, mixed forms | 3,020 | 55,133 |
| Phosphorus-based nutrient compounds | Organic Phosphorus | 1,088 | 12,419 |
| | Organic Phosphorus, particulate | 1 | 1 |
| | Orthophosphate* | 52,421 | 1,238,827 |
| | Orthophosphate as P | 2,132 | 63,614 |
| | Orthophosphate as PO4 | 955 | 7,853 |
| | Phosphate* | 24,378 | 479,954 |
| | Phosphate as P | 7,692 | 104,570 |
| | Phosphate as PO4 | 137 | 1,300 |
| | Phosphate Phosphorus* | 24,372 | 514,467 |
| | Phosphate Phosphorus as P | 7,692 | 104,567 |
| | Phosphate Phosphorus as PO4 | 129 | 972 |
| | Phosphorus* | 61,091 | 1,193,311 |
| | Phosphorus, hydrolyzable* | 34 | 764 |
| | Soluble Reactive Phosphorus* | 147 | 4,241 |
| | Total Phosphorus, mixed forms* | 7,342 | 76,842 |
Nutrients with an unknown chemical form based on their WQP name are indicated with a *.
Fig. 2 Water quality observation, from sampling to results.
Summary of our data harmonization process to produce the final harmonized dataset, SNAPD.
| Step | Harmonization step | Details | Observations affected |
|---|---|---|---|
| Step 0: | Pre-harmonization | Raw data | 9,217,921 |
| Step 1: | Organization name | Standardized organization names in instances where there were varied spellings. | 568,644 |
| Step 2: | Unique water monitoring sites | Flagged or combined coordinates and Monitoring Location Identifiers (MLIs) where possible such that each water monitoring site was defined as the unique combination of a MLI and coordinate pair. | 54,478 (multiple coordinates) |
| | | | 965,724 (multiple MLIs) |
| Step 3: | Medium | If the sample was taken in any medium besides water, dropped. | 163,356 |
| Step 4: | Date | If an observation was missing a date, dropped. | 1,640 |
| Step 5: | Chemical form | If the chemical form of the observation could not be determined, dropped. | 1,026,757 |
| Step 6: | Concentration value | If the concentration value was negative, nonsensical (e.g., text instead of a number), or missing and the observation was not indicated to be a non-detect, dropped. | 194,579 |
| Step 7: | Concentration units | If concentration units were missing or if they could not be converted to mg/L, dropped. | 20,222 |
| Step 8: | Detection Text/codes | If the detection code/text indicated that concentration was not detected due to contamination or other quality control reasons, dropped. | 39,868 |
| Step 9: | Sample fraction | If sample fraction was ambiguous or missing, dropped. | 340,239 |
| Step 10: | Activity type | If the activity type indicated that the sample was part of a quality control check, dropped. | 384,273 |
| Step 11: | Result type | If the result type indicated that the concentration value was estimated, dropped. | 130,054 |
| Step 12: | Conversions | Converted nutrients to elemental form ( | all |
| Step 13: | Nutrient renaming | Renamed nutrients to incorporate their sample fraction (e.g., | all |
| Step 14: | Detection limit approximation | If a detection limit was not provided for a non-detect observation in the raw data, approximated the detection limit (see section on Non-detects, detection codes, and detection limits). | 68,533 |
| Step 15: | Non-detect handling | If an observation was indicated as non-detected, imputed concentration value using detection limits (see section on Imputing concentration for non-detects). | 1,241,315 |
| | | If a nutrient-site-year had 80% or more non-detected observations, flagged observations and left concentration as N/A. | 612,918 |
| Step 16: | Outlier flagging | If a given nutrient’s concentration value was above the 99th or below the 1st percentile, flagged as a potential outlier. | 131,021 |
| Step 17: | Duplicates | If there were duplicates from multiple concentrations reported for the same site, nutrient, sample fraction, detection status, and date, averaged concentration and indicated the number of observations in the daily average. Note that this also includes time duplicates (see section on Duplicates). | 3,191,771 |
| | | If there were duplicates due to differently named organizations reporting the same record, chose one organization and assigned to duplicate records. | 142,952 |
| | | If there were duplicates due to a site measuring both detected and non-detected concentrations on the same date for the same nutrient, averaged concentration and flagged that the average includes an imputed value. | 134,848 |
| Step 18: | Nutrients and sample fraction combination | If nutrient sample fractions could be combined to create a more common nutrient (e.g., total phosphorus vs. particulate phosphorus), combined observations where possible (see section on Combining nutrients and sample fractions). | 352 (added as new observations) |
| Step 19: | Data quality | For a given sample, if the filtered nutrient concentration was greater than or equal to the unfiltered nutrient concentration, dropped. | 100,050 |
The number of observations affected by each harmonization step is indicated. Observations may be counted more than once as there may have been more than one harmonization step that affected a given record.
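The nutrient-site-year rule in Step 15 can be illustrated with a short sketch. The record layout (dictionaries with `nutrient`, `site`, `year`, and `non_detect` keys) and the function name are assumptions made for illustration; this is not the authors' code.

```python
from collections import defaultdict


def flag_mostly_nondetect(records, threshold=0.8):
    """Return the (nutrient, site, year) groups whose share of
    non-detect observations is at or above `threshold`.

    `records` is a hypothetical list of dicts; the field names are
    illustrative and do not come from the SNAPD schema.
    """
    totals = defaultdict(int)
    nondetects = defaultdict(int)
    for rec in records:
        key = (rec["nutrient"], rec["site"], rec["year"])
        totals[key] += 1
        if rec["non_detect"]:
            nondetects[key] += 1
    # A group is flagged when non-detects make up 80% or more of it.
    return {key for key, n in totals.items() if nondetects[key] / n >= threshold}


# Toy data: site A has 4 non-detects out of 5 readings, site B has 1 of 2.
sample = (
    [{"nutrient": "TP", "site": "A", "year": 2001, "non_detect": True}] * 4
    + [{"nutrient": "TP", "site": "A", "year": 2001, "non_detect": False}]
    + [{"nutrient": "TP", "site": "B", "year": 2001, "non_detect": True}]
    + [{"nutrient": "TP", "site": "B", "year": 2001, "non_detect": False}]
)
flagged = flag_mostly_nondetect(sample)
```

In this toy input only the site A group reaches the 80% threshold, so its observations would be flagged and their concentrations left as N/A under the rule described in the table.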
Conversion factors from molecular to elemental form for all nutrients in our sample requiring conversion[17].
| NUTRIENT NAME | REPORTED MOLECULAR FORM | MULTIPLY BY | DESIRED ELEMENTAL FORM |
|---|---|---|---|
| | as NH3 | 0.822 | as N |
| | as NH4 | 0.776 | as N |
| | as NO3 | 0.226 | as N |
| | as PO4 | 0.326 | as P |
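Each factor in this table is the mass fraction of N or P in the reported molecular form. As a minimal sketch, the factors can be recomputed from atomic masses and applied to a concentration. The dictionary and function names are hypothetical, and the atomic masses are rounded standard values rather than figures taken from the dataset. The recomputed factors agree with the tabulated ones to within about 0.001.

```python
# Rounded standard atomic masses in g/mol (an assumption of this sketch).
ATOMIC_MASS = {"N": 14.007, "H": 1.008, "O": 15.999, "P": 30.974}

# Hypothetical lookup: reported form -> (target element, atom counts in the molecule).
MOLECULAR_FORMS = {
    "as NH3": ("N", {"N": 1, "H": 3}),
    "as NH4": ("N", {"N": 1, "H": 4}),
    "as NO3": ("N", {"N": 1, "O": 3}),
    "as PO4": ("P", {"P": 1, "O": 4}),
}


def conversion_factor(form):
    """Mass fraction of the target element in the reported molecular form."""
    element, atoms = MOLECULAR_FORMS[form]
    molar_mass = sum(ATOMIC_MASS[atom] * count for atom, count in atoms.items())
    return ATOMIC_MASS[element] / molar_mass


def to_elemental(concentration_mg_per_l, form):
    """Convert a concentration in mg/L of the molecule to mg/L of the element."""
    return concentration_mg_per_l * conversion_factor(form)


# Example: 10 mg/L reported "as NO3" expressed as mg/L "as N".
nitrate_as_n = to_elemental(10.0, "as NO3")
```

For instance, a reading of 10 mg/L reported as NO3 corresponds to roughly 2.26 mg/L as N.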
Types of duplicate data in our dataset and the corresponding action taken to harmonize these types of duplicates.
| TYPE OF DUPLICATE DATA | HARMONIZATION ACTION |
|---|---|
| | Averaged concentration results to be at the daily level for a given site, nutrient, sample fraction, and date, regardless of whether an observation had a timestamp. |
| | Combined observations under one organization name. |
| | Averaged observed and imputed non-detect concentrations to be at the daily level for a given site, nutrient, sample fraction, and date. Created a flag to indicate when imputed non-detect and observed concentration values were averaged. |
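The daily-averaging actions above can be sketched as a single grouping pass. This is a simplified illustration, not the authors' implementation: the record fields (`site`, `nutrient`, `fraction`, `date`, `mg_per_l`, `imputed`) and the function name are assumptions, and the sketch merges the same-status and mixed-status averaging into one step.

```python
from collections import defaultdict


def average_daily(records):
    """Collapse duplicate readings to one daily mean per
    (site, nutrient, sample fraction, date).

    Each output row records how many readings were averaged and whether
    the mean mixes observed values with imputed non-detect values.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (rec["site"], rec["nutrient"], rec["fraction"], rec["date"])
        groups[key].append(rec)

    daily = []
    for (site, nutrient, fraction, date), recs in groups.items():
        values = [r["mg_per_l"] for r in recs]
        imputed_flags = {r["imputed"] for r in recs}
        daily.append({
            "site": site,
            "nutrient": nutrient,
            "fraction": fraction,
            "date": date,
            "mg_per_l": sum(values) / len(values),
            "n_obs": len(values),
            # True only when both observed and imputed values were averaged.
            "mixed_imputed": imputed_flags == {True, False},
        })
    return daily


readings = [
    {"site": "A", "nutrient": "TP", "fraction": "unfiltered",
     "date": "2001-05-01", "mg_per_l": 1.0, "imputed": False},
    {"site": "A", "nutrient": "TP", "fraction": "unfiltered",
     "date": "2001-05-01", "mg_per_l": 3.0, "imputed": True},
    {"site": "B", "nutrient": "TP", "fraction": "unfiltered",
     "date": "2001-05-01", "mg_per_l": 0.5, "imputed": False},
]
daily = average_daily(readings)
```

Keeping the count of averaged readings alongside the mean lets a downstream user weight or filter daily values by how many raw observations support them.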
Summary of the final dataset, SNAPD (Standardized Nitrogen and Phosphorus Dataset).
| Nutrient basis | Nutrient name | # of sites | # of samples |
|---|---|---|---|
| Nitrogen-based nutrient compounds | Ammonia (filtered) | 26,060 | 275,829 |
| | Ammonia (inorganic) | 169 | 423 |
| | Ammonia (particulate) | 67 | 176 |
| | Ammonia (unfiltered) | 49,739 | 673,734 |
| | Inorganic Nitrogen | 98 | 1,556 |
| | Nitrate (filtered) | 20,388 | 161,368 |
| | Nitrate (inorganic) | 491 | 817 |
| | Nitrate (particulate) | 221 | 329 |
| | Nitrate (unfiltered) | 15,215 | 124,673 |
| | Organic Nitrogen (filtered) | 10,978 | 54,778 |
| | Organic Nitrogen (particulate) | 29 | 62 |
| | Organic Nitrogen (unfiltered) | 11,964 | 164,903 |
| | Kjeldahl Nitrogen (filtered) | 10,230 | 53,300 |
| | Kjeldahl Nitrogen (inorganic) | 16 | 86 |
| | Kjeldahl Nitrogen (particulate) | 561 | 5,019 |
| | Kjeldahl Nitrogen (unfiltered) | 55,710 | 874,093 |
| | Total Nitrogen (filtered) | 11,970 | 61,205 |
| | Total Nitrogen (particulate) | 776 | 11,990 |
| | Total Nitrogen (unfiltered) | 21,966 | 356,456 |
| Phosphorus-based nutrient compounds | Organic Phosphorus (filtered) | 159 | 1,454 |
| | Organic Phosphorus (particulate) | 10 | 13 |
| | Organic Phosphorus (unfiltered) | 946 | 5,282 |
| | Orthophosphate (filtered) | 34,373 | 451,088 |
| | Orthophosphate (inorganic) | 170 | 1,537 |
| | Orthophosphate (organic) | 17 | 38 |
| | Orthophosphate (particulate) | 287 | 656 |
| | Orthophosphate (unfiltered) | 30,745 | 395,815 |
| | Total Phosphorus (filtered) | 15,481 | 164,242 |
| | Total Phosphorus (inorganic) | 42 | 111 |
| | Total Phosphorus (particulate) | 203 | 489 |
| | Total Phosphorus (unfiltered) | 61,738 | 989,986 |
Fig. 3 Spatial coverage of SNAPD in the MARB[26,27].
Fig. 4 Distributions of pre-harmonized and harmonized water quality concentration data for all water monitoring sites that measure TN and TP in our retrieved data. The harmonized TN distribution plotted here includes water quality observations that were previously labelled as nitrogen, nitrogen mixed forms, or total nitrogen and are now classified as TN based on our methods. Similarly, water quality observations previously labelled as phosphorus, phosphorus mixed forms, or total phosphorus are now classified as TP based on our methods.
Fig. 5 Distributions of pre-harmonized and harmonized data for selected water monitoring organizations measuring (a) total nitrogen and (b) total phosphorus. For display, we selected organizations for which our harmonization process affected both the number of observations and the distribution. We included all raw measurements for either TN or TP that may be harmonized using their metadata. The pre-harmonized distributions included observations measuring total phosphorus, total phosphorus mixed forms, and phosphorus for phosphorus-based nutrients; and nitrogen, nitrogen mixed forms, and total nitrogen mixed forms for nitrogen-based nutrients. Distributions for the harmonized data had fewer observations than those for the pre-harmonized data because we dropped observations that could not be harmonized based on metadata.
Fig. 6 Sankey plots demonstrating the data harmonization process for concentration unit metadata for all nitrogen and phosphorus compounds in our sample. For visualization purposes, we combined concentration units with 50,000 observations or fewer into an “other” category. (a) Nitrogen compounds unit harmonization. For nitrogen compounds, the other category includes the following concentration units: #/100 ml, %, % by vol, % by wt, % recovery, cm3/g @stp, cm3/g stp, g/kg, g/m2, mg N/l, mg/g, mg/kg, mg/kg as N, mg/m2 NH4, mgd, MPN, MPN/100 ml, none, NTU, pci/l, ppb, ppm, ueq/l, ug/kg, ug/l, ug/l as N, and umol/l. (b) Phosphorus compounds unit harmonization. For phosphorus compounds, the other category includes the following concentration units: #/100 ml, %, cfu/100 ml, g/kg, g/m2, lb/day, mg/g, mg/kg, mg/kg as P, mg/kg PO4, ml/l, mV, none, ppb, ppm, ug/l, and ug/l as P.
| Measurement(s) | Nitrogen Compound • phosphorus compound |
| Technology Type(s) | water monitors |
| Sample Characteristic - Environment | water body • watershed • amount of nitrogen atom in water • water pollution • pollution monitoring • amount of phosphorus in water |
| Sample Characteristic - Location | river, contiguous United States of America • Mississippi/Atchafalaya River Basin |