Literature DB >> 34219820

Budyko-Based Long-Term Water and Energy Balance Closure in Global Watersheds From Earth Observations.

Akash Koppa^1,2, Sarfaraz Alam¹, Diego G Miralles², Mekonnen Gebremichael¹.

Abstract

Earth observations offer potential pathways for accurately closing the water and energy balance of watersheds, a fundamental challenge in hydrology. However, previous attempts based on purely satellite-based estimates have focused on closing the water and energy balances separately. They are hindered by the lack of estimates of key components, such as runoff. Here, we posit a novel approach based on Budyko's water and energy balance constraints. The approach is applied to quantify the degree of long-term closure at the watershed scale, as well as its associated uncertainties, using an ensemble of global satellite data sets. We find large spatial variability across aridity, elevation, and other environmental gradients. Specifically, we find a positive correlation between elevation and closure uncertainty, as derived from the Budyko approach. In mountainous watersheds the uncertainty in closure is 3.9 ± 0.7 (dimensionless). Our results show that uncertainties in terrestrial evaporation contribute twice as much as precipitation uncertainties to errors in the closure of water and energy balance. Moreover, our results highlight the need for improving satellite-based precipitation and evaporation data in humid temperate forests, where the closure error in the Budyko space is as high as 1.1 ± 0.3, compared to only 0.2 ± 0.03 in tropical forests. Comparing the results with land surface model-based data sets driven by in situ precipitation, we find that Earth observation-based data sets perform better in regions where precipitation gauges are sparse. These findings have implications for improving the understanding of global hydrology and regional water management and can guide the development of satellite remote sensing-based data sets and Earth system models.

Entities: Chemical Disease Mutation Species

Keywords: Budyko hypothesis; energy Balance; evapotranspiration; precipitation; remote sensing; water balance

Year: 2021 PMID： 34219820 PMCID： PMC8244049 DOI： 10.1029/2020WR028658

Source DB: PubMed Journal: Water Resour Res ISSN： 0043-1397 Impact factor: 5.240

Introduction

The terrestrial water and energy cycles are tightly coupled at various spatial and temporal scales; these links can be direct, due to the use and release of energy in water phase changes, as well as indirect, due to multiple feedbacks. At the watershed scale, the transfer of heat and moisture modulates the state of the atmosphere, soil surface, subsurface, and biosphere. For example, positive and negative soil moisture‐precipitation feedbacks result from the interplay between the energy available to evaporate water and soil moisture, together with the response of the atmospheric boundary layer to the resulting latent and sensible heat fluxes (Eltahir, 1998; Guillod et al., 2015; Taylor et al., 2012). Recent studies have shown that the latent heat flux (i.e., terrestrial evaporation, hereafter “ET”) is strongly coupled, not just to surface soil moisture, but also vegetation physiology (Mengis et al., 2015) and groundwater storage (Fan et al., 2017; Maxwell & Condon, 2016). An important step in improving the understanding of land‐atmosphere coupling at the watershed scale is to investigate the closure of the water and energy balances. In fact, in hydrology, the closure problem using observations is well recognized as a longstanding scientific challenge (Beven, 2006). Although Earth system models close the water and energy balances by design (with the use of continuity equations), the increased complexity of these models over the recent decades has not reflected on a commensurate increase in our understanding of watershed‐level processes. Beven et al. (2020) attribute this discrepancy to the significant uncertainties in hydrological observations, represented in their inability to close the water balance without considerable errors in the different fluxes. Meanwhile, several studies have highlighted the challenges of accurately closing the water and energy balance of watersheds using observations from multiple sources (Aires, 2014; Beven, 2019; Pan et al., 2012; Pellet et al., 2019; Sahoo et al., 2011). Scrutinizing this closure at regional and global scales is deemed crucial to identify gaps in our understanding of watershed‐scale processes and to reveal systematic errors in the state‐of‐the‐art observations (Hegerl et al., 2015). Thus, improving the accuracy of closure from observations can improve the prediction of regional water resources, impact water management, and enable a better adaptation to hydroclimatic extremes. In addition, accurate closure can help constrain past simulations and future projections of the hydrologic cycle by land surface models (Greve, Orlowsky, et al., 2014), global climate models (GCMs) (Allen & Ingram, 2002; Prein & Pendergrass, 2019) and hydrologic models (Greve, Burek, & Wada, 2020). Due to this importance, different studies have already attempted to assess this closure, but often disregarding the strong links between water and energy cycles: the general approach so far has been to investigate water (Sheffield, Ferguson, et al., 2009) and energy (Heusinkveld et al., 2004) balances separately. Although ground‐based instruments can be used to measure all the water and energy balance components, they are inadequate for assessing the closure at watershed scales because of upscaling issues inherent to their localized and sparse nature (Blöschl & Sivapalan, 1995). As an alternative, Earth observation satellite‐based remote sensing provides spatially and temporally continuous estimates of key water and energy balance components (Lettenmaier et al., 2015). The use of multiple satellites enables continuous coverage across the entire globe, including regions that are ungauged by in situ measuring instruments. The spatial resolution of existing satellite‐based estimates of water and energy balance components varies from 10–100 km, which appears adequate for closure studies of large watersheds without requiring any downscaling. However, using satellite remote sensing to assess the closure of the water and energy balance through traditional budget equations is not straightforward. A primary challenge is the lack of accurate data on watershed storage changes, ground heat flux, and runoff. The former two are either not available, or unavailable at the required spatial resolution; that is the case of watershed storage changes retrieved from the Gravity Recovery and Climate Experiment (GRACE) (Rodell, Velicogna, & Famiglietti, 2009). Here, we aim to study the capabilities of satellite‐based remote sensing data in closing the water and energy balance of large watersheds (>10,000 km2) in a combined manner, using a methodology that uses information from the water and energy budgets to constrain each other. Previous closure studies have focused on either small study regions, or specific watersheds where reliable streamflow data were available. For instance, Heusinkveld et al. (2004) used field measurements to evaluate energy balance closure in a small arid watershed in Israel. Likewise, Sheffield, Ferguson, et al. (2009) and Pan et al. (2012) analyzed the ability of satellite‐based data sets, in combination with streamflow measurements, to close the water balance of large watersheds. To circumvent the need for watershed storage changes and ground heat flux data, we concentrate on long timescales in which heat and water storage changes are revealed to be negligible compared to the magnitude of the cumulative fluxes. Nonetheless, this long‐term perspective does not negate the need for runoff estimates. To avoid the requirement of runoff, which cannot be retrieved from current satellites nor will they be derived directly with the next generation of sensors (Biancamaria et al., 2016), we based our study on the Budyko framework (Budyko, 1974), a semi‐empirical representation of the long‐term water and energy balance. The approach is applied under the hypothesis that more accurate precipitation (P) and ET data sets lead to more accurate closure of water and energy balances. This particular Budyko application has proven to mimic data evaluation methodologies which use ground‐based measurements (Koppa & Gebremichael, 2017). As such, our study contributes to the growing body of literature where physical consistency of P and ET data products are evaluated based on the Budyko framework (Beck, Wood, McVicar, et al., 2020; Greve, Burek, & Wada, 2020; Greve, Orlowsky, et al., 2014; Hobeichi et al., 2020; Miralles, Jiménez, et al., 2016). In contrast to previous studies which primarily used in situ or land surface model‐based data sets, this study represents a novel application of the Budyko hypothesis in understanding the current state of satellite‐based remote sensing in characterizing the combined closure of water and energy balance of global watersheds. In doing so, we present a first evaluation of P and ET data sets in watersheds where such an evaluation has not been done before due to the lack of in situ data. In addition, the appraisal of closure at a global scale offers insights into the different physical controls, such as topography and land cover, which hinder or aid the effective retrieval of water and energy balance components from satellite‐based remote sensing. A final hurdle rises from the large variability in sensors and retrieval algorithms which are used to estimate the water and energy balance components, and their possible inconsistencies in accuracy, resolution, and coverage. This issue is aggravated by the fact that, in most cases, estimates stemming from a single satellite may not retrieve the component of interest accurately; this holds especially for ET, but it also pertains to surface net radiation or precipitation. To estimate these variables with higher accuracy, most retrieval methods rely on measurements of different atmospheric and terrestrial properties from a myriad of satellite sensors. The choice of source data and algorithms to combine them leads to disagreements among the different data sets of water and energy balance components, such as precipitation (Beck, Vergopolan, et al., 2017; Hong, Tang, et al., 2018; Sun et al., 2018) and ET (McCabe et al., 2016; Miralles, Jiménez, et al., 2016; Mueller et al., 2011). To account for the uncertainties and errors in closure due to variability in sensors and retrieval algorithms, we use an ensemble of satellite‐based precipitation and ET data sets. Finally, we analyze the patterns in closure across gradients in topography, aridity, greenness, land use, hydrology, and climate conditions. In the following, we present details of the remote sensing‐based data sets used in our analyses (Section 2), the methodology used in the study including the selection of watersheds at a global scale, description of the Budyko framework‐based metrics developed in this study to analyze the closure of water and energy balance in the Budyko space using the remote sensing‐based data sets (Section 3). In Section 4 and Section 5, we present and discuss the results of our study. Finally, we summarize our main conclusions (Section 6).

Remote Sensing Data Sets

This study considers remote sensing‐based P and ET for evaluating the water and energy budget closure at watershed scales. We select eight data sets for ET, seven for P, and one for net radiation—see Table 1. The selected data sets have different temporal and spatial coverage, as listed in the table. We consider the 20‐year period 1998–2017, which provides a sufficiently long temporal overlap between the different products for the analysis. For each combination of P and ET, the annual averages are determined for their period of overlap (within the 1998–2017 time period). In addition, we compare our remote sensing‐based results with an ensemble of P and ET data sets derived from in situ observations, reanalysis, and land surface models. We present a brief description of the P, ET, and R n data sets below.

Table 1

List of Satellite‐Based Precipitation, Terrestrial Evaporation, and Net Radiation Data Sets Used for Closing the Water and Energy Balance of Watersheds

Data source	Type	Spatial resolution (degree)	Temporal coverage (years)	Spatial coverage (degree)	Reference
Precipitation
CHIRPSv2.0	Satellite	0.05	1981–2019	50°N‐50°S	Funk et al. (2015)
CMORPHv0.x.RAW	Satellite	0.25	2002–2017	60°N‐60°S	Joyce et al. (2004)
PERSIANN	Satellite	0.25	2000–2019	60°N‐60°S	Sorooshian et al. (2000)
PERSIANN.CCS	Satellite	0.04	2003–2019	60°N‐60°S	Hong, Gochis, et al. (2007)
PERSIANN.CDR	Satellite	0.25	1983–2016	60°N‐60°S	Ashouri et al. (2015)
TRMM.3B42RT	Satellite	0.25	2000–2019	50°N‐50°S	Huffman et al. (2007)
TRMM.3B43	Satellite	0.25	2000–2019	50°N‐50°S	Huffman et al. (2007)
CPC.Unifiedv1.0	Observation	0.50	1979–2019	Global	(Chen et al., 2008)
ERA5.Land	Reanalysis	0.10	1981–2019	Global	(Muñoz‐Sabater et al., 2018)
GPCCv7.0	Gauge	1.00	1901–2013	Global	(Becker et al., 2013)
UDELv5.0	Gauge	0.50	1900–2017	Global	(Willmott & Matsuura, 2000)
Terrestrial evaporation
AVHRR.NTSG	Satellite	0.08	1983–2013	Global	K. Zhang et al. (2010)
SSEBOpv4.0	Satellite	0.01	2003–2019	Global	Senay et al. (2011)
MOD16A3	Satellite	0.05	2000–2014	Global	Mu et al. (2011)
GLEAMv3.3a	Satellite	0.25	1980–2018	Global	Martens et al. (2017)
GLEAMv3.3b	Satellite	0.25	2003–2018	Global	Martens et al. (2017)
CSIRO‐PMLv2.0	Satellite	0.50	1981–2012	Global	Y. Zhang et al. (2016)
BESS	Satellite	0.50	2000–2015	Global	Jiang and Ryu (2016)
FluxCom.RS	Satellite	0.50	2001–2015	Global	Tramontana et al. (2016)
ERA5.Land	Reanalysis	0.50	1981–2019	Global	(Muñoz‐Sabater et al., 2018)
FLDAS	Reanalysis	0.10	1981–2019	Global	(McNally et al., 2017)
FluxCom.RSM	Machine learning and reanalysis	0.50	1979–2019	Global	(Tramontana et al., 2016)
GLDASv2.1	Reanalysis	0.25	1948–2019	Global	(Rodell, Houser, et al., 2004)
Net radiation
CERESv4.0	Satellite	1.00	2000–2018	Global	Wielicki et al. (1996)

List of Satellite‐Based Precipitation, Terrestrial Evaporation, and Net Radiation Data Sets Used for Closing the Water and Energy Balance of Watersheds

Precipitation

The remote sensing‐based precipitation products are primarily based on either longwave infrared or microwave observations, or a combination of both. Climate Hazards Group Infrared Precipitation with Stations (CHIRPSv2.0) uses cold cloud duration data from thermal infrared sensors aboard satellites of the National Oceanic and Atmospheric Administration (NOAA). The satellites include the Geostationary Operational Environmental Satellites (GOES‐8,10). The retrieval algorithm is based on the cold cloud duration of a pixel, that is, the length of time a pixel is covered by high cold clouds as revealed by infrared brightness temperatures. Further, the derived precipitation is calibrated with the Tropical Rainfall Measuring Mission (TRMM) Multi Satellite Precipitation Analysis (TMPA) 3B42v7 precipitation data set (Funk et al., 2015) Climate Prediction Center Morphing Algorithm (CMORPHv0.x.RAW) precipitation is estimated exclusively from passive microwave observations from multiple satellites, such as the Defense Meteorological Satellite Program (DMSP) 12, 14, and 15, the NOAA‐15, 16, 17, and 18, the Advanced Microwave Scanning Radiometer for Earth Observation Satellite (AMSR‐E), and TRMM. In addition, infrared data are used to gap‐fill precipitation where microwave data are not available through a propagation algorithm. The former is sourced from GEOS‐8 and GEOS‐10 from NOAA, Meteosat‐7 and Meteosat‐10 from the European Organization for the Exploitation of Meteorological Satellites (EUMETSAT), and the Geostationary Meteorological Satellite (GMS‐5) from the Japan Aerospace Exploration Agency (JAXA) (Joyce et al., 2004) Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN) uses longwave infrared from multiple satellites (GOES‐8, GOES‐10, GMS‐5, Meteosat‐6, and Meteosat‐7). PERSIANN also uses cold cloud pixel information from gridded brightness temperature, together with an artificial neural network model, to estimate precipitation (Sorooshian et al., 2000) PERSIANN Cloud Classification System (PERSIANN.CCS) incorporates a cloud classification system which categorizes clouds based on cloud height, areal extent, and texture. These classifications are used to assign rainfall values to each pixel based on a prespecified relationship between rain rate and brightness temperature (Hong, Gochis, et al., 2007) PERSIANN Climate Data Record (PERSIANN.CDR) uses neural networks as in PERSIANN, but trained with the National Centers for Environmental Prediction (NCEP) precipitation data. Further, the precipitation estimates are adjusted using the Global Precipitation Climatology Project (GPCP) gauge‐based data (Ashouri et al., 2015) TRMM TMPA Real Time (TRMM.3B42RT) is the real‐time data product from TMPA. It uses microwave data from multiple sensors that are intercalibrated and the data gaps are filled by calibrated infrared data. TRMM uses microwave measurements from multiple satellites (TRMM, DMSP, Aqua and NOAA satellite series) (Huffman et al., 2007) TRMM TMPA Gauge Calibrated Data set (TRMM.3B43), in which TMPA 3B42 is calibrated with the Global Precipitation Climatology Center (GPCC) gauge‐based data set (Huffman et al., 2007) The following is a brief description of the four precipitation data sets based on in situ and reanalysis data: Climate Prediction Center Unified Gauge‐Based Data set (CPC.Unified.v1.0) is constructed using in situ precipitation observations from over 30,000 stations which are collected from different agencies across the globe. The analysis of the gauges is done at 0.125° and released at a resolution of 0.5° latitude‐longitude grid (Chen et al., 2008) European Center for Medium‐Range Weather Forecasts Reanalysis 5 Land (ERA5.Land) combines weather forecasts with observations from multiple sources to generate a globally consistent estimate of atmospheric and land variables. ERA5.Land is a higher resolution replay of the land component of ERA5. ERA5 assimilates a large number of observations, including ground‐based radars, satellite‐based observations, atmospheric sounders among others (Muñoz‐Sabater et al., 2018) Global Precipitation Climatology Center Gauge‐Based Data set (GPCC.v7.0) is an in situ gauge‐based analysis based on approximately 67,200 stations worldwide (Becker et al., 2013) University of Delaware Precipitation Data set (UDEL.v5.0) is a monthly data set constructed using observations a large number of in situ stations sourced from the Global Historical Climatology Network (GHCN) and national institutions from different countries (approximately 22,000 stations) (Willmott & Matsuura, 2000)

Terrestrial Evaporation

ET derivation algorithms generally use combinations of satellite observations and reanalysis data of multiple meteorological variables. Numerical Terradynamic Simulation Group (NTSG) Algorithm (AVHRR.NTSG) ET is based on data from Advanced Very High‐Resolution Radiometer (AVHRR) sensors. It employs a modified Penman‐Monteith (PM) method to quantify the potential canopy transpiration and soil evaporation. For open water evaporation, it uses a Priestley and Taylor (PT) approach. The PM method is modified by incorporating biome‐specific canopy conductance determined using NDVI from AVHRR (K. Zhang et al., 2010). The other meteorological inputs required by the PM and PT approaches—such as maximum, minimum, and average air temperatures, water vapor pressures—are sourced from the NCEP‐National Center for Atmospheric Research (NCAR) reanalysis (Kalnay et al., 1996) Simplified Surface Energy Balance Operational Data set (SSEBOpv4.0) applies a simplified surface energy balance model that combines ET fractions from MODIS thermal imagery with PM‐based reference ET (Senay et al., 2011). The meteorological variables required by the PM are derived from the Global Data Assimilation System (GDAS), an analysis system which provides initial conditions for the medium‐range forecast models from NOAA Moderate Resolution Imaging Spectrometer (MODIS) Data set (MOD16A3) uses a semi‐empirical ET algorithm that relies on a PM equation to estimate potential ET (Mu et al., 2011). The input variables for the PM equation are derived from NASA’s Global Modeling and Assimilation Office analysis product (GEOSDAS) Global Land Evaporation Amsterdam Model (GLEAM) v3.3a combines multiple remote sensing measurements into a PT‐based framework, where remote sensing inputs include microwave‐derived soil moisture and vegetation optical depth among others (Martens et al., 2017). Interception loss is estimated through a Gash analytical model (Miralles, Gash, et al., 2010). The surface radiation and air temperature are sourced from the European Center for Medium‐Range Weather Forecasts (ECMWF) reanalysis product (ERA5). P input comes from the Multi‐Source Weighted‐Ensemble Precipitation (MSWEP) data set (Beck, Wood, Pan, et al., 2019) GLEAM v3.3b is based on the same algorithm described above but, unlike other ET data sets, it only uses satellite‐based inputs and no reanalysis forcing (Martens et al., 2017). For surface radiation, the Clouds and Earth’s Radiant Energy System (CERES) data set is used. The air temperature is derived from the Atmospheric Infrared Sounder (AIRS) system onboard Aqua (Aumann et al., 2003) Commonwealth Scientific and Industrial Organization (CSIRO) PM‐Leuning (PM‐L) (CSIRO.PMLv2.0) uses a PML‐based model that utilizes reanalysis meteorological forcing and satellite‐derived inputs. The meteorological inputs are sourced from the Princeton Global Forcing (Sheffield, Goteti, & Wood, 2006) and the Water and Global Change (WATCH) Forcing Data ERA‐Interim (WFDEI) (Weedon et al., 2014). Satellite‐derived inputs include leaf area index from AVHRR, and albedo from the Global Land Surface Satellite (GLASS) data set (Liang et al., 2013). Its interception loss is based on a modified Gash analytical model. At the annual timescale, the model is constrained by the Budyko framework such that annual ET is always less than annual P for grid cells not covered by crops (K. Zhang et al., 2010) Breathing Earth System Simulator (BESS) uses a process‐based model which couples atmospheric and canopy radiative transfer, and canopy photosynthesis. It uses MODIS atmosphere and land data along with other satellite, reanalysis (ERA‐Interim) and ancillary data to derive ET using a PM‐based potential ET estimates and stomatal conductance based on Ball et al. (1987). Unlike other models, it explicitly links canopy conductance to carbon concentrations in the atmosphere (Jiang & Ryu, 2016) FLUXCOM Initiative data set (FluxCom.RS) uses multiple machine‐learning techniques to train algorithms with eddy covariance data from the FLUXNET global network of flux towers (Baldocchi et al., 2001), and applies MODIS observations (land surface temperature, land cover, fraction of absorbed radiation by canopy) as predictors to yield ET estimates (Tramontana et al., 2016) ET data sets which are generated using LSMs or machine learning models driven with in situ or reanalysis forcing are described below. European Center for Medium‐Range Weather Forecasts Reanalysis 5 Land (ERA5.Land), in which, as described above, ET reanalysis estimates are produced using the land component of ERA5, albeit at a higher resolution (Muñoz‐Sabater et al., 2018) Famine Early Warning Systems Network Land Data Assimilation System (FLDAS), which is part of a global land data assimilation system created specifically to address the monitoring and forecast requirements associated with food security in data‐scarce regions. We used the outputs from the Noah LSM run using NASA’s Land Information System (LIS). The primary forcing is precipitation from CHIRPS (McNally et al., 2017) FLUXCOM Initiative Remote Sensing and Model Data (FluxCom.RSM) uses a machine learning algorithm similar to FluxCOM.RS. However, the machine learning model is driven by model‐based climate variables (Tramontana et al., 2016) Global Land Data Assimilation System (GLDAS.v2.1) assimilates satellite and in situ observations into multiple LSMs to generate fields of land surface fluxes and states. In this study, we use the instance of GLDAS which uses the Noah LSM (Rodell, Houser, et al., 2004)

Net Radiation

We use net radiation (R n) data from the CERES instruments aboard the Terra satellites. R n is computed in a two‐step process. In the first step, uncalibrated top‐of‐the‐atmosphere fluxes are produced using meteorological variables, cloud, aerosol, and ozone properties as input to a delta‐two stream radiation transfer model. In the next step, the top‐of‐the‐atmosphere fluxes are calibrated using observations from the CERES instruments and the input variables are modified. The calibrated input variables are then used for second run of the radiation transfer model to generate fluxes at different pressure levels (top of atmosphere to surface) (Wielicki et al., 1996).

Methodology

Selection of Watersheds

Spatially, most of the data sets used in this study cover the entire globe, yet some P data sets are unavailable over high latitudes. To maintain consistency, we limit our analysis to watersheds between 60°N and 60°S. We source the delineated watersheds from the HydroBASINS global watershed data set, which in turn are derived from HydroSHEDS (Lehner et al., 2008). The HydroBASINS data set consists of hierarchically nested basins at 12 levels, with level 1 representing the largest basins and level 12 the smallest. HydroBASINS classifies the basins according to the Pfafstetter coding system in which a larger basin at any level is divided into nine smaller units at the next level (Lehner & Grill, 2013). Out of the 12 levels, we choose 4,734 level‐5 watersheds (Figure 1a). Choosing level 5 yields a large‐enough sample; at the same time, the input satellite‐based precipitation and ET data sets are available at <0.5° resolution (approximately 2,500 km2), while the average area of the 4,734 level‐5 watersheds is approximately 28,500 km2, that is, still larger than the data set grid‐cells. Figure 1a also presents a hydroclimatic classification of the 4,734 watersheds according to aridity index (AI = PET/P, where PET is potential evaporation and P is precipitation). Five aridity classes are considered, in agreement with the United Nations Environmental Program (UNEP) (Grove, 1999): (1) Humid (0 < AI ≤ 1.5), (2) Dry sub‐humid (1.5 < AI ≤ 2.0), (3) Semi‐arid (2.0 < AI ≤ 5.0), (4) Arid (5.0 < AI ≤ 33.3), and (5) Hyper‐arid (AI > 33.3). We also present global maps of additional watershed characteristics, including elevation (Figure 1b), compound topographic index (CTI) (Figure 1c), and Normalized Difference Vegetation Index (NDVI) (Figure 1d).

Figure 1

Study area consisting of 4,734 watersheds based on HydroBASINS (Pfafstetter level 5) and classified according to (a) aridity, (b) elevation, (c) complex topographic index (CTI), and (d) normalized difference vegetation index (NDVI).

Long‐Term Water and Energy Balance

Under the assumptions of negligible net water storage change and ground heat flux (Sposito, 2017), the long‐term water (Equation 1) and energy (Equation 2) balances of watersheds can be expressed as where, Q is runoff, λ·ET is surface latent heat flux ( is latent heat of vaporization), and H is surface sensible heat flux. In this study, PET is calculated as based on Milly and Dunne (2016).

Budyko Hypothesis

The Budyko hypothesis describes the long‐term water and energy balance of watersheds using two indices: (1) Evaporative Index (ET/P, abbreviated as EI) and (2) Aridity Index (PET/P, abbreviated as AI). In its original formulation (Budyko, 1974), AI and EI were related by a non‐parametric and nonlinear function. However, several parametric versions of the Budyko function have been formulated ever since (Choudhury, 1999; Fuh, 1981; Yang et al., 2008; L. Zhang et al., 2004). These deviations have since been attributed to different watershed characteristics such as soil (Porporato et al., 2004), vegetation (Donohue et al., 2012; Good et al., 2017), and groundwater (Istanbulluoglu et al., 2012). In this study, we utilize Fuh’s equation (Fuh, 1981), a single‐parameter Budyko function to close the combined water and energy balance of watersheds: where is a parameter which has no analytical solution, but several parameterizations have been suggested relating it to watershed characteristics such as topography, soil, vegetation, and geographical location of watersheds among others (Li et al., 2013; Xu et al., 2013). The Budyko curve (Equation 3) is constrained by the following water and energy limits: The water limit (Equation 4) implies that ET cannot exceed P in the long term. Implicit in the constraint is the assumption that the contribution of watershed storage to long‐term water availability is negligible, which is not always a valid assumption (see Section 3.4). In watersheds where water availability is not a constraint, ET is limited by the available energy in the form of R n or PET (Equation 5). Figure 2 shows a representative Budyko space (AI vs. EI) consisting of the Budyko curve (Equation 3), the water limit (Equation 4), and the energy limit (Equation 5).

Figure 2

The Budyko space consisting of a representative Budyko curve constrained by the water and energy limits. We also represent the cluster radius (CR) (solid blue line) and the closure error (CE) (solid orange line) used in this study to quantify the uncertainty and error in water and energy balance closure of watersheds, respectively. Each point represents a [P, ET, R n] combination in the Budyko space. The analyses are based on 56 such combinations per watershed. The triangle represents the centroid of the 56 combinations. In this study, we postulate that the Budyko function (Equation 3) offers a reliable proxy for traditional water (Equation 1) and energy (Equation 2) balance equations at the watershed scale (Beck, Wood, McVicar, et al., 2020; Greve, Orlowsky, et al., 2014). We note here that Fuh’s equation is semi‐empirical, and while it is grounded on well‐established water and energy constraints, it is not equivalent to a water and energy balance closure from first principles. However, the equation provides an effective representation of these long‐term balances that has been evaluated in numerous previous studies referenced above.

Deviations From the Budyko Hypothesis

The efficacy of the Budyko hypothesis in representing the long‐term combined water‐energy balance of watersheds is well known. However, watersheds may violate the assumptions (detailed above) under which the hypothesis is applicable. A major and observed deviation happens in watersheds with continuous and prolonged groundwater extraction, which leads to the exceedance of the water limit (Equation 4) (Condon & Maxwell, 2017; Destouni et al., 2013; Jaramillo & Destouni, 2015; Moshir Panahi et al., 2020). In certain watersheds with high heat advection and interception loss, the energy limit (Equation 5) may also be exceeded (Holwerda et al., 2012). In this study, we address the issue of the violation of the water limit arising from long‐term changes in watershed storage. According to Condon and Maxwell (2017), long‐term groundwater extraction (for example, irrigation) can cause significant increases in ET. Thus, the water limit (Equation 4) can be redefined to include this additional source by irrigation as ET/(P + ΔTWS) = 1, where ΔTWS represents the change (Δ) in total water storage (TWS). Therefore, it can be reasonably assumed that watersheds which have substantial contribution from ΔTWS to their water availability can deviate from the water and energy balance described by the Budyko hypothesis. In this study, we identify and exclude such watersheds using the following procedure. First, we use the global TWS anomaly data from GRACE, specifically the solution from the Jet Propulsion Laboratory (JPL) (Watkins et al., 2015), to calculate the mean (2002–2017) ΔTWS in each watershed (Figure 3a). The procedure detailed in Rodell, Famiglietti, et al. (2018) is followed to fit a linear trend to the monthly TWS anomalies and the trend values are aggregated to mm/year for each watershed. Then, we exclude watersheds in which the mean ΔTWS is greater than 5% of the mean P, under the assumption that if ΔTWS is less than 5% it cannot substantially influence the appraisal of (Budyko‐based) water and energy balance closure (described below). Based on this criterion, we exclude 266 watersheds from our analyses (Figure 3b). These watersheds are primarily located in intensively irrigated regions, such as North Africa, Arabian Peninsula, watersheds around the Caspian Sea, South America, and north‐west India.

Figure 3

(a) Global trends in mean annual changes in total water storage (TWS) derived from GRACE TWS anomaly data. (b) Watersheds in which TWS change is greater than 5% of the mean annual precipitation (266 watersheds), which are excluded in this study.

Uncertainty and Error in Water and Energy Balance Closure

We evaluate the ability of the ensemble of satellite‐based P, ET, and R n data sets to close the water and energy balance of watersheds using two Budyko hypothesis‐based metrics: (1) uncertainty in closure and (2) error in closure. Each point (AI, EI) in the Budyko space (Figure 2) represents the long‐term water and energy balance of a watershed, as described by a specific combination of P, ET, and R n (or PET). Therefore, the use of an ensemble of 56 possible permutations of P, ET, and R n data sets results in a point cloud for each of the 4,734 watersheds (Figure 1a), which represents the uncertainty in the closure of the water and energy balance due to the input data sets. First, we define cluster radius (CR), a simple metric to quantify that closure uncertainty in the Budyko space. For each watershed, CR is defined as the mean of all the Euclidean distances between the centroid of the point cloud and the individual points. In Figure 2, CR is represented by the solid blue line, while the dashed red line represents the circle corresponding to that radius. Likewise, the fit to the Budyko function (Equation 3) over long time scales enables us to quantify the error in the closure of water and energy balance of watersheds. Analogous to the CR as a metric of uncertainty, we also define closure error (CE) to quantify the closure error in the Budyko space. CE is defined as the Cartesian distance between a point (AI, EI) described by the satellite‐based data sets and the reference (AI, EI) of the watershed described by the Budyko function (red line in Figure 2). In a previous study, we showed that CE accurately represents the error in the P and ET data sets (Koppa & Gebremichael, 2017). Specifically, CE in the Budyko space (Figure 2) is calculated as where, EIest and AIest are the long‐term average evaporative and aridity indices estimated from satellite‐based estimates of P, ET and R n considered in this study (see Section 3) AIref is the reference long‐term estimate of AI determined using observed R n from the NASA/GEWEX Surface Radiation Budget (SRB) (Cox et al., 2006) v3.0 data and P from WorldClim v2.0 data set (Fick & Hijmans, 2017), a widely used high resolution (1 km) data set which uses between 9,000 and 6,000 weather stations for estimating long‐term P climatology at a global scale. As in Koppa and Gebremichael (2017) this approach is based on the assumption that the aridity of the watershed is known and does not change significantly. The SRB data spans the years 1983–2007 with a spatial resolution of 1° × 1°. The temporal range of the WorldClim data set is 1970–2000 and the spatial resolution is 1 × 1 km. The reasons for selecting the SRB data are threefold. First, it is the only satellite‐based radiation data set which spans most of the time period WorldClim is based on. More importantly, SRB has been extensively validated (Pinker et al., 2005) and is widely used in the development of benchmark global meteorological and land surface products, such as the Princeton Global Forcing (Sheffield, Goteti, & Wood, 2006) and the North American Land Data Assimilation System (NLDAS) (Mitchell et al., 2004) data sets. Finally, none of the ET data sets use SRB R n as an input, because SRB mission ended in 2007, and the temporal coverage of the data sets used in the study spans till the year 2019. As such, the AIref values remain independent of the data sets used for generating the evaluated ET data sets EImod is determined from Fuh’s equation (Equation 3) as where is the Budyko parameter. EImod is calculated for every year, and the long‐term EImod is determined as the average of all annual values As has no analytical solution, we use the parameterization developed by Xu et al. (2013). Specifically, we make use of the following multiple linear regression (MLR) model where, lat is the latitude of the centroid of each watershed and is derived from the HydroBASINS data set CTI is the compound topographic index (K. J. Beven & Kirkby, 1979) defined as , where α is the upstream watershed area derived from the HydroBASINS data set, is the slope in degrees. We derive the average slope of each watershed from the Multi‐Error‐Removed Improved‐Terrain (MERIT) DEM which covers the entire Earth (Yamazaki et al., 2017). Low CTI values are associated to smaller watersheds and steep slopes. On the other hand, high CTI values are associated with large watersheds and gentle slopes, and typically related to higher water accumulation and soil moisture A is the area of watershed derived from HydroBASINS data set and elev is elevation derived from MERIT DEM NDVI is the normalized difference vegetation index derived from MODIS, specifically MOD13A2 v6.0, available at 1 km spatial resolution (Didan et al., 2015). We determine one NDVI value per watershed by averaging across 2000–2018 The Budyko curve is highly nonlinear. As a result, in arid and hyper‐arid regions, where precipitation values are very low, small changes in precipitation may lead to large changes in the distance‐based metrics (CR and CE) used in this study. These issues were systematically addressed in Koppa and Gebremichael (2017). To prevent the CE metric from being biased due to the nonlinearity of the Budyko curve, Koppa and Gebremichael (2017) found that it is important to assume that the aridity index of the watershed is known, so that the Euclidean distance is calculated to the correct region in the Budyko curve. This is an extension of the approach by Greve, Orlowsky, et al. (2014), wherein the shortest distance to the Budyko curve was calculated. The approach by Greve, Orlowsky, et al. (2014) is, however, more prone to erroneous CE values. For example, if the remote sensing‐based data sets are present in the humid region of the Budyko space, but the actual aridity of the watershed is arid, then the CE value must be high. However, because the changes in the humid region of the Budyko curve are small, the CE values are erroneously small. In Koppa and Gebremichael (2017), this issue is discussed in detail and corrected for with the formulation in Equation 6. To calculate the benchmark AI, we use the best available long‐term climatological P data set (WorldClim) and an independent net radiation data set which matches the time period (SRB). This also enables a single benchmark EI value for each watershed against which the closure is analyzed. More importantly, to enable comparison across watersheds with different aridity indices, a normalization of the CR and CE metrics of each watershed by the corresponding AI values is required. In Koppa and Gebremichael (2017), the methodology presented here has been comprehensively and successfully tested against in situ station‐based validation methodologies. A potential source of uncertainty in the CE metric (Equation 6) is the value of the Fuh parameter, , which has no analytical solution. In this study, we use an empirically derived equation, which relates to different characteristics of the watershed (Equation 7). Therefore, we quantify the uncertainty in the CE metric arising from uncertainty in the accuracy of Equation 7 as follows. For every watershed, we first fix the value of derived from Equation 7 as the initial value . We then define an uncertainty range of ±2.0 around the initial value within which the sensitivity analysis is carried out. Specifically, we adopt the stratified single parameter sampling strategy suggested by (Saltelli et al., 2008). Accordingly, we sample the value of at equal intervals of 0.1 in the range [, ]. For each watershed, a value of is sampled and used to calculate the resulting CE metric. Finally, the standard deviation of the CE metric per watershed is calculated from the set of CE values calculated using the explored range of .

Evaluating P and ET Data Sets Based on Water and Energy Balance Closure

In this study, we validate the selected P and ET data sets based on their ability to close the water and energy balance of watersheds. The most convenient approach is to select as the best performing the P and ET data sets which, when combined, result in the lowest CE for a watershed. However, this does not completely solve the issue of compensatory biases, that is, P and ET data sets which are independently inaccurate but when combined lead to low CE values. Therefore, we adopt a more robust procedure to determine the best performing P and ET data sets. We calculate a set of CE values (Equation 6) for every P (or ET) data set by combining the P (or ET) data set with all the ET (or P) data sets. If we consider CHIRPSv2.0 as an example, we calculate a set of CE values by combining the CHIRPSv2.0 with the eight ET data sets used in this study. Then, we calculate the mean of these set of CE values for each P (or ET) data set. Finally, the P (or ET) data set with the lowest mean CE is considered as best performing for each specific watershed.

Results

First, we analyze the global distribution of uncertainty (CR) and error (CE) in satellite‐based data sets with respect to the closing the water and energy balance of watersheds from (Figure 4). Next, we use spider plots to interpret the results by decomposing them along different gradients, which represent different characteristics of the watersheds such as climate (AI), topography (elevation and CTI), and greenness (NDVI) (Figure 6). In addition, we quantify the sensitivities of CR and CE to changes in P and ET (Figure 7). For every watershed, we rank different P and ET data sets based on their skill at closing the (Budyko‐based) long‐term water and energy balance (Figure 8). In addition, we analyze the relative performance of the P and ET data sets according to different characteristics of the watershed (Figure 9).

Figure 4

Figure 6

Spider plots of CR and CE, normalized by AI, for (a) aridity, (b) elevation (in m), (c) compound topographic index (CTI), and (d) NDVI. In the plot, the CR and CE values for different watershed characteristics (for example, aridity) are scaled between 0% and 100% (represented by the concentric circles), with 0% and 100% representing the lowest and highest CR (or CE) values, respectively. For example, when the watersheds are grouped according to aridity, the highest CE value is seen in hyper‐arid watersheds and the lowest CE value is seen in sub‐humid watersheds. The classification bins for elevation, CTI, and NDVI correspond to the following percentile ranges: 0%–10%, 10%–25%, 25%–50%, 50%–75%, 75%–90%, and 90%–100%.

Figure 7

Global distribution of the mean CR determined with (a) mean ET (average of 8 data sets) and 7 P data sets and (b) mean P (average of 7 data sets) and 8 ET data sets, normalized by the watershed aridity index. We select the classification bins based on following percentile ranges: 0%–10%, 10%–25%, 25%–50%, 50%–75%, 75%–90%, and 90%–100%.

Figure 8

Global distribution of the best (a) P and (b) ET data set per watershed with respect to the closure of the long‐term water and energy balance (i.e., CE). Per watershed, we calculate the mean CE for each specific P (or ET) data set paired with all ET (or P) data sets. Then, the P (or ET) data set with the minimum value is selected as best performing.

Figure 9

Heat maps of the root mean square errors (RMSE) of P and ET data sets for different aridity, elevation, CTI, and NDVI classes. The numbers represent the rank of the P or ET data set for each watershed characteristic. The RMSE values are calculated as follows: For each P or ET data set, the Budyko error for every watershed is determined. We then calculate the RMSE of the Budyko errors corresponding to watersheds which come under a specific class of a watershed characteristic. We select the following classification based on different percentile ranges: 0%–10%, 10%–25%, 25%–50%, 50%–75%, 75%–90%, and 90%–100%.

Global distribution of the mean of (a) CR and (b) CE, normalized by the aridity index of watersheds, determined using 56 combinations of P, ET, and R n data sets. We select the following classification bins for the CR and CE metrics based on different percentile ranges: 0%–10%, 10%–25%, 25%–50%, 50%–75%, 75%–90%, and 90%–100%. Global distribution of the uncertainty (standard deviation) in CE values due to uncertainty in the Fuh parameter (Equation 3). We select the following classification bins for the standard deviation in CE metrics based on different percentile ranges: 0%–10%, 10%–25%, 25%–50%, 50%–75%, 75%–90%, and 90%–100%. Spider plots of CR and CE, normalized by AI, for (a) aridity, (b) elevation (in m), (c) compound topographic index (CTI), and (d) NDVI. In the plot, the CR and CE values for different watershed characteristics (for example, aridity) are scaled between 0% and 100% (represented by the concentric circles), with 0% and 100% representing the lowest and highest CR (or CE) values, respectively. For example, when the watersheds are grouped according to aridity, the highest CE value is seen in hyper‐arid watersheds and the lowest CE value is seen in sub‐humid watersheds. The classification bins for elevation, CTI, and NDVI correspond to the following percentile ranges: 0%–10%, 10%–25%, 25%–50%, 50%–75%, 75%–90%, and 90%–100%. Global distribution of the mean CR determined with (a) mean ET (average of 8 data sets) and 7 P data sets and (b) mean P (average of 7 data sets) and 8 ET data sets, normalized by the watershed aridity index. We select the classification bins based on following percentile ranges: 0%–10%, 10%–25%, 25%–50%, 50%–75%, 75%–90%, and 90%–100%. Global distribution of the best (a) P and (b) ET data set per watershed with respect to the closure of the long‐term water and energy balance (i.e., CE). Per watershed, we calculate the mean CE for each specific P (or ET) data set paired with all ET (or P) data sets. Then, the P (or ET) data set with the minimum value is selected as best performing. Heat maps of the root mean square errors (RMSE) of P and ET data sets for different aridity, elevation, CTI, and NDVI classes. The numbers represent the rank of the P or ET data set for each watershed characteristic. The RMSE values are calculated as follows: For each P or ET data set, the Budyko error for every watershed is determined. We then calculate the RMSE of the Budyko errors corresponding to watersheds which come under a specific class of a watershed characteristic. We select the following classification based on different percentile ranges: 0%–10%, 10%–25%, 25%–50%, 50%–75%, 75%–90%, and 90%–100%. To aid the interpretation of results, the values presented in the CR and CE global maps are categorized into six percentile‐based bins: 0%–10%, 10%–25%, 25%–50%, 50%–75%, 75%–90%, and 90%–100%. The bin limits (0%, 10%, 25%, 50%, 75%, 90%, and 100% percentiles) are calculated from the 4,753 CR and CE values (one for each watershed). Then, each CR and CE value is assigned to one of the six bins.

Global Patterns in the Budyko‐Based Water and Energy Balance Closure

Figure 4a shows that CR values are low in humid and dry sub‐humid regions of eastern United States, the rainforests of Amazonia, parts of India, and southeast Asia (see also aridity map in Figure 1a). From Figure 4b, it is evident that CE is also low in these areas. However, the CR and CE for water and energy balance closure are very high in humid regions in northern latitudes, especially in the temperate forests of North America and Europe, eastern China, and Myanmar. In semi‐arid regions, we see that the watersheds in the western part of United States exhibit high CR and CE values (Figure 4). In addition, we see high CR and CE values in Patagonia and the Middle East. At the same time, several semi‐arid watersheds have low CR and CE values, including most of Australia, north‐eastern China, central United States, and southern Africa. On the other hand, in arid and hyper‐arid regions CR and CE are typically high, including in the East Sudanian savanna, the Greater Horn of Africa, Middle East and central Asia. While the spatial patterns of CR and CE are similar, there are certain regions where they show significant deviations. In the west Sahara and the Himalayan region, CR values are very low but the corresponding CE values are high. Likewise, the watersheds in sub‐Saharan Africa and the rainforests in central Africa have high CR compared to their CE values. We interpret these similarities and differences between CR and CE in the “Discussion” section (Section 5). For reference, the standard deviation of CR and CE is found to follow the spatial patterns in mean values (Figure S2). As detailed in the Methodology section, it is important to discuss the uncertainty in CE associated with the uncertainty in the Fuh parameter (Figure 5). In 75% of the watersheds, this uncertainty is very low (lower than 0.0813, or approximately 12% of the corresponding CE values). However, some regions—especially in eastern part of China, Russia, and the west of North America—show a high uncertainty in the CE metric propagating from the Fuh parameter. Most of these regions also show high CE values (higher latitudes in North America, and eastern China). We note here that while performing the sensitivity analysis of the Fuh parameter, we considered a very wide range of values Therefore, the uncertainty in Figure 5 represents a high ceiling of what could be expected due to the influence of uncertainties in the Fuh parameter. For the sake of clarity and considering that only in 5% of the watersheds this uncertainty is > 0.6, we do not present uncertainty ranges in the figures discussed in subsequent sections.

Figure 5

Global distribution of the uncertainty (standard deviation) in CE values due to uncertainty in the Fuh parameter (Equation 3). We select the following classification bins for the standard deviation in CE metrics based on different percentile ranges: 0%–10%, 10%–25%, 25%–50%, 50%–75%, 75%–90%, and 90%–100%.

Making use of spider plots, Figure 6 groups the CR and CE values according to various watershed characteristics that govern the partitioning of P into ET and Q (Padrón et al., 2017). It is evident from Figure 6a that when the watersheds are aggregated according to aridity (Figure 6a), the highest CR is seen in humid regions, followed by hyper‐arid regions. The highest CE values are in the arid watersheds, followed by humid regions. Figure 6a shows that the high closure uncertainty and error in humid regions is primarily due to the influence of northern latitudes. When we group the CR and CE values along an elevation gradient (Figure 6b), the highest CR is seen in relatively high percentile ranges (360–748 m and 748–1,286 m). Interestingly, we see that the in watersheds in the highest elevation percentile range (1,286–5,300 m) there is a stark difference between the closure uncertainty and error. In Figure 6c, we group the CR and CE values according to the compound topographic index (CTI). With increase in CTI, we see a decrease in both CR and CE values. Our final choice of a watershed characteristic to group the CR and CE values with is NDVI, which represents the greenness of vegetation in the watersheds (Figure 6d). The highest uncertainty and error occur in regions with very low NDVI. With increase in the greenness, the CE decreases. However, CR is relatively higher for watersheds with NDVI in the range 0.38–0.55, which corresponds to temperate forests (Figure 6d). With a single R n data set considered in our study, the uncertainty in the Budyko‐based water and energy balance closure (CR) is computed based on the variability in the chosen P and ET data sets only. To determine the relative contribution of P and ET data to CR values for each watershed we conduct a simple experiment. First, we fix the value of ET (or P) to the mean of all the 8 ET (or 7 P) data sets for each watershed, and then determine the CR values for each of the 7 P (or 8 ET) data sets. The resulting CR values relate only to the permutation of P (or ET)—see Figures 7a and 7b. By comparing both figures, we see that ET uncertainties (Figure 7b) contribute twice as much as P uncertainties (Figure 7a) to CR (see also Figure S5). In general, the spatial patterns in both figures are similar: regions with high P and E uncertainties tend to coincide. However, several regions show deviations from this norm. In Amazonia, the CR values in the two cases are comparable, thus the uncertainty in water‐energy balance closure in the Budyko space is influenced equally by both P and ET uncertainties in this region. This is also the case for southeast Asia and central India, where CR values are impacted equally by the uncertainty in P and ET.

Best P and ET Data Sets for Water and Energy Balance Closure

By comparing Figures 4b and S3, it is evident that the lowest errors in the Budyko‐based water and energy balance closure (i.e., CE) are an order of magnitude lower than the average CE values from all the 56 combinations of P and ET data sets. This shows that a few of the existing retrieval algorithms are capable of accurately closing the water and energy balance of watersheds, or that at least they do not all suffer from common deficiencies that make them unsuitable. Therefore, analyzing the distribution of the best P and ET data sets can help us determine the most adequate P and ET retrieval algorithms for long‐term closure studies in different regions. In Figure 8, we see several consistent patterns. The PERSIANN family of P data sets along with CHIRPSv2.0 have the lowest CE in temperate forests and the rainforests of Congo and Amazonia. On the other hand, the picture is more complex for ET. In temperate forests, BESS and CSIRO.PMLv2.0 are the most performant (Figure 8b). In Amazonia, the GLEAM data sets have the least CE. In the rainforests of Congo, FluxCom.RS and SSEBOpv4.0 seem to perform better than the other data sets. In the mountainous watersheds of the Himalayas and the Andes, the two TRMM‐based P data sets have the least CE values. However, for ET no single data set performs consistently better than the others. In the semi‐arid Southern Hemisphere watersheds of Patagonia and Australia, we see that TRMM data sets are the most performant. As far as ET is concerned, AVHRR and CSIRO.PMLv2.0 data sets show the lowest CE. In the arid and hyper arid regions of Sahara and the Middle East, we see again that PERSIANN.CDR, PERSIANN.CSS, and CHIRPSv2.0 outperforming the other data sets.; among ET data sets, GLEAM data sets have the lowest CE along with CSIRO.PMLv2.0. Finally, we rank the P and ET data sets according to different watershed characteristics such as aridity, elevation, CTI, and NDVI in Figure 9. For both P and ET, we see that the highest CE values are in mountainous watersheds (high elevation and low CTI). Overall, we see that the TRMM.3B43 and BESS data sets have high ranks across all the different classification criteria for P and ET data, respectively. While the ET data sets seem to perform similar to each other when grouped according to different watershed characteristics, P data sets show discernible differences. We see that PERSIANN.CDR and CMORPH data sets perform consistently worser than other data sets across different grouping criteria.

Discussion

Interpretation of CR and CE Patterns

CR and CE provide two distinct types of information, and differences between the two for the same watershed provide useful insights into the efficacy of existing algorithms. We analyze two contrasting scenarios: (1) low CR and high CE and (2) high CR and low CE values. For this, we use spider plots (Figure 4) and the spatial distribution of a simple ratio, R, defined as Figure 10a illustrates the results. R(CR, CE) closer to 0 represents the first scenario (low CR and high CE), and R(CR, CE) closer to 1 represents the second scenario (high CR and low CE). For the first scenario, a prime example are the arid and semi‐arid regions, where the aggregated CE value is much higher than the CR value (Figure 6a). This shows that although the data sets agree very well with each other regarding the degree of closure (low uncertainty, clustered close to each other in the Budyko space), they do not agree with the Budyko function (distance to the Budyko curve is high for all combinations). Therefore, in semi‐arid regions, most (if not all) existing P and ET retrieval algorithms need to be improved from the perspective of water and energy balance closure. We also see this in high elevation watersheds (1,286–5,300 m), highlighting the need for better satellite‐based remote sensing retrievals in mountainous regions. It is also interesting to note that the only forested region to which this scenario is applicable is Amazonia. Although the absolute CR and CE values are low in Amazonia, the relative difference between the two suggests that P and ET estimation algorithms systematically fail to achieve a closure. While a possible explanation for such systematic deviations could be the unsuitability of the negligible watershed storage changes, we note that watersheds with significant storage losses (Figure 3b) were excluded from the analyses.

Figure 10

Global distribution of ratios (a) R(CR, CE) and (b) R(CRET, CRP) using 56 combinations of P, ET, and R n data sets. R(CR, CE) is defined as CR/(CR + CE), where CR and CE values are taken from Figures 4a and 4b respectively. R(CRET, CRP) is defined as CRET/(CRET + CRP), where CRP and CRET values are taken from Figures 7a and 7b respectively. The opposite scenario (i.e., R values close to 1) indicates regions where P and ET data sets have large uncertainty, leading to high CR values, but the corresponding CE values are relatively low. As the spider plots in Figure 6a suggest, the humid regions are an example for this scenario; more specifically the Congo rainforest, western North America, and mainland Europe (Figure 10a). In these regions, despite the uncertainties in the ensemble, a few data sets are close to the reference Budyko curve and hence have a low CE. In such cases, the combinations of P and ET data sets which are close to the Budyko curve can serve as reference in improving other algorithms. This gives rise to the possibility of creating a merged P and ET data sets using CE for determining the weights (Beck, van Dijk, et al., 2017). The data sets which are not close to the Budyko curve (i.e., that show a high CE) may have common, systematic deficiencies in their input data or algorithms; this may orient product developers that seek to improve the accuracy of their data sets. Finally, the spider plots reflect that CR and CE are highly correlated with watershed characteristics (Figure 6). Specifically, we see that CTI (Figure 6c), a topographic index, explains well the spatial patterns in CR and CE values, even better than elevation. As described in the Methodology section, lower values of CTI correspond to smaller and steeper watersheds, and higher CTI corresponds to flatter and larger watersheds. We see that the CR and CE are high for smaller and steeper watersheds, and lower for larger watersheds. This highlights the difficulties of remote sensing techniques in mountainous regions, which has been noted in previous studies (Dinku et al., 2011; Gebremichael et al., 2014). From our analysis, we see that PERSIANN.CDR and SSEBOpv4.0 tend to perform the best for these regions; the higher resolution of SSEOBOp4.0 (1 km) could play role in that sense. We also see that the greenness of the watershed (NDVI) explains the variability of CE, and CR, to a lesser extent. Very low NDVI values correspond to watersheds with lack of vegetation, which lie primarily in arid and hyper‐arid regions, where the uncertainty is higher.

Understanding the Sources of the Budyko‐Based Closure Uncertainty

The results also reveal that the CR of watersheds are primarily governed by uncertainty in ET data rather than P (Figure 7). However, there may be regions in which the uncertainty is governed by P. To better understand whether CR is affected more by P or ET uncertainties, we define, similar to R(CR, CE), another ratio: where CRP is the CR determined using mean of ET data sets (Figure 7a) and CRET is CR determined using mean of P data sets (Figure 7b). Therefore, CRET represents uncertainty in (Budyko‐based) closure arising only from the ET uncertainty, and CRP represents uncertainty in closure arising from P uncertainties. If R(CRET, CRP) tends to 0, it implies that uncertainty in P influences CR more than ET. If R(CRET, CRP) tends to 1, uncertainty in ET influences CR more. Figure 10b, although confirming the results from Figure 7, shows that there are several regions in which CR is influenced by both P and ET uncertainties. Examples for such a scenario are central India, parts of southern Africa and Amazonia. In Amazonia, the higher uncertainties in P are also reflected in the fact that no unique P data set outperforms all others, and that the best data sets are not gauge calibrated. This reflects that there are not enough gauges in the region, but also the general inadequacy of satellite remote sensing over tropical forests. In terms of ET, GLEAM seems to outperform the other data sets in Amazonia, with the GLEAMv3.3b outperforming GLEAMv3.3a despite using only satellite data. It is important to note that the impact of individual uncertainties in P and ET data sets on CR and CE (Figures 10 and S5) are not linearly related to the absolute uncertainties in P and ET data sets. This is apparent in the scatter plot between the coefficient of variation (CV) in P and ET (Figure 11). From Figure 11, it is evident that the absolute uncertainty in ET is in fact marginally lower than the uncertainty in P data sets, especially in humid regions. However, from Figures 10 and S5, the uncertainties in ET propagate further into CR and CE. A mathematical reason for this is apparent from the Budyko space: the fact that P appears in the denominator in both x (AI) and y (EI) axes, but ET appears only in the numerator in the y axis (EI). A physically consistent explanation is the fact that higher precipitation implies both lower aridity and a lower evaporative index at the same time. On the other hand, higher ET affects only EI, and this effect is not compensated by changes to AI (in water‐limited regions). This nonlinearity in sensitivities would not be revealed without combining the water and energy balances together.

Figure 11

Scatter plot of the coefficient of variation of the absolute values of P (determined using 7 P data sets for each watershed) versus the coefficient of variation of the absolute values of ET (determined using 8 ET data sets for each watershed). This plot shows the absolute uncertainties (in the form of coefficient of variation) in P and ET data sets, as opposed to cluster radius (which represents the uncertainty in the closure of water and energy balance).

Performance of P and ET Data Sets in High Latitude Watersheds

A recurring theme in the results is that the ensemble of P and ET data sets selected in this study fails to close the Budyko‐based water and energy balance of watersheds in the northern latitudes, both in the temperate forests and the grasslands in mainland Europe. The decomposition of the results according to watershed characteristics (Figure 1) did not reveal any consistent reason for this high uncertainty. However, we see that the best combination of P and ET data sets for these regions results in very low CE values (0%–10% range in Figure S3). This implies that some algorithms (PERSIANN.CDR and PERSIANN.CCS for precipitation, and BESS and CSIRO.PMLv2.0 for ET) are still well suited to represent the long‐term closure in these regions. The fact that a gauge‐calibrated P data set (PERSIANN.CDR) performs well reveals the benefit of utilizing the dense network of rain gauges in Europe and North America. Also, the extension of the PERSIANN algorithm with a more sophisticated cloud segmentation (PERSIANN.CCS) seems to have contributed to better rainfall estimation over these regions. For ET, the coupled biophysical model used in BESS may be more suitable for higher latitudes. We note that both BESS and CSIRO.PMLv2.0 are based on the PM equation, which tends to perform better in low radiation conditions than PT formulations (Miralles, Jiménez, et al., 2016). Also, the CSIRO.PMLv2.0 data set is calibrated with the Budyko curve, which could explain its better performance in these analyses. In fact, CSIRO.PMLv2.0 outperforms all the data sets across the majority of the landcover classes (Figure S4). We note that our ranking of the data sets is in the context of long‐term closure, and therefore only indicative of the performance of the mean bias in the data sets and not of the temporal dynamics.

Comparison With P and ET From In Situ Observations and LSMs

Finally, we contrast our results based on remote sensing data sets with those arising from the use of in situ and state‐of‐art LSM’s. As CR is biased by the number of P and ET data sets (56 combinations for remote sensing‐based data sets compared to 16 for in situ and LSM‐based data sets), we focus only on comparing CE results, which provide an independent benchmark of water and energy balance closure in the Budyko space and are not biased by the number of data sets. Figure 12 presents a global map of fractional difference (Equation 10) between mean of CE values calculated using only remote sensing‐based data sets (CERS) and only in situ and LSM‐based data sets (CELSM).

Figure 12

Global distribution of fractional difference between mean closure error (CE) of all combinations of remote sensing‐based (CERS) and in situ and LSM‐based (CELSM) P and ET data sets. Positive values imply LSM‐based data sets are better at closing the Budyko‐based water and energy balance. Negative values imply remote sensing‐based data sets are better. Negative CERS‐LSM values point to lower remote sensing‐based CE compared to in situ and LSM‐based data sets, and hence a better performance in closing the water and energy balance in the Budyko space. A consistent geographical pattern emerges in Figure 12, in which negative CERS‐LSM values are seen in the tropics, while positive CERS‐LSM values are seen in the mid and high latitudes. A plausible explanation for this clear distinction could be that the in situ P data sets are more accurate in regions with high density of rain gauges, such as the North America and Europe. This increased accuracy translates into improved accuracy in ET, as many of the LSM and reanalysis‐based ET estimates use in situ‐based gridded P estimates as forcing. The opposite applies in parts of the tropics, such as the Amazon and Congo rainforests, where in situ measurements are sparse and hence satellite‐based remote sensing data sets show an advantage. We also see large regions in central Asia and north‐east Europe where the remote sensing data sets perform similar to in situ and LSM’s as far as the Budyko‐based water and energy balance closure is concerned (−0.1 to 0.1 bin in Figure 12). In Figure S6, we present a comprehensive ranking of remote sensing and in situ or LSM‐based P and ET data set using heatmaps, similar to Figure 9. We see that GPCCv7.0, which uses the greatest number of in situ rainfall stations, consistently performs than other in situ and remote sensing‐based data sets. As far as ET is concerned, we see that the Noah LSM‐based FLDAS and GLDASv2.1 perform better than the other two in situ and LSM‐based data sets under different aridity, topographic, and vegetation‐based gradients. Our results not only highlight the importance of satellite‐based remote sensing in regions where in situ monitoring is insufficient.

Conclusion and Future Work

In this study, we quantified the potential of global satellite data to close the water and energy balance of large watersheds worldwide. To do so, we applied a Budyko‐based framework that explicitly considers water and energy balance constraints. Our results helped identify regions in which current understanding of watershed‐scale hydrologic processes is hindered by systematic errors in observations. Moreover, the closure analysis enabled the quantification of long‐term biases in individual satellite‐based data sets, thus providing a global assessment of the current skill at retrieving different fluxes. We show that uncertainties and errors in closure are highly variable in space. Of importance is the high uncertainty in temperate forests, where both P and ET data sets are highly uncertain and exhibit significant biases. In addition, we see that the water‐energy balance closure is difficult to achieve in arid and semi‐arid regions, where the availability of accurate satellite data is important, as these regions are also scarcely sampled in situ. Concerted efforts are needed to improve P and ET data sets in arid and mountainous watersheds where most of the data sets are shown to agree with each other but display systematic errors. Finally, we showed that satellite‐based remote sensing data sets perform better than in situ observations and model‐based P and ET data sets in data‐scarce regions. Overall, our findings highlight the need for improving existing algorithms in specific regions, either by revisiting their parameterizations or incorporating physical processes that may be relevant but not considered at the moment. The evaluation presented here is novel, independent of in situ measurements, and integrative in terms of considering both the water and energy balance together using the Budyko framework. Altogether, it offers a new means for the creation of merged data sets P and ET that better represent the long‐term closure of water and energy cycles. Supporting Information S1 Click here for additional data file.

16 in total

Budyko-Based Long-Term Water and Energy Balance Closure in Global Watersheds From Earth Observations.

Introduction

Remote Sensing Data Sets

Precipitation

Terrestrial Evaporation

Net Radiation

Methodology

Selection of Watersheds

Long‐Term Water and Energy Balance

Budyko Hypothesis

Deviations From the Budyko Hypothesis

Uncertainty and Error in Water and Energy Balance Closure

Evaluating P and ET Data Sets Based on Water and Energy Balance Closure

Results

Global Patterns in the Budyko‐Based Water and Energy Balance Closure

Best P and ET Data Sets for Water and Energy Balance Closure

Discussion

Interpretation of CR and CE Patterns

Understanding the Sources of the Budyko‐Based Closure Uncertainty

Performance of P and ET Data Sets in High Latitude Watersheds

Comparison With P and ET From In Situ Observations and LSMs

Conclusion and Future Work

1. Soil water balance and ecosystem response to climate change.

2. Local flow regulation and irrigation raise global human water consumption and footprint.

3. Satellite-based estimates of groundwater depletion in India.

4. Afternoon rain more likely over drier soils.

Review 5. Hydrologic regulation of plant rooting depth.

6. Emerging trends in global freshwater availability.

7. Reconciling spatial and temporal soil moisture effects on afternoon rainfall.

8. A land data assimilation system for sub-Saharan Africa food and water security applications.

Review 9. Towards a methodology for testing models as hypotheses in the inexact sciences.

10. Variability and change in the hydro-climate and water resources of Iran over a recent 30-year period.

1. A deep learning-based hybrid model of global terrestrial evaporation.