Literature DB >> 32226131

Calibration of the Global Flood Awareness System (GloFAS) using daily streamflow data.

Feyera A Hirpa¹, Peter Salamon², Hylke E Beck³, Valerio Lorini², Lorenzo Alfieri², Ervin Zsoter⁴, Simon J Dadson¹.

Abstract

This paper presents the calibration and evaluation of the Global Flood Awareness System (GloFAS), an operational system that produces ensemble streamflow forecasts and threshold exceedance probabilities for large rivers worldwide. The system generates daily streamflow forecasts using a coupled H-TESSEL land surface scheme and the LISFLOOD model forced by ECMWF IFS meteorological forecasts. The hydrology model currently uses a priori parameter estimates with uniform values globally, which may limit the streamflow forecast skill. Here, the LISFLOOD routing and groundwater model parameters are calibrated with ECMWF reforecasts from 1995 to 2015 as forcing using daily streamflow data from 1287 stations worldwide. The calibration of LISFLOOD parameters is performed using an evolutionary optimization algorithm with the Kling-Gupta Efficiency (KGE) as objective function. The skill improvements are quantified by computing the skill scores as the change in KGE relative to the baseline simulation using a priori parameters. The results show that simulation skill has improved after calibration (KGE skill score > 0.08) for the large majority of stations during the calibration (67% globally and 77% outside of North America) and validation (60% globally and 69% outside of North America) periods compared to the baseline simulation. However, the skill gain was impacted by the bias in the baseline simulation (the lowest skill score was obtained in basins with negative bias) due to the limitation of the model in correcting the negative bias in streamflow. Hence, further skill improvements could be achieved by reducing the bias in the streamflow by improving the precipitation forecasts and the land surface model. The results of this work will have implications on improving the operational GloFAS flood forecasting (www.globalfloods.eu).

Entities: Chemical Disease Gene Species

Keywords: Early warning system; Flood forecasting; GloFAS; Global hydrology; Model calibration

Year: 2018 PMID： 32226131 PMCID： PMC7097972 DOI： 10.1016/j.jhydrol.2018.09.052

Source DB: PubMed Journal: J Hydrol (Amst) ISSN： 0022-1694 Impact factor: 5.722

Introduction

Flood early warning systems (FEWS) are principal components of disaster risk management since they provide a unique opportunity to identify upcoming flood hazards ahead of their occurrence. However, the effectiveness of early warning systems depends on the skill with which the occurrence and severity of floods can be predicted. In recent years, global-scale operational flood forecasting systems have emerged as promising support tools for disaster preparedness and response worldwide, and major effort has been dedicated to improving the skill of such systems (see reviews in Emerton et al., 2016, Hirpa et al., 2018). Flood forecast skill is primarily affected by: (i) model errors due to incomplete representation of physical processes and inaccurate parameterization, (ii) uncertainty in the model initial conditions, and (iii) errors in the meteorological forcing (e.g., Fekete et al., 2004, Biemans et al., 2009, Pappenberger et al., 2011, Nasonova et al., 2011, Müller Schmied et al., 2014, Sperna Weiland et al., 2015). Several studies have investigated the relative contributions of the meteorological forcing (precipitation), initial conditions, model structure or parameters uncertainty to the errors in the streamflow simulations (e.g., Fekete et al., 2004, Li et al., 2009, Materia et al., 2010, Nasonova et al., 2011, Elsner et al., 2014, Sampson et al., 2014, Sperna Weiland et al., 2015, Döll et al., 2016). Even though precipitation uncertainty was generally identified as the largest contributor to the error in the simulated streamflow (e.g., Sperna Weiland et al., 2015, Beck et al., 2016), calibration of global-scale model parameters has been demonstrated to improve streamflow simulations (e.g., Werth and Güntner, 2010, Müller Schmied et al., 2014, Beck et al., 2016). Calibrating a global-scale hydrological model is a challenging task mainly due to the high computational demand and the limited availability of reliable streamflow observations (e.g., Bierkens, 2015). Hence, previous works have been limited to small number of basins, performed at a relatively larger catchments and/or temporal scales (e.g., monthly), or using a proxy streamflow data derived from satellite observations (e.g., Werth and Güntner, 2010, Sperna Weiland et al., 2015, Revilla-Romero et al., 2015a, Revilla-Romero et al., 2015b). The increasing advances in the computational power and progressively improving calibration algorithms present an opportunity to perform model calibration at the global scale and higher spatial and temporal resolutions. Large-scale hydrological modelling using grid-based hydrological routing model coupled with a land surface model has been used for generating streamflow for large river basins or at continental and global scales (e.g., Goteti et al., 2008, Decharme et al., 2010, Dadson et al., 2011, Alfieri et al., 2013, Li et al., 2013, Wu et al., 2014). In the coupled modeling systems the land surface model generates runoff from meteorological forcing by calculating the water balance, and a routing model is used to calculate the flow in river channels. The Copernicus Global Flood Awareness System (GloFAS, Alfieri et al., 2013; www.globalfloods.eu) produces ensemble streamflow forecasts and threshold exceedance probabilities for large rivers around the world. The system employs a coupled land surface scheme and flow routing model to generate operational forecasts at daily time steps and 0.1° grid resolution. The land surface scheme, referred to as Hydrology Tiled ECMWF Scheme for Surface Exchanges over Land (H-TESSEL; Balsamo et al., 2009) and used operationally in the Integrated Forecast System (IFS), computes the surface and subsurface runoff. A simplified version of LISFLOOD (Burek et al. 2013) is used for routing the runoff produced by the land surface scheme through the river network and computing the groundwater fluxes. GloFAS serves as a primary mechanism of support for flood early warning in regions without local FEWS and provides complementary flood forecast information for areas with alternative systems such as national or continental FEWS. Much effort has been directed at improving GloFAS forecast skill (e.g., Zajac et al., 2017, Hirpa et al., 2016, Revilla-Romero et al., 2015a, Revilla-Romero et al., 2015b, Zsoter et al., 2016), however calibration of the hydrological model has not been previously performed. In the present study we calibrate the hydrological model parameters underpinning the operational GloFAS using daily streamflow observations from 1287 stations across the globe. The significant contribution of this calibration work lies in its application to global-scale operational flood forecasting system hence can help improve the flood early warning through enhanced flood forecast skill. Furthermore, the calibration covers hundreds of river basins with a wide range of climate including snow-dominated boreal basins, equatorial basins with high runoff, and dry basins with low runoff. To accommodate the high computational demand of global-scale calibration of a distributed hydrological model, an Evolutionary Algorithm with co-evolution of multiple simulations running in parallel was used (e.g., Deb et al., 2002, Tang et al., 2006). The main objectives of the work are to improve the operational flood forecasting skill by calibrating the flow routing and groundwater model parameters, and to quantify the contribution of the flow routing model parameters uncertainty to the streamflow forecast skill. We quantify the skill gained as the result of model calibration by comparing to baseline simulations produced using default parameter sets determined based on the literature (see Feyen et al., 2007). The transferability of parameters to another period was evaluated using a different time period from calibration. While calibration of the routing model alone is unlikely to remove all the errors in the streamflow simulation, we aim to reduce the relative contribution of the parameter error to the streamflow forecast uncertainty. The remainder of this paper is organized as follows. The runoff routing model and the calibrated parameters are described in Section 2. The calibration method and data are presented in section 3, followed by the evaluation method in Section 4. Results, discussions and conclusions are presented in 5, 6, 7 respectively.

LISFLOOD model and parameters

LISFLOOD is a distributed hydrological model composed of sub-models capable of separately simulating different hydrological processes. In GloFAS, the model is run at 0.1° resolution globally and at daily time step. The sub-models are used for the simulation of groundwater storage, groundwater flow, and flow routing into and through river channels. The groundwater storages and transport are represented using two interconnected groundwater zones each consisting of a linear reservoir (Burek et al., 2013). The outflow from each zone is estimated based on the water storage and time constant of the reservoir. The two reservoir time constants, the percolation from upper to lower zone, and a loss from lower zone to deep groundwater are calibrated parameters (see Table 1).

Table 1

A list of calibrated LISFLOOD parameters. The GloFAS operational model currently uses the default parameter values.

Parameter name	Description [unit]	Parameter values
Parameter name	Description [unit]	Lower bound	Upper bound	Default value
UpperZoneTimeConstant	Time constant for water in upper zone [days]	3	40	10
LowerZoneTimeConstant	Time constant for water in lower zone [days]	40	500	200
GwPercValue	Maximum rate of percolation going from the upper to the lower groundwater zone [mm/day]	0.01	2	0.5
GwLoss	Maximum loss rate out of lower groundwater zone expressed as a fraction of lower zone outflow [−]	0	0.5	0
CalChanMan	A multiplier applied to Channel Manning's coefficient n [−]	0.1	15	4
LakeMultiplier	A multiplier to adjust the lake outflow width parameter [−]	0.5	2	1
adjust_Normal_Flood	Adjusts the balance between normal and flood storage of a reservoir [−]	0.01	0.99	0.8
ReservoirRnormqMult	A multiplier to adjust the magnitude of the normal outflow from a reservoir [−]	0.25	5	1

A list of calibrated LISFLOOD parameters. The GloFAS operational model currently uses the default parameter values. There are three routing components in LISFLOOD: i) routing the surface runoff into stream channel, ii) subsurface runoff to the channel, and iii) flow routing through the stream channel. The surface runoff and in-channel routing are performed by solving kinematic wave equations (Chow, 1998). For each model pixel at every time step (here set to four hours), a 4-point implicit finite-difference solution of the kinematic wave equations is applied to compute the flow of water to the nearest downstream channel (Burek et al., 2013). The subsurface runoff routing into the stream channel is modeled differently. The total outflow from upper and lower groundwater zones at a given time step is routed to the nearest downstream channel pixel as a scaled sum of the outflow from upper and lower groundwater zones. The routing operations, implemented using the PCRaster software (http://pcraster.geo.uu.nl/), require information about stream channel characteristics such as channel length and gradient, flow width and depth, and Manning’s roughness coefficient. For GloFAS, the river network map, the flow direction map, the upstream area, and the flow length were obtained from the global river network database (Wu et al. 2012). The channel gradient between adjacent pixels was estimated as a ratio of the elevation change to the flow length. The river widths were taken from the Global Width Database for Large Rivers (GWD-LR; Yamazaki et al. 2014). The bankfull water depth was empirically estimated using long-term average discharge by applying Manning’s equation. In GloFAS, a total of 463 large lakes (>100 km2) and 667 reservoirs are incorporated to the LISFLOOD model (see Zajac et al, 2017 for the locations). The attributes of the lakes and reservoirs were obtained from global databases (Lehner and Döll, 2004, , 2009, Lehner et al., 2011). Outflow from a lake is estimated using its relationship with the lake level using weir equation (Bollrich, 1992). The lake level variation is estimated from the change in lake storage over time (Zajac et al., 2017). Each reservoir is simulated based on its relative filling levels and design parameters such as total storage capacity, storage limits and outflow requirements. The reservoir parameters were estimated from the metadata obtained from global database. However to improve the simulation skill we calibrate to adjust the balance between normal and flood storage limits of a reservoir, and the normal outflow from a reservoir (Table 1). Irrigation water use is modeled as a monthly withdrawal from discharge in the river network, but any potential outtake from groundwater, precipitation or soil water is not accounted for. Open water (e.g., river channels and lakes) evaporation is estimated using the Penman–Monteith equation with the required variables extracted from ECMWF reforecast datasets consistent with the forcing data (see Section 3.2).

Calibration method and data

Calibration method

Algorithm

We used an Evolutionary Algorithm (EA) for the calibration of the LISFLOOD model parameters. EA is a population-based optimization algorithm in which each individual (e.g, a vector of model parameters) in a large population represents a candidate solution for the optimization problem. The goodness of fit for each individual is evaluated based on selected objective functions, which are designed as either maximization (e.g, Kling-Gupta efficiency) or minimization (e.g., root mean square error) equations constrained by physically meaningful model parameter ranges. The basic principle of the EA is to modify and improve the population through evolution over a range of generations, and ultimately identify the best performing individual. The evolution mechanisms from one generation (parent) to the next (offspring) are inspired by biological evolution such as mutation, crossover and reproduction (Maier and Co-authors, , 2014, Nicklow et al., 2010). EAs have been previously used for calibration of hydrological and/or water resources models (e.g., Muleta and Nicklow, 2005, Kollat and Reed, 2005, Tang et al., 2006, Khu et al., 2008, Hadka and Reed, 2013, Maier and Co-authors, , 2014, Beck et al., 2016). In EA, the evolutionary loop starts by generating offspring (λ) from the population (µ). In this work this was performed by varying the population using a combination of a blend crossover (90%) and Gaussian-based mutation (10%) algorithms (Fortin at al., 2012). Determining the fraction of the population varied by crossover or mutation is part of the algorithm design (De Rainville et al., 2014), and its impact on the accuracy of the optimum solution can be investigated in a separate work. The offspring generation is followed by offspring evaluation in order to determine the best performing individuals. This was performed though model simulation using the newly generated model parameters. We applied a Non-dominated Sorting Genetic Algorithm (NSGA-II; Deb et al., 2002) for the selection of the best performing parameters from a mixed set of parents and offspring (µ+λ). The evolutionary loop continues until a stopping criterion is met. EAs have been shown to be one of the most efficient and effective algorithms in hydrologic calibration and particularly the NSGA-II algorithm was previously found to be superior to other algorithms with its rapid convergence (Tang at al., 2006).

Implementation

The Kling-Gupta efficiency (KGE; Gupta et al., 2009) was selected as the objective function for the calibration. Due to the high computational demands of the gridded LISFLOOD model at global-scale, it was necessary to limit the total number of model simulations during the calibration exercise. Here a combination of improvement based criteria (i.e., improvements in the objective functions) and exhaustion-based criteria (fixed number of generations) was employed for stopping the calibration algorithm. Using a selected number of basins, we first determined the rate of change of the objective function (KGE) with the number of generations in the evolutionary algorithm. Typically there is a fast improvement rate in the initial steps of the calibration, and then a slower rate for longer generations until there is no further skill gain for any additional generation after a convergence to an optimum parameter set. Since there was minor improvement beyond 10 generations for most of the test basins we determined that the maximum number of generations for the calibration runs to be 15. For the improvement based criterion, we determined that if, after 5 generations the KGE improvement for four consecutive generations is less than 0.001, the algorithm stops. In other applications, if the computational time is not a concern, one can run for a larger number of generations before obtaining the optimal parameter sets. Moreover, it might be desirable to implement several other stopping criteria (see, Marti et al., 2009). The EA algorithm was implemented using Python programming language and is publicly available as Distributed Evolutionary Algorithm in Python (DEAP; Fortin et al., 2012, De Rainville et al., 2014; https://github.com/deap/deap). The µ and λ values are set to 12 and 24 respectively, and, as mentioned previously, the probability that an offspring is produced by crossover (mutation) was set to 0.9 (0.1). Multiple parallel runs in a given generation are performed by distributing on multiple cores on a Window® PC. The calibration was run for a maximum of 372 simulations (15 generations, µ = 12 and λ = 24), unless the improvement criterion was met earlier, for each calibration station before stopping. For LISFLOOD, these simulations take, on average, approximately six hours for each calibration station on a PC with 12 cores. For stations located in the same river basin, the calibration was iteratively performed from upstream to downstream (in ascending catchment area) using the streamflow from the calibrated upstream subbasin(s) as inflow for the calibration of the next interstation area. This approach accounts for geomorphologic connections along the drainage network and was shown to improve model performance over independent single-site calibration (Wi et al., 2015, Xue et al., 2016).

Forcing data

For this work, the surface and subsurface runoff from ECMWF reforecast were used as input to the LISFLOOD model. The runoff fields were produced by the ECMWF land surface scheme, HTESSEL, which performs the energy and water balance calculations over land and water grids (e.g, Balsamo et al. 2009). The reforecast is created based on a retrospective run of the most recent version of the ECMWF’s Integrated Forecast System (IFS) which undergoes regular updates to improve the weather forecast (see, ECMWF, 2017). The purpose of the retrospective run is to generate long-term (20 years) datasets consistent with the operational weather forecasts (Alfieri et al., 2014). This makes the reforecast a suitable forcing data for the calibration of the operational LISFLOOD model parameters. The data record spans from June 1995 to June 2015, and due to frequent model updates it is based on multiple model cycles: Cycle 41r1 (July through March) and Cycle 41r2 (April through June), with horizontal resolutions of ∼32 km and ∼18 km respectively. The runoff fields are interpolated using nearest neighbour method from the native resolutions to 0.1° × 0.1° grids before they were used for forcing LISFLOOD.

Streamflow data

Daily streamflow observations from 1287 stations around the world were used for the model calibration and evaluation. The catchment areas of the stations vary from 484 km2 to 4.8 × 106 km2, with a median of 38,000 km2. The regional distribution of the stations is as follows: 41.6% are located in North America, 25.6% in South America, 14.1% in Europe, 10.9% in Africa, 4.5% in Australia and Oceania, and 3.3% in Asia. The Global Runoff Data Centre (GRDC) was the primary source of the streamflow data, but additional data from national hydrological services (e.g., South Africa, www.dwa.gov.za) and from the R-ArcticNet database for the Arctic region (http://www.r-arcticnet.sr.unh.edu/v4.0/AllData/index.html) were used. We only used stations with more than four years of data during the study period (1995–2015). For each station, the record was split into two for calibration and validation purposes (Fig. 1). If the record was shorter than eight years, four years were used for calibration and the remaining days were used for validation. Validation was not performed if the available data is less than 1 year. If the record was equal to or longer than eight years, half was used for calibration and half for validation. The most recent period was used for the calibration because the earlier forcing data have greater uncertainty.

Fig. 1

The median year of streamflow data used for model calibration (a), for validation (b) and the length of data available for validation (c). The most recent data available were used for calibration. Four years of data was used for calibration for those stations with validation period of less than 4 years, and the remaining stations have equal length of both calibration and validation periods.

Evaluation

The evaluation was performed separately for the calibration and validation periods (see Fig. 2). For the calibration period, the improvement in streamflow simulations as a result of the calibrated parameters () was assessed by comparing with a baseline simulation (), which was generated using default model parameters. The Kling-Gupta efficiency (KGE), correlation (R), percent bias (B) and Nash–Sutcliffe efficiency (NSE) (Gupta et al., 2009) were used as evaluation metrics. The relative improvements were expressed as skill scores:where: denotes KGE skill score; is KGE with default parameters; is KGE with calibrated parameters; and indicate KGE of a perfect simulation (=1). Positive (negative) values indicate improved (deteriorated) skill after calibration. The best value is 1. For each case, the KGE was computed with reference to streamflow observations (). A similar skill score computation was repeated for R and NSE. In order to have a comparable range of skill scores (i.e., with the best score being 1), the B skill score (BSS) was computed using the absolute values of the percent bias as follows:where the Bperf (=0) is the B of a perfect simulation, Bcal is B after calibration and B is B with default parameters. The second evaluation, during the validation period, was performed using streamflow observations from a different time period, to assess the temporal transferability of the calibrated model parameters.

Fig. 2

The schematics of the streamflow evaluation for the calibration and validation periods (see Fig. 1 for the time periods).

Results

Streamflow with default parameters

To evaluate the accuracy of the baseline streamflow simulation, we first compared it with observed streamflow data. The computed scores during the calibration period (KGE, B and R) show a large regional difference in the performance of the streamflow simulated with default parameters (Fig. 3). The KGE score is high (e.g., >0.2) in most parts of the Amazon, southeastern United States, Europe, northwestern North America and Russia. However, the KGE score is low (less than0) for most part of Africa, Midwestern United States and Australia. The regions with low KGE score also have large bias (positive or negative) in the simulated streamflow (Fig. 3b). Specifically, large underestimation in the United States and large streamflow overestimation in Africa (e.g., Nile, Congo, Niger basins) and Murray-Darling in Australia contributed to the low KGE score in those regions. Tropical basins have high daily streamflow temporal correlation (>0.6), but parts of North America have low temporal correlation (Fig. 3c). The streamflow overestimation in the tropics is consistent with previous reports that atmospheric models have a tendency of overestimating rainfall amounts (e.g., Beck et al., 2017, Trenberth et al., 2011, Kang and Ahn, 2015). The reason for the large streamflow underestimation in North America is, however, unclear since atmospheric models tend to have low bias in high latitudes.

Fig. 3

Performance of the baseline simulation (with default LISFLOOD parameters).

Streamflow with calibrated parameters

KGE skill gain

To quantify the improvements in simulated streamflow as the result of calibrating the LISFLOOD model parameters, we calculated the KGE skill score for each calibrated river basin both for the calibration and validation periods (see Fig. 4). The skill scores during the calibration period show that the streamflow skill improved due to calibration for the majority of the river basins. Overall, model calibration improved for 67% of the basins (with KGE skill score of >0.08) with a median KGE skill score of 0.15 (P90 = 0.46 and P10 = −0.002). However there was no skill gain (29% with KGE skill score close to 0) or skill loss (4% with negative KGE skill score, <−0.08) for the remaining basins after calibrating the LISFLOOD model parameters. If North America, where there was a large negative bias and a considerable number of skill loss after calibration was observed, is excluded the percentage of skill gain would increase to 77%, and the remaining stations will have 1% skill loss and 22% no skill.

Fig. 4

The KGE skill score during calibration (top) and validation (bottom) periods.

The KGE skill score during calibration (top) and validation (bottom) periods. The overall KGE skill score for the validation period is slightly worse than the one for the calibration period. Globally, calibrating the model parameters improved the performance for 60% of the stations; we found skill loss for 10% during the validation period. The median KGE skill score is 0.12 (P90 = 0.42 and P10 = −0.075). In a similar pattern to the calibration period, the skill score map shows that the majority of the skill loss occurred in North America.

Regional breakdown

To investigate the regional variations and the impact of catchment area on the skill gain we present the regional breakdown of the KGE, NSE and R skill scores (Fig. 5). Results reveal that the skill score varies across regions. On the one hand, the KGE skill improved (KGESS > 0.08) for a significant number of sub-basins across all regions: 84% in in South America, 82% in Europe, 67% in Australia and Oceania, 62% in both Africa and Asia, and 52% in North America. This indicates that the calibration of the LISFLOOD model parameters improved the performance of the simulated streamflow; however the extent of the improvement varies across regions. On the other hand calibrating only the selected model parameters has limitations: there was no added skill (KGE skill score ∼ 0) compared to the baseline streamflow for 39% of basins in North America, 38% in Asia, 35% in Africa, 33% in Australia, 16% in South America, and 14% in Europe. Furthermore, there is a small number of basins with negative KGE skill score (e.g., 9% in North America). The median NSE skill score is similar to those of KGE across all regions, but it has more spread for most of the continents. The R skill scores show different patterns, notably in South America where the large KGE skill gain after calibration has not been reflected in R skill score. Possible influence of the inherent bias in baseline simulation on the skill scores is investigated in Section 5.3. While here we use the calibration period to examine the regional dependence of the skill score, a similar pattern was observed for the validation period.

Fig. 5

Regional variations of KGE, NSE and R skill scores (B skill score is presented below).

Separation by hydro-regions

To check if hydroclimate was a factor for the skill improvement, we show the calibration skill score variations with hydro-regions (see Fig. 6). The global river basins are classified by Meybeck et al. (2013) into hydro-regions (also called ‘hydrobelts’) based on mean annual runoff and temperature. According to the classification by Meybeck et al. there are five global hydro-regions: Boreal (mean annual temperature below 0 °C), mid-latitude (mean annual temperature between 0 and 15 °C and runoff between 150 and 750 mm), subtropical (mean annual temperature between 15 and 25 °C and runoff between 150 and 750 mm), equatorial (mean average temperature exceeding 20 °C and runoff > 750 mm), and dry (mean annual runoff less than 150 mm) regions. The results show that hydroclimate is an important factor with regards to the calibration skill improvement. The equatorial and subtropical basins showed the best skill score with almost all basins having a positive KGE skill score. While the large majority of basins in mid-latitude (primarily in North America as discussed above) have positive skill score, there are several basins with poor skill after calibration. Boreal and dry basins have large majority of basins with enhanced streamflow skill, but also a handful of basins with negative KGE skill score. The NSE and R skill scores also vary across regions. The equatorial and subtropical regions showed relatively higher NSE skill score and lower R skill score compared to other regions.

Fig. 6

Distribution of skill scores classified based on hydro-regions (Meybeck et al., 2013).

The impact of bias on the skill score

A potential bias in streamflow may not be adjusted with tuning the model parameters listed in Table 1. This is especially important when there is a negative bias in the runoff forcing in which the water deficit cannot be improved unless the runoff bias is first corrected. To examine weather runoff bias contributed to the calibration skill, we present the KGE skill score as a function of streamflow bias in the baseline simulation (generated with default parameters) for different regions (see Fig. 7). We found that the predominant stations with negative KGE skill score are those with negative bias in the baseline streamflow. Furthermore, almost all stations with positive bias in the baseline streamflow have improved KGE skill score. This is because positive streamflow bias can be corrected by tuning the groundwater percolation or loss parameter to transfer more water to the deep groundwater storage.

Fig. 7

KGE skill score as a function of bias in the baseline simulations.

How do parameters adjust?

To understand how model parameters adjust to the bias in the baseline streamflow, we look at two hydro-regions with contrasting KGE skill scores. Equatorial and mid-latitude basins have predominantly large positive and negative streamflow biases respectively. Fig. 8 presents the relative change in model parameter values after calibration (normalized by the range) for equatorial basins. The normalized change for each parameter is defined as (Pc-Pd)/(Pmax-Pmin), where Pc is calibrated; Pd is default; Pmax is maximum; and Pmin is minimum parameter values (see Table 1). The most important parameter in improving the KGE in the equatorial basins (with predominantly positive bias) is ‘GwLoss’, which is a fraction of lower zone outflow transferred to deep groundwater storage and does not rejoin the river channel. This indicates that the calibration algorithm increases the groundwater loss parameter to correct for positive bias in the simulated streamflow. The KGE skill score is not as sensitive to changes in other parameters in the region.

Fig. 8

Impact of the change in model parameters after calibration on the KGE skill score. Each dot in the scatter plot represents a calibrated sub-basin in the equatorial hydroregion (where there is a large positive bias in the baseline streamflow). Horizontal axis shows the bias in the baseline streamflow and the vertical axis shows the normalized change (after calibration) for each parameter. Zero on y-axis indicates no change. However, it is more challenging to improve the forecast skill when there is a water deficit in the simulated streamflow due to, for example, an underestimation in the runoff depth. The LISFLOOD model, under the current GloFAS setup, controls the timing of streamflow and the groundwater percolation, but it does not have any parameter to adjust for the deficit in the runoff depth. This is reflected in the mid-latitudes (Fig. 9) in which a large number of basins have negative streamflow bias. The KGE skill gain was obtained for the limited number of basins with negative streamflow bias by tuning the channel Manning's coefficient (using ‘calChanMan1′) perhaps through improving the flow timing (i.e., correlation). The deep groundwater loss parameter (‘GwLoss’) remains important for basins with positive bias in the region.

Fig. 9

The same as Fig. 8 but for mid-latitude basins (with a significant negative bias in the baseline streamflow).

The same as Fig. 8 but for mid-latitude basins (with a significant negative bias in the baseline streamflow). As a further step, we look at the bias skill score, i.e., the relative change in the streamflow bias compared to the baseline simulation. Even though bias was not explicitly used as an objective function in the calibration algorithm, it is still generally expected to improve since it is one component of the objective function (KGE, Gupta et al., 2009). Fig. 10 shows the impact of calibrating model parameters on reducing the streamflow bias. There is a clear separation in the bias skill score based on the baseline streamflow: the bias was reduced for basins with overestimation in baseline simulation and it was increased for basins with streamflow underestimation. Similar to the case for KGE skill score, the parameter responsible for the water transfer to deep groundwater is the main responsible for the bias skill improvement. The overall results indicate that the negative bias cannot be corrected with the 8 parameters currently used in LISFLOOD model calibration, and that methods to reduce the bias in the runoff volume (e.g., calibrating the land surface model) should be explored.

Fig. 10

Streamflow bias skill score after calibration for all 1287 stations.

Discussions

The calibration of the LISFLOOD parameters resulted in a substantial improvement of simulated streamflow KGE score (both during the calibration and validation periods) for the majority of stations around the world. The KGE skill score varies across different regions mainly based on the bias in the baseline streamflow. On the one hand river basins with positive bias in the baseline simulation showed the largest improvements in the skill score after calibration and on the other hand those with negative bias showed the lowest skill score. There are possible reasons for the low skill score for some of the river basins worldwide (4% of the 1287 stations had negative KGE skill score). Firstly, it should be noted that the parameters for the land surface model (H-TESSEL), responsible for the partitioning of precipitation into runoff and evapotranspiration, were not calibrated in this work. This means that only the parameters related to the flow timing, variability and groundwater loss were altered during the calibration process, without changing other hydrological model parameters (such as those related to evaporation, infiltration, etc.). This limits the effectiveness of the calibration algorithm for basins where, for example, forcing bias is the main source of streamflow uncertainty. Secondly, uncertainties in the observed streamflow due to errors in station sites, data collection, human intervention, and other sources should always be taken into account (e.g., Wilby et al., 2017). Careful manual inspection and corrections (e.g., locating on the river channel, matching upstream area between provided and GloFAS estimated) have been performed for the stations used in this work; however the quality control of multi-source, diverse instrumentation global-scale hydrological dataset remains a challenging task. Thirdly, uncertainty in the forcing data may result in fitting model parameters to an erroneous input data. This could introduce additional errors to model parameters and could reduce the accuracy of streamflow simulation during the validation period. Finally, the calibration skill loss may partly be attributed to a premature stoppage of the evolutionary algorithm. Some basins may need longer than the maximum 372 simulations (15 generations) set in this work to converge to a global maximum. The following further steps can be taken to improve the skill score. Firstly, the bias in the surface and subsurface runoff should be reduced. This can be done through improving the skill in the meteorological (mainly precipitation) forecasts and by calibrating the land surface model (H-TESSEL, e.g., Orth et al., 2016). Calibration of the land surface model parameters, not just the LISFLOOD model parameters, may offset the large bias in precipitation forcing. Secondly, regionalization of model parameters can be applied to transfer model parameters from best performing basins to basins with similar characteristics, but with poor scores or with limited streamflow data (e.g., Beck et al., 2016, Samaniego et al., 2010). Finally, multiple objective functions (to account for the high flows) can be used and the EA calibration algorithm can be run for longer generations to prevent a potential premature stoppage. Longer runs with different stopping criteria (see Maier et al., 2014) than those implemented in this work may result in better solutions. This is, however, dictated by the time demands of such a computationally demanding global scale calibration. The distribution of streamflow observations available for use in this calibration study is unevenly distributed around the world, leaving GloFAS uncalibrated for several flood-prone rivers in developing regions. For example, major river basins in Asia with history of frequent flooding (e.g., Indus, Ganges, Brahmaputra, and Mekong) are not included in this study. To fill the gaps in streamflow observations, future calibration work could utilize satellite-based river level or width estimations (e.g., Revilla-Romero et al., 2015a, Andreadis et al., 2007, Brakenridge et al., 2007).

Conclusions

This work is a first attempt to calibrate the model underpinning GloFAS operational flood forecasts using daily streamflow data from 1287 stations around the world. An evolutionary algorithm with KGE as an objective function was applied. ECMWF reforecast was used as model forcing and a combination of improvement based criteria and exhaustion-based criteria were employed for stopping the calibration algorithm. Results show that calibrating the routing and groundwater model improved streamflow simulations (KGE skill score) compared with the baseline (with default parameters) for the large majority of stations. However, the skill gained varies depending on the bias in the baseline simulation: the largest improvements occurred in areas with positive bias in the baseline, while the lowest skill score was obtained in basins with a negative bias. The disparate sensitivity of skill scores to model parameters suggests that calibrating the routing and groundwater model parameters is an important step but it is not sufficient for reducing the streamflow forecast uncertainty. Future efforts in improving the accuracy of GloFAS flood forecasts could mainly focus on improving the global runoff estimation through reducing the errors in the meteorological forecasts (such as bias reduction) and improving the land surface model performance. In addition, explicitly including an objective function that is exclusively based on peak flows could potentially improve the flood forecasting.

2 in total

1. Borg: an auto-adaptive many-objective evolutionary computing framework.

Authors: David Hadka; Patrick Reed
Journal: Evol Comput Date: 2012-04-09 Impact factor: 3.277

2. The impact of lake and reservoir parameterization on global streamflow simulation.

Authors: Zuzanna Zajac; Beatriz Revilla-Romero; Peter Salamon; Peter Burek; Feyera A Hirpa; Hylke Beck
Journal: J Hydrol (Amst) Date: 2017-05 Impact factor: 5.722

2 in total