Literature DB >> 34219821

A Comparison of Six Transport Models of the MADE-1 Experiment Implemented With Different Types of Hydraulic Data.

Alraune Zech^1,2, Sabine Attinger^2,3, Alberto Bellin⁴, Vladimir Cvetkovic⁵, Gedeon Dagan⁶, Marco Dentz⁷, Peter Dietrich^2,8, Aldo Fiori⁹, Georg Teutsch².

Abstract

Six conceptually different transport models were applied to the macrodispersion experiment (MADE)-1 field tracer experiment as a first major attempt for model comparison. The objective was to show that complex mass distributions in heterogeneous aquifers can be predicted without calibration of transport parameters, solely making use of structural and flow data. The models differ in their conceptualization of the heterogeneous aquifer structure, computational complexity, and use of conductivity data obtained from various observation methods (direct push injection logging, DPIL, grain size analysis, pumping tests and flowmeter). They share the same underlying physical transport process of advection by the velocity field solely. Predictive capability is assessed by comparing results to observed longitudinal mass distributions of the MADE-1 experiment. The decreasing mass recovery of the observed plume is attributed to sampling and no physical process like mass transfer is invoked by the models. Measures like peak location and strength are used in comparing the modeled and measured plume mass distribution. Comparison of models reveals that the predictions of the solute plume agree reasonably well with observations, if the models are underlain by a few parameters of close values: mean velocity, a parameter reflecting log-conductivity variability, and a horizontal length scale related to conductivity spatial correlation. The robustness of the results implies that conservative transport models with appropriate conductivity upscaling strategies of various observation data provide reasonable predictions of plumes longitudinal mass distribution, as long as key features are taken into account.

Entities: Chemical Disease Species

Keywords: AMDE tracer test; contaminant transport; geostatistics; heterogeneous aquifers; model comparison

Year: 2021 PMID： 34219821 PMCID： PMC8243996 DOI： 10.1029/2020WR028672

Source DB: PubMed Journal: Water Resour Res ISSN： 0043-1397 Impact factor: 5.240

Introduction

Modeling contaminant transport by groundwater is a topic of great interest that stimulated intensive research in the last four decades due to its relevance to aquifer pollution (Dagan, 1989; Fetter et al., 2018; Gelhar, 1993; Rubin, 2003). The task of predicting transport faces several difficulties: the processes are of long duration, measurements are scarce, the subsurface medium is of complex heterogeneous structure subjected to uncertainty and many times, the geometry and the mass content of the contaminant source is also uncertain. Under these circumstances, models play an important role: they help understanding the involved processes, analyzing field data, and making long term predictions. Models developed in the past differ in conceptualization of the aquifer structure, in the required data, in quantification of transport, in the formulation of the governing equations and mechanisms they represent, in computational complexity, and in their goals. We focus on transport of plumes of conservative solutes in steady flow, driven by the natural head gradient. At the considered aquifer scale, the main mechanism responsible for plume spreading is macrodispersion (Zech et al., 2015, 2019) due to the spatial variability of the hydraulic conductivity . The effect increases with higher K heterogeneity as quantified for instance by the log‐conductivity variance . We focus on the quantification of transport by the spatial relative mass distribution in mean flow direction x. A few elaborate field experiments were conducted in the past, in order to gain understanding and validate models. A challenging one is at the macrodispersion experiment (MADE) site (Boggs et al., 1992) located near Columbus, Mississippi. It is situated in a highly heterogeneous aquifer, making it of relevance to many actual aquifers (Gómez‐Hernández et al., 2017). An important feature of MADE was the application of different observation methods, in order to characterize the spatial distribution of hydraulic conductivity (e.g., Boggs et al., 1990; Rehfeldt et al., 1992; Bohling et al, 2016). Long‐term tracer tests provide observed relative mass distribution in space at a few fixed times t (Boggs et al., 1990, 1995). It has motivated a flurry of works on structure characterization by field data and different modeling strategies (see e.g., Zheng et al., 2011). We examine the ability of six transport models of different aquifer conceptualization to predict the observed relative mass distribution without calibration of transport parameters: (a) the first order approximation (FOA) (Fiori et al., 2017); (b) the multi‐indicator model and self‐consistent approximation (MIMSCA) (Fiori et al., 2013); (c) the time‐domain random walk (TDRW) of Dentz et al. (2020); (d) the Binary Inclusions model (Zech et al., 2021); (e) a Binary Facies model, and (f) the flow reactors model (see Section 3 for details). We further examine the predictive power of the models by extending the time (t = 1,000 days) beyond MADE observations (t max = 503 days). The selected models adopt upscaling: they do not aim to reproduce the actual spatial conductivity distribution, but to represent the aquifer structure at a simplified level of complexity, which still accounts for the effect of heterogeneity on transport. Within that frame, models cover a wide spectrum of aquifer configurations. They differ in conceptualizations of formation structure and of transport, in the use of field data, in computational methodology and in effort. Key features they share are the advection dominated subsurface transport, flow, and transport initial and boundary conditions, and a model setup independent of transport observations. The limitation to transport models independent of measured concentrations is motivated by the aim of application to field situations, where transport tests at the relevant space and time scales are hardly available due to their duration and costs. Models use the independently inferred hydraulic data to predict advection‐dominated transport and do not incorporate mass transfer as an additional mechanism. Although mass transfer models are useful to predict non‐Gaussian plumes, particularly in fractured media, their application to field scale transport in porous media, such as MADE (Harvey & Gorelick, 2000), requires determining mass transfer rates from calibration by using concentration observations. We concentrate here only on predictive transport models relying solely on field data of aquifer properties and flow. We believe that the comparison of the models which differ in type and underlying field data, conceptualization of conductivity structure, and complexity of computations by using MADE as a platform is important in helping the research community to grasp transport issues and the users in selecting characterization strategies, goals of models, and method implementation. The plan of the study is as follows: Section 2 recapitulates the MADE aquifer characterization and transport experiment; Section 3 describes the methodology of the models, including their application to MADE; Section 4 is devoted to the model prediction for solute mass distribution in comparison to MADE observations, as well as at later time. Section 5 contains the general discussion on data comparison, while Section 6 concludes the study.

MADE Transport Field Experiment

The MADE experiment was the object of a large body of publications, dealing with aquifer characterization, structure, and conductivity analysis as well as transport interpretation; see for instance reviews of Zheng et al. (2011) and Gómez‐Hernández et al. (2017). We recall in the following only those aspects of direct relevance to the examined models.

Hydraulic Conductivity Distribution

The MADE site aquifer is composed of highly heterogeneous alluvial terrace deposits. Measurements of hydraulic conductivity at multiple locations (Figure 1a) were performed by flowmeter, slug test, and direct push injection logging (DPIL) (Boggs et al., 1990; Bohling et al., 2016; Rehfeldt et al., 1992). Granulometry analysis of soil samples provided additional estimates. Besides, two pumping tests provide equivalent conductivities K eq of the volume surrounding the wells (Boggs et al., 1992). The use of different techniques at the same site and subsequent application by different models offers an opportunity to examine their impact on transport prediction.

Figure 1

(a) Map of macrodispersion experiment (MADE) site according to Boggs et al. (1990) and Bohling et al. (2016): Locations of hydraulic conductivity measurements devices (colored dots); tracer test source area (black outline); and sampling network boundary (dashed outline) where bromide samples were collected. (b) Potentiometric surface map of head measurements according to Boggs et al. (1990). Black dot marks tracer test injection location. The spatially distributed observations (Figure 1a), carried out at different depths resulted in a large volume of data which served for geostatistical analysis. Figure 2 summarizes their outcomes, which is of interest for the application of the different models (Section 3). The most reliable and extensively used data are those based on flowmeter (Boggs et al., 1990; Rehfeldt et al., 1989) with a total number of 2,611 observations and more recently DPIL with 31,123 samples (Bohling et al., 2012, 2016).

Figure 2

Geostatistical measures of hydraulic conductivity K for macrodispersion experiment (MADE) from DPIL (direct push injection logging) (Bohling et al., 2016), flowmeter, grain size analysis, slug tests (Rehfeldt et al., 1992): log‐conductivity variance , horizontal and vertical integral scale (correlation length) I and I , respectively. Visualization of geometric mean K G, range of observed values from minimal to maximal ([K min, K max]), and range of one standard deviation around mean ([K Ge−, K Ge]). K eq denotes the equivalent conductivity for two large scale pumping tests (Boggs et al., 1990). The differences in the geostatistical parameters, revealed by Figure 2, especially in the geometric mean conductivity K G, are a result of the different properties of the observation methods as well as the density and locations of observation points. In particular, the difference between flowmeter and DPIL can be explained by the inability of flowmeter to detect low K values. While the maximal values K max are close (around 0.005 m/s) the minimal values K min differ by two orders of magnitude. The effect manifested in Figure 2 is that flowmeter K G is larger and is smaller. The difference in the longitudinal integral scales I suggests that the zones of low K values are less connected than those of higher magnitude. The analysis based on 214 soil samples is less reliable. Still, it provided input data for conductivity conceptualization based on the lithofacies approach (e.g., Carle & Fogg, 1996) as presented by Bianchi and Zheng (2016) and herein for a simplified binary structure approach. The impact of these difference upon flow and transport are discussed in Section 4.

Flow

In Figure 1b, we reproduce the head contour lines map of Boggs et al. (1992, their Figure 3). The head gradient is not constant, but slowly varying in space. The mean head gradient is between J ∈ [0.003, 0.0036], depending on the choice of boundary locations (Boggs et al., 1990).

Transport Experiment

We focus on the first tracer transport experiment, which was conducted in the years 1986–1988 (Adams & Gelhar, 1992; Boggs et al, 1990, 1992; Rehfeldt et al., 1992). The tracer plume displayed a non‐Gaussian longitudinal solute mass distribution with the bulk of the mass staying near the source and small amounts spreading over large distances.

Initial Conditions

A solution obtained by dissolving M 0 = 25 kg of bromide in 10 m3 of water was injected over a period of 48 h through five wells of 60 cm screen length. While the subsequent transport took place under natural head gradient conditions, the tracer solution was forced radially into the aquifer during injection. As a result, the initial tracer body was much larger than the expected extent of the body adjacent to the wells screens, which can be seen in the early tracer plume snapshot at 9 days after injection (Adams & Gelhar, 1992, Figure 5). The apparent upstream tracer spread is not a result of dispersive transport opposite to the main flow direction, but of tracer input history. This is important for models which assume ergodicity and for the setup of initial conditions. The injection mode also implies that the initial condition was flux proportional, with a preference of mass flowing in high‐conductivity channels.

Plume Detection and Data Aggregation (Upscaling)

We reproduce the observed longitudinal relative mass distribution of Adams and Gelhar (1992, Figure 7) at six times: 49, 126, 202, 279, 370, and 503 days after beginning of injection. The computation of is based on concentration C(x, y, z, t) sampled in a dense MLS network, which thins out with distance to the source (Figure 1a). Subsequently, C was numerically integrated over transverse planes (y, z), accumulated and averaged over slices 10‐m long in the x direction. The obtained upscaled longitudinal relative mass , where m(x, t) is the relative longitudinal mass (Section 3), is assigned to the centers of the slices at x = −5 m, 5 m, 15 m, … (Adams & Gelhar, 1992). The fact that the reported mass is an upscaled/aggregated quantity was overlooked by many previous studies which compared the observed with m, the modeled mass at fine scale, as mentioned by Fiori et al. (2019). The significance of data aggregation is discussed in Section 4.

Mass Recovery

The observed relative mass displays decreasing recovery rates M total/M 0 of 2.06, 0.99, 0.68, 0.62, 0.54, and 0.43, for the T = 49, 126, 202, 279, 370, and 503 days, respectively. Thus, the observed relative mass does not obey the mass conservation requirement , except at 126 days after injection (Figure 3). We do not normalize observed mass by the total reported mass , but use for comparison with theoretical models, as it is appropriate for a conservative solute.

Figure 3

Longitudinal mass distribution for models at T = 126 days in fine model resolution (left column) and upscaled (aggregated) form (Δx = 10 m, right column) against macrodispersion experiment (MADE)‐1 experiment data at linear scale (first row), log‐scale (second row) and in cumulative mass (third row). Vertical lines in third column indicate locations of 5% (dotted), 50% (dashed), and 95% (dashed dotted) recovered mass. Note that for the Binary Facies model the fine scale refers to a grid resolution of 2 m. The discrepancy in mass recovery rate is indicative of uncertainty of the data. As discussed in Adams and Gelhar (1992), the excessive mass recovery at t = 49 days could be due to spurious hydraulic connections among the multilevel samplers. This is a result of the method of installation and is enhanced by the pressure buildup during injection. Preferential sampling from high conductivity regions is also possible. In such case, the assumption of uniform tracer distribution employed in the spatial integration easily leads to an overestimated mass recovery. In contrast, in line with Fiori (2014), we attribute the decreasing mass recovery at later times to insufficient sampling in the downstream zone. The high heterogeneity implies possible channeling effects at MADE and mass trapping in low‐conductivity parts which are hardly sampled. Similarly, the inadequate coverage due to the much lower sampling density downstream leads to observation errors, particularly at the edges of a plume. Thus, the much lower sampling density makes it difficult to sample the leading edge of the plume, which may contain a significant fraction of the plume mass at MADE (Fiori et al., 2019).

Subsurface Transport Models

We compare six subsurface transport models, which were developed independently by the authors in the past or devised recently in the frame of the present study. Before providing a short description of each model, we outline the key features shared by all models.

Common Features and Prerequisites

The following features determined the selection of the models considered herein: We consider only predictive models. They rely solely on structural and flow data, which can be measured independently of transport. Models with parameters calibrated by transport experiments are not considered. Flow is steady and driven by the natural head gradient J in the x direction. Transport of solutes is the result of advection by the Eulerian velocity field with neglect of local dispersion, which has a negligible effect on . Spreading is caused by the spatial variability of conductivity K(x), while the effective porosity θ is constant. Along Equation 1, we do not consider dual‐domain or transport models which use mass transfer parameters. Mass transfer effects are modeled indirectly here by transport in low conductivity zones. A plume of mass M 0 of a conservative solute is injected in the aquifer at t = 0, through the five injection wells. Spreading is quantified by the mass arrival at a control plane at x: , where C is the concentration. For a fixed x, M = M total/M 0 is the BTC, whereas for a fixed t it is the relative mass accumulated beyond x. We determine the fine scale relative mass spatial distribution m(x, t) = −∂M/∂x at a few times t. However, in line with MADE observations, we calculate the (upscaled) relative mass averaged over a medium slice of length Δ = 10 m centered at x, which is given by = (1/Δ)[M(x − Δ/2, t) − M(x + Δ/2, t)] such that . The upscaled relative mass is derived for the MADE conditions at T = 49, 126, 202, 279, 370, and 503 days after injection toward comparison of models outcome with measured . Additionally, models are applied to prediction of at t = 1,000 days for models intercomparison. The analysis is focused on the prediction of the mass ensemble mean for comparison with measured mass. Although a complete analysis would also comprise measures of uncertainty, this task is not undertaken here because of the limited extent of the paper and its aims. Remark on additional MADE transport models: Several other transport models which are not discussed here have been presented in the literature for MADE. We do not consider models calibrated on transport observations since they are not predictive, such as the work of Barlebo et al. (2004). Similarly, we exclude dual‐domain models (e.g., Feehley et al., 2000 and Harvey & Gorelick, 2000) and the continuous time random walk (CTRW) model of Berkowitz and Scher (2001). The lithofacies approach of Bianchi and Zheng (2016) or the analog data‐based approach of Pirot et al. (2015) would have been candidates, but results are only available for the MADE‐2 tracer experiment setting rather than MADE‐1 considered here. The same holds for the work of Salamon et al. (2007). Their conceptual framework is, however, underlying the facies model herein. Similarly, the fractional advection‐dispersion model of Benson et al. (2001) was applied only to MADE‐2 data. Furthermore, not all parameters, such as the skewness parameter β, are fully predictive based on structural data only. Dogan et al. (2014) presented a predictive fully numerical transport model for the MADE‐1 experiment based on flowmeter and DPIL measurements. They generated detailed representations of the K field conditioned on observations in a small sector of the MADE site aquifer and subsequently solved the flow and transport equations. However, due to the computational effort, transport simulation results are limited to a fraction of the total plume transport distance. Thus, their results are unsuitable for model comparison, particularly with prediction beyond observation times.

Brief Description of Models

Here, we summarize the main features of the six models, their application to MADE and the use of observational K data.

First Order Approximation (FOA)

Background: The solution of flow and transport in heterogeneous formations of a random log‐normal conductivity distribution, that is of normal Y = ln K, by a first order approximation in the log‐conductivity variance was the topic of intensive research in the last four decades, for example, Dagan (1989); Gelhar (1993); Rubin (2003), which led to various analytical solutions. K‐Structure: The random Y field is regarded as stationary and multi‐Gaussian. It is characterized completely by the mean K , the variance , and the axisymmetric two point covariance C of horizontal integral scale I and vertical one I . The anisotropy ratio e = I /I is generally smaller than unity. Flow: The mean velocity is given by U = K eff J/θ, with the effective conductivity determined from the solution of the flow equations for the random velocity field. The latter is obtained by expanding the mass conservation equation and Darcy's law in power series of Y′ = Y − 〈Y〉. Transport: The mean relative mass distribution 〈m(x, t)〉 was derived for an initial condition of given deterministic resident concentration C 0(x, 0) either in a volume V 0 with C 0 = M 0/(θV 0), or with mass concentrated over an area A 0 at x = 0 and quantified by m 0 = M 0/A 0 (Kreft & Zuber, 1978). Similarly, detection was in the resident mode with the mean cumulative mass 〈M〉 and the distributed one 〈m〉, satisfying the advection‐dispersion equation (ADE); The solution for the resident mode is given by and (Fiori et al., 2017; Kreft & Zuber, 1978). The Gaussian solution 〈m 〉 applies to weakly heterogeneous aquifers and/or far from the injection zone. Fiori et al. (2017) presented the solution of the ADE Equation 1 for the more realistic conditions of flux proportional injection. The solution for the BTC is given by the inverse Gaussian distribution: The spatial mass distribution for a tracer pulse follows from Equation 2 with 〈m(x, t)〉 = −∂〈M〉/∂x. In contrast to the Gaussian solution 〈m 〉, it displays a skewed shape and lack of upstream dispersion, thus providing a more realistic representation of transport in heterogeneous aquifers. The macrodispersion coefficient D (t) = α ⋅ U in the ADE Equation 1, with longitudinal macrodispersivity α , was determined in the Lagrangean framework with the aid of the solute particles trajectories. At first order, α (t) grows from zero at t = 0 to an asymptotic constant value (Dagan, 1989). Preasymptotic α (t) was determined by a quadrature for an exponential covariance. It is approximated accurately by an analytical expression as function of mean flow velocity U, time t and aquifer statistics I and e (Dagan & Cvetkovic, 1993, Equation 20). The asymptotic value is attained after a travel distance of a few integral scales I. Jankovic et al. (2017) showed that the solution Equation 2 with a dispersion coefficient D (x) given by the first order approximation represents accurately the results of numerical simulations even for the large value of , except an underestimation of a few percent of the mass in the tail of late arrival. Application to MADE: Application of solution Equation 2 to MADE was based on the parameters derived from Bohling et al. (2016) by DPIL as follows: K G = 0.58 m/d, θ = 0.31, I = 9.1 m, I = 1.8 m, . Rather than the first order approximation, we used the more general formula by Zarlenga et al. (2018, Equation 5) to arrive at K eff = 2.28 m/d. After identifying the representative mean head gradient in the plume zone (Figure 1b) as J = 0.0036, the mean velocity is given by U = K eff J/θ = 0.026 m/d. Transient preasymptotic dispersion D L(x) = Uα L(x) is calculated according to Fiori et al. (2019, Equations C1 and C2) based on α L of (Dagan & Cvetkovic, 1993, Equation 20), with the asymptotic value m. Solving Equation 2 with D L(x) provides 〈M〉 and the related at the t and x values pertinent to MADE. Calculation of mass just takes seconds with an implementation of the analytical solution in a computer algebra system.

Multi‐Indicator Model and Self Consistent Approximation (MIMSCA)

Background: MIMSCA was developed in the last 15 years as an approximate model of flow and transport for aquifers of arbitrary degree of heterogeneity (Cvetkovic et al., 2014; Dagan & Fiori, 2003; Fiori et al., 2006). Its outcome has been compared with accurate numerical solutions for and applied to MADE. Fiori et al. (2019) recently used the model to assess the uncertainty of prediction due to nonergodic conditions or parametric uncertainty. K‐Structure: The aquifer is modeled as an ensemble of rectangular inclusions tessellating the space, similarly to layers of bricks. The elements are of dimension 2I × 2I × 2I . The block K values are assigned independently with random values from a univariate pdf f(K) and the structure is designated as the MIM. The univariate f(K) was chosen to be lognormal of parameters K , . This way, the random K field is completely defined in terms of four parameters, similar to the FOA (Section 3.2.1). Flow: The mean velocity is given by U = K eff J/θ. The mean effective conductivity K eff is derived by the self‐consistent approximation (SCA), a well‐established method in the literature on heterogeneous aquifers (Dagan, 1989). Here, it consists in solving the flow equations for a generic inclusion of conductivity K, submerged in a homogeneous matrix of the unknown K eff and determining the latter by the SCA argument. K eff follows as solution of a simple integral equation. Suribhatla et al. (2011) compared it with accurate numerical simulations with satisfactory agreement. Transport: Solute transport in the multi‐indicator conductivity structure was also solved with the SCA. The starting point is the travel time of a solute particle for the passage through an inclusions of conductivity K embedded in a matrix of K eff, which Fiori et al. (2003) determined analytically. The solute BTC 〈M〉 at x follows as sum over the travel‐time random residuals pertaining to the inclusions of different K lying between the injection and the control plane. Initial condition was of flux proportional injection at x = 0 while the mean BTC was derived by fast Fourier transform. Jankovic et al. (2003) arrived at a satisfactory agreement between the semi‐analytical results and accurate 3D numerical simulations. Fiori et al. (2017) showed that the bulk of the BTC is well approximated by the FOA, while MIMSCA captures also the long tail of a few percents of the solute mass observed in numerical simulations. Application to MADE: Fiori et al. (2013) applied the method to the MADE site transport setting to derive 〈m〉 and with an update in Fiori et al. (2019), motivated by a reevaluation of geostatistical input parameters by Bohling et al. (2016). The needed parameter values are the same as those given above for FOA. Calculating the mass distribution with the semianalytical solution of Fiori et al. (2013) required a few minutes in a computer algebra system.

Time‐Domain Random Walk (TDRW)

Background: TDRW and CTRWapproaches have been used extensively over the past two decades for the modeling of transport in heterogeneous porous media (Noetinger et al., 2016). Within this framework and based on the stochastic TDRW method of Comolli et al. (2019), Dentz et al. (2020) derived a predictive upscaled model that avoids calibration against transport observations. The basic idea is to quantify particle motion in spatially variable flow fields through a Markov process for equidistant particle velocities, whose steady state distribution is given by the flux‐weighted distribution of flow velocities. While details can be found in Dentz et al. (2020), we describe here the main features of the model. This modeling approach has been used and verified for the prediction of the evolution of particle velocity statistics, particle distributions, dispersion and breakthrough curves in pore, and Darcy‐scale heterogeneous porous media (Comolli et al., 2019; Hakoun et al., 2019). K‐Structure: Hydraulic conductivity is represented by a 3D spatial random field of log‐normal distribution. Thus, the random K‐field is characterized in terms of four parameters, similar to the FOA (Section 3.2.1): conductivity geometric mean K G, ln K variance, and correlation lengths ℓ and ℓ . Random K‐realizations were filtered such that the spatial mean and variance of the log‐hydraulic conductivity are within a 5% tolerance interval around the target values. Flow: The steady state groundwater flow equation is solved numerically on an ensemble of random K‐realizations of specified conductivity statistics. A unit head drop between inlet and outlet is considered and no‐flux boundaries are specified at the horizontal domain boundaries. The target variable is the magnitude of the Eulerian velocity v (x) (absolute value of the Eulerian velocity), which is characterized by a univariate distribution. Thus, thus v (x) = q(x)K J/θ, where q is the numerically derived flux. Transport: Transport is modeled by a continuous time random walk. We provide the key aspects of the particular conceptualization. For a detailed model description, the reader is referred to Dentz et al. (2020). Particles move at constant space increment at transition times that are obtained from the particle velocity. The plume mass distribution at a given time is equivalent to the particle distribution. The particle velocity is modeled as a stationary Markov process, whose steady state distribution p (v) is given by the flux‐weighted Eulerian flow velocity. The flux‐weighting is due to the fact that in this framework particle velocities sample the flow velocity equidistantly along path lines. This is in contrast to isochrone sampling in classical Lagrangian frameworks, for which the steady state distribution of particle velocities is equal to the Eulerian velocity distribution (Comolli et al., 2019; Dentz et al., 2016). The Markov process describing particle velocities is modeled through an Ornstein‐Uhlenbeck process for the normal scores. The normal scores are obtained by mapping the velocities first to a uniform and then to a unit Gaussian random variable. The model requires the Eulerian velocity distribution and advective tortuosity as inputs. The latter is given by the ratio of the mean Eulerian velocity and the mean Eulerian velocity component along the mean hydraulic gradient. Application to MADE: The model is parameterized based on the description of experimental conditions and aquifer properties as head gradient of J = 0.0036 and porosity of 0.31, according to Boggs et al. (1992) and Adams and Gelhar (1992). The retardation coefficient is set equal to one. The distribution of Eulerian velocity magnitude as the propagator of the upscaled transport model is derived using the geostatistical parameters of log‐normal hydraulic conductivity reported by Bohling et al. (2016) (Figure 2). The average velocity component in mean flow direction is given by m/s = 0.0167 m/d. The average Eulerian velocity magnitude is m/s = 0.0193 m/d. A point source particle distribution is assumed. Following Boggs et al. (1992) and Fiori et al. (2013), the initial mass distribution is approximately flux‐weighted. Thus, in this modeling framework, the initial distribution of particle velocities is set equal to the flux‐weighted Eulerian velocity distribution (Dentz et al., 2020). Computation of particles transport in a C‐implementation requires minutes on a usual computer.

Binary Facies

Background: An alternative approach to the continuous univariate Y distribution, used in the previous models is a hydrofacies model, which generates hydrofacies distributions compatible with the observational data. The method uses hydrofacies classification, instead of hydraulic conductivity as observational data. Among the geostatistical methodologies adopted for this purpose, the one based on the combined use of transition probability and Markov chain has been mostly employed since its introduction by Carle and Fogg (1996). It enjoys flexibility in handling juxtapositional tendencies among hydrofacies and the availability of the software T‐PROGS (Carle, 1999). In the context of the studies of MADE, Bianchi and Zheng (2016) presented an application to the MADE‐2 experiment by using five hydrofacies. Here, we apply the methodology to the MADE‐1 experiment in a more parsimonious way. We maintain the highly conductive hydrofacies and combine the remaining four into a single low conductivity hydrofacies. K‐Structure: The porous media is modeled by using two or more facies, each one of constant hydraulic conductivity. The spatial distribution of the facies is generated randomly based on transition probabilities, for example, by using T‐PROGS. Hydrofacies identification is usually based on granulometry analysis. Here, we focus on two hydrofacies of probabilities of occurrence p 1 and p 2, respectively. Hydraulic conductivities K 1,2 are the weighted arithmetic means of the hydraulic conductivity of all samples belonging to the same granulometry hydrofacies class. The hydraulic conductivity of the samples are calculated from the characteristic diameters d 10 and d 25 of the sediments by using a modified version of the Kozeny‐Carman expression proposed by Riva et al. (2010). Transition probabilities between hydrofacies are obtained by fitting a Markov model to the experimental transition probabilities (Carle & Fogg, 1996). They are expressed with the aid of the characteristic thickness and length for each facies, denoted as L , L , L , and L respectively. Commonly, samples density is relatively low in the horizontal directions, leading to uncertain experimental transition probabilities in these directions, even after assuming isotropy in the horizontal plane. Flow and Transport: Flow and transport are solved repetitively in a Monte Carlo framework by making use of numerical solvers, namely Modflow 2005 in combination with particle tracking (Pollock, 2012). Mean velocity U and mean relative mass distribution are obtained by ensemble averaging. Application to MADE: The Binary Hydrofacies model was based on the granulometry of 214 samples taken from the 38 boreholes at MADE (Boggs et al., 1990), which is a data set completely independent from those of DPIL used by the FOA and MIMSCA models. The parameters values pertinent to MADE were identified as: p 1 = 0.145, p 2 = 0.855, K 1 = 190 m/d, K 2 = 1.49 m/d, L = 0.702 m, L = 4.15 m. The low borehole density prevented reliable estimate of the transition probability in the horizontal directions. Thus, isotropy is assumed in horizontal direction and the characteristic lengths were based on the integral scales identified by Rehfeldt et al. (1992) and assuming the relationship L 1,/L 1, = L 2,/L 1, = e, with e being the anisotropy ratio. Thus, horizontal length scales are obtained by dividing the vertical ones, by the estimate of e = 0.0437 (Rehfeldt et al., 1992), arriving at L = 16.0 m, and L = 94.8 m, respectively. In the vertical direction, the Markov model of transition probability fitted to the experimental one was used. As already mentioned above, the horizontal length scale values shall be regarded as estimates, affected by uncertainty. A check of the results obtained with different, smaller, values (not shown) led to similar plume mass distributions except for the long distance tail of minute mass. An ensemble of over 200 independent Monte Carlo realizations of the hydrofacies distribution were thus generated. Flow was solved numerically with Modflow 2005 in a computational domain of 300 × 100 × 10 m3 with grid spacing of 1 m in horizontal and 0.5 m in vertical directions for each hydrofacies realization. Constant head boundary conditions were applied with a mean head gradient of J = 0.003. The adopted porosity value was θ = 0.31. Advective transport was simulated by tracking 1,000 particles, distributed according to the mass distribution measured at day 9 since the beginning of the tracer test and advected by the random velocity (in the absence of local dispersion) by means of the ModPath 6 package. The resulting ensemble mean velocity is U = 0.079 m/day with a standard deviation of 0.0046 m/d. Computations in a Modflow‐flopy environment allow to perform ensemble simulations within two hours on a common PC.

Binary Inclusions

Background: Further simplification of the previous Binary Facies model is achieved by representing the high conductivity zones as rectangular inclusions submerged in the low conductivity matrix. Furthermore, for the high length to thickness ratio of the inclusions it was found that flow and transport can be modeled approximately as two‐dimensional. We describe here briefly the application of the model to MADE presented in Zech et al. (2021), in which the stochastic conceptualization of the binary hydraulic conductivity took large scale deterministic information into account. K‐Structure: The aquifer is modeled as a binary random structure. The two possible conductivity values K 1 and K 2 represent areas of high and low conductivity, respectively. The spatial structure is modeled as nonoverlapping conductivity blocks of length I and width I , placed randomly within the computational domain. Site specific topological features, are integrated as deterministic structures, such as layers or blocks of different average conductivity. The MADE‐specific K‐structure model is outlined below. Flow and Transport: Flow and transport is solved for every random K realization, numerically making use of Darcy's Law and ADE solvers. Mean velocity U and mean relative mass distribution are obtained by ensemble averaging. Application to MADE: Zech et al. (2021) adopted the K‐structure for MADE and its characterizing parameter values based on the inspection of the piezometric surface map of Boggs et al. (1992, Figure 3), on two large scale pumping tests and on a few flow meter logs (Boggs et al., 1992). According to observations, the area near the source is dominated by low conductivity K 2 with inclusions of high K 1. The relative area of the inclusions is p = 15% determined from flow meter log analysis. Beyond a distance of 20 m downstream of the injection location the distribution is inverted: the bulk is dominated by the high conductivity K 1 with p = 15% inclusions of low K 2. The inclusions are of thickness ℓ = 0.5 m and length ℓ ∈ [5, 10, 20] m. The former was determined from flowmeter observations, while the latter are subjected to parametric uncertainty. The range was determined from the ratio of thickness and observed anisotropy value. Three inclusions of 0.5 m thickness are placed at random in the vertical direction, while the length and horizontal position of inclusions are fixed for every realization. Given the inclusion occurrence of p = 15%, an aquifer thickness of 10 m and a thickness of 0.5 m, a total number of three inclusions per block are randomly placed in vertical position with equal probability. Altogether, an ensemble of 600 conductivity structures was created with random inclusion structures, with groups of 200 realizations of same length inclusion length ℓ of 5 m, 10 m, or 20 m. Flow and transport was simulated for each random K‐structure by solving the groundwater flow equation, and subsequently the ADE with the Finite Elements Method software OpenGeoSys in a 2D cross section of 220 × 10 m. The boundary conditions were of constant head on the vertical boundaries defined by the selected head gradient J = 0.003. The initial condition for transport was imposed by injecting the solution at a constant flow rate of Q = 1.15 ⋅ 10−5 m3/s during a period of T = 48.5 h (Boggs et al., 1992). Porosity θ = 0.32 and local dispersivities α = 0.01 m and α = 0.01 m were assumed. The resulting ensemble mean velocity was U = 0.0254 m/d with a standard deviation of 0.02 m/d. Computation time on a common PC was 200 h of CPU time for the entire ensemble.

Flow Reactors

Background: The Flow Reactor model, or short Reactors, conceptualizes subsurface solute transport as series of reactors which are linked to aquifer structure. The concept allows to model transport in a setting where tracer is injected into a low permeability zone, capturing strongly constrained downstream movement and skewed plume shapes. The concept was applied for example, by Molin and Cvetkovic (2010). K‐Structure: Hydraulic conductivity is conceptualized as stationary log‐normal spatial random function Y with continuous two‐point correlation structure. The geostatistical parameters are those reported by Bohling et al. (2016) based on DPIL for the model application to MADE of same values as those adopted in the First Order and MIMSCA models (Sections 3.2.1 and 3.2.2). Flow: Mean uniform flow velocity is calculated analytically based on Darcy's Law: U = K eff ⋅ J/θ. The effective conductivity K eff is a function of the geostatistical parameters of log conductivity. Transport: Transport is modeled analytically as series of flow reactors. The number of reactors is a function of the ratio x/Δx of the transport domain size x and the velocity fluctuations length scale Δx. We view the transport domain as x/Δx flow reactors, each with a mean turnover time Δx/U. The residence time within each reactor is an exponential function exp(−t/Δτ) = exp(−tU/Δx). Thus, the residence time pdf of the reactor series is the Gamma function: The cumulative density function of residence time is obtained by integration as where γ is the lower incomplete Gamma‐function. The spatial tracer distribution, as the tracer position pdf p(x, t) (1/L) at time t follows from as: where G is the Meijer g‐function, and ψ 0 is the Polygamma function. With unit tracer mass, we have m(x, t) = p(x, t). Note that in the limit Δx → 0, the number of reactors tends to infinity x/Δx → ∞, and we recover plug flow as f(t, x) → δ(t − x/U). Like in the MIMSCA model, the length parameter Δx, the integral scale of the velocity fluctuations, is assumed to be in the range of two to three log‐conductivity horizontal correlation length I. Application to MADE: Mean flow velocity U = 0.026 m/d is derived analogously to the FOA model from DPIL‐based geostatistical K‐parameters (Bohling et al., 2016), resulting in K eff = 2.28 m/d (Zarlenga et al., 2018), head gradient of J = 0.0036 and porosity of θ = 0.31. The mass distribution follow from applying the pdf Equation 5 with Δx = 2.5I making use of the correlation length I = 9.1 m from DPIL observations (Bohling et al., 2016). Calculation of mass with an implementation of the pdf Equation 5 in a computer algebra system is a matter of seconds.

Results and Comparison

Data Preparation and Display

Visualization of modeled mass distributions allows qualitative assessment and comparison of models. Perception of model performance is subjective depending on the display mode with critical aspects being the display scale, data upscaling (aggregation), and normalization. We present longitudinal mass distributions at linear and logarithmic scales as well as in a cumulative form 〈M〉, to achieve a comprehensive display. The various display modes of the spatial mass distributions allow to interpret a few plume's specific features: (a) mass peak location and bulk behavior; (b) the tails (forefront and trailing zones), and (c) mass recovery. They are relevant to specific goals such as risk assessment and remediation. In line with the MADE experimental results, we remind that is aggregated over intervals of 10 m. We illustrate the effect by displaying the model outcomes at T = 126 days in both forms, the fine scale 〈m(x, t)〉, and the upscaled (Figure 3) as well as 〈M〉. All modelled longitudinal mass distributions are normalized with respect to the injected mass such that the area beneath m or is unity. For the MADE data, this is only true for t = T = 126 days, where mass recovery is around 99%. Spatial moments are commonly used to quantify the comparison between different models and measurements. However, this is not appropriate here for two reasons: (a) the skewed mass distribution leads to biased results when using a moment analysis and (b) the decreasing mass recovery in combination with the observation data uncertainty of the leading tail. The latter precludes the use of common fitness measures. Instead, we compare in Figure 4 recovery locations at T = 126 days for 5%, 50%, 95% as predicted by models relative to MADE values of x = 0.7, 8.6, 42.8 m. The recovery location x is defined as the distance for which α% of the mass is upstream of x.

Figure 4

Model‐experiment‐comparison of recovery locations x for 5%, 50%, and 95% mass recovery (columns) at 126 days after injection where mass recovery was 99% in the experiment. Macrodispersion experiment (MADE) values at the x‐axis (in m) and model values as absolute (in m) and relative (in %) difference in colored bars and in numbers.

Comparison Between Models Prediction and MADE Experiment

Figure 3 displays the longitudinal mass distribution for all models and the MADE experimental data at T = 126 days after injection: 〈m〉 at model's fine resolution and aggregated over 10 m intervals, including MADE. Direct comparison is the most revealing at that time since the experimental recovery rate is 99%. The various models display some differences in their mass distribution 〈m〉 at fine scale. Particularly, the peak value is higher for the flow reactors by a factor of 2 than the other models. The Binary Facies model displays plume tailing downstream of the other models, with x 95% ≅ 60 m, while for all the others models 30 ≲ x 95% ≲ 40 m (see Figure 3e). The comparison between 〈m〉 and the upscaled in Figure 3 reveals a few interesting features: upscaling reduces the peak values of 〈m〉 by a factor of around 2, the spreading zone is expanded, and the differences between model predictions are greatly reduced. In particular, the predicted agrees quite well with MADE, much better than 〈m〉, as far as visual inspection reveals. This is expectable though in the past models prediction at fine scale were compared with MADE (e.g., Dogan et al., 2014; Harvey & Gorelick, 2000). The upstream spread of the aggregated for MADE is partly an artifact of upscaling: it smears the upstream forced injected mass of the initial solute body over 10 m. It could be erroneously interpreted as upstream dispersion. The quantitative results in Figure 4 on recovery locations relative to MADE strengthen the conclusions from the visual inspection. All models are in good agreement with MADE for the location x 50%. We attribute that to the closeness of the mean velocity U of various models (see previous section). This conclusion is in agreement with the results of Eggleston and Rojstaczer (2000), who found significant differences in the prediction of m(x, t) at MADE by different interpolation schemes for K; such schemes led to quite different estimates of U, as manifested by the different arrival times of the peak. The Binary Facies differ by 15% due to the higher velocity U, as a result of the different conductivity statistics of the grain size analysis (Figure 2). The agreement is still good for x 5% with some deviations for the Binary Inclusion model. Last, the models prediction relative to MADE of x 95%, reflecting the “fast” moving solute, is more variable, but still within acceptable differences in practice. Figure 5 summarizes the model and MADE results of for all times for which MADE experimental data is available. The comparison at these times with MADE is more difficult than for T = 126 days because of the variable mass recovery, which is visible in Figure 5 in the cumulative mass panel 〈M〉. For none of the displayed times, experimental mass recovery is complete, explaining the apparent mismatch between experiment and models.

Figure 5

Longitudinal mass distribution of models and macrodispersion experiment (MADE)‐1 experiment at observation times 49, 202, 279, 370, and 503 days after injection (a–e) and for models at t = 1,000 days (f) at linear scale (first column) and log‐scale (second column). Third column displays cumulative mass with recovery locations of 5%, 50%, and 95% mass. Model results and MADE‐1 experimental data differ in the recovery rates: 1 for models at all times while 2.06, 0.68, 0.62, 0.54, and 0.43 for the MADE‐1 experimental data, respectively. In spite of the incomplete mass recovery for times T ≥ 202 days, models agree reasonably well with both measured peak value and its distance from the injection zone. While for the portion upstream the peak position recovery was likely achieved, disagreement is visible at distances beyond 20 m from the injection zone (Figures 5b3–5e3). We attribute the reduced mass recovery for later times to the less dense sampling in the downstream zone (Figure 1a) in combination with not‐sampled solute quickly moving in high conductivity channels (discussed in detail in Section 2.3). For late times, such as T = 503 days the low mass recovery (43%) makes the comparison more problematic. Still, the peak location and also its value are within acceptable differences in practice. Despite variable mass recovery through time, we find the agreement of model predictions and experimental results on these key plume parameters to be encouraging. As for intercomparison of models prediction, inspection of the robust cumulative mass 〈M〉 at different times shows remarkable closeness except for Binary Facies. The latter overestimates the location pertaining to values of 〈M〉 > 0.4. This enhanced tailing is attributed in part to the larger integral horizontal scale identified from granulometry analysis (Figure 2), as compared to that resulting from DPIL, which was used by most of the models. In addition, it may be related to channels of high conductivity present in some realizations of the binary K field. It is remarkable that for both and 〈M〉, the predictions by FOA, MIMSCA, and TDRW are very close for all x (see discussion in Section 5).

Prediction Beyond MADE Experiment

The important role of models is to provide prediction of future solute plumes development. In order to compare the outcome of the six different models considered in the present study, we have used them for the same MADE conditions, but at larger time than T = 503 days. Thus, the long term (at T = 1,000 days) predicted plume mass spatial distribution is displayed in Figure 5f. All models agree in the peak travel distance as a consequence of similar flow velocities U. However, the differences between the peak value are more pronounced but still within a factor of two and even less for all models, except the reactors, which predicts a higher peak and reduced tail. As for prediction of forefront tailing associated with fast moving solute, both Binary Facies and Binary Inclusions display longer tails with 80 ≲ x 95% ≲ 140 m. Inspection of the cumulative distribution reveals again that prediction of models, except Binary Facies, are within a relatively narrow distribution.

Discussion

Longitudinal mass distributions (Figures 3 and 5) showed good agreement of all models in the general plume shape and mass peak, but observable differences among models. We attribute these differences to different conceptualizations of aquifer heterogeneity in combination with conductivity parameters from different field observation methods. Both aspects are discussed in detail in the following subsections. All modes share the same controlling physical processes (Section 3.1), despite various mathematical formulations. In light of the different conductivity conceptualization and parametrization, we attribute the similarities in plume shape to the fact that models share the key features: mean flow velocity, a parameter reflecting log‐conductivity variability, and taking into account horizontal correlations of conductivity values.

Modeling

The six models applied to the MADE setting can be divided into three groups based on conceptual similarities and in line with simulation results: (a) FOA, MIMSCA, and TDRW; (b) Binary Facies and Binary Inclusions; and (3) reactors.

FOA, MIMSCA, and TDRW

The three models regard conductivity as a multi‐Gaussian stationary random field, completely characterized by geostatistical parameters K G, , I, I . The mean flow velocity is given by U = K eff J/θ (Zarlenga et al., 2018). Models differ in conceptualization and computational complexity. FOA is analytical, leading to an inverse Gaussian mass distribution 〈m〉 which satisfies an ADE (Equation 1) with macrodispersivity derived analytically from fist order approximation. MIMSCA is semianalytical, based on summation of travel time through inclusions of random K. TDRW is seminumerical, with the velocity field derived by Monte Carlo simulations, while transport is based on an approximation of the Lagrangian velocity field. Computational effort for solution of transport is low for all models, being in the order of minutes for a common PC. The longitudinal mass distributions of all three models are very close and in good agreement with the bulk of MADE experimental data. Plume travel distances are similar (Figure 4), the same holds for mass peak locations and values even without data aggregation (Figure 3), which we attribute to the use of similar flow velocities. Hence, models with a multi‐Gaussian conductivity conceptualization provide a robust prediction of plume behavior. Reliable predictions depend less on the specific model implementation but rather on selecting appropriate velocity U and heterogeneity representative parameters and I.

Binary Facies and Binary Inclusions

Both numerical models use a binary conceptualization of heterogeneity: hydraulic conductivity is modeled by adopting two values K 1, K 2 of volume fractions p 1 and p 2 = 1 − p 1. Two length scales L , L characterize the K‐facies, whose geometry has random elements. Flow and transport are solved numerically and repeatedly by Monte Carlo simulations. Mean values of flow velocity U and mass distributions are achieved from ensemble averaging, which also allows obtaining the statistical moments of these quantities. The two models differ in dimensionality, binary structure generation, and measurement input data: binary Facies is 3D and random facies geometries are generated by transitional probability using TPROGS (Carle, 1999). It can be interpreted as a simplified form of the facies approach applied by Bianchi and Zheng (2016) for MADE2, by considering two facies instead of five. All the structural parameters are obtained from granulometry measurements. The binary inclusions approach simplifies further the 3D random structure by considering regular inclusions in two dimensions. It consists of identical rectangular inclusions of conductivity K 2 submerged in the K 1 matrix for two dominant deterministic zones of different mean conductivity which could be delineated from head observations. Structural parameters are derived from pumping tests and few flowmeter measurements in a procedure adapted to the particular features of the MADE site. Both models agree with the MADE1 experimental data. They are particularly able to reproduce the spreading of solute in the upstream zone of the plume, as well as migration of mass at low concentrations far downstream. While both models predict the peak mass and location, the Binary Facies displays enhanced downstream transport of tracer which is related to the higher mean flow velocity implied by the grain size conductivity statistics (Figure 2). The mean flow velocity of the Binary Inclusions model is slightly smaller than that used by FOA, MIMSCA, and TDRW from DPIL data due to the explicit representation of the low conductivity injection zone. The favorable comparison with MADE1 shows that numerical ADE models provide robust predictions of plume mass distribution derived through a Monte Carlo approach. This confirms results of Dogan et al. (2014) obtained with the aid of a detailed representation of conductivity distribution of a portion of the MADE site. Furthermore, the results for binary structures shows that the key element is accounting for conductivity contrasts rather than high resolution and a detailed specification of conductivity through conditioning. The semianalytical models are relatively simple for computations but use a more extensive DPIL data set, whereas Binary Inclusions and Binary Facies modeling approaches are heavier on computation but use more readily accessible data sets.

Reactors

This model of different conceptualization was motivated by its general use in engineering and convenience for reactive transport modeling. Central parameter is the velocity U, which is derived from effective conductivity K eff (Zarlenga et al., 2018). The second parameter, the velocity longitudinal correlation length is related to the conductivity integral scale I. The degree of heterogeneity quantified by is not included. Instead, the series of reactors aggregate the inherent dispersion making the model not a general candidate for modeling advective transport in a heterogeneous aquifer. Its analytical nature makes the application of the model very simple, allowing calculations of mass within seconds. The mass distribution and the aggregated one agrees well with the observed one at MADE, though it overvalues the peak, particularly at larger times, for example, by a factor of 1.5 compared to all other models for T = 1,000 days. The transferability of the reactors model to other sites need to be tested. However, the findings confirm the predominant role of the mean velocity U and correlation scale I for robust transport prediction.

Data Selection

The conductivity distribution at the MADE site as inferred from different characterization methods is summarized in Figure 2: (a) hydraulic profiling (DPIL) is an affordable technique for shallow aquifers (Dietrich et al., 2008) by which the most comprehensive data set was obtained. It is suitable for a geostatistical interpretation, including two‐point statistics; (b) Granulometry (or grain size analysis) is a standard method in hydrogeology that yields highly uncertain conductivity estimates (Vienken & Dietrich, 2011); (c) flowmeter measurements and pumping tests are the standard methods for hydrogeological characterization, the main limitation being accuracy of low K values (Figure 2). The semi‐analytical models FOA, MIMSCA, TDRW, and Reactors used DPIL observations to estimate the flow velocity and the plume spreading. The Binary Facies model used granulometry data while the Binary Inclusions model used pumping test estimates, information from head maps and a few flowmeter data. The mean velocity U of all models rely on the measured mean head gradient J and the porosity θ. The models differ primarily in the use of the K data. Still, the resulting estimates of the mean velocity U are relatively close: 0.026 m/d for FOA, MIMSCA, and Reactors; 0.019 m/d for TDRW, 0.025 m/d for Binary Inclusions, and 0.079 m/d for Binary Facies. Even the deviation for Binary Facies is within an acceptable range, various approximations notwithstanding. Thus, the estimates of velocity are quite robust, which explains the relative closeness of the predicted and measured locations of the peak of in Figures 3 and 5. Similarly, the prediction of plume spreading is quite robust. A potential explanation for the relative closeness of model results despite different data use is the closeness of the magnitude of the asymptotic longitudinal macrodispersivity . The resulting values for the different values of log‐conductivity variance and the horizontal correlation scale I (Figure 2) are 53.7, 54.1 and 49.6 m for DPIL, flowmeter and grain size, respectively. Although the above observations strictly apply to the MADE site only, they are nevertheless encouraging and motivate similar comparative analysis for example, for less heterogeneous aquifers for which experimental data are available.

Summary and Conclusions

With a variety of hydraulic data available, the MADE site provides a unique opportunity for a comparative analysis of predictive modeling. The present study takes advantage of these possibilities and compares predictions of plume transport by six previously developed models that use different information to characterize the spatial variability of hydraulic conductivity. Thus, the FOA (Fiori et al., 2017), the MIM and self‐consistent approximation (MIMSCA) (Fiori et al., 2013), and the TDRW (Dentz et al., 2020) models utilize extensive hydraulic profiling data (Bohling et al., 2016), for inferring geostatistical parameters. In contrast, the binary inclusions model (Zech et al., 2021) uses pumping tests and a few flowmeter data whereas the Binary Facies model relies on granulometric data. Models differ in the conceptualization of aquifer structure and theoretical formulations. They further rely on different field data input and differ in their computational effort. Common features of the models are: flow is steady, uniform in the mean and driven by a constant head gradient; solute spreading is caused by aquifer conductivity heterogeneity; models rely on structural and flow data with no calibration against transport observations; plume mass distribution is assumed to be ergodic, meaning that the modeled mean relative mass distribution is compared with the upscaled measured mass at MADE at a few times. Model predictions of advective transport is compared to the MADE‐1 transport experiment findings as quantified by the longitudinal mass distributions for time snapshots from 49 days upto 503 days. A model comparison is included at 1,000 days, which is beyond the period of measurements. Despite decreasing mass recovery of MADE‐1 observations, mass data is not normalized. The seeming mass loss for times beyond 126 days is attributed to limited sampling and thus not taken into account in the models. The discrepancy between models and observed total mass strongly impacts the perception of match of model and experimental data at late times. However, mass distribution peak value and location as well as general plume shape at MADE are well predicted by all models. The main and encouraging result for practitioners is that models conceptualized by relatively simple flow and transport equations, but explicitly taking aquifer heterogeneity into account, predict the complex MADE1 mass distribution reasonably well. The adopted measures of the solute plumes are robust and models are reliable as long as they are underlain by a few basic parameters: mean velocity U, a parameter reflecting log‐conductivity variability and one reflecting the horizontal degree of log‐conductivity correlation. However, the reasonable agreement in model results is also related to the particular quantity under examination: the longitudinal mass distribution which is aggregated over spatial intervals is quite robust itself. If other measures are employed, such as local concentrations, results might differ. The purpose of comparing models was not to recommend adopting a particular one, but to show that the robustness of model predictions allows the use of a variety of observation data and representations of hydraulic conductivity structures. The choice of models compared here is dictated by the requirement of their being predictive without reliance on transport data. Consequently, classes of models relying on parameter calibration by transport experiments, such as mass transfer models in porous media are not considered. However, our model choice does not preclude investigations of model classes which use calibration to transport observations when they are available. Although the model comparison is limited to the data and setting at MADE, we surmise that it indicates applicability to other sites as well as for predictive modeling of transport. Rendering the above conclusions of general validity requires extending the study to other field sites. Synthetic examples, for instance, log‐normal conductivity structures with different connectivities (Fiori et al., 2017) or facies structures (Carle & Fogg, 1996) would be an alternative given the limitation of controlled experiments at highly heterogeneous sites. Our results can provide guidance to the design of such new subsurface transport experiments.

3 in total