Literature DB >> 28303148

Differentiating Wheat Genotypes by Bayesian Hierarchical Nonlinear Mixed Modeling of Wheat Root Density.

Anton P Wasson¹, Grace S Chiu², Alexander B Zwart³, Timothy R Binns⁴.

Abstract

Ensuring future food security for a growing population while climate change and urban sprawl put pressure on agricultural land will require sustainable intensification of current farming practices. For the crop breeder this means producing higher crop yields with less resources due to greater environmental stresses. While easy gains in crop yield have been made mostly "above ground," little progress has been made "below ground"; and yet it is these root system traits that can improve productivity and resistance to drought stress. Wheat pre-breeders use soil coring and core-break counts to phenotype root architecture traits, with data collected on rooting density for hundreds of genotypes in small increments of depth. The measured densities are both large datasets and highly variable even within the same genotype, hence, any rigorous, comprehensive statistical analysis of such complex field data would be technically challenging. Traditionally, most attributes of the field data are therefore discarded in favor of simple numerical summary descriptors which retain much of the high variability exhibited by the raw data. This poses practical challenges: although plant scientists have established that root traits do drive resource capture in crops, traits that are more randomly (rather than genetically) determined are difficult to breed for. In this paper we develop a hierarchical nonlinear mixed modeling approach that utilizes the complete field data for wheat genotypes to fit, under the Bayesian paradigm, an "idealized" relative intensity function for the root distribution over depth. Our approach was used to determine heritability: how much of the variation between field samples was purely random vs. being mechanistically driven by the plant genetics? Based on the genotypic intensity functions, the overall heritability estimate was 0.62 (95% Bayesian confidence interval was 0.52 to 0.71). Despite root count profiles that were statistically very noisy, our approach led to denoised profiles which exhibited rigorously discernible phenotypic traits. Profile-specific traits could be representative of a genotype, and thus, used as a quantitative tool to associate phenotypic traits with specific genotypes. This would allow breeders to select for whole root system distributions appropriate for sustainable intensification, and inform policy for mitigating crop yield risk and food insecurity.

Entities: Chemical Disease Species

Keywords: generalized linear mixed models; heritability; hierarchical modeling; root architecture; wheat phenotyping

Year: 2017 PMID： 28303148 PMCID： PMC5332416 DOI： 10.3389/fpls.2017.00282

Source DB: PubMed Journal: Front Plant Sci ISSN： 1664-462X Impact factor: 5.753

1. Introduction

Meeting the food production requirements of a growing human population who are encroaching on arable land and generating a changing climate will require an intensification of agriculture, where greater yields are obtained from crops on existing farms with sustainable inputs of water and fertilizer (Gregory et al., 2013). This will involve identifying the constraints on yield in agricultural systems, many of which are to be found below ground in the root systems of crops. There are calls for a “second Green Revolution” (Lynch, 2007) focused on breeding crops with “ideotypic” (Donald, 1968) root systems (i.e., possessing desirable root system traits) that can overcome these constraints. This approach, called physiological breeding, is to be contrasted with breeding for increased yield alone, an approach which is no longer keeping pace with growing demands (Fischer and Edmeades, 2010; Richards et al., 2010; Hall and Richards, 2013). However, identifying ideotypic root systems for crops is fraught with difficulty. Root traits which can be identified in the laboratory are often difficult to translate to the field (Watt et al., 2013) because they are devoid of the developmental context of the soil. The soil environment is complex, and has a dominant effect on root system development (Rich and Watt, 2013). Furthermore, crop physiological models—which are used to formulate strategies for plant breeding and crop yield risk mitigation, and even to develop government policy—are often inadequate in addressing the spatial heterogeneity of root systems and soil properties (Holzworth et al., 2014). It is also difficult to sample roots in soil in the field, and the data obtained are complex to interpret. Nevertheless, it is in the field where the effects of soil, climate, and agronomy are integrated with the developmental genetics of the plants growing together as a crop, and hence it is also in the field where measuring root traits, identifying crop ideotypes, and modeling root development are most valuable. Selecting for root ideotypes in the field may speed up the identification of the best germplasm for breeding programs (Wasson et al., 2012; Rich and Watt, 2013). Therefore, integrating improved measures of root distribution/development into crop physiological models will improve farm management decision making and crop yield risk mitigation. Yet, indirect measurements of crop root systems are problematic, and most direct measurements are destructive, time-consuming and/or labor intensive (e.g., root washing, minirhizotrons) (Wasson et al., 2012). Hence the core-break method was developed as a method of rapidly observing and quantifying the presence of roots as a function of depth (Drew and Saker, 1980; van Noordwijk et al., 2001); a soil core sample is taken from the crop and broken at regular intervals (corresponding to increasing depth) and the exposed roots are counted. The counts correlate with the root length in the corresponding volume of soil. This technique has been used to phenotype root count distributions in 43 genotypes (Wasson et al., 2014) and efforts have been made to automate the root count process (Wasson et al., 2016) to reduce the labor requirements. However, root counts from the core-break method are subject to a high degree of variation between samples (van Noordwijk et al., 2001), which makes it challenging to identify genotype-specific traits from root counts or to associate genotypes with discernible properties of root count profiles. Similar types of experimental field data may have been analyzed by statistical linear models (Faraway, 2014) under an analysis-of-variance (ANOVA) framework (Wasson et al., 2014). However, a major limitation of linear models is their assumption of Gaussian (normally distributed) response data, whereas root counts are discrete, bounded below by zero, and with a distribution whose substantial right-skewness may not be easily removed by variable transformation. Indeed, root count data are more appropriately modeled as Poisson distributed, although a phenomenon known as overdispersion (McCulloch, 2008), commonly encountered in count data from field experiments, must be handled with care. More specifically, the Poisson distribution is characterized by a single parameter that represents the distribution mean as well as its variance. However, in practice, the count variable of concern often has a recognizably larger variance than its mean (hence, “overdispersion”), although the overall distribution still resembles Poisson in other respects. Therefore, linear models applied to field data thus far have focused on analyzing core-level summary metrics, such as maximum rooting depth, which, after variable transformation if necessary, can approximately behave as Gaussian (Wasson et al., 2014). However, such summary metrics by definition cannot reflect root structure over depth, discarding valuable information contained at the level of individual core segments, and consequently resulting in an undesirable loss of statistical power. To better facilitate our scientific objective of associating genotypes with discernible properties of root count profiles, in this paper we scrutinize the many facets of the inherent variability of the root count profile produced based on a field trial (Figure 1) that involved twenty genotypes (n = 20), each generating four replicated soil cores (n = 4) extracted in situ from each of four replicated plots or blocks (n = 4). Growing in a plot, as they would in a farmer's field, the plants' root systems interact and respond to each other. Their development is driven by the exploration of cracks and pores (White and Kirkegaard, 2010), which are randomly distributed. Likewise, variation in soil chemistry and nutrients, which can be patchy and vary with depth, drives the branching of roots. In contrast, impenetrable material and compaction can inhibit growth. As each soil core only captures a comparatively small piece of variation due to the various sources, results found in adjacent replicated cores can differ substantially.

Figure 1

Schematics of the field experiment. (A) Surface layout of the field experiment involving twenty genotypes (indexed by i) randomized within four blocks (indexed by j) of twenty ranges of plots. (B) Cartoon depicting the sampling in each plot. Four soil cores (indexed by k) were sampled from each plot in a steel tube. Each core was broken into 10 cm increments. The root count (y) at each 10 cm depth increment (indexed by t) is the sum of the counts on the lower face of the upper fragment and the upper face of the lower fragment. Thus, each root count, y, has four unique index values. The cartoon further depicts the variability that might be encountered by sampling soil cores in a single plot, e.g., contrast cores k = 1 and k = 4. Moreover, we note that many of the profiles of the average root count depicted in Figure 6 of Wasson et al. (2014) are consistent with random observations whose mean follows a functional form that roughly resembles the density function of the gamma probability distribution. Based on this observation, in this paper we develop a statistical modeling approach that can rigorously handle the non-standard nature of our root count data. Specifically, root counts at the observed depths (denoted by t) within a core are formally related through a nonlinear parametric expression θ(t) to reflect the one-dimensional spatial nature of individual soil cores. The parametric expression (with a small number of unknown parameters) is the common denominator that unifies this spatial behavior among all cores. Obviously, an appropriate parametric structure imposed on the root counts within a core would lead to much greater statistical power when compared to, say, an oversimplified ANOVA approach that regards depth as a mere design feature in a factorial experiment. We also note that our field experimental setup was such that the randomness in our data exhibits a hierarchical structure (Gelman and Hill, 2006) that comprises layers of mean and variance functions. In particular, the complex, non-standard experimental design features inherent in our data require hierarchical nonlinear mixed modeling (HNLMM), an approach which addresses our need to model overdispersed, Poisson distributed data (McCulloch, 2008) via a hierarchy of nonlinear mean functions and associated variance components due to the formulation of θ(t). Therefore, our approach in this paper is distinguished from existing studies particularly because of (a) our scrutiny of the root count profiles themselves, rather than the relationship between the counts and the root length density, and (b) our hierarchical modeling approach that integrates all identified facets of variability among all observed root count profiles in a comprehensive and collective manner. Additionally, our modeling framework gives rise to new heritability metrics that describe spatial and overall root architectural traits, the latter at the overall genotypic level. The remainder of our paper is structured as follows. Under the section Materials and Methods, we provide some details of the field sampling procedure that gave rise to our root count profile data, some data visualizations, a primer on the specification of our HNLMM under a Bayesian framework (Gelman et al., 2013), and biological interpretations of model parameters and their use in defining novel multiresolution heritability measures. Statistical inference results and corresponding key biological insights appear in the Results section, followed by the section Model Validation which briefly discusses the rigor and adequacy of our approach (Technical details that supplement these sections appear in Appendices A–F and the online Supplementary Material). Our paper concludes with an in-depth Discussion section on the biological and practical implications of our integrative modeling approach in the general context of facilitating effective wheat breeding programs via root phenotyping.

2. Materials and methods

2.1. Data and modeling framework

Each soil core sampled was partitioned in the field into five-centimeter segments from which the number of roots, y, was determined every 10 cm up to 180 cm using a fluorescence imaging system (Wasson et al., 2016). Each value of y at Depth t(= 1, …, n where n = 18) is the sum of the count imaged from the bottom face of the segment above t and that from the top face of the segment below t (See Appendix A for details on data collection). Let y denote the total number of imaged live roots of Core k at Depth t for Genotype i in Block j. Thus, each ith genotype is associated with 288 (= nnn) observations of y in total. Equivalently, each tth depth is associated with 320 (= nnn) observed counts. Data visualizations for Genotype G18 (Figure 2) and other genotypes (not shown) suggest that our observed root counts, y, perceivably follow a smooth nonlinear trend over core depth, but subject to substantive noise from the effects of soil physical and chemical properties described above, plus sampling and handling errors. These sources of noise culminate in the profile plots (Figures 2A,B) and associated boxplots (Figure 2A) for y. Therefore, a modeling framework comprising the following main model statements was developed to capture the complex noise structure around an idealized smooth trend:

Figure 2

Data visualizations. (A) Boxplots of root counts, by depth for genotype G18, pooled across replicate plots (4) and depth-specific core segments (4 per plot). The horizontal axis is depth from 10 cm to 180 cm, at 10 cm intervals. The blue line is the empirical mean root count profile over depth, which, along with the corresponding mean profiles for other genotypes, resembles those in Figure 6 of Wasson et al. (2014). (B) Root count profiles (in thin black) over depth, by block (replicate plot, shown as panel label), for genotype G18. Superimposed in bold blue within each block is the within-block empirical mean root count profile.

where θ(t) denotes the underlying plot-specific Poisson intensity curve over depth, i.e., the modeled mean root count at Depth t (= 1, 2, …, 18) from the {i, j}th plot (for Genotype i (= 1, 2, …, 20) observed in Block j (= 1, 2, 3, 4)). Data visualizations. (A) Boxplots of root counts, by depth for genotype G18, pooled across replicate plots (4) and depth-specific core segments (4 per plot). The horizontal axis is depth from 10 cm to 180 cm, at 10 cm intervals. The blue line is the empirical mean root count profile over depth, which, along with the corresponding mean profiles for other genotypes, resembles those in Figure 6 of Wasson et al. (2014). (B) Root count profiles (in thin black) over depth, by block (replicate plot, shown as panel label), for genotype G18. Superimposed in bold blue within each block is the within-block empirical mean root count profile. Intensity θ(t) itself is decomposed into fixed and random effects (shaded nodes in Figure 3). Specifically, θ(t) comprises a smooth genotype-specific “kernel function,” γ(t), and two sources of multiplicative Gaussian errors: genotype-specific deviation τ and core segment-specific deviation ϕ. The intensity function's proportionality multiplier ψ, on the logarithmic scale, represents the plot-specific intercept of the {i, j}th intensity function. The intercept can be regarded as the modeled mean count (log scale) of the root system just below the soil surface. Therefore, τ corresponds to the genotypic random effect on this near-surface mean count. As such, ψ itself is random. It is modeled as log-linear, where its mean can be expressed as a study-wide constant ψ0 plus a non-random block-specific shift κ (both taken to be fixed effects) (see Appendix B).

Figure 3

Hierarchical structure of our modeling framework. Boxes denote data, and ovals denote model parameters (unobservable). Shaded nodes collectively determine the modeled Poisson intensity, θ.

2.2. The root system's Bulk and Exploration parameters

The idealized function has two genotype-specific parameters, α and β, respectively representing the non-negative shape and rate of the gamma probability density function. Holding β constant and increasing α causes the ith kernel function to (a) peak at a lower depth and (b) exhibit more spread around the peak (Figure 4A). Thus, α corresponds to both the depth at which the root system is most dense and its tendency to explore spatially around this depth. Henceforth, we refer to α as the “bulk parameter.”

Figure 4

An illustration of the effect of changing the parameters α. The vertical axis is the idealized relative root count (a dimensionless value). (A) Increasing α while fixing β(= 1) causes the idealized function's peak to be located deeper under the soil surface and to be less concentrated. Thus, α is a “bulk parameter” that reflects the depth and density of the “bulk” of the root system. (B) The effect of increasing β while fixing α(= 7) causes an increased skewness in the tail of the idealized function, and consequently decreases the depth of the function's peak from the soil surface and increases its concentration. Hence, β is an “exploration parameter.” Similarly, holding α constant and increasing β causes the ith kernel function's tail to taper off more quickly, i.e., to exhibit a more slender tail (Figure 4B). Thus, β roughly corresponds to the decline rate of the root system's downward exploration. In other words, the less slender (i.e., fatter) the kernel function's tail, the slower the decline of the root system's downward exploration (or, the bigger the tendency for the root system to explore downwards). Henceforth, we refer to β as the “exploration parameter.” For each ith genotype, parameters α and β are modeled as bivariate log-normal random variables (i.e., they are bivariate Gaussian on the logarithmic scale with unknown correlation ρ). These parameters and the noise terms τ and ϕ are each modeled to have a mean that is constant across the study (i.e., not indexed by i, j, k, or t), and similarly for all the (co)variance parameters in the model (See Appendix B). A visualization of the overall hierarchical structure of our modeling approach appears in Figure 3. Finally, under the Bayesian inference framework, we specify reasonably non-informative prior distributions to reflect our lack of knowledge, in the absence of data, about the model parameters (see Appendix B). Collectively, the HNLMM and prior distributions as specified above are referred to as Model 1. Details on the implementation of Model 1 appear in Appendix C.

2.3. Novel heritability measures

The general notion of heritability is the proportion of phenotypic variation that can be attributed to genetics. Loosely, we have This definition of heritability assumes that genotypic and environmental variables are independent, linear components of the phenotypic response variable of interest. In practice, the biological notions of phenotype, genotype, and environment are abstract, and their quantifications that can be measured in an experiment may exhibit a complex co-dependence in a nonlinear fashion. Indeed, Moran (1973) pointed out that a quantification of heritability that purely stems from a linear decomposition of the phenotypic response can be nonsensical in practical settings. In the case that the measurable quantities and experimental design can be reasonably described using Poisson regression, Foulley et al. (1987) adapts the linear (Gaussian) model-based definition of heritability to the scale of the linear predictor in a Poisson regression model, rather than the scale of the phenotypic response. Recently, this definition was extended to a longitudinal Poisson mixed model (Mair et al., 2015). We further extend these ideas to define heritability measures based on segment-level count data. Our adaptations below emphasize the challenge of detecting trends in root architecture from root count data that are both highly noisy and highly non-Gaussian, and that deviate substantially from a simple Poisson model; while considering data at a reasonably high spatial resolution may mitigate the challenge due to noise, it necessarily requires additional model complexity to address the non-standard statistical behavior of the data, and consequently, a novel quantification of heritability based on our new modeling paradigm. The formulation of Model 1 as presented in Appendix B gives rise to a mean number of roots that is nonlinear in its parameters, even on the logarithmic scale. Hence, this mean is not a linear predictor in the usual context of generalized linear models. Nevertheless, at each tth depth, we decompose the variability of log θ(t) into and both of which are spatial in nature. Here, we must address various aspects of complexity that are non-standard in heritability studies: (1) our analog of Var(G), namely, , is attributable to the variability of the trio of genotypic parameters τ, α, and β; and (2) it is a spatial function. Thus, it is reasonable to further decompose this Var(G) analog into τ-, α-, and β-specific components, as each of the trio pertains to different root architectural features; and the α- and β-specific components are also functions of t and are co-dependent except in the naïve case. In Appendix D, we present the four mathematical definitions of heritability (corresponding to , and β) to handle such complexity. Finally, we pool depth-specific values by taking the harmonic mean across depths, thus defining a quantity at the genotypic level that summarizes the particular architectural feature across all depths (see Appendix D). The pooling of spatial elements to form an overall heritability measure gives rise to the multiresolution nature of our approach.

3. Results

We discuss three major biological insights that arise from the Bayesian inference, i.e., the joint posterior distribution among the parameters of Model 1.

3.1. Root intensity profiles are statistically distinguishable among genotypes

Posterior inference allows us to examine the intensity profiles θ(t) and their idealized (denoised) counterparts ψγ(t) for any given replicate block j. A ψγ(t) profile is effectively the intensity profile θ(t) but ignoring the random genotype-block interaction ϕ. Note that ψ is log-linear in τ and κ without an interaction term. Thus, the behavior among the 20 idealized profiles within any jth block is necessarily consistent across all four blocks but for an intercept shift κ. Hence, Figure 5 focuses on j = 1 to represent the study-wide behavior of the idealized profiles.

Figure 5

Inference for root distribution. (A) Posterior mean (Bayesian estimate) of idealized intensity profile ψ γ(t) for replicate block j = 1 for all 20 genotypes. Other blocks appear similarly, differing only in the intercept due to the block-specific fixed effect κ in which log ψ is linear. (B) Posterior means for Genotypes G12 and G18 (from panel A) which are respectively the maximum and minimum curves, each surrounded by a 95% credible band (Bayesian confidence band). Note that credible bands are constructed from depth-wise 95% credible intervals of ψγ(t); thus, the lower band limit is constructed by connecting, across the 18 values of t, the 2.5th percentiles of the ψγ(t) posterior distribution; similarly, the upper band is constructed by connecting the corresponding 97.5th posterior percentiles. All posterior means (Bayesian estimates) of the 20 genotypic idealized profiles ψγ(t) are visually distinguishable (Figure 5A); and the study-wide statistical power is very high in determining that the genotypes do not all exhibit the same idealized profile (Figure 5B): 95% credible bands (Bayesian confidence bands) around the maximum and minimum idealized profiles (G12 and G18, respectively) are clearly non-overlapping. (It is analogous to rejecting the null hypothesis in a classical ANOVA at a very low significance level.) This lack of overlap at such a high credible level indicates that, among the 20 genotypes, at least G12 and G18 are highly statistically discernible with respect to their idealized intensity profiles. While ψγ(t) necessarily behaves similarly across all j, the genotype-block interaction intrinsic in the plot-specific intensity profile θ(t) induces variability in the 20 profiles' collective behavior across j, as is evident in Figure 6: in each block, this variability reduces the statistical distinguishability among the 20 genotypes, although in each of Blocks 2, 3, and 4, at least two intensity profiles are highly discernible. Specifically, despite the noisy nature of θ(t), Figure 6 shows that in each of Blocks 2–4, at least two intensity profiles θ(t) (respectively, (i =){G2, G17} in Block (j =)2, {G6, G13} in Block 3, and {G6, G15} in Block 4) are highly statistically discernible due to the general lack of overlap between the pair of block-specific 95% credible bands around θ(t).

Figure 6

Posterior mean of intensity profile θ(, j = 2 (C,D), j = 3 (E,F), and j = 4 (G,H).

3.2. Root intensity profiles are substantially heritable

Each of our four genotypic heritability measures is a model parameter that exhibits a posterior distribution, shown in black in Figure 7; three of these are pooled measures, each comprising 18 depth-specific components (Appendix D), shown in color.

Figure 7

Posterior distributions of pooled measures of heritability (black), pertaining to (A) overall root architecture, (B) the near-surface intensity, (C) the root bulk's location (and size), and (D) the root system's decline of penetration; the middle vertical line marks the posterior median (a Bayesian estimate), and the outer lines delimit the 95% credible interval. In (A,C,D), pooling corresponds to integrating depth-dependent heritability over all 18 depths via the harmonic mean of the 18 depth-specific heritability values; the posterior distribution of the unpooled heritability at a given depth is shown in shades of “burnt grass,” where more burnt corresponds to greater depth. Focusing on the genotypic level (Figure 7 in black; Figure 8A), the Bayesian estimate and 95% credible interval for heritability of the intensity function are, respectively, 0.65 and (0.52, 0.75); for that of the near-surface mean count they are 0.14 and (0.03, 0.26); the “bulk” parameter, 0.62 and (0.35, 0.82); and the “exploration” parameter, 0.19 and (0.05, 0.37).

Figure 8

(A) 95% credible intervals and posterior medians for the four heritability measures (1 = root architecture; 2 = near-surface mean count, log scale; 3 = bulk parameter; 4 = exploration parameter). (B) Posterior distribution of ρ, with vertical lines indicating the 2.5, 50, and 97.5% quantiles. Note in Figure 7 that (i) the depth-specific components of each of , and tend to increase as depth increases, and (ii) the near-surface intensity of root count has low heritability (). These features of our results indicate that root count features at deeper depths are more heritable than those at shallower depths. In other words, our results provide quantitative rigor for three ideas: the heritability of root architectural traits varies substantively across depth; traits that are associated with a deeper spatial location tend to be more informative about plant genetics; and the depth at which the root system develops its bulk is negatively associated with its tendency to explore deeper. The third notion is further evidenced by another feature of Model 1, which is discussed in the next subsection. It is also interesting to note that, although overall heritability is constituted from , and , Figure 8A (which summarizes the genotypic aspects of Figure 7) suggests that each of the latter three tends to be less than itself, thus root architecture on the whole tends to be more heritable than any of these standalone features of the root system.

3.3. Linkage exists between near-surface root density development and downward exploration

The modeled correlation, ρ, between the bulk and exploration parameters (both on the log scale), is estimated to be 0.64, with 95% credible interval = (−0.85, 0.90) (see Figure 8B). Due to skewness, the posterior probability for ρ to be positive is 0.88, substantiating that the root system's bulk and downward exploration are generally positively associated architectural features. Specifically, a shallower and more concentrated bulk (small α) is associated with a larger tendency for the root system to explore deeper (small β). This phenomenon may be regarded as “a small β canceling out a large α,” or, the tendency of exploring downwards to exhibit the effect of negating the tendency to develop root density further away from the surface. We elaborate on this discovery under Discussion.

4. Model validation

Details of model validation procedures appear in Appendix E. In summary, residual analyses suggest only minor statistical inadequacies of Model 1. With respective to the model's predictive performance as measured by the Watanabe-Akaike information criterion (WAIC) (Gelman et al., 2013), its hierarchical structure is essential. Specifically, ignoring the hierarchical structure between the root count intensity function and its various random components that are specific to genotypes, plots, and depths leads to a naïve model that agrees poorly with the empirical behavior of our root count data. Employing the hierarchical structure, the model's predictive performance remains effectively unaffected whether a priori dependence between the bulk and exploration parameters is considered; however, we regard this extra dependency as a key biological feature because (a) it improves the interpretability of the model by providing an explicit assessment of the interplay between the bulk and exploration parameters, and (b) this interplay is shown to be substantive based on our field data (as indicated by a smaller effective number of parameters for Model 1 despite its extra complexity due to the a priori dependency).

5. Discussion

The development of roots in response to the extreme heterogeneity of the soil results in a lack of discernible root system characteristics that can be measured in the field and integrated into crop breeding programs. The inability to breed on the root development of crops and the inadequacy of conventional crop physiological models in addressing the spatial heterogeneity of root systems and soil properties are barriers to the sustainable intensification of agriculture, where root traits are known to be critical to resource-use efficiency and resistance to climatic extremes. Consequently, they are barriers to effective crop yield risk mitigation and food security. Crop physiological models with better predictive ability are much sought after, and novel statistical models can facilitate this pursuit by effectively teasing apart root system physiology from severe heterogeneity. In this paper, we have addressed this knowledge gap by scrutinizing root counts observed using a core-break count method, and by developing a novel modeling approach that accounts for all root count data holistically. Our approach gave rise to new multiresolution heritability metrics, each describing a specific feature of the root count distribution spatially and at the overall genotypic level, which we showed to be substantially heritable. Our integrative approach can allow selective pre-breeding programs for root distribution and may facilitate the identification of genetic markers from field data. The holistic nature of our approach is an inherent advantage of hierarchical modeling. For model inference, we employed the Bayesian paradigm, which is intrinsically hierarchical in structure. It also has the potential of being greatly flexible: as long as the model is mathematically sound and sufficient computational resources and algorithms are used to implement the model, rigorous statistical inference can be straightforward even for a model with highly complex nonlinear parameters and random quantities that follow non-standard probability distributions. In contrast, classical statistical inference can be too impractical when models or data structures deviate from well-studied scenarios. In our case, the experimental setup and the notion of root architecture together led to a highly non-standard scenario that, under a classical paradigm, would have been much less straightforward to model and subsequently draw inference from. Not only was y (the response variable of interest) strictly non-Gaussian, the data were also three-dimensionally spatial in nature, where replicate plots were arranged in a certain 2-dimensional structure (indexed by {i, j}), and in turn each plot generating numerous 1-dimensional spatial observations of y (indexed by t). Irrespective of the inference paradigm, a caveat of hierarchical modeling is that model complexity in the form of highly nonlinear functional forms and/or intricate hierarchical dependence structures can render the inference so computationally challenging that determining the posterior distribution (for Bayesian inference) or the sampling distributions of estimators and test statistics (for classical inference) would require novel numerical algorithms that are yet to be developed. However, for Bayesian inference in our case, model implementation and model diagnosis/validation were reasonably straightforward to conduct. The satisfactory predictive performance of our Model 1 (Appendix E) suggests that Model 1 is scientifically sensible and has yielded biological insights that are superior to what could have been drawn from previous linear models applied to core-level metrics (from collapsing segment-level data). Although Model 1 does not account for potential within-core spatial dependency among segment-level root count data (see Appendix F), the biological implications of this model nonetheless will help to define root traits for breeding. The canonical model of root distribution with depth is that of a negative exponential function (Gerwitz and Page, 1974). Gale and Grigal (1987) describe a nonlinear function Y = 1 − β where Y is the cumulative root fraction from the surface to the depth (d cm), and the coefficient β is genotypically determined. This model was later employed by Jackson et al. (1996) to model root distributions across a range of terrestrial biomes. However, this 1-dimensional model takes no account of the horizontal distance from the base of the plant. It has been observed that root distribution is 1-dimensional with depth in grassland, 2-dimensional in crops planted densely in rows, and 3-dimensional where plants are widely spaced (Bengough et al., 2000). The simulation studies by Grabarnik et al. (1998) showed that root length density—typically the length of root per volume of soil (cm/cm3)—for maize decreased nonlinearly with horizontal distance from the stem in the top 40 cm, but below that depth they were homogeneously distributed with horizontal distribution from the plant. Grabarnik et al. (1998) also showed that the roots were subject to clustering at all depths, and that whilst there was no preferential growth in a horizontal plane, the orientation of root growth deviated from the uniform distribution with increasing depth. Similar findings were generated in the simulation study by Bengough et al. (1992), and both studies drew attention to the likely effect of soil structure to further perturb the uniform directional distribution of root development parameters. The similarity between the model by Gale and Grigal (1987) and the special case of our gamma kernel function where α = 1 should be noted (Figure 4A). Rather than root length density, our model accounts for root counts that are random with respect to sampling position by row in a crop. The model is designed to explain the distribution of root counts with depth at the crop level (and not the plant level). However, the sampling position is likely to have a strong influence on the surface root counts, which explains the low heritability of τ in our model. Interpreting the biological meaning of the “bulk” and “exploration” parameters (α and β, respectively) is also interesting. In the gamma kernel function, β also affects the depth and intensity of the peak otherwise defined by α (Figure 4). Indeed, our data analysis implicated that α and β were positively correlated (Figure 8B). For our HNLMMs, predictive performance remained largely unaffected whether a priori dependence between α and β is considered; however, including this extra dependency improved the interpretability of the model by providing an explicit assessment of the interplay between the root system's tendencies to branch beneath the surface and to explore vertically, deep below the surface. An explanation for this effect may be found in the structure of the soil; root growth in deeper layers is perceivably constrained to networks of cracks and pores (Gao et al., 2016). White and Kirkegaard (2010) show that in a dense, structured subsoil 85–100% of roots below 60 cm were clumped in pores and cracks in the soil (compared to 30–40% above 60 cm), and 44% of the roots were clumped in pores with more than three other roots. Exploration of the soil for cracks and pores may define the exploitation of the soil by a root system. It has been suggested that plants have evolved randomness and instability in their root system development (Forde, 2009), which may facilitate exploration. The exploration of the shallow layers for cracks and pores may be what determines the eventual depth; our model implies that more branching near the surface gives better access to the subsoil. The primary purpose of our modeling approach was to distinguish genotypes from root count data that are statistically noisy. The inference for heritability based on the intensity functions suggests that our approach can be used to identify genetic markers of root system distribution in field data; identified markers then could be integrated into breeding programs. The high heritability of the “bulk” parameter also suggests that a breeding program could successfully alter the depth at which a root system proliferates. Notwithstanding, residual plots (Appendix E and Supplementary Material: Supplementary Figures) suggest some minor statistical inadequacies of Model 1. Therefore, it may be advantageous to (1) explicitly model gene-environment interactions (which are implicitly modeled by our current HNLMM due to the marginal dependence among genotypic terms indexed by i and environmental terms indexed by j and/or t); (2) formally model the within-core spatial dependence (possibly at a higher spatial resolution of core depths than the current 10 cm intervals); and (3) also incorporate an additional two-dimensional spatial correlation structure among field plots. In Appendix F, we suggest a possible decomposition at Level 1 of the model hierarchy to address (1), and discuss practical implications of modeling the 3-dimensional spatial dependence to address (2) and (3). Finally, it may also be of benefit to develop a new quantitative framework to predict root length density from the posterior mean root count profiles while accounting for trials in different soil and climate conditions, under which the response of the intensity functions and their underlying parameters to subsoil constraints could be rigorously exploited.

Author contributions

AW and GC designed research; AW, GC, AZ, and TB performed research; AW and GC contributed new data processing and analytic tools; GC, AZ, and TB analyzed data; and AW, GC, and AZ wrote the paper.

Funding

Bayer CropScience (a) funded CSIRO for this research project, part of which was subcontracted to the Australian National University to sponsor GC's involvement after June 2015; and (b) sponsored the CSIRO Agriculture Vacation Scholarship 2014 that was held by TB, under which he contributed to the basic model that was later extended to the approach in this paper.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Table A1

Values of the Watanabe-Akaike information criterion (WAIC) as a measure of predictive performance by our Bayesian HNLMMs.

Model	Description	WAIC
1	Most sophisticated among our models, as presented in this paper	34198
2	Same as Model 1, but with a prespecified ρ = 0	34195
3	Naïve preliminary model: same as Model 2, except with ϕ_ijt term missing and all of ψ₀, τ_i, α_i, β_i, and κ_j taken as fixed effects	46627

A smaller WAIC value suggests better model performance.

13 in total

1. A rapid, controlled-environment seedling root screen for wheat correlates well with rooting depths at vegetative, but not reproductive, stages at two field sites.

Authors: M Watt; S Moosavi; S C Cunningham; J A Kirkegaard; G J Rebetzke; R A Richards
Journal: Ann Bot Date: 2013-07 Impact factor: 4.357

2. Genetic evaluation of traits distributed as Poisson-binomial with reference to reproductive characters.

Authors: J L Foulley; D Gianola; S Im
Journal: Theor Appl Genet Date: 1987-04 Impact factor: 5.699

Review 3. Is it good noise? The role of developmental instability in the shaping of a root system.

Authors: Brian G Forde
Journal: J Exp Bot Date: 2009-09-16 Impact factor: 6.992

Review 4. Contributions of roots and rootstocks to sustainable, intensified crop production.

Authors: Peter J Gregory; Christopher J Atkinson; A Glyn Bengough; Mark A Else; Felicidad Fernández-Fernández; Richard J Harrison; Sonja Schmidt
Journal: J Exp Bot Date: 2013-02-01 Impact factor: 6.992

Review 5. Soil conditions and cereal root system architecture: review and considerations for linking Darwin and Weaver.

Authors: Sarah M Rich; Michelle Watt
Journal: J Exp Bot Date: 2013-03 Impact factor: 6.992

Review 6. Traits and selection strategies to improve root systems and water uptake in water-limited wheat crops.

Authors: A P Wasson; R A Richards; R Chatrath; S C Misra; S V Sai Prasad; G J Rebetzke; J A Kirkegaard; J Christopher; M Watt
Journal: J Exp Bot Date: 2012-05-02 Impact factor: 6.992

7. The distribution and abundance of wheat roots in a dense, structured subsoil--implications for water uptake.

Authors: Rosemary G White; John A Kirkegaard
Journal: Plant Cell Environ Date: 2009-11-04 Impact factor: 7.228

8. A Bayesian generalized random regression model for estimating heritability using overdispersed count data.

Authors: Colette Mair; Michael Stear; Paul Johnson; Matthew Denwood; Joaquin Prada Jimenez de Cisneros; Thorsten Stefan; Louise Matthews
Journal: Genet Sel Evol Date: 2015-06-20 Impact factor: 4.297

9. Deep roots and soil structure.

Authors: W Gao; L Hodgkinson; K Jin; C W Watts; R W Ashton; J Shen; T Ren; I C Dodd; A Binley; A L Phillips; P Hedden; M J Hawkesford; W R Whalley
Journal: Plant Cell Environ Date: 2016-02-05 Impact factor: 7.228

10. Soil coring at multiple field environments can directly quantify variation in deep root traits to select wheat genotypes for breeding.

Authors: A P Wasson; G J Rebetzke; J A Kirkegaard; J Christopher; R A Richards; M Watt
Journal: J Exp Bot Date: 2014-06-24 Impact factor: 6.992

8 in total

1. CRootBox: a structural-functional modelling framework for root systems.

Authors: Andrea Schnepf; Daniel Leitner; Magdalena Landl; Guillaume Lobet; Trung Hieu Mai; Shehan Morandage; Cheng Sheng; Mirjam Zörner; Jan Vanderborght; Harry Vereecken
Journal: Ann Bot Date: 2018-04-18 Impact factor: 4.357

2. Identifying Developmental Patterns in Structured Plant Phenotyping Data.

Authors: Yann Guédon; Yves Caraglio; Christine Granier; Pierre-Éric Lauri; Bertrand Muller
Journal: Methods Mol Biol Date: 2022

3. Seedling and field assessment of wheat (Triticum aestivum L.) dwarfing genes and their influence on root traits in multiple genetic backgrounds.

Authors: Cathrine H Ingvordsen; Pieter-Willem Hendriks; David J Smith; Kathryn M Bechaz; Greg J Rebetzke
Journal: J Exp Bot Date: 2022-10-18 Impact factor: 7.298

4. Genomic prediction of yield and root development in wheat under changing water availability.

Authors: Xiangyu Guo; Simon F Svane; Winnie S Füchtbauer; Jeppe R Andersen; Just Jensen; Kristian Thorup-Kristensen
Journal: Plant Methods Date: 2020-07-01 Impact factor: 4.993

5. Phenotypic variability in bread wheat root systems at the early vegetative stage.

Authors: Yinglong Chen; Jairo Palta; P V Vara Prasad; Kadambot H M Siddique
Journal: BMC Plant Biol Date: 2020-04-28 Impact factor: 4.215

6. Assessment of root phenotypes in mungbean mini-core collection (MMC) from the World Vegetable Center (AVRDC) Taiwan.

Authors: Muraleedhar S Aski; Neha Rai; Venkata Ravi Prakash Reddy; Harsh Kumar Dikshit; Gyan Prakash Mishra; Dharmendra Singh; Arun Kumar; Renu Pandey; Madan Pal Singh; Aditya Pratap; Ramakrishnan M Nair; Roland Schafleitner
Journal: PLoS One Date: 2021-03-04 Impact factor: 3.240

Review 7. Crop Root Responses to Drought Stress: Molecular Mechanisms, Nutrient Regulations, and Interactions with Microorganisms in the Rhizosphere.

Authors: Jian Kang; Yunfeng Peng; Weifeng Xu
Journal: Int J Mol Sci Date: 2022-08-18 Impact factor: 6.208

8. Root System Architecture Plasticity of Bread Wheat in Response to Oxidative Burst under Extended Osmotic Stress.

Authors: Omar Azab; Abdullah Al-Doss; Thobayet Alshahrani; Salah El-Hendawy; Adel M Zakri; Ahmed M Abd-ElGawad
Journal: Plants (Basel) Date: 2021-05-08

8 in total