Literature DB >> 33864072

Genomic prediction of hybrid crops allows disentangling dominance and epistasis.

David González-Diéguez¹, Andrés Legarra¹, Alain Charcosset², Laurence Moreau², Christina Lehermeier³, Simon Teyssèdre³, Zulma G Vitezica¹.

Abstract

We revisited, in a genomic context, the theory of hybrid genetic evaluation models of hybrid crosses of pure lines, as the current practice is largely based on infinitesimal model assumptions. Expressions for covariances between hybrids due to additive substitution effects and dominance and epistatic deviations were analytically derived. Using dense markers in a GBLUP analysis, it is possible to split specific combining ability into dominance and across-groups epistatic deviations, and to split general combining ability (GCA) into within-line additive effects and within-line additive by additive (and higher order) epistatic deviations. We analyzed a publicly available maize data set of Dent × Flint hybrids using our new model (called GCA-model) up to additive by additive epistasis. To model higher order interactions within GCAs, we also fitted "residual genetic" line effects. Our new GCA-model was compared with another genomic model which assumes a uniquely defined effect of genes across origins. Most variation in hybrids is accounted by GCA. Variances due to dominance and epistasis have similar magnitudes. Models based on defining effects either differently or identically across heterotic groups resulted in similar predictive abilities for hybrids. The currently used model inflates the estimated additive genetic variance. This is not important for hybrid predictions but has consequences for the breeding scheme-e.g. overestimation of the genetic gain within heterotic group. Therefore, we recommend using GCA-model, which is appropriate for genomic prediction and variance component estimation in hybrid crops using genomic data, and whose results can be practically interpreted and used for breeding purposes.

Entities: Chemical

Keywords: GenPred; dominance; epistasis; genetic variance; genomic models; genomic prediction; heterosis; shared data resources

Mesh：

Year: 2021 PMID： 33864072 PMCID： PMC8128411 DOI： 10.1093/genetics/iyab026

Source DB: PubMed Journal: Genetics ISSN： 0016-6731 Impact factor: 4.562

Introduction

Many plant species are presently cultivated in the form of single-cross hybrid varieties, especially when a strong heterosis effect is observed for yield-related traits (e.g. maize, sunflower, sugarbeet, etc.). These hybrids are generally obtained by crossing inbred lines originated from two complementary populations, called heterotic groups. Breeders’ objective is therefore to identify (i) the best single-cross hybrids among all possible crosses between existing inbred lines from the two groups and (ii) create new lines within heterotic group, from crosses of existing lines, which will improve the performance of candidate hybrids at a next generation. Models for genetic improvement of hybrid crops (e.g. maize) across two heterotic groups are typically based on the notions of general combining ability (GCA) and specific combining ability (SCA) (Griffing 1962; Stuber and Cockerham 1966; Bernardo 2010). The genotypic value of the cross of lines i and j, as a function of uniting gametes from i and j, can be written as follows: where GCA of line i is the average effect of a gamete when ideally crossed to all gametes from the reciprocal heterotic group. SCA of the combination of line i and j is the remainder. It is important to notice, for readers not familiar with hybrid crops, that in many hybrid crops such as maize, parents are pure homozygous individuals (inbred lines). Thus, all gametes produced by i (and j) are identical, and all F1 descendants of i and j are identical. This is different from crosses of other species such as animals (pigs for instance) where full-sibs show genetic variation. As a result, GCA contains single locus (additive, in the statistical sense) and multiple loci (additive by additive and higher additive interactions) effects. This is because the whole genotype (gamete) of the pure line is transmitted to the F1 descendants, including any possible epistatic combination, and regardless of whether loci in interaction are in the same or in different chromosomes. In this, GCA is different from the concept of Breeding Value in Animal Genetics, which captures the part of functional epistatic effects that is contained in the additive substitution effects, but it does not contain epistatic deviations as they are broken down by meiosis. Informally, the GCAs within group 1 (group 2) are the sum of additive, additive x additive, additive x additive x additive… deviations within group 1 (group 2), whereas SCA are the sum of dominance, all epistatic interaction involving dominance, and any epistatic additive interaction across both groups. Stuber and Cockerham (1966) presented the covariance across GCAs of different lines and SCAs of different pairs of lines as a function of probabilities of alleles at loci being identical by descent. These probabilities are, in their work, implicitly based on pedigree, and are the same across all loci or pairs of loci. For this reason, and based on pedigree of lines alone, some components of the variance cannot be distinguished; for instance, dominance and across-groups additive epistasis cannot be separated (Bernardo 2010). With the advent of molecular markers (SNPs), it is possible to obtain better predictions based, ultimately, on similarity across lines (or pairs of lines) measured using markers, using for instance kinship matrices based on VanRaden (2008). Bernardo (1994) transposed the concepts of Stuber and Cockerham (1966) to the prediction of hybrid lines of corn using molecular markers. These ideas have been used extensively (e.g.Massman ; Technow ; Bouvet ; Kadam ; Acosta-Pech ; Westhues ; Schrag ). However, the infinitesimal assumptions of Stuber and Cockerham (1966) are not needed or completely pertinent anymore. SNP markers allow finer distinction of patterns of relationships across and within regions. For instance, it is possible to distinguish relationship within locus and across loci—thus making it possible to split dominance and across-groups epistasis. Also, relationships across dominance deviations are not simple expressions built from additive relationships (Vitezica , 2017). Therefore, it is important to properly re-define statistical models for genomic prediction, in order to have models that are more adequate to the physical nature of the genome (finite and not infinitesimal), better understood, and potentially more accurate. The model including a GCA component for each group is convenient because individuals are selected within groups. In this work we develop a new model, called GCA-model, which corresponds to Stuber and Cockerham (1966) and to Bernardo (2010) ideas, and whose theory we re-develop from scratch for the case of markers. We rederive orthogonal models for genomic prediction in hybrid crops using the notions of effects defined “according to origin” (GCAs and SCAs), using quantitative genetics theory and considering the substitution effects of markers. We present expressions for additive, dominant and epistatic relationships across pure lines and hybrids. Then, as an illustration, we use our method, in an increasing order of complexity, to estimate variance components and predict hybrid performance using a publicly available data set (Technow ). We compare results of our orthogonal model (GCA-model) with effects defined “according to origin” to the model with effects “defined uniquely” at the hybrid level (G-model), and to the existing formulation of Technow that we show to be a simplification of our approach.

Theory

Here we derive the theory for genomic relationship matrices in the analysis of hybrid crosses of inbred lines from two populations. We draw on the tradition of separately modeling effects of gametes coming from each heterotic group (Sprague and Tatum 1942; Griffing 1962; Stuber and Cockerham 1966; Bernardo 2010). To derive the genetic model, an ideal (issued from random mating of heterotic pools), and large population of hybrids was assumed. In this way the content of a random sample of gametes depends on the allele frequency in the population. This is the traditional treatment in Quantitative Genetics. The two parental populations or heterotic groups (e.g. Dent and Flint in the case of North European maize) are named 1 and 2. Extension to more than two heterotic groups is immediate. Our aim is to split the total genotypic value of a single-cross hybrid in two statistical additive effects (one from each group), a single dominance deviation (particular to each cross) and epistatic interactions (either intra-group or across-groups). Ideally, this partition is orthogonal and all components should be estimable. Orthogonality is a definitional system that guarantees that variance components are “a priori” (before seeing the data) independent, whereas practical orthogonality depends on the information available in the data set. To our knowledge, this partition and its use in hybrid crops using markers have not been presented elsewhere.

Additive substitution effects and dominance deviations in hybrid crops

We start from a genotypic model to derive statistical effects (Falconer 1981; Bernardo 2010). Consider a biallelic single locus/gene with alleles B/b and the allele origin denoted as i = 1 and j = 2, thus population 1 has and with respective frequencies and and population 2 has and with frequencies and . The hybrid population has genotypes (frequencies) (), (), () and (). We assume additive and dominant gene action, and separate effects of gametes coming from 1 and 2. Then the genotypic value of a hybrid can be written (up to a common constant) as where is the functional additive effect for from P1, is the functional additive effect for from P2, and d is the functional dominance value of both heterozygotes ( and ). The genetic mean of the hybrid population is therefore Classically, the genotypic values of a hybrid are split into statistical additive values, one per parent, and a dominant deviation for the hybrid, as in (Bernardo 2010): where () is the additive effect of a gamete from population 1 (from population 2) combined with a gamete from population 2 (population 1), whereas is the dominant deviation. Additive values and of the gametes include average effects of each gene/allele. The average effect of alleles (, , and ) are derived from the genotypic values. The reasoning is set out in Table 1 (following Table 7.2 in Falconer (1981)). If gametes carrying from population 1 are mated at random with gametes from population 2, the frequencies of the genotypes produced will be of and of . The genotypic value of a hybrid is , that of is , and the mean of these two, taking into account of the proportions in which they occur is .

Table 1

Representation of the average effect of a gene

	Values and frequencies of genotypes produced
Type of gamete	B1B2 (a1+a2)	B1b2 a1+d	b1B2 a2+d	b1b2 0
B1	p2	q2
b1			p2	q2
B2	p1		q1
b2		p1		q1

Representation of the average effect of a gene For instance, considering allele , the difference between the mean value conditional on a particular genotype of the gamete (e.g. ) and the population mean is the average effect of the allele, . The average effects of alleles (, , and ) are therefore Now, we can derive the average effect of the allele-substitution, for instance, letting be substituted by . From the alleles taken at random from the population for substitution, a proportion will be found in genotypes and a proportion in . The substitution will, respectively, change the value from to and from 0 to (see Table 1). Thus, the average effect of the allele-substitution of population 1 is The same result can be obtained as the difference between average effects: . For population 2 , it is . The average effects of alleles can also be rewritten as function of the allele substitution effect as follows: , , and . Note that allele-substitution effects involve both functional additive and dominant effects, and allele frequencies of the other parental population. Similar expressions were presented by Vitezica but an identical functional additive effect was assumed in both parental populations, ignoring the origin of the allele. The statistical additive effects of a gamete are equal to the sum of the average effects of the alleles it carries. Thus, for a single locus, the statistical additive effects are This can also be written as and with for gametes , and for gametes Subtracting statistical additive effects from genotypic values (, , and ) gives dominance deviations which are interactions between the alleles received from parental populations. This is detailed in the Appendix, and the dominance deviation of the hybrid according to its genotype is So, the dominance deviation of a hybrid individual can be written as with Or, equivalently, for , defined as above. Finally, the model for analysis of hybrid crosses considering additive and dominance effects can be written in matrix form for a set of crosses as where, for a single locus, , ,

Derivation of additive and dominance genomic relationships

Now we extend the analysis to multiple markers, using , , , where and are matrices with as many rows as inbred lines present in each heterotic group, and as many columns as the number of markers, . The matrix has as many rows as hybrid individuals and as many columns as markers. Genotypes in pure lines are in matrices and which contain zero for genotypes and , respectively; and 1 for genotypes and , respectively. The observed B allele frequencies for marker j in the heterotic groups composed by n lines can be computed as . Matrices are obtained subtracting (which is equal to centering if is computed from observed genotypes), as for population 1. It is analogous for population 2. Now, we can set up the covariance matrices. Additive covariance matrix for population 1 (for ) assuming linkage equilibrium, is where is the variance of the allele-substitution effect of population 1. For one locus, if the population 1 is a population of pure lines (individuals are homozygotes), the genetic variance of its gametes is By construction of the matrices, and then we have for one locus the following table of gametes and their effects So, and . Assuming linkage equilibrium, we generalize this result to all markers and Therefore, we can now divide above by this variance and we have where is the additive genomic relationship matrix across lines in population 1 of size . The reasoning is identical for population 2 and only the allele frequencies change. These results are similar, but not identical, to VanRaden (2008). In particular, using VanRaden’s method 1 directly, while coding genotypes in pure lines as 0/2, results in . The reason for this discrepancy is because of the reference population used for the additive variance. For a single population with an arbitrary level of inbreeding, the covariance of additive values is expressed as the relationship matrix times the additive variance in an outbred population with the same allele frequencies (Endelman and Jannink 2012). The additive variance, defined here is for the fully inbred population, and these two definitions of additive variance differ by a factor of 2. VanRaden’s additive relationship matrix divided by two to obtain a kinship (or coancestry) matrix results in which is equal to our result. Note that the choice of reference population changes the scaling for . For the dominance deviations, the covariance matrix for hybrids, assuming linkage equilibrium, is where is the variance of the dominant effect at the locus level, defined at the hybrid population. Because gametes are uncorrelated, . The variance of dominance deviations for F1 hybrids is therefore, for one locus, . This is as in Reif , Hallauer and Vitezica although the effect is defined differently in their models, that use “uniquely defined” effects. From here and assuming LE across loci we have which results in the covariance matrix where is the dominance relationship matrix across hybrids of size . Note that we assume that allele substitution effects and dominance effects are random with respective variances and . Technow modelled specific combining abilities using element-by-element products of matrices and , following Stuber and Cockerham (1966). Clearly this is not the same as our matrix above, that results directly from modelling dominance deviations. We will show later that Technow approach, in fact, only models additive by additive across-heterotic groups epistasis, which is indeed a part of the SCA, and that their approach is an approximation to our . Also we show that the method of Stuber and Cockerham (1966) using element-by-element products to obtain relationships across SCA assumes an infinitesimal model, and should not be directly transposed to marker-based models.

Some properties of the additive and dominance relationship matrices

Matrices , and have an average diagonal equal to 1 and an average value equal to 0 across the whole matrix. This implies that estimates of variance components can be interpreted as genetic variances (Legarra 2016). Also, and , the underlying incidence matrices to , and are orthogonal (see Appendix for the proof), which implies that by construction statistical estimates are a priori independent from each other. Also, because the basic bricks and are orthogonal, extension to higher order of interaction (epistasis) is immediate and also orthogonal (as mentioned later). In the next section, we present epistatic relationship matrices.

Epistasis in hybrid populations

So far, we have written down the model for analysis of hybrid crosses including additive and dominance relationships. Now we use Kronecker products to extend the incidence matrices and to epistatic interactions. The classical model (1) including GCA and SCA effects for a hybrid individual can be written as , where is the phenotypic value of the hybrid, is the population mean, is the GCA of line , is the GCA of line and is the SCA which depends of the combination of alleles received from i and j.

Epistasis intrapopulation

Stuber and Cockerham (1966) showed that the GCA-term includes, in addition to the additive gametic effects, the additive-by-additive epistasis across loci for alleles present in the line (equation 1, page 1279), and all higher order additive interactions in the line. This is because the lines are inbred—so exactly the same gamete is always transmitted to the hybrid, contrary to animal breeding where recombination breaks down epistatic combinations. So, for instance the GCA of a gamete from population 1, considering two loci, k and m and second-order epistasis, where is the deviation due to epistatic interaction across loci in population 1. The coefficients of the incidence matrix, , for second-order epistatic effects between two loci can be computed as the Kronecker products () of the respective incidence matrices for single locus effects. A Kronecker product of orthogonal incidence matrices results in an orthogonal incidence matrix (Van Loan 2000), so the orthogonality (always under the assumption of linkage equilibrium) extends to any order of epistasis. For multiple loci, the matrix of additive-by-additive interaction effects can be written using Kronecker products of each row (corresponding to each line) of the preceding matrices as Matrix is of large size (the number of rows is ) but it is not explicitely used. Following Vitezica , we know that where is the Hadamard product; following developments in Vitezica the genomic additive-by-additive epistatic relationship matrices of lines of population 1 with themselves is thus in agreement (up to a scaling factor) with Cockerham (1954), Henderson (1984), Martini , and Vitezica . In the above expression, is the trace and is the number of lines in population 1. Therefore, the covariance matrix for the additive-by-additive interaction within population 1 is: The reasoning for population 2 is the same resulting in and The dimensions of and are and , respectively.

Epistasis across populations

According to Stuber and Cockerham (1966), the SCA-term in addition to the dominant deviation effects, includes the additive-by-additive epistasis across loci in alleles coming from different populations (equation 2, page 1279), the additive-by-dominant and dominant-by-dominant interactions, plus higher order interactions that we will not detail here as the reasoning is the same. So, SCA for a hybrid from populations 1 and 2, considering two loci, k and m, The different and come from two parental lines i and j. Let be a matrix relating hybrids to lines in population 1 with 1 in the k, l position if the k-th hybrid comes from the l-th line in population 1 and a similar matrix linking hybrids to lines in population 2. The covariance matrix for the additive-by-additive interaction between populations 1 and 2 () can be calculated as: where is the number of hybrids and the matrix has size . In other words, the matrix is formed as follows: For each pair of hybrids with respective parents (from population 1), (from population 2) and (from population 1) and (from population 2) do: Scale the resulting matrix to an average diagonal of 1. Technow used to model the relationship matrix of SCA’s. We have shown that this is incorrect because models across-population epistasis (interactions across loci for alleles coming from different heterotic groups) but it does not model dominance deviations (interactions within loci). We will see in the Discussion section that is in fact an approximation of . Relationships for the other pairwise epistatic interactions (all of them present in the SCA and of size ) are: Additive in population 1 by dominant: Additive in population 2 by dominant: Dominant by dominant: In the same manner, it is possible to derive relationships for third and higher order interactions, using Hadamard products of , and including the incidence matrices for across-population interactions. As the two gametes in each hybrid are uncorrelated, the genotypic variance (Stuber and Cockerham1966) (ignoring third and higher order epistatic terms) is where and In our analysis on real data, we will ignore the epistatic interaction terms and as their estimate is very inaccurate. We remark that a breeder is interested in (because it indicates how much variation is expected in hybrids) but also in , which determines the genetic progress of hybrids that is achievable by selecting lines, crossing them, and producing new inbred lines within heterotic groups (Stuber and Cockerham 1966). This is because epistasis combinations are broken down by recombination when creating new source populations for line development within each heterotic group.

Materials and methods

As illustration of the genomic relationship matrices developed here, variance components were estimated using the publicly available data set from the breeding program of the University of Hohenheim (https://doi.org/10.1534/genetics.114.165860) (Technow ).

Phenotypes and genotypes

Here, a brief description of the phenotypic and genotypic data set is given (more details in Technow ). We analyzed the adjusted entry means of single-cross hybrids for grain yield expressed in quintals per hectare (q ha−1), representing an incomplete factorial design between Dent and Flint inbred lines (two genetically divergent heterotic groups) with high linkage disequilibrium (LD) within heterotic groups. The hybrid data were collected in 14 years (1999-2012) and on average, 95 hybrids produced from 15 Dent and 11 Flint lines, were tested each year. All parental inbred lines were genotyped with the Illumina Maize SNP50 BeadChip (Ganal ). Here, we used the 35,478 SNP available after quality control (see details in Technow ). Markers that were monomorphic in one group but segregating in the other group were kept. Genotypes of tested single-cross hybrids were derived from parental genotypes.

Genomic evaluation models

Our new GCA-model was compared with Stuber and Cockerham (1966) model for effects of genes “defined uniquely”, whose implementation in a marker based-model for hybrid crops is actually the NOIA system (Álvarez-Castro and Carlborg 2007; Vitezica ). We will call this the G-model. Both genomic models were used for analysis of hybrid records (either estimation of genetic parameters or cross-validation). Note that the difference from our GCA-model to previous studies is that we use genomic relationship matrices that we have completely developed from theory (and not by transposition of pedigree-based concepts). GCA model. Here, effects were defined “according to origin” (Sprague and Tatum 1942; Griffing 1962; Stuber and Cockerham 1966), but with relationship matrices developed in the Theory section. So, GCA-model for the () hybrid resulting from the combination of parental lines (from population 1) and (from population 2) can be written as: where is the hybrid phenotype (entry mean), is the overall mean. Our models differ in the explicit modelling of the GCA and SCA into sub-components due to additive, dominant and epistatic statistical effects. In classical settings, GCA and SCA may be modelled either as fixed or as independent random effects (Bernardo 2010; Hallauer ). For instance, Giraud used a model with random, uncorrelated GCA and SCA for the analysis of a half-diallel design. As mentioned before, GCA contains additive, additive x additive and further within-heterotic epistatic effects, whereas SCA can be split into dominance, across-population additive effects, and epistatic effects including dominance. For instance, where all terms are obviously not always fit in the model or estimable in practice. Still, and because there are potentially several hybrids for each line, we use “residual genetic” effects (e.g.Endelman ) (see explanation below—similar to the “permanent environmental effect” in animal breeding) as a catch-all term that includes genetic effects that are not explicitely included. For instance, if we assume , the term captures and further terms, but if we assume , then . The “residual genetic” r effects are assumed random and uncorrelated, and can be estimated because there are repeated hybrids for each line. In this manner the models are more robust to the fact of fitting genetic effects up to an arbitrary complexity that may be enough or not. The strategy of fitting “catch-all” “residual genetic” effects could also be followed for the SCA, but because hybrids are not repeated in the entry means, “residual genetic” r effects are not estimable as they are confounded with the residual. Then we make several choices of genetic effects to be explicitly included in GCA and SCA, leading to several models that will be described later. The most complete model includes , , . We ignore second-order epistasis including dominance and higher order epistatic interactions. Thus, the most complete model is: which in vectorial form (all phenotypes in vector ) is: where the incidence matrices assign hybrids to parents in each heterotic group. The additive effects of gametes from each inbred line are assumed distributed as and , the dominance deviation effects for each hybrid combination, as ; the epistatic interaction effects within each heterotic group are , and between heterotic groups is . Finally, is the vector of random residual genetic effects , is the vector of random residual genetic effects , and is the vector of random residual effects . All the relationship matrices have been defined in the Theory section. Fit of “residual genetic” GCA effects. As discussed before, the GCA effect conceptually contains within-population additive effects and additive epistatic interactions of any order. As not all these interactions are explicitely modelled, and because pure lines are repeated in hybrids, we fit “residual genetic” effects in GCA-models. For instance, if only additive effects are fit, this “residual genetic” effect account for all additive epistasis within-group present in the GCA effect. Fitting residual genetic GCA effects is similar to fitting individual permanent environmental effects in animal breeding (e.g. for a cow that gives repeated performances of milk yield). It is known in animal breeding that this effect captures, among other things, genetic effects not explicitly modelled such as dominance or epistasis (Kruuk 2004; Vitezica ). Indeed, historically GCAs have been estimated as random, unrelated effects, for instance in diallel designs (Sprague and Tatum 1942; Hallauer ). Thus, all the models detailed above (shown in Table 2) include “residual genetic” effects.

Table 2

Definition of genomic models for maize single-cross hybrids

				Variances
Models	Effects	Model Code	Additive	Dominance	Epistasis	Residual genetic
GCA	gA1+gA2+r	GCA:A	σA12,σA22			σr(1)2,σr(2)2
	gA1+gA2+gD+r	GCA:AD	σA12,σA22	σD2		σr(1)2,σr(2)2
	gA1+gA2+gAA1,2+r	GCA:AAA1,2	σA12,σA22		σAA1,22	σr(1)2,σr(2)2
	gA1+gA2+gD+gAA1,2+r	GCA:ADAA1,2	σA12,σA22	σD2	σAA1,22	σr(1)2,σr(2)2
	gA1+gA2+gD+gAA1,1+ +gAA2,2+gAA1,2+r	GCA:ADAA1,1AA2,2AA1,2	σA12,σA22	σD2	σAA1,12,σAA1,22 ,σAA2,22	σr(1)2,σr(2)2
G	gAH	G:A	σAH2
	gAH+gDH	G:ADH	σAH2	σDH2
	gAH+gAAH	G:AAAH	σAH2		σAAH2
	gAH+gDH+gAAH	G:ADHAAH	σAH2	σDH2	σAAH2

GCA-model (effects () and variances () defined within heterotic group): additive ( and ), dominance (D), “residual genetic” () and additive-by-additive epistasis (AA) within heterotic groups ((1,1) and (2,2)) and between heterotic groups (1,2).

All the models detailed above were also run without the “residual genetic” effect term.

G model. This model ignores the origin of the gametes and uses a “uniquely defined” effect per hybrid (Stuber and Cockerham 1966), as developed in a genomic context by Vitezica using the NOIA approach, to correctly model dominance deviations under the constraint that hybrids are not in HWE. The G-model for single-cross hybrid individuals can be written as: where are the additive genetic effects of hybrids distributed as , the dominant genetic effects are , and the additive-by-additive epistatic interaction effects are . The matrices , and are the additive, dominant and additive-by-additive genomic relationship matrices, defined as (Vitezica ) where the matrix has elements equal to for genotypes , and , and is the frequency of at the marker of the hybrid population. It is the same as VanRaden’s G but with a different denominator to account for lack of HWE. The dominance matrix is where contains elements for each individual and locus equal to according to Vitezica . This is different from the matrix proposed by Su which does not correctly model dominance deviations (and captures part of the GCA), and it is also different from the matrix in Vitezica (which assumes HWE). The additive-by-additive epistatic relationship matrix can be written as

Sub models and model comparison

Variance components were estimated for nested models (GCA-model and G-model) that added, in succession, additive effects (), dominance effects (), additive-by-additive genetic effects () in addition to “residual genetic” r effects of lines in the GCA-model. The additive-by-additive epistatic effects can be interactions of loci within line from population 1 , within line from population 2 and interactions between loci across lines from populations 1 and 2 , or within hybrids . For details, see Table 2. Also, the variance attributable to GCAs is , and for SCAs is . Also, another set of models was identical but “residual genetic” r effects of lines were not fit in any of the models. Goodness-of-fit of models was compared based on the deviance information criterion (DIC), which balances model fit and model complexity to avoid overfitting (Spiegelhalter ). The lower the DIC value, the better fit of the model to the data. Predicted ability of phenotypes of “untested hybrids” for the different models was tested performing a T2-T1-T0 cross-validation scheme as in Technow et al. (2014). The prediction accuracy of T2, T1, and T0 hybrids (two, one and zero parents, respectively) was obtained for 300 hybrids (ND = 90 and NF = 53 of Dent and Flint parental lines) in the training set. Predictive performance of hybrids was computed separately for each group of hybrids by dividing the correlation of predicted and observed values by . is the genomic broad-sense heritability. The estimated with the full GCA-model was used. The cross-validation process was repeated 100 times. Estimation of variance components and cross-validation were performed in a Bayesian approach using the BGLR R-package (Pérez and de los Campos 2014). To speed up computation, the eigenvalue decomposition of the variance-covariance matrices was done according to Acosta-Pech and modeled as Bayesian Ridge Regression (BRR). For each model, inferences were based on 30,000 samples collected from 60,000 iterations after discarding 30,000 for burn-in and thinning of 10. Convergence of variance parameters was inspected by trace plots and convergence diagnostic was assessed using the BOA R-package (Smith 2007).

Data availability

The data set from the breeding program of the University of Hohenheim is available with the publication of Technow (https://doi.org/10.1534/genetics.114.165860). Estimation of variance components and cross-validation were performed in a Bayesian approach using the BGLR R-package (Pérez and de los Campos 2014). The software can be downloaded from https://cran.r-project.org/web/packages/BGLR/index.html. A program to build the genomic matrices is available at http://genoweb.toulouse.inra.fr/~zvitezic/maize.

Results

Variance components estimates and heritabilities

Variance components and broad-sense heritabilities () for GCA- and G-models are shown in Tables 3 and 4, and for the model without “residual genetic” effects, in Table A2.

Table 3

Estimated posterior means and standard deviation (in parenthesis) of genetic variance components obtained with two genomic models for maize grain yield

Model Code	Additive		Dominance	Epistasis			Residual genetic		Residual
Model Code	σA12,σA22 or σA(H)2		σD2 or σD(H)2	σAA1,12	σAA2,22	σAA1,22 or σAAH2	σr12	σr22	σe2
GCA:A	23.16 (4.78)	12.92 (3.49)					6 (1.59)	6.47 (1.81)	17.63 (0.77)
GCA:AD	22.97 (4.67)	13.07 (3.5)	3.59 (0.72)				5.2 (1.47)	5.9 (1.71)	15.01 (0.79)
GCA:AAA(1,2)	22.87 (4.75)	13.05 (3.55)				4.75 (0.93)	5.13 (1.43)	5.89 (1.72)	13.8 (0.84)
GCA:ADAA1,2	22.82 (4.84)	13.02 (3.46)	2.48 (0.54)			3.6 (0.86)	4.84 (1.45)	5.51 (1.66)	13.46 (0.82)
GCA:ADAA1,1AA2,2AA1,2	19.06 (4.78)	10.61 (3.48)	2.3 (0.56)	5.41 (2.01)	5.57 (2.15)	3.24 (0.79)	3.8 (1.23)	4.56 (1.56)	13.67 (0.81)
G:A	51.77 (6.75)								18.02 (0.79)
G:ADH	47.81 (6.35)		6.18 (1.06)						14.97 (0.78)
G:AAAH	42.22 (6.30)					10.2 (1.76)			13.89 (0.82)
G:ADHAAH	42.26 (6.08)		4.14 (0.81)			7.19 (1.55)			13.59 (0.80)

Estimates of additive ( or ), dominance ( or ), additive-by-additive (, , or ), residual genetic effects (, ) and residual () variances for GCA- and G-models and successively added additive effects (A), dominance effects (AD), additive-by-additive effects (AD(AA)). Superscripts 1 and 2 in parenthesis refers to dent and flint heterotic groups, respectively. The additive-by-additive epistatic effects can be interactions between loci within group ( and ), across groups or within hybrids .

Table 4

Estimated posterior means and standard deviation (in parenthesis) of broad-sense heritability and Deviance Information Criteria (DIC) values obtained with two genomic models for maize grain yield

Model Code	H2	DIC
GCA:A	0.73 (0.05)	7321.5
GCA:AD	0.77 (0.04)	7254.27
GCA:AAA(1,2)	0.79 (0.04)	7201.91
GCA:ADAA1,2	0.79 (0.05)	7203.69
GCA:ADAA1,1AA2,2AA1,2	0.80 (0.04)	7209.92
G:A	0.74 (0.03)	7335.75
G:ADH	0.78 (0.02)	7260.88
G:AAAH	0.78 (0.02)	7206.89
G:ADHAAH	0.80 (0.02)	7212.46

GCA- and G-models are models that successively added additive effects (), dominance effects (), and additive-by-additive genetic effects (). The additive-by-additive epistatic effects can be interactions between loci within group ( and ), across groups or within hybrids . Superscripts 1 and 2 in parenthesis refers to dent and flint heterotic groups, respectively. is the genomic broad-sense heritability.

	Genotype at P2
Genotype at P1		B2B2	b2b2
	B1B1	q1α1+q2α2	q1α1-p2α2
	b1b1	-p1α1+q2α2	-p1α1-p2α2

Definition of genomic models for maize single-cross hybrids GCA-model (effects () and variances () defined within heterotic group): additive ( and ), dominance (D), “residual genetic” () and additive-by-additive epistasis (AA) within heterotic groups ((1,1) and (2,2)) and between heterotic groups (1,2). All the models detailed above were also run without the “residual genetic” effect term. Estimated posterior means and standard deviation (in parenthesis) of genetic variance components obtained with two genomic models for maize grain yield Estimates of additive ( or ), dominance ( or ), additive-by-additive (, , or ), residual genetic effects (, ) and residual () variances for GCA- and G-models and successively added additive effects (A), dominance effects (AD), additive-by-additive effects (AD(AA)). Superscripts 1 and 2 in parenthesis refers to dent and flint heterotic groups, respectively. The additive-by-additive epistatic effects can be interactions between loci within group ( and ), across groups or within hybrids . We consider first Table 3 and the GCA-model. The total variance of GCA e.g. for population 1 is , and this changes very little across models: total GCA variance for population 1 oscillates between 27.66 and 29.16, and for population 2 between 18.53 and 20.74. Within GCA, each individual component varies across the different models but changes are not large; the major change is the diminution from 23.16 to 19.06 in the estimate of when is fit, and similarly ( changes from 12.92 to 10.61) for population 2. This is due to reassignment of total GCA variance across its component parts; the effect does not fully account for the absence of the variance component. As for the SCA, its two components ( and ) also show some changes and there is some reassignment from one to the other. Still, for GCA and SCA components changes are not great and enter well within the confidence intervals of the estimates. On the contrary, when “residual genetic” effects were not fit (Table A2), changes were much larger. In particular, the inclusion of within-group epistatic variances ( and ) reduced additive variances from 31.08 to 22.30 for , and from 22.11 to 14.42 for in the full model. Thus, we conclude that the GCA-model is reasonably (empirically) orthogonal when “residual genetic effects” for GCA of lines are fit, regardless of the level of complexity for the genetic modelling (additive, additive by additive, etc.). Using “residual genetic effects” allows to accommodate non-additive effects of lines not explicitly modelled. Some of these changes can be attributed to similarity of relationship matrices. The correlation between and (and regression coefficient of ) was 0.39 (0.92); and the correlation between and (and regression coefficient of ) was 0.55 (1.63). These results show a high similarity between these relationship matrices and explain that the additive effects tend to capture additive by additive effects if the latter are not explicitly fit (as described above and shown in Table A2), something that is ameliorated fitting “residual genetic” effects (Table 3). Similarly, and matrices were highly correlated (0.88), suggesting that it is difficult to accurately separate dominance and across-groups additive by additive epistasis. To avoid redundancy between the and matrices, we corrected the matrix by subtracting the contribution of dominance as in Alves . However, the resulting had a similar correlation to . Also the regression coefficient of was 1.02, indicating that elements of are unbiased (but shrunken) estimators of the elements of . For the G-model, Table 3 shows that the additive variance estimate in the model (51.77) was higher than the addition of additive variance estimates for Dent and Flint groups (36.08) in the model. Similarly, estimates of dominance () and epistasis within hybrids () variances were higher than in GCA-models. Both results are in agreement with Stuber and Cockerham (1966). The inclusion of the additive-by-additive epistasis effects in the model reduced the estimate of additive variance () from 51.77 to 42.26. Furthermore, similar to GCA-models, the sum of estimates of and (11.33) obtained with the model was lower than the sum (16.38) of the estimates of (in the model) and of (in the ). In the G-model, it is impossible to include “residual genetic” effects because the model is fit at the hybrid and not at the line (GCA) level, and in this data set there is a single record per hybrid. Thus we conclude that the G-model, although constructed with an orthogonal formalism, is not empirically orthogonal with this data set. These results can also be explained because there was a correlation of 0.55 between and and of 0.67 between and . Estimates of genomic broad-sense heritability for grain yield ranged from 0.73 to 0.80, and from 0.74 to 0.80 in the full GCA- and G-models, respectively (Table 4). Estimates of residual variances were similar between GCA-models and G-models (Table 3). The estimates of residual variances decreased as the non-additive genetic effects were added in both GCA- and G-models.

Goodness of fit

Table 4 shows the Deviance Information Criteria (DIC) values for each model. Inclusion of non-additive effects in GCA- and G- models improved the goodness of fit of both models. DIC values between GCA- and G-models were very similar. Among all models, the best model (with lower DIC value) was the . Among the G-models, the best one was that accounted only for additive and additive-by-additive epistatic effects. Models including both dominance and epistatic effects had slightly worse DIC values than the best one. This can be explained because increasing the number of parameters may lead to overfitting and thus, it penalizes DIC values. Based on the inspection of the trace plot and convergence diagnostic with BOA R-package (Smith 2007), all estimates of the variance parameters in all models converged to the posterior distribution.

Cross-validation

The results for cross-validation are shown in Table 5. In general, prediction accuracy was considerably high for maize grain yield (>0.80 in all cases) and similar values were obtained with the G- and GCA- models in this data set. The inclusion of non-additive genetic effects did not improve the prediction accuracy of hybrid values in the testing sets compared to models including only additive effects. The only factor that counted in the predictive ability was the fact of having both, one, or no parents in the training data set, with respective prediction accuracies 0.80, 0.88 and 0.92.

Table 5

Predictive accuracy of T2, T1 and T0 hybrids obtained with two genomic models for maize grain yield

Model Code	T2	T1	T0
GCA:A	0.92 (0.03)	0.88 (0.02)	0.80 (0.08)
GCA:AD	0.92 (0.03)	0.88 (0.02)	0.80 (0.08)
GCA:AAA(1,2)	0.92 (0.03)	0.88 (0.02)	0.80 (0.08)
GCA:ADAA1,2	0.92 (0.03)	0.88 (0.02)	0.80 (0.08)
GCA:ADAA1,1AA2,2AA1,2	0.92 (0.03)	0.88 (0.02)	0.80 (0.08)

G:A	0.92 (0.03)	0.88 (0.03)	0.81 (0.08)
G:ADH	0.92 (0.03)	0.88 (0.03)	0.81 (0.08)
G:AAAH	0.92 (0.03)	0.89 (0.03)	0.81 (0.08)
G:ADHAAH	0.92 (0.03)	0.88 (0.03)	0.81 (0.08)

Estimated posterior means and standard deviation (in parenthesis) of broad-sense heritability and Deviance Information Criteria (DIC) values obtained with two genomic models for maize grain yield GCA- and G-models are models that successively added additive effects (), dominance effects (), and additive-by-additive genetic effects (). The additive-by-additive epistatic effects can be interactions between loci within group ( and ), across groups or within hybrids . Superscripts 1 and 2 in parenthesis refers to dent and flint heterotic groups, respectively. is the genomic broad-sense heritability. Predictive accuracy of T2, T1 and T0 hybrids obtained with two genomic models for maize grain yield GCA- and G-models are models that successively added additive effects (), dominance effects (), and additive-by-additive genetic effects (). The additive-by-additive epistatic effects can be interactions between loci within group ( and ), across groups or within hybrids . Superscripts 1 and 2 in parenthesis refers to dent and flint heterotic groups, respectively. The values refer to the mean (standard deviation) over 100 cross-validation runs with the different models. For T2, T1 and T0 group hybrids, two, one and zero parents were tested in the training set.

Discussion

In this study, the theory in the analysis of hybrid crosses of inbred lines from two populations using relationship matrices was revisited in a genomic context. Models for genomic prediction in hybrid crops using the notions of effects defined “according to origin” (GCAs and SCAs) were rederived and expressions for additive, dominant and epistatic relationships for hybrids were presented. These models were applied to a public data set to exemplify the theory and its consequences in real life.

Insights into relationships for dominance and across-population pairwise epistasis

A surprising fact (to us) is that, in the classical pedigree-based methods, it is not possible to disentangle dominance deviations from across-population epistasis, whereas using markers it is possible. This seems not to have been recognized by previous researchers, leading to the wrong conclusion that in a genomic setting the relationship of dominance deviations is a product of corresponding additive relationships of parental lines. In this section we try to explain why such a difference. Stuber and Cockerham (1966) used the notion of identity by descent (IBD) coefficients to model relationships, where the starting block is the use of coancestries - the probability that two alleles drawn at random from each of two pure lines are identical by descent. Although in our work we use genomic relationships (that are not probabilities), the concept of IBD is useful in the following. For two hybrids, the IBD dominance relationship coefficient at locus k (say ) is the probability that two complete genotypes at locus k in hybrids (i and j) are identical, and because the lines are fully inbred, this is the joint probability that both “parents1” (ancestors from population 1) are IBD at locus k (with probability ) and both “parents2” (ancestors from population 2) are IBD at locus k (with probability ) (see Figure 1). This results in . Across all m loci, . However, in practice, pedigree-based coancestries at specific loci are not observable and they are replaced by infinitesimal coancestries: resulting in which is the expression presented by Stuber and Cockerham (1966). The approximation results from the fact that the genome is finite. For instance, if there were m = 3 loci, there would be 3 local IBD, one at each locus, whose average will not in general be the same as the pedigree-based IBD which does assume infinite loci (Hill and Weir 2011). In an infinitesimal model, the approximation is exact.

Figure 1

Dominance relationship across two hybrids for locus k.

Dominance relationship across two hybrids for locus k. Now we address the across-population epistatic additive by additive relationships. Consider two loci k and l. In an IBD framework, across-population epistatic additive by additive relationship for hybrids i and j at two loci k and l (say ) is the joint probability that both “parents1” (ancestors from population 1) have the same genotype at locus k and that both “parents2” (ancestors from population 2) have the same genotype at locus l (see Figure 2). Thus, . Whole-genome epistatic relationship would therefore be

Figure 2

Additive by additive epistatic across-population relationship across two hybrids for locus k and l.

Additive by additive epistatic across-population relationship across two hybrids for locus k and l. However, based on pedigree, the different and are not observable and they are replaced by infinitesimal coancestries resulting in the approximation . Again, in an infinitesimal model the approximation is exact. It is worth noting that involves relationships across pairs of loci whereas involves relationships within single loci. On average, pairs of loci are transmitted in a manner similar to transmission of single locus (for instance two neighboring markers are often transmitted together), which explains why is an estimator (albeit not necesssarily a good one) of , and it explains why the elements of are unbiased (but not necessarily accurate) estimators of the elements of , as shown by the results. Thus, we have shown that the Stuber and Cockerham (1966) relationships assuming pedigrees are only exact under infinitesimal models. In previous sections we have shown that observing the genome (i.e. with markers), different relationships can be formed for each, additive substitution and dominant and epistatic deviations. Thus, contrary to pedigree-based formulations, a marker-based formulation allows disentangling of the different variance components. Thus, in pedigree-based models the dominance and across-population epistatic relationships are conceptually different, but the lack of other information forces to use the same estimator for both. This is not the case in marker-based models, where we can actually observe different relationships within locus or across loci from two populations.

Partition of genetic variance components and heritability

The partition of the genetic variance in terms of statistical additive effects, and dominance and epistatic deviations effects, was possible using the relationship matrices developed here. In our model, estimates of additive genetic variance based on allele substitution effects are useful for selection or in the prediction of potential selection response in pool improvement. Vitezica compared a classical model (in terms of statistical values for breeding purposes) with a genotypic model (biological values at the gene level) proposed by Su . When the genotypic model is used, additive and dominant genotypic variances are obtained. Both models are able to explain the data but their results and interpretation is different (Vitezica ; Varona ). The genotypic model has been used for hybrid genomic prediction (Fristche-Neto ; Werner ; Alves ; Ramstein ), but estimates of genotypic additive variance should not be interpreted for breeding purposes. The GCA- (proposed here) and G-models are equivalent models to explain the data only if all relevant gene actions (i.e. high order interactions) are included (Stuber and Cockerham 1966), but it is impossible to ascertain if all relevant interactions are included. In our results, both definitional systems perform similarly for prediction. However, as the G-model assumes gene effects uniquely within hybrids and does not provide additive values within pool, it can not be directly used for the selection of inbred lines within pools for recurrent pool improvement. Thus the GCA-model is more useful. Orthogonal partitioning of the effects has been described extensively (e.g.Cockerham 1954; Kempthorne,1954; Lynch and Walsh 1998) for classical HWE populations but also for hybrid crosses (e.g.Griffing 1962; Stuber and Cockerham 1966; Bernardo 1996). Statistically, orthogonality means that inclusion of new terms in the model does not change the definition (in practical terms: the estimates) of already included effects in an ideal, infinitely large population. For instance, by construction, in an orthogonal model there is no covariance across statistical additive and dominance effects. This implies that the covariance across hybrids can be split in covariance due to additive effects, covariance due to dominance deviations, and so on (Lynch and Walsh 1998). Another advantage of using orthogonality in Genetics and breeding is the interpretability. It is the only way to carry out the estimation of GCA (additive “statistical” effects + within-group epistatic “statistical” interactions) in an unambiguous manner, i.e. such that their definitions do not depend on other genetic terms that are fitted in the model. In practice, additive, epistatic and other variances can not be accurately disentangled with a small data set and many (unknown) QTL loci. However, even with a thousand records and thirty thousand markers (as in this work), it still makes sense to orthogonally define the genetic effects in the model. Not using orthogonal partitions might lead to ambiguous definitions of effects and to potential mistakes. For instance, if the additive variance is inflated, a possible consequence is that the genetic progress can be overestimated. If the dominance variance is inflated, the role of assortative mating of pairs of lines to produce a hybrid could be exaggerated. In our work we used orthogonal definitions of effects and the corresponding relationship matrices, as well as “residual genetic” effects to account for unmodelled higher-order effects. In this manner we obtained, in the GCA-model, empirically orthogonal estimates of additive, dominance and epistatic variances for maize grain yield. In the GCA-model, after fitting the “residual genetic” effect, additive variances were similar across different models (∼22 and ∼12 for group 1 and 2, respectively: see Table 3) showing empirical orthogonality (Hill and Mäki‐Tanila 2015; Vitezica ). For planning the breeding scheme (to estimate genetic gain and selection of within pools crosses), it is important to obtain good estimates of the genetic variance, and therefore we recommend fitting “residual genetic” effects, in order to avoid overestimation of the genetic additive variance. The latter option is only possible if each line contributes to several phenotyped hybrids. In the G-model, when within-group epistatic effects were not fitted, additive variance was overestimated. Similar results were observed by Bernardo (1995). He attributed this to multicollinearity between the additive and within-group epistatic relationships, as we observe. Working with repeated measures per individual, Vitezica fitted a G-model with “residual genetic” effects and they obtained empirically unbiased estimates of additive variance. However, in the present work it was not possible to fit “residual genetic” hybrid effects in the G-model because in our dataset each hybrid has a single record (adjusted entry means). Genomic relationship matrices for within and across groups epistasis (in the full GCA-model) allows to partition the genetic variance in terms of GCA and SCA effects, as was originally defined by Stuber and Cockerham (1966) in an infinitesimal context. With our model, it is possible to split the GCA effect into the additive gametic effect and the additive-by-additive epistasis interaction within the line; and split the SCA effect into dominance deviation effect and additive-by-additive epistasis across groups. This has practical implications in hybrid breeding programs that will be discussed later. Compared to the estimates of genetic variance component from Technow , we obtained similar estimates with the model (see Table 3). This makes sense because, as indicated in the theory, the estimate of SCA variation from Technow , is in fact the estimate of epistasis variation across populations . Their entry-mean heritability was 0.87, whereas our genomic estimate of broad-sense heritability was slightly lower (0.81). Differences between our and their estimates are mainly because we used entry means (publicly available) instead of the whole data set, which can be seen through the estimated residual variance which was much lower (e.g. ∼17) than their estimated values (179). Models with lower DIC values better fit the data, and a difference less than 7 units is often considered as irrelevant (Plummer ). In general, the inclusion of non-additive genetic effects improved the goodness of fit to the data in both GCA- and G- models in this set of hybrids. This result agrees with previous studies in maize hybrids ( Ferrão ; Alves ; Hunt ). DIC values obtained with the GCA-model were similar to those obtained in G-models, indicating that they are equivalent models in terms of fitting the data. The best model, with a best balance between goodness of fit and model complexity, was the , which corresponds to a frequently used model in genomic prediction of hybrids (Technow ). That means this model is efficient to fit the data. However, fit to the data is not the only aspect that should be considered—interpretation of the model in a genetic context is important. Overall, cross-validation analyses yielded a high prediction accuracy of hybrid performance (>0.80). This is because a high heritability generally results in high prediction accuracy, as was showed theoretically and empirically (Daetwyler ; Combs and Bernardo 2013). Inclusion of non-additive genetic effects did not show improvement in prediction accuracy. This result agrees with other studies using real data where virtually no benefit was observed by including SCA effects in genomic prediction models of inter-heterotic-group hybrids (Bernardo 1994; Schrag , 2018; Maenhout ; Kadam ). This is because in inter-heterotic-group hybrids the proportion of SCA variance is often low and GCA high (Reif ). We used the splitting of cross-validation considering T2, T1 and T0 (groups of hybrids with two, one and zero parents known in the training set) as in Technow . Our predictive abilities were comparable to those reported by Technow . For instance, the correlation obtained with the results in values of 0.92, 0.88 and 0.80, which are close to the correlations of 0.91, 0.85 and 0.77 (for 300 hybrids in the training set) for T2, T1 and T0, respectively, reported by Technow for grain yield. Assuming marker effects defined uniquely at the hybrid level (G-models) gave similar prediction accuracy than assuming gene effects according to origin (GCA-models). This result was also reported by Technow with the same data set, but also by Alves who analyzed a population of hybrids derived from a convergent population. Thus, GCA- and G- models are equivalent in terms of predictive ability of hybrid performance. However, our aim in this work is to introduce a more meaningful model (the GCA-model), and its superiority is not to be considered only in terms of better prediction ability in the hybrids.

Practical implications in hybrid breeding

The way of partitioning the genetic variance is to a certain extent a matter of convenience. Partitioning in terms of GCA (within group) is more convenient because inbred lines are actually created and selected within group. The magnitude of the GCA variance gives to the breeder an idea of how much overall genetic variation coming from the parents is expected in the hybrids. Further, splitting the GCA variance into additive and epistasis within group is relevant at the moment of planning the genetic progress in maize breeding programs. The genetic improvement in hybrid performance is through the selection of inbred lines. So that, breeders create new segregating (e.g. F2) populations by crossing elite lines within groups followed by subsequent generation of inbreeding to develop new inbred lines. Therefore, the particular additive-by-additive (and higher order) epistatic combination existing in a particular elite line is not transmitted as a whole to its F2 (and further selfing) progeny, because meiosis and recombination shuffles alleles of the two parents in the cross, breaking down the original epistatic combinations present in the elite inbred lines and creating new epistatic combinations. Thus, the use of the additive variance, instead of the total GCA variance, is more appropriate for the prediction of genetic progress that is achievable by selecting within heterotic pools (Stuber and Cockerham 1966). In addition, variance of epistasis within groups is expected to be converted in new additive genetic variance in the long term by random drift, thus, it affects the long-term selection response indirectly (Hill 2017). Also, for pool improvement, it is better to use estimates of additive effects instead of estimates of GCA, because the first reflect better expected genetic progress. Splitting the SCA variance into dominance deviations and epistasis across groups could also have practical implications. Estimates of additive and dominance effects might be important for hybrid pool development. For instance, Zhao suggested to use additive and dominance effects from an incomplete factorial in order to develop heterotic pools in wheat. Further, estimates of dominance deviations are relevant in the definition of mate allocation procedures (Varona ); for instance, they could be used to maximize hybrid performance or maintain diversity for long-term genetic gain in hybrid breeding programs (e.g.Allier ). In maize, there is evidence of directional dominance (Reif ; Ramstein ). Indeed, directional dominance as a biological mechanism should exist, given that hybrids show heterosis. When there is directional dominance (i.e. a higher percentage of positive than negative dominance effects, ), overall heterosis could be considered in the genetic evaluation model. If individuals expressing the trait show considerable variation in heterozygosity (e.g. in a diallel design with crosses within- and across- groups), a more diverse individual will show more positive heterosis at the trait. De Boer and Hoeschele (1993) showed analytically that not fitting this heterosis (usually as a covariate) leads to spurious overestimation of dominance variation, as shown with real data (Xiang , Aliloo , Varona ). Nonetheless, preliminary results in this work showed that heterosis (measured as number of heterozygotic loci) was very similar across hybrids and fitting heterosis in the models led to very similar results (not shown).

Conclusions

Models developed here, with effects defined according to origin (GCA-), and using genomic relationships properly defined for each statistical component, allow for a proper partition of statistical additive effects, dominance deviations, and epistatic deviations, in hybrids derived from inbred lines from two populations. Contrary to common belief, using SNP genotypes, it is possible to split SCA into dominance deviations and across-groups epistasis, and to split GCA into within-line additive effects and within-line epistatic effects. Our GCA-model is appropriate for genomic prediction and variance component estimation in hybrid crops using genomic data, and its results (estimates of genetic variance components, breeding values and deviations) can be practically interpreted and used for breeding purposes.

Genotype	Frequency	gA12	gA1
B1	p1	q1α12	q1α1
b1	q1	-p1α12	-p1α1

	Genotype at P2
Genotype at P1		B2B2	b2b2
	B1B1	a1+a2	a1+d
	b1b1	a2+d	0

	Genotype at P2
Genotype at P1		B2B2	b2b2
	B1B1	q1a1+q2a2-p1q2+q1p2d	q1a1-p2a2+1-p1q2-q1p2d

	b1b1	q2a2-p1a1+1-p1q2-q1p2d	-p1a1-p2a2-p1q2+q1p2d

Table A1

Values of genotypes in a two-allele system, measured as deviation from the population mean

Hybrid Genotypes	Frequency	Assignedgenotypic values	Deviations from population mean EG
Hybrid Genotypes	Frequency	Assignedgenotypic values	G*	gA1+gA2	gD
B1B2	p1p2	a1+a2	q1a1+q2a2-p1q2+q1p2d	q1α1+q2α2	-2q1q2d
B1b2	p1q2	a1+d	q1a1-p2a2+1-p1q2-q1p2d	q1α1-p2α2	2q1p2d
b1B2	q1p2	a2+d	q2a2-p1a1+1-p1q2-q1p2d	-p1α1+q2α2	2p1q2d
b1b2	q1q2	0	-p1a1-p2a2-p1q2+q1p2d	-p1α1-p2α2	-2p1p2d

is the total genotypic value of a hybrid deviated from the population mean.

is the additive-effect portion of a hybrid’s genotypic value.

is the dominance deviation of the hybrid.

Table A2

Estimated posterior means and standard deviation (in parenthesis) of genetic variance component obtained with GCA-model without including residual genetic effects from Dent and Flint groups

Model Code	Additive		Dominance	Epistasis
Model Code	σA12,σA22 or σA(H)2		σD2 or σD(H)2	σAA1,12	σAA2,22	σAA1,22 or σAAH2	σe2	H2
GCA:A	33.89 (5.52)	23.35 (4.56)					18.01 (0.79)	0.76 (0.02)
GCA:AD	31.89 (5.20)	22.56 (4.46)	4.38 (0.77)				15.03 (0.80)	0.80 (0.02)
GCA:AAA(1,2)	31.38 (5.13)	22.53 (4.42)				5.58 (0.96)	13.68 (0.85)	0.81 (0.02)
GCA:ADAA1,2	31.08 (5.06)	22.11 (4.42)	2.97 (0.58)			4.20 (0.87)	13.36 (0.81)	0.82 (0.02)
GCA:ADAA1,1AA2,2AA1,2	22.30 (5.20)	14.42 (4.11)	2.55 (0.56)	7.19 (2.50)	8.06 (2.89)	3.63 (0.82)	13.53 (0.81)	0.81 (0.02)

GCA-model is a model that successively added additive effects (), dominance effects (), and additive-by-additive genetic effects (). The additive-by-additive epistatic effects can be interactions between loci within group ( and ), across groups or within hybrids . Superscripts 1 and 2 in parenthesis refers to Dent and Flint heterotic groups, respectively.

In GCA-model, the variances are: additive ( and ), dominance (), and additive-by-additive epistasis within groups ( and ) and additive-by-additive epistasis between groups (). is the residual variance and is the genomic broad-sense heritability.

43 in total

1. The impact of genetic architecture on genome-wide evaluation methods.

Authors: Hans D Daetwyler; Ricardo Pong-Wong; Beatriz Villanueva; John A Woolliams
Journal: Genetics Date: 2010-04-20 Impact factor: 4.562

2. Comparing estimates of genetic variance across different relationship models.

Authors: Andres Legarra
Journal: Theor Popul Biol Date: 2015-09-02 Impact factor: 1.570

3. A unified model for functional and statistical epistasis and its application in quantitative trait Loci analysis.

Authors: José M Alvarez-Castro; Orjan Carlborg
Journal: Genetics Date: 2007-04-03 Impact factor: 4.562

4. On the additive and dominant variance and covariance of individuals within the genomic selection scope.

Authors: Zulma G Vitezica; Luis Varona; Andres Legarra
Journal: Genetics Date: 2013-10-11 Impact factor: 4.562

5. Orthogonal Estimates of Variances for Additive, Dominance, and Epistatic Effects in Populations.

Authors: Zulma G Vitezica; Andrés Legarra; Miguel A Toro; Luis Varona
Journal: Genetics Date: 2017-05-18 Impact factor: 4.562

6. Beyond Genomic Prediction: Combining Different Types of omics Data Can Improve Prediction of Hybrid Performance in Maize.

Authors: Tobias A Schrag; Matthias Westhues; Wolfgang Schipprack; Felix Seifert; Alexander Thiemann; Stefan Scholten; Albrecht E Melchinger
Journal: Genetics Date: 2018-01-23 Impact factor: 4.562

7. Genetic Variance Partitioning and Genome-Wide Prediction with Allele Dosage Information in Autotetraploid Potato.

Authors: Jeffrey B Endelman; Cari A Schmitz Carley; Paul C Bethke; Joseph J Coombs; Mark E Clough; Washington L da Silva; Walter S De Jong; David S Douches; Curtis M Frederick; Kathleen G Haynes; David G Holm; J Creighton Miller; Patricio R Muñoz; Felix M Navarro; Richard G Novy; Jiwan P Palta; Gregory A Porter; Kyle T Rak; Vidyasagar R Sathuvalli; Asunta L Thompson; G Craig Yencho
Journal: Genetics Date: 2018-03-07 Impact factor: 4.562

8. Genomic Prediction of Single Crosses in the Early Stages of a Maize Hybrid Breeding Pipeline.

Authors: Dnyaneshwar C Kadam; Sarah M Potts; Martin O Bohn; Alexander E Lipka; Aaron J Lorenz
Journal: G3 (Bethesda) Date: 2016-11-08 Impact factor: 3.154

9. Improving Short- and Long-Term Genetic Gain by Accounting for Within-Family Variance in Optimal Cross-Selection.

Authors: Antoine Allier; Christina Lehermeier; Alain Charcosset; Laurence Moreau; Simon Teyssèdre
Journal: Front Genet Date: 2019-10-29 Impact factor: 4.599

Review 10. Non-additive Effects in Genomic Selection.

Authors: Luis Varona; Andres Legarra; Miguel A Toro; Zulma G Vitezica
Journal: Front Genet Date: 2018-03-06 Impact factor: 4.599

5 in total

1. Genomic prediction of hybrid performance: comparison of the efficiency of factorial and tester designs used as training sets in a multiparental connected reciprocal design for maize silage.

Authors: Alizarine Lorenzi; Cyril Bauland; Tristan Mary-Huard; Sophie Pin; Carine Palaffre; Colin Guillaume; Christina Lehermeier; Alain Charcosset; Laurence Moreau
Journal: Theor Appl Genet Date: 2022-08-02 Impact factor: 5.574

Review 2. Genomic Prediction: Progress and Perspectives for Rice Improvement.

Authors: Jérôme Bartholomé; Parthiban Thathapalli Prakash; Joshua N Cobb
Journal: Methods Mol Biol Date: 2022

3. Improving genomic predictions with inbreeding and nonadditive effects in two admixed maize hybrid populations in single and multienvironment contexts.

Authors: Morgane Roth; Aurélien Beugnot; Tristan Mary-Huard; Laurence Moreau; Alain Charcosset; Julie B Fiévet
Journal: Genetics Date: 2022-04-04 Impact factor: 4.402

4. The utility of genomic prediction models in evolutionary genetics.

Authors: Suzanne E McGaugh; Aaron J Lorenz; Lex E Flagel
Journal: Proc Biol Sci Date: 2021-08-04 Impact factor: 5.530

5. Temporal and genomic analysis of additive genetic variance in breeding programmes.

Authors: Letícia A de C Lara; Ivan Pocrnic; Thiago de P Oliveira; R Chris Gaynor; Gregor Gorjanc
Journal: Heredity (Edinb) Date: 2021-12-15 Impact factor: 3.821

5 in total