
Genomic Prediction Within and Across Biparental Families: Means and Variances of Prediction Accuracy and Usefulness of Deterministic Equations.

Pascal Schopp, Dominik Müller, Yvonne C J Wientjes, Albrecht E Melchinger.

Abstract

A major application of genomic prediction (GP) in plant breeding is the identification of superior inbred lines within families derived from biparental crosses. When models for various traits were trained within related or unrelated biparental families (BPFs), experimental studies found substantial variation in prediction accuracy (PA), but little is known about the underlying factors. We used SNP marker genotypes of inbred lines from either elite germplasm or landraces of maize (Zea mays L.) as parents to generate in silico 300 BPFs of doubled-haploid lines. We analyzed PA within each BPF for 50 simulated polygenic traits, using genomic best linear unbiased prediction (GBLUP) models trained with individuals from either full-sib (FSF), half-sib (HSF), or unrelated families (URF) for various sizes ($N_{train}$) of the training set and different heritabilities ($h^2$). In addition, we modified two deterministic equations for forecasting PA to account for inbreeding and genetic variance unexplained by the training set. Averaged across traits, PA was high within FSF (0.41-0.97) with large variation only for small $N_{train}$ and low $h^2$. For HSF and URF, PA was on average ∼40-60% lower and varied substantially among different combinations of BPFs used for model training and prediction as well as different traits. As exemplified by HSF results, PA of across-family GP can be very low if causal variants not segregating in the training set account for a sizeable proportion of the genetic variance among predicted individuals. Deterministic equations accurately forecast the PA expected over many traits, yet cannot capture trait-specific deviations. We conclude that model training within BPFs generally yields stable PA, whereas a high level of uncertainty is encountered in across-family GP.
Our study shows the extent of variation in PA that must be at least reckoned with in practice and offers a starting point for the design of training sets composed of multiple BPFs.
Copyright © 2017 Schopp et al.


Keywords:  GBLUP; GenPred; Genomic Selection; Shared Data Resources; biparental families; deterministic accuracy; genomic prediction; linkage disequilibrium; plant breeding


Year:  2017        PMID: 28916649      PMCID: PMC5677162          DOI: 10.1534/g3.117.300076

Source DB:  PubMed          Journal:  G3 (Bethesda)        ISSN: 2160-1836            Impact factor:   3.154


With the advent of low-cost genome-wide SNP markers, genomic prediction (GP, see Supplemental Material, Table S1 in File S1 for full list of abbreviations) proposed by Meuwissen has become a powerful tool in animal and plant breeding. The basic idea of GP is to combine the phenotypic and genotypic data of training individuals in a model for predicting the genetic merit of selection candidates that have only been genotyped. Complementing, or even replacing phenotyping can result in considerable cost savings and shortening of breeding cycles, thereby giving GP a big advantage over traditional selection methods (Bernardo and Yu 2007; Goddard and Hayes 2007; Lin ). Particular challenges of GP in plant breeding arise from (i) the specific population structures mostly characterized by multiple related or unrelated segregating biparental families (BPFs) derived from crosses between inbred parents, and (ii) small samples sizes available for model training (Jannink ). In commercial breeding of line and hybrid cultivars, up to several hundred BPFs are newly generated every year. Depending on the species and size of the breeding program, each family can comprise a variable number (usually <250) of lines, developed either by recurrent selfing or the doubled-haploid (DH) technology (Albrecht ). Since expected differences among BPFs can be reliably predicted based on the mean performance of their parents (Melchinger 1987), GP applied to populations comprising multiple BPFs aims primarily at the identification of superior lines within these families (Riedelsheimer ). Prediction models such as genomic best linear unbiased prediction (GBLUP) allow capturing Mendelian sampling—responsible for variation in the breeding values of siblings within BPFs—through cosegregation of SNP markers with quantitative trait loci (QTL) (Habier ). 
While several studies have investigated the accuracy of GP within and across BPFs, more attention is needed to assess the mean and variation of PA for training sets taken from full-sib (FSF), half-sib (HSF) or unrelated families (URF). Experimental results available so far are confined by the number and size of BPFs (Riedelsheimer ; Lehermeier ) and low marker density (Jacobson ; Lian ). Model training with individual BPFs has been studied intensively, and PA has been generally more promising for “within-family GP” than “across-family GP” (Riedelsheimer ). Various authors argued that for a given size of the training set, within-family GP would provide the highest possible PA owing to strong linkage disequilibrium (LD) between SNPs and QTL due to cosegregation and the same set of loci being polymorphic in the prediction and training set (Crossa ; Lehermeier ). Nevertheless, Lian reported for within-family GP substantial variation in PA among 969 BPFs and various traits, in line with the results of other studies on BPFs (Riedelsheimer ; Jacobson ; Lehermeier ). However, a systematic investigation on the extent and factors determining the mean and variation in PA among BPFs and traits is, to the best of our knowledge, not available to date. Since PA increases with closer pedigree relationships between training and predicted individuals (Habier ; Clark ), one obvious strategy is to use HSFs with one common parent between the training family (BPFtrain) and the predicted family (BPFpred) in across-family GP. Compared to within-family GP, PA for this strategy was generally much lower with the same sample size, but can reach similar levels if the sample size is strongly extended (Lehermeier ). By comparison, model training with only unrelated BPFs produced from the same ancestral population yields often poor or even negative PA (Riedelsheimer ; Jacobson ; Schopp ). 
Optimizing training set designs in GP with BPFs therefore requires better insights into how the pedigree relationship between BPFs, the sample size, and the heritability affect the mean and the variation in PA. Herein, we address these factors for the simple case of GP across individual pairs of BPFs, thereby providing a starting point for further investigations on the design of multi-family training sets in plant breeding. Forecasting PA based on existing molecular and phenotypic data could assist breeders in (i) choosing the most suitable BPFs for model training for prediction of existing or planned BPFs, and (ii) allocating resources to the training and prediction sets. Daetwyler , 2010) derived a deterministic equation for forecasting PA, which requires only population parameters (sample size $N$, heritability $h^2$, and the effective number of chromosome segments $M_e$). When averaged over several traits, empirical and deterministic accuracy agreed well within BPFs (Lorenz 2013; Riedelsheimer ; Lian ). There is little consensus, however, regarding the calculation of $M_e$ in general (Goddard 2009; Meuwissen and Goddard 2010; Goddard ; Wientjes ), and, specifically, for BPFs (Lorenz 2013; Riedelsheimer and Melchinger 2013; Lian ). Recently, Daetwyler’s equation was applied to GP both within and across cattle breeds (Wientjes , 2015). The authors extended the Goddard approach for calculating $M_e$ from the variance of genomic relationship coefficients to multiple populations. Overestimation of PA was attributed to a violation of Daetwyler’s assumption that the genetic variance in the prediction set is fully explained by marker effects estimated in the training set. This problem is expected to be aggravated in across-family GP with BPFs, because a high fraction of QTL and markers are not consistently polymorphic across BPFs. Herein, we propose to extend Daetwyler’s equation to cope with this problem and make the equation applicable to across-family GP in plant breeding.
Alternatively, PA can be forecasted based on the estimated reliability of genomic-estimated breeding values (GEBVs) derived from selection index theory (VanRaden 2008). However, this approach has rarely been applied in plant breeding (Akdemir ; He ), and, to the best of our knowledge, not to GP of individual BPFs, despite promising results for GP within and across breeds of cattle (Hayes ; Wientjes , 2015). One problem is that the approach was developed for outbred populations, and needs modifications when applied to inbred genotypes. Moreover, several strict assumptions regarding the properties of the genomic relationship matrix must be satisfied to obtain meaningful results, which will be elaborated in this paper for the case of BPFs in plant breeding. The objectives of our study were to (i) investigate the mean and variation of empirical PA within and across BPFs of inbred lines, (ii) examine how the variation in PA is affected by differences in polymorphism at causal loci of polygenic traits between the training and prediction set, as well as by other factors (e.g., level of ancestral LD, pedigree relationship between BPFs, sample size, heritability), and (iii) adapt equations for deterministic forecasting of PA in BPFs of inbred genotypes and demonstrate their usefulness in simulated data sets. To simulate realistic scenarios, we used SNP data of inbred lines taken either from a public maize breeding program or a DH library of a European maize landrace and generated in silico numerous BPFs of DH lines. Besides flexibility in the choice of sample sizes, and exclusion of nuisance factors uncontrollable in experimental studies, this allowed us to simulate traits with known genetic architecture for a profound analysis of the causal factors affecting PA of GP within and across BPFs.

Materials and Methods

Ancestral populations

We considered two ancestral populations as source germplasm of parental genotypes for generating BPFs. Ancestral population Elite consisted of 72 elite inbred lines with medium long-range LD (Figure S1A in File S1) representative for the Flint heterotic group of the maize breeding program of the University of Hohenheim. Ancestral population Landrace consisted of 40 DH lines derived without any intentional selection from the German maize landrace “Gelber Badischer” with a rapid decay of LD to a low level (Melchinger ). All lines were genotyped with the Illumina chip MaizeSNP50, containing 57,841 SNPs, and were expected to be fully homozygous. Markers monomorphic in the ancestral population or heterozygous in at least one individual were removed for further analysis. Physical map positions were converted into genetic map positions required for simulating meioses as described by Schopp . In total, we retained 19,204 and 16,171 SNPs for Elite and Landrace, respectively, distributed over the 10 maize chromosomes ranging in length from 137 to 276 cM (1913 cM in total). Individuals in the ancestral population were regarded as unrelated for defining pedigree relationships between subsequently generated BPFs.

Simulation of BPFs

For generating BPFs, we first sampled at random 25 parent lines from each ancestral population, and intermated them according to a half-diallel design to generate all 300 possible crosses. Subsequently, 1500 DH lines were derived from each F1 cross to obtain the BPFs used for further analyses. According to the half-diallel, each predicted family (BPFpred $A$) was associated with several possible training families (BPFtrain $B$) with different pedigree relationships to $A$. These were: (i) one FSF, corresponding to $A = B$; (ii) 46 HSFs sharing one common parent with $A$; and (iii) 253 URFs sharing no common parent with $A$. Meioses for in silico production of DH lines were simulated with the R package Meiosis (Müller and Broman 2017).
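The family counts implied by the half-diallel can be checked with a few lines of Python. This is only an illustrative sketch: the function name `classify_families` and the integer parent labels are assumptions made for the example and are not part of the original simulation code, which was written in R.

```python
from itertools import combinations


def classify_families(n_parents=25):
    """Enumerate all half-diallel crosses and classify every cross relative
    to one predicted family by the number of parents they share."""
    crosses = list(combinations(range(n_parents), 2))
    predicted = crosses[0]  # any cross works; the design is symmetric
    labels = {2: "FSF", 1: "HSF", 0: "URF"}
    counts = {"FSF": 0, "HSF": 0, "URF": 0}
    for training in crosses:
        shared = len(set(predicted) & set(training))
        counts[labels[shared]] += 1
    return len(crosses), counts


n_crosses, counts = classify_families()
print(n_crosses, counts)
```

With 25 parents this reproduces the 300 crosses, and for each predicted family one FSF, 46 HSFs, and 253 URFs, matching the numbers used when sampling training families.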

Description of factors analyzed

For systematic assessment of the factors influencing the distribution of the empirical PA, we defined various fixed and random factors (Table 1). As fixed factors, we considered (i) the ancestral population (Elite or Landrace), (ii) the pedigree relationship (FSF, HSF, or URF) between individuals in BPFpred and BPFtrain, (iii) the type of data (SNP marker genotypes or QTL genotypes) used to calculate the genomic relationship matrix for GBLUP, (iv) the sample size $N_{train}$, and (v) the heritability $h^2$ of the trait. The idealistic scenario $h^2 = 1$ was included to demonstrate how the variation in PA behaves when phenotypic accuracy is not a limiting factor. Random factors were the trait $T$, the BPFpred $A$, the BPFtrain $B$, as well as the actual sample $R$ of training individuals taken from $B$.
Table 1

Overview of factors with their corresponding levels analyzed in this study

Type           | Factor                                                    | Model parameter | Number of factor levels | Factor levels
Fixed factors  | Ancestral population                                      |                 | 2                       | Elite, Landrace
               | Pedigree relationship between training and predicted family |               | 3                       | FSF, HSF, URF
               | Data used to calculate the relationship matrix            |                 | 2                       | QTL, SNPs
               | Sample size (Ntrain)                                      |                 | 3                       | 25, 100, 250
               | Heritability (h2)                                         |                 | 3                       | 0.3, 0.6, 1
Random factors | Trait                                                     | T               | 50                      |
               | Predicted family (BPFpred)                                | A               | 50                      |
               | Training family (BPFtrain)                                | B               | 1 (FSF), 25 (HSF/URF)   |
               | Training set sample                                       | R               | 3                       |

Default values for the standard scenario (boldface in the original table): ancestral population Elite, relationship matrix calculated from SNPs, Ntrain = 100, and h2 = 0.6.

We simulated 50 truly polygenic traits $T$, each governed by 1000 QTL. First, we sampled at random a subset of 5000 SNP markers from all SNPs available in the ancestral population, corresponding to a marker density of 2.61 SNPs cM−1. This fixed set of markers was used for GP of all traits, because resampling of SNP marker positions had a negligible influence on the results. Second, for each of the 50 traits we sampled at random the map positions of 1000 QTL from the remaining 14,204 and 12,171 SNPs in Elite and Landrace, respectively. Following Meuwissen , effects of each QTL were drawn from a Gamma distribution with equal probability of effect signs. Importantly, all traits were affected by the same number of loci, but differed in the position and effects of QTL. Thus, the realized number of polymorphic QTL could vary depending on the trait and on the BPFpred $A$ and BPFtrain $B$. Phenotypes of training individuals were simulated according to the model $\mathbf{y} = \mathbf{g} + \mathbf{e}$ (cf. Goddard ), where $\mathbf{g}$ is the vector of true breeding values (TBVs) calculated as $\mathbf{g} = \mathbf{W}\mathbf{a}$, $\mathbf{W}$ is the matrix of genotypic scores at QTL coded as 2 or 0, depending on whether a DH line was homozygous for the 1 or 0 allele, respectively, and $\mathbf{a}$ is the vector of QTL effects. Vector $\mathbf{e}$ contains independent normally distributed environmental noise variables with variance $\sigma^2_e$, which was assumed to be constant across BPFs derived from one ancestral population, implying independent environmental influence on the phenotypes. We calculated $\sigma^2_e = \bar{\sigma}^2_g (1 - h^2)/h^2$, where $h^2$ is the a priori specified heritability (cf. Table 1) and $\bar{\sigma}^2_g$ is the genetic variance within a BPF, averaged across all 300 BPFs and 50 traits simulated. Finally, we sampled at random 50 out of the 300 BPFs, and considered them individually as the predicted family BPFpred $A$. From the 1500 DH lines in each BPFpred, we estimated GEBVs for the first $N_{pred} = 500$ lines.
For within-family GP, training individuals were sampled from the remaining 1000 lines to predict individuals within the same family ($A = B$, FSF). For across-family GP ($A \neq B$, HSF or URF), 25 BPFtrain serving individually for model training were sampled from the 46 available HSFs and the 253 available URFs, respectively. For given BPFpred and BPFtrain, we sampled from BPFtrain three disjoint samples of individuals of size $N_{train}$ (according to the fixed factor “sample size,” Table 1) with which the prediction model was trained. To minimize variation in PA attributable to sampling individuals from the BPFpred, we chose $N_{pred} = 500$. By contrast, the numbers $N_{train}$ were of realistic magnitude, and analyzing repeated samples allowed us to quantify the variation in PA due to finite sampling in BPFtrain.
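The link between the specified heritability and the environmental variance can be illustrated with a short Python sketch. It assumes the usual definition of heritability as genetic variance over the sum of genetic and environmental variance; the function names and the toy breeding values are hypothetical and only serve the example.

```python
import random


def residual_variance(var_g, h2):
    """Environmental variance that yields heritability h2 for a given
    genetic variance, from h2 = var_g / (var_g + var_e)."""
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must lie in (0, 1]")
    return var_g * (1.0 - h2) / h2


def simulate_phenotypes(tbv, var_g, h2, rng):
    """Add independent normal noise to true breeding values (y = g + e)."""
    sd_e = residual_variance(var_g, h2) ** 0.5
    return [g + rng.gauss(0.0, sd_e) for g in tbv]


rng = random.Random(1)
tbv = [rng.gauss(0.0, 1.0) for _ in range(5)]  # toy TBVs, not real data
y_noisy = simulate_phenotypes(tbv, var_g=1.0, h2=0.6, rng=rng)
y_ideal = simulate_phenotypes(tbv, var_g=1.0, h2=1.0, rng=rng)
print(residual_variance(1.0, 0.6))
```

In the idealistic scenario with a heritability of 1, the environmental variance is zero and the phenotypes coincide with the TBVs.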

Genomic prediction model

The GBLUP model can be written as

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{u} + \mathbf{e},$$

where $\mu$ is the general mean, $\mathbf{Z}$ is an incidence matrix linking phenotypes with breeding values, and $\mathbf{u}$ is the vector of random breeding values with mean zero and variance-covariance matrix

$$\mathrm{Var}(\mathbf{u}) = \begin{bmatrix} \sigma^2_A \mathbf{J}_{AA} & r_{AB}\sigma_A\sigma_B \mathbf{J}_{AB} \\ r_{AB}\sigma_A\sigma_B \mathbf{J}_{BA} & \sigma^2_B \mathbf{J}_{BB} \end{bmatrix} \circ \mathbf{G},$$

where $\mathbf{G}$ is the genomic relationship matrix and $\sigma^2_A$ and $\sigma^2_B$ are the additive variances in the noninbred reference population of BPFpred $A$ and BPFtrain $B$, respectively, which correspond to their (outbred) F2 generation. The blocks $\mathbf{J}$ are matrices of 1’s, $r_{AB}$ is the genetic correlation between populations $A$ and $B$, which was assumed to be equal to 1 for reasons detailed in the discussion, and ∘ symbolizes the Hadamard product. Vector $\mathbf{e}$ contains random residuals with mean zero and $\mathrm{Var}(\mathbf{e}) = \mathbf{I}\sigma^2_e$, where $\mathbf{I}$ is an identity matrix and $\sigma^2_e$ is the residual error variance. We used

$$\mathbf{G} = \begin{bmatrix} \mathbf{G}_{AA} & \mathbf{G}_{AB} \\ \mathbf{G}_{BA} & \mathbf{G}_{BB} \end{bmatrix},$$

representing a modified version of the block-structured genomic relationship matrix devised by Chen , where the across-population blocks had elements

$$G_{AB}(i,j) = \frac{\sum_{l} (x^A_{il} - 2p^A_l)(x^B_{jl} - 2p^B_l)}{\sqrt{\sum_{l} 2p^A_l(1 - p^A_l)\,\sum_{l} 2p^B_l(1 - p^B_l)}} \qquad (1)$$

and $x^A_{il}$ and $x^B_{jl}$ are the genotypic scores of DH lines $i$ and $j$ in population $A$ and $B$ at locus $l$, respectively, coded as 2 and 0, and $p^A_l$ and $p^B_l$ are the allele frequencies at locus $l$ in $A$ and $B$, respectively, where $l$ runs over either all QTL or all SNPs, depending on which were used to calculate $\mathbf{G}$ (according to the fixed factor “data,” Table 1). Submatrices $\mathbf{G}_{AA}$ and $\mathbf{G}_{BB}$ are calculated accordingly, but here the denominator simplifies to $\sum_l 2p^A_l(1 - p^A_l)$ and $\sum_l 2p^B_l(1 - p^B_l)$, respectively, corresponding to the standard $\mathbf{G}$ matrix without subpopulation structure (Habier ; VanRaden 2008). Importantly, the denominator for matrix $\mathbf{G}_{AB}$ in Equation 1 is different from that in Chen , whose scaling is built from locus-wise products of the heterozygosities in $A$ and $B$. Their approach effectively removes all loci that are monomorphic in $A$ and/or $B$, whereas our denominator retains these loci in the scaling of $\mathbf{G}_{AB}$, yielding a better approximation of the true relationship matrix, as discussed below. In any BPF derived from fully homozygous parents, the expected allele frequency of a locus is known to be either 0, 0.5, or 1, depending on the genotypes of the parents. These expected frequencies were used in the computation of genomic relationships.
Since, in our study, only population $B$ had phenotypes, we used a single-group GBLUP model. Although we allowed for heterogeneous genetic variances among BPFs in the general model (Equation 1) and the derivation of reliability described below (see Appendix B), $\sigma_A$ enters the computation of GEBVs in $A$ as a constant factor (see Equation B4) and, hence, does not affect the empirical PA. Estimates $\hat{\sigma}^2_B$ and $\hat{\sigma}^2_e$ for BPFtrain $B$ were obtained by restricted maximum likelihood from the individuals in the training set using the mixed.solve function from the R package rrBLUP (Endelman 2011). The empirical PA ($\rho$) was calculated as the correlation between GEBVs and the TBVs for the 500 predicted individuals in BPFpred $A$.
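A minimal Python sketch of a relationship block computed with expected allele frequencies is given below. It assumes that the scaling is the square root of the product of the two family-specific sums of 2p(1 − p), which is our reading of the description above; the function names and the six-locus parental genotypes are hypothetical.

```python
def expected_frequencies(parent_1, parent_2):
    """Expected allele frequency (0, 0.5 or 1) per locus in a DH family
    derived from two fully homozygous parents coded by allele (0 or 1)."""
    return [(a + b) / 2 for a, b in zip(parent_1, parent_2)]


def relationship_block(lines_a, freq_a, lines_b, freq_b):
    """Genomic relationships between DH lines of family A (rows) and family
    B (columns). Genotypes are coded 0/2. Loci that are monomorphic in one
    family still count in the scaling through the other family's sum."""
    scale_a = sum(2 * p * (1 - p) for p in freq_a)
    scale_b = sum(2 * p * (1 - p) for p in freq_b)
    denom = (scale_a * scale_b) ** 0.5  # zero if a family has no segregating locus
    block = []
    for xa in lines_a:
        ca = [x - 2 * p for x, p in zip(xa, freq_a)]
        row = []
        for xb in lines_b:
            cb = [x - 2 * p for x, p in zip(xb, freq_b)]
            row.append(sum(u * v for u, v in zip(ca, cb)) / denom)
        block.append(row)
    return block


# Toy example with six loci (hypothetical parental genotypes).
p1 = [1, 1, 0, 0, 1, 0]
p2 = [0, 1, 1, 0, 0, 1]
freq_a = expected_frequencies(p1, p2)
dh_lines_a = [[2, 2, 0, 0, 2, 0],   # DH line identical to parent 1
              [0, 2, 2, 0, 0, 2]]   # DH line identical to parent 2
g_aa = relationship_block(dh_lines_a, freq_a, dh_lines_a, freq_a)
print(g_aa)
```

The diagonal elements equal 2 for DH lines, which corresponds to one plus an inbreeding coefficient of 1 relative to the F2 reference population.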

Analysis of variance of empirical prediction accuracies

For each possible combination of fixed factors (cf. Table 1), we partitioned the total variance of the empirically observed PA ($\rho$) into variance components caused by each random factor, where we assumed a hierarchical structure for BPFpred $A$, BPFtrain $B$, and the training set sample $R$, as well as cross-classification with factor trait $T$. Estimates of the variance components were obtained from the following random-effects model using function lmer of R package lme4 (Bates ):

$$\rho = \mu + A + B(A) + R(AB) + T + AT + B(A)T + R(AB)T,$$

where $\mu$ is the overall mean of PA for each of the three pedigree relationships (FSF, HSF, and URF) between individuals in $A$ and $B$ analyzed; $A$ is the effect of the BPFpred; $B(A)$ is the effect of the BPFtrain nested within $A$; $R(AB)$ is the effect of the $r$th sample of training individuals from $B$ nested within $A$; $T$ is the effect of the trait; $AT$ is the interaction effect of BPFpred with trait $T$; $B(A)T$ is the interaction effect of BPFtrain nested within $A$ with trait $T$; and $R(AB)T$ is the interaction effect of the training set sample nested within $B(A)$ with trait $T$, which corresponds to the residual error of the model. In the case of FSF ($A = B$), all random factors involving $B$ were dropped. The degrees of freedom for each factor are shown in Table S2 in File S1.

Deterministic equations for forecasting prediction accuracy (PA)

We followed the theoretical framework of Wientjes for forecasting PA within and across populations using two deterministic equations. Both equations assume that actual relationships regarding QTL are known, and were originally developed for outbred individuals. Hence, modifications are required to apply the equations to inbred individuals. As mentioned above, the outbred reference population corresponding to a BPF of fully inbred (DH) lines with an inbreeding coefficient of $F = 1$ is the F2 generation. The level of inbreeding in BPFs of DH lines is reflected in the diagonal elements of $\mathbf{G}$ calculated according to Equation 1, yielding $G_{ii} = 1 + F = 2$ in the special case of BPFs derived from homozygous parents. The first approach is based on the reliability of GEBVs of each individual $i$ in $A$ (VanRaden 2008; Wientjes , 2015). Using the formula for the reliability of a selection index given by Mrode (2005, p. 15) and replacing the genetic covariance matrices by the genomic relationship matrices [multiplied by the corresponding genetic (co)variance components] yields the following formula that accounts for inbreeding in the predicted individual (see Appendix B):

$$r^2_i = r^2_{AB}\,\frac{\mathbf{g}_{iB}^{\top}\left(\mathbf{G}_{BB} + \mathbf{I}\,\sigma^2_e/\sigma^2_B\right)^{-1}\mathbf{g}_{iB}}{G_{ii}},$$

where $r^2_{AB}$ is the squared genetic correlation between $A$ and $B$ (here $r^2_{AB} = 1$), $\mathbf{g}_{iB}$ is the vector of genomic relationships of individual $i$ in $A$ with all training individuals of $B$, $\mathbf{I}$ is an identity matrix when assuming independent residual error variances, and $G_{ii}$ is the relationship of individual $i$ with itself, providing an estimate of $1 + F_i$. Dividing by $G_{ii}$ assures that reliabilities are correctly scaled, given that variance components and inbreeding refer to an outbred reference population, as is the case when calculating $\mathbf{G}$ according to Equation 1 (see Appendix B).
The deterministic PA in population $A$ was subsequently obtained by averaging over all $N_{pred}$ individuals in $A$, where in our case $N_{pred} = 500$. The second equation was proposed by Daetwyler , 2010) and is based solely on population parameters. We modified it to account for unexplained variance in $A$ caused by different markers segregating in $A$ and $B$ (in cases where $A \neq B$):

$$\rho_D = \sqrt{\theta_{AB}\,\frac{N_{train} h^2}{N_{train} h^2 + M_e}},$$

with $\theta_{AB} = m_{AB}/m_A$, where $m_{AB}$ is the number of markers that segregate in both $A$ and $B$ and $m_A$ is the number of markers that segregate in $A$; $N_{train}$ is the sample size; $h^2 = (1 + F_B)\sigma^2_B / [(1 + F_B)\sigma^2_B + \sigma^2_e]$, where $F_B$ is the average inbreeding coefficient of the individuals in $B$ and $\sigma^2_B$ refers to the estimated additive variance in the (outbred) F2 generation of $B$; and $M_e$ is the effective number of chromosome segments. Wientjes proposed an estimator for $M_e$ across outbred populations, which is calculated as

$$M_e = \frac{1}{\mathrm{var}(\mathbf{G}_{AB} - \mathbf{A}_{AB})},$$

where $\mathbf{G}_{AB}$ contains all genomic relationships between individuals from $A$ and training individuals from $B$, and $\mathbf{A}_{AB}$ contains the corresponding pedigree relationships. Given a uniform pedigree relationship between individuals in $A$ and $B$ (e.g., FSF, HSF, and URF), the denominator simplifies to $\mathrm{var}(\mathbf{G}_{AB})$ because the elements of $\mathbf{A}_{AB}$ are then constant. If the individuals from $A$ and from $B$ have inbreeding coefficients $F_A$ and $F_B$, respectively, we propose to use (see Appendix C):

$$M_e = \frac{(1 + F_A)(1 + F_B)}{\mathrm{var}(\mathbf{G}_{AB})}.$$

For DH lines from BPFs, $F_A = F_B = 1$, so that $M_e = 4/\mathrm{var}(\mathbf{G}_{AB})$, which was herein used as estimator for $M_e$.
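The two deterministic approaches can be sketched in plain Python as follows. This is a hedged illustration, not the authors' code: the function names are hypothetical, the reliability follows the selection-index form described above with the ratio of residual to additive variance passed as `lam`, and the placement of the segregation proportion `theta` under the square root of the Daetwyler-type equation is an assumption of this sketch.

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def reliability(g_ib, g_bb, g_ii, lam, r_ab=1.0):
    """Reliability of the GEBV of one predicted individual:
    r_ab^2 * g' (G_BB + lam * I)^-1 g / G_ii, with lam = var_e / var_B."""
    n = len(g_ib)
    lhs = [[g_bb[i][j] + (lam if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    weights = solve(lhs, g_ib)
    return r_ab ** 2 * sum(g * w for g, w in zip(g_ib, weights)) / g_ii


def daetwyler_accuracy(n_train, h2, m_e, theta=1.0):
    """Daetwyler-type accuracy scaled by the proportion theta of loci
    segregating in the predicted family that also segregate in training."""
    return math.sqrt(theta * n_train * h2 / (n_train * h2 + m_e))


def effective_segments(g_ab_values, f_a=1.0, f_b=1.0):
    """M_e from the variance of across-family genomic relationships,
    scaled by (1 + F_A)(1 + F_B) for inbred individuals."""
    mean = sum(g_ab_values) / len(g_ab_values)
    var = sum((g - mean) ** 2 for g in g_ab_values) / len(g_ab_values)
    return (1.0 + f_a) * (1.0 + f_b) / var


# One training line identical to the predicted DH line, lam = 2.
rel = reliability([2.0], [[2.0]], g_ii=2.0, lam=2.0)
acc_fsf = daetwyler_accuracy(100, 0.6, 21.0)
acc_hsf = daetwyler_accuracy(100, 0.6, 66.26, theta=0.5)
m_e_toy = effective_segments([1.0, -1.0])
print(rel, acc_fsf, acc_hsf, m_e_toy)
```

The toy inputs are chosen only to make the arithmetic easy to follow; the M_e and theta values passed to `daetwyler_accuracy` are the Elite means from Table 2, used here purely as plausible magnitudes.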

Comparison of empirical and deterministic prediction accuracies

For all analyses except the ANOVA of $\rho$, we considered only one sample of training individuals and dropped index $R$ altogether. This simplifies the presentation of our results and corresponds to the realistic case of having only one specific sample of training individuals available. For comparison of PA between fixed factors (e.g., between sample sizes, heritabilities, or ancestral populations), as well as for evaluating the overall agreement of empirical and deterministic PAs, we calculated the general mean of PA across all random factors, denoted with an overbar (e.g., $\bar{\rho}$ for the empirical PA, and correspondingly for the two deterministic PAs).

Causal analysis of the variation in PA among traits in GP across BPFs

Preliminary analyses showed that PA varied substantially among traits in across-family GP for HSFs and URFs, although we assumed the same polygenic architecture for all 50 simulated traits. Therefore, we devised additional simulations to investigate the underlying cause(s), using assumptions warranting almost ideal conditions for GP to largely eliminate the influence of nuisance factors on PA. We restricted these simulations to HSFs to demonstrate the key points in a simple fashion. First, we (i) chose at random a pair of HSFs, BPFpred $A$ and BPFtrain $B$, produced from ancestral population Elite, and (ii) repeatedly sampled 1000 QTL positions from the entire set of 19,204 SNPs until we found a sample with $\theta_{AB} \approx 0.5$, corresponding to the average value of $\theta_{AB}$ for HSF in our study (Table 2). Second, given $A$ and $B$ and the 1000 QTL positions, we sampled 1000 sets of different QTL effects as described above. This resulted in 1000 traits with identical $\theta_{AB}$ and identical QTL positions, but different QTL effects. Finally, under these near-ideal conditions and with known QTL genotypes, we used RR-BLUP, which yields GEBVs equivalent to those of GBLUP (Habier ), to identify among the 1000 traits the two with lowest and highest PA and retrieved the corresponding QTL effect estimates.
Table 2

Mean (SD) of the estimated number of effective chromosome segments (Me) and the proportion of polymorphic loci in the predicted family that also segregate in the training family (θAB) with different pedigree relationships (FSF, HSF, and URF) between A and B derived either from ancestral populations Elite or Landrace.

Ancestral population | Pedigree relationship | Me ± SD        | θAB ± SD
Elite                | FSF                   | 21.00 ± 2.27   | 1.00 ± 0.00
                     | HSF                   | 66.26 ± 27.03  | 0.50 ± 0.10
                     | URF                   | 148.16 ± 77.87 | 0.40 ± 0.08
Landrace             | FSF                   | 22.24 ± 2.05   | 1.00 ± 0.00
                     | HSF                   | 72.48 ± 24.83  | 0.50 ± 0.08
                     | URF                   | 172.33 ± 77.03 | 0.40 ± 0.06

We surmised that variation in PA among traits arises from structural differences in the large chromosome segments containing cosegregating QTL alleles that DH lines inherit from their respective parents. To investigate this hypothesis, we analyzed the contribution of each chromosome segment along the entire genome to PA. The length of the chromosome segments within $A$ and $B$ was taken as the expected genetic map distance at which the LD between two QTL in BPFs falls below a fixed threshold (cf. Giraud ; see File S3 in Schopp ). Using a sliding window approach, chromosome segments of this length were moved in steps of 5 cM along each chromosome, separately for each trait. Similar to Kemper , we subsequently calculated for each window $w$ the “local” TBV for all DH lines in the BPFpred $A$ as

$$g_{iw} = \sum_{q \in w} x_{iq} a_q,$$

where $x_{iq}$ is the genotypic score coded (2,0) for DH line $i$ at QTL $q$ and $a_q$ is the corresponding QTL effect. Analogously, we calculated the local GEBV in the BPFpred $A$ as

$$\hat{g}_{iw} = \sum_{q \in w} x_{iq} \hat{a}_q,$$

where $\hat{a}_q$ is the estimate of $a_q$ obtained from RR-BLUP in BPFtrain $B$ provided QTL $q$ segregated in $B$, and $\hat{a}_q = 0$ otherwise. Subsequently, we calculated for each window the correlation between local TBVs and local GEBVs among all 500 DH lines in $A$. Further, we defined chromosome segment substitution effects (CSSE) for the parental chromosome segments of $A$ as the sum of allele substitution effects across all QTL in the window,

$$\mathrm{CSSE}^A_w = \sum_{q \in w} \delta^A_q a_q,$$

where $P_1$ and $P_2$ are the parents of $A$, with $P_1$ being the common parent of $A$ and $B$. Thus, $\delta^A_q = \pm 1$ if $P_1$ and $P_2$ carry different alleles at QTL $q$ (with the sign determined by the allele carried by $P_1$), and $\delta^A_q = 0$ otherwise.
Values $\mathrm{CSSE}^B_w$ were calculated analogously with respect to parents $P_1$ and $P_3$ of $B$. Note that $\delta^A_q = \delta^B_q \neq 0$ if QTL $q$ segregates in both $A$ and $B$, i.e., $P_2$ and $P_3$ carry the same allele that is different from the allele in $P_1$. In contrast, $\delta^A_q \neq \delta^B_q$ implies that QTL $q$ segregates in exactly one of the two HSFs $A$ or $B$. Thus, $\mathrm{CSSE}^A_w \neq \mathrm{CSSE}^B_w$ only if $\delta^A_q \neq \delta^B_q$ at one or more QTL, and the magnitude of this difference depends on (i) the subset of QTL with $\delta^A_q \neq \delta^B_q$, (ii) the relative size of $a_q$ for each QTL in the window compared with the effects of other QTL in the genome, and (iii) whether these effects have identical sign or not, which is important, especially for QTL that are closely linked. Altogether, the magnitude of $\mathrm{CSSE}^A_w$ and its difference to $\mathrm{CSSE}^B_w$ for each trait along the genome were expected to strongly influence the PA of GEBVs in BPFpred $A$ estimated on the basis of BPFtrain $B$. All computations were carried out in the R statistical environment (R Core Team 2017).
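The sliding-window analysis of local TBVs and local GEBVs can be sketched as below. The sketch is an assumption-laden illustration in Python rather than the original R code: the function names, the window length of 10 cM, and the toy genotypes and effects are hypothetical, and estimated effects are simply passed in as a list (with 0 for QTL that do not segregate in the training family).

```python
def pearson(x, y):
    """Pearson correlation; returns None if either vector has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5


def local_values(genotypes, effects, positions, start, length):
    """Sum of genotype score times effect over QTL inside one window."""
    idx = [k for k, pos in enumerate(positions) if start <= pos < start + length]
    return [sum(row[k] * effects[k] for k in idx) for row in genotypes]


def window_correlations(genotypes, true_eff, est_eff, positions,
                        chrom_length, length, step=5.0):
    """Correlation of local TBVs and local GEBVs for windows moved in steps."""
    results = []
    start = 0.0
    while start < chrom_length:
        tbv = local_values(genotypes, true_eff, positions, start, length)
        gebv = local_values(genotypes, est_eff, positions, start, length)
        results.append((start, pearson(tbv, gebv)))
        start += step
    return results


# Toy chromosome of 20 cM with two QTL and three DH lines (coded 0/2).
genotypes = [[2, 0], [0, 2], [2, 2]]
positions = [1.0, 12.0]
true_eff = [1.0, -0.5]
est_eff = [0.8, 0.4]  # second effect estimated with the wrong sign
res = window_correlations(genotypes, true_eff, est_eff, positions,
                          chrom_length=20.0, length=10.0)
print(res)
```

In this toy case the first window shows a local correlation of 1, windows containing only the second QTL show a correlation of −1 because its effect is estimated with the opposite sign, and the last window contains no QTL and therefore returns None.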

Data availability

Genotypic data of the ancestral populations is available in File S2. All R packages used for simulating the data are publicly available. All simulation steps and equations are fully described within the manuscript.

Results

Means and variation of empirical PA

Figure 1A shows the distributions of empirical PA ($\rho$). For the standard scenario (ancestral population Elite, $N_{train} = 100$, $h^2 = 0.6$, and $\mathbf{G}$ calculated from SNP markers, Table 1), the mean PA ($\bar{\rho}$) across all pairs of BPFpred $A$ and BPFtrain $B$ and traits was highest for FSF (0.79, Table S3 in File S1), and decreased by 43% for HSF (0.45) and by 60% for URF (0.32). A reverse trend was observed for the SD of $\rho$, which amounted to 0.09 for FSF and more than doubled for HSF (0.20) and URF (0.22). The 5 and 95% quantiles of $\rho$ ranged from 0.61 to 0.89 for FSF, and spanned a wider range for HSF and URF.
Figure 1

(A) Boxplots of empirical prediction accuracies ($\rho$) in BPFs of DH lines, and (B) variance components of different factors influencing the variation of $\rho$. Parents of BPFs were sampled from ancestral population Elite, and SNP markers were used to calculate the genomic relationship matrix $\mathbf{G}$. Results are shown for different pedigree relationships (FSF, HSF, and URF) between the predicted family (BPFpred) and training family (BPFtrain) as well as for different sample sizes ($N_{train}$) and heritabilities ($h^2$).

For $h^2 = 0.6$, reducing $N_{train}$ from 100 to 25 lowered $\bar{\rho}$, and increasing $N_{train}$ to 250 raised $\bar{\rho}$ by 12–18% for all pedigree relationships (Figure 1A). The SD increased for $N_{train} = 25$ by 84% for FSF, but much less for HSF and URF, because it was already large under $N_{train} = 100$. For $N_{train} = 250$, the SD decreased for FSF, yet only slightly for HSF (6%) and URF. Altering $h^2$ for $N_{train} = 100$ affected the PA similarly as altering $N_{train}$ under fixed $h^2 = 0.6$. In comparison with $h^2 = 0.6$, $\bar{\rho}$ was reduced for $h^2 = 0.3$ and increased for $h^2 = 1$, to a degree depending on the pedigree relationship. The corresponding SDs changed considerably for FSF (+57 and −68%), but only marginally for HSF (8 and −11%) and URF (4 and −7%). Deriving BPFs from ancestral population Landrace instead of Elite generally reduced $\bar{\rho}$ by <0.05, whereas the SD remained nearly identical (Figure 2A and Table S3 in File S1). By comparison, calculating the $\mathbf{G}$ matrix from QTL instead of SNP data increased $\bar{\rho}$ by only 0.02, 0.03, and 0.05 for FSF, HSF, and URF, respectively, but hardly affected the SD, regardless of the pedigree relationship and the ancestral population.
Figure 2

(A) Boxplots of empirical prediction accuracies ($\rho$) in BPFs of DH lines and (B) variance components of different factors influencing the variation of $\rho$. Parents of BPFs were sampled from ancestral population Elite (left) or Landrace (right), and either genotypes at SNP markers or at QTL were used to calculate the genomic relationship matrix $\mathbf{G}$. Results are shown for different pedigree relationships (FSF, HSF, and URF) between the predicted family (BPFpred) and training family (BPFtrain) and refer to the standard sample size and heritability ($N_{train} = 100$ and $h^2 = 0.6$).


Analysis of variance of random factors affecting the empirical PA

Estimates of for were of similar magnitude for HSF and URF, but generally much smaller for FSF (Figure 1B). For the standard scenario, was small for FSF (0.01) and primarily attributable to By comparison, was 5.3 and 6.6 times larger for HSF and URF, respectively, with >50% contributed by followed by the residual variance (26 and 19%, respectively). All variance components not involving factor were substantially smaller, with contributing most for HSF (9%) and URF (6%). Decreasing the training set size to 25 or the heritability to 0.3 affected the relative importance and overall magnitude of the variance components similarly for the three pedigree relationships (Figure 1B). The residual variances (FSF) and (HSF, URF) increased substantially, accompanied by a moderate increase in for FSF and decrease in for HSF and URF. Conversely, increasing the training set size to 250 or the heritability to 1.0 strongly reduced the residual variances and nearly eliminated for FSF, whereas, for HSF and URF, remained large owing to a high even under these favorable conditions. Deriving BPFs from ancestral population Landrace instead of Elite had almost no effect on and its components (Figure 2B). Calculating the matrix from QTL instead of from SNP genotypes moderately reduced by 5% for HSF and 10% for URF, mainly due to decreasing In contrast to HSF and URF, for FSF was already minor when using SNP genotypes, leaving less room for improvement when using QTL instead of SNP genotypes than for HSF and URF, which both showed bigger changes in the absolute magnitude of the variance components than FSF.

Figure 3 shows scatter plots for empirical versus deterministic prediction accuracies for the standard scenario. In general, empirical and deterministic accuracies for single traits agreed relatively well for FSF ( and ), but rather weakly for HSF ( and , respectively) and URF ( and , respectively). By comparison, the correlations between the means of empirical and deterministic accuracies across the 50 traits increased for FSF ( and ), but even more so for HSF (0.94 and 0.92, respectively) and URF (0.89 and 0.88, respectively), indicating that trait-specific deviations from the mean empirical accuracy hamper the agreement with deterministic accuracies, particularly for HSF and URF.
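The contrast between single-trait and trait-averaged agreement can be illustrated with a toy simulation. The following is only a sketch under stated assumptions, not the authors' simulation pipeline: each family pair is given a hypothetical forecast accuracy, and each trait adds an independent deviation to it. The function names, the range of forecasts, and the deviation size (`trait_sd`) are all illustrative choices, and the simulated "accuracies" are not constrained to the range of a correlation.

```python
import random


def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def single_vs_mean_agreement(n_pairs=200, n_traits=50, trait_sd=0.25, seed=1):
    """Return (r for single traits, r for trait means); all inputs are hypothetical."""
    rng = random.Random(seed)
    det_single, emp_single, det_mean, emp_mean = [], [], [], []
    for _ in range(n_pairs):
        # Hypothetical forecast for one training/predicted family pair
        forecast = rng.uniform(0.1, 0.7)
        # Each trait deviates independently from the forecast
        empirical = [forecast + rng.gauss(0.0, trait_sd) for _ in range(n_traits)]
        det_single.extend([forecast] * n_traits)
        emp_single.extend(empirical)
        det_mean.append(forecast)
        emp_mean.append(sum(empirical) / n_traits)
    return pearson(det_single, emp_single), pearson(det_mean, emp_mean)


r_single, r_mean = single_vs_mean_agreement()
print(f"single traits: r = {r_single:.2f}; trait means: r = {r_mean:.2f}")
```

Averaging over traits shrinks the trait-specific deviations, so the correlation for trait means is expected to be clearly higher than the one for single traits, which mirrors the pattern reported for HSF and URF.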
Figure 3

Empirical prediction accuracy in BPFs of DH lines plotted against the two deterministic prediction accuracies. The top two graphs refer to observations for single traits, and the bottom row to means over traits. Parents of BPFs were sampled from ancestral population Elite and genotypes at SNP markers were used to calculate the genomic relationship matrix. Results are shown for a random sample of 10,000 data points.

For the general mean of empirical and deterministic PA across and matched very well with for all pedigree relationships and values of and (Figure S2 in File S1). By comparison, generally underestimated with increasing bias for HSF and URF as compared with (Figure S3 in File S1), and particularly for smaller values of and (Figure S2 in File S1). Calculating the matrix from QTL instead of from SNP genotypes hardly influenced the bias of deterministic accuracies (Figure S4 in File S1) and the correlations with empirical accuracies.

Causal analysis of the variation in PA among traits

Figure 4 compares two traits, T1 and T2, with divergent PA for one representative pair of HSFs. Both traits had identical QTL positions and QTL genotypes in the BPFpred and BPFtrain B, but different QTL effects: 376 QTL segregated in the BPFpred, 286 in the BPFtrain, and 151 of them jointly in both families. For trait T1 with high PA, the differences between chromosome segment substitution effects (CSSE) in the two families were generally small across the entire genome, in particular on chromosomes 2, 3, and 9, which had sizeable CSSEs (Figure 4A). Conversely, for trait T2 with low PA, the CSSEs in the two families differed substantially over large parts of the genome, and even showed opposite signs on several chromosomes.
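The counts above imply that 151/376, or about 40%, of the QTL segregating in the predicted family also segregated in the training family. How such counts arise in a pair of half-sib families can be sketched as follows. This is a minimal illustration, assuming fully homozygous biallelic parents with alleles drawn independently at random, which ignores the LD and allele-frequency structure of real germplasm; the function and variable names are hypothetical.

```python
import random


def segregating(parent_a, parent_b):
    """Loci at which two fully homozygous parents carry different alleles."""
    return {i for i, (a, b) in enumerate(zip(parent_a, parent_b)) if a != b}


def shared_fraction(seg_pred, seg_train):
    """Fraction of loci segregating in the predicted family that also segregate in training."""
    return len(seg_pred & seg_train) / len(seg_pred)


rng = random.Random(7)
n_qtl = 1000
# Hypothetical parents: one common parent plus one non-common parent per family
common = [rng.randint(0, 1) for _ in range(n_qtl)]
other_pred = [rng.randint(0, 1) for _ in range(n_qtl)]
other_train = [rng.randint(0, 1) for _ in range(n_qtl)]

seg_pred = segregating(common, other_pred)    # QTL segregating in the predicted family
seg_train = segregating(common, other_train)  # QTL segregating in the training family
print(len(seg_pred), len(seg_train), len(seg_pred & seg_train))
print(round(shared_fraction(seg_pred, seg_train), 2))

# Same ratio computed from the counts reported in the text
reported_fraction = 151 / 376
```

With biallelic loci, a QTL segregates in both half-sib families only where the two non-common parents carry the same allele and the common parent carries the other one.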
Figure 4

(A) Chromosome segment substitution effects (CSSE of the two families, in red and in blue) and correlation between local TBVs and local GEBVs in the predicted family (green) averaged in sliding windows (see Materials and Methods for definition). GEBVs were calculated from QTL effects estimated by RR-BLUP in the training set (HSF). Results are shown for two traits, T1 and T2, with large differences in prediction accuracy. Both traits were generated from the same set of 1000 QTL, but with different QTL effects. (B) Correlation between local TBVs and local GEBVs (green lines) shown together with true QTL effects (diamonds) and estimated QTL effects (circles) for T1 and T2 on chromosome 5. Colors indicate QTL segregating in both families (orange) or in only one of them (purple); grey bars in the background reflect the windows.

The correlation between local TBVs and local GEBVs of the DH lines was closely associated with the differences between the CSSEs of the two families in the corresponding windows (Figure 4A). If the difference in the CSSE for a segment was small, the correlation was generally high, particularly if both CSSEs had large magnitude and identical sign (see chromosomes 2, 3, and 9 for trait T1). Conversely, if the CSSEs for a window differed and had opposite signs in the two families, the correlation between local TBV and local GEBV dropped substantially and frequently became negative (see chromosomes 2, 5, and 8 for trait T2). Overall, the proportion of the genome showing low or even negative correlations was much smaller for trait T1 with high PA than for trait T2 with low PA. Zooming into chromosome 5, which had a large impact on the differences between the two traits, revealed that for trait T1 all large-effect QTL that segregated in the BPFpred also segregated in the BPFtrain (Figure 4B). However, for trait T2, there was a large-effect QTL that segregated only in the BPFpred, located in windows with low correlation between local TBVs and local GEBVs. Neighboring windows not harboring this QTL showed higher correlations. The trends for this exemplary chromosome were consistent with other chromosomes and other HSF pairs, as well as other traits with high and low PA (results not shown).

Discussion

Experimental studies showed that PA can be highly variable for GP within, but even more so across BPFs. Moreover, PA was found to vary substantially among different target traits for distinct pairs of training and predicted families. Investigating the causes for this variability is hardly possible based on experimental data due to the limited number and sample size of available BPFs, and the generally unknown genetic architecture of agronomically important traits. Here, we used computer simulations to analyze in detail why PA varies among different combinations of training sets, prediction sets, and polygenic traits. Moreover, we demonstrate that modification of available deterministic equations enables accurate estimates of PA averaged across many polygenic traits for both within-family GP and across-family GP.

Variation in PA within and across biparental families

The average PA decreased under small training set size and low heritability (Figure 1A) for all pedigree relationships, as expected from theory (Daetwyler et al.). This was always accompanied by a large increase in the variation of PA (Figure 1A), which was mainly caused by inflated residual errors (Figure 1B). These errors capture the variation in PA that arises due to the random sampling of (i) individuals (genotypes) from the BPFtrain, and (ii) their corresponding phenotypes for a specific trait. The larger residual errors in across-family GP are presumably due to incongruent sets of QTL segregating in pairs of HSFs and URFs, which can vary substantially across traits, as reflected by the SD of (Table 2). The fact that predictions became much more robust under 100 and illustrates that large sample sizes and heritabilities are mandatory to alleviate the trait-specific sampling variance in PA. Together with the generally optimal conditions in within-family GP (Crossa et al.), this nearly eliminated all variation in PA for FSF (Figure 1).

The predicted family BPFpred accounted only for a marginal proportion of variation in PA, irrespective of the pedigree relationship with BPFtrain (Figure 1B). For within-family GP (where BPFtrain = BPFpred), this implies that the genetic distance between the parents of a BPF has at best marginal influence on the average PA across traits, in agreement with previous studies (Lehermeier et al.; Marulanda et al.). This conclusion is further supported by the similar variation in PA among predicted families derived from the two ancestral populations (Figure 2B, FSF), despite the much weaker latent pedigree structure in Landrace compared with Elite (Figure S1B in File S1). By comparison, the generally substantial influence of in FSF (Figure 1B and Figure 2B) suggests that PA strongly depends on in the training set (Figure S5 in File S1), which can be highly variable among BPF × trait combinations (Figure S6 in File S1). This is in harmony with previous studies that attributed variation in PA partially to differences in the phenotypic variance of the training set (Lehermeier et al.; Marulanda et al.).

For across-family GP, the expected PA depends largely on the pedigree relationship (Habier et al.; Riedelsheimer et al.) and on the variation in across-family genomic relationships. Since genomic relationships across families have a zero mean (if calculated according to Equation 1), their variation is equal to the mean squared genomic relationship between training and predicted individuals (Wientjes et al.). Generally, PA is expected to increase proportionally with these squared relationships. In the case of BPFs, genomic relationships between families are heavily influenced by the proportion of polymorphic markers in the BPFpred that also segregate in the BPFtrain (Figure S7 in File S1). Therefore, PA for across-family GP depends primarily on the magnitude of this proportion, because a larger value implies that a greater proportion of the genetic variance in the BPFpred can be explained by the QTL in BPFtrain. Accordingly, the variation in this proportion among combinations of different HSFs or URFs (Figure S1D in File S1) was largely responsible for the notable contribution of to the total variation in PA (Figure 1B). Altogether, the much larger for across-family GP, compared to within-family GP, was mainly due to the overriding influence of besides the considerable contribution of to (Figure 1B, FSF vs. HSF or URF). Unraveling the genetic causes for this complex interaction required additional analyses, which are discussed in depth in the next section.

Sampling of training individuals from a given BPFtrain barely contributed to the variation in PA, for both within- and across-family GP (Figure 1B). Thus, compared with structured populations or diversity panels, there is little room for improvement by applying optimization algorithms accounting for genomic relationships in the sampling of training individuals within BPFs (Rincent et al.; Akdemir et al.; Bustos-Korts et al.), confirming previous findings (Lorenz and Smith 2015; Marulanda et al.). This is because already modest sample sizes enable the Mendelian sampling term in the BPFtrain to be sufficiently captured. Nevertheless, sufficiently large training sets are recommended to achieve a high mean and small variance of PA (Table S3 in File S1) arising from sampling of genotypes from a given BPFtrain (Figure 1B).

Previous experimental studies found generally higher levels of variation in PA, particularly for within-family GP (Riedelsheimer et al.; Lehermeier et al.; Lian et al.). This is most likely attributable to miscellaneous additional factors present in these studies, which were not accounted for in our simulations. These factors include (i) small prediction set size, (ii) analysis of different types of progeny (F2 or backcross generations and DH lines derived from them), (iii) variation in QTL-SNP LD within BPFs due to low marker density, (iv) nonadditive gene action due to epistasis, and (v) estimation error in the heritability, which affects calculation of PA from predictive ability. Further, the various agronomic traits investigated in the experimental studies likely differed in their genetic architecture, which further increases the total variation in PA compared with the polygenic traits simulated in our study (Figure 1B). Consequently, our results should be regarded as a lower bound for the variation in PA that must be expected in practice for a given training set size and heritability.
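The statement above that the variation of across-family genomic relationships equals their mean square follows directly from the definition of a variance. As a short worked equation, with notation that is assumed here rather than taken from the article ($g_{ij}$ for the genomic relationship between training individual $i$ and predicted individual $j$, and $N_{\text{train}}$, $N_{\text{pred}}$ for the two sample sizes):

```latex
\operatorname{Var}(g_{ij})
  = \mathrm{E}\!\left[g_{ij}^{2}\right] - \left(\mathrm{E}\!\left[g_{ij}\right]\right)^{2}
  = \mathrm{E}\!\left[g_{ij}^{2}\right]
  \approx \frac{1}{N_{\text{train}}\, N_{\text{pred}}}
          \sum_{i=1}^{N_{\text{train}}} \sum_{j=1}^{N_{\text{pred}}} g_{ij}^{2}
  \qquad \text{if } \mathrm{E}\!\left[g_{ij}\right] = 0 .
```

The last step replaces the expectation by the empirical average over all training-by-predicted pairs, which is what "mean squared genomic relationship" refers to.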

Unraveling the variation among traits in across-family GP

We adopted the concept of local breeding values (cf. Kemper et al.) to investigate the relationship between the strong variation in PA among traits and the large chromosome segments that DH lines of a BPF inherit from their parents. The latter entails strong LD between QTL alleles and consequently a small effective number of chromosome segments (Table 2), which is very different from the situation found in diverse populations such as cattle breeds (Daetwyler et al.; Wientjes et al.). Thus, only a small number of local TBVs contribute to the “global” TBV of predicted individuals. Similarly, the PA can be thought of as the average accuracy of local GEBVs estimated from the training data, weighted by their relative contribution to the global TBV in the BPFpred. As a consequence of the small effective number of chromosome segments in BPFs, the accuracy of local GEBVs is prone to much larger sampling variance than would be the case in more diverse populations.

To illustrate this point, we examined two traits with contrasting PA for a given pair of HSFs (Figure 4). Of all QTL, only those that segregated in the BPFpred (376/1,000, Figure 4) contributed to the variance in local TBVs, which were estimated by local GEBVs from the training set. In our example, trait T1 with high PA showed, on average, much higher correlations between local TBVs and local GEBVs in the BPFpred along the entire genome than trait T2 with low PA (Figure 4A). For the trait with low PA, we found a larger proportion of local GEBVs that provided a false prediction signal, in the sense that negative effects were estimated for favorable parental chromosome segments and vice versa. These discrepancies between local TBVs and local GEBVs trace back to different chromosome segment substitution effects (CSSE, Equation 9) between the BPFpred and BPFtrain (Figure 4A), which, in the case of HSFs, occur if their noncommon parents carry different alleles at one or more QTL on the segment. If this is the case, one of the two BPFs will be monomorphic for the respective QTL. The effect of such a QTL, compared with other QTL on a chromosome segment that may be polymorphic in both the BPFpred and BPFtrain, determines the difference in CSSE between two families. For instance, if the variance in local TBVs among predicted individuals is dominated by a large-effect QTL that is monomorphic in the training set, the ranking of local GEBVs based on the other polymorphic QTL located on this segment might deviate substantially from the ranking of local TBVs, resulting in low local PA (Figure 4B). The frequency of inaccurate local GEBVs along the whole genome, together with the variance explained by the corresponding local TBVs, will finally determine the PA of across-family GP.

Hence, two traits with the same number and positions of QTL might have very different PA, depending on the effects of QTL that are poly- or monomorphic across the training and prediction set. This also explains why the proportion of markers segregating in both families, and thereby across-family genomic relationships, was closely associated with the average PA across many traits for different pairs of HSF and URF (Figure S7 in File S1), but poorly associated with PA for individual traits (Figure 3). Additional simulations showed further that reducing (i) the number of chromosomes on which QTL were located, or (ii) the total number of QTL, results in increased variation in PA (Figure S8 in File S1). Both these alterations reduce the number of local TBVs discernible for a trait, which underlines the relevance of a small effective number of chromosome segments (i.e., a low number of segments carrying QTL) for the variation in PA. In conclusion, the large variation in PA among traits observed for across-family GP is caused by the strong LD among linked QTL within BPFs, and the resulting small effective number of chromosome segments contributing to polygenic traits, in combination with different QTL segregating across BPFs. Our analyses exemplify that BPFs represent a special case regarding the possibly strong fluctuations in PA, which are, to this extent, not expected for genetically more diverse populations.
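The link between diverging CSSEs and low PA can be made concrete with a deliberately simplified sketch. It assumes that each DH line inherits every chromosome segment intact from one of its two parents with probability 1/2, independently across segments, and that local GEBVs use the CSSEs of the training family without estimation error. Under these assumptions the expected PA equals the cosine similarity between the two CSSE vectors. All CSSE values and function names below are hypothetical and are not taken from Figure 4.

```python
import random


def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cosine(a, b):
    """Cosine similarity; the expected PA under the assumptions stated above."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5


def simulate_pa(csse_pred, csse_train, n_lines=2000, seed=3):
    """Correlation of global TBV and GEBV across simulated DH lines."""
    rng = random.Random(seed)
    tbv, gebv = [], []
    for _ in range(n_lines):
        # 0/1 codes which parent each segment of this DH line came from
        origin = [rng.randint(0, 1) for _ in csse_pred]
        tbv.append(sum(o * a for o, a in zip(origin, csse_pred)))    # sum of local TBVs
        gebv.append(sum(o * b for o, b in zip(origin, csse_train)))  # sum of local GEBVs
    return pearson(tbv, gebv)


# Hypothetical CSSEs for four segments of the predicted family
csse_pred = [2.0, 1.5, -1.0, 0.5]
csse_train_t1 = [1.8, 1.2, -0.8, 0.7]   # trait-T1-like: similar size and sign
csse_train_t2 = [-1.0, 1.2, 0.9, 0.7]   # trait-T2-like: opposite signs on two segments

pa_t1 = simulate_pa(csse_pred, csse_train_t1)
pa_t2 = simulate_pa(csse_pred, csse_train_t2)
print(f"T1-like: {pa_t1:.2f}; T2-like: {pa_t2:.2f}")
```

Because only a handful of segments contribute, a sign flip on one or two large segments is enough to pull the global accuracy toward zero, which is the mechanism described above for trait T2.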

Influence of LD in the ancestral population on the expected accuracy of GP across BPFs

Differences in the extent of LD in ancestral populations Elite and Landrace (Figure S1A in File S1) translated into sizable differences in QTL-SNP linkage phase similarity among URFs derived from these populations (Figure S1C in File S1). Surprisingly, this barely affected PA across URFs (Figure 2A and Table S3 in File S1). The low relevance of linkage phase similarity across URFs was confirmed by the similar PAs when substituting the SNP-derived with a QTL-derived matrix (Figure 2A), which eliminates the influence of this factor. This most likely reflects the overriding influence on PA across URFs of the proportion of polymorphic loci shared between the BPFpred and BPFtrain, because the mean of this proportion was similar for URFs derived from the two ancestral populations (Figure S1D in File S1). Thus, the higher mean PA for HSFs compared with URFs seems to be attributable to higher values of this proportion (Table 2) rather than to the fact that QTL-SNP linkage phases are always consistent across HSF (Lehermeier et al.), but not necessarily across URF. This corrects a conjecture of Riedelsheimer et al., who suspected that low PA obtained from certain URFs was due to low linkage phase similarity with the respective BPFpred.

Deterministic equations for forecasting PA within and across BPFs

Forecasting PA based on estimated reliabilities of GEBVs requires that unrelated individuals have an expected genomic relationship of zero (Goddard et al.; Wientjes et al.). This can be achieved by a block-structured matrix based on population-specific allele frequencies (e.g., Chen et al.). Preliminary analyses showed that in the calculation of (Equation A5), correct treatment of SNPs polymorphic only in either BPFtrain or in BPFpred is very important. Different from empirical PAs, which remain unaffected by (see Appendix A), deterministic PAs across BPFs can be grossly inflated by ignoring in the calculation of (results not shown). While is generally high across diverse populations such as breeds of cattle (Matukumalli et al.), it can fall to <0.4 across different BPFs produced from inbred parents in plant breeding (Figure S1D in File S1 and Table 2). Calculating according to our improved method (Equation 1) largely eliminated the bias in deterministic accuracies attributable to and is therefore a prerequisite for applying Equation 3 to GP across BPFs. Accounting for inbreeding (see Appendix B for derivation) in the original reliability equation resulted, together with the modifications of the matrix, in excellent agreement between empirical and deterministic accuracies averaged across traits, which is consistent with the findings of Wientjes et al. for cattle populations. However, the trait-dependent variation in empirical PA observed for GP across BPFs cannot be accounted for by this equation. This is because, for a given set of training and predicted individuals and two traits with the same heritability but different QTL effects, the deterministic accuracy would be identical, yet the empirical accuracy can differ substantially, as illustrated in Figure 3 and Figure 4.

Forecasting PA within FSF by the equation of Daetwyler et al., which is based on population parameters, has been widely used in plant breeding (Lorenz 2013; Riedelsheimer et al.; Lian et al.). However, estimates of the effective number of chromosome segments can differ substantially (Riedelsheimer and Melchinger 2013; Wientjes et al.) between the various formulas proposed to estimate it from the effective population size and genome length (Goddard 2009; Meuwissen and Goddard 2010; Goddard et al.). Moreover, estimation of the effective population size itself is problematic, because it assumes a base population of unrelated founders, which is often impossible to define in practice (cf. Figure S1B in File S1, Elite). Following Goddard et al., we calculated the effective number of chromosome segments directly from the variance of genomic relationships, with extensions devised by Wientjes et al. for GP across populations (Equation 5). This has the advantage that it is computed from the actual genotypes for which the PA is to be forecast. The calculation of the effective number of chromosome segments required in Equation 4 must account for inbreeding (Equation 6), because the variance in genomic relationships increases with the inbreeding coefficient (see Appendix C). Ignoring inbreeding would result in underestimation of the effective number of chromosome segments and strong overestimation of the deterministic accuracy.

An important assumption of the equation of Daetwyler et al. is that the entire genetic variance in the prediction set is explained by QTL segregating in the training set (cf. Wientjes et al.). This holds true for FSF, but is violated for GP across BPFs (Table 2). As a solution for this problem, we propose a multiplicative correction in Equation 4, which efficiently reduced the strong upward bias observed otherwise (results not shown). With these modifications, empirical and deterministic accuracies agreed reasonably well when averaged across traits, but forecasting was problematic for individual traits for the same reasons as discussed above for the reliability-based equation (Figure 3). Compared with previous experimental studies (Riedelsheimer et al.; Lian et al.), we found overall better agreement of empirical and deterministic accuracies for single traits in within-family GP (Figure 3). We suppose that, in addition to the lower variation in empirical PA (Figure 1), this is likely attributable to smaller deviations between estimated and true (Lian et al.) when dealing with real traits of diverse genetic architecture.

An upward bias in deterministic PA must generally be expected if SNPs are not a good approximation of QTL due to incomplete QTL-SNP LD (cf. Wientjes et al.), leading to “missing heritability” in genomic studies (Yang et al.). This is because empirical PA decreases as less variance at QTL is explained by SNPs under incomplete LD, whereas deterministic PA is hardly affected (Figure S9 in File S1). However, our results show that this is barely relevant in BPFs (Figure 3 vs. Figure S4 in File S1), if large chromosome segments are covered sufficiently by markers. Thus, a sizable reduction in empirical PA and overestimation of deterministic PA must only be expected under very low marker density (<100 SNPs), as in the study of Lian et al. Although these authors argued that 100 SNPs were likely sufficient for within-family GP in maize, our results indicate that at least 1000 and 2500 SNPs should be used for within- and across-family GP, respectively, to obtain acceptable empirical PA and minimize the bias in deterministic PA (Figure S9 in File S1). If such numbers are not available, deterministic equations must additionally account for incomplete LD (Wientjes et al.), using, for example, multiplication with the average LD between adjacent markers as a proxy for the QTL-SNP LD (Lian et al.).

Besides low marker density, incomplete QTL-SNP LD can result from differences in the allele frequency distribution at QTL and SNPs (Goddard et al.), inter alia due to ascertainment bias of SNP chips. These differences are in reality unknown, and, as treated herein, commonly not accounted for in simulation studies (Daetwyler et al.). For GP across BPFs, differences in allele frequencies at QTL and SNPs in the ancestral population (cf. Figure S1E in File S1) would translate into different values at SNPs and QTL across BPFs, because the smaller the minor allele frequency, the larger the chances of a locus being monomorphic in a BPF. Thus, calculation of might be inflated by an upward-bias in (Equation 5), in addition to the possible overestimation of across-family genomic relationships affecting both deterministic accuracies (Equations 3 and 4). Further research is needed to show how strongly overestimation of can affect application of deterministic equations in practice, for example, by comparing the equations under chip-based and sequencing-based genotyping (Pérez-Enciso et al.).

We assumed in our derivations that the genetic correlation among BPFs = 1 (see Appendix B), which is expected to hold under a purely additive-genetic model, as applies in the absence of epistasis to (i) testcross performance for a given tester, and (ii) per se performance of completely homozygous lines (Melchinger 1987). By comparison, in cattle breeds or diverse germplasm in plant breeding, genetic correlations between populations are typically < 1 (Karoui et al.; Lehermeier et al.). Accounting for genetic correlations is possible with multi-group models, but these require sufficient phenotypic data for the predicted population as well as estimates of these correlations, which seems impractical in the case of GP of a single BPF.

Despite generally promising results for both deterministic equations, we recommend using Equation 3, because it depended less on the relatedness between BPFs, training set size, and heritability (Figures S2 and S3 in File S1), rendering it more robust across a wide range of scenarios. Since both equations (as implemented here) require genotypic data of both the training and predicted individuals, they can be applied only after obtaining genotypic data of the individuals to be predicted. Alternatively, for newly planned crosses we propose to use computer simulations to generate in silico virtual genotypic data of the corresponding BPFs using known genotypes of the parents and genetic map information of the markers, as conducted in this study (cf. Mohammadi et al.). This would make both equations accessible prior to generating new crosses for use in optimizing training set designs and allocation of resources.
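For orientation, the general shape of a forecast of the Daetwyler et al. type discussed above can be sketched in a few lines. This is a minimal illustration and not the modified Equation 4 of this study: it uses the widely cited basic form, accuracy = sqrt(N h² / (N h² + Me)), approximates the effective number of chromosome segments as the reciprocal of the variance of the supplied genomic relationships without the inbreeding correction of Equation 6, and exposes the across-family correction only as a generic multiplier. The function names and the `explained` parameter are hypothetical, and where exactly the correction enters the authors' equation is not specified here.

```python
from math import sqrt


def effective_segments(relationships):
    """Approximate M_e as 1 / variance of genomic relationships (no inbreeding correction)."""
    n = len(relationships)
    mean = sum(relationships) / n
    var = sum((g - mean) ** 2 for g in relationships) / n
    return 1.0 / var


def forecast_accuracy(n_train, h2, m_e, explained=1.0):
    """Basic Daetwyler-type forecast; `explained` is a hypothetical across-family multiplier."""
    base = sqrt(n_train * h2 / (n_train * h2 + m_e))
    return explained * base


# Hypothetical inputs: 50 training lines, heritability 0.6, 30 effective segments
within_family = forecast_accuracy(50, 0.6, 30.0)
# Same training set, assuming only part of the genetic variance is captured across families
across_family = forecast_accuracy(50, 0.6, 30.0, explained=0.5)
print(f"within: {within_family:.2f}; across (assumed multiplier 0.5): {across_family:.2f}")
```

The sketch makes two properties of this family of equations visible: the forecast rises with training set size and heritability, and it falls as the effective number of chromosome segments grows. It contains no trait-specific term, which is why such equations cannot reproduce the variation in PA among traits.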

Conclusions and extensions to multi-family training sets

We demonstrated that the empirical PA in BPFs of inbred lines is prone to various sources of variation, which differ strongly in their relevance for GP within and across BPFs. It should be stressed that the conclusions drawn from our study apply not only to DH lines, but also to inbreds developed by recurrent selfing and most likely also to partly inbred generations. Overall, our results corroborate within-family GP as a valuable and robust tool for the implementation of GP in plant breeding, provided the training set meets minimum standards for training set size and heritability (0.3). However, the need for phenotypes from the predicted family represents the main drawback of within-family GP, because this increases both the costs and the time needed until selection can be applied.

Our simulations on across-family GP were restricted to the simple strategy of using only a single HSF or URF for model training. This provided a manageable framework for analyzing the underlying causes affecting variation in PA. For a given BPFpred, we showed: (i) the PA in across-family GP expected across many traits differs systematically between different BPFtrain, even if they have the same pedigree relationship with the BPFpred, (ii) deterministic equations enable accurate forecasts of the PA across traits for given pairs of BPFpred and BPFtrain, and (iii) large variation in the PA among traits hampers the forecasting. Therefore, it is very unlikely to find a single BPFtrain that performs uniformly best across all target traits. This means that caution must be exercised when applying rules of thumb or deterministic equations for choosing the BPFtrain in GP of a specific trait for a given BPFpred. This issue can be even more severe if (i) traits deviate from the polygenic architecture assumed in our simulations, or (ii) the effective number of chromosome segments in the BPFs is smaller than in maize due to fewer chromosomes and/or smaller genome size (Figure S8 in File S1). Thus, identification of useful, trait-specific BPFtrain might only be possible by directly evaluating the empirical PA for a small sample of individuals from the BPFpred. However, this would largely eliminate the time- and cost-related advantages of genomic selection based on previously available data from BPFs.

In practice, breeders generally do not rely on single-family training sets in GP across BPFs, but rather use multi-family training set designs for the sake of increasing sample size (Heffner et al.; Riedelsheimer et al.; Hickey et al.; Jacobson et al.; Lehermeier et al.). Another important advantage of multi-family over single-family training sets in across-family GP most likely stems from the increased proportion of causal loci segregating in both the BPFpred and the training set, which we identified as the core problem leading to the large variation of PA in GP across single BPFs. One critical question in this context is whether a single BPFtrain that is poorly predictive of a given BPFpred (e.g., an HSF that yields PA close to zero, Figure 4) is detrimental or harmless for PA if combined with other predictive BPFs for extending the training set. The problem might be exacerbated if URF are included in multi-family training sets (cf. Albrecht et al.), which might come at the expense of reduced linkage phase similarity (cf. Figure S1C in File S1) between a multi-family training set and the BPFpred (Lorenz and Smith 2015). Further research is warranted to investigate whether the current design of training sets can be improved by identifying and excluding adverse families to avoid disappointing outcomes of GP in BPFs.

Supplementary Material

Supplemental material is available online at www.g3journal.org/lookup/suppl/doi:10.1534/g3.117.300076/-/DC1.
  39 in total

1.  Accurate prediction of genetic values for complex traits by whole-genome resequencing.

Authors:  Theo Meuwissen; Mike Goddard
Journal:  Genetics       Date:  2010-03-22       Impact factor: 4.562

2.  Efficient methods to compute genomic predictions.

Authors:  P M VanRaden
Journal:  J Dairy Sci       Date:  2008-11       Impact factor: 4.034

3.  Accuracy of Genomic Prediction in Synthetic Populations Depending on the Number of Parents, Relatedness, and Ancestral Linkage Disequilibrium.

Authors:  Pascal Schopp; Dominik Müller; Frank Technow; Albrecht E Melchinger
Journal:  Genetics       Date:  2016-11-09       Impact factor: 4.562

4.  Genome-based prediction of testcross values in maize.

Authors:  Theresa Albrecht; Valentin Wimmer; Hans-Jürgen Auinger; Malena Erbe; Carsten Knaak; Milena Ouzunova; Henner Simianer; Chris-Carolin Schön
Journal:  Theor Appl Genet       Date:  2011-04-20       Impact factor: 5.699

5.  The effect of linkage disequilibrium and family relationships on the reliability of genomic prediction.

Authors:  Yvonne C J Wientjes; Roel F Veerkamp; Mario P L Calus
Journal:  Genetics       Date:  2012-12-24       Impact factor: 4.562

6.  Common SNPs explain a large proportion of the heritability for human height.

Authors:  Jian Yang; Beben Benyamin; Brian P McEvoy; Scott Gordon; Anjali K Henders; Dale R Nyholt; Pamela A Madden; Andrew C Heath; Nicholas G Martin; Grant W Montgomery; Michael E Goddard; Peter M Visscher
Journal:  Nat Genet       Date:  2010-06-20       Impact factor: 38.330

7.  Accuracy of predicting genomic breeding values for residual feed intake in Angus and Charolais beef cattle.

Authors:  L Chen; F Schenkel; M Vinsky; D H Crews; C Li
Journal:  J Anim Sci       Date:  2013-10       Impact factor: 3.159

8.  Genomic predictability of interconnected biparental maize populations.

Authors:  Christian Riedelsheimer; Jeffrey B Endelman; Michael Stange; Mark E Sorrells; Jean-Luc Jannink; Albrecht E Melchinger
Journal:  Genetics       Date:  2013-03-27       Impact factor: 4.562

9.  Genomic prediction in CIMMYT maize and wheat breeding programs.

Authors:  J Crossa; P Pérez; J Hickey; J Burgueño; L Ornella; J Cerón-Rojas; X Zhang; S Dreisigacker; R Babu; Y Li; D Bonnett; K Mathews
Journal:  Heredity (Edinb)       Date:  2013-04-10       Impact factor: 3.821

10.  Improvement of Predictive Ability by Uniform Coverage of the Target Genetic Space.

Authors:  Daniela Bustos-Korts; Marcos Malosetti; Scott Chapman; Ben Biddulph; Fred van Eeuwijk
Journal:  G3 (Bethesda)       Date:  2016-11-08       Impact factor: 3.154


10.  Strategies for Effective Use of Genomic Information in Crop Breeding Programs Serving Africa and South Asia.

Authors:  Nicholas Santantonio; Sikiru Adeniyi Atanda; Yoseph Beyene; Rajeev K Varshney; Michael Olsen; Elizabeth Jones; Manish Roorkiwal; Manje Gowda; Chellapilla Bharadwaj; Pooran M Gaur; Xuecai Zhang; Kate Dreher; Claudio Ayala-Hernández; Jose Crossa; Paulino Pérez-Rodríguez; Abhishek Rathore; Star Yanxin Gao; Susan McCouch; Kelly R Robbins
Journal:  Front Plant Sci       Date:  2020-03-27       Impact factor: 5.753

