Literature DB >> 30075705

Estimates of genetic trend for single-step genomic evaluations.

Karin Meyer¹, Bruce Tier², Andrew Swan².

Abstract

BACKGROUND: A common measure employed to evaluate the efficacy of livestock improvement schemes is the genetic trend, which is calculated as the means of predicted breeding values for animals born in successive time periods. This implies that different cohorts refer to the same base population. For genetic evaluation schemes integrating genomic information with records for all animals, genotyped or not, this is often not the case: expected means for pedigree founders are zero whereas values for genotyped animals are expected to sum to zero at the (mean) time corresponding to the frequencies that are used to center marker allele counts when calculating genomic relationships.
METHODS: The paper examines estimates of genetic trends from single-step genomic evaluations. After a review of methods which propose to align pedigree-based and genomic relationship matrices, simulation is used to illustrate the effects of alignments and choice of assumed gene frequencies on trajectories of genetic trends.
RESULTS: The results show that methods available to alleviate differences between the founder populations implied by the two types of relationship matrices perform well; in particular, the meta-founder approach is advantageous. An application to data from routine genetic evaluation of Australian sheep is shown, confirming their effectiveness for practical data.
CONCLUSIONS: Aligning pedigree and genomic relationship matrices for single step genetic evaluation for populations under selection is essential. Fitting meta-founders is an effective and simple method to avoid distortion of estimates of genetic trends.

Entities: Chemical Disease Species

Mesh：

Year: 2018 PMID： 30075705 PMCID： PMC6091173 DOI： 10.1186/s12711-018-0410-1

Source DB: PubMed Journal: Genet Sel Evol ISSN： 0999-193X Impact factor: 4.297

Background

Genetic evaluation based on the use of genomic information has become a routine procedure in numerous livestock improvement schemes. Many employ the so-called single-step procedure for best linear unbiased prediction (SS-GBLUP) which allows for joint evaluation of genotyped and non-genotyped animals [1]; see Legarra et al. [2] for a comprehensive review. The most widely used implementation involves a ‘simple’ extension of the pre-genomic ‘breeding value’ model, which replaces the pedigree-based numerator relationship matrix (NRM) between animals by its counterpart, which combines the genomic relationship matrix (GRM) between genotyped animals with relationships derived from the pedigree. Henceforth, a predicted breeding value that is obtained by using the GRM and SS-GBLUP will be referred to as GEBV while PEBV is used to denote the corresponding value based on the NRM ignoring genotype information, and EBV alludes to both types. A problem inherent to combining genomically-derived and pedigree-based relationships arises due to different conceptual founder populations with potentially different means. For the NRM, the (unknown) parents of the first generation of pedigreed animals are considered to be the unrelated and non-inbred founder animals. Thus, the base generation is determined by the point in time where pedigree recording began. In contrast, genomic relationships are based on ancestral founders many generations back. Combining the NRM and GRM without accounting for these differences can result in biased predictions of breeding values, in particular EBV for genotyped animals may be biased downwards. This is akin to problems that have been encountered earlier on in genetic evaluation of beef cattle when PEBV of imported, superior bulls without appropriate local pedigree ties were found to be severely underestimated because they referenced the wrong, lower base [3]. Several procedures have been suggested to align the NRM and GRM [4-7]. Typically, such modifications have been found to reduce the overdispersion that is often reported for GEBV. However, observed effects on corresponding accuracies are generally small, e.g. [5, 8]. A standard measure, which is routinely computed to demonstrate the efficacy of selection programmes in livestock is the genetic trend. This is usually obtained as the mean EBV of cohorts of animals in a generation or born within a given time period. To date and to our knowledge, there are no studies that have examined estimates of genetic trend in the context of SS-GBLUP. This paper considers the effects of different modifications, which are suggested for the relationship matrices involved in SS-GBLUP, on the estimates of genetic trend. After a review of the methods proposed in the literature, we demonstrate by using simulated data that different ways of centering marker counts or aligning GRM and NRM can substantially affect estimates of trends, especially for populations that are subject to intense selection. This is followed by an application to data from LAMBPLAN, the Australian sheep genetic improvement scheme [9], representing a typical data structure where only relatively few animals have been genotyped so far and where these animals were born in the most recent years.

Review: on the use of relationship matrices in SS-GBLUP

Consider a SS-GBLUP analysis comprised of non-genotyped and genotyped animals, with allele counts for m markers summarized in matrix M of size . Assume standard coding of allele counts as 0 and 2 for homozygotes and 1 for heterozygotes. Let A and G denote the NRM and GRM, respectively, and H the joint relationship matrix. Assume animals are ordered so that A can be partitioned into blocks pertaining to genotyped () and non-genotyped animals () and the relationships between them (). The joint relationship matrix is then [10]with the inverse [11]The genomic relationship is commonly determined as a matrix of sums of squares and crossproducts of the matrix of centered marker counts, possibly with some differential weighting for individual markers. A popular form is Van Raden’s [4] method 1:where is the allele frequency of the i-th marker and P is a matrix comprised of columns with 1 a vector of length with all elements equal to unity. Subscript ‘M’ is used to denote the ‘raw’ GRM as derived from marker information, without any modifications. Other formulations, extend Eq. (3) to weigh contributions from individual markers according to their frequencies (e.g. method 2 of [4] or [12]). Van Raden [4] emphasized that the frequencies should be those in the unselected base a.k.a. founder population. In practice, these are generally unknown and frequencies are commonly determined from the observed genotypes. Another choice is to assume that for all i, equivalent to coding allele counts as and 1. An argument for the latter is that, for random choice of reference alleles, the expectation of is 0.5 [13]. Moreover, this coding is obtained when integrating the likelihood function for the single-step model over the unknown allele frequencies [6]. is often modified in some fashion to ensure that it can ‘safely’ be inverted, to improve alignment between the GRM and NRM or to account for residual polygenic variation. We use to denote the modified matrix with on the right hand side of the following equations representing the matrix to be changed. Since different procedures can be combined, the latter may represent either as given above or from a previous step.

Weighted average of GRM and NRM

A common modification is to ‘shrink’ the GRM towards the corresponding part of the NRM:Often values of close to unity [4] are used to counter-act the problem of not being positive definite when observed allele frequencies are used to center M. Smaller values of are used for analyses where it is deemed necessary to account for residual polygenic variation, i.e. additive genetic variance not explained by the markers, or to limit the influence of genomic information. For instance, values of have been chosen for SS-GBLUP genetic evaluation of Australian sheep [14] and beef cattle [15].

Adjusting the GRM to be compatible with the NRM

Suggestions for aligning with or involve a modification of the form:where denotes a matrix with all elements equal to unity. Factors and can be estimated by least-squares regression [4] or determined by equating the means of the elements of the two matrices, G and , and the means of their diagonals [6]. The latter may seem heuristic, but can be thought of as enforcing equality of a sample covariance matrix and its expectation for both matrices [7]. Legarra et al. [2] interpreted as ‘overall relationship’ and as change in scale or genetic variance due to drift or selection. This gives:where 1 is a vector with all elements equal to unity and ‘’ denotes the matrix trace operator. Similarly, Vitezica et al. [5] proposed values ofThis yields a value of which is equal to the mean difference between and G, so that the means of elements of the modified GRM and the corresponding part of the NRM are equal. Several studies recognized that adding a multiple of J to the GRM shifts GEBV for genotyped animals by a constant, i.e. is inconsequential for analyses which do not include individuals without genotypes. For instance, Stranden and Christensen [16] showed that allele coding did not affect relative differences between predicted genomic breeding values, provided the model included a fixed mean effect. Comparing different additions to the GRM, Tier et al. [17] demonstrated that adding very different multiples of J yielded analogous results. Vitezica et al. [5] emphasized that replacing G by implies fitting a mean term assumed to have variance (with the additive genetic variance), arguing that the mean of random breeding values (of genotyped animals) should also be a random effect. The authors further showed that this is an equivalent model to fitting a single genetic group for genotyped animals explicitly with group proportion for their non-genotyped relatives given by ‘pedigree regression’, . Similarly, Fernando et al. [18] proposed to fit a (fixed) mean for genotyped animals in SS-GBLUP implementations fitting a marker effect or hybrid model (rather than the breeding value model) to account for inappropriate centering of allele counts or imputation error. A simulation study considering such models for populations under selection, confirmed that estimates for represented the mean GEBV of genotyped individuals when observed genotypes were centered by their mean frequencies [19]. Moreover, Vitezica et al. [5] suggested that could be interpreted as twice the so-called or fixation index, which gives the average relationships of gametes for a given base population. They pointed out that the based adjustment to change base population described by Powell et al. [20] translates to a modification of with as above [see Eq. (5)] and . However, please note that, depending on the choice of values for and , is not guaranteed to be positive definite. Interpretation of as a variance implies a positive value. The adjustments of form of Eq. (5) were proposed for the scenario in which markers were centered using their observed frequencies—different choices for P could readily yield elements of much larger than of and thus a negative estimate for or an invalid , and should not be used naively.

Modifying the NRM to match the GRM

An alternative is to scale the NRM to be similar to the GRM, so as to account for ancestral relationships that are captured by genomic information but not the pedigree. This is similar to earlier attempts to account for prior inbreeding in genetic evaluation; see Van Raden [21]. Let represent the degree of ‘self-relationship’ among the base animals in the pedigree. Christensen [6] then proposed to replace A with:Using the Sherman–Morrison matrix identity, gives the inverse:as , with F denoting the number of founders in the pedigree. This modification is of the same form as the based adjustment of G [20]. Indeed, Garcia-Baccino et al. [13] showed that can also be interpreted as twice the index. Hence, it attempts to change the base population for pedigreed individuals. Legarra et al. [7] subsequently demonstrated that the same adjustment can be obtained by augmenting the pedigree with a so-called meta-founder, a conceptual parent which replaces the unknown parents of founder animals in the pedigree, acting as both sire and dam. This framework is attractive as it allows for computation of the terms required to build , specifically and the submatrix of for genotyped animals, , with minor modifications of commonly used existing algorithms for these tasks. Moreover, multiple base populations are readily accommodated by allowing for separate metafounders and replacing the scalar with a positive definite matrix with diagonal elements equal to the individual self-relationships and off-diagonal elements comprised of across population relationships. This makes it suitable for the analysis of crossbred populations; see, for instance, [22] for an application. Alternatively, metafounders can be thought of as a generalisation of the ‘phantom parents’ to model genetic groups for unknown parents for genetic evaluation using pedigree information [23]. Let and denote the equivalents to and for multiple meta-founders. Both [6] and [7] presented algorithms to evaluate the terms required to set up , or and or recursively, extending the procedures of Quaas [24] and Colleau [25]. Similarly, both described likelihood based approaches to estimate or . Centering marker counts by and with denoting the centered matrix of allele counts, this requires maximising:with respect to the elements of , withVariable s represents a measure of heterozygosity, similar to above. While estimation of from Eq. (11) involves numerical optimisation, Eq. (12) is of closed form and s can be obtained directly. Alternatively, and s can be estimated based on summary statistics. For a single metafounder [7],andSubsequently, Garcia-Baccino et al. [13] described generalised least squares estimators for or . For a single metafounder,where denotes the vector of uncentered allele counts for the i-th locus (). An estimate of is then obtained as twice the variance of across loci. Furthermore, [13] outlined corresponding maximum likelihood schemes, based on the assumption that the are normally and independently distributed. The authors presented a simulation study for a single metafounder, reporting that both methods estimated accurately while the summary statistics based approach tended to yield overestimates. As noted by Strandén et al. [26], quantities given in Eq. (15) are estimates of (twice) the founder allele frequencies as proposed by McPeek et al. [27]. These are readily calculated for large numbers of genotyped animals, using:where are the submatrices of corresponding to the partitioning of for genotyped and non-genotyped individuals. Hence, in Eq. (15) can be obtained by using sparse matrix calculations without the need to invert a large matrix, requiring the factorisation of instead [26].

‘Correction factors’ in building

In addition, it has been suggested to weigh and differently when constructing , i.e.or similar. Suitable values for and were commonly determined experimentally by evaluating their effect on the inflation of genomic breeding values. Reduction in bias for values different from unity with little effect on accuracy have been reported for dairy cattle genetic evaluation [28, 29]. In particular, reducing the weight on appeared advantageous () by reducing the effects of a high proportion of incomplete pedigrees [30]. Garcia-Baccino et al. [13] emphasized that the metafounder approach would act in a similar fashion, albeit with a theoretically justified basis. Martini et al. [31] showed that weighting of and as in Eq. (17) is equivalent to replacing the diagonal block for genotyped animals in H [see Eq. (1) above] with:which can be thought of as a weighted harmonic mean of and . Note that, depending on the choice of and , again is not guaranteed to be positive definite.

Methods

Simulation

To examine the effects of different methods of aligning and , data were simulated using the software package QMSim [32] adopting the scenario used by [5] and [13] to mimic a livestock population under selection (Parameter file available at http://github.com/alegarra/metafounder); see their papers for details. This was modified slightly to consider a trait recorded on both sexes and by reducing the number of markers. In brief, this yielded records and genotypes comprised of allele counts for 46,500 loci for 2800 unselected animals with unknown parents in generation 0 and 2600 animals in each of 10 overlapping generations, 28,800 in total. Records were sampled for a trait with a phenotypic variance of 10 and heritability of 0.3. Parents of the next generation were selected based on their breeding values, obtained by BLUP using the (pedigree) numerator relationship matrix. Selection involved replacement of 80 out of 200 sires and 520 out of 2600 dams in each generation. A total of 25 replicates were obtained and analysed. Data were analysed by considering successive subsets of genotypes in generations i through to 10, for and 10. Markers with minor allele frequencies lower than 0.02 were disregarded. In addition, a ‘pedigree BLUP’ analysis was carried out, ignoring all genotypes. Records and pedigree information for all generations were used throughout. Restricted maximum likelihood analyses to estimate genetic and residual variances were carried out by fitting a linear model with an overall mean as the only fixed effect and animals’ genetic merit as random effects, obtaining predicted breeding values at convergence. The inverse of the joint relationship matrix between genotyped and non-genotyped animals was built in three ways. First, was constructed without any attempt to align and , referred to as model M0. Second, for model MG the GRM was augmented by as proposed by Vitezica et al. [5]. Preliminary analyses had shown negligible differences in results for MG to corresponding analyses using the forms of adjustment suggested by [6] or [20], and the latter were thus not examined any further. Lastly for model MA, and in [see Eq. (2)] were replaced with and its submatrix for genotyped individuals, , with as for M0. This was done by augmenting the mixed model with a single meta-founder. The degree of self-relationship, , for each subset of genotypes considered was estimated using the generalised least-squares procedure described by Garcia-Baccino et al. [13] and and were built following Legarra et al. [7]. In turn, was constructed using up to four different sets of allele frequencies to center M. The first, denoted as all, was calculated using all observed genotypes in the subset, as is common practice. Second, only the genotypes for the first generation available were used, yielding case 1st. For comparison, the third scheme, fou, considered the founder frequencies, i.e. for individuals in generation 0. Finally, all frequencies were assumed to be equal to 0.5 (half). No weighted averaging between and was performed, i.e. was used. Instead, safely positive definite matrices were ensured by augmenting their diagonal elements with a constant of 0.05. The examined summary statistics included mean EBV for each generation and coefficients for the regression of true on predicted breeding values for individuals in generation 10. In addition, mean GEBV per generation for each genomic analysis were deviated from the corresponding PEBV values, after standardising the ‘location’ of curves by subtracting the mean GEBV or PEBV for generation 0 from the respective means for all generations. A measure of discrepancy between estimates of genetic trends for different analyses was then calculated as the Frobenius norm of the resulting vector. Standardisation ensured that this quantity reflected only differences in shape of the trajectories. Analyses were carried out using our mixed model software WOMBAT, building by using a recently added module to carry out the associated calculations [33, 34].

Application

Methods were also tested for Australian sheep data, using records from the LAMBPLAN [9] maternal breeds genetic evaluation for the trait ‘number of lambs born in one year old ewes’ (ynlb), defined as number of lambs born per ewe mated. A total of 19,564 ynlb records were collected on ewes born between 2007 and 2016. Of the ewes with records, 905 were genotyped. In addition, there were 275 animals in the pedigree with genotypes available but no records, mostly sires. This yielded a total of 1180 genotypes included in the SS-GBLUP analyses. Genotypes were comprised of marker allele counts from either 12 or 50 K ovine SNP chips (Illumina Inc., San Diego, CA, USA), with 12 K genotypes imputed to 50 K. Table 1 summarizes numbers of records and numbers of genotyped ewes per year of birth, showing that genotyped animals were concentrated in the more recent data. The pedigree records available included 34,947 animals, extending back to animals born in the late 1980’s. However only animals born from the year 2000 onwards were considered in the calculation of genetic trends.

Table 1

Distribution of numbers of records and ewes with genotypes and phenotypes across years of birth

Year	2007	2008	2009	2010	2011	2012	2013	2014	2015	2016
No. records	953	835	1273	1000	1819	1535	2490	2674	3829	3156
No. genotypes	–	–	–	–	89	147	37	118	514	–

Distribution of numbers of records and ewes with genotypes and phenotypes across years of birth Data for ynlb were analysed using WOMBAT which involved fitting single trait animal models with a fixed effect for contemporary group (250 levels) and an additional random effect for service sire (291 levels). The genetic effect was again fitted either without genotypes as ‘pedigree BLUP’, or with genotypes as SS-GBLUP fitting as described above: for M0, no attempt was made to align with , for MG, was augmented with , and for MA, the meta-founder approach with estimated from the data was used. To construct , all observed genotypes (all) were used to calculate allele frequencies. In addition, analyses were repeated assuming frequencies of 0.5 (half) for MA only. For M0, two values of the parameter (see Eq. (4)) were used to compute a weighted average of and , 1 as above and 0.5 as used in routine SS-GBLUP evaluations for Australian sheep [14].

Results

Estimates of genetic trend[a] for a single replicate for different scaling methods and allele frequencies[b][c]. [a]Simulated data, [b]Allele frequencies: ped ignoring genotypes (pedigree analysis), 1st using observed frequencies in first generation available, all using observed frequencies from all genotypes, and half assuming frequencies of 0.5, [c]Considering genotypes in selected generations: 0 to 10, 4 to 10, 7 to 10 and 10 only Estimates of genetic trend for different analyses and amounts of genotypes available are shown in Fig. 1 for a single replicate (results for the subset comprising generations 2 to 10 were omitted). This is representative of the typical pattern observed for all replicates. All panels show mean PEBV as a reference. The latter were virtually identical to corresponding true means, scaled to zero for generation 0 (not shown). Considering genotypes for generation 10 only, allele centering strategies all and 1st are the same and only all is shown. For model MG and 1st, the estimate of when considering all genotypes was negative and was not positive definite, causing the analysis to fail.

Fig. 1

Without modifications of or , using observed frequencies (all) to center allele counts, mean GEBV were shifted downwards. The resulting trend curve was parallel to the corresponding curve of pedigree means when all animals were genotyped, i.e. individuals were ranked correctly and, without the need to align GEBV for genotyped and non-genotyped animals, the shift is inconsequential. Indeed, in practical evaluation schemes, estimates are often scaled to a selected, fixed base, i.e. the ‘shape’ rather than ‘location’ of estimated trend trajectories is important. As fewer generations of genotypes were considered, the discrepancy between curves increased. For this method of centering, the mean of the genotyped animals used to determine the allele frequencies is forced to be zero [4]. As especially evident when considering genotypes in generation 10 only, this can lead to a marked distortion of the estimated trend. Conversely, as expected, using founder frequencies (generation 0, fou) resulted in trend estimates that are indistinguishable from the pedigree values for all subsets of genotypes (not shown). Consequently, using only the first generation of observed genotypes (1st) to estimate allele frequencies resulted in less biased estimates of trends but again altered the shape of the trajectory by forcing the mean GEBV for that generation to be zero. Similarly, assuming gene frequencies of 0.5 yielded mean GEBV per generation that are reasonably close to the pedigree values. In part, this may be attributable to the fact that the mean allele frequency (across loci and replicates) in generation 0 was 0.48, i.e close to 0.5. However, whereas the average proportion of loci with frequencies in the middle deciles (0.4–0.6) was 24%, 13% of markers had frequencies in the extreme deciles ( or ). Estimates of for MG for all increased with the first generation number for which genotypes were considered: values were 0.027, 0.041, 0.071 and 0.111 for genotypes in generations 2 to 10, 4 to 10, 7 to 10 and 10 only, respectively (if all genotypes were considered, the estimate was close to zero, 0.018, i.e. there was virtually no modification, as for M0). This yielded mean GEBV very close to mean PEBV when only genotypes in the last or last few generations were used, but less close agreement otherwise. The simulation involved strong selection and associated sizeable changes in allele frequencies over generations. While corrects for changes in mean due to selection or drift, it does not allow for the accompanying reduction in genetic variance from the conceptual base population [2]. Results suggest that estimates of a global for all generations may not be sufficient if many generations are genotyped. Centering using fou or half for MG was not considered as these resulted in negative estimates of and thus non-positive definite matrices . Finally, modifying to align with yielded parallel curves for all four types of centering. Estimates of decreased slightly as fewer generations of genotypes were considered, 0.548, 0.543, 0.540, 0.539 and 0.537 for generations 0 to 10 to generation 10 only. As above, using all observed genotypes (all) to construct resulted in the largest shift, with the mean GEBV for genotyped animals forced to zero. However, fitting a metafounder yielded an estimate of the shift. For instance, for all these were − 3.50 (0–10), − 4.34 (2–10), − 5.16 (4–10), − 6.34 (7–10) and − 7.56 (10 only). Subtracting these estimates from the corresponding mean GEBV then shifted the trajectories to be superimposed on the curve for mean PEBV. This held for all types of frequencies. Assuming frequencies of 0.5, GEBV for the metafounder were expected to be zero, i.e. GEBV should have been aligned correctly. In practice, there were small deviations. This may, at least in part, be attributed to sampling or other errors in the estimate of . Christensen [6] emphasized that, strictly speaking, should be estimated using observed phenotypes as well as genotypes, and reported slight overestimates when ignoring phenotypes. Selected means () and standard deviations (SD) across replicates for the norm of the vector of mean breeding values per generation deviated from ‘pedigree only’ values and for the regression coefficients of true on predicted breeding value for animals in generation 10 Model M0: No alignment between GRM and NRM, MG: Modifying the GRM by adding , and MA: Extending the NRM to include a meta-founder with self-relationship Frequencies used to center marker allele counts—all: using all genotypes in the subset, 1st: using only genotypes in the first generation available, fou: using founder frequencies (generation 0), and half: assuming frequencies of value 0.5 throughout There was little variation in results over replicates. Table 2 summarises selected means of the Frobenius norm of the vector of differences between mean EBV from pedigree and genomic analyses and their standard deviations across replicates. Mean Frobenius norms confirm the observations on a single replicate above: if founder allele frequencies were known and used to construct no alignments would be needed. In the absence thereof, fitting a metafounder and using the resulting estimates of its effect to account for the shift in GEBV due to selection yields comparable results. We also provide the corresponding statistics for the regression of true on predicted breeding values for individuals in the last generation. Mean regression coefficients were close to the theoretical value of unity when all animals were genotyped, using founder frequencies to center allele counts or fitting a metafounder. For M0 with frequencies half and MG with all, some under-dispersion of GEBV (regression coefficients ) was apparent when only individuals in the last few generations were genotyped. This was not reflected in the corresponding mean norm values. For pedigree only analyses, the mean regression was 0.984 with standard deviation of 0.034.

Table 2

Selected means () and standard deviations (SD) across replicates for the norm of the vector of mean breeding values per generation deviated from ‘pedigree only’ values and for the regression coefficients of true on predicted breeding value for animals in generation 10

M. \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^{\mathrm{{a}}}$$\end{document}a	Fr. \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^{\mathrm{{b}}}$$\end{document}b	Genotypes in generations
		0–10		2–10		4–10		7–10		10 Only
		\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{x}$$\end{document}x¯	SD	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{x}$$\end{document}x¯	SD	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{x}$$\end{document}x¯	SD	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{x}$$\end{document}x¯	SD	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{x}$$\end{document}x¯	SD
Norm for vector of mean breeding values
M0	all	0.10	0.04	1.19	0.03	1.39	0.07	1.68	0.12	2.53	0.09
M0	1st	0.10	0.04	0.41	0.03	1.27	0.08	1.82	0.13	2.53	0.09
M0	fou	0.10	0.04	0.07	0.01	0.06	0.02	0.05	0.02	0.04	0.01
M0	half	0.09	0.04	0.60	0.02	0.65	0.04	0.44	0.05	0.13	0.01
MG	all	0.10	0.04	0.41	0.04	0.47	0.05	0.32	0.03	0.09	0.01
MA	half	0.09	0.04	0.09	0.02	0.09	0.02	0.06	0.02	0.01	0.01
Regression of true on predicted breeding values
M0	all	0.992	0.020	0.883	0.021	0.841	0.028	0.784	0.033	0.728	0.035
M0	1st	0.992	0.020	0.979	0.020	0.921	0.020	0.806	0.026	0.728	0.035
M0	fou	0.992	0.020	0.991	0.021	0.989	0.021	0.980	0.021	0.944	0.023
M0	half	0.989	0.020	1.031	0.023	1.058	0.026	1.114	0.038	1.274	0.055
MG	all	0.992	0.020	0.998	0.023	1.017	0.025	1.049	0.031	1.109	0.041
MA	half	0.989	0.020	0.990	0.021	0.993	0.021	0.999	0.023	1.001	0.027

Model M0: No alignment between GRM and NRM, MG: Modifying the GRM by adding , and MA: Extending the NRM to include a meta-founder with self-relationship

Frequencies used to center marker allele counts—all: using all genotypes in the subset, 1st: using only genotypes in the first generation available, fou: using founder frequencies (generation 0), and half: assuming frequencies of value 0.5 throughout

Simulations considered the scenario where all animals in a generation were either genotyped or not. In practice, this is unlikely. Simulations were thus repeated deleting genotype information for every third animal (not shown). For case M0 using observed frequencies, either all or 1st, reduced regression coefficients further, the more so the fewer genotyped generations were considered. For instance, for all and genotypes in generation 10 only, the regression of true on predicted breeding values dropped to 0.55. Whereas standard deviations across replicates increased slightly, regression coefficients for the other cases differed little from the values given in Table 2, i.e. they remained close to unity. Corresponding mean Frobenius norms (not shown) were somewhat smaller as fewer individuals were genotyped. Estimates of genetic trend[a] for different methods of scaling relationship matrices and assumed allele frequencies[b]. [a]Sheep data for trait ‘number of lambs born in one year old ewes’, [b]Allele frequencies: ped ignoring genotypes (pedigree analysis) all_0.5 using observed frequencies (all genotypes) for , all_1 using observed frequencies (all genotypes) for , and half assuming frequencies of 0.5 Estimates of genetic trend for different analyses of the sheep data are shown in Fig. 2. Corresponding values of the Frobenius norm for deviations from ‘pedigree BLUP’ results are summarized in Table 3, together with estimates of heritabilities and phenotypic variances.

Fig. 2

Estimates of genetic trend[a] for different methods of scaling relationship matrices and assumed allele frequencies[b]. [a]Sheep data for trait ‘number of lambs born in one year old ewes’, [b]Allele frequencies: ped ignoring genotypes (pedigree analysis) all_0.5 using observed frequencies (all genotypes) for , all_1 using observed frequencies (all genotypes) for , and half assuming frequencies of 0.5

Table 3

Estimated heritabilities and norm of the vector of mean breeding values per generation deviated from ‘pedigree only’ values for different SS-GBLUP analyses

Model\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^{\mathrm{{a}}}$$\end{document}a	λ	Freq.\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^{\mathrm{{b}}}$$\end{document}b	h ^2c	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma _P^{2}$$\end{document}σP2 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^{\mathrm{{d}}}$$\end{document}d	Norm
M0	1.0	all	0.08	0.56	0.185
	0.5	all	0.08	0.56	0.049
MG	1.0	all	0.09	0.56	0.049
MA	1.0	all	0.11	0.57	0.031
	1.0	half	0.09	0.58	0.025

M0: No alignment between GRM and NRM, MG: Modifying the GRM by adding , and MA: Extending the NRM to include a meta-founder with self-relationship

Frequencies to center marker allele counts – all: using all genotypes in the subset and half: assuming frequencies of value 0.5 throughout

Heritability

Phenotypic variance

Estimated heritabilities and norm of the vector of mean breeding values per generation deviated from ‘pedigree only’ values for different SS-GBLUP analyses M0: No alignment between GRM and NRM, MG: Modifying the GRM by adding , and MA: Extending the NRM to include a meta-founder with self-relationship Frequencies to center marker allele counts – all: using all genotypes in the subset and half: assuming frequencies of value 0.5 throughout Heritability Phenotypic variance As for the simulated data, estimates of genetic trend for SS-GBLUP without any attempts to align and (top panel) differed substantially from the pedigree-based estimates, especially for the years with genotyped animals. Reducing the influence of genomic information by replacing the GRM with the average of and () reduced differences markedly. Modifying as suggested by Vitezica et al. [5] yielded a GEBV trajectory which was mostly parallel to that for pedigree only analyses (middle panel; considering only), i.e. of the correct shape but with some shift in location evident (for an estimate of ). Results for analyses fitting a meta-founder shown in Fig. 2 are mean GEBV adjusted for the estimate of the meta-founder effect (in contrast to corresponding results in Fig. 1 which are mean GEBV prior to adjustment for the meta-founder). The predicted value for the meta-founder was − 0.077 for and an estimate of . Hence, estimated trends for all analyses agreed well, although there was a tendency for mean GEBV in the last few years to be slightly lower than corresponding mean PEBV. Presumably this is, at least partially, a reflection of the small number of genotypes available and resulting sampling errors. Estimates of heritability and phenotypic variances varied little between analyses, with the estimated variance ratio for the service sire effect equal to 0.04 throughout. Values for the discrepancy in genetic trends between SS-GBLUP and pedigree BLUP analyses reflected observations for Fig. 2, with small numerical values which are an artifact of low phenotypic variances.

Discussion

Joint genetic evaluation of genotyped and non-genotyped animals in a single-step analysis has become a standard procedure. Results clearly illustrate that care has to be taken to model different means and conceptual founder populations appropriately, especially for populations under selection. In particular, estimates of genetic trends are easily distorted and can differ with the assumptions on gene frequencies or the way they are estimated. As reviewed briefly, various methods have been suggested to combine or align the pedigree-based and genomically-derived relationship matrices or scale selected components of to improve the accuracy or reduce the bias in GEBV from SS-GBLUP analyses. Moreover, most of these are easy to apply. Current livestock improvement schemes typically have genotype information for the most recent generation(s) of animals only, especially for the more extensive industries such as sheep and beef cattle. Results show that both scaling of to align with or, vice versa, scaling to match are effective for this scenario, with estimates of genetic trends in good agreement with true values (simulation) or results from pedigree only analyses. However, when more generations of genotypes are considered, estimated trajectories tend to be shifted somewhat (although still of the correct shape), especially when centering marker allele counts using all observed genotypes to estimate gene frequencies. In this case, the meta-founder approach has an immediate advantage: the GEBV for the meta-founder provides an estimate for the shift in GEBV—and adjusting for it yields the correct location for the curve. The expectation for the latter is a mean of zero for the founder generation. For practical applications where models of analysis include many fixed effects (as in our sheep example) this may differ somewhat if fixed effects and animals are sufficiently confounded so that fixed effects remove some of the trend. Analogously, the shift may be estimated for MG by fitting an equivalent model including a genetic group effect [5], but this was not considered. Moreover, for the meta-founder approach coefficients for the regression of true on predicted breeding values (simulation) were essentially equal to unity, while adding to resulted in slight deflation (coefficients ) when only genotypes in recent generations were considered. Adjusting to align with makes the genomic base of the reference population. The parameter , estimated from the genomic information and ranging from 0 to 2, can be interpreted as the degree of homozygosity among the pedigree founders that would yield observed relationships closest to those in , where is obtained assuming allele frequencies equal to 0.5. In other words, refers to a conceptual, genomic base with maximum variability for all loci [35].

Conclusions

Alignment of pedigree-based and genomic relationship matrices for single-step genetic evaluation of populations under selection is essential. Making the pedigree based relationship to be compatible with genomic information by fitting meta-founders is simple and effective.

22 in total

1. QMSim: a large-scale genome simulator for livestock.

Authors: Mehdi Sargolzaei; Flavio S Schenkel
Journal: Bioinformatics Date: 2009-01-28 Impact factor: 6.937

2. Efficient methods to compute genomic predictions.

Authors: P M VanRaden
Journal: J Dairy Sci Date: 2008-11 Impact factor: 4.034

3. Computing procedures for genetic evaluation including phenotypic, full pedigree, and genomic information.

Authors: I Misztal; A Legarra; I Aguilar
Journal: J Dairy Sci Date: 2009-09 Impact factor: 4.034

4. A relationship matrix including full pedigree and genomic information.

Authors: A Legarra; I Aguilar; I Misztal
Journal: J Dairy Sci Date: 2009-09 Impact factor: 4.034

5. Effect of different genomic relationship matrices on accuracy and scale.

Authors: C Y Chen; I Misztal; I Aguilar; A Legarra; W M Muir
Journal: J Anim Sci Date: 2011-03-31 Impact factor: 3.159

6. Common SNPs explain a large proportion of the heritability for human height.

Authors: Jian Yang; Beben Benyamin; Brian P McEvoy; Scott Gordon; Anjali K Henders; Dale R Nyholt; Pamela A Madden; Andrew C Heath; Nicholas G Martin; Grant W Montgomery; Michael E Goddard; Peter M Visscher
Journal: Nat Genet Date: 2010-06-20 Impact factor: 38.330

7. Allele coding in genomic evaluation.

Authors: Ismo Strandén; Ole F Christensen
Journal: Genet Sel Evol Date: 2011-06-26 Impact factor: 4.297

8. An indirect approach to the extensive calculation of relationship coefficients.

Authors: Jean-Jacques Colleau
Journal: Genet Sel Evol Date: 2002 Jul-Aug Impact factor: 4.297

9. Metafounders are related to F _st fixation indices and reduce bias in single-step genomic evaluations.

Authors: Carolina A Garcia-Baccino; Andres Legarra; Ole F Christensen; Ignacy Misztal; Ivan Pocrnic; Zulma G Vitezica; Rodolfo J C Cantet
Journal: Genet Sel Evol Date: 2017-03-10 Impact factor: 4.297

10. A fast indirect method to compute functions of genomic relationships concerning genotyped and ungenotyped individuals, for diversity management.

Authors: Jean-Jacques Colleau; Isabelle Palhière; Silvia T Rodríguez-Ramilo; Andres Legarra
Journal: Genet Sel Evol Date: 2017-12-01 Impact factor: 4.297

4 in total

4. Variance estimates are similar using pedigree or genomic relationships with or without the use of metafounders or the algorithm for proven and young animals1.

Authors: Michael N Aldridge; Jérémie Vandenplas; Rob Bergsma; Mario P L Calus
Journal: J Anim Sci Date: 2020-03-01 Impact factor: 3.159