The effect of the H^-1 scaling factors τ and ω on the structure of H in the single-step procedure.

Johannes W R Martini¹, Matias F Schrauf², Carolina A Garcia-Baccino², Eduardo C G Pimentel³, Sebastian Munilla²,⁴, Andres Rogberg-Muñoz²,⁵, Rodolfo J C Cantet²,⁶, Christian Reimer⁷, Ning Gao⁷,⁸, Valentin Wimmer⁹, Henner Simianer⁷.

Abstract

BACKGROUND: The single-step covariance matrix $H$ combines the pedigree-based relationship matrix $A$ with the more accurate information on realized relatedness of genotyped individuals represented by the genomic relationship matrix $G$. In particular, to improve convergence behavior of iterative approaches and to reduce inflation, two weights $\tau$ and $\omega$ have been introduced in the definition of $H^{-1}$, which blend the inverse of a part of $A$ with the inverse of $G$. Since the definition of this blending is based on the equation describing $H^{-1}$, its impact on the structure of $H$ is not obvious. In a joint discussion, we considered the question of the shape of $H$ for non-trivial $\tau$ and $\omega$.
RESULTS: Here, we present the general matrix $H$ as a function of these parameters and discuss its structure and properties. Moreover, we screen for optimal values of $\tau$ and $\omega$ with respect to predictive ability, inflation and iterations up to convergence on a well investigated, publicly available wheat data set.
CONCLUSION: Our results may help the reader to develop a better understanding of the effects of changes of $\tau$ and $\omega$ on the covariance model. In particular, we give theoretical arguments that, as a general tendency, inflation will be reduced by increasing $\tau$ or by decreasing $\omega$.

Year: 2018; PMID: 29653506; PMCID: PMC5899415; DOI: 10.1186/s12711-018-0386-x

Source DB: PubMed; Journal: Genet Sel Evol; ISSN: 0999-193X; Impact factor: 4.297

Background

A genomic relationship matrix provides information on the realized relatedness of individuals but requires genotyping, which increases the costs of breeding programs. Thus, breeders are often confronted with the situation that not all individuals for which expected relatedness can be derived from the pedigree are genotyped. The single-step approach [1-3] is a practical way to combine these two different sources of information, the pedigree relationship matrix $A$ and the genomic relationship matrix $G$, in one matrix $H$. This relationship matrix relates all individuals as does $A$, but incorporates the more accurate information provided by $G$. Here, the central concept is to substitute entries of $A$ by the corresponding entries of $G$ and to adapt the remaining relationships accordingly. In more detail, the matrix $H$ is defined by

$$H = \begin{pmatrix} A_{11} + A_{12}A_{22}^{-1}(G - A_{22})A_{22}^{-1}A_{21} & A_{12}A_{22}^{-1}G \\ G A_{22}^{-1}A_{21} & G \end{pmatrix}. \tag{1}$$

Here, the individuals are divided into two groups: Group 1 contains the individuals whose genotype is not available and Group 2 consists of the genotyped individuals. Thus, $A_{11}$ denotes the entries of $A$ that provide the relationships within Group 1, $A_{12}$ and $A_{21}$ the relationships between the individuals of the two groups, and $A_{22}$ the pedigree relationships within Group 2. Moreover, $A_{22}^{-1}$ denotes the inverse of $A_{22}$, which is not in general identical to the bottom-right block of $A^{-1}$, i.e. $A_{22}^{-1} \neq (A^{-1})_{22}$. With this definition, we have substituted the inner group pedigree relationship of Group 2 by the genomic relationship, which means $H_{22} = G$. The remaining terms of Eq. (1) adapt the relationships within Group 1 and the relationships between individuals of the two groups according to the changed relationships within Group 2 to generate a positive semi-definite, valid covariance structure (this transfer of information can also be interpreted in terms of imputation [4, 5]). Since many applications use the inverse of a relationship matrix, Eq. (1) is usually written on the level of its inverse (see [3] and equations 18, 19 of [6]):

$$H^{-1} = A^{-1} + \begin{pmatrix} 0 & 0 \\ 0 & G^{-1} - A_{22}^{-1} \end{pmatrix}. \tag{2}$$

Based on this setup, several previous papers have discussed the question of how to combine $A$ and $G$ optimally. In this context, approaches which have been followed adapt $G$ to $A_{22}$ [7, 8] or conversely $A$ to $G$ [8-10]. Moreover, two scaling factors $\tau$ and $\omega$ have been introduced [11, 12]:

$$H_{\tau,\omega}^{-1} = A^{-1} + \begin{pmatrix} 0 & 0 \\ 0 & \tau G^{-1} - \omega A_{22}^{-1} \end{pmatrix}. \tag{3}$$

The main purposes of the introduction of these parameters were to ensure convergence of iterative approaches that address the mixed models [11], and to reduce inflation of predictions [13]. Compared to methods based on $A$ or $G$ alone, these issues have been assumed to be enhanced by inconsistencies between $A$ and $G$ [14], and this blending is one possibility among several to approach the problem [15]. Equation (3) is defined on the level of $H^{-1}$, but the effect of the introduction of $\tau$ and $\omega$ on the shape of $H$ is not obvious. In particular, breeders aiming at implementing the single-step method in breeding programs raised the question of how these parameters affect the relationship model $H$.

Here, we present $H_{\tau,\omega}$ in a general form, as a matrix dependent on $\tau$ and $\omega$, and discuss some of its properties. Moreover, we provide arguments for a reduction in inflation of predicted breeding values being expected when $\tau$ increases or when $\omega$ decreases. Finally, to set a contrast to the widely used cattle data [12, 13, 16, 17], we screened for optimal values of $\tau$ and $\omega$ with respect to predictive ability, inflation and iterations to convergence on a well investigated, publicly available wheat data set [18]. Our results may help to develop an understanding of the effects on the covariance model when these parameters are changed. In particular, this may be of interest for people who aim at implementing the single-step method with non-trivial parameters $\tau$ and $\omega$ in practical breeding programs.

$H_{\tau,\omega}$ and some particular choices of $\tau$ and $\omega$

We will first describe and discuss some special cases. Mathematical arguments for the presented statements are provided in the “Appendix”. Wherever the inverse of a matrix is used, invertibility is implicitly assumed (also if not mentioned explicitly). In particular, $A$ is considered invertible on account of its construction from the pedigree (provided that clones are absent) [19].

Central statement

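The inverse of $H_{\tau,\omega}^{-1}$ defined by Eq. (3) is

$$H_{\tau,\omega} = \begin{pmatrix} A_{11} + A_{12}A_{22}^{-1}(G_{\tau,\omega} - A_{22})A_{22}^{-1}A_{21} & A_{12}A_{22}^{-1}G_{\tau,\omega} \\ G_{\tau,\omega}A_{22}^{-1}A_{21} & G_{\tau,\omega} \end{pmatrix} \tag{4}$$

with

$$G_{\tau,\omega} = \left(\tau G^{-1} + (1-\omega)A_{22}^{-1}\right)^{-1}. \tag{5}$$

The structure of Eq. (4) is identical to that of Eq. (1), but with $G$ substituted by Eq. (5). Considering $G_{\tau,\omega}$, we see that the parameterization of the weights $\tau$ and $1-\omega$ is “reverse” in the sense that $\tau$ and $\omega$ appear with opposite signs in front of them. In particular, this implies that $H_{\tau,\omega}$ is not necessarily positive semi-definite when $\omega > 1$, since this leads to a negative factor for $A_{22}^{-1}$, which has to be compensated by $\tau G^{-1}$ to give a positive semi-definite matrix. However, positive semi-definiteness of $H_{\tau,\omega}$ is guaranteed if $G$ and $A_{22}$ are positive definite and $\tau \ge 0$ and $\omega \le 1$, but not both at their boundary, that is not $\tau = 0$ and $\omega = 1$ at the same time.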
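The central statement can be checked numerically. The following is a minimal sketch, assuming NumPy and small random positive definite toy matrices (the helper names `h_tau_omega`, `h_inv_tau_omega` and `random_spd` are hypothetical and not taken from the article): it builds the matrix of Eq. (4) with the blended block of Eq. (5) and multiplies it with the matrix of Eq. (3).

```python
import numpy as np

def h_tau_omega(A, G, n1, tau, omega):
    """Eq. (4), with the Group 2 block given by Eq. (5).

    The first n1 rows/columns of A belong to Group 1 (not genotyped).
    """
    A11, A12 = A[:n1, :n1], A[:n1, n1:]
    A21, A22 = A[n1:, :n1], A[n1:, n1:]
    A22_inv = np.linalg.inv(A22)
    # Eq. (5): inverse of the weighted sum of G^{-1} and A22^{-1}
    G_tw = np.linalg.inv(tau * np.linalg.inv(G) + (1.0 - omega) * A22_inv)
    H11 = A11 + A12 @ A22_inv @ (G_tw - A22) @ A22_inv @ A21
    H12 = A12 @ A22_inv @ G_tw
    H21 = G_tw @ A22_inv @ A21
    return np.block([[H11, H12], [H21, G_tw]])

def h_inv_tau_omega(A, G, n1, tau, omega):
    """Eq. (3): A^{-1} plus the scaled difference in the Group 2 block."""
    H_inv = np.linalg.inv(A)
    H_inv[n1:, n1:] += tau * np.linalg.inv(G) - omega * np.linalg.inv(A[n1:, n1:])
    return H_inv

# Hypothetical toy data: random symmetric positive definite matrices.
rng = np.random.default_rng(0)

def random_spd(n):
    X = rng.normal(size=(n, 2 * n))
    return X @ X.T / (2 * n) + 0.1 * np.eye(n)

A = random_spd(5)   # 2 non-genotyped + 3 genotyped individuals
G = random_spd(3)

H = h_tau_omega(A, G, 2, tau=0.7, omega=0.3)
H_inv = h_inv_tau_omega(A, G, 2, tau=0.7, omega=0.3)
is_inverse = np.allclose(H @ H_inv, np.eye(5))
```

For these toy inputs `is_inverse` should evaluate to `True`, and setting `tau = omega = 1` returns $G$ itself in the Group 2 block, as in Eq. (1).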

Lemma 1

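Let $G$ and $A_{22}$ be positive definite and let $\tau \ge 0$ and $\omega \le 1$, but not $(\tau, \omega) = (0, 1)$. Then $H_{\tau,\omega}$ is positive semi-definite.

Note that due to the “reverse parameterization” in form of the weights $\tau$ and $1-\omega$ in Eq. (5), the sets of values of $\tau$ and of $\omega$ which guarantee positive semi-definiteness of the single-step matrix $H_{\tau,\omega}$ are distinct. If both $\tau$ and $1-\omega$ are positive, then positive semi-definiteness of $H_{\tau,\omega}$ is guaranteed. In particular, this also means that a negative $\omega$ gives a valid covariance model. Thus, a grid to test combinations $(\tau, \omega)$ should rather be chosen within the region $\tau \ge 0$, $\omega \le 1$ than within the frame that has often been used for the choice of parameters [13, 16, 17].

In the following, we will discuss special choices of $\tau$ and $\omega$.

(i) If $\tau = \omega = 1$, we are dealing with the original single-step method of Eq. (2).
(ii) If $\tau = \omega = 0$, then $G_{0,0} = A_{22}$ and thus $H_{0,0} = A$.
(iii) If $\tau = \omega$, then $G_{\tau,\tau} = \left(\tau G^{-1} + (1-\tau)A_{22}^{-1}\right)^{-1}$.
(iv) If $\omega = 1$, then $G_{\tau,1} = \tau^{-1}G$.
(v) If $\tau = 1$, then $G_{1,\omega} = \left(G^{-1} + (1-\omega)A_{22}^{-1}\right)^{-1}$.

Case (i) is already obvious on the level of $H^{-1}$, but it can also be seen on the level of $H$ that Eq. (4) coincides in this case with Eq. (1), since $G_{1,1} = G$. If instead $\tau = \omega = 0$ as for case (ii), then $H_{0,0} = A$ and the single-step BLUP becomes the traditional pedigree-BLUP. Also note that case (iii), for which $\tau$ and $\omega$ are equal, has already been addressed in [3] and results in a weighted harmonic mean of $G$ and $A_{22}$. In case (iv), in which $\omega$ is equal to 1, $G_{\tau,1} = \tau^{-1}G$. With increasing $\tau$, the entries of $H_{12}$, $H_{21}$, $H_{22}$ will shrink towards 0 and $H_{11}$ to the Schur complement $A_{11} - A_{12}A_{22}^{-1}A_{21}$. In case (v), we see that if we fix $\tau = 1$, $G_{1,\omega}$ is not significantly simplified. Moreover, since the weighted sum of $G^{-1}$ and $A_{22}^{-1}$ is inverted in $G_{1,\omega}$, the factor $1-\omega$ may also introduce a weight on the entries of $G$. We see that this is indeed the case with the following example. Choosing [Formula: see text], [Formula: see text]. In this example, the non-diagonal elements of $A_{22}$ are 0, but the non-diagonal elements of $G_{1,\omega}$ deviate from the corresponding entries of $G$, which are equal to 1. Thus, $\omega$ cannot be interpreted as being only a weight of the pedigree contribution to the covariance $H_{\tau,\omega}$.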
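The special cases and the role of $\omega$ can be illustrated with a small numerical sketch. The 2×2 matrices below are hypothetical toy values chosen for this illustration (they are not the example of the article), and `g_tau_omega` is an assumed helper name for the blended block $(\tau G^{-1} + (1-\omega)A_{22}^{-1})^{-1}$.

```python
import numpy as np

def g_tau_omega(G, A22, tau, omega):
    """Blended Group 2 block: inverse of tau*G^{-1} + (1-omega)*A22^{-1}."""
    return np.linalg.inv(tau * np.linalg.inv(G) + (1.0 - omega) * np.linalg.inv(A22))

# Hypothetical toy matrices (assumption, not taken from the article).
A22 = np.eye(2)                           # off-diagonal pedigree relationships are 0
G = np.array([[2.0, 1.0], [1.0, 2.0]])    # off-diagonal genomic relationships are 1

case_i = g_tau_omega(G, A22, 1.0, 1.0)    # tau = omega = 1: equals G
case_ii = g_tau_omega(G, A22, 0.0, 0.0)   # tau = omega = 0: equals A22
case_iv = g_tau_omega(G, A22, 2.0, 1.0)   # omega = 1: equals G / tau
case_v = g_tau_omega(G, A22, 1.0, 0.5)    # tau = 1: off-diagonal is 4/15, not 1

# Smallest eigenvalue for a pair inside and a pair outside the region
# tau >= 0, omega <= 1.
min_eig_valid = np.linalg.eigvalsh(g_tau_omega(G, A22, 0.1, -1.0)).min()
min_eig_invalid = np.linalg.eigvalsh(g_tau_omega(G, A22, 0.1, 1.5)).min()
```

Although the off-diagonal entries of `A22` are 0 here, the off-diagonal entry of `case_v` is 4/15 rather than 1, so $\omega$ also rescales the genomic contribution. For the pair with $\omega > 1$ the smallest eigenvalue is negative, so the blended block is no longer a valid covariance matrix in this toy case.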

The effect of $\tau$ and $\omega$ on inflation

A main purpose of the introduction of these parameters is the reduction of inflation of the predicted breeding values [13, 16], which is manifested and diagnosed by a slope $b < 1$ in a regression of observed values (y-axis) on predictions (x-axis). Recall that the regression of observed values on predictions should be preferred to a regression of predictions on observed values for model evaluation [20]. We will argue why, as a general tendency, increasing $\tau$ or decreasing $\omega$ may lead to a reduced inflation.

In many models used for animal and plant breeding, the genetic component is modeled as a random variable with multivariate normal distribution, zero mean and structured variance, for instance given by $\sigma_g^2 H_{\tau,\omega}$ in single-step. The simplest version without a fixed effect can be written as:

$$y = g + \epsilon, \tag{6}$$

where $y$ denotes the vector of phenotypes, $g \sim N(0, \sigma_g^2 H_{\tau,\omega})$ the genetic effect and $\epsilon \sim N(0, \sigma_\epsilon^2 I)$ the independent and identically distributed errors. The best linear unbiased prediction (BLUP [19, 21]) for $g$ is given by

$$\hat{g} = H_{\tau,\omega}\left(H_{\tau,\omega} + \lambda I\right)^{-1} y, \qquad \lambda = \sigma_\epsilon^2 / \sigma_g^2. \tag{7}$$

We will apply some results on positive semi-definite matrices to this model and its BLUP to show when a change in the values of $\tau$ and $\omega$ (from $(\tau_1, \omega_1)$ to $(\tau_2, \omega_2)$) reduces the variance of the estimate of the genetic component. In the following, we use the partial order on the positive semi-definite matrices (the so-called Löwner order [22]) to speak about variance “increase” and “reduction” in a multivariate context. For two positive semi-definite matrices $B$ and $C$, $B \preceq C$ if and only if $C - B$ is positive semi-definite. With this notation, $0 \preceq B$ means that $B$ is positive semi-definite. For a reference on the properties of the Löwner order see [23].

Lemma 2

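Let $G$ and $A_{22}$ be positive definite and $G_{\tau,\omega}$, $H_{\tau,\omega}$ as introduced. Let $(\tau_1, \omega_1)$ and $(\tau_2, \omega_2)$ be given such that $\tau_1 \le \tau_2$ and $\omega_2 \le \omega_1$, with both pairs satisfying the conditions of Lemma 1. Then

(a) $G_{\tau_2,\omega_2} \preceq G_{\tau_1,\omega_1}$.

Moreover,

(b) $H_{\tau_2,\omega_2} \preceq H_{\tau_1,\omega_1}$.

(c) For two matrices of the shape of the BLUP solution of Eq. (7) with a $\lambda > 0$, we have $H_{\tau_2,\omega_2}\left(H_{\tau_2,\omega_2} + \lambda I\right)^{-1} \preceq H_{\tau_1,\omega_1}\left(H_{\tau_1,\omega_1} + \lambda I\right)^{-1}$.

Lemma 2(a) illustrates that if we keep $\tau$ constant and decrease $\omega_1$ to $\omega_2$, the resulting matrix $G_{\tau,\omega_2}$ will be “smaller” with respect to the Löwner order. The same is true if we keep $\omega$ constant and increase $\tau_1$ to $\tau_2$. Part (b) transfers this observation to the level of $H_{\tau,\omega}$. Finally, part (c) connects $H_{\tau,\omega}$ with the BLUP of model (6). We now illustrate how this reduction with respect to the Löwner order transfers to the variance of breeding value estimates in the simple model of Eq. (6).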
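The ordering described in Lemma 2 can be explored numerically. The sketch below assumes NumPy and random positive definite toy matrices; the helper names are hypothetical. For brevity it checks the ordering only on the blended Group 2 block and on the map $K \mapsto K(K + \lambda I)^{-1}$ applied to that block, which preserves the Löwner order for any ordered pair of positive definite matrices; it does not build the full matrix $H_{\tau,\omega}$.

```python
import numpy as np

def g_tau_omega(G, A22, tau, omega):
    """Blended Group 2 block: inverse of tau*G^{-1} + (1-omega)*A22^{-1}."""
    return np.linalg.inv(tau * np.linalg.inv(G) + (1.0 - omega) * np.linalg.inv(A22))

def loewner_leq(B, C, tol=1e-10):
    """True if C - B is positive semi-definite, i.e. B <= C in the Loewner order."""
    D = C - B
    # Symmetrize to remove floating point asymmetry before the eigenvalue check.
    return bool(np.linalg.eigvalsh((D + D.T) / 2.0).min() >= -tol)

def blup_shape(K, lam):
    """K (K + lam*I)^{-1}, the matrix applied to y in a BLUP of the form of Eq. (7)."""
    return K @ np.linalg.inv(K + lam * np.eye(K.shape[0]))

# Hypothetical toy data: random symmetric positive definite matrices.
rng = np.random.default_rng(42)

def random_spd(n):
    X = rng.normal(size=(n, 3 * n))
    return X @ X.T / (3 * n) + 0.1 * np.eye(n)

G, A22 = random_spd(4), random_spd(4)

G1 = g_tau_omega(G, A22, tau=1.0, omega=1.0)   # (tau_1, omega_1)
G2 = g_tau_omega(G, A22, tau=1.5, omega=0.5)   # tau increased, omega decreased

part_a = loewner_leq(G2, G1)                                    # G2 <= G1
part_c = loewner_leq(blup_shape(G2, 1.0), blup_shape(G1, 1.0))  # order is preserved
reverse = loewner_leq(G1, G2)                                   # does not hold
```

Since the Löwner order is only a partial order, `loewner_leq` can return `False` in both directions for other pairs of matrices; here the pair is ordered by construction.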

Proposition 1

Let $M_i := H_{\tau_i,\omega_i}\left(H_{\tau_i,\omega_i} + \lambda I\right)^{-1}$ for $i = 1, 2$, let $M_2^{T}M_2 \preceq M_1^{T}M_1$, and let $\hat{g}_i = M_i y$ be the corresponding estimate of the breeding values. Moreover, let the empirical mean of both estimates be the same and let $\mathrm{var}(x)$ denote the empirical variance of the vector $x$. Then $\mathrm{var}(\hat{g}_2) \le \mathrm{var}(\hat{g}_1)$.

Proposition 1 illustrates that an important effect of using an $\omega$ smaller than 1, or a $\tau$ larger than 1, may be the reduction of the variance of the predicted genetic values. To see this, recall that Lemma 2(a) and (b) stated that reducing $\omega_1$ to $\omega_2$ and keeping $\tau$ fixed implies $H_{\tau,\omega_2} \preceq H_{\tau,\omega_1}$. The same is true for increasing $\tau_1$ to $\tau_2$ with fixed $\omega$. Lemma 2(c) then implies that $M_2 \preceq M_1$. Thus, provided that all preconditions are given, Proposition 1 states that the variance of the estimated breeding values is reduced.

The critical assumption is $M_2^{T}M_2 \preceq M_1^{T}M_1$, since this is not implied by $M_2 \preceq M_1$ (for a counter example see [24]). Thus, this will not be totally satisfied in practice. Instead, because we are dealing with a partial order, often neither $M_2^{T}M_2 \preceq M_1^{T}M_1$ nor $M_1^{T}M_1 \preceq M_2^{T}M_2$ will hold, but the difference of the two products may result in an indefinite matrix (i.e. one with both positive and negative eigenvalues). However, if only a few eigenvalues of the difference are smaller than zero, this assumption will be correct to a good approximation. Moreover, the assumption of identical empirical means will also hold only approximately in practice. Finally, recall that the variance components are usually estimated and an adapted estimate can compensate the effects of changes of the parameters $\tau$ and $\omega$. We will give an example of how a reduced empirical variance may reduce inflation.

Example 1

Let $y$ be a vector of measured data and [Formula: see text] with [Formula: see text]. Moreover, let [Formula: see text] which means [Formula: see text]. Then [Formula: see text] and [Formula: see text] and [Formula: see text]. Defining the inflation as the slope $b$ of an ordinary least squares regression of $y$ on $\hat{g}$ gives [Formula: see text]. Note here that a value of $b > 1$ means that the estimates of the breeding values are deflated and $b < 1$ that they are inflated. Thus [Formula: see text] means that the inflation is reduced when [Formula: see text] is used instead of [Formula: see text].

Example 1 illustrates that the reduced variance of the predicted genetic values may reduce inflation. It is worth highlighting that the scaling factor used in this example was formulated on the level of $\hat{g}$, which does not simply translate to a scaled variance component for $g$. In the next section, we give a small example with a well investigated wheat data set [18].
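The mechanism behind Example 1 can be sketched with a few made-up numbers. The values below are hypothetical and are not those of the article; `ols_slope` is an assumed helper name. Shrinking centered predictions by a factor $c < 1$ reduces their empirical variance by $c^2$ and raises the slope of the regression of $y$ on the predictions by $1/c$.

```python
import numpy as np

def ols_slope(y, g_hat):
    """Slope b of an ordinary least squares regression (with intercept) of y on g_hat."""
    g_c = g_hat - g_hat.mean()
    return float(g_c @ (y - y.mean()) / (g_c @ g_c))

# Hypothetical observations and inflated predictions, both with mean zero.
y = np.array([-1.4, -0.6, 0.4, 1.6])
g1 = np.array([-3.0, -1.0, 1.0, 3.0])

c = 0.8          # assumed scaling factor on the level of the predictions
g2 = c * g1      # same empirical mean (zero), smaller variance

b1 = ols_slope(y, g1)              # 0.5: predictions are inflated
b2 = ols_slope(y, g2)              # 0.625: closer to 1, inflation reduced
var_ratio = g2.var() / g1.var()    # c**2 = 0.64
```

Both slopes are below 1, so both sets of predictions are inflated, but the slope for the shrunken predictions is closer to 1.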

An example with wheat data

We assessed predictive ability, inflation and number of iterations up to convergence with varying parameters $\tau$ and $\omega$ on a publicly available wheat data set [18, 25]. The aim was to find the combinations of both parameters which maximize the predictive ability or minimize the inflation or the number of iterations to convergence, respectively. Moreover, we were interested in the general behavior of inflation when $\tau$ and $\omega$ are varied.

Data

The data set which we used consists of 599 CIMMYT wheat lines, genotyped with 1279 Diversity Array Technology markers indicating whether a certain allele is present (1) or not (0) in the respective line. The lines were grown in four different environments and grain yield was recorded for each line and each environment (for more details see [18]). We used only the phenotypic data of environment 1 for our comparisons. To see whether the choice of which lines are considered as (not) genotyped has a significant impact on properties of the single-step procedure, we split the lines into two parts according to the order in the data set and considered two scenarios: In scenario 1 (hereinafter referred to as SC1), lines 1 to 300 were treated as not genotyped and the remaining lines 301 to 599 were used as the genotyped group. Thus, the pedigree relationship of lines 301 to 599 represents $A_{22}$ and their genomic relationship represents $G$. The genomic relationship matrix $G$ was calculated according to VanRaden [26] from the centered marker matrix $M - P$, with $M$ denoting the matrix giving the states of the $p$ markers of the $n$ individuals, and $P$ denoting the matrix with identical rows giving the column averages of $M$. The same procedure was repeated in scenario 2 (hereinafter referred to as SC2), but the genotyped group consisted of lines 1 to 300. Note again that the order was used as provided by the data set.

Parameter grid

To find the optimal values for both parameters, 420 combinations of $\tau$ and $\omega$ were tested for each scenario. This number of combinations resulted from varying both parameters on a grid defined by 0.10 steps dividing the interval $[-1, 1]$ for $\omega$, or $[0.1, 2]$ for $\tau$. To evaluate the performance of each parameter combination, we constructed $H_{\tau,\omega}^{-1}$ by Eq. (3) for each combination of the parameters. Consequently, 420 different matrices $H_{\tau,\omega}^{-1}$ were calculated in R [27] and transferred to the blupf90 software [28] to estimate the breeding values using the single-step procedure.
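The grid construction can be sketched as follows. The article reports that the matrices were computed in R; the Python/NumPy version below is only an illustration with hypothetical toy relationship matrices and an assumed helper name `h_inv_tau_omega`. The range $[-1, 1]$ for $\omega$ is an assumption that is consistent with the reported 420 = 20 × 21 combinations.

```python
import numpy as np
from itertools import product

def h_inv_tau_omega(A_inv, G_inv, A22_inv, tau, omega):
    """A^{-1} plus tau*G^{-1} - omega*A22^{-1} in the Group 2 block.

    Group 2 (genotyped) is assumed to occupy the last rows/columns of A.
    """
    n2 = G_inv.shape[0]
    H_inv = A_inv.copy()
    H_inv[-n2:, -n2:] += tau * G_inv - omega * A22_inv
    return H_inv

taus = np.arange(1, 21) / 10.0       # 0.1, 0.2, ..., 2.0 (20 values)
omegas = np.arange(-10, 11) / 10.0   # assumed range -1.0, ..., 1.0 (21 values)
grid = [(float(t), float(w)) for t, w in product(taus, omegas)]

# Hypothetical toy matrices: one non-genotyped and two genotyped individuals.
A = np.array([[1.0, 0.5, 0.25],
              [0.5, 1.0, 0.25],
              [0.25, 0.25, 1.0]])
G = np.array([[1.1, 0.3],
              [0.3, 0.9]])

A_inv = np.linalg.inv(A)
G_inv = np.linalg.inv(G)
A22_inv = np.linalg.inv(A[1:, 1:])

# One inverse single-step matrix per parameter combination.
H_inv_by_combo = {
    (t, w): h_inv_tau_omega(A_inv, G_inv, A22_inv, t, w) for t, w in grid
}
```

Computing the three inverses once and reusing them keeps the loop over the grid cheap, since each combination only changes the weights of the Group 2 block.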

Evaluation of the prediction

To evaluate the predictions obtained with the different matrices, a cross-validation was run by partitioning the 599 wheat lines into 10 disjoint groups of approximately 60 lines each (regardless of whether their genomic information had been used in the single-step covariance matrix). The partitions used were those provided with the data set, which had been generated randomly [18]. Iteratively, each group was used as a test set and models were fit with the remaining lines. Prediction quality was evaluated for these 60 lines in terms of predictive ability and inflation. The former was measured as Pearson’s correlation between the phenotype and the estimated breeding value (EBV) for the test set. Inflation was calculated as the coefficient of regression of the phenotype on the EBV (for the test set). The optimal combination of parameter values should have a regression coefficient close to 1 (neither inflation nor deflation). The number of iterations to convergence was also recorded.
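The two evaluation criteria can be written down compactly. The following sketch assumes NumPy; `fold_metrics` is a hypothetical helper name, and the random partition is only a stand-in for the partitions provided with the data set, which were used in the article.

```python
import numpy as np

def fold_metrics(y_test, ebv_test):
    """Predictive ability and inflation for one test set.

    Returns Pearson's correlation between phenotype and EBV, and the slope b
    of the regression of the phenotype (y-axis) on the EBV (x-axis).
    """
    r = np.corrcoef(y_test, ebv_test)[0, 1]
    b = np.polyfit(ebv_test, y_test, 1)[0]   # first coefficient is the slope
    return float(r), float(b)

# Stand-in partition (assumption): 599 lines split at random into 10 disjoint groups.
rng = np.random.default_rng(1)
folds = np.array_split(rng.permutation(599), 10)
fold_sizes = [len(f) for f in folds]

# Small check with made-up values: EBV that are too spread out give b < 1.
ebv_demo = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_demo = 0.5 * ebv_demo + 3.0
r_demo, b_demo = fold_metrics(y_demo, ebv_demo)
```

In a full cross-validation, `fold_metrics` would be applied to each test group in turn and the results averaged over the ten groups.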

Results

Figure 1 illustrates the average predictive ability obtained for different choices of $(\tau, \omega)$ for the two scenarios SC1 and SC2. The pedigree BLUP predictive ability is given by $(\tau, \omega) = (0, 0)$. The closest grid point is $(\tau, \omega) = (0.1, 0)$, with a predictive ability of 0.46 for the first scenario and 0.43 for the second one, which is in accordance with the value of 0.448 originally reported [18]. The maximum predictive ability for SC1 was obtained with [Formula: see text] whereas in SC2 it was reached with [Formula: see text]. The location of the maximum differs, but in both scenarios we observe a broad optimum, that is a plateau on which the predictive ability hardly changes. An important observation is that the maximal predictive ability is very different between the two scenarios (0.53 vs. 0.45).

Fig. 1

Heat plots for predictive ability calculated as Pearson’s correlation between phenotype and EBV for each combination of parameters $\tau$ and $\omega$ for (a) SC1 and (b) SC2. The lighter the colour, the higher the predictive ability of the corresponding combination.

Figure 2 shows the mean inflation for each considered combination $(\tau, \omega)$ for the two scenarios. The combinations with the lowest inflation, that is the highest regression coefficient $b$, were [Formula: see text] in both scenarios, as suggested by our theoretical results. We see the tendency that both increasing $\tau$ and decreasing $\omega$ reduce inflation in the sense of increasing $b$. However, note that in our example, we are already in a situation of deflation and reducing the variance of $\hat{g}$ increases the predictive bias.

Fig. 2

Heat plots for inflation calculated as the slope in the regression of observed phenotypes on predictions for a SC1 and b SC2. The lighter the colour, the higher the slope and lower the inflation

Lastly, the optimal values of the parameters in terms of a minimal number of iterations to convergence were [Formula: see text] for SC1 and [Formula: see text] for SC2. However, for most combinations, the number of iterations was between 26 and 32, which indicates that the influence of $(\tau, \omega)$ on the number of iterations required is limited for this data set (results not shown).

Discussion

Here we presented the general form of the single-step relationship matrix $H_{\tau,\omega}$, when blending parameters $\tau$ and $\omega$ are defined on the level of its inverse [11, 12]. The matrix obtained (Eq. 4) is similar to the original single-step relationship matrix (Eq. 1) but with the role of $G$ replaced by expression Eq. (5). Moreover, we discussed some special choices of these parameters including the case for which $\tau$ and $\omega$ are equal, which was also the first adjustment of $H$ discussed in the literature [3].

The reduction in inflation was one of the main motivations for using the blending parameters [13, 16]. We illustrated with theoretical considerations that increasing $\tau$ or decreasing $\omega$ tends to reduce the empirical variance of $\hat{g}$, which in turn may lead to a reduced inflation. Our theoretical arguments are limited by their assumptions, but should hold to a good approximation. To reinforce these results with an empirical exploration, we gave a small example with a well investigated wheat data set [18]. There, the pattern observed for inflation was largely in accordance with what we expected from our theoretical considerations. With regard to predictive ability, the parameters showed broad optimality and varied strongly across the two scenarios SC1 and SC2. Both observations may be data set specific and the latter a consequence of the small population size.

Finally, note that similar effects on inflation can also be achieved with other methods, for instance by explicitly reducing the additive variance or by accounting for inbreeding [5] (see in this context also Example 1). It may be worth considering the single-step method in more detail from a theoretical perspective to address the causes of inflation. Recent studies reported results in this direction, for instance by attributing inflation to inconsistencies between genomic and pedigree relationships and by suggesting that accounting for inbreeding and unknown parent groups in a proper way may reduce this problem [5]. Moreover, it has also been highlighted that selective genotyping and selective imputation may have an impact on the properties of ssBLUP [29].

Conclusion

We provided theoretical arguments that increasing $\tau$ or decreasing $\omega$ may decrease inflation mainly by decreasing the variance of the estimated breeding values $\hat{g}$. Alternative solutions that address the problems of single-step predictions from a more theoretical point of view may be found by further investigating the consistency problems of $A$ and $G$ with respect to scaling and coding.

15 in total

1. Best linear unbiased estimation and prediction under a selection model.

Authors: C R Henderson
Journal: Biometrics Date: 1975-06 Impact factor: 2.571

2. Efficient methods to compute genomic predictions.

Authors: P M VanRaden
Journal: J Dairy Sci Date: 2008-11 Impact factor: 4.034

3. Computing procedures for genetic evaluation including phenotypic, full pedigree, and genomic information.

Authors: I Misztal; A Legarra; I Aguilar
Journal: J Dairy Sci Date: 2009-09 Impact factor: 4.034

4. A relationship matrix including full pedigree and genomic information.

Authors: A Legarra; I Aguilar; I Misztal
Journal: J Dairy Sci Date: 2009-09 Impact factor: 4.034

5. Multiple-trait genomic evaluation of linear type traits using genomic and phenotypic data in US Holsteins.

Authors: S Tsuruta; I Misztal; I Aguilar; T J Lawlor
Journal: J Dairy Sci Date: 2011-08 Impact factor: 4.034

6. Single-step genomic evaluation using multitrait random regression model and test-day data.

Authors: M Koivula; I Strandén; J Pösö; G P Aamand; E A Mäntysaari
Journal: J Dairy Sci Date: 2015-02-07 Impact factor: 4.034

7. Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers.

Authors: José Crossa; Gustavo de Los Campos; Paulino Pérez; Daniel Gianola; Juan Burgueño; José Luis Araus; Dan Makumbi; Ravi P Singh; Susanne Dreisigacker; Jianbing Yan; Vivi Arief; Marianne Banziger; Hans-Joachim Braun
Journal: Genetics Date: 2010-09-02 Impact factor: 4.562

8. Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information.

Authors: Selma Forni; Ignacio Aguilar; Ignacy Misztal
Journal: Genet Sel Evol Date: 2011-01-05 Impact factor: 4.297

9. A class of Bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses.

Authors: Rohan L Fernando; Jack Cm Dekkers; Dorian J Garrick
Journal: Genet Sel Evol Date: 2014-09-22 Impact factor: 4.297

10. Metafounders are related to F _st fixation indices and reduce bias in single-step genomic evaluations.

Authors: Carolina A Garcia-Baccino; Andres Legarra; Ole F Christensen; Ignacy Misztal; Ivan Pocrnic; Zulma G Vitezica; Rodolfo J C Cantet
Journal: Genet Sel Evol Date: 2017-03-10 Impact factor: 4.297
