Wan-Ling Hsu, Dorian J Garrick, Rohan L Fernando.
Abstract
In single-step analyses, missing genotypes are explicitly or implicitly imputed, and this requires centering the observed genotypes using the means of the unselected founders. If genotypes are only available for selected individuals, centering on the unselected founder mean is not straightforward. Here, computer simulation is used to study an alternative analysis that does not require centering genotypes but fits the mean μ_g of unselected individuals as a fixed effect. Starting with observed diplotypes from 721 cattle, a five-generation population was simulated with sire selection to produce 40,000 individuals with phenotypes, of which the 1000 sires had genotypes. The next generation of 8000 genotyped individuals was used for validation. Evaluations were undertaken with (J) or without (N) μ_g when marker covariates were not centered, and with (JC) or without (C) μ_g when all observed and imputed marker covariates were centered. Centering did not influence accuracy of genomic prediction, but fitting μ_g did. Accuracies were improved when the panel comprised only quantitative trait loci (QTL); models JC and J had accuracies of 99.4%, whereas models C and N had accuracies of 90.2%. When only markers were in the panel, the four models had accuracies of 80.4%. In panels that included QTL, fitting μ_g in the model improved accuracy, but it had little impact when the panel contained only markers. In populations undergoing selection, fitting μ_g in the model is recommended to avoid bias and reduction in prediction accuracy due to selection.
Keywords: GenPred; Genomic Selection; Shared Data Resources; centering genotype covariates; estimated breeding value; genomic prediction; selection; single-step
Year: 2017 PMID: 28642364 PMCID: PMC5555473 DOI: 10.1534/g3.117.043596
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Four combinations of the single-step Bayesian regression analyses
| Model | Marker covariates centered | Marker covariates not centered |
|---|---|---|
| With J | JC | J |
| Without J | C | N |
Centered, e.g., genotype values represented as −1, 0, 1 when the uncentered genotype covariate with values 0, 1, 2 has mean 1.
Not centered, e.g., genotype values represented as the number of copies of the A allele.
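The two codings described above can be illustrated with a minimal sketch. The function name `center_columns` and the toy genotypes are hypothetical, and the column means are taken from the rows supplied, which is a simplification: the abstract states that centering should use the means of the unselected founders.

```python
from statistics import fmean

def center_columns(genotypes):
    """Subtract each marker's mean from its column.

    `genotypes` is a list of rows (one per individual); each row holds
    allele counts coded 0, 1 or 2 for every marker.
    """
    n_markers = len(genotypes[0])
    # Assumption: means are computed from the rows passed in, not from
    # unselected founders as the single-step analysis would require.
    means = [fmean(row[j] for row in genotypes) for j in range(n_markers)]
    centered = [[row[j] - means[j] for j in range(n_markers)] for row in genotypes]
    return centered, means

# Hypothetical data: three individuals, two markers, both with mean 1.
uncentered = [[0, 2], [1, 1], [2, 0]]
centered, means = center_columns(uncentered)
print(means)     # [1.0, 1.0]
print(centered)  # [[-1.0, 1.0], [0.0, 0.0], [1.0, -1.0]]
```

With a marker mean of 1, the uncentered values 0, 1, 2 map to −1, 0, 1, matching the example in the table note.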
Correlations (%) between TBV and (G)EBV for alternative analyses
| Genotype data | JC | J | C | N | PBLUP |
|---|---|---|---|---|---|
| 50 QTL + 150 markers | 97.59 | 97.63 | 96.32 | 96.31 | — |
| 50 QTL only | 99.44 | 99.45 | 90.20 | 90.18 | — |
| 150 markers only | 80.47 | 80.47 | 80.44 | 80.44 | — |
| No genotypes | — | — | — | — | 41.56 |
Average correlation between true breeding value (TBV) and (genomic) estimated breeding values, (G)EBV, from 10 replications validated in Generation 5, comprising 8,000 individuals with genotypes but no phenotypes. The true QTL effects were sampled from a normal distribution with mean μ_α = 0.2 and scaled to simulate a trait with a heritability of 0.5.
J: includes a covariate for μ_g in the model, C: entire matrix of imputed and observed genotype covariates centered, JC: both J and C, N: neither J nor C, and PBLUP: pedigree-based BLUP.
The analyses were based on fitting covariates for only 50 QTL, only 150 markers, or both 50 QTL and 150 markers.
Correlations (%) between TBV and (G)EBV for alternative analyses for different heritabilities
| Heritability | JC | J | C | N | PBLUP |
|---|---|---|---|---|---|
| 0.1 | 93.68 | 93.71 | 92.42 | 92.42 | 31.33 |
| 0.3 | 96.79 | 96.81 | 95.19 | 95.19 | 37.07 |
| 0.5 | 97.59 | 97.63 | 96.32 | 96.31 | 41.56 |
Average correlation between true breeding value (TBV) and (genomic) estimated breeding values from 10 replications validated in Generation 5, comprising 8000 individuals with genotypes but no phenotypes. The true quantitative trait loci (QTL) effects were sampled from a normal distribution with mean μ_α = 0.2 and scaled to simulate a trait with a heritability of 0.1, 0.3 or 0.5.
J: includes a covariate for μ_g in the model, C: entire matrix of imputed and observed genotype covariates centered, JC: both J and C, N: neither J nor C, and PBLUP: pedigree-based BLUP. Covariates were fitted for both 50 QTL and 150 markers.
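The note above says QTL effects were sampled with mean 0.2 and then scaled to reach a target heritability. One way this could be done is sketched below; it is not the authors' simulation code. The function `simulate_trait`, the effect standard deviation of 1, the phenotypic variance of 1, and the random toy genotypes are all assumptions made for illustration.

```python
import random
from statistics import pvariance

def simulate_trait(genotypes, h2, mean_effect=0.2, sd_effect=1.0, seed=1):
    """Sample QTL effects, rescale them to a target heritability, add residuals."""
    rng = random.Random(seed)
    n_qtl = len(genotypes[0])
    # Assumption: unit standard deviation for the sampled effects.
    effects = [rng.gauss(mean_effect, sd_effect) for _ in range(n_qtl)]
    tbv = [sum(x * a for x, a in zip(row, effects)) for row in genotypes]
    # Rescale so that var(TBV) = h2 when the phenotypic variance is 1.
    # Note that this also rescales the mean of the effects.
    scale = (h2 / pvariance(tbv)) ** 0.5
    effects = [a * scale for a in effects]
    tbv = [v * scale for v in tbv]
    # Residuals carry the remaining variance, 1 - h2.
    residual_sd = (1.0 - h2) ** 0.5
    phenotypes = [v + rng.gauss(0.0, residual_sd) for v in tbv]
    return effects, tbv, phenotypes

# Hypothetical genotypes: 200 individuals, 50 QTL, allele counts 0/1/2.
geno_rng = random.Random(0)
genotypes = [[geno_rng.choice((0, 1, 2)) for _ in range(50)] for _ in range(200)]
effects, tbv, phenotypes = simulate_trait(genotypes, h2=0.5)
print(round(pvariance(tbv), 6))  # 0.5
```

The genetic variance matches the target exactly by construction, whereas the realized heritability in any one sample varies because the residuals are drawn at random.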
Regression coefficients of TBV on (G)EBV
| Genotype data | JC | J | C | N | PBLUP |
|---|---|---|---|---|---|
| 50 QTL + 150 markers | 1.06 | 1.06 | 1.06 | 1.06 | — |
| 50 QTL only | 1.05 | 1.05 | 1.12 | 1.12 | — |
| 150 markers only | 0.95 | 0.95 | 0.95 | 0.95 | — |
| No genotypes | — | — | — | — | 0.95 |
Average regression coefficients of true breeding value (TBV) on (genomic) estimated breeding values from 10 replications validated in Generation 5, comprising 8,000 individuals with genotypes but no phenotypes. The true QTL effects were sampled from a normal distribution with mean μ_α = 0.2 and scaled to simulate a trait with a heritability of 0.5.
The analyses were based on fitting covariates for only 50 QTL, only 150 markers, or both 50 QTL and 150 markers.
J: includes a covariate for μ_g in the model, C: entire matrix of imputed and observed genotype covariates centered, JC: both J and C, N: neither J nor C, and PBLUP: pedigree-based BLUP.
Accuracy and bias of genomic prediction for alternative QTL distributions and analyses
| Mean of QTL substitution effects | JC | J | C | N | PBLUP |
|---|---|---|---|---|---|
| Correlations (%) | | | | | |
| μ_α = 0 | 97.91 | 97.91 | 97.75 | 97.48 | 42.66 |
| μ_α = 0.2 | 97.59 | 97.63 | 96.32 | 96.31 | 41.56 |
| Regression coefficient | | | | | |
| μ_α = 0 | 1.05 | 1.05 | 1.05 | 1.05 | 0.97 |
| μ_α = 0.2 | 1.06 | 1.06 | 1.06 | 1.06 | 0.95 |
Accuracy was quantified using the average correlation between true breeding value and (genomic) estimated breeding values from 10 replications validated in Generation 5, comprising 8000 individuals with genotypes but no phenotypes.
Bias was quantified using the average regression coefficients of true breeding value on (genomic) estimated breeding values from 10 replications.
The true QTL effects were sampled from normal distributions with mean μ_α = 0 or μ_α = 0.2 and scaled to simulate a trait with a heritability of 0.5.
J: includes a covariate for μ_g in the model, C: entire matrix of imputed and observed genotype covariates centered, JC: both J and C, N: neither J nor C, and PBLUP: pedigree-based BLUP. Covariates were fitted for both 50 QTL and 150 markers.
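The two summary statistics named in these notes, the correlation between TBV and (G)EBV and the regression coefficient of TBV on (G)EBV, can be computed as in the sketch below. The function name `accuracy_and_bias` and the toy values are hypothetical. In the reported tables, each figure is an average over 10 replications rather than a single calculation.

```python
from statistics import fmean

def accuracy_and_bias(tbv, gebv):
    """Return (correlation in %, slope of the regression of TBV on GEBV)."""
    mean_t, mean_g = fmean(tbv), fmean(gebv)
    s_tg = sum((t - mean_t) * (g - mean_g) for t, g in zip(tbv, gebv))
    s_tt = sum((t - mean_t) ** 2 for t in tbv)
    s_gg = sum((g - mean_g) ** 2 for g in gebv)
    correlation = 100.0 * s_tg / (s_tt * s_gg) ** 0.5
    # TBV is the response and GEBV the predictor, so divide by var(GEBV).
    slope = s_tg / s_gg
    return correlation, slope

# Hypothetical values: every GEBV is exactly half of the matching TBV.
tbv = [1.0, 2.0, 3.0, 4.0]
gebv = [0.5, 1.0, 1.5, 2.0]
correlation, slope = accuracy_and_bias(tbv, gebv)
print(correlation, slope)  # 100.0 2.0
```

In this toy case the ranking is perfect (correlation 100%) while the slope of 2 shows that the predictions are too compressed. A slope of 1 is the usual reference point for unbiased predictions.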
Accuracy and bias of genomic prediction for different numbers of QTL and alternative analyses
| Numbers of QTL | JC | J | C | N | PBLUP |
|---|---|---|---|---|---|
| Correlations (%) | | | | | |
| 50 QTL + 150 markers | 97.59 | 97.63 | 96.32 | 96.31 | 41.56 |
| 500 QTL + 1500 markers | 90.45 | 90.49 | 89.99 | 89.99 | 41.62 |
| Regression coefficient | | | | | |
| 50 QTL + 150 markers | 1.06 | 1.06 | 1.06 | 1.06 | 0.95 |
| 500 QTL + 1500 markers | 1.08 | 1.08 | 1.08 | 1.08 | 0.98 |
Accuracy was quantified using the average correlation between true breeding value and (genomic) estimated breeding values from 10 replications validated in Generation 5, comprising 8,000 individuals with genotypes but no phenotypes.
Bias was quantified using the average regression coefficients of true breeding value on (genomic) estimated breeding values from 10 replications.
The true effects for 50 or 500 QTL were sampled from a normal distribution with mean μ_α = 0.2 and scaled to simulate a trait with a heritability of 0.5.
J: includes a covariate for μ_g in the model, C: entire matrix of imputed and observed genotype covariates centered, JC: both J and C, N: neither J nor C, and PBLUP: pedigree-based BLUP. Covariates were fitted for either 50 QTL and 150 markers or 500 QTL and 1500 markers.
Accuracy and bias of genomic prediction when centering all genotypes or only observed genotypes
| Analyses | Correlations (%) | Regression Coefficient |
|---|---|---|
| JC | 82.08 | 0.95 |
| J | 82.08 | 0.95 |
| C | 82.05 | 0.95 |
| N | 82.05 | 0.95 |
| JC* | 82.09 | 0.95 |
| C* | 65.35 | 1.16 |
| PBLUP | 42.66 | 0.97 |
Accuracy was quantified using the average correlation between true breeding value and (genomic) estimated breeding values from 10 replications validated in Generation 5, comprising 8,000 individuals with genotypes but no phenotypes. The true QTL effects were sampled from a normal distribution with mean μ_α = 0 and scaled to simulate a trait with a heritability of 0.5.
Bias was quantified using the average regression coefficients of true breeding value on (genomic) estimated breeding values from 10 replications.
J: includes a covariate for μ_g in the model, C: entire matrix of imputed and observed genotype covariates centered, JC: both J and C, N: neither J nor C, C*: only observed genotype covariates centered, JC*: both J and C*, and PBLUP: pedigree-based BLUP. Covariates were fitted for 150 markers.