Andre Garcia, Ignacio Aguilar, Andres Legarra, Shogo Tsuruta, Ignacy Misztal, Daniela Lourenco.
Abstract
BACKGROUND: Although single-step GBLUP (ssGBLUP) is an animal model, SNP effects can be backsolved from genomic estimated breeding values (GEBV). Predicted SNP effects make it possible to compute an indirect prediction (IP) for each individual as the sum of the SNP effects multiplied by the individual's gene content. This is helpful when the number of genotyped animals is large, for genotyped animals not in the official evaluations, and when interim evaluations are needed. Typically, IP are obtained for new batches of genotyped individuals, all of them young and without phenotypes. Individual (theoretical) accuracies for IP are rarely reported, but they are nevertheless of interest. Our first objective was to present equations to compute the individual accuracy of IP based on the prediction error covariance (PEC) of SNP effects, which in turn is obtained from the PEC of GEBV in ssGBLUP. The second objective was to test the algorithm for proven and young (APY) in PEC computations. With large datasets, it is impossible to handle the full PEC matrix; thus, the third objective was to examine the minimum number of genotyped animals needed in PEC computations to achieve IP accuracies that are equivalent to GEBV accuracies.
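The IP and its accuracy described in the abstract can be illustrated with a minimal sketch. It assumes that the PEC matrix of SNP effects is already available, that the gene content matrix `Z` is coded the same way as in the evaluation that produced the SNP effects, and that accuracy follows the common definition sqrt(1 - PEV / sigma2_u) with inbreeding ignored. The function names and toy numbers are hypothetical, and the exact equations of the paper are not reproduced here.

```python
import numpy as np

def indirect_prediction(Z, snp_effects):
    """IP per animal: sum over SNPs of gene content times SNP effect."""
    return Z @ snp_effects

def ip_accuracy(Z, pec_snp, sigma2_u):
    """Accuracy of IP from the PEC matrix of SNP effects.

    Assumption: PEV_i = z_i' PEC z_i and acc_i = sqrt(1 - PEV_i / sigma2_u),
    with inbreeding ignored. Reliabilities are clipped to [0, 1].
    """
    # Quadratic form z_i' PEC z_i for every row i of Z
    pev = np.einsum("ij,jk,ik->i", Z, pec_snp, Z)
    reliability = np.clip(1.0 - pev / sigma2_u, 0.0, 1.0)
    return np.sqrt(reliability)

# Toy data (hypothetical): 2 animals, 2 SNPs
Z_toy = np.array([[1.0, 0.0],
                  [0.0, 2.0]])
snp_toy = np.array([0.5, -0.25])
pec_toy = np.diag([0.01, 0.02])

ip_toy = indirect_prediction(Z_toy, snp_toy)
acc_toy = ip_accuracy(Z_toy, pec_toy, sigma2_u=1.0)
```

Only the quadratic form for each new animal is needed, so the cost grows with the number of SNPs rather than with the number of genotyped animals in the evaluation.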
Year: 2022 PMID: 36162979 PMCID: PMC9513904 DOI: 10.1186/s12711-022-00752-4
Source DB: PubMed Journal: Genet Sel Evol ISSN: 0999-193X Impact factor: 5.100
Number of animals with genotypes, phenotypes, and pedigree information in each scenario
| Scenario | Genotypes | Phenotypes | Pedigree |
|---|---|---|---|
| direct | 54,533 | 38,000 | 230,639 |
| apy | 54,533 | 38,000 | 230,639 |
| 50K-2K | 2,000-50,000 | 38,000 | 230,639 |
| core (15K) | 15,000 | 38,000 | 230,639 |
| hacc (15K) | 15,000 | 38,000 | 230,639 |
| core_prog (15K) | 15,000 | 22,625 | 101,837 |
| hacc_prog (15K) | 15,000 | 32,673 | 106,051 |
direct: all genotyped animals (N = 54,533) and phenotypes with direct G⁻¹; apy: all genotyped animals (N = 54,533) and phenotypes with APY G⁻¹; 50K-2K: all phenotypes and decreasing the number of genotyped animals from 50K to 2K; core: genotypes for core animals only (N = 15K) and all phenotypes; hacc: genotypes for high accuracy animals only (N = 15K) and all phenotypes; core_prog: genotypes and phenotypes for core animals plus their progeny phenotypes; hacc_prog: genotypes and phenotypes for high accuracy animals plus their progeny phenotypes
Fig. 1 Correlation between GEBV accuracy and IP accuracy.
Fig. 2 Regression coefficient (b1) of the regression between GEBV accuracy and IP accuracy.
Fig. 3 Intercept (b0) of the regression between GEBV accuracy and IP accuracy.
Descriptive statistics for GEBV accuracy and IP accuracy for all scenarios and datasets
| Scenario | Average | Min | Max | Standard deviation | ABS difference (a), average | ABS difference (a), max |
|---|---|---|---|---|---|---|
| GEBV acc | 0.73 | 0.27 | 0.82 | 0.03 | NA | NA |
| direct | 0.73 | 0.28 | 0.82 | 0.03 | 0.00 | 0.02 |
| apy | 0.74 | 0.28 | 0.82 | 0.03 | 0.01 | 0.03 |
| 50K | 0.73 | 0.26 | 0.82 | 0.03 | 0.00 | 0.02 |
| 40K | 0.71 | 0.21 | 0.80 | 0.03 | 0.02 | 0.08 |
| 30K | 0.68 | 0.10 | 0.79 | 0.04 | 0.05 | 0.20 |
| 20K | 0.64 | 0.00 | 0.76 | 0.04 | 0.09 | 0.34 |
| 10K | 0.57 | 0.00 | 0.71 | 0.05 | 0.16 | 0.41 |
| 5K | 0.50 | 0.00 | 0.67 | 0.05 | 0.23 | 0.48 |
| 2K | 0.41 | 0.00 | 0.62 | 0.06 | 0.32 | 0.65 |
| core (15K) | 0.61 | 0.00 | 0.74 | 0.04 | 0.12 | 0.34 |
| hacc (15K) | 0.62 | 0.00 | 0.76 | 0.05 | 0.11 | 0.34 |
| core_prog (15K) | 0.57 | 0.00 | 0.70 | 0.04 | 0.16 | 0.41 |
| hacc_prog (15K) | 0.61 | 0.00 | 0.75 | 0.05 | 0.13 | 0.34 |
(a) ABS difference: absolute difference between GEBV accuracy and IP accuracy
Correlation and regression coefficients for GEBV accuracy and IP accuracy for the direct scenario with different blending proportions
| Scenario | Blending % | Correlation | b0 | b1 |
|---|---|---|---|---|
| direct | 5 | 1.00 | −0.01 | 1.00 |
| direct_10 | 10 | 1.00 | −0.01 | 0.98 |
| direct_20 | 20 | 1.00 | −0.01 | 0.92 |
| direct_30 | 30 | 1.00 | −0.01 | 0.86 |
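The comparison statistics reported in these tables (correlation, intercept b0, slope b1, and average and maximum absolute difference) can be computed along the following lines. This is a hedged sketch, not the authors' script: the function name and the toy accuracies are hypothetical, and regressing GEBV accuracy on IP accuracy is an assumed direction.

```python
import numpy as np

def compare_accuracies(acc_gebv, acc_ip):
    """Summary statistics comparing two vectors of individual accuracies.

    Assumption: acc_gebv is regressed on acc_ip (y = b0 + b1 * x).
    """
    x = np.asarray(acc_ip, dtype=float)
    y = np.asarray(acc_gebv, dtype=float)
    corr = np.corrcoef(x, y)[0, 1]
    # np.polyfit with degree 1 returns [slope, intercept]
    b1, b0 = np.polyfit(x, y, 1)
    abs_diff = np.abs(y - x)
    return {
        "correlation": corr,
        "b0": b0,
        "b1": b1,
        "abs_diff_mean": abs_diff.mean(),
        "abs_diff_max": abs_diff.max(),
    }

# Toy accuracies (hypothetical): IP accuracy is uniformly 0.02 lower
acc_ip_toy = [0.50, 0.60, 0.70, 0.80]
acc_gebv_toy = [0.52, 0.62, 0.72, 0.82]
stats_toy = compare_accuracies(acc_gebv_toy, acc_ip_toy)
```

A correlation near 1 with b1 below 1, as in the rows with higher blending proportions, indicates that the two accuracies rank animals identically but differ in scale.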
Peak memory requirements for each scenario
| Scenario | BLUPF90 peak memory (GB) (a) | POSTGSF90 peak memory (GB) (a) | PREDF90 peak memory (GB) (a) |
|---|---|---|---|
| direct | 195.60 | 228 | 11.6 |
| apy | 208.50 | 238 | 11.6 |
| 50K | 182.75 | 211 | 11.6 |
| 40K | 103.60 | 157 | 11.6 |
| 30K | 69.40 | 113 | 11.6 |
| 20K | 26.69 | 78 | 11.6 |
| 10K | 5.65 | 52 | 11.6 |
| 5K | 2.50 | 42 | 11.6 |
| 2K | 0.63 | 38 | 11.6 |
| core (15K) | 18.50 | 63 | 11.6 |
| hacc (15K) | 17.90 | 63 | 11.6 |
| core_prog (15K) | 17.90 | 63 | 11.6 |
| hacc_prog (15K) | 18.20 | 63 | 11.6 |
(a) Real/resident memory (RSS); Linux server (x86_64) equipped with Intel Xeon E5-2470 2.30 GHz processors with 16 cores