Abstract
BACKGROUND: Genomic selection has yet to be evaluated and optimized in many species. Mathematical modeling of selection schemes prior to their implementation is a classical and useful tool for that purpose. These models formalize a number of quantities, including the precision (reliability) of the estimated breeding value. To model genomic selection schemes, equations are needed that predict this reliability as a function of factors such as the size of the reference population, its diversity, its genetic distance from the group of genotyped selection candidates, the number of markers and the strength of linkage disequilibrium. This paper explores new approximations of this reliability.
Year: 2016 PMID: 26940536 PMCID: PMC4778372 DOI: 10.1186/s12711-016-0183-3
Source DB: PubMed Journal: Genet Sel Evol ISSN: 0999-193X Impact factor: 4.297
Coefficients describing the moments of the genotypes’ distributions when using the relation from Additional file 1
[Table body lost in extraction: six rows of expectations E[…] with their coefficients]
δ are the probabilities of the 15 classical identity states between two individuals [33–35]
The coefficients of the expectations of products of three or four genotypes (e.g. E[X X X X]) involve the IBD status between three or four different individuals and are explained in Additional file 1
Moments of the genotypes’ distributions depending on genotype codification
| Expectations | Genotype codification | |
|---|---|---|
| E[…] | 0 | 0 |
| E[…] | … | 1 |
| E[…] | … | 1/… |
| E[…] | 2 | 2 |
| E[…] | … | … |
| E[…] | … | … |
Expectation of the elements involved in the precision formulae when a uniform or a U shaped distribution of allelic frequencies is assumed
| Element | Expectation (uniform) | Expectation (U shaped) |
|---|---|---|
| E[…] | 1/3 | … |
| E[…] | 2/15 | … |
| E[…] | … | … |
| E[…] | … | … |
| E[…] | … | … |
A large effective size of the population was assumed to make 1/N negligible
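The element labels of this table did not survive, so the following is only a minimal sketch of how such expectations can be obtained under the uniform distribution. It assumes, as an illustration rather than as the paper's definition, that the elements of interest are the heterozygosity 2p(1 − p) and its square. The helper `beta_integral` is a hypothetical name. Under that assumption the two moments evaluate to 1/3 and 2/15, which are the two uniform-column values that survive in the table.

```python
from fractions import Fraction
from math import factorial


def beta_integral(a: int, b: int) -> Fraction:
    """Exact integral of p^a (1 - p)^b over [0, 1], equal to a! b! / (a + b + 1)!."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


# Assumed elements (the table labels were lost): heterozygosity 2p(1 - p)
# and its square, with the allele frequency p uniform on [0, 1].
mean_het = 2 * beta_integral(1, 1)     # E[2p(1 - p)]
mean_het_sq = 4 * beta_integral(2, 2)  # E[(2p(1 - p))^2]

print(mean_het, mean_het_sq)
```

Exact fractions are used instead of floating point so that the results can be compared directly with the tabulated rational values.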
Expectations of products of four allelic values received by two individuals at two loci depending on the IBD status and parental origins of the alleles
[Table body lost in extraction]
Only non-null terms are given
p_m and p_l are the frequencies of the most frequent alleles at loci m and l. … is the linkage disequilibrium measure between m and l
… is the allelic value the candidate received from its parent at locus l, etc.
… means that the c and i genes are IBD at both m and l, only at m, etc.
Fig. 1 Convergence of the Taylor series as a function of heritability and reference population size. a First approximation. b Second approximation
Performance of the first approximation for an unrelated reference population as a function of the number of markers and the reference population size, assuming ν² = 0.4
| Number of markers | Reference population size | True value | Approximation | Convergence criterion | Expected approximation |
|---|---|---|---|---|---|
| 50 | 500 | 0.92 | 2.14 | 2.74 | 2.05 |
| 50 | 1000 | 0.96 | 2.30 | 2.62 | 2.23 |
| 50 | 1500 | 0.96 | 2.31 | 2.57 | 2.28 |
| 50 | 2000 | 0.97 | 2.43 | 2.60 | 2.37 |
| 100 | 500 | 0.85 | 1.71 | 2.58 | 1.69 |
| 100 | 1000 | 0.90 | 2.01 | 2.49 | 2.05 |
| 100 | 1500 | 0.95 | 2.21 | 2.58 | 2.18 |
| 100 | 2000 | 0.96 | 2.31 | 2.60 | 2.26 |
| 250 | 500 | 0.67 | 1.09 | 2.53 | 1.09 |
| 250 | 1000 | 0.81 | 1.56 | 2.53 | 1.55 |
| 250 | 1500 | 0.85 | 1.72 | 2.54 | 1.72 |
| 250 | 2000 | 0.89 | 1.96 | 2.53 | 1.97 |
| 1000 | 500 | 0.32 | 0.39 | 0.61 | 0.38 |
| 1000 | 1000 | 0.52 | 0.72 | 2.45 | 0.73 |
| 1000 | 1500 | 0.64 | 0.99 | 2.51 | 0.99 |
| 1000 | 2000 | 0.71 | 1.18 | 2.51 | 1.18 |
| 1500 | 500 | 0.25 | 0.28 | 0.27 | 0.28 |
| 1500 | 1000 | 0.42 | 0.54 | 1.88 | 0.54 |
| 1500 | 1500 | 0.54 | 0.76 | 2.49 | 0.76 |
| 1500 | 2000 | 0.61 | 0.91 | 2.50 | 0.91 |
| 2000 | 500 | 0.20 | 0.22 | 0.20 | 0.22 |
| 2000 | 1000 | 0.35 | 0.43 | 0.95 | 0.43 |
| 2000 | 1500 | 0.46 | 0.61 | 2.28 | 0.61 |
| 2000 | 2000 | 0.54 | 0.76 | 2.48 | 0.76 |
| 2500 | 500 | 0.16 | 0.17 | 0.16 | 0.17 |
| 2500 | 1000 | 0.30 | 0.35 | 0.44 | 0.35 |
| 2500 | 1500 | 0.40 | 0.50 | 1.61 | 0.50 |
| 2500 | 2000 | 0.49 | 0.65 | 2.40 | 0.66 |
The convergence criterion is the value of the Taylor series at order 10
The last column is the expectation of the first approximation across the distribution of allele frequencies as given in Goddard [16]
Performance of the second approximation for an unrelated reference population as a function of the number of markers and the reference population size, assuming ν² = 0.4
| Number of markers | Reference population size | True value | Approximation | 10th order approximation | Expected approximation |
|---|---|---|---|---|---|
| 50 | 500 | 0.92 | 0.91 | 0.91 | 0.91 |
| 50 | 1000 | 0.96 | 0.95 | 0.94 | 0.95 |
| 50 | 1500 | 0.96 | 0.97 | 0.97 | 0.96 |
| 50 | 2000 | 0.97 | 0.97 | 0.97 | 0.97 |
| 100 | 500 | 0.85 | 0.83 | 0.82 | 0.83 |
| 100 | 1000 | 0.90 | 0.91 | 0.90 | 0.91 |
| 100 | 1500 | 0.95 | 0.94 | 0.94 | 0.94 |
| 100 | 2000 | 0.96 | 0.95 | 0.95 | 0.95 |
| 250 | 500 | 0.67 | 0.71 | 0.67 | 0.71 |
| 250 | 1000 | 0.81 | 0.83 | 0.81 | 0.82 |
| 250 | 1500 | 0.85 | 0.88 | 0.87 | 0.88 |
| 250 | 2000 | 0.89 | 0.90 | 0.90 | 0.90 |
| 1000 | 500 | 0.32 | 0.41 | 0.31 | 0.40 |
| 1000 | 1000 | 0.52 | 0.59 | 0.52 | 0.57 |
| 1000 | 1500 | 0.62 | 0.68 | 0.64 | 0.67 |
| 1000 | 2000 | 0.69 | 0.73 | 0.70 | 0.73 |
| 1500 | 500 | 0.24 | 0.32 | 0.23 | 0.31 |
| 1500 | 1000 | 0.42 | 0.50 | 0.42 | 0.48 |
| 1500 | 1500 | 0.52 | 0.60 | 0.53 | 0.59 |
| 1500 | 2000 | 0.60 | 0.66 | 0.61 | 0.67 |
| 2000 | 500 | 0.19 | 0.26 | 0.17 | 0.28 |
| 2000 | 1000 | 0.34 | 0.43 | 0.33 | 0.44 |
| 2000 | 1500 | 0.46 | 0.53 | 0.46 | 0.53 |
| 2000 | 2000 | 0.53 | 0.60 | 0.54 | 0.61 |
| 2500 | 500 | 0.16 | 0.22 | 0.14 | 0.23 |
| 2500 | 1000 | 0.30 | 0.38 | 0.28 | 0.38 |
| 2500 | 1500 | 0.40 | 0.48 | 0.39 | 0.47 |
| 2500 | 2000 | 0.47 | 0.55 | 0.48 | 0.56 |
The convergence criterion is the value of the Taylor series at order 10
The last column is the expectation of the second approximation across the distribution of allele frequencies as given in Goddard [16]
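To make the behavior of the second approximation easier to read off, the sketch below computes its absolute error against the true value for the rows of the table above with a reference population of 500. The values are copied from the table. The reading of the first column as the number of markers is an assumption based on the caption, since the column symbols were lost.

```python
# Rows copied from the table above (reference population size = 500):
# (number of markers, true value, approximation, 10th order approximation).
# Interpreting the first column as the number of markers is an assumption.
rows = [
    (50, 0.92, 0.91, 0.91),
    (100, 0.85, 0.83, 0.82),
    (250, 0.67, 0.71, 0.67),
    (1000, 0.32, 0.41, 0.31),
    (1500, 0.24, 0.32, 0.23),
    (2000, 0.19, 0.26, 0.17),
    (2500, 0.16, 0.22, 0.14),
]

# Absolute error of the approximation for each marker count.
approx_errors = [(markers, abs(approx - true)) for markers, true, approx, _ in rows]

# Row where the approximation is furthest from the true value.
worst_markers, worst_error = max(approx_errors, key=lambda item: item[1])

# Largest absolute error of the 10th order value over the same rows.
max_order10_error = max(abs(order10 - true) for _, true, _, order10 in rows)

for markers, error in approx_errors:
    print(f"markers={markers:5d}  |approximation - true| = {error:.2f}")
print(f"largest 10th order error = {max_order10_error:.2f}")
```

On these rows the approximation stays within a few hundredths of the true value for up to 250 markers and drifts further for larger marker counts, while the 10th order value remains closer throughout.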
Fig. 2 Largest eigenvalue of the noise matrix involved in the Taylor expansion of the phenotypic variance matrix as a function of heritability, reference population size and number of markers. a First approximation. b Second approximation
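The role of the largest eigenvalue can be illustrated with a generic matrix series. This is a sketch under stated assumptions, not the paper's own expansion. For a matrix written as I + E, the series I − E + E² − … converges to the inverse only when the largest absolute eigenvalue of E is below 1. The 2 × 2 matrix with off-diagonal entries `size` used here is a hypothetical noise matrix whose eigenvalues are ±`size`. All function names are illustrative.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def truncated_inverse(e, order):
    """Partial sum I - E + E^2 - ... up to the given order, a series for (I + E)^-1."""
    n = len(e)
    identity = [[float(i == j) for j in range(n)] for i in range(n)]
    minus_e = [[-x for x in row] for row in e]
    term, total = identity, identity
    for _ in range(order):
        term = matmul(term, minus_e)
        total = [[t + s for t, s in zip(row_t, row_s)]
                 for row_t, row_s in zip(total, term)]
    return total


def residual(e, order):
    """Largest absolute entry of (I + E) @ approx - I; zero for an exact inverse."""
    n = len(e)
    i_plus_e = [[e[i][j] + (i == j) for j in range(n)] for i in range(n)]
    prod = matmul(i_plus_e, truncated_inverse(e, order))
    return max(abs(prod[i][j] - (i == j)) for i in range(n) for j in range(n))


def off_diagonal_noise(size):
    """Hypothetical 2 x 2 noise matrix with eigenvalues +size and -size."""
    return [[0.0, size], [size, 0.0]]


for size in (0.5, 1.5):
    for order in (2, 10):
        print(size, order, residual(off_diagonal_noise(size), order))
```

With `size = 0.5` the residual shrinks as the order grows, whereas with `size = 1.5` it grows, which is the kind of divergence a largest eigenvalue above 1 would produce.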
Performance of the first approximation when the parents of the candidates belong to the reference population as a function of the number of markers and the reference population size, assuming ν² = 0.4
| Number of markers | Reference population size | True value | Approximation | 10th order approximation | Expected approximation |
|---|---|---|---|---|---|
| 1000 | 500 | 0.37 | 0.42 | 0.58 | 0.47 |
| 1000 | 1000 | 0.56 | 0.73 | 2.47 | 0.82 |
| 1000 | 1500 | 0.65 | 0.95 | 2.50 | 1.04 |
| 1000 | 2000 | 0.72 | 1.17 | 2.52 | 1.26 |
| 1500 | 500 | 0.31 | 0.34 | 0.33 | 0.37 |
| 1500 | 1000 | 0.46 | 0.56 | 1.87 | 0.63 |
| 1500 | 1500 | 0.56 | 0.73 | 2.44 | 0.81 |
| 1500 | 2000 | 0.62 | 0.87 | 2.50 | 0.96 |
| 2000 | 500 | 0.27 | 0.29 | 0.27 | 0.32 |
| 2000 | 1000 | 0.40 | 0.46 | 0.89 | 0.52 |
| 2000 | 1500 | 0.50 | 0.62 | 2.24 | 0.69 |
| 2000 | 2000 | 0.57 | 0.76 | 2.48 | 0.84 |
| 2500 | 500 | 0.24 | 0.25 | 0.24 | 0.27 |
| 2500 | 1000 | 0.36 | 0.40 | 0.49 | 0.45 |
| 2500 | 1500 | 0.46 | 0.55 | 1.79 | 0.61 |
| 2500 | 2000 | 0.52 | 0.67 | 2.35 | 0.74 |
The convergence criterion is the value of the Taylor series at order 10
The last column is the expectation of the first approximation across the distribution of allele frequencies as given in Goddard [16]
Performance of the second approximation when the parents of the candidates belong to the reference population as a function of the number of markers and the reference population size, assuming ν² = 0.4
| Number of markers | Reference population size | True value | Approximation | 10th order approximation | Expected approximation |
|---|---|---|---|---|---|
| 1000 | 500 | 0.37 | 0.46 | 0.35 | 0.46 |
| 1000 | 1000 | 0.53 | 0.60 | 0.54 | 0.61 |
| 1000 | 1500 | 0.64 | 0.70 | 0.65 | 0.69 |
| 1000 | 2000 | 0.71 | 0.75 | 0.72 | 0.75 |
| 1500 | 500 | 0.30 | 0.39 | 0.26 | 0.40 |
| 1500 | 1000 | 0.47 | 0.55 | 0.46 | 0.51 |
| 1500 | 1500 | 0.56 | 0.63 | 0.56 | 0.61 |
| 1500 | 2000 | 0.63 | 0.69 | 0.64 | 0.68 |
| 2000 | 500 | 0.27 | 0.36 | 0.22 | 0.35 |
| 2000 | 1000 | 0.40 | 0.49 | 0.38 | 0.48 |
| 2000 | 1500 | 0.50 | 0.58 | 0.50 | 0.56 |
| 2000 | 2000 | 0.57 | 0.64 | 0.57 | 0.62 |
| 2500 | 500 | 0.24 | 0.33 | 0.20 | 0.32 |
| 2500 | 1000 | 0.34 | 0.44 | 0.31 | 0.45 |
| 2500 | 1500 | 0.44 | 0.53 | 0.43 | 0.53 |
| 2500 | 2000 | 0.37 | 0.46 | 0.35 | 0.46 |
The convergence criterion is the value of the Taylor series at order 10
The last column is the expectation of the second approximation across the distribution of allele frequencies as given in Goddard [16]
Fig. 3 Example of approximated precision [from Eq. (3)] for various relations between the candidate and reference populations
Fig. 4 Number of equivalent markers [from Eq. (8)] as a function of the total number of markers and the reference population size
Fig. 5 Number of equivalent markers [from Eq. (8)] as a function of the effective population size (Ne) and heritability (ν²)
Fig. 6 Parameter of the beta distribution that best fits Goddard’s distribution of allele frequencies
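Expectation of the ratio of variances vs. the ratio of the variance expectations for different reference population sizes and numbers of markers (ν² = 0.4, 50 simulations)
| Reference population size | Number of markers | Expectation of the ratio | Ratio of the expectations |
|---|---|---|---|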
| 500 | 1000 | 0.403 | 0.401 |
| 1000 | 1000 | 0.726 | 0.725 |
| 1500 | 1000 | 1.010 | 1.008 |
| 2000 | 1000 | 1.212 | 1.212 |
| 500 | 1500 | 0.270 | 0.269 |
| 1000 | 1500 | 0.535 | 0.534 |
| 1500 | 1500 | 0.753 | 0.753 |
| 2000 | 1500 | 0.944 | 0.944 |
| 500 | 2000 | 0.213 | 0.213 |
| 1000 | 2000 | 0.414 | 0.413 |
| 1500 | 2000 | 0.597 | 0.597 |
| 2000 | 2000 | 0.760 | 0.759 |
| 500 | 2500 | 0.175 | 0.175 |
| 1000 | 2500 | 0.349 | 0.348 |
| 1500 | 2500 | 0.515 | 0.514 |
| 2000 | 2500 | 0.670 | 0.669 |
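How closely the two quantities agree can be checked directly from the table. The sketch below copies the last two columns of every row and reports the largest absolute and relative differences. Which of the two columns holds the expectation of the ratio is an assumption, since the column headers were lost. The absolute difference does not depend on that choice.

```python
# Last two columns of the table above, one pair per row, in table order.
# Which column is the expectation of the ratio is an assumption.
pairs = [
    (0.403, 0.401), (0.726, 0.725), (1.010, 1.008), (1.212, 1.212),
    (0.270, 0.269), (0.535, 0.534), (0.753, 0.753), (0.944, 0.944),
    (0.213, 0.213), (0.414, 0.413), (0.597, 0.597), (0.760, 0.759),
    (0.175, 0.175), (0.349, 0.348), (0.515, 0.514), (0.670, 0.669),
]

# Largest absolute gap between the two columns.
max_abs_diff = max(abs(first - second) for first, second in pairs)

# Largest gap relative to the second column.
max_rel_diff = max(abs(first - second) / second for first, second in pairs)

print(f"largest absolute difference: {max_abs_diff:.3f}")
print(f"largest relative difference: {100 * max_rel_diff:.2f}%")
```

On the tabulated values the two columns never differ by more than 0.002, so replacing the expectation of a ratio by the ratio of expectations changes little in these simulations.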