Ashley S Ling, El Hamidi Hay, Samuel E Aggrey, Romdhane Rekaya.
Abstract
BACKGROUND: Use of genomic information has resulted in an undeniable improvement in prediction accuracies and an increase in genetic gain in animal and plant genetic selection programs in spite of oversimplified assumptions about the true biological processes. Even for complex traits, a large portion of markers do not segregate with or effectively track genomic regions contributing to trait variation; yet it is not clear how genomic prediction accuracies are impacted by such potentially nonrelevant markers. In this study, a simulation was carried out to evaluate genomic predictions in the presence of markers unlinked with trait-relevant QTL. Further, we compared the ability of the population statistic FST and the absolute estimated marker effect as preselection statistics to discriminate between linked and unlinked markers, and the corresponding impact on accuracy.
Keywords: Accuracy; FST scores; Genomic prediction; Marker preselection
Year: 2021 PMID: 34380418 PMCID: PMC8356450 DOI: 10.1186/s12863-021-00979-y
Source DB: PubMed Journal: BMC Genom Data ISSN: 2730-6844
Fig. 1 A general description of the simulation and workflow. a A 30-chromosome genome was simulated with 200 QTL randomly distributed across 2 chromosomes; the remaining 28 chromosomes harbored no QTL. b A schematic representation of the pedigree simulation (7 generations of 3.5 k individuals each). The first six generations (21 k phenotyped individuals, half of them genotyped) were used for training. The last generation, consisting of 3.5 k genotyped and non-phenotyped individuals, was used as the validation set. Preselection of SNPs was based either on the absolute estimated marker effects or on FST scores calculated using data from the training population
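The preselection step described in the caption can be sketched as a top-k ranking of markers by a score. This is a minimal illustration, not the authors' implementation: the function name `preselect_top_k` and the toy score values are hypothetical, and the sketch assumes that per-marker FST scores and estimated effects have already been computed from the training population.

```python
def preselect_top_k(scores, k):
    """Return the sorted indices of the k markers with the largest scores."""
    if k < 0 or k > len(scores):
        raise ValueError("k must be between 0 and the number of markers")
    # Rank marker indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical per-marker statistics (illustrative values only).
fst_scores = [0.02, 0.31, 0.05, 0.27, 0.01, 0.12]
estimated_effects = [-0.40, 0.05, 0.22, -0.31, 0.02, 0.01]

# FST-based preselection ranks on the score itself.
by_fst = preselect_top_k(fst_scores, 3)

# Effect-based preselection ranks on the absolute value of the effect.
by_effect = preselect_top_k([abs(b) for b in estimated_effects], 3)

print(by_fst, by_effect)
```

The two criteria can select different subsets from the same marker panel, which is the comparison the tables that follow report.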
Accuracy of genomic predictions under varying numbers of random-, FST-, or estimated effect-based preselected markers (columns: number of preselected SNPs, in thousands)
| Selection method (a) | 1 | 10 | 20 | 30 | 40 | 50 |
|---|---|---|---|---|---|---|
| Random | 0.27 | 0.51 | 0.57 | 0.59 | 0.59 | 0.60 |
| FST | 0.81 | 0.85 | 0.83 | 0.81 | 0.80 | 0.79 |
| Effect | 0.84 | 0.78 | 0.72 | 0.69 | 0.68 | 0.67 |
a SNPs were preselected either randomly, based on their FST scores, or based on the absolute value of their estimated effect
Overlap (%) between random-, FST-, or effect-preselected marker subsets and G2 SNPs (columns: number of preselected SNPs, in thousands)
| Selection method (a) | 1 | 10 | 20 | 30 | 40 | 50 |
|---|---|---|---|---|---|---|
| Random | 6.75 | 6.65 | 6.63 | 6.64 | 6.63 | 6.69 |
| FST | 99.99 | 67.04 | 47.19 | 37.84 | 32.26 | 28.45 |
| Effect | 100.00 | 76.07 | 54.13 | 43.13 | 36.42 | 31.89 |
a SNPs were preselected either randomly, based on their FST scores, or based on the absolute value of their estimated effect
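The overlap statistic can be read as the share of a preselected subset that also belongs to a reference set. The sketch below assumes the denominator is the size of the preselected subset (consistent with the near-100% values at 1 k SNPs) and that G2 denotes the SNPs on the two QTL-harboring chromosomes, which this record does not define explicitly. The function name and marker indices are hypothetical.

```python
def overlap_percent(selected, reference):
    """Percent of the selected markers that also appear in the reference set."""
    selected = set(selected)
    reference = set(reference)
    if not selected:
        raise ValueError("selected subset must not be empty")
    return 100.0 * len(selected & reference) / len(selected)


# Hypothetical toy panel: markers 0..19 play the role of the G2 SNPs.
g2_snps = range(0, 20)
selected = [0, 1, 2, 3, 7, 50, 120, 299]

toy_overlap = overlap_percent(selected, g2_snps)

# Assumption: if markers were spread evenly over the 30 chromosomes, a random
# subset would be expected to overlap the 2 QTL chromosomes by 2/30.
expected_random = 100.0 * 2 / 30

print(toy_overlap, round(expected_random, 2))
```

Under that even-spread assumption, the expected random overlap is about 6.67%, in line with the Random row of the table.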
Proportion of total GV (a) explained by random-, effect-, and FST-preselected markers (columns: number of preselected SNPs, in thousands)
| Selection method (b) | 1 | 10 | 20 | 30 | 40 | 50 |
|---|---|---|---|---|---|---|
| Random | 0.0041 | 0.018 | 0.082 | 0.11 | 0.10 | 0.14 |
| FST | 0.31 | 0.38 | 0.39 | 0.39 | 0.39 | 0.40 |
| Effect | 0.33 | 0.40 | 0.41 | 0.41 | 0.41 | 0.41 |
a GV = genetic variance; b SNPs were preselected either randomly, based on their FST scores, or based on the absolute value of their estimated effect
Correlations between centered and non-centered genomic relationships with QTL relationships for different sets of markers (a)
| | All | HQ2 | LQ28 |
|---|---|---|---|
| Non-Centered | 0.345399 | 0.631371 | 0.284554 |
| Centered | 0.159684 | 0.578165 | 0.0017988 |
| Relative decrease (proportion) | 0.537721 | 0.084473 | 0.993687 |
a All = all markers; HQ2 = markers on the two chromosomes harboring the QTL; LQ28 = markers on the 28 chromosomes lacking QTL
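The relative decrease row can be checked against the two correlation rows. The sketch below assumes the definition (non-centered minus centered) divided by non-centered, which is not stated in this record but agrees with the tabulated values to about three decimals. It also indicates that the row is expressed as a proportion, so the decrease for all markers is roughly 54%.

```python
def relative_decrease(non_centered, centered):
    """Fractional drop in correlation when moving from non-centered to centered."""
    return (non_centered - centered) / non_centered


# (non-centered, centered) correlations copied from the table above.
correlations = {
    "All": (0.345399, 0.159684),
    "HQ2": (0.631371, 0.578165),
    "LQ28": (0.284554, 0.0017988),
}

decreases = {
    name: relative_decrease(nc, c) for name, (nc, c) in correlations.items()
}

for name, value in decreases.items():
    print(f"{name}: {value:.3f} ({100 * value:.1f}%)")
```

Small differences from the reported row are expected, since the tabulated correlations are rounded and may be averages over replicates.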
Fig. 2 Characterization of the modelling of QTL Mendelian Sampling (MS) using all, HQ2, and LQ28 markers. a The distribution of marker-estimated MS for relationships among training individuals, with sign reflecting whether marker-estimated and QTL MS fall in the same (+) or opposite (−) direction relative to the expected additive relationship. b The proportion of relationships among training individuals for which marker-estimated and QTL MS fall in the same direction relative to expected additive relationships
Correlations between non-centered and centered genomic and QTL relationships for varying numbers of FST-preselected markers (columns: number of preselected SNPs, in thousands)
| | 1 | 10 | 20 | 30 | 40 | 50 |
|---|---|---|---|---|---|---|
| Non-Centered | 0.339069 | 0.542457 | 0.54376 | 0.527988 | 0.511761 | 0.49678 |
| Centered | 0.315198 | 0.477285 | 0.469059 | 0.451059 | 0.433872 | 0.417522 |
| Relative decrease (proportion) | 0.0728319 | 0.121576 | 0.138934 | 0.147398 | 0.153974 | 0.161264 |
Correlations between non-centered and centered genomic and QTL relationships for varying numbers of estimated effect-preselected markers (columns: number of preselected SNPs, in thousands)
| | 1 | 10 | 20 | 30 | 40 | 50 |
|---|---|---|---|---|---|---|
| Non-Centered | 0.378834 | 0.607054 | 0.602191 | 0.576295 | 0.550798 | 0.529322 |
| Centered | 0.351288 | 0.550309 | 0.548288 | 0.528462 | 0.506049 | 0.48496 |
| Relative decrease (proportion) | 0.0739304 | 0.093814 | 0.0899005 | 0.08346 | 0.0818111 | 0.0844371 |
Fig. 3 Characterization of the modelling of QTL Mendelian Sampling (MS) based on FST- and estimated effect-preselected markers. a The proportion of relationships among training individuals for which marker-estimated and QTL MS fall in the same direction relative to expected additive relationships. b and c The distribution of marker-estimated MS for relationships among training individuals, with sign reflecting whether marker-estimated and QTL MS fall in the same (+) or opposite (−) direction relative to the expected additive relationship
Fig. 4 Errors in the estimation of QTL Mendelian Sampling. Distribution of error terms (%) in the estimation of genomic relationships (Eq. 3) for a FST- and b estimated effect-preselected marker subsets
Fig. 5 Regression of FST scores on the absolute estimated effect for a HQ2 and b LQ28 markers. The blue and yellow dashed lines denote the thresholds for selection of the top 10 k markers among all markers for FST and absolute estimated effects, respectively
Accuracy after exclusion of different subsets of LQ28 markers from construction of the genomic relationship matrix (columns: excluded markers (b))
| Exclusion criteria (a) | None | Top 50 k | Bottom 50 k |
|---|---|---|---|
| Effects | 0.62 | 0.68 | 0.62 |
| FST Scores | 0.62 | 0.65 | 0.63 |
a Markers were excluded from the LQ28 subset based either on their FST scores or on their estimated effects; b all markers were included (None), the top 50 k markers were excluded (Top 50 k), or the bottom 50 k markers were excluded (Bottom 50 k)
Accuracy and proportion of genetic variance (GV) explained by FST- and effect-preselected subsets under a simulation design with 200 QTL distributed across all 30 chromosomes (columns: number of preselected SNPs, in thousands)
| Measure | Selection method | 1 | 10 | 20 | 30 | 40 | 50 |
|---|---|---|---|---|---|---|---|
| Accuracy | FST | 0.66 | 0.73 | 0.73 | 0.72 | 0.71 | 0.71 |
| Accuracy | Effect | 0.73 | 0.70 | 0.67 | 0.66 | 0.65 | 0.64 |
| GV explained (proportion) | FST | 0.21 | 0.29 | 0.31 | 0.32 | 0.32 | 0.33 |
| GV explained (proportion) | Effect | 0.21 | 0.34 | 0.35 | 0.35 | 0.35 | 0.36 |