Elena Flavia Mouresan, Maria Selle, Lars Rönnegård.
Abstract
The increasing amount of biological information available on markers can be used to inform the models applied in genomic selection and so improve predictions. The objective of this study was to propose a general model for genomic selection that uses a link function approach within the hierarchical generalized linear model (hglm) framework and can include external information on the markers. These models can be fitted using the well-established hglm package in R. We also present an R package (CodataGS) to fit these models, which is significantly faster than the hglm package. Simulated data were used to validate the proposed model. We tested categorical, continuous and combination models where the external information on the markers was related to 1) the location of the QTL on the genome with varying degrees of uncertainty, 2) the relationship of the markers with the QTL, calculated as the LD between them, and 3) a combination of both. The proposed models showed improved accuracies from 3.8% up to 23.2% compared to the SNP-BLUP method in a simulated population derived from a base population with 100 individuals. Moreover, the proposed categorical model was tested on a dairy cattle dataset for two traits (Milk Yield and Fat Percentage). These results also showed improved accuracy compared to SNP-BLUP, especially for the Fat% trait. The performance of the proposed models depended on the genetic architecture of the trait, as traits that deviate from the infinitesimal model benefited more from the external information. The gain in accuracy also depended on the degree of uncertainty of the external information provided to the model. The usefulness of these types of models is expected to increase with time as more accurate information on the markers becomes available.
Keywords: BLUP; CodataGS; GenPred; Genomic Prediction; Shared Data Resources; external information; hglm
Year: 2019 PMID: 31467030 PMCID: PMC6778789 DOI: 10.1534/g3.119.400381
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
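The abstract describes the model only in words, so the following is a minimal sketch of one way to read "a link function approach" that includes external marker information. It assumes (this is not stated in the record) that each marker's effect variance is tied to an external covariate `x` through a log link, and that the link coefficients `gamma0` and `gamma1` and the residual variance are already known. The function names are hypothetical; neither hglm nor CodataGS is called, and the iterative estimation those packages perform is not reproduced.

```python
import numpy as np

def marker_variances(x, gamma0, gamma1):
    """Assumed log link: log(sigma2_j) = gamma0 + gamma1 * x_j."""
    return np.exp(gamma0 + gamma1 * np.asarray(x, dtype=float))

def weighted_snp_blup(Z, y, sigma2_e, sigma2_marker):
    """Marker effects with marker-specific shrinkage sigma2_e / sigma2_j.

    With equal marker variances this reduces to ordinary ridge
    regression, i.e. the usual SNP-BLUP solution.
    """
    Zc = Z - Z.mean(axis=0)          # centre genotypes
    yc = y - y.mean()                # centre phenotypes
    shrink = np.diag(sigma2_e / np.asarray(sigma2_marker, dtype=float))
    return np.linalg.solve(Zc.T @ Zc + shrink, Zc.T @ yc)

# Toy data: 40 individuals, 6 markers, the first two flagged by external information.
rng = np.random.default_rng(0)
Z = rng.binomial(2, 0.5, size=(40, 6)).astype(float)
y = rng.normal(size=40)
x = np.array([1, 1, 0, 0, 0, 0])
effects = weighted_snp_blup(Z, y, sigma2_e=1.0,
                            sigma2_marker=marker_variances(x, -2.0, 2.0))
```

Under these assumptions, flagged markers receive a larger variance and are therefore shrunk less than unflagged ones, which is how external information can change the fitted SNP effects.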
SUMMARY OF MODELS TESTED FOR EACH SCENARIO OF GENETIC ARCHITECTURE SIMULATED
| Sc0 | Sc1 | Sc2 | Sc3 | |
|---|---|---|---|---|
W10 = categorical model with window of 10 SNPs, W20 = categorical model with window of 20 SNPs, W40 = categorical model with window of 40 SNPs, LD = continuous model with LD estimates, W10-LD = combined model with window of 10 SNPs and LD estimates, W20-LD = combined model with window of 20 SNPs and LD estimates, W40-LD = combined model with window of 40 SNPs and LD estimates.
Sc0 = simulation scenario 0, Sc1 = simulation scenario 1, Sc2 = simulation scenario 2, Sc3 = simulation scenario 3.
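The two kinds of external information named in the legend (a SNP window around the QTL, and LD between marker and QTL) can be illustrated with a short sketch. The record does not say how the windows are positioned or which LD statistic is used, so this assumes a window centred on the QTL index and LD measured as the squared genotype correlation; both function names are hypothetical.

```python
import numpy as np

def window_category(n_snps, qtl_positions, window):
    """Flag (0/1) the SNPs inside a window of `window` SNPs centred on each QTL index."""
    cat = np.zeros(n_snps, dtype=int)
    half = window // 2
    for q in qtl_positions:
        lo = max(0, q - half)
        hi = min(n_snps, q + half)
        cat[lo:hi] = 1
    return cat

def ld_r2(genotypes, qtl_genotype):
    """Squared correlation between each SNP column and the QTL genotype vector."""
    g = np.asarray(genotypes, dtype=float)
    q = np.asarray(qtl_genotype, dtype=float)
    g = g - g.mean(axis=0)
    q = q - q.mean()
    cov = g.T @ q
    denom = np.sqrt((g ** 2).sum(axis=0) * (q ** 2).sum())
    # Monomorphic SNPs have zero variance; report r2 = 0 for them.
    r = np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)
    return r ** 2

# Categorical covariates for the three window sizes, one QTL at SNP index 50.
w10 = window_category(100, [50], 10)
w20 = window_category(100, [50], 20)
w40 = window_category(100, [50], 40)
```

A wider window flags more SNPs around the same QTL, which corresponds to greater uncertainty about the QTL location; a combined model would use a window flag and an LD value for each marker.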
Figure 1. Simulated QTL effects (black dots) and fitted SNP effects under SNP-BLUP and 7 alternative models (Categorical: W10, W20 and W40, Continuous: LD, Combination: W10-LD, W20-LD and W40-LD) for one simulation replicate under simulation scenario Sc0 with 10 QTL per chromosome underlying the trait.
ACCURACY AND BIAS OF THE PREDICTED EGBVS IN THE VALIDATION SET (GENERATION 2) FOR THE SCENARIO 0 (SC0) WITH 10 QTLS PER CHROMOSOME UNDERLYING THE TRAIT
| Models | Accuracy (r) | Bias (b) |
|---|---|---|
| SNP-BLUP | 0.586 (0.010) | 1.213 (0.089) |
| W10 | 0.670 (0.013) | 1.003 (0.044) |
| W20 | 0.656 (0.012) | 1.014 (0.048) |
| W40 | 0.635 (0.012) | 1.030 (0.045) |
| LD | 0.717 (0.011) | 1.024 (0.041) |
| W10-LD | 0.707 (0.013) | 1.050 (0.053) |
| W20-LD | 0.714 (0.013) | 1.044 (0.050) |
| W40-LD | 0.722 (0.013) | 1.028 (0.042) |
W10 = categorical model with window of 10 SNPs, W20 = categorical model with window of 20 SNPs, W40 = categorical model with window of 40 SNPs, LD = continuous model with LD estimates, W10-LD = combined model with window of 10 SNPs and LD estimates, W20-LD = combined model with window of 20 SNPs and LD estimates, W40-LD = combined model with window of 40 SNPs and LD estimates.
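The record does not define the accuracy (r) and bias (b) columns. A common convention, assumed here, is that accuracy is the correlation between true and predicted breeding values, and bias is the slope of the regression of true on predicted values (with b = 1 meaning no inflation or deflation). The sketch below follows that convention and also shows how a relative gain over a baseline accuracy can be computed; the function names are hypothetical.

```python
import numpy as np

def accuracy_and_bias(true_bv, predicted_bv):
    """Return (r, b): correlation, and slope of true on predicted values."""
    true_bv = np.asarray(true_bv, dtype=float)
    predicted_bv = np.asarray(predicted_bv, dtype=float)
    r = np.corrcoef(true_bv, predicted_bv)[0, 1]
    c = np.cov(true_bv, predicted_bv)      # 2 x 2 covariance matrix
    b = c[0, 1] / c[1, 1]                  # cov(true, pred) / var(pred)
    return r, b

def relative_gain(acc_model, acc_baseline):
    """Percentage improvement in accuracy over a baseline."""
    return 100.0 * (acc_model - acc_baseline) / acc_baseline

# Highest accuracy in the table (0.722) against the lowest (0.586).
gain = relative_gain(0.722, 0.586)
```

Here `gain` rounds to 23.2, which agrees with the upper end of the improvement range quoted in the abstract.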
Figure 2. Accuracies obtained under different cases of genetic architecture of the trait for SNP-BLUP and the alternative models.
Figure 3. Accuracies obtained from SNP-BLUP model and alternative models under all simulated scenarios and genetic architectures.
Figure 4. Time of execution (seconds per iteration) of SNP-BLUP and alternative models from hglm package and CodataGS package.
MEAN ACCURACY (STANDARD ERROR) OF THE PREDICTED EGBVS IN A 5-FOLD CROSS VALIDATION ANALYSIS USING THE GERMAN HOLSTEIN DATA FOR TWO TRAITS
| Models | MY | Fat% |
|---|---|---|
| SNP-BLUP | 0.771 (0.002) | 0.811 (0.004) |
| W41 | 0.785 (0.002) | 0.862 (0.003) |
W41 = categorical model with window of 40 SNPs around the top SNP for the trait detected in a GWAS. MY: Milk Yield, Fat%: Fat percentage.
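A 5-fold cross-validation of the kind summarized in the table can be sketched as follows. This is an illustration on simulated data only: SNP-BLUP is written as plain ridge regression with an assumed shrinkage parameter `lam`, accuracy is taken as the correlation between predictions and held-out phenotypes, and the standard error is computed across folds. None of these choices is confirmed by the record, and the function names are hypothetical.

```python
import numpy as np

def snp_blup(Z, y, lam):
    """Ridge-regression SNP effects with a common shrinkage parameter `lam`."""
    mu = y.mean()
    z_mean = Z.mean(axis=0)
    Zc = Z - z_mean
    p = Z.shape[1]
    effects = np.linalg.solve(Zc.T @ Zc + lam * np.eye(p), Zc.T @ (y - mu))
    return mu, z_mean, effects

def five_fold_accuracy(Z, y, lam, n_folds=5, seed=0):
    """Mean and standard error of the held-out prediction accuracy."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    accs = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        mu, z_mean, effects = snp_blup(Z[train], y[train], lam)
        pred = mu + (Z[test] - z_mean) @ effects
        accs.append(np.corrcoef(pred, y[test])[0, 1])
    accs = np.array(accs)
    return accs.mean(), accs.std(ddof=1) / np.sqrt(n_folds)

# Simulated genotypes (0/1/2) and phenotypes, for illustration only.
rng = np.random.default_rng(42)
Z = rng.binomial(2, 0.5, size=(200, 50)).astype(float)
y = Z @ rng.normal(size=50) + rng.normal(size=200)
mean_acc, se = five_fold_accuracy(Z, y, lam=1.0)
```

Each model in the comparison would be run over the same folds, so that differences in mean accuracy reflect the models and not the data split.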