Melanie Hess, Tom Druet, Andrew Hess, Dorian Garrick.
Abstract
BACKGROUND: Fitting covariates representing the number of haplotype alleles rather than single nucleotide polymorphism (SNP) alleles may increase genomic prediction accuracy if linkage disequilibrium between quantitative trait loci and SNPs is inadequate. The objectives of this study were to evaluate the accuracy, bias and computation time of Bayesian genomic prediction methods that fit fixed-length haplotypes or SNPs.

Genotypes at 37,740 SNPs that were common to Illumina BovineSNP50 and high-density panels were phased for ~58,000 New Zealand dairy cattle. Females born before 1 June 2008 were used for training, and genomic predictions for milk fat yield (n = 24,823), liveweight (n = 13,283) and somatic cell score (n = 24,864) were validated within breed (predominantly Holstein-Friesian, predominantly Jersey, or admixed KiwiCross) in later-born females. Covariates for haplotype alleles of five lengths (125, 250, 500 kb, 1 or 2 Mb) were generated and rare haplotypes were removed at four thresholds (1, 2.5, 5 or 10%), resulting in 20 scenarios tested. Genomic predictions fitting covariates for either SNPs or haplotypes were calculated by using BayesA, BayesB or BayesN.

This is the first study to quantify the accuracy of genomic prediction using haplotypes across the whole genome in an admixed population.
Year: 2017 PMID: 28673233 PMCID: PMC5494768 DOI: 10.1186/s12711-017-0329-y
Source DB: PubMed Journal: Genet Sel Evol ISSN: 0999-193X Impact factor: 4.297
Numbers of records in training and validation sets used for genomic prediction
| Breed (a) | Fat (b), training | Fat (b), validation | Lwt (b), training | Lwt (b), validation | SCS (b), training | SCS (b), validation |
|---|---|---|---|---|---|---|
| HF | 9072 | 3354 | 3908 | 1464 | 9094 | 3358 |
| J | 5067 | 5854 | 2667 | 2331 | 5071 | 5860 |
| KX | 10,684 | 6125 | 6708 | 2436 | 10,699 | 6140 |
| Total | 24,823 (c) | 15,333 | 13,283 (c) | 6231 | 24,864 (c) | 15,358 |

(a) HF = predominantly (>7/8) Holstein–Friesian; J = predominantly (>7/8) Jersey; KX = admixed KiwiCross
(b) Yield deviation: Fat = milk fat yield; Lwt = liveweight; SCS = somatic cell score
(c) Training was performed using pooled data across the three breed classes
Mean and maximum number of SNPs per haploblock length
| Haploblock length | Number of haploblocks | Mean number of SNPs per haploblock (a) | Maximum number of SNPs per haploblock (a) |
|---|---|---|---|
| 125 kb | 17,452 | 2 | 6 |
| 250 kb | 9676 | 4 | 10 |
| 500 kb | 4978 | 8 | 17 |
| 1 Mb | 2514 | 15 | 31 |
| 2 Mb | 1267 | 30 | 54 |

(a) The minimum number of SNPs in a haploblock was 1 for all haplotype lengths
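
As an illustration of how fixed-length haploblocks and haplotype-allele covariates might be built, here is a minimal Python sketch. It is not the authors' pipeline. The function names `assign_blocks` and `haplotype_covariates`, the 0/1 string encoding of phased haplotypes, and the choice to start blocks at position 0 are all assumptions made for this example. Rare alleles are dropped with a strict "greater than" frequency threshold, following the table footnotes that describe fitting only alleles above a given frequency.

```python
from collections import Counter, defaultdict


def assign_blocks(positions_bp, block_len_bp):
    """Group SNP indices into fixed-length blocks (assumed to start at position 0)."""
    blocks = defaultdict(list)
    for idx, pos in enumerate(positions_bp):
        blocks[pos // block_len_bp].append(idx)
    return dict(blocks)


def haplotype_covariates(haplotypes, snp_idx, min_freq):
    """Count copies (0, 1 or 2) of each haplotype allele per animal within one block.

    haplotypes: list of (hap1, hap2) per animal, each a string of '0'/'1' SNP alleles.
    snp_idx:    indices of the SNPs that fall in the block.
    min_freq:   haplotype alleles with frequency <= min_freq are not fitted.
    """
    def allele(hap):
        return "".join(hap[i] for i in snp_idx)

    per_animal = [(allele(h1), allele(h2)) for h1, h2 in haplotypes]
    counts = Counter(a for pair in per_animal for a in pair)
    total = 2 * len(haplotypes)  # two haplotypes per animal
    kept = sorted(a for a, c in counts.items() if c / total > min_freq)
    # (a1 == a) + (a2 == a) adds two booleans, giving the copy number 0, 1 or 2
    return {a: [(a1 == a) + (a2 == a) for a1, a2 in per_animal] for a in kept}


# Toy data: five SNPs, four animals with phased haplotypes (hypothetical values)
positions = [10_000, 60_000, 130_000, 200_000, 260_000]
animals = [
    ("00101", "01101"),
    ("00110", "00101"),
    ("01101", "00101"),
    ("11010", "00101"),
]
blocks = assign_blocks(positions, 125_000)
covariates = haplotype_covariates(animals, blocks[0], min_freq=0.2)
```

In this toy example the first 125-kb block holds two SNPs and three haplotype alleles; the allele `11` occurs once in eight haplotypes (12.5%) and is therefore removed at a 20% threshold, leaving two covariates for that block.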
Fig. 1 Genomic prediction accuracy and bias of milk fat yield with varying haplotype lengths and frequencies
Computation time and number of random covariates in haplotype and SNP BayesA models
| Trait | Freq (a) | Covariates, 125 kb | Covariates, 250 kb | Covariates, 500 kb | Covariates, 1 Mb | Covariates, 2 Mb | Time (h) (b), 125 kb | Time (h) (b), 250 kb | Time (h) (b), 500 kb | Time (h) (b), 1 Mb | Time (h) (b), 2 Mb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Milk fat yield | SNP | 37,226 | 37,226 | 37,226 | 37,226 | 37,226 | 13.1 | 13.1 | 13.1 | 13.1 | 13.1 |
| | 1% | 56,590 | 64,724 | 70,380 | 56,534 | 32,520 | 22.8 | 23.5 | 24.7 | 20.0 | 11.3 |
| | 2.5% | 51,889 | 53,482 | 47,378 | 29,343 | 13,460 | 21.3 | 19.7 | 16.8 | 10.4 | 4.8 |
| | 5% | 46,283 | 41,737 | 28,324 | 12,291 | 3977 | 19.6 | 15.5 | 10.4 | 4.5 | 1.5 |
| | 10% | 37,848 | 27,656 | 12,790 | 3255 | 646 | 15.2 | 10.8 | 5.0 | 1.4 | 0.3 |
| Liveweight | SNP | 37,356 | 37,356 | 37,356 | 37,356 | 37,356 | 6.6 | 6.6 | 6.6 | 6.6 | 6.6 |
| | 1% | 56,595 | 64,634 | 70,218 | 56,164 | 32,117 | 11.0 | 13.1 | 13.3 | 9.9 | 5.7 |
| | 2.5% | 51,839 | 53,204 | 46,797 | 28,756 | 13,050 | 10.2 | 9.6 | 9.2 | 5.2 | 2.4 |
| | 5% | 46,163 | 41,467 | 28,040 | 12,198 | 4027 | 9.2 | 7.7 | 5.2 | 2.3 | 0.8 |
| | 10% | 37,775 | 27,604 | 12,882 | 3354 | 707 | 7.8 | 5.4 | 2.6 | 0.7 | 0.2 |
| Somatic cell score | SNP | 37,229 | 37,229 | 37,229 | 37,229 | 37,229 | 13.0 | 13.0 | 13.0 | 13.0 | 13.0 |
| | 1% | 56,630 | 64,730 | 70,375 | 56,521 | 32,516 | 21.4 | 24.4 | 27.2 | 19.7 | 11.1 |
| | 2.5% | 51,934 | 53,488 | 47,385 | 29,348 | 13,464 | 23.1 | 20.8 | 16.7 | 10.9 | 4.7 |
| | 5% | 46,326 | 41,746 | 28,329 | 12,296 | 3977 | 18.3 | 15.4 | 10.2 | 4.5 | 1.5 |
| | 10% | 37,898 | 27,663 | 12,793 | 3254 | 645 | 15.1 | 10.7 | 5.0 | 1.3 | 0.3 |

(a) Frequency threshold for removing rare haplotype alleles. SNP refers to fitting covariates for SNPs rather than haplotype alleles
(b) Computation time for running the analysis on the training set containing all breeds with a chain length of 41,000
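
The figures in this table suggest that computation time grows roughly in proportion to the number of random covariates. The short Python sketch below checks that reading for a few BayesA milk fat yield scenarios by computing hours per 10,000 covariates. The covariate counts and times are copied from the table; the helper name `hours_per_10k` and the choice of scenarios are assumptions made for illustration only.

```python
# (number of covariates, computation time in hours) for BayesA, milk fat yield,
# copied from the table above
scenarios = {
    "SNP": (37_226, 13.1),
    "250 kb, 1%": (64_724, 23.5),
    "125 kb, 10%": (37_848, 15.2),
    "2 Mb, 10%": (646, 0.3),
}


def hours_per_10k(n_covariates, hours):
    """Computation time normalised to 10,000 fitted covariates."""
    return hours / n_covariates * 10_000


rates = {name: hours_per_10k(n, h) for name, (n, h) in scenarios.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} h per 10,000 covariates")
```

For these four scenarios the normalised times fall between about 3.5 and 4.6 h per 10,000 covariates, even though the covariate counts differ by two orders of magnitude. Times in the table are reported to one decimal place, so the ratio for the smallest scenario is imprecise.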
Fig. 2 Genomic prediction accuracy of Bayesian SNP and haplotype models
Prediction bias (standard error) of SNP and haplotype models for BayesA, BayesB and BayesN analyses
| Trait (a) | Breed (b) | BayesA, SNP | BayesA, Hap125 (c) | BayesA, Hap250 (d) | BayesB (π = 0.5), SNP | BayesB (π = 0.5), Hap125 (c) | BayesB (π = 0.5), Hap250 (d) | BayesN (Π = 0.5; π = 0), SNP | BayesN (Π = 0.5; π = 0), Hap125 (c) | BayesN (Π = 0.5; π = 0), Hap250 (d) |
|---|---|---|---|---|---|---|---|---|---|---|
| Fat | HF | | 0.07 (0.05) | 0.06 (0.05) | 0.06 (0.05) | 0.08 (0.05) | 0.07 (0.05) | 0.09 (0.05) | 0.06 (0.05) | 0.05 (0.05) |
| | J | − | −0.16 (0.04) | −0.17 (0.04) | −0.15* (0.04) | −0.16 (0.04) | −0.16 (0.04) | −0.13* (0.04) | −0.18* (0.04) | −0.18 (0.04) |
| | KX | | 0.00 (0.04) | 0.03 (0.04) | 0.01 (0.04) | 0.01 (0.04) | 0.03 (0.04) | 0.03 (0.04) | −0.01 (0.04) | 0.01 (0.04) |
| Lwt | HF | − | −0.03 (0.06) | −0.01 (0.06) | −0.03 (0.06) | −0.03 (0.06) | −0.01 (0.06) | 0.00 (0.06) | −0.05 (0.06) | −0.03 (0.06) |
| | J | − | −0.21 (0.04) | −0.18* (0.05) | −0.20* (0.05) | −0.20 (0.04) | −0.19 (0.05) | −0.15* (0.05) | −0.21 (0.04) | −0.21 (0.04) |
| | KX | | −0.01 (0.04) | 0.02 (0.04) | 0.01 (0.04) | 0.00 (0.04) | 0.03 (0.04) | 0.06 (0.04) | −0.02 (0.04) | 0.01 (0.04) |
| SCS | HF | − | −0.04 (0.08) | −0.04 (0.08) | −0.05 (0.08) | −0.04 (0.08) | −0.05 (0.08) | −0.04 (0.08) | −0.08 (0.08) | −0.08 (0.08) |
| | J | − | −0.22 (0.07) | −0.18* (0.07) | −0.23 (0.07) | −0.22 (0.07) | −0.18* (0.07) | −0.22 (0.07) | −0.26* (0.07) | −0.21 (0.07) |
| | KX | − | −0.20 (0.07) | −0.17 (0.07) | −0.18 (0.07) | −0.20 (0.07) | −0.17 (0.07) | −0.18 (0.07) | −0.23* (0.06) | −0.20 (0.06) |

\* Significantly different bias than the BayesA SNP model (italics) for that breed and trait (P < 0.05)
(a) Trait: Fat = milk fat yield; Lwt = liveweight; SCS = somatic cell score
(b) Breed: HF = predominantly Holstein–Friesian; J = predominantly Jersey; KX = admixed KiwiCross (HF/J)
(c) Hap125 = haplotypes of length 125 kb, fitting only haplotype alleles >10% frequency in training data set
(d) Hap250 = haplotypes of length 250 kb, fitting only haplotype alleles >1% frequency in training data set
Number of random covariates (windows) and computation time for each model
| Model (a) | Covariates | Random effects (b), Fat (c) | Random effects (b), Lwt (c) | Random effects (b), SCS (c) | Time (h) (d), Fat (c) | Time (h) (d), Lwt (c) | Time (h) (d), SCS (c) |
|---|---|---|---|---|---|---|---|
| BayesA | SNP | 37,226 | 37,356 | 37,229 | 13.1 | 6.6 | 13.0 |
| | Hap125 (e) | 37,848 | 37,775 | 37,898 | 15.2 | 7.8 | 15.1 |
| | Hap250 (f) | 64,724 | 64,634 | 64,730 | 23.5 | 13.1 | 24.4 |
| BayesB | SNP | 18,589 | 18,637 | 18,629 | 10.0 | 5.1 | 9.9 |
| | Hap125 (e) | 18,899 | 18,831 | 18,954 | 13.6 | 6.2 | 13.9 |
| | Hap250 (f) | 32,332 | 32,273 | 32,388 | 18.1 | 9.2 | 18.0 |
| BayesN | SNP | 17,748 (4701) | 17,639 (4671) | 18,254 (4805) | 26.7 | 12.5 | 25.6 |
| | Hap125 (e) | 18,451 (8264) | 18,303 (8223) | 18,711 (8344) | 30.2 | 16.0 | 30.0 |
| | Hap250 (f) | 31,596 (4737) | 31,281 (4706) | 32,103 (4809) | 37.6 | 18.9 | 38.1 |

(a) SNP = SNP model with 250 kb windows
(b) Average number of SNPs or haplotype alleles fitted in each chain of the MCMC
(c) Fat = milk fat yield; Lwt = liveweight; SCS = somatic cell score
(d) Computation time for running the analysis on the training set containing all breeds with a chain length of 41,000
(e) Hap125 = haplotypes of length 125 kb, fitting only haplotype alleles >10% frequency in training data set
(f) Hap250 = haplotypes of length 250 kb, fitting only haplotype alleles >1% frequency in training data set
Rankings from the BayesA 250-kb haplotype model compared to the BayesA SNP model
| Trait | Breed | r_S (All) (a) | r_S (Top 100) (b) | Top 0.9% (c) |
|---|---|---|---|---|
| Fat | HF | 0.97 | 0.70 | 23/30 |
| | J | 0.97 | 0.68 | 41/53 |
| | KX | 0.96 | 0.55 | 36/55 |
| Lwt | HF | 0.96 | 0.57 | 10/13 |
| | J | 0.95 | 0.68 | 12/21 |
| | KX | 0.96 | 0.70 | 17/22 |
| SCS | HF | 0.96 | 0.58 | 21/30 |
| | J | 0.97 | 0.64 | 42/53 |
| | KX | 0.96 | 0.49 | 36/55 |

(a) Spearman rank correlation for all cows
(b) Spearman rank correlation for the joint set of cows that are in the top 100 cows for DGV from the SNP model or the top 100 cows for DGV from the haplotype model
(c) Number of animals with DGV in the top 0.9% for both the SNP model and haplotype model over the number of animals that are in the top 0.9% for the SNP model
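
The three ranking statistics defined in the footnotes can be sketched in plain Python as follows. This is an illustrative reconstruction from the footnote definitions, not the authors' code. The function names, the parameters `k_joint` and `k_overlap`, and the toy DGV values are assumptions; the Spearman correlation is computed as the Pearson correlation of average ranks.

```python
def ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


def top_set(dgv, k):
    """Indices of the k animals with the highest DGV."""
    return set(sorted(range(len(dgv)), key=lambda i: dgv[i], reverse=True)[:k])


def compare_rankings(dgv_snp, dgv_hap, k_joint, k_overlap):
    # Footnote (b): union of the top-k animals from either model
    joint = sorted(top_set(dgv_snp, k_joint) | top_set(dgv_hap, k_joint))
    # Footnote (c): animals in the top set of both models, over the SNP top set
    top_snp, top_hap = top_set(dgv_snp, k_overlap), top_set(dgv_hap, k_overlap)
    return {
        "rs_all": spearman(dgv_snp, dgv_hap),
        "rs_top": spearman([dgv_snp[i] for i in joint], [dgv_hap[i] for i in joint]),
        "overlap": (len(top_snp & top_hap), len(top_snp)),
    }


# Hypothetical DGV for six cows under the two models
dgv_snp = [5.0, 4.0, 3.0, 2.0, 1.0, 0.0]
dgv_hap = [4.5, 5.0, 2.0, 3.0, 1.0, 0.5]
result = compare_rankings(dgv_snp, dgv_hap, k_joint=3, k_overlap=3)
```

With real data, `k_joint` would be 100 and `k_overlap` would be the number of cows in the top 0.9% of the validation set for that breed. The toy example reproduces the pattern in the table: the correlation over all animals is high, while the correlation within the joint top set is noticeably lower.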