Roel F Veerkamp, Aniek C Bouwman, Chris Schrooten, Mario P L Calus.
Abstract
BACKGROUND: Whole-genome sequence data is expected to capture genetic variation more completely than common genotyping panels. Our objective was to compare the proportion of variance explained and the accuracy of genomic prediction by using imputed sequence data or preselected SNPs from a genome-wide association study (GWAS) with imputed whole-genome sequence data.
Year: 2016 PMID: 27905878 PMCID: PMC5134274 DOI: 10.1186/s12711-016-0274-1
Source DB: PubMed Journal: Genet Sel Evol ISSN: 0999-193X Impact factor: 4.297
Number of variants in each of the subsets of variants selected from the SNP panels and selection criteria
| Trait | Selection criteria | Imputed sequence (ISQ) | HD | 50k |
|---|---|---|---|---|
| | All variants | 13,789,029 | 656,044 | 49,580 |
| PY | −log10(p) > 3 | 24,387 | 1,238 | 120 |
| PY | −log10(p) > 5 | 2,194 | 159 | 27 |
| SCS | −log10(p) > 3 | 23,346 | 1,203 | 98 |
| SCS | −log10(p) > 5 | 1,539 | 90 | 7 |
| IFL | −log10(p) > 3 | 22,833 | 987 | 61 |
| IFL | −log10(p) > 5 | 853 | 27 | 4 |
Number of variants in each of the subsets of variants selected using COJO, and number of variants in linkage disequilibrium (LD) with the selected variants, which were ignored in the GRMc
| Trait | Selection criteria | Number of variants in subset of selected variants | Number of variants in LD with selected variants |
|---|---|---|---|
| PY | COJO3 | 90 | 1,650,152 |
| PY | COJO5 | 64 | 1,154,416 |
| PY | COJO5LD | 35 | 615,586 |
| PY | COJO#100 | 100 | 1,688,270 |
| SCS | COJO3 | 195 | 3,241,932 |
| SCS | COJO5 | 215 | 3,449,212 |
| SCS | COJO5LD | 42 | 757,095 |
| SCS | COJO#100 | 100 | 1,652,678 |
| IFL | COJO3 | 264 | 3,835,730 |
| IFL | COJO5 | 209 | 3,151,538 |
| IFL | COJO5LD | 35 | 607,631 |
| IFL | COJO#100 | 100 | 1,675,727 |
Fig. 1 Manhattan plot for protein yield (PY) using Bovine 50k (a), BovineHD (b) and ISQ data and variants after selection (COJO5LD in green) (c). Significance of variant effects (−log10(p)) based on the GCTA single-variant analyses for protein yield (PY) using Bovine 50k (a), BovineHD (b), and full sequence data (ISQ) and the variants selected after the COJO5LD analysis (green) (c)
Fig. 2 Manhattan plot for somatic cell score (SCS) using Bovine 50k (a), BovineHD (b) and ISQ data and variants after selection (COJO5LD in green) (c). Significance of variant effects (−log10(p)) based on the GCTA single-variant analyses for somatic cell score (SCS) using Bovine 50k (a), BovineHD (b), and full sequence data (ISQ) and the variants selected after the COJO5LD analysis (green) (c)
Fig. 3 Manhattan plot for interval first–last insemination (IFL) using Bovine 50k (a), BovineHD (b) and ISQ data and variants after selection (COJO5LD in green) (c). Significance of variant effects (−log10(p)) based on the GCTA single-variant analyses for interval first–last insemination (IFL) using Bovine 50k (a), BovineHD (b), and full sequence data (ISQ) and the variants selected after the COJO5LD analysis (green) (c)
Phenotypic variance (h²) explained in 2287 validation animals fitting GRM based on the selected set of variants for protein yield (PY), somatic cell score (SCS) and interval first–last insemination (IFL)
| Trait | Selection criteria | ISQ | HD | 50k |
|---|---|---|---|---|
| PY | All variants | 0.83 | 0.82 | 0.81 |
| PY | −log10(p) > 3 | 0.53a | 0.40 | 0.22 |
| PY | −log10(p) > 5 | 0.60a | 0.43a | 0.22a |
| PY | COJO3 | 0.21 | | |
| PY | COJO5 | 0.19 | | |
| PY | COJO5LD | 0.19 | | |
| PY | COJO#100 | 0.23 | | |
| SCS | All variants | 0.87 | 0.84 | 0.83 |
| SCS | −log10(p) > 3 | 0.57a | 0.45a | 0.19 |
| SCS | −log10(p) > 5 | 0.72a | 0.25a | 0.03a |
| SCS | COJO3 | 0.31 | | |
| SCS | COJO5 | 0.31 | | |
| SCS | COJO5LD | 0.16 | | |
| SCS | COJO#100 | 0.22 | | |
| IFL | All variants | 0.72 | 0.70 | 0.69 |
| IFL | −log10(p) > 3 | 0.51a | 0.32 | 0.14 |
| IFL | −log10(p) > 5 | 0.50a | 0.15a | 0.03 |
| IFL | COJO3 | 0.25 | | |
| IFL | COJO5 | 0.23 | | |
| IFL | COJO5LD | 0.13 | | |
| IFL | COJO#100 | 0.17 | | |
ISQ are all imputed sequence variants, and HD and 50k are the SNPs on the common HD and 50k panels. Variants were selected using GWAS results on 3469 discovery animals
a Inflated phenotypic variance (see Additional file 5: Table S1)
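
The threshold-based subsets in the tables above (−log10(p) > 3 and −log10(p) > 5) amount to a simple filter on single-variant GWAS p-values. A minimal sketch of that filter follows; the function name `select_variants` and the p-values are hypothetical and for illustration only, and nothing here reproduces the GCTA analysis used in the study.

```python
import numpy as np

def select_variants(p_values, threshold):
    """Return indices of variants whose -log10(p) exceeds the threshold."""
    p = np.asarray(p_values, dtype=float)
    # Clip to avoid log10(0) for p-values that underflowed to zero.
    score = -np.log10(np.clip(p, 1e-300, 1.0))
    return np.flatnonzero(score > threshold)

# Hypothetical p-values for six variants (illustrative, not from the study).
p_values = [0.5, 2e-4, 3e-6, 0.04, 9e-4, 1e-9]

selected_3 = select_variants(p_values, 3)  # analogue of "-log10(p) > 3"
selected_5 = select_variants(p_values, 5)  # analogue of "-log10(p) > 5"
print(selected_3.tolist(), selected_5.tolist())
```

The stricter threshold always yields a subset of the looser one, which matches the pattern of variant counts per trait and panel.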
Phenotypic variance explained (h²) in 2287 validation animals for protein yield (PY), somatic cell score (SCS) and interval first–last insemination (IFL)
| Trait | Selection criteria | Selected: ISQ | Selected: HD | Selected: 50k | Complementary: ISQ | Complementary: HD | Complementary: 50k | Sum: ISQ | Sum: HD | Sum: 50k |
|---|---|---|---|---|---|---|---|---|---|---|
| PY | All variants | 0.83 | 0.98 | 0.70 | | 0.00 | 0.12 | 0.83 | 0.98 | 0.82 |
| PY | −log10(p) > 3 | 0.19 | 0.15 | 0.09 | 0.61 | 0.65 | 0.73 | 0.80 | 0.80 | 0.82 |
| PY | −log10(p) > 5 | 0.10 | 0.04 | 0.03 | 0.74 | 0.79 | 0.80 | 0.84 | 0.83 | 0.83 |
| PY | COJO3 | 0.11 | | | 0.70 | | | 0.81 | | |
| PY | COJO5 | 0.10 | | | 0.71 | | | 0.81 | | |
| PY | COJO5LD | 0.08 | | | 0.73 | | | 0.82 | | |
| PY | COJO#100 | 0.09 | | | 0.72 | | | 0.82 | | |
| SCS | All variants | 0.87 | 0.87 | 0.48 | | 0.00 | 0.38 | 0.87 | 0.87 | 0.86 |
| SCS | −log10(p) > 3 | 0.22 | 0.15 | 0.05 | 0.60 | 0.68 | 0.80 | 0.82 | 0.83 | 0.85 |
| SCS | −log10(p) > 5 | 0.24 | 0.03 | 0.01 | 0.64 | 0.84 | 0.85 | 0.88 | 0.87 | 0.86 |
| SCS | COJO3 | 0.14 | | | 0.68 | | | 0.81 | | |
| SCS | COJO5 | 0.14 | | | 0.66 | | | 0.80 | | |
| SCS | COJO5LD | 0.05 | | | 0.79 | | | 0.84 | | |
| SCS | COJO#100 | 0.08 | | | 0.76 | | | 0.83 | | |
| IFL | All variants | 0.72 | 0.85 | 0.50 | | 0.00 | 0.21 | 0.72 | 0.85 | 0.70 |
| IFL | −log10(p) > 3 | 0.20 | 0.12 | 0.05 | 0.51 | 0.57 | 0.64 | 0.70 | 0.69 | 0.69 |
| IFL | −log10(p) > 5 | 0.11 | 0.03 | 0.03 | 0.63 | 0.69 | 0.70 | 0.73 | 0.72 | 0.72 |
| IFL | COJO3 | 0.11 | | | 0.57 | | | 0.67 | | |
| IFL | COJO5 | 0.12 | | | 0.57 | | | 0.69 | | |
| IFL | COJO5LD | 0.07 | | | 0.64 | | | 0.71 | | |
| IFL | COJO#100 | 0.08 | | | 0.64 | | | 0.71 | | |
COJO analysis with −log10(p) > 3 (COJO3) or −log10(p) > 5 (COJO5); ISQ are all imputed sequence variants, HD and 50k are the SNPs on the common HD and 50k panels
Variances are estimated by fitting GRM and GRMc together in one model, where GRM was based on the selected set of variants and GRMc on the complementary variants. Sets of variants were selected using GWAS results on 3469 discovery animals
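
To make the split into a selected and a complementary relationship matrix concrete, the sketch below builds two genomic relationship matrices from one genotype matrix. It assumes the commonly used standardised form, in which each variant is centred by twice its allele frequency and scaled by its expected variance; the genotypes are simulated, the `selected` mask is hypothetical, and neither the LD-based exclusion of variants nor the variance-component estimation from the study is shown.

```python
import numpy as np

def grm(genotypes):
    """GRM from an (animals x variants) matrix of 0/1/2 allele counts."""
    x = np.asarray(genotypes, dtype=float)
    p = x.mean(axis=0) / 2.0                      # allele frequency per variant
    keep = (p > 0) & (p < 1)                      # drop monomorphic variants
    x, p = x[:, keep], p[keep]
    z = (x - 2 * p) / np.sqrt(2 * p * (1 - p))    # centre and scale each variant
    return z @ z.T / z.shape[1]                   # average over variants

rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(8, 200))     # simulated, illustrative only

selected = np.zeros(200, dtype=bool)
selected[:20] = True                              # hypothetical selected subset

grm_selected = grm(genotypes[:, selected])        # plays the role of GRM
grm_complement = grm(genotypes[:, ~selected])     # plays the role of GRMc
print(grm_selected.shape, grm_complement.shape)
```

Fitting both matrices in one model lets the variance attributed to the selected variants be separated from the variance captured by the remaining variants, which is what the "Selected", "Complementary" and "Sum" columns report.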
Prediction accuracy in 2287 validation animals, and the intercept and slope for the regression of phenotype on the breeding value estimated using GRM with different selected sets of variants for protein yield (PY), somatic cell score (SCS) and interval first–last insemination (IFL)
| Trait | Selection criteria | Accuracy: ISQ | Accuracy: HD | Accuracy: 50k | Intercept: ISQ | Intercept: HD | Intercept: 50k | Slope: ISQ | Slope: HD | Slope: 50k |
|---|---|---|---|---|---|---|---|---|---|---|
| PY | All variants | 0.68 | 0.68 | 0.68 | −0.6 | −0.6 | −0.7 | 0.90 | 0.90 | 0.89 |
| PY | −log10(p) > 3 | 0.58 | 0.56 | 0.42 | 2.3 | 3.6 | 6.9 | 0.73 | 0.65 | 0.57 |
| PY | −log10(p) > 5 | 0.39 | 0.30 | 0.28 | 7.2 | 8.7 | 9.4 | 0.54 | 0.51 | 0.71 |
| PY | COJO3 | 0.40 | | | 7.2 | | | 0.45 | | |
| PY | COJO5 | 0.38 | | | 7.7 | | | 0.41 | | |
| PY | COJO5LD | 0.33 | | | 8.1 | | | 0.51 | | |
| PY | COJO#100 | 0.34 | | | 7.2 | | | 0.47 | | |
| SCS | All variants | 0.70 | 0.71 | 0.70 | 100 | 100 | 100 | 1.02 | 1.03 | 1.03 |
| SCS | −log10(p) > 3 | 0.63 | 0.55 | 0.36 | 100 | 100 | 101 | 0.82 | 0.79 | 0.70 |
| SCS | −log10(p) > 5 | 0.40 | 0.22 | 0.11 | 100 | 101 | 101 | 0.79 | 0.66 | 0.57 |
| SCS | COJO3 | 0.48 | | | 100 | | | 0.64 | | |
| SCS | COJO5 | 0.48 | | | 100 | | | 0.60 | | |
| SCS | COJO5LD | 0.35 | | | 100 | | | 0.68 | | |
| SCS | COJO#100 | 0.39 | | | 100 | | | 0.66 | | |
| IFL | All variants | 0.60 | 0.60 | 0.60 | 99 | 99 | 99 | 0.88 | 0.87 | 0.86 |
| IFL | −log10(p) > 3 | 0.51 | 0.45 | 0.31 | 99 | 99 | 99 | 0.70 | 0.62 | 0.52 |
| IFL | −log10(p) > 5 | 0.27 | 0.16 | 0.10 | 99 | 98 | 98 | 0.47 | 0.51 | 0.77 |
| IFL | COJO3 | 0.38 | | | 99 | | | 0.45 | | |
| IFL | COJO5 | 0.35 | | | 99 | | | 0.41 | | |
| IFL | COJO5LD | 0.30 | | | 99 | | | 0.52 | | |
| IFL | COJO#100 | 0.32 | | | 99 | | | 0.45 | | |
ISQ are all imputed sequence variants, HD and 50k are the SNPs on the common HD and 50k panels. Sets of variants were selected using GWAS and subsequently trained in 3469 discovery animals
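
The three statistics reported per variant set (accuracy, intercept, slope) can be illustrated with a short sketch. It assumes that accuracy is the Pearson correlation between phenotype and estimated breeding value, which the text here does not state explicitly, and that intercept and slope come from an ordinary least-squares regression of phenotype on the estimated breeding value. The function name `validation_stats` and the toy data are hypothetical.

```python
import numpy as np

def validation_stats(phenotype, ebv):
    """Correlation plus intercept and slope of phenotype regressed on EBV."""
    y = np.asarray(phenotype, dtype=float)
    g = np.asarray(ebv, dtype=float)
    accuracy = np.corrcoef(y, g)[0, 1]        # assumed definition of accuracy
    slope, intercept = np.polyfit(g, y, 1)    # least-squares line y = a + b * g
    return accuracy, intercept, slope

# Toy data (not from the study): phenotype is an exact linear function of EBV.
ebv = np.array([-1.0, 0.0, 1.0, 2.0])
phenotype = 100.0 + 0.8 * ebv

accuracy, intercept, slope = validation_stats(phenotype, ebv)
print(accuracy, intercept, slope)
```

A slope below 1 indicates that the estimated breeding values are over-dispersed relative to the phenotypes, which is the pattern seen for most of the preselected subsets in the table.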
Prediction accuracy in 2287 validation animals, and the intercept and slope for the regression of phenotype on the breeding value estimated using the effects of variants from GRM and GRMc fitted together for different selected sets of variants for protein yield (PY), somatic cell score (SCS) and interval first–last insemination (IFL)
| Trait | Selection criteria | Accuracy: ISQ | Accuracy: HD | Accuracy: 50k | Intercept: ISQ | Intercept: HD | Intercept: 50k | Slope: ISQ | Slope: HD | Slope: 50k |
|---|---|---|---|---|---|---|---|---|---|---|
| PY | All variants | 0.68 | 0.68 | 0.68 | −0.6 | −0.6 | −0.7 | 0.90 | 0.90 | 0.90 |
| PY | −log10(p) > 3 | 0.64 | 0.65 | 0.67 | 1.0 | 1.1 | 0.4 | 0.79 | 0.75 | 0.83 |
| PY | −log10(p) > 5 | 0.67 | 0.69 | 0.69 | 0.2 | −0.5 | −0.6 | 0.84 | 0.90 | 0.91 |
| PY | COJO3 | 0.64 | | | 1.4 | | | 0.72 | | |
| PY | COJO5 | 0.64 | | | 1.1 | | | 0.75 | | |
| PY | COJO5LD | 0.67 | | | 0.2 | | | 0.84 | | |
| PY | COJO#100 | 0.63 | | | 1.2 | | | 0.79 | | |
| SCS | All variants | 0.70 | 0.71 | 0.70 | 100 | 100 | 100 | 1.02 | 1.03 | 1.02 |
| SCS | −log10(p) > 3 | 0.67 | 0.66 | 0.66 | 100 | 100 | 100 | 0.85 | 0.83 | 0.87 |
| SCS | −log10(p) > 5 | 0.67 | 0.69 | 0.69 | 100 | 100 | 100 | 0.93 | 0.99 | 1.00 |
| SCS | COJO3 | 0.63 | | | 100 | | | 0.76 | | |
| SCS | COJO5 | 0.62 | | | 100 | | | 0.73 | | |
| SCS | COJO5LD | 0.65 | | | 100 | | | 0.88 | | |
| SCS | COJO#100 | 0.62 | | | 100 | | | 0.83 | | |
| IFL | All variants | 0.60 | 0.60 | 0.60 | 99 | 99 | 99 | 0.88 | 0.87 | 0.87 |
| IFL | −log10(p) > 3 | 0.55 | 0.56 | 0.58 | 99 | 99 | 99 | 0.73 | 0.71 | 0.80 |
| IFL | −log10(p) > 5 | 0.58 | 0.61 | 0.60 | 99 | 99 | 99 | 0.80 | 0.88 | 0.88 |
| IFL | COJO3 | 0.51 | | | 99 | | | 0.60 | | |
| IFL | COJO5 | 0.51 | | | 99 | | | 0.59 | | |
| IFL | COJO5LD | 0.57 | | | 99 | | | 0.76 | | |
| IFL | COJO#100 | 0.54 | | | 99 | | | 0.70 | | |
ISQ are all imputed sequence variants, HD and 50k are the SNPs on the common HD and 50k panels. Sets of variants were selected using GWAS and subsequently trained in 3469 discovery animals