Oscar Gonzalez-Recio, Hans D Daetwyler, Iona M MacLeod, Jennie E Pryce, Phil J Bowman, Ben J Hayes, Michael E Goddard.
Abstract
The proportion of genetic variation in complex traits explained by rare variants is a key question for genomic prediction, and for identifying the basis of "missing heritability", that is, the proportion of additive genetic variation not captured by common variants on SNP arrays. Sequence variants in transcript and regulatory regions from 429 sequenced animals were used to impute high density SNP genotypes of 3311 Holstein sires to sequence. There were 675,062 common variants (MAF>0.05), 102,549 uncommon variants (0.01<MAF<0.05), and 83,856 rare variants (MAF<0.01). We describe a novel method for estimating the proportion of the rare variants that are sequencing errors using parent-progeny duos. We then used mixed model methodology to estimate the proportion of variance captured by these different classes of variants for fat, milk and protein yields, as well as for fertility. Common sequence variants captured 83%, 77%, 76% and 84% of the total genetic variance for fat, milk, and protein yields and fertility, respectively. This was between 2 and 5% more variance than that captured from 600k SNPs on a high density chip, although the difference was not significant. Rare variants captured 3%, 0%, 1% and 14% of the genetic variance for fat, milk and protein yields, and fertility, respectively, whereas pedigree explained the remaining amount of genetic variance (none for fertility). The proportion of variation explained by rare variants is likely to be under-estimated due to reduced accuracies of imputation for this class of variants. Using common sequence variants slightly improved the accuracy of genomic predictions for fat and milk yield, compared to high density SNP array genotypes. However, including rare variants from transcript regions did not increase the accuracy of genomic predictions.
These results suggest that rare variants recover a small percentage of the missing heritability for complex traits; however, very large reference sets will be required to exploit this to improve the accuracy of genomic predictions. Our results do suggest the contribution of rare variants to genetic variation may be greater for fitness traits.
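The three variant classes in the abstract are defined purely by minor allele frequency (MAF) thresholds. The following is a minimal sketch of that classification, assuming genotypes are coded as 0/1/2 copies of one allele; the function names are hypothetical, and the handling of values exactly at 0.01 or 0.05 is an assumption because the abstract only gives strict inequalities.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0/1/2 copies of one allele (None = missing)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is less frequent.
    return min(freq, 1.0 - freq)


def maf_class(maf):
    """Assign the class used in the abstract (boundary handling is assumed)."""
    if maf > 0.05:
        return "common"
    if maf > 0.01:
        return "uncommon"
    return "rare"


def count_classes(genotype_matrix):
    """Count variants per class; each row holds one variant's genotypes."""
    return Counter(maf_class(minor_allele_frequency(row)) for row in genotype_matrix)
```

For example, `count_classes([[0, 0, 1, 0], [0] * 99 + [1]])` counts one common variant (MAF 0.125) and one rare variant (MAF 0.005).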
Year: 2015 PMID: 26642058 PMCID: PMC4671594 DOI: 10.1371/journal.pone.0143945
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Observed vs expected less common alleles transmitted in 38 sire-son duos (left) and confidence interval of unrelated duos at different MAF (right) in Bos taurus autosome 1 (BTA1).
Almost 35% of sequence variants had MAF<0.01 (green dashed line in left plot); however, 50% of these variants were not observed in the expected proportions in the parent-offspring duos (red solid line in left plot). The proportion of alleles transmitted to the progeny was modeled according to the MAF (right plot). Green lines represent each of the duos, and the solid green line is the local weighted regression for the 38 duos. The red shading represents the confidence interval for the same regression when 10 pairs of unrelated animals were evaluated, with the red solid line being the local weighted regression. The dashed black line represents the expected theoretical proportion of transmission for the less frequent allele from sire to son under random mating. At MAF<0.10 the observed proportions of transmission deviated from the theoretical expectation, which implies that up to 50% of the uncommon and rare variants (at MAF<0.01) are sequencing errors.
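The caption refers to a theoretical transmission proportion under random mating without giving its formula. As a rough illustration only, the sketch below computes one plausible version of such an expectation: the probability that a son carries the less frequent allele given that his sire carries it, assuming Hardy-Weinberg proportions and a dam drawn at random from the population. Both the function name and this particular definition are assumptions and may differ from what the authors plotted.

```python
def p_son_carries_given_sire_carries(p):
    """P(son carries the allele | sire carries it) for allele frequency p.

    Assumes Hardy-Weinberg genotype proportions and random mating.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("allele frequency must be strictly between 0 and 1")
    q = 1.0 - p
    p_sire_carrier = 1.0 - q * q              # heterozygous or homozygous for the allele
    p_sire_het = 2.0 * p * q / p_sire_carrier  # heterozygous, given that he is a carrier
    # The son lacks the allele only if a heterozygous sire passes his other
    # allele (probability 1/2) and the dam also passes a non-carrier allele (q).
    return 1.0 - p_sire_het * 0.5 * q
```

Under these assumptions the expectation approaches 0.5 as the allele becomes very rare and rises with frequency, so an observed transmission rate well below 0.5 for rare alleles would point towards false variant calls in the sire.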
Genetic variance estimates from genomic markers ($\sigma^2_g$) or pedigree ($\sigma^2_a$) and their respective standard errors (s.e.) for milk, fat and protein yield, and fertility, captured from GBLUP models using 3311 Holstein sires.
Narrow-sense heritability ($h^2$) from pedigree is provided, and the proportion of missing heritability from markers was calculated as $1 - \sigma^2_g/\sigma^2_a$.
| Trait | BLUP-PED¹ variance | BLUP-PED s.e. | BLUP-PED h² | GBLUP-HD² variance | GBLUP-HD s.e. | GBLUP-HD missing heritability | GBLUP-Seq³ variance | GBLUP-Seq s.e. | GBLUP-Seq missing heritability |
|---|---|---|---|---|---|---|---|---|---|
| Fat | 175 | 7.2 | 0.34 | 143 | 6.7 | 18% | 146 | 8.6 | 17% |
| Milk | 175,355 | 6,867 | 0.34 | 152,554 | 7,449 | 13% | 157,947 | 10,212 | 10% |
| Protein | 127 | 4.7 | 0.53 | 106 | 5.4 | 17% | 108 | 7.2 | 15% |
| Fertility | 65 | 4.1 | 0.20 | 61 | 3.8 | 6% | 64 | 4.5 | 2% |
1 BLUP model with numerator relationship matrix from pedigree used as a genetic relationship matrix;
2 GBLUP model with genomic relationship matrix built using 632,003 SNP genotypes;
3 GBLUP model with genomic relationship matrix constructed using 675,062 SNPs pruned for LD and MAF>0.05.
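The missing-heritability percentages in the table are consistent with one minus the ratio of the marker-based variance to the pedigree-based variance. The sketch below recomputes them from the variance estimates in the table, taking rows in table order; the variable and function names are illustrative.

```python
# (pedigree variance, GBLUP-HD variance, GBLUP-Seq variance), rows in table order
VARIANCES = [
    (175, 143, 146),
    (175355, 152554, 157947),
    (127, 106, 108),
    (65, 61, 64),
]


def missing_heritability(marker_variance, pedigree_variance):
    """Share of the pedigree-based genetic variance not captured by markers."""
    return 1.0 - marker_variance / pedigree_variance


def missing_percentages(rows):
    """Rounded percentages for the HD and sequence models, one pair per row."""
    return [
        (round(100 * missing_heritability(hd, ped)),
         round(100 * missing_heritability(seq, ped)))
        for ped, hd, seq in rows
    ]


if __name__ == "__main__":
    for hd_pct, seq_pct in missing_percentages(VARIANCES):
        print(f"GBLUP-HD: {hd_pct}%  GBLUP-Seq: {seq_pct}%")
```

The recomputed values reproduce the percentages reported in the table, which supports reading the sequence-based model as recovering a few percentage points more of the genetic variance than the HD chip.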
Level of statistical significance of the log-likelihood tests from GBLUP models incorporating different sources of genetic information against the GBLUP model incorporating only common variants.
| Common variants and pedigree | Common variants and rare variants | Common variants, pedigree and uncommon variants | Common variants, pedigree and rare variants | |
|---|---|---|---|---|
|
|
|
| N.S. |
|
|
|
|
| N.S. | N.S. |
|
|
|
| N.S. | N.S. |
|
| N.S. |
| N.S. |
|
1 GBLUP-Seq model: genomic relationship matrix constructed using 675,062 SNPs pruned for LD<0.999 and MAF>0.05;
2 as (1) plus the polygenic effect;
3 as (1) with an additional random effect with genomic relationship matrix constructed from 83,856 variants with MAF<0.01 detected in 429 sequenced animals;
4 as (2) with an additional random effect with genomic relationship matrix constructed from variants with 0.01<MAF<0.05;
5 as (2) with an additional random effect with genomic relationship matrix constructed from 83,856 variants with MAF<0.01 detected in 429 sequenced animals;
**P<0.005
*P<0.025
†P<0.05
(b) Statistical test against model 2.
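A comparison of nested mixed models of this kind is commonly done with a likelihood ratio test on the added variance component. The sketch below is one way such a test could look, not necessarily the authors' procedure: it assumes a single extra variance component, a 1-df chi-square reference distribution, and optionally the 50:50 mixture often used when a variance is tested at its boundary of zero. The log-likelihood values in the usage note are hypothetical; the significance labels follow the thresholds listed in the footnotes.

```python
import math


def chi2_sf_1df(x):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def lrt_variance_component(loglik_full, loglik_reduced, boundary=True):
    """Likelihood ratio statistic and p-value for one added variance component."""
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    if stat == 0.0:
        return stat, 1.0
    p_value = chi2_sf_1df(stat)
    if boundary:
        # Assumed 50:50 mixture of chi-square(0) and chi-square(1).
        p_value *= 0.5
    return stat, p_value


def significance_label(p_value):
    """Map a p-value to the symbols used in the table footnotes."""
    if p_value < 0.005:
        return "**"
    if p_value < 0.025:
        return "*"
    if p_value < 0.05:
        return "†"
    return "N.S."
```

With hypothetical log-likelihoods of -100.0 for the larger model and -103.0 for the smaller one, `lrt_variance_component(-100.0, -103.0)` gives a statistic of 6.0, which `significance_label` maps to `*`.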
Posterior mean estimates (standard errors within brackets) for proportion of genetic variance for milk, fat and protein yield, and fertility captured from the GBLUP model jointly fitting pedigree and common, uncommon and rare variants, using 3311 Holstein sires.
| Trait | Pedigree | Common variants | Uncommon variants | Rare variants | Total additive variance |
|---|---|---|---|---|---|
| Fat | 14% (2) | 83% (10) | 0% (0) | 3% (1) | 147 |
| Milk | 23% (0.1) | 77% (0.1) | 0% (0) | 0% (0) | 157,919 |
| Protein | 23% (5) | 76% (14) | 0% (0) | 1% (1) | 108 |
| Fertility | 2% (0.4) | 84% (16) | 0% (0) | 14% (5) | 63.9 |
Pearson correlation (cor), slope coefficient for the linear regression and mean square error (MSE) between observed and predicted daughter yield deviation for fat, milk and protein yield, and fertility from different GBLUP models.
Training set consisted of 2832 animals and there were 465 animals in the validation set.
| Trait | BLUP¹ cor | BLUP slope | BLUP MSE⁷ | GBLUP-HD² cor | GBLUP-HD slope | GBLUP-HD MSE | GBLUP-Seq³ cor | GBLUP-Seq slope | GBLUP-Seq MSE | GBLUP-Seq-Uncommon⁴ cor | GBLUP-Seq-Uncommon slope | GBLUP-Seq-Uncommon MSE | GBLUP-Seq-RV-SET⁵ cor | GBLUP-Seq-RV-SET slope | GBLUP-Seq-RV-SET MSE | GBLUP-Seq-RVvalidated⁶ cor | GBLUP-Seq-RVvalidated slope | GBLUP-Seq-RVvalidated MSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Fat | 0.46 | 0.79 | 0.98 | 0.56 | 0.96 | 0.83 | 0.57 | 0.94 | 0.82 | 0.57 | 0.94 | 0.82 | 0.57 | 0.95 | 0.82 | 0.57 | 0.95 | 0.82 |
| Milk | 0.52 | 0.86 | 0.81 | 0.61 | 0.92 | 0.65 | 0.63 | 0.93 | 0.63 | 0.63 | 0.93 | 0.63 | 0.63 | 0.93 | 0.63 | 0.63 | 0.93 | 0.63 |
| Protein | 0.57 | 0.89 | 0.77 | 0.65 | 0.98 | 0.64 | 0.65 | 0.97 | 0.64 | 0.65 | 0.97 | 0.64 | 0.65 | 0.97 | 0.64 | 0.65 | 0.97 | 0.64 |
| Fertility | 0.36 | 0.87 | 3.02 | 0.43 | 1.13 | 2.80 | 0.42 | 1.12 | 2.80 | 0.42 | 1.12 | 2.80 | 0.43 | 1.17 | 2.80 | 0.43 | 1.17 | 2.80 |
1 BLUP model with pedigree numerator relationship matrix;
2 G-BLUP model with genomic relationship matrix built using 632,003 SNP genotypes;
3 G-BLUP model with genomic relationship matrix constructed using 675,062 SNPs pruned for LD and MAF>0.05;
4 G-BLUP model with genomic relationship matrix for common variants constructed using 675,062 SNPs pruned for LD and MAF>0.05 and a genomic relationship matrix constructed from variants with 0.01<MAF<0.05;
5 G-BLUP model with genomic relationship matrix for common variants constructed using 675,062 SNPs pruned for LD and MAF>0.05 and genomic relationship matrix for rare variants constructed from 83,856 variants with MAF<0.01 detected in 429 sequenced animals;
6 G-BLUP model with genomic relationship matrix for common variants constructed using 985,757 SNPs pruned for LD and MAF>0.05 and genomic relationship matrix for rare variants constructed from 20,648 confirmed rare variants detected in 38 sire-son duos;
7 MSE is expressed in units of additive genetic standard deviations. All models included a polygenic effect.
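The three validation statistics in this table can be computed from paired observed and predicted values. The sketch below is a minimal illustration with two assumptions that the source does not spell out: the slope is taken from the regression of observed on predicted values, and the MSE is scaled by dividing by the squared additive genetic standard deviation. The function and parameter names are hypothetical.

```python
import math


def validation_metrics(observed, predicted, genetic_sd=1.0):
    """Return (Pearson correlation, slope of observed on predicted, scaled MSE)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    s_pp = sum((p - mean_pred) ** 2 for p in predicted)
    s_oo = sum((o - mean_obs) ** 2 for o in observed)
    s_po = sum((p - mean_pred) * (o - mean_obs) for o, p in zip(observed, predicted))
    cor = s_po / math.sqrt(s_pp * s_oo)
    slope = s_po / s_pp  # a value near 1 suggests predictions are not inflated or deflated
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    return cor, slope, mse / genetic_sd ** 2
```

In this reading, a slope below 1 indicates that predictions are more spread out than the observed deviations, and a slope above 1 indicates the opposite.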
Fig 2. Number of variants detected by number of parent-offspring duos.
Boxplots of the number of variants with MAF<0.01 detected from 38 sire-son duos (blue). The boxplots in red show how many of them were present in transcript regions. Each boxplot is constructed from 50 replicates of random samples of a given number of duos (from 1 to 38). Solid lines are the corresponding quadratic regressions for the number of rare variants discovered in the Australian Holstein population according to the number of duos used. The regression equation for the total number of rare variants ($y$) according to the number of duos ($x$) was $y = 108432 + 80557x - 926x^2$. The equivalent regression equation for the number of rare variants in transcript regions was $y = 653 + 775x - 6.7x^2$. This implies that 44 duos would be needed to detect most of the rare variants along the genome, the number of which is projected to be 1,860,826. Among these, 23,000 are expected to be present in transcript regions, and 58 parent-offspring duos would be necessary to detect them.
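The duo counts quoted in the caption correspond to the point where each fitted quadratic stops increasing. The sketch below locates that peak for both regression equations, using the coefficients as printed in the caption; the function name is hypothetical.

```python
import math


def quadratic_peak(intercept, linear, quadratic):
    """Location and height of the maximum of y = intercept + linear*x + quadratic*x^2."""
    if quadratic >= 0:
        raise ValueError("the curve has a maximum only if the quadratic term is negative")
    x_peak = -linear / (2.0 * quadratic)
    y_peak = intercept + linear * x_peak + quadratic * x_peak ** 2
    return x_peak, y_peak


# Coefficients as printed in the caption.
all_x, all_y = quadratic_peak(108432, 80557, -926)
transcript_x, transcript_y = quadratic_peak(653, 775, -6.7)

if __name__ == "__main__":
    print(math.ceil(all_x), round(all_y))
    print(math.ceil(transcript_x), round(transcript_y))
```

Rounding the peak locations up gives 44 and 58 duos, matching the caption. The peak heights from these printed coefficients are about 1.86 million and about 23,000 variants; the first is close to, but not identical with, the reported 1,860,826, presumably because the printed coefficients are rounded.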