Ge Zhang, Rebekah Karns, Guangyun Sun, Subba Rao Indugula, Hong Cheng, Dubravka Havas-Augustin, Natalija Novokmet, Zijad Durakovic, Sasa Missoni, Ranajit Chakraborty, Pavao Rudan, Ranjan Deka.
Abstract
Genome-wide association studies (GWAS) have identified many common variants associated with complex traits in human populations. Thus far, most reported variants have relatively small effects and explain only a small proportion of phenotypic variance, leading to the issues of 'missing' heritability and its explanation. Using height as an example, we examined two possible sources of missing heritability: first, variants with smaller effects whose associations with height failed to reach genome-wide significance and second, allelic heterogeneity due to the effects of multiple variants at a single locus. Using a novel analytical approach, we examined allelic heterogeneity of height-associated loci selected from SNPs of different significance levels based on the summary data of the GIANT (stage 1) studies. In a sample of 1,304 individuals collected from an island population of the Adriatic coast of Croatia, we assessed the extent of height variance explained by incorporating the effects of less significant height loci and multiple effective SNPs at the same loci. Our results indicate that approximately half of the 118 loci that achieved stringent genome-wide significance (p-value < 5×10⁻⁸) showed evidence of allelic heterogeneity. Additionally, including less significant loci (i.e., p-value < 5×10⁻⁴) and accounting for effects of allelic heterogeneity substantially improved the variance explained in height.
Year: 2012 PMID: 23251454 PMCID: PMC3521016 DOI: 10.1371/journal.pone.0051211
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Numbers of significant loci and conditional signals.
| Significance level | Length of context (kb) | Number of significant loci (primary signal) | Number (%) of significant loci with secondary signal | Number (%) of significant loci with tertiary signal |
| 5.E-08 | 500 | 118 | 60 (50.8%) | 26 (22.0%) |
| 5.E-07 | 400 | 151 | 70 (46.4%) | 31 (20.5%) |
| 5.E-06 | 300 | 217 | 75 (34.6%) | 31 (14.3%) |
| 5.E-05 | 200 | 354 | 87 (24.6%) | 34 (9.6%) |
| 5.E-04 | 100 | 781 | 92 (11.8%) | 24 (3.1%) |
| 5.E-03 | 50 | 2668 | 48 (1.8%) | 10 (0.4%) |
Lowering the significance level substantially increased the number (or the density) of significant SNPs used in clustering height loci. Therefore, shorter context lengths were arbitrarily selected in defining “physical adjacency” when relaxed significance levels were used. This choice might artificially reduce the length of significant loci, and hence the chance of detecting allelic heterogeneity, in loci clustered at lower significance levels.
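The clustering of significant SNPs into loci by "physical adjacency" can be illustrated with a minimal sketch. The exact clustering rule is not given in this record, so the code assumes that a SNP joins the current locus when it lies on the same chromosome and within the context length of the previous significant SNP. The function name `cluster_loci` and the positions are hypothetical.

```python
from typing import List, Tuple

Snp = Tuple[str, int]  # (chromosome, base-pair position)


def cluster_loci(snps: List[Snp], context_kb: float) -> List[List[Snp]]:
    """Group significant SNPs into loci by physical adjacency.

    Assumed rule (not taken from the source): a SNP is merged into the
    current locus if it is on the same chromosome and no more than
    context_kb kilobases away from the previous SNP in that locus.
    """
    max_gap = int(context_kb * 1000)
    loci: List[List[Snp]] = []
    for snp in sorted(snps):  # sort by chromosome, then position
        chrom, pos = snp
        if loci:
            last_chrom, last_pos = loci[-1][-1]
            if chrom == last_chrom and pos - last_pos <= max_gap:
                loci[-1].append(snp)
                continue
        loci.append([snp])  # start a new locus
    return loci


# Made-up positions, for illustration only.
example = [("chr1", 100_000), ("chr1", 350_000),
           ("chr1", 1_200_000), ("chr2", 50_000)]
loci_500 = cluster_loci(example, context_kb=500)
loci_100 = cluster_loci(example, context_kb=100)
```

With these made-up positions, the 500 kb context merges the first two SNPs into one locus, whereas the 100 kb context keeps every SNP separate. This is the effect described above: a shorter context yields shorter loci with fewer SNPs per locus.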
Figure 1. Correlation between reported and estimated effect sizes of the 180 primary height SNPs (A) and the 19 secondary SNPs (B) reported by Lango Allen et al.
* The reference study did not report the effect sizes of the secondary signals. Here we used the values converted from the reported p-values based on conditional analyses in a subset of Stage 1 GIANT studies (Table 1 of Lango Allen et al.).
Figure 2. Two example loci with allelic heterogeneity.
(A) The GHSR locus included a secondary signal (rs7652177) after accounting for the primary signal (rs572169). (B) The HMGA1 locus had a more complicated pattern of allelic heterogeneity, with significant secondary, tertiary and quaternary signals after multiple rounds of conditioning (only the first round of conditioning is shown). The secondary p-values (bottom plots), conditioned on the primary SNP, were estimated from GIANT summary data using the analytical approach described in the main text.
Fraction of height variance explained.
| Significance level | Number of SNPs (1st) | Number of SNPs (+2nd) | Number of SNPs (+3rd) | All, N = 1304 (1st) | All (+2nd) | All (+3rd) | Female, N = 739 (1st) | Female (+2nd) | Female (+3rd) | Male, N = 565 (1st) | Male (+2nd) | Male (+3rd) |
| 5.E-08 | 118 | 178 | 204 | 0.066 | 0.085 | 0.092 | 0.081 | 0.095 | 0.107 | 0.051 | 0.075 | 0.078 |
| 5.E-07 | 151 | 221 | 252 | 0.071 | 0.091 | 0.091 | 0.084 | 0.102 | 0.100 | 0.059 | 0.081 | 0.082 |
| 5.E-06 | 217 | 292 | 323 | 0.077 | 0.096 | 0.101 | 0.104 | 0.117 | 0.120 | 0.052 | 0.076 | 0.083 |
| 5.E-05 | 354 | 441 | 475 | 0.072 | 0.095 | 0.105 | 0.092 | 0.112 | 0.118 | 0.053 | 0.078 | 0.092 |
| 5.E-04 | 781 | 873 | 897 | 0.113 | 0.132 | | 0.144 | 0.156 | | 0.084 | 0.109 | |
| 5.E-03 | 2668 | 2716 | 2726 | 0.116 | 0.123 | 0.127 | 0.145 | 0.153 | 0.156 | 0.088 | 0.095 | 0.099 |
Number of SNPs: the number of SNPs used in constructing the genetic score. (1st): primary SNPs only; (+2nd): primary + secondary SNPs; (+3rd): primary + secondary + tertiary SNPs.
The highest level of variance explained was achieved by including less significant SNPs plus significant secondary and tertiary SNPs.
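How a genetic score translates into a fraction of variance explained can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes a weighted allele score (allele counts times per-SNP effect sizes) and takes the fraction of variance explained as the R² of a simple linear regression of the phenotype on that score, which equals the squared Pearson correlation. Any covariate adjustment (for example, for age or sex) is omitted. The data are simulated and the function name `variance_explained` is hypothetical.

```python
import numpy as np


def variance_explained(genotypes, effects, phenotype):
    """R^2 from regressing the phenotype on a weighted allele score.

    genotypes: (n_people, n_snps) array of allele counts (0, 1 or 2)
    effects:   (n_snps,) array of per-allele effect sizes
    phenotype: (n_people,) array of trait values
    """
    score = genotypes @ effects  # weighted allele score per person
    r = np.corrcoef(score, phenotype)[0, 1]
    return r ** 2  # simple-regression R^2 = squared correlation


# Simulated data, for illustration only (not the study sample).
rng = np.random.default_rng(0)
n_people, n_snps = 1000, 50
genotypes = rng.binomial(2, 0.3, size=(n_people, n_snps)).astype(float)
effects = rng.normal(0.0, 0.05, size=n_snps)
height = genotypes @ effects + rng.normal(0.0, 1.0, size=n_people)

# Score built from a subset of SNPs versus the full set, loosely
# mirroring the (1st) versus (+2nd)/(+3rd) columns of the table.
r2_subset = variance_explained(genotypes[:, :25], effects[:25], height)
r2_full = variance_explained(genotypes, effects, height)
```

Comparing `r2_subset` with `r2_full` mimics the comparison across the table's columns: adding SNPs to the score changes the fraction of variance it explains.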
Figure 3. Additional fraction of variance explained could be obtained by including less significant SNPs and secondary/tertiary SNPs.