Xu Wang, Xuanyao Liu, Xueling Sim, Haiyan Xu, Chiea-Chuen Khor, Rick Twee-Hee Ong, Wan-Ting Tay, Chen Suo, Wan-Ting Poh, Daniel Peng-Keat Ng, Jianjun Liu, Tin Aung, Kee-Seng Chia, Tien-Yin Wong, E-Shyong Tai, Yik-Ying Teo.
Abstract
Genome-wide association studies (GWAS) have become the preferred experimental design in exploring the genetic etiology of complex human traits and diseases. Standard SNP-based meta-analytic approaches have been utilized to integrate the results from multiple experiments. This fundamentally assumes that the patterns of linkage disequilibrium (LD) between the underlying causal variants and the directly genotyped SNPs are similar across the populations for the same SNPs to emerge with surrogate evidence of disease association. We introduce a novel strategy for assessing regional evidence of phenotypic association that explicitly incorporates the extent of LD in the region. This provides a natural framework for combining evidence from multi-ethnic studies of both dichotomous and quantitative traits that (i) accommodates different patterns of LD, (ii) integrates different genotyping platforms and (iii) allows for the presence of allelic heterogeneity between the populations. Our method can also be generalized to perform gene-based or pathway-based analyses. Applying this method on real GWAS data in type 2 diabetes (T2D) boosted the association evidence in regions well-established for T2D etiology in three diverse South-East Asian populations, as well as identified two novel gene regions and a biologically convincing pathway that are subsequently validated with data from the Wellcome Trust Case Control Consortium.
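For orientation, the "standard SNP-based meta-analytic approach" that the abstract contrasts against can be sketched as a fixed-effects, inverse-variance combination of one SNP's effect estimates across studies. This is a generic textbook form, not taken from the paper; the function name and the input values are hypothetical.

```python
from math import erfc, sqrt


def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance fixed-effects meta-analysis for a single SNP.

    betas: per-study effect estimates (e.g. log odds ratios) for the SAME SNP.
    standard_errors: matching per-study standard errors.
    Returns (pooled_beta, pooled_se, two_sided_p).
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need one standard error per effect estimate")
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided normal P-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    two_sided_p = erfc(abs(z) / sqrt(2))
    return pooled_beta, pooled_se, two_sided_p


# Hypothetical inputs: the same SNP tags the causal variant well in both studies.
consistent = fixed_effects_meta([0.2, 0.2], [0.1, 0.1])
# Hypothetical inputs: the SNP is a poor surrogate in the second population.
diluted = fixed_effects_meta([0.2, 0.0], [0.1, 0.1])
```

In the second call the pooled effect is halved, which illustrates why a per-SNP combination depends on the SNP being a comparable surrogate for the causal variant in every population.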
Year: 2011 PMID: 22126751 PMCID: PMC3306862 DOI: 10.1038/ejhg.2011.219
Source DB: PubMed Journal: Eur J Hum Genet ISSN: 1018-4813 Impact factor: 4.246
Figure 1. Illustration of the three scenarios in a meta-analysis, where the genotyped SNPs may be in different degrees of LD with the unobserved causal variant (star): (i) the ideal situation where the same SNPs are genotyped in two studies, and the LD between them and the functional variant is identical in both populations (black arrow); (ii) a realistic situation where the same markers are genotyped in two studies, but different LD patterns exist between them and the functional variant (green arrow); (iii) a realistic situation where different markers are genotyped in two studies, and cannot be meta-analysed without resorting to imputation (pink arrow). The LD between the causal variant and each SNP is represented in different color intensity ranging from white (low LD) to red (high LD).
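The LD between a genotyped SNP and a causal variant is commonly summarized by the squared correlation r². A minimal sketch of that standard calculation from haplotype counts follows; the allele labels and the counts are hypothetical and are not taken from the study.

```python
def ld_r_squared(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> float:
    """r^2 between two biallelic loci from counts of the four haplotypes.

    n_AB counts haplotypes carrying allele A at locus 1 and B at locus 2,
    and so on for the other three combinations.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denominator == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    d = p_AB - p_A * p_B  # the disequilibrium coefficient D
    return d * d / denominator


# Scenario (ii): the same SNP pair, with hypothetical counts for two populations.
r2_population_1 = ld_r_squared(45, 5, 5, 45)    # strong LD
r2_population_2 = ld_r_squared(30, 20, 20, 30)  # weak LD
```

With these counts the SNP is a good surrogate for the causal variant in the first population and a poor one in the second, which is the situation drawn with the green arrow.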
Figure 2. Power comparisons of the different methods for the meta-analysis across all three HapMap populations. Simulations were performed with HAPGEN (Wellcome Trust Centre for Human Genetics, Oxford, UK) assuming a causal variant that was present in all HapMap phase 2 panels with a multiplicative allelic relative risk of 1.5. The case–control genotype data were subsequently thinned to the SNP content of Affymetrix 500K (CEU simulations), Illumina 1M (JPT+CHB simulations) and Affymetrix 6.0 (Santa Clara, CA, USA) (YRI simulations). We calculated the power when only the genotyped SNPs were considered (green triangles), and when we performed region-based analyses of 100 kb regions in each of the three populations (red circles). Imputation was performed with population-specific haplotypes to recover the SNPs removed from the thinning (except for the causal SNP), and a SNP-based analysis was performed on this denser set of imputed and genotyped SNPs (blue diamonds). The SNP-based meta-analyses considered either the genotyped SNPs present across all three platforms only (green triangles) or across the denser set of imputed and genotyped SNPs common to all three populations (blue diamonds). The region-based meta-analysis was performed without restriction (red circles), and with the restriction that at least two populations display region-based P-value <0.001 (red open circles).
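Power in comparisons of this kind is typically estimated as the fraction of simulated replicates in which the test declares significance. The sketch below shows that bookkeeping, including the restriction that at least two populations display a region-based P-value below 0.001. The significance level `alpha` and all P-values are hypothetical, since the caption does not state them.

```python
def empirical_power(p_values, alpha):
    """Fraction of simulation replicates with P below alpha."""
    if not p_values:
        raise ValueError("no replicates supplied")
    return sum(p < alpha for p in p_values) / len(p_values)


def passes_restriction(population_p_values, threshold=1e-3, min_populations=2):
    """True when at least `min_populations` populations have P < threshold."""
    return sum(p < threshold for p in population_p_values) >= min_populations


def restricted_power(meta_p_values, per_population_p_values, alpha):
    """Power when a replicate counts only if it also passes the restriction."""
    if len(meta_p_values) != len(per_population_p_values) or not meta_p_values:
        raise ValueError("need per-population P-values for every replicate")
    hits = sum(
        meta_p < alpha and passes_restriction(population_ps)
        for meta_p, population_ps in zip(meta_p_values, per_population_p_values)
    )
    return hits / len(meta_p_values)


# Four hypothetical replicates: meta-analysis P and three population P-values.
meta = [1e-8, 1e-8, 0.5, 1e-9]
populations = [
    [1e-4, 1e-5, 0.2],   # two populations below 0.001
    [1e-4, 0.3, 0.2],    # only one population below 0.001
    [1e-4, 1e-4, 1e-4],  # restriction met, meta-analysis not significant
    [0.01, 1e-6, 1e-7],  # two populations below 0.001
]
unrestricted = empirical_power(meta, alpha=1e-6)
restricted = restricted_power(meta, populations, alpha=1e-6)
```

The restriction can only lower the estimated power relative to the unrestricted analysis, because it adds a condition that each replicate must satisfy.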
Figure 3. Power comparisons of the different methods for meta-analysis in the presence of allelic heterogeneity. A different causal variant was selected in CEU and JPT+CHB, respectively, while either of the two causal variants was equally likely to be present in the YRI simulations. The two causal variants are located at least 20 kb but no more than 50 kb apart, and have minor allele frequencies of at least 10% in all three HapMap populations. The case–control genotype data simulated from HAPGEN were subsequently thinned to the SNP content of Affymetrix 500K (CEU), Illumina 1M (JPT+CHB) and Affymetrix 6.0 (YRI). We calculated the power when only the CEU and JPT+CHB populations were combined (top row), and when all three HapMap panels were combined (bottom row), investigating the performance of the meta-analysis across the SNPs on all three arrays (green triangles), and for the region-based meta-analysis considering 250 kb regions (red circles). Imputation was performed with population-specific haplotypes to recover the SNPs removed from the thinning, and a SNP-based meta-analysis was performed on this denser set of imputed and genotyped SNPs common to all three populations (blue diamonds). We binned the 3000 pairs of causal variants according to the LD between the two SNPs into four groups: (i) 0≤r²≤0.1; (ii) 0.1
Results of the region-based meta-analysis for type 2 diabetes
| Chr | Discovery start (top window) | Discovery end (top window) | Population | Eff. SNPs with P<0.01 | Eff. SNPs | P | Meta eff. SNPs with P<0.01 | Meta eff. SNPs | Meta P | WTCCC start (top window) | WTCCC end (top window) | WTCCC eff. SNPs with P<0.01 | WTCCC eff. SNPs | WTCCC P | Genes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 168 408 674 (168 458 674) | 168 858 674 (168 708 674) | I | 0 | 24 | 1 | 11.2 | 90 | 1.60 × 10 | 168 758 674 (168 808 674) | 169 208 674 (169 058 674) | 2.2 | 21 | 1.47 × 10 | STK39 |
| 3 | 21 736 044 (21 786 044) | 22 136 044 (22 036 044) | I | 0 | 60 | 1 | 13.6 | 221 | 2.55 × 10−7 | 21 186 044 (21 286 044) | 21 636 044 (21 536 044) | 4.5 | 25 | 6.06 × 10−5 | ZNF659 |
| 6 | 20 594 609 (20 594 609) | 20 894 609 (20 844 609) | M | 0 | 28 | 1 | 10.1 | 107 | 9.68 × 10−7 | 20 494 609 (20 594 609) | 21 044 609 (20 844 609) | 7.3 | 20 | 5.03 × 10−10 | CDKAL1 |
| 14 | 51 355 752 (51 455 752) | 51 755 572 (51 705 752) | C | 3 | 67 | 2.91 × 10−2 | 19.7 | 146 | 2.92 × 10−16 | 50 955 752 (51 105 752) | 51 405 752 (51 355 752) | 2.5 | 30 | 1.99 × 10−2 | GNG2, NID2 |
| 20 | 56 559 795 (56 609 795) | 56 859 795 (56 859 795) | M | 0 | 46 | 1 | 11.1 | 173 | 1.56 × 10−6 | 56 609 795 | 56 859 795 | 0.9 | 25 | 0.294 | STX16, NPEPL1 |
Genomic regions identified by the region-based analysis, with the discovery mechanism based on three genome-wide association studies conducted in Chinese, Malays and Asian Indians in Singapore. Validation of the regions that emerged was performed on the type 2 diabetes case–control study from Phase 1 of the Wellcome Trust Case–Control Consortium (WTCCC).
The start and end positions of the genomic region containing consecutive windows with P<0.001 in at least two of the populations (in bold). The start and end positions of the top 250 kb window are shown in brackets. Subsequent columns show the evidence for the discovery populations in the top window.
The three discovery populations abbreviated: C, SP2 Chinese; M, SiMES Malays; I, SINDI Indians.
Effective number of independent SNPs with P<0.01 after accounting for LD.
Effective number of independent SNPs across the region after accounting for LD.
The start and end positions of the genomic region containing consecutive windows with evidence of validation (defined as P<0.05), with the start and end positions of the top 250 kb being shown in brackets. Subsequent columns show the evidence for WTCCC1 in the top window. For regions without any 250 kb windows displaying P<0.05, the best window in that region is shown instead.
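The notes above describe each window by two quantities: the effective number of independent SNPs with P<0.01 and the effective number of independent SNPs overall. The text here does not say how these are turned into a regional P-value. One plausible reading, offered purely as an assumption, is a one-sided binomial tail: the chance of seeing at least that many SNPs below 0.01 when each independent SNP has a 0.01 chance under the null. The function name is hypothetical, and counts must be integers, so fractional effective counts would need a rounding rule that the source does not specify.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, p_success: float) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p_success).

    ASSUMPTION: this binomial model is an illustrative guess at how a regional
    P-value could be formed from the two effective SNP counts.
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be a probability")
    # Sum the upper tail directly so that small tail probabilities keep precision.
    return sum(
        comb(trials, k) * p_success ** k * (1.0 - p_success) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# A window with 0 of 24 independent SNPs below 0.01 carries no evidence.
no_evidence = binomial_upper_tail(0, 24, 0.01)
# A window with 3 of 67 independent SNPs below 0.01.
weak_evidence = binomial_upper_tail(3, 67, 0.01)
```

Because the counts printed in the table are rounded effective numbers, values from this sketch should not be expected to match the reported P-values exactly.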