A statistical method for region-based meta-analysis of genome-wide association studies in genetically diverse populations.

Xu Wang, Xuanyao Liu, Xueling Sim, Haiyan Xu, Chiea-Chuen Khor, Rick Twee-Hee Ong, Wan-Ting Tay, Chen Suo, Wan-Ting Poh, Daniel Peng-Keat Ng, Jianjun Liu, Tin Aung, Kee-Seng Chia, Tien-Yin Wong, E-Shyong Tai, Yik-Ying Teo.

Abstract

Genome-wide association studies (GWAS) have become the preferred experimental design in exploring the genetic etiology of complex human traits and diseases. Standard SNP-based meta-analytic approaches have been utilized to integrate the results from multiple experiments. This fundamentally assumes that the patterns of linkage disequilibrium (LD) between the underlying causal variants and the directly genotyped SNPs are similar across the populations for the same SNPs to emerge with surrogate evidence of disease association. We introduce a novel strategy for assessing regional evidence of phenotypic association that explicitly incorporates the extent of LD in the region. This provides a natural framework for combining evidence from multi-ethnic studies of both dichotomous and quantitative traits that (i) accommodates different patterns of LD, (ii) integrates different genotyping platforms and (iii) allows for the presence of allelic heterogeneity between the populations. Our method can also be generalized to perform gene-based or pathway-based analyses. Applying this method on real GWAS data in type 2 diabetes (T2D) boosted the association evidence in regions well-established for T2D etiology in three diverse South-East Asian populations, as well as identified two novel gene regions and a biologically convincing pathway that are subsequently validated with data from the Wellcome Trust Case Control Consortium.

Year: 2011 PMID: 22126751 PMCID: PMC3306862 DOI: 10.1038/ejhg.2011.219

Source DB: PubMed Journal: Eur J Hum Genet ISSN: 1018-4813 Impact factor: 4.246

Introduction

Remarkable achievements have been made in large-scale genetic studies of common diseases and complex traits.[1, 2] The identification of variants in the human genome that are convincingly associated with different phenotypes has mainly been carried out in individuals of European descent, although studies involving non-Caucasian samples from diverse population groups are increasingly being published or conducted. Genome-wide meta-analyses (GWMA) involving tens of thousands of samples have extended the success in allowing novel variants with smaller effect sizes to be discovered.[3, 4, 5, 6, 7] Despite these triumphs, the findings account for only a small fraction of the total disease heritability,[8] suggesting undiscovered genetic mechanisms may be responsible or alternative methods to analyze these data may be necessary to address the missing heritability. Current implementation of GWMA requires the same SNPs to display consistent evidence of phenotypic association across multiple populations. This implicitly assumes that across these populations, (i) the same causal variant is present; (ii) the linkage disequilibrium (LD) pattern between the causal variant and the assayed SNPs is similar and (iii) the effect sizes observed at the assayed SNPs are consistent.[9, 10] Random effects methods for combining data across studies do not utilize information from neighboring SNPs that may present concurring evidence of disease association in different studies, and often have the tendency to weaken association signals.[9] SNP-based meta-analyses also require the same SNPs to be genotyped in all the populations, although this requirement can apparently be addressed by imputation strategies that effectively standardize the SNP content across different studies[11, 12, 13] (Figure 1). However, imputation does not always present an effective solution, particularly in the absence of appropriate population reference panels.[14, 15]

Figure 1

Illustration of the three scenarios in a meta-analysis, where the genotyped SNPs may be in different degrees of LD with the unobserved causal variant (star): (i) the ideal situation where the same SNPs are genotyped in two studies, and the LD between them and the functional variant is identical in both populations (black arrow); (ii) a realistic situation where the same markers are genotyped in two studies, but different LD patterns exist between them and the functional variant (green arrow); (iii) a realistic situation where different markers are genotyped in two studies, and cannot be meta-analysed without resorting to imputation (pink arrow). The LD between the causal variant and each SNP is represented in different color intensity ranging from white (low LD) to red (high LD).

Before recent whole-genome sequencing endeavors, SNP discoveries were predominantly made in populations of European ancestry.[16] This strong ascertainment bias has inadvertently skewed the SNPs surveyed in the International HapMap Project,[17] which consequently prejudiced the SNP content of commercial genotyping platforms to carry tagging SNPs that are liable to exhibit higher minor allele frequencies in European populations.[18] This means current genotyping arrays may be less optimal for non-European populations, resulting in attenuated association signals due to lower allele frequencies and weaker LD.[14, 15] The pursuit of evidence stronger than genome-wide significance is thus more challenging, and larger sample sizes in non-European studies and meta-analyses of ethnically mixed populations are required to compensate for variations in LD patterns from European populations.[14, 19] Instead of seeking individual variants that display convincing evidence of phenotypic association across multiple populations, a more realistic scenario is perhaps to look for genomic regions with consistent clustering of SNPs exhibiting moderate signals in these populations. In this paper, we propose a novel paradigm for interrogating genetic data for disease association given either a dichotomous or a quantitative outcome. Our method works by quantifying the degree of over-representation of associated SNPs in a pre-defined genomic region, given a specific definition of statistical significance. Through an eigen-decomposition of the matrix measuring the LD between every possible pair of SNPs in the region, the effective number of independent SNPs as well as the number of independent SNPs exhibiting evidence of phenotypic association can be evaluated (Supplementary Figure S1).
The regional evidence of phenotypic association is thus quantified as the extent of over-representation of independent associated SNPs against the effective total number of independent SNPs in the region. This approach can be applied in a genome-wide fashion by considering moving windows of a fixed length within a population. In addition, this presents a natural framework for integrating the results from multiple studies in a region-based genome-wide meta-analysis, where we can sum up the number of independent signals and independent SNPs in each region across the different studies, and calculate a single regional P-value for this meta-analysis by quantifying the joint extent of over-representation. This framework also allows a straightforward extension to consider evidence across genes and biological pathways.

Methods

Region-based analysis

Our region-based meta-analysis approach relies on the principle that when L* independent hypotheses are tested at a statistical significance threshold of α% (Pcrit), on average we expect αL*/100 of these hypotheses to display statistical evidence more significant than α% by chance. In the application within a genome-wide association study (GWAS), suppose there are 100 SNPs in a particular genomic window of 250 kb and the threshold for defining statistical significance has been set at 1%. Under the null hypothesis that none of the 100 SNPs are associated with the phenotype, we expect one SNP on average to exhibit a P-value that is <0.01 if the 100 SNPs are mutually independent. An over-representation of independent SNPs with P-value <0.01 in this genomic region thus corresponds to evidence that suggests the region is associated with the phenotype. However, the presence of LD implies the assumption of independence between the SNPs is unlikely to be valid. In order to evaluate the effective number of 'independent' SNPs in each genomic region, we perform an eigen-decomposition of the L × L symmetric correlation matrix M between the L SNPs, with entry $m_{ij}$ denoting the LD, measured as directional $r^2$, between the ith and the jth SNP, where the direction is determined by the sign of D′. Here we assume the minor allele frequencies of all L SNPs are at least 1%. The resulting eigenvectors effectively represent mutually independent contributions in explaining the variance in the correlation matrix, and each eigenvector is given as a linear combination of SNPs that are in at least some degree of LD. The SNP loadings of each eigenvector measure the extent each SNP contributes to the eigenvector, and the relative loadings between the SNPs for each eigenvector provide a surrogate for the degree of correlation between the SNPs.
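To make the matrix entries concrete, the following Python sketch computes a directional r² for a pair of biallelic SNPs from phased haplotypes and assembles the L × L matrix. It is only an illustration under stated assumptions: the function names `signed_r2` and `ld_matrix` and the 0/1 haplotype encoding are hypothetical and are not taken from the regionalP software. The sign is read from D, which always has the same sign as D′, and it depends on which allele is coded as 1.

```python
def signed_r2(hap_a, hap_b):
    """Directional r^2 between two biallelic SNPs.

    hap_a, hap_b: equal-length sequences of 0/1 alleles, one entry per
    phased haplotype (a hypothetical input encoding for this sketch).
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n  # frequency of allele 1 at the first SNP
    p_b = sum(hap_b) / n  # frequency of allele 1 at the second SNP
    # Frequency of the haplotype carrying allele 1 at both SNPs.
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # LD coefficient D; sign(D) equals sign(D')
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    r2 = d * d / denom
    return r2 if d >= 0 else -r2


def ld_matrix(haplotypes):
    """Symmetric matrix of directional r^2 values with a unit diagonal.

    haplotypes: list with one 0/1 allele vector per SNP.
    """
    size = len(haplotypes)
    return [
        [1.0 if i == j else signed_r2(haplotypes[i], haplotypes[j])
         for j in range(size)]
        for i in range(size)
    ]


if __name__ == "__main__":
    snps = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]]
    for row in ld_matrix(snps):
        print(row)
```

The polymorphism check mirrors the requirement in the text that all SNPs have a minor allele frequency of at least 1%, since r² is undefined for a monomorphic SNP.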
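The L eigenvectors thus represent independent sources of information from all the SNPs in the window, and the number of eigenvectors $N_{\mathrm{total}}$ that cumulatively accounts for τ% of the variance can be determined as $N_{\mathrm{total}} = \arg\min_{l} \left\{ \sum_{i=1}^{l} \lambda_i \ge \tau L / 100 \right\}$, for $1 \le l \le L$, where $\lambda_i$ represents the eigenvalue corresponding to the ith eigenvector. Let $w$ denote a vector of length L with the ith entry $w_i$ equal to one if the observed P-value for the ith SNP is below $P_{\mathrm{crit}}$, and zero otherwise. From $w$ and the eigen-decomposition, the effective number of independent SNPs with P-values below $P_{\mathrm{crit}}$, denoted $N_{\mathrm{hit}}$, is obtained. The regional evidence for the extent of over-representation of SNPs with P-values below $P_{\mathrm{crit}}$ is then assessed by comparing $N_{\mathrm{hit}}$ against $N_{\mathrm{total}}$. A region-based meta-analysis across K independent populations can be performed by calculating the corresponding $N_{\mathrm{hit}}^{(k)}$ and $N_{\mathrm{total}}^{(k)}$ from each population k in the same genomic window. The cumulative evidence across the K populations will then be quantified by the upper-tailed P-value of the exact Binomial test for observing $\sum_{k=1}^{K} N_{\mathrm{hit}}^{(k)}$ out of $\sum_{k=1}^{K} N_{\mathrm{total}}^{(k)}$ SNPs when the success probability is given as $P_{\mathrm{crit}}$.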
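The counting and testing steps can be sketched in Python as follows. This is a minimal illustration, not the regionalP implementation: the function names are hypothetical, the eigenvalues are assumed to have been computed already and are accumulated from largest to smallest, τ is left as a required argument because its value is not stated here, and fractional effective counts are rounded to the nearest integer before the exact Binomial test, which is an assumption of this sketch rather than a documented choice of the authors.

```python
import math


def effective_total(eigenvalues, tau):
    """Smallest l whose l largest eigenvalues sum to at least tau% of L.

    The eigenvalues of an L x L correlation matrix sum to L, so the
    target tau * L / 100 corresponds to tau% of the total variance.
    """
    lam = sorted(eigenvalues, reverse=True)
    target = tau * len(lam) / 100.0
    running = 0.0
    for count, value in enumerate(lam, start=1):
        running += value
        if running >= target - 1e-12:  # small tolerance for rounding error
            return count
    return len(lam)


def binomial_upper_tail(n_hit, n_total, p_crit=0.01):
    """P(X >= n_hit) for X ~ Binomial(n_total, p_crit), integer counts."""
    return sum(
        math.comb(n_total, x) * p_crit ** x * (1 - p_crit) ** (n_total - x)
        for x in range(n_hit, n_total + 1)
    )


def regional_meta_p(per_population, p_crit=0.01):
    """Combine (n_hit, n_total) pairs from K populations in one window.

    Counts are summed across populations and then rounded to integers
    (an assumption of this sketch) before the upper-tailed test.
    """
    hits = round(sum(hit for hit, _ in per_population))
    total = round(sum(tot for _, tot in per_population))
    return binomial_upper_tail(hits, total, p_crit)


if __name__ == "__main__":
    # Illustrative eigenvalues for a four-SNP window (not real data).
    print(effective_total([2.5, 1.0, 0.4, 0.1], tau=95))
    # Six effective hits among 39 effective SNPs at Pcrit = 0.01.
    print(binomial_upper_tail(6, 39, 0.01))
    # Two hypothetical populations surveyed in the same window.
    print(regional_meta_p([(3, 40), (2, 30)]))
```

With 6 hits out of 39 effective SNPs the upper-tail probability is roughly 2.5 × 10⁻⁶, close to the 2.42 × 10⁻⁶ listed for the Indian cohort on chromosome 6 in Table 1. Because the handling of fractional counts is assumed here, the sketch should not be expected to reproduce every published value.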

Type 2 diabetes (T2D) data sets

We applied our region-based meta-analysis approach to combine the evidence from three separate genome-wide surveys of type 2 diabetes in the Chinese, South-East Asian Malays and Asian Indians from Singapore. Results from each individual survey and the SNP-based meta-analysis have been reported elsewhere.[20] Briefly, the Chinese GWAS examined 2010 cases and 1945 controls (post-QC) that were typed on a mixture of Illumina (San Diego, CA, USA) 610 (1082 cases/1006 controls) and Illumina 1M arrays (928 cases/939 controls). The corresponding numbers for the Malay and Indian GWAS were 794 cases/1240 controls and 977 cases/1169 controls, and these were all genotyped on the Illumina 610 arrays. A genome-wide region-based meta-analysis was first performed between the Chinese data that were genotyped on the two arrays to yield a single set of findings for the Chinese experiment. The three experiments for the different population groups were used as discovery cohorts for a region-based meta-analysis with a window size of 250 kb and a sliding gap of 50 kb such that two consecutive windows have a 200-kb overlap. We also performed a gene-based meta-analysis across 30 037 genes identified from the hg18 version of the TransMap UCSC gene mapping, with each window spanning a 100-kb flanking buffer from the start and end coordinates of each gene. A pathway analysis was also performed for 212 pathways in the KEGG database[21, 22, 23] (http://www.genome.jp/kegg/pathway.html). Each gene (inclusive of a 25-kb flanking buffer) in a particular pathway was considered as a distinct window, except for genes within 50 kb of each other, which we merged as one discrete window. The intra-population evidence for a pathway was calculated from the summation of the effective number of independent significant and total SNPs across the windows. The P-value threshold (Pcrit) was set at 0.01.
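The window definitions above can be sketched as follows. The helper names are hypothetical, and two details are assumptions of this sketch because the text does not specify them: a trailing stretch shorter than one full window is dropped, and the 50 kb merging distance between genes is measured on the gene coordinates before the 25 kb flank is added.

```python
def sliding_windows(first, last, size=250_000, step=50_000):
    """Yield (start, end) windows of `size` bp whose starts are `step` bp apart.

    Only windows lying fully inside [first, last] are produced.
    """
    start = first
    while start + size <= last:
        yield (start, start + size)
        start += step


def merge_gene_windows(genes, flank=25_000, merge_gap=50_000):
    """Build pathway windows from (start, end) gene coordinates on one chromosome.

    Genes separated by at most `merge_gap` bp are merged into one window,
    and `flank` bp are then added on both sides of every window.
    """
    merged = []
    for start, end in sorted(genes):
        if merged and start - merged[-1][1] <= merge_gap:
            merged[-1][1] = max(merged[-1][1], end)  # extend the open window
        else:
            merged.append([start, end])
    return [(max(0, s - flank), e + flank) for s, e in merged]


if __name__ == "__main__":
    # Four 250 kb windows, each overlapping the next by 200 kb.
    print(list(sliding_windows(0, 400_000)))
    # The first two genes lie 30 kb apart and share one window.
    print(merge_gene_windows([(100_000, 120_000), (150_000, 160_000),
                              (400_000, 410_000)]))
```

Using a generator for the sliding windows keeps memory use flat when scanning a whole chromosome, since each window can be analysed and discarded in turn.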
We identified any genomic region that exhibited P-value <0.001 in at least two populations from the region-based and gene-based analyses. This is an additional criterion to ensure that at least two populations are contributing to the observed signals, given that the fundamental strategy of our approach is to identify genomic regions that are associated with the outcome in multiple populations. We excluded any regions that are known to carry copy number changes as estimation of LD is likely to be inaccurate in these regions. For the pathway analysis, we identified a pathway that exhibited P-value <0.05 in at least two populations. To avoid artificial signals of disease association that were the results of erroneous genotype calling, genotyping quality was visually ascertained in each cohort for every SNP located in the discovered regions from the region-based analysis. To validate the findings, similar analyses were performed on the type 2 diabetes data from Phase 1 of the Wellcome Trust Case Control Consortium (WTCCC).[19] Calculation of LD in each of the discovery and validation cohorts was performed with 500 control samples from the respective study.
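The multi-population criterion amounts to a simple count, sketched below. The function name and the dictionary layout are hypothetical. The thresholds follow the text: 0.001 for the region-based and gene-based analyses and 0.05 for pathways. The first example uses the single-population P-values listed for chromosome 2 in Table 1, and the second uses made-up values.

```python
def supported_in_multiple_populations(pop_pvalues, threshold=0.001, min_pops=2):
    """True when at least `min_pops` populations fall below `threshold`.

    pop_pvalues: mapping from a population label to its regional P-value.
    """
    n_below = sum(1 for p in pop_pvalues.values() if p < threshold)
    return n_below >= min_pops


if __name__ == "__main__":
    # Chromosome 2 region from Table 1: Chinese and Malays both pass.
    print(supported_in_multiple_populations({"C": 1.49e-4, "M": 6.03e-8, "I": 1.0}))
    # Hypothetical region driven by a single population only.
    print(supported_in_multiple_populations({"C": 0.02, "M": 6.03e-8, "I": 0.5}))
    # Pathway analysis uses the more lenient 0.05 threshold.
    print(supported_in_multiple_populations({"C": 0.03, "M": 0.01, "I": 0.4},
                                            threshold=0.05))
```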

Software implementation

The method described in this paper is implemented in three separate C++ programs: (i) regionalP for performing genome-wide region-based analysis; (ii) regionalP-gene for performing gene-based analysis; and (iii) regionalP-pathway for performing pathway-based analysis. The programs are available from http://www.statgen.nus.edu.sg/~software/regionalP.html. Descriptions of the setup of the simulations, along with additional methods and analyses, are available in the Supplementary Material online.

Results

Power and false-positive rates

We compared our method for regional analysis against standard SNP-based analyses with (i) only the genotyped SNPs or with (ii) the full set of SNPs after imputing against reference panels from phase 2 of the HapMap (HapMap2). In the meta-analysis combining the results from all three populations, the power of the region-based strategy was similar to that from a meta-analysis of the imputed SNPs (Figure 2). This was significantly higher than the power from the meta-analysis of only the genotyped SNPs. The false-positive rates of all three meta-analytic approaches were <5%, although the region-based approach had a near-zero false-positive rate when we imposed an additional restriction requiring at least two populations to exhibit P-values of <0.001 in the same region (Supplementary Table S1 online). At a genome-wide significance of 10⁻⁸, this additional restriction resulted in only a marginal decrease in statistical power, although this decrease was more substantial at less stringent significance thresholds. Investigating the sensitivity of our method by the allelic spectrum of the simulated causal variants in CEU, we observed the region-based approach was less powerful in identifying low-frequency causal variants (MAF of causal variant ≤5%) but was marginally more powerful for common causal variants (MAF of causal variant >5%, see Figure 2).

Figure 2

Power comparisons of the different methods for the meta-analysis across all three HapMap populations. Simulations were performed with HAPGEN (Wellcome Trust Centre for Human Genetics, Oxford, UK) assuming a causal variant that was present in all HapMap phase 2 panels with a multiplicative allelic relative risk of 1.5. The case–control genotype data were subsequently thinned to the SNP content of Affymetrix 500K (CEU simulations), Illumina 1M (JPT+CHB simulations) and Affymetrix 6.0 (Santa Clara, CA, USA) (YRI simulations). We calculated the power when only the genotyped SNPs were considered (green triangles), and when we performed region-based analyses of 100 kb regions in each of the three populations (red circles). Imputation was performed with population-specific haplotypes to recover the SNPs removed from the thinning (except for the causal SNP), and a SNP-based analysis was performed on this denser set of imputed and genotyped SNPs (blue diamonds). The SNP-based meta-analyses considered either the genotyped SNPs present across all three platforms only (green triangles) or across the denser set of imputed and genotyped SNPs common to all three populations (blue diamonds). The region-based meta-analysis was performed without restriction (red circles), and with the restriction that at least two populations display region-based P-value <0.001 (red open circles).

We also explored the performance of the three approaches in the presence of allelic heterogeneity, defined as having different causal variants in the same gene or genomic location across different populations. Specifically, we performed another series of simulations assuming two different causal variants in CEU and JPT+CHB, while allowing YRI to carry either of the two possible causal variants. Our simulation explicitly selected causal variants that are at least 20 kb apart but within 50 kb of each other. The region-based approach significantly outperformed both SNP-based approaches in the meta-analyses across CEU and JPT+CHB, particularly at lower Type I errors and when LD between the two causal variants is low (Figure 3). When the LD between the two causal variants is high (r>0.8), there is almost no difference in the results of the SNP-based meta-analyses of all three populations at higher Type I errors as compared with the power observed in our earlier simulations with only one causal variant. This is reassuring since we expect the two causal variants to behave as effectively a single variant when the LD is high. However, the low power experienced by the SNP-based methods in the presence of two separate causal variants reflects the inadequacy of SNP-based approaches for integrating data across diverse populations, and the greatest merit of the region-based approach is in the presence of allelic heterogeneity between populations where the different causal variants are in weak or non-existent LD.

Figure 3

Power comparisons of the different methods for meta-analysis in the presence of allelic heterogeneity. A different causal variant was selected in CEU and JPT+CHB, respectively, while either of the two causal variants was equally likely to be present in the YRI simulations. The two causal variants are located at least 20 kb away but are not >50 kb apart, and have minor allele frequencies of at least 10% in all three HapMap populations. The case–control genotype data simulated from HAPGEN were subsequently thinned to the SNP content of Affymetrix 500K (CEU), Illumina 1M (JPT+CHB) and Affymetrix 6.0 (YRI). We calculated the power when only the CEU and JPT+CHB populations were combined (top row), and when all three HapMap panels were combined (bottom row), investigating the performance of the meta-analysis across the SNPs on all three arrays (green triangles), and for the region-based meta-analysis considering 250 kb regions (red circles). Imputation was performed with population-specific haplotypes to recover the SNPs removed from the thinning, and a SNP-based meta-analysis was performed on this denser set of imputed and genotyped SNPs common to all three populations (blue diamonds). We binned the 3000 pairs of causal variants according to the LD between the two SNPs into four groups: (i) 0≤r2≤0.1; (ii) 0.1

Application to T2D data

We applied our method to perform region-based, gene-based and pathway-based meta-analyses in three independent genome-wide studies of type 2 diabetes (T2D) involving the Chinese, Malays and Asian Indians in Singapore. This was performed across all the autosomal chromosomes within each of the three GWAS in a hypothesis-generating fashion, where for the region-based analyses we considered sliding windows of 250 kb each with a sliding distance of 50 kb such that every pair of consecutive windows overlapped by 200 kb. About half of the Chinese samples were genotyped on the Illumina 1M array, while the remaining half of the Chinese, Malay and Indian samples were genotyped on the Illumina 610 array. Results of the SNP-based meta-analyses using both the genotyped SNPs and the imputed SNPs have been reported elsewhere.[20] Briefly, none of the SNPs achieved genome-wide significance in the meta-analyses, although variants in CDKAL1 and HHEX/IDE/KIF11 displayed moderate evidence of T2D association in at least two of the three populations. In particular, variants in CDKAL1 were found against a genomic background exhibiting substantial LD variations between the populations.[24] The genome-wide meta-analysis with our region-based method identified five regions exhibiting P<0.001 in at least two of the three populations (Table 1). Other than the region on chromosome 6 that encompassed CDKAL1, the other four regions did not emerge in the SNP-based meta-analyses[20] (Supplementary Table S2 online). In the replication experiment with the WTCCC data, two of these five regions displayed strong evidence of regional association (P<10⁻⁴) in the case–control T2D GWAS, which included the stretch on chromosome 6 encompassing CDKAL1 and the region on chromosome 3 between 21.73 and 22.13 Mb that encompassed ZNF659.
Suggestive corroborative evidence (P<0.05) from WTCCC was also seen in the region on chromosome 2 that spanned the STK39 gene and the region on chromosome 14 containing the genes GNG2 and NID2. There was no evidence of regional association in the WTCCC for the remaining region on chromosome 20 spanning STX16 and NPEPL1.

Table 1

Results of the region-based meta-analysis for type 2 diabetes

			Discovery – Single population				Discovery – Combined			Validation from WTCCC1
Chromosome	Startᵃ (top window)	Endᵃ (top window)	Popᵇ	# Hitsᶜ	# SNPᵈ	P	# Hitsᶜ	# SNPᵈ	P	Startᵉ	Endᵉ	# Hitsᶜ	# SNPᵈ	P	Gene
2	168 408 674	168 858 674	C	4.9	43	1.49 × 10⁻⁴	11.2	90	1.60 × 10⁻⁹	168 758 674	169 208 674	2.2	21	1.47 × 10⁻²	STK39
	(168 458 674)	(168 708 674)	M	6.3	23	6.03 × 10⁻⁸				(168 808 674)	(169 058 674)
			I	0	24	1
3	21 736 044	22 136 044	C	6.2	103	4.91 × 10⁻⁴	13.6	221	2.55 × 10⁻⁷	21 186 044	21 636 044	4.5	25	6.06 × 10⁻⁵	ZNF659
	(21 786 044)	(22 036 044)	M	7.3	58	1.36 × 10⁻⁶				(21 286 044)	(21 536 044)
			I	0	60	1
6	20 594 609	20 894 609	C	4.1	41	6.56 × 10⁻⁴	10.1	107	9.68 × 10⁻⁷	20 494 609	21 044 609	7.3	20	5.03 × 10⁻¹⁰	CDKAL1
	(20 594 609)	(20 844 609)	M	0	28	1				(20 594 609)	(20 844 609)
			I	6	39	2.42 × 10⁻⁶
14	51 355 752	51 755 572	C	3	67	2.91 × 10⁻²	19.7	146	2.92 × 10⁻¹⁶	50 955 752	51 405 752	2.5	30	1.99 × 10⁻²	GNG2, NID2
	(51 455 752)	(51 705 752)	M	7.5	34	2.24 × 10⁻⁸				(51 105 752)	(51 355 752)
			I	9.2	45	5.03 × 10⁻¹⁰
20	56 559 795	56 859 795	C	5	75	9.22 × 10⁻⁴	11.1	173	1.56 × 10⁻⁶	56 609 795	56 859 795	0.9	25	0.294	STX16, NPEPL1
	(56 609 795)	(56 859 795)	M	0	46	1
			I	6.1	52	1.20 × 10⁻⁵

Genomic regions identified by the region-based analysis, with the discovery mechanism based on three genome-wide association studies conducted in Chinese, Malays and Asian Indians in Singapore. Validation of the regions that emerged was performed on the type 2 diabetes case–control study from Phase 1 of the Wellcome Trust Case–Control Consortium (WTCCC).

ᵃ The start and end positions of the genomic region containing consecutive windows with P<0.001 in at least two of the populations (in bold). The start and end positions of the top 250 kb window are shown in brackets. Subsequent columns show the evidence for the discovery populations in the top window.

ᵇ The three discovery populations abbreviated: C, SP2 Chinese; M, SiMES Malays; I, SINDI Indians.

ᶜ Effective number of independent SNPs with P<0.01 after accounting for LD.

ᵈ Effective number of independent SNPs across the region after accounting for LD.

ᵉ The start and end positions of the genomic region containing consecutive windows with evidence of validation (defined as P<0.05), with the start and end positions of the top 250 kb window being shown in brackets. Subsequent columns show the evidence for WTCCC1 in the top window. For regions without any 250 kb windows displaying P<0.05, the best window in that region is shown instead.

Remarkably, all five regions have been previously implicated in diabetes, obesity or other cardiovascular biomarkers. The convincing signal for the region encompassing CDKAL1 is consistent with established findings for T2D,[25, 26, 27, 28, 29, 30] while ZNF659 has been associated with young-onset type 2 diabetes in American Indians.[31] The STK39 gene has been consistently reported to harbor variants implicated in hypertension and in obesity and diabetes-related rodent quantitative trait loci.[32] Previous pathway analysis has identified the G-protein GNG2 to be associated with type 1 diabetes,[33] suggesting a serotonin modulating mechanism that is similarly relevant in the etiology of type 2 diabetes. Variants in STX16 have also been reported to significantly slow the reversal of insulin-stimulated glucose transport,[34, 35] a biological mechanism that is highly relevant to T2D.

Discussion

The scale of GWMA with diverse European and non-European populations is expected to increase markedly given the popularity of genome-wide designs in studying the genetic etiology of common diseases and complex traits. This, however, increases the challenge of accommodating varying patterns of LD that may exist between genetically diverse populations, which can compromise the ability to reproduce the association signals from surrogate markers that are correlated to the unobserved functional polymorphisms. We have introduced an alternative strategy for combining the evidence across different populations that is robust to dissimilar patterns of LD surrounding a bona fide association signal. The approach is applicable to both case–control studies and association studies of quantitative traits. Our method has also been shown to perform comparably to imputation-based meta-analysis, except it relies on available genotype information from the experiment without requiring additional reference data from appropriately matched populations. In the presence of allelic heterogeneity, our approach outperforms both SNP-based approaches using either genotyped or imputed SNPs. The application of the region-based method to three genome-wide surveys in T2D resulted in the discovery of novel and established regions that were subsequently validated with data from the WTCCC. The region-based approach relies on the elegant application of the concept of statistical significance in evaluating a genomic region for evidence of trait association. For example, under the null hypothesis that the region is independent of the phenotype, we expect 5% of the SNPs to be statistically significant by chance when adopting a P-value threshold of 5%, if indeed all the SNPs in this region are mutually independent.
If this assumption of mutual independence is true, an over-representation of statistically significant SNPs in this region constitutes evidence that this region is associated with the phenotype, with the extent of over-representation indicating the strength of the evidence. This is analogous to the use of 5 × 10⁻⁸ as the definition of genome-wide significance for assessing the likely authenticity of single markers. LD between the SNPs can confound the measurement of over-representation, as this can either inflate the number of significant signals, which increases false positives, or produce an inflated estimate of the total number of SNPs, which decreases statistical power. The eigen-decomposition of the LD matrix allows the effective number of independent SNPs to be estimated and consequently, also the effective number of independent association signals that are statistically significant. By surveying the same genomic region across different independent populations, the same statistical framework can be extended to consolidate the evidence from multiple populations, simply by summing the effective number of independent SNPs and signals across these populations and assessing the evidence for an over-representation of significant signals. This provides a simple yet effective solution for combining the results from experiments that use different genotyping platforms. By searching for the same regions rather than the same SNPs to emerge in the different GWAS, inter-population variation in LD patterns between the assayed SNPs and the causal variant is expected to have less impact on the sensitivity of our approach. One feature of our method is the ability to sharpen the association evidence in regions containing multiple weak signals across different ethnic groups. These signals may be weaker as a result of SNP ascertainment biases in the design of genotyping arrays, resulting in weaker LD between the assayed SNPs and the causal variants.
The current definition of genome-wide significance excludes many potential signals from consideration in a bid to protect against the abundance of false discoveries that is associated with testing in excess of a million hypotheses. This poses a significant challenge to genome-wide studies and GWMA in populations with short LD, such as African populations,[15, 36] as it is less likely for variants to be in sufficient LD to exhibit statistical evidence stronger than the stringent threshold. Furthermore, the greater genetic diversity that is common in such populations means it is not immediately straightforward to compensate for the lower LD by increasing the effective sample size through a meta-analysis of several populations. Our method thus provides a viable solution within a sound statistical framework to exploit and combine the evidence from SNPs that are weakly associated with the phenotype. The application of analytical methods that investigate regions in the genome rather than relying on individual SNPs is not a new concept. Neither is implementing a statistical strategy to estimate the effective number of independent association tests in the presence of LD.
Numerous approaches have in fact been introduced to address the issue of multiple testing in the presence of correlated SNPs.[37, 38, 39, 40, 41, 42, 43] However, these methods either assign the most significant SNP-based evidence as the statistical evidence for the set of loci,[39] or do not explicitly incorporate the association evidence in adjusting for the effective number of tests.[38, 40, 41, 42, 43] A recent region-based method adopted a more sophisticated approach that borrows information from surrounding SNPs, although it tends to rely on heuristic measures such as the proximity to specific genomic features (eg, known genes, evolutionarily conserved regions and haplotype blocks) for defining SNP clusters.[44] In our opinion, the imputation frameworks that MACH[12] and IMPUTE[13] are built on provide a more natural way to incorporate information from surrounding SNPs without relying on pre-defined features that may not adequately account for the correlation between SNPs. We thus benchmarked our method against the performance of the imputation-based approach, which has become the strategy of choice in recent genome-wide studies. More importantly, none of the previous region-based approaches provides a natural solution for integrating the evidence across multiple genome-wide studies in a meta-analysis, or adequately manages the complexity due to allelic heterogeneity. We have proposed a novel and powerful strategy for querying the genome for genotype–phenotype associations that realistically manages the challenges imposed by the fundamental design of genome-wide studies and by combining several such studies from diverse populations. We envisage this approach has the potential to be further developed for burden-related tests of rare or low-frequency variants across multiple heterogeneous populations, which is an emerging issue given the increasing popularity of exome-sequencing experiments across numerous traits.

44 in total

1. A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other.

Authors: Dale R Nyholt
Journal: Am J Hum Genet Date: 2004-03-02 Impact factor: 11.025

2. Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix.

Authors: J Li; L Ji
Journal: Heredity (Edinb) Date: 2005-09 Impact factor: 3.821

3. A genome-wide association study identifies novel risk loci for type 2 diabetes.

Authors: Robert Sladek; Ghislain Rocheleau; Johan Rung; Christian Dina; Lishuang Shen; David Serre; Philippe Boutin; Daniel Vincent; Alexandre Belisle; Samy Hadjadj; Beverley Balkau; Barbara Heude; Guillaume Charpentier; Thomas J Hudson; Alexandre Montpetit; Alexey V Pshezhetsky; Marc Prentki; Barry I Posner; David J Balding; David Meyre; Constantin Polychronakos; Philippe Froguel
Journal: Nature Date: 2007-02-11 Impact factor: 49.962

4. Ascertainment bias in studies of human genome-wide polymorphism.

Authors: Andrew G Clark; Melissa J Hubisz; Carlos D Bustamante; Scott H Williamson; Rasmus Nielsen
Journal: Genome Res Date: 2005-11 Impact factor: 9.043

5. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels.

Authors: Richa Saxena; Benjamin F Voight; Valeriya Lyssenko; Noël P Burtt; Paul I W de Bakker; Hong Chen; Jeffrey J Roix; Sekar Kathiresan; Joel N Hirschhorn; Mark J Daly; Thomas E Hughes; Leif Groop; David Altshuler; Peter Almgren; Jose C Florez; Joanne Meyer; Kristin Ardlie; Kristina Bengtsson Boström; Bo Isomaa; Guillaume Lettre; Ulf Lindblad; Helen N Lyon; Olle Melander; Christopher Newton-Cheh; Peter Nilsson; Marju Orho-Melander; Lennart Råstam; Elizabeth K Speliotes; Marja-Riitta Taskinen; Tiinamaija Tuomi; Candace Guiducci; Anna Berglund; Joyce Carlson; Lauren Gianniny; Rachel Hackett; Liselotte Hall; Johan Holmkvist; Esa Laurila; Marketa Sjögren; Maria Sterner; Aarti Surti; Margareta Svensson; Malin Svensson; Ryan Tewhey; Brendan Blumenstiel; Melissa Parkin; Matthew Defelice; Rachel Barry; Wendy Brodeur; Jody Camarata; Nancy Chia; Mary Fava; John Gibbons; Bob Handsaker; Claire Healy; Kieu Nguyen; Casey Gates; Carrie Sougnez; Diane Gage; Marcia Nizzari; Stacey B Gabriel; Gung-Wei Chirn; Qicheng Ma; Hemang Parikh; Delwood Richardson; Darrell Ricke; Shaun Purcell
Journal: Science Date: 2007-04-26 Impact factor: 47.728

6. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants.

Authors: Laura J Scott; Karen L Mohlke; Lori L Bonnycastle; Cristen J Willer; Yun Li; William L Duren; Michael R Erdos; Heather M Stringham; Peter S Chines; Anne U Jackson; Ludmila Prokunina-Olsson; Chia-Jen Ding; Amy J Swift; Narisu Narisu; Tianle Hu; Randall Pruim; Rui Xiao; Xiao-Yi Li; Karen N Conneely; Nancy L Riebow; Andrew G Sprau; Maurine Tong; Peggy P White; Kurt N Hetrick; Michael W Barnhart; Craig W Bark; Janet L Goldstein; Lee Watkins; Fang Xiang; Jouko Saramies; Thomas A Buchanan; Richard M Watanabe; Timo T Valle; Leena Kinnunen; Gonçalo R Abecasis; Elizabeth W Pugh; Kimberly F Doheny; Richard N Bergman; Jaakko Tuomilehto; Francis S Collins; Michael Boehnke
Journal: Science Date: 2007-04-26 Impact factor: 47.728

7. Transferability of type 2 diabetes implicated loci in multi-ethnic cohorts from Southeast Asia.

Authors: Xueling Sim; Rick Twee-Hee Ong; Chen Suo; Wan-Ting Tay; Jianjun Liu; Daniel Peng-Keat Ng; Michael Boehnke; Kee-Seng Chia; Tien-Yin Wong; Mark Seielstad; Yik-Ying Teo; E-Shyong Tai
Journal: PLoS Genet Date: 2011-04-07 Impact factor: 5.917

8. From genomics to chemical genomics: new developments in KEGG.

Authors: Minoru Kanehisa; Susumu Goto; Masahiro Hattori; Kiyoko F Aoki-Kinoshita; Masumi Itoh; Shuichi Kawashima; Toshiaki Katayama; Michihiro Araki; Mika Hirakawa
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

9. Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes.

Authors: Eleftheria Zeggini; Michael N Weedon; Cecilia M Lindgren; Timothy M Frayling; Katherine S Elliott; Hana Lango; Nicholas J Timpson; John R B Perry; Nigel W Rayner; Rachel M Freathy; Jeffrey C Barrett; Beverley Shields; Andrew P Morris; Sian Ellard; Christopher J Groves; Lorna W Harries; Jonathan L Marchini; Katharine R Owen; Beatrice Knight; Lon R Cardon; Mark Walker; Graham A Hitman; Andrew D Morris; Alex S F Doney; Mark I McCarthy; Andrew T Hattersley
Journal: Science Date: 2007-04-26 Impact factor: 47.728

10. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls.

Authors: Wellcome Trust Case Control Consortium
Journal: Nature Date: 2007-06-07 Impact factor: 49.962

9 in total

Review 1. Identity by descent: variation in meiosis, across genomes, and in populations.

Authors: Elizabeth A Thompson
Journal: Genetics Date: 2013-06 Impact factor: 4.562

2. A multi-SNP locus-association method reveals a substantial fraction of the missing heritability.

Authors: Georg B Ehret; David Lamparter; Clive J Hoggart; John C Whittaker; Jacques S Beckmann; Zoltán Kutalik
Journal: Am J Hum Genet Date: 2012-11-02 Impact factor: 11.025

3. Haplotype kernel association test as a powerful method to identify chromosomal regions harboring uncommon causal variants.

Authors: Wan-Yu Lin; Nengjun Yi; Xiang-Yang Lou; Degui Zhi; Kui Zhang; Guimin Gao; Hemant K Tiwari; Nianjun Liu
Journal: Genet Epidemiol Date: 2013-06-05 Impact factor: 2.135

4. Genetic variants associated with increased risk of malignant pleural mesothelioma: a genome-wide association study.

Authors: Giuseppe Matullo; Simonetta Guarrera; Marta Betti; Giovanni Fiorito; Daniela Ferrante; Floriana Voglino; Gemma Cadby; Cornelia Di Gaetano; Fabio Rosa; Alessia Russo; Ari Hirvonen; Elisabetta Casalone; Sara Tunesi; Marina Padoan; Mara Giordano; Anna Aspesi; Caterina Casadio; Francesco Ardissone; Enrico Ruffini; Pier Giacomo Betta; Roberta Libener; Roberto Guaschino; Ezio Piccolini; Monica Neri; Arthur W B Musk; Nicholas H de Klerk; Jennie Hui; John Beilby; Alan L James; Jenette Creaney; Bruce W Robinson; Sutapa Mukherjee; Lyle J Palmer; Dario Mirabelli; Donatella Ugolini; Stefano Bonassi; Corrado Magnani; Irma Dianzani
Journal: PLoS One Date: 2013-04-23 Impact factor: 3.240

5. Genome-wide and gene-based association studies of anxiety disorders in European and African American samples.

Authors: Takeshi Otowa; Brion S Maher; Steven H Aggen; Joseph L McClay; Edwin J van den Oord; John M Hettema
Journal: PLoS One Date: 2014-11-12 Impact factor: 3.240

6. Trans-ethnic meta-regression of genome-wide association studies accounting for ancestry increases power for discovery and improves fine-mapping resolution.

Authors: Reedik Mägi; Momoko Horikoshi; Tamar Sofer; Anubha Mahajan; Hidetoshi Kitajima; Nora Franceschini; Mark I McCarthy; Andrew P Morris
Journal: Hum Mol Genet Date: 2017-09-15 Impact factor: 6.150

7. A missense variant in SHARPIN mediates Alzheimer's disease-specific brain damages.

Authors: Jun Young Park; Dongsoo Lee; Jang Jae Lee; Jungsoo Gim; Tamil Iniyan Gunasekaran; Kyu Yeong Choi; Sarang Kang; Ah Ra Do; Jinyeon Jo; Juhong Park; Kyungtaek Park; Donghe Li; Sanghun Lee; Hoowon Kim; Immanuel Dhanasingh; Suparna Ghosh; Seula Keum; Jee Hye Choi; Gyun Jee Song; Lee Sael; Sangmyung Rhee; Simon Lovestone; Eunae Kim; Seung Hwan Moon; Byeong C Kim; SangYun Kim; Andrew J Saykin; Kwangsik Nho; Sung Haeng Lee; Lindsay A Farrer; Gyungah R Jun; Sungho Won; Kun Ho Lee
Journal: Transl Psychiatry Date: 2021-11-16 Impact factor: 6.222

Review 8. Genetics of obesity and type 2 diabetes in African Americans.

Authors: Shana McCormack; Struan F A Grant
Journal: J Obes Date: 2013-03-19

9. Imputation-based meta-analysis of severe malaria in three African populations.

Authors: Gavin Band; Quang Si Le; Luke Jostins; Matti Pirinen; Katja Kivinen; Muminatou Jallow; Fatoumatta Sisay-Joof; Kalifa Bojang; Margaret Pinder; Giorgio Sirugo; David J Conway; Vysaul Nyirongo; David Kachala; Malcolm Molyneux; Terrie Taylor; Carolyne Ndila; Norbert Peshu; Kevin Marsh; Thomas N Williams; Daniel Alcock; Robert Andrews; Sarah Edkins; Emma Gray; Christina Hubbart; Anna Jeffreys; Kate Rowlands; Kathrin Schuldt; Taane G Clark; Kerrin S Small; Yik Ying Teo; Dominic P Kwiatkowski; Kirk A Rockett; Jeffrey C Barrett; Chris C A Spencer
Journal: PLoS Genet Date: 2013-05-23 Impact factor: 5.917
