Yun R Li, Brendan J Keating.
Abstract
Genome-wide association studies (GWASs) are the method most often used by geneticists to interrogate the human genome, and they provide a cost-effective way to identify the genetic variants underpinning complex traits and diseases. Most initial GWASs have focused on genetically homogeneous cohorts from European populations given the limited availability of ethnic minority samples and so as to limit population stratification effects. Transethnic studies have been invaluable in explaining the heritability of common quantitative traits, such as height, and in examining the genetic architecture of complex diseases, such as type 2 diabetes. They provide an opportunity for large-scale signal replication in independent populations and for cross-population meta-analyses to boost statistical power. In addition, transethnic GWASs enable prioritization of candidate genes, fine-mapping of functional variants, and potentially identification of SNPs associated with disease risk in admixed populations, by taking advantage of natural differences in genomic linkage disequilibrium across ethnically diverse populations. Recent efforts to assess the biological function of variants identified by GWAS have highlighted the need for large-scale replication, meta-analyses and fine-mapping across worldwide populations of ethnically diverse genetic ancestries. Here, we review recent advances and new approaches that are important to consider when performing, designing or interpreting transethnic GWASs, and we highlight existing challenges, such as the limited ability to handle heterogeneity in linkage disequilibrium across populations and limitations in dissecting complex architectures, such as those found in recently admixed populations.
Year: 2014 PMID: 25473427 PMCID: PMC4254423 DOI: 10.1186/s13073-014-0091-5
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Landmark and innovative transethnic genetic association analyses*
| Trait | Gene or locus | Platform | Comments | References |
|---|---|---|---|---|
| Type 2 diabetes | | Haplotype analysis | Replication of primary signal in WA population and fine-mapping of second independent signal showing positive selection in WA, EA and EUR cohorts; recently also replicated in large-scale meta-analysis over 39 studies | [ |
| Lipids (HDLC and TGs) | | Candidate gene resequencing | Fine-mapping of known | [ |
| End-stage kidney disease | | GWAS | Common variants in | [ |
| Uric acid levels (serum) | | GWAS | Replication of a 263 kb association locus (identified in EUR) in an AA cohort enabled fine-mapping to a 27 kb shared region | [ |
| Bilirubin levels | | GWAS | Replication of previously identified association in this locus in EUR and ASN cohorts using AFR population; also enabled fine-mapping to a functional, putatively causative variant | [ |
| ALL | | GWAS | Known risk-associated variants are more common in NA, confer greater risk and explain the higher observed risk of ALL in Hispanic children. Illustrates how disease risk analysis can shed light on disease associations in admixed populations with complex genomic architectures | [ |
| T2D | | Exome seq | High-throughput sequencing identified rare, novel missense mutation in a known locus associated with maturity-onset diabetes (MODY3); association is specific to Latino populations. Recently highlighted in a review on admixed population analysis | [ |
| Prostate cancer | 15 EUR-specific, 7 multi-ethnic | GWAS | Large study encompassing over 40,000 cases and 40,000 controls in EUR, AFR, JPT, and Latino populations; multi-ethnic analyses help identify 7 new signals not found in EUR | [ |
| BMI | | Custom genotyping platform | Metabochip analysis across about 30,000 AA individuals confirms 8 EUR BMI loci in AA, identified independent signal in known locus and identified two novel loci | [ |
| Global gene expression levels | Multiple | Expression array | EUR, JPT and CHN populations show large variations in gene expressions due to differences in allele frequencies of common regulatory eSNPs, possibly explaining differences in complex disease risk | [ |
| T2D | Multiple | GWAS meta-analysis | Landmark transethnic FE meta-analysis across nearly 27,000 cases from 5 ethnic minority populations identified 7 novel signals, enabled fine-mapping of 10 loci, and demonstrated evidence of heterogeneity compared with EUR studies using MANTRA software | [ |
*GWAS and other forms of genetic association studies have historically and recently provided important insights into disease-related loci. This table highlights a few notable examples, providing the study phenotypes, key associations (where specific), and details of the study including any unique approach used and the main findings/advances. Abbreviations: AA, African American; AFR, African; ALL, acute lymphoblastic leukemia; ASN, Asian; BMI, body mass index; CEU, Caucasoid; CHN, Chinese; EA, East Asian; eSNP, expression single nucleotide polymorphism; EUR, European; FE, fixed effects; GWAS, genome-wide association study; HDLC, high density lipoprotein cholesterol; JPT, Japanese; LD, linkage disequilibrium; NA, Native American; RE, random effects; T2D, type 2 diabetes; TG, triglycerides; WA, West African.
Figure 1. Fine-mapping of candidate causal or functional SNPs by transethnic GWAS. The graph shows the results of association testing (in the form of the allele frequencies) for a typical locus in three different populations. In the EUR population, many SNPs in the region are in close LD, leading to a significant signal for a wide set of SNPs. However, LD patterns in the ASN population are different, which enables finer mapping of the causal SNP as being the SNP with the strongest trait association. However, it is rarely obvious in advance which additional populations should be studied, as in some populations (such as AFR in this example) the locus might not be associated with the trait at all, because of epistatic interactions, phenotype heterogeneity, or low minor allele frequency/non-polymorphic markers across the locus. Data shown are based on simulation and do not reflect the result of any published or unpublished studies. Abbreviations: ASN, Asian; AFR, African; EUR, European.
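The fine-mapping idea in Figure 1 rests on how many SNPs are in strong LD with a candidate SNP in each population. The following is a minimal sketch, assuming phased haplotypes coded as 0/1 alleles; the function names `r_squared` and `ld_partners`, the 0.8 threshold and the toy haplotypes are illustrative assumptions and do not come from the reviewed studies.

```python
def r_squared(haplotypes, i, j):
    """Squared correlation (r^2) between biallelic SNPs i and j.

    `haplotypes` is a list of phased haplotypes, each a sequence of 0/1 alleles.
    """
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n          # frequency of allele 1 at SNP i
    p_j = sum(h[j] for h in haplotypes) / n          # frequency of allele 1 at SNP j
    p_ij = sum(h[i] * h[j] for h in haplotypes) / n  # frequency of the 1-1 haplotype
    d = p_ij - p_i * p_j                             # LD coefficient D
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    return 0.0 if denom == 0 else d * d / denom      # monomorphic SNP: treat r^2 as 0


def ld_partners(haplotypes, index, threshold=0.8):
    """Indices of SNPs whose r^2 with SNP `index` is at least `threshold`."""
    n_snps = len(haplotypes[0])
    return [k for k in range(n_snps)
            if k != index and r_squared(haplotypes, index, k) >= threshold]


# Toy, hand-made haplotypes over 3 SNPs (hypothetical, not real data).
# Population A: all three SNPs are perfectly correlated (one wide LD block).
pop_a = [(0, 0, 0)] * 5 + [(1, 1, 1)] * 5
# Population B: SNP 0 has largely decoupled from SNPs 1 and 2.
pop_b = [(0, 0, 0)] * 3 + [(1, 0, 0)] * 2 + [(0, 1, 1)] * 2 + [(1, 1, 1)] * 3

print(ld_partners(pop_a, 1))  # SNPs tagging SNP 1 in population A
print(ld_partners(pop_b, 1))  # SNPs tagging SNP 1 in population B
```

In this toy setup the set of SNPs that cannot be distinguished from SNP 1 is smaller in population B than in population A, which is the property a transethnic design exploits to narrow a candidate interval.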
Methods, tools, literature reviews and resources*
| Method or advance | Advances and limitations or main findings | References |
|---|---|---|
| MANTRA transethnic meta-analysis software | Replication of primary signal in WA population and fine-mapping of second independent signal showing positive selection in WA, EA and EUR cohorts. MANTRA is available as a suite of executables on request from the author [ | MANTRA [ |
| RE-HE random-effects method | RE and FE models in the context of a meta-analysis with significant heterogeneity have low power. By relaxing overly conservative parameters in RE analysis algorithms, RE-HE provides more power in the presence of inter-study effect heterogeneity. Metasoft is available as a package [ | RE-HE algorithm [ |
| Review on replicability of transethnic association signals | Comprehensive review of literature across 28 diseases in EA and EUR populations demonstrating high replicability, sharing of disease alleles and good correlation of effect sizes | [ |
| Review on power gains in meta-analytical approaches | Simulation-based analysis demonstrating that a multi-ethnic study design provides non-trivial power gains, especially when AFR populations are used to examine low frequency alleles (MAF <5%) | [ |
| Comparative analysis of FE, RE, RE-HE and MANTRA as a method for GWAS meta-analysis | Results show that both RE-HE and MANTRA are computationally efficient and robust methods in accounting for effect size heterogeneity while providing a boost in power when compared with traditional meta-analysis methods. Results are provided for both simulations and application to T2D datasets | [ |
| Modified RE-HE for joint analysis of resequencing data for rare variant gene-based analysis | Extension of RE-HE to provide a more powerful (than traditional RE) method to perform rare-variant burden testing in a heterogeneous resequencing study sample | [ |
*Summary of innovative methods, applications and literature reviews as highlighted in the main text. We summarize the methodological advances, including those for meta-analysis, any significant or notable limitations, and, for reviews, the main findings. Abbreviations: AFR, African; ALL, acute lymphoblastic leukemia; EA, East Asian; eQTL, expression quantitative trait locus; EUR, European; FE, fixed effects; GWAS, genome-wide association study; LD, linkage disequilibrium; MAF, minor allele frequency; RE, random effects; RE-HE, alternate random effects; T2D, type 2 diabetes; WA, West African.
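To make the FE and RE terms in the table concrete, the sketch below computes a standard inverse-variance fixed-effects estimate, Cochran's Q, and a DerSimonian-Laird random-effects estimate from per-cohort effect sizes. It is not an implementation of RE-HE or MANTRA, whose algorithms are not described here; the function names and the example effect sizes are assumptions chosen for illustration.

```python
import math


def fixed_effects(betas, ses):
    """Inverse-variance-weighted fixed-effects (FE) estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return beta, math.sqrt(1.0 / sum(weights))


def cochran_q(betas, ses):
    """Cochran's Q heterogeneity statistic around the FE estimate."""
    beta_fe, _ = fixed_effects(betas, ses)
    return sum((b - beta_fe) ** 2 / se ** 2 for b, se in zip(betas, ses))


def random_effects(betas, ses):
    """DerSimonian-Laird random-effects (RE) estimate, its SE, and tau^2."""
    k = len(betas)
    weights = [1.0 / se ** 2 for se in ses]
    q = cochran_q(betas, ses)
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (k - 1)) / c)   # between-study variance, truncated at 0
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    beta = sum(w * b for w, b in zip(re_weights, betas)) / sum(re_weights)
    return beta, math.sqrt(1.0 / sum(re_weights)), tau2


# Hypothetical per-cohort log odds ratios and standard errors (not real data).
betas = [0.10, 0.30, 0.50]
ses = [0.10, 0.10, 0.10]

beta_fe, se_fe = fixed_effects(betas, ses)
beta_re, se_re, tau2 = random_effects(betas, ses)
print(f"FE: beta={beta_fe:.3f} se={se_fe:.3f} z={beta_fe / se_fe:.2f}")
print(f"RE: beta={beta_re:.3f} se={se_re:.3f} z={beta_re / se_re:.2f} tau2={tau2:.3f}")
```

With heterogeneous cohort effects such as these, the RE standard error is wider than the FE one, so the RE test statistic shrinks. This loss of power under heterogeneity is the behaviour the table attributes to traditional RE analysis.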
Figure 2. Theoretical basis of admixture GWAS study designs. (a) Populations 1 and 2 are two parental populations in which there has been no gene flow historically. When these populations interbreed the subsequent F1 population includes heterozygotes. Over the course of 5 or 10 generations the chromosome of any given Fn population offspring will include a combination of parental chromosomal 'bands'. Some loci are associated with a disease (such as B) and others are not (such as A). (b, c) In a typical GWAS, association testing identifies whether a given allele (such as T at SNP2) is associated with increased risk for having a disease; this is shown as allele frequencies in the table. (c) If the ancestral frequency of T at SNP2 is different in two parental populations (1 and 2) and if it is associated with disease, then the population with higher frequencies of this allele will also have higher risk for disease. One can thus expect to observe higher incidences of disease in individuals carrying the T allele and also higher incidence of disease in individuals from population 1, in which the T allele is more frequent. This is the premise of admixture association studies. By ascertaining local ancestry one can determine if an allele that is much more common in one population may be associated with disease risk. In (b), in a locus with no evidence of association with disease, admixture analysis would find that the minor allele frequencies (and percentages of individuals of either ancestral populations) do not differ between cases and controls. (d) Graph of the allele frequencies along the genome. The relative frequency of the allele from population 1 differs between the cases and the controls only at the locus associated with the disease/phenotype. Thus, in admixed populations, by determining the local ancestry in the cases versus controls, one can determine if there is an association between an allele associated with ancestry and disease liability.
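The case-versus-control comparison of local ancestry described in Figure 2 can be sketched as a per-locus two-proportion test. This is a deliberately simplified illustration: it assumes local ancestry has already been inferred and ignores adjustment for genome-wide ancestry. The function name `ancestry_z`, the locus labels and all counts are hypothetical.

```python
import math


def ancestry_z(case_pop1, case_total, ctrl_pop1, ctrl_total):
    """Two-proportion z statistic for population-1 ancestry, cases vs controls.

    Counts are chromosomes: how many carry population-1 ancestry at the locus,
    out of how many were examined in each group.
    """
    p_case = case_pop1 / case_total
    p_ctrl = ctrl_pop1 / ctrl_total
    pooled = (case_pop1 + ctrl_pop1) / (case_total + ctrl_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / case_total + 1 / ctrl_total))
    return 0.0 if se == 0 else (p_case - p_ctrl) / se


# Hypothetical local-ancestry counts per locus (not real data):
# (population-1 chromosomes in cases, case chromosomes,
#  population-1 chromosomes in controls, control chromosomes)
loci = {
    "A": (500, 1000, 500, 1000),   # same ancestry proportion in cases and controls
    "B": (600, 1000, 500, 1000),   # excess population-1 ancestry in cases
}

for name, counts in loci.items():
    print(name, round(ancestry_z(*counts), 2))
```

A locus such as A, where ancestry proportions match between cases and controls, yields a statistic near zero, whereas a locus such as B, where cases carry excess population-1 ancestry, yields a large positive value.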