Stephanie A Bien1, Genevieve L Wojcik2, Niha Zubair1, Christopher R Gignoux2, Alicia R Martin2, Jonathan M Kocarnik1, Lisa W Martin3, Steven Buyske4, Jeffrey Haessler1, Ryan W Walker5,6, Iona Cheng7, Mariaelisa Graff8, Lucy Xia9, Nora Franceschini8, Tara Matise4, Regina James10, Lucia Hindorff11, Loic Le Marchand12, Kari E North8, Christopher A Haiman9, Ulrike Peters1,13, Ruth J F Loos5,6, Charles L Kooperberg1, Carlos D Bustamante2, Eimear E Kenny5,6, Christopher S Carlson1,13.
Abstract
Investigating the genetic architecture of complex traits in ancestrally diverse populations is imperative to understand the etiology of disease. However, the current paucity of genetic research in people of African and Latin American ancestry, Hispanic and indigenous peoples in the United States is likely to exacerbate existing health disparities for many common diseases. The Population Architecture using Genomics and Epidemiology, Phase II (PAGE II) Study was initiated in 2013 by the National Human Genome Research Institute to expand our understanding of complex trait loci in ethnically diverse and well-characterized study populations. To meet this goal, the Multi-Ethnic Genotyping Array (MEGA) was designed to substantially improve fine-mapping and functional discovery by increasing variant coverage across multiple ethnicities at known loci for metabolic, cardiovascular, renal, inflammatory, anthropometric, and a variety of lifestyle traits. Studying the frequency distribution of clinically relevant mutations, putative risk alleles, and known functional variants across multiple populations will provide important insight into the genetic architecture of complex diseases and facilitate the discovery of novel, sometimes population-specific, disease associations. DNA samples from 51,650 participants, comprising self-identified African ancestry (17,328), Hispanic/Latino (22,379), Asian/Pacific Islander (8,640), and American Indian (653) participants and an additional 2,650 participants of either South Asian or European ancestry, along with other reference panels, have been genotyped on MEGA by PAGE II. MEGA was designed as a new resource for studying ancestrally diverse populations. Here, we describe the methodology for selecting trait-specific content for use in multi-ethnic populations and how enriching MEGA for this content may contribute to deeper biological understanding of the genetic etiology of complex disease.
Year: 2016 PMID: 27973554 PMCID: PMC5156387 DOI: 10.1371/journal.pone.0167758
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Marker allocation used for design of MEGA.
| Abbreviated reference | Approximate SNP allocation | Content description | Parameters informing content |
|---|---|---|---|
| Infinium HumanCore BeadChip | 250,000 | Included for backwards compatibility | Highly informative GWAS tag SNPs for EUR or ASN ancestries |
| African Diaspora Consortium Power Chip | 700,000 | Augmented GWAS coverage for African ancestries | 692 individuals sequenced by CAAPA, highly informative for variants with MAF>2% |
| Improved cross-population tagging content | 300,000 | Augmented GWAS coverage for diverse ancestries | New tagging strategy developed by PAGE using 1KGP Phase 3 sequencing, highly informative for variants with MAF<2% |
| Multiethnic exonic content | 400,000 | Exome markers for diverse populations | Derived from WGS/WES data from > 36,000 individuals in diverse ethnic groups, emphasizes loss of function and splice variants |
| NHGRI GWAS catalog | 11,631 | Markers (tag SNPs) from published GWAS | Includes tag SNPs not reaching genome-wide significance (p<5×10⁻⁸), and SNPs in high LD |
| SNPs in publications | 5,874 | SNPs listed in UCSC browser track | Mentioned by rsid number in ≥ 4 publications |
| Clinical and pharmacogenetic | 17,000 | All clinically relevant SNPs | Domain expert opinion and those annotated as deleterious |
| Validated regulatory SNPs | 2,500 | Regulatory variants with | Differential EMSA, most with differential luciferase or equivalent |
| Enhanced GWAS | 20,000 | Improved tag SNP coverage for candidate genes/regions | Minimum r² of 0.8 rather than mean r² of 0.6 used for backbone |
| Enhanced Exome | 16,000 | Improved exonic coverage for candidate genes/regions | All available exonic variants |
| Fine-mapping | 16,000 | Fine-mapping coverage for GWAS catalog reports | All SNPs tagged at r² > 0.6 in reference population from primary GWAS report |
| OMIM/Clinvar | Overlaps backbone content | Clinically relevant SNPs related to traits of interest | E.g. hyperlipidemia ( |
a OMIM/Clinvar datasets were subsequently added as backbone content when a variant was classified as deleterious. Additional ‘likely deleterious’ variants were added if directly related to traits of interest. Abbreviations: MAF = minor allele frequency; 1KGP = 1000 Genomes Project; AFR, AMR, ASN, EUR = 1000 Genomes Project Phase 3 super populations; WGS/WES = whole genome sequencing/whole exome sequencing; EMSA = electrophoretic mobility shift assay.
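The "Enhanced GWAS" row contrasts a minimum-r² criterion with a mean-r² criterion. A minimal sketch of that difference follows, assuming r² is taken as the squared Pearson correlation of genotype dosages and that each target SNP is scored by its best tag. The function names, thresholds as defaults, and toy genotypes are illustrative assumptions, not the PAGE tagging pipeline.

```python
import numpy as np

def r2(x, y):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.std() == 0 or y.std() == 0:
        return 0.0  # a monomorphic site carries no LD information
    return float(np.corrcoef(x, y)[0, 1] ** 2)

def best_tag_r2(targets, tags):
    """For each target SNP (row), the highest r2 reached by any tag SNP (row)."""
    return np.array([max(r2(t, g) for g in tags) for t in targets])

def coverage_summary(targets, tags, min_cut=0.8, mean_cut=0.6):
    """Compare a per-SNP minimum criterion with an average criterion."""
    best = best_tag_r2(targets, tags)
    return {
        "min_r2": float(best.min()),
        "mean_r2": float(best.mean()),
        "passes_min_criterion": bool(best.min() >= min_cut),
        "passes_mean_criterion": bool(best.mean() >= mean_cut),
    }

# Hypothetical toy data: one tag SNP and two target SNPs across eight samples.
tags = np.array([[0, 1, 2, 0, 1, 2, 0, 1]])
targets = np.array([
    [0, 1, 2, 0, 1, 2, 0, 1],  # identical to the tag
    [0, 1, 2, 0, 1, 2, 1, 0],  # only partially correlated with the tag
])
summary = coverage_summary(targets, tags)
print(summary)
```

In this toy region the average criterion is satisfied while the minimum criterion is not, because one target is tagged poorly. That is the kind of gap a minimum-r² requirement is meant to close in candidate regions.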
List of Prioritized Genes by Trait.
| Traits | Genes of Interest |
|---|---|
* Genes present in multiple trait groups
Custom Content Variants Before and After Design.
| Source | Before Design Filter | Content Included On MEGA |
|---|---|---|
| | 2,894 | 2,849 |
| | 4,915 | 4,496 |
| | 139 | 122 |
| | 1,616 | 1,495 |
| | 327 | 215 |
| | 28,465 | 25,388 |
| | 14,001 | 5,733 |
| | 15,049 | 11,974 |
| | 2,617 | 1,577 |
| | 3,752 | 2,278 |
| | 129 | 119 |
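The before/after counts in this table can be turned into per-row retention fractions. The sketch below uses the counts exactly as listed; the source labels are not available here, so rows are identified only by position, and no total is computed because the sources may overlap.

```python
# (before design filter, included on MEGA), in table order; source labels unknown.
rows = [
    (2894, 2849),
    (4915, 4496),
    (139, 122),
    (1616, 1495),
    (327, 215),
    (28465, 25388),
    (14001, 5733),
    (15049, 11974),
    (2617, 1577),
    (3752, 2278),
    (129, 119),
]

# Fraction of candidate variants that survived the design filter, per row.
retained = [after / before for before, after in rows]

for i, ((before, after), frac) in enumerate(zip(rows, retained), start=1):
    print(f"row {i:>2}: {before:>6,} -> {after:>6,}  retained {frac:.1%}")

# Position (1-based) of the row that lost the largest share of its variants.
lowest_row = min(range(len(rows)), key=lambda i: retained[i]) + 1
print("lowest retention: row", lowest_row)
```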
Fig 1. Enrichment of rarer variation in custom content.
Comparison of minor allele frequency distribution between the MEGA GWAS backbone and the custom content stratified by race. Allele frequencies were calculated in PAGE II study populations.
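As a rough illustration of how per-group minor allele frequencies like those compared in Fig 1 could be computed, the sketch below assumes genotypes are coded as alternate-allele dosages (0/1/2, with NaN for missing calls). The function names and toy data are hypothetical and do not come from the PAGE II analysis.

```python
import numpy as np

def minor_allele_frequency(dosages):
    """MAF per variant from a (samples x variants) dosage matrix; NaN = missing."""
    d = np.asarray(dosages, dtype=float)
    alt_freq = np.nanmean(d, axis=0) / 2.0  # each sample carries two alleles
    return np.minimum(alt_freq, 1.0 - alt_freq)

def maf_by_group(dosages, groups):
    """MAF per variant computed separately within each group label."""
    d = np.asarray(dosages, dtype=float)
    groups = np.asarray(groups)
    return {str(g): minor_allele_frequency(d[groups == g]) for g in np.unique(groups)}

# Hypothetical toy data: four samples, two variants, two groups.
dosages = [
    [0, 2],
    [1, 2],
    [1, np.nan],
    [2, 1],
]
groups = ["A", "A", "B", "B"]
maf = maf_by_group(dosages, groups)
print(maf)
```

Comparing the resulting per-group distributions between backbone and custom-content variants would give the kind of stratified comparison the caption describes.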
Fig 2. Improved imputation accuracy found with custom content sites in regions of interest.
Solid lines denote the imputation accuracy of MEGA including the custom content, while dashed lines indicate the performance of MEGA without the custom content. Admixed populations are on the left, with continental populations found on the right.
Enhanced Imputation Accuracy with Custom Content Addition.
| Population | Minor Allele Frequency Threshold | ||
|---|---|---|---|
| AAC | 2.75% | 2.63% | 0.84% |
| AFR | 3.02% | 2.79% | 0.98% |
| ASN | 1.79% | 1.89% | 0.54% |
| AMR | 2.20% | 2.27% | 0.43% |
| SAS | 2.65% | 2.38% | 0.52% |
| EUR | 2.48% | 2.75% | 0.43% |
Fig 3. The functional hypothesis tested (‘Hypothesized Function’) by year for 2,610 variants reported in a functional allelic assay found through literature review.