Braxton D Mitchell, Myriam Fornage, Patrick F McArdle, Yu-Ching Cheng, Sara L Pulit, Quenna Wong, Tushar Dave, Stephen R Williams, Roderick Corriveau, Katrina Gwinn, Kimberly Doheny, Cathy C Laurie, Stephen S Rich, Paul I W de Bakker.
Abstract
Genome-wide association studies (GWAS) are widely applied to identify susceptibility loci for a variety of diseases using genotyping arrays that interrogate known polymorphisms throughout the genome. A particular strength of GWAS is that they are unbiased with respect to specific genomic elements (e.g., coding or regulatory regions of genes), and they have revealed important associations that would never have been suspected based on prior knowledge or assumptions. To date, the SNPs found to be associated with complex human traits tend to have small effect sizes, so very large sample sizes are required to achieve adequate statistical power. To address this, a number of efficient strategies for conducting GWAS have emerged, including combining results across multiple studies using meta-analysis, collecting cases through electronic health records, and using as controls samples from other studies that have already been genotyped and made publicly available (e.g., through deposition of de-identified data into dbGaP or EGA). In certain scenarios, it may be attractive to use already genotyped controls and divert resources to the standardized collection, phenotyping, and genotyping of cases only. This strategy, however, requires careful attention to the choice of "public controls" and to the comparability of genetic data between cases and public controls, to ensure that any allele frequency differences observed between the groups are attributable to locus-specific effects rather than to a systematic bias due to poor matching (population stratification) or differential genotype calling (batch effects). The goal of this paper is to describe some of the potential pitfalls in using previously genotyped control data. We focus on considerations related to the choice of control groups, the use of different genotyping platforms, and approaches to deal with population stratification when cases and controls are genotyped on different platforms.
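The basic case-control comparison described above can be illustrated with a per-SNP allelic chi-square test on a 2 × 2 table of allele counts. This is a minimal sketch, not the paper's analysis: the function name and its inputs are hypothetical, and in practice association is usually tested with logistic regression, which can also include ancestry covariates. The sketch makes the abstract's concern concrete: the statistic responds to any allele frequency difference between the two groups, whether it comes from a true locus effect, population stratification, or a batch effect in genotype calling.

```python
import math


def allelic_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Hypothetical helper: inputs are counts of the alternate and reference
    alleles in cases and controls (two alleles per genotyped person).
    Returns (chi2, p_value, allelic_odds_ratio).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper tail of a 1-df chi-square: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio


# Example: alternate-allele frequency 0.60 in cases vs 0.40 in controls.
print(allelic_test(60, 40, 40, 60))
```

A difference of this kind produced by poorly matched controls or by differential calling would be indistinguishable, in this statistic alone, from a true association.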
Keywords: case-control study; genetic association study; genome-wide association study; population stratification; power
Year: 2014 PMID: 24808905 PMCID: PMC4010766 DOI: 10.3389/fgene.2014.00095
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Ischemic stroke cases genotyped as part of the SiGN study and previously genotyped control groups, according to study site.
| Study | Location | Genotyping array | Cases (n) | Controls (n) |
|---|---|---|---|---|
| GASROS | Boston, USA | Illumina HumanOmni 5M Exome | 470 | |
| GCNKSS | Greater Cincinnati region, USA | Illumina HumanOmni 5M Exome | 499 | |
| ISGS | Multi-center, USA | Illumina HumanOmni 5M Exome | 187 | |
| MCISS | New Jersey, USA | Illumina HumanOmni 5M Exome | 630 | |
| MIAMISR | Miami, USA | Illumina HumanOmni 5M Exome | 299 | |
| NHS | National sample, USA | Illumina HumanOmni 5M Exome | 316 | |
| NOMAS(S) | Manhattan, USA | Illumina HumanOmni 5M Exome | 363 | |
| REGARDS | National sample, USA | Illumina HumanOmni 5M Exome | 311 | |
| SPS3 | Multi-center: USA, Latin America, Spain | Illumina HumanOmni 5M Exome | 962 | |
| SWISS | Multi-center, USA | Illumina HumanOmni 5M Exome | 271 | |
| WHI | National sample, USA | Illumina HumanOmni 5M Exome | 458 | |
| WUSTL | St. Louis, USA | Illumina HumanOmni 5M Exome | 455 | |
| BASICMAR | Barcelona, Spain | Illumina HumanOmni 5M Exome | 930 | |
| BRAINS | London, England | Illumina HumanOmni 5M Exome | 114 | |
| GRAZ | Graz, Austria | Illumina HumanOmni 5M Exome | 639 | |
| KRAKOW | Krakow, Poland | Illumina HumanOmni 5M Exome | 952 | 776 |
| LEUVEN | Leuven, Belgium | Illumina HumanOmni 5M Exome | 482 | 468 |
| LUND | Lund, Sweden | Illumina HumanOmni 5M Exome | 651 | |
| SAHLSIS | Gothenburg, Sweden | Illumina HumanOmni 5M Exome | 800 | |
| HABC | Multi-center, USA | Illumina 1M-Duo | | 2802 |
| HRS | Multi-center, USA | Illumina HumanOmni 2.5M | | 12507 |
| OAI | Multi-center, USA | Illumina HumanOmni 2.5M | | 4011 |
| ADHD | Barcelona, Spain | Illumina HumanOmni 1M | | 435 |
| GRAZ | Graz, Austria | Illumina 610 | | 829 |
| INMA | Barcelona, Spain | Illumina HumanOmni 1M | | 1061 |
| KORA | Southern Germany | Illumina Human 550 | | 820 |
| WTCCC | United Kingdom | Illumina 660 | | 5186 |
SiGN cases were genotyped at the Center for Inherited Disease Research (CIDR) on the Illumina HumanOmni 5M Exome array.
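A quick tally of the table shows how few controls share the cases' genotyping array, which is the situation the considerations in this paper address. The sketch below simply re-enters the counts from the table; reading the last column as controls genotyped alongside the cases (KRAKOW, LEUVEN) and the counts for HABC through WTCCC as previously genotyped controls is an assumption about the table's layout. Array labels are copied verbatim from the table.

```python
from collections import defaultdict

CASE_ARRAY = "Illumina HumanOmni 5M Exome"

# Cases per SiGN site, all genotyped on CASE_ARRAY (counts copied from the table).
cases = {
    "GASROS": 470, "GCNKSS": 499, "ISGS": 187, "MCISS": 630, "MIAMISR": 299,
    "NHS": 316, "NOMAS(S)": 363, "REGARDS": 311, "SPS3": 962, "SWISS": 271,
    "WHI": 458, "WUSTL": 455, "BASICMAR": 930, "BRAINS": 114, "GRAZ": 639,
    "KRAKOW": 952, "LEUVEN": 482, "LUND": 651, "SAHLSIS": 800,
}

# (study, array, n) for control groups. Assumption: the KRAKOW and LEUVEN
# controls were genotyped together with the cases on CASE_ARRAY.
controls = [
    ("KRAKOW", CASE_ARRAY, 776),
    ("LEUVEN", CASE_ARRAY, 468),
    ("HABC", "Illumina 1M-Duo", 2802),
    ("HRS", "Illumina HumanOmni 2.5M", 12507),
    ("OAI", "Illumina HumanOmni 2.5M", 4011),
    ("ADHD", "Illumina HumanOmni 1M", 435),
    ("GRAZ", "Illumina 610", 829),
    ("INMA", "Illumina HumanOmni 1M", 1061),
    ("KORA", "Illumina Human 550", 820),
    ("WTCCC", "Illumina 660", 5186),
]

# Sum control counts per genotyping array.
controls_by_array = defaultdict(int)
for _study, array, n in controls:
    controls_by_array[array] += n

total_cases = sum(cases.values())
same_array = controls_by_array[CASE_ARRAY]
other_arrays = sum(n for a, n in controls_by_array.items() if a != CASE_ARRAY)

print(f"cases: {total_cases}")
print(f"controls on the case array: {same_array}")
print(f"controls on {len(controls_by_array) - 1} other arrays: {other_arrays}")
```

With the counts above this reports 9,789 cases, 1,244 controls on the case array, and 27,651 controls spread across six other arrays, close to the 27,000 previously genotyped controls used in Figure 1.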
Figure 1. Minimum odds ratio required to detect SNP associations at 80% power for three different sample sets: (1) 11,000 cases and 27,000 previously genotyped controls; (2) 5,500 cases and 5,500 controls; and (3) 5,500 cases and 27,000 previously genotyped controls*.
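The trade-off shown in Figure 1 can be approximated with a textbook power calculation. The sketch below is not the calculation behind Figure 1, whose assumptions (allele frequency, significance level, genetic model) are not given here. It assumes a two-sided allelic test comparing allele frequencies between 2N case and 2N control chromosomes, a normal approximation, a genome-wide significance level of 5 × 10^-8, and a hypothetical control risk-allele frequency of 0.2; all function names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def case_allele_freq(p_control, odds_ratio):
    """Risk-allele frequency in cases implied by a per-allele odds ratio."""
    odds = odds_ratio * p_control / (1.0 - p_control)
    return odds / (1.0 + odds)


def allelic_power(n_cases, n_controls, p_control, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided allelic test (2 alleles per person)."""
    p_case = case_allele_freq(p_control, odds_ratio)
    se = (p_case * (1 - p_case) / (2 * n_cases)
          + p_control * (1 - p_control) / (2 * n_controls)) ** 0.5
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Ignores the negligible chance of rejecting in the wrong direction.
    return STD_NORMAL.cdf(abs(p_case - p_control) / se - z_crit)


def min_detectable_or(n_cases, n_controls, p_control, power=0.8,
                      alpha=5e-8, upper=10.0, tol=1e-6):
    """Smallest odds ratio > 1 reaching the target power, by bisection.

    Assumes power at `upper` already exceeds the target.
    """
    lower = 1.0
    while upper - lower > tol:
        mid = (lower + upper) / 2
        if allelic_power(n_cases, n_controls, p_control, mid, alpha) < power:
            lower = mid
        else:
            upper = mid
    return upper


# The three designs from the Figure 1 caption (cases, controls).
designs = {
    "11,000 cases / 27,000 controls": (11_000, 27_000),
    "5,500 cases / 5,500 controls": (5_500, 5_500),
    "5,500 cases / 27,000 controls": (5_500, 27_000),
}
for label, (n_cases, n_controls) in designs.items():
    print(label, round(min_detectable_or(n_cases, n_controls, 0.2), 3))
```

Under these assumptions the ordering matches the qualitative point of the figure: adding previously genotyped controls to a fixed set of cases lowers the minimum detectable odds ratio, and doubling the cases lowers it further.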
Figure 2. Flowchart of the analysis proposed by SiGN.
Considerations when using already genotyped controls.
| • When possible, identify control sets that are from similar ethnic ancestry and were genotyped on the same platform as the cases. |
| • Consider using multiple control groups, especially when cases and controls are genotyped on different platforms and/or when the size of available control groups is small. |
| • Cross-study duplicates: if possible, re-genotype a small number of previously genotyped controls to allow evaluation of SNP concordance rates across the two platforms. |
| • Combine cases and previously genotyped controls for assessment of population substructure, using a subset of non-imputed markers common to all samples (after excluding SNPs found to be discordant in the analysis of cross-study duplicates). |
| • Impute genotypes of cases and controls within population substrata. |
| • For confirmation, it is prudent to replicate observed associations after re-genotyping case and control samples together. |
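The duplicate-concordance and combined-substructure steps in the list above can be sketched as follows. This is a minimal illustration, not SiGN's pipeline. It assumes genotypes are NumPy arrays coded as 0/1/2 alternate-allele counts (rows are samples, columns are SNPs) with -1 for a missing call; that the two matrices for the cross-study duplicates are already aligned to the same samples, SNPs, and strand; and that the combined case and control matrix uses the same SNP columns. The function names and the 0.99 concordance threshold are hypothetical. The PCA step uses a per-SNP standardization similar to the one used in EIGENSTRAT and assumes no missing genotypes.

```python
import numpy as np

MISSING = -1  # hypothetical code for a missing genotype call


def snp_concordance(geno_a, geno_b):
    """Per-SNP concordance between two genotypings of the same samples.

    Returns the fraction of pairs called on both platforms that agree,
    or NaN for SNPs with no such pairs.
    """
    called = (geno_a != MISSING) & (geno_b != MISSING)
    agree = (geno_a == geno_b) & called
    n_called = called.sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(n_called > 0, agree.sum(axis=0) / n_called, np.nan)


def concordant_snp_mask(concordance, threshold=0.99):
    """Boolean mask of SNPs whose concordance meets the threshold."""
    return np.nan_to_num(concordance, nan=0.0) >= threshold


def top_principal_components(geno, k=2):
    """Top-k PC scores of a combined case + control genotype matrix.

    Each SNP is centred on 2p and scaled by sqrt(2p(1-p)), where p is its
    allele frequency in the combined sample; monomorphic SNPs are dropped.
    """
    g = geno.astype(float)
    p = g.mean(axis=0) / 2.0
    polymorphic = (p > 0) & (p < 1)
    g, p = g[:, polymorphic], p[polymorphic]
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k] * s[:k]  # sample scores on PC1..PCk
```

In use, one would compute `snp_concordance` on the re-genotyped duplicates, keep only the columns selected by `concordant_snp_mask` in the combined case and control matrix, and pass that matrix to `top_principal_components`; samples can then be grouped by their PC scores before imputing within each group.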