Donghyung Lee, T Bernard Bigdeli, Vernell S Williamson, Vladimir I Vladimirov, Brien P Riley, Ayman H Fanous, Silviu-Alin Bacanu.
Abstract
MOTIVATION: To increase the signal resolution for large-scale meta-analyses of genome-wide association studies, genotypes at unmeasured single nucleotide polymorphisms (SNPs) are commonly imputed using large multi-ethnic reference panels. However, the ever-increasing size and ethnic diversity of both reference panels and cohorts make genotype imputation computationally challenging for moderately sized computer clusters. Moreover, genotype imputation requires subject-level genetic data, which, unlike the summary statistics provided by virtually all studies, are not publicly available. While there are much less demanding methods that avoid the genotype imputation step by directly imputing SNP statistics, e.g. Directly Imputing summary STatistics (DIST) proposed by our group, their implicit assumptions make them applicable only to ethnically homogeneous cohorts.
Year: 2015 PMID: 26059716 PMCID: PMC4576696 DOI: 10.1093/bioinformatics/btv348
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Estimated weights (%) for 1KG ethnicities (see Supplementary Table S1 for abbreviations of ethnicities)
| Cohort | ASW | CEU | CHB | CHS | CLM | FIN | GBR | IBS | JPT | LWK | MXL | PUR | TSI | YRI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cohort 1 (40% ASW + 60% GBR) | 40 | 0 | 0 | 0 | 0 | 0 | 60 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Cohort 2 (60% CHB + 40% MXL) | 0 | 0 | 60 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 39.9 | 0 | 0 | 0 |
| Cohort 3 (20% ASW + 30% CHB + 30% GBR + 20% MXL) | 20 | 0 | 29.9 | 0 | 0 | 0 | 29.9 | 0 | 0 | 0 | 20 | 0 | 0 | 0 |
| Cohort 3* (20% ASW + 30% CHB + 30% GBR + 20% MXL) | 22.3 | 12.1 | 30.3 | 0 | 16.3 | 6.4 | - | 0.5 | 1.3 | 0 | - | 7.8 | 2.9 | 0 |
| Cohort 4 (30% CEU + 25% CHS + 5% PUR + 40% YRI) | 0 | 30 | 0 | 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 40 |
| Cohort 5 (10% ASW + 15% CEU + 15% CHB + 12.5% CHS + 15% GBR + 10% MXL + 2.5% PUR + 20% YRI) | 10 | 15 | 15 | 12.5 | 0 | 0 | 15 | 0 | 0 | 0 | 10 | 2.5 | 0 | 20 |
| PGC SCZ2 | 2 | 25.2 | 2.1 | 4 | 2.3 | 17.9 | 24.3 | 2.4 | 0.5 | 0 | 0 | 3.9 | 15.4 | 0 |
All cohorts use 1KG as the reference panel, except Cohort 3* which used 1KG without GBR and MXL.
Fig. 1. DISTMIX relative Type I error rate (the empirical Type I error rate divided by the nominal Type I error rate) as a function of the nominal Type I error rate and the null summary data used. Cohort 1, 40% ASW + 60% GBR; Cohort 2, 60% CHB + 40% MXL; Cohort 3, 20% ASW + 30% CHB + 30% GBR + 20% MXL; Cohort 4, 30% CEU + 25% CHS + 5% PUR + 40% YRI. All cohorts use 1KG as the reference panel, except Cohort 3*, which used 1KG without GBR and MXL. The dashed line (at 1) denotes the nominal threshold for the relative Type I error rate.
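The relative Type I error rate defined in the Fig. 1 caption can be illustrated with a minimal sketch. It assumes the null summary data are available as Z-scores and that P-values are two-sided under a standard normal; the function names and the toy Z-scores are hypothetical and are not taken from the DISTMIX software or the paper's data.

```python
import math

def two_sided_p(z):
    """Two-sided P-value for a Z-score under a standard normal (assumption)."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def relative_type1_error(null_z, alpha):
    """Empirical Type I error rate divided by the nominal rate alpha."""
    if not null_z:
        raise ValueError("need at least one null Z-score")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Count null statistics that would be declared significant at level alpha.
    rejections = sum(1 for z in null_z if two_sided_p(z) < alpha)
    empirical = rejections / len(null_z)
    return empirical / alpha

# Toy null Z-scores, made up for illustration only.
toy_null_z = [0.1, -0.4, 2.2, 1.0, -2.7, 0.3, -1.1, 0.8, 1.7, -0.2]
ratio = relative_type1_error(toy_null_z, alpha=0.05)
```

A ratio near 1 corresponds to the dashed line in the figure; values above 1 indicate more false positives than the nominal level allows, values below 1 indicate a conservative test.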
Fig. 2. DISTMIX Z-scores as a function of IMPUTE2 Z-scores from the PGC SCZ2 discovery phase and DISTMIX imputation information. The vertical dotted lines represent the suggestive thresholds for the PGC SCZ2 discovery phase (IMPUTE2 P-value < 1 × 10⁻⁶). The figure reports the squared correlation coefficient (r²) between DISTMIX and IMPUTE2 Z-scores for the suggestive PGC SCZ2 SNPs, and r² between the two predictions for all SNPs.
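As a rough sketch of the two quantities in the Fig. 2 caption, the snippet below converts the suggestive P-value threshold into a Z-score cutoff and computes the squared Pearson correlation between two sets of Z-scores. It assumes a two-sided test under a standard normal, which the caption does not state; the function names and the toy Z-scores are hypothetical and do not come from the paper.

```python
from statistics import NormalDist

def suggestive_z_threshold(p=1e-6):
    """|Z| cutoff matching a two-sided P-value threshold (assumption)."""
    return NormalDist().inv_cdf(1.0 - p / 2.0)

def squared_correlation(x, y):
    """Squared Pearson correlation coefficient between two sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return (sxy * sxy) / (sxx * syy)

# Toy Z-scores, made up for illustration only.
distmix_z = [5.1, -4.9, 0.3, 1.2, -5.6]
impute2_z = [5.3, -5.0, 0.1, 1.5, -5.2]

cutoff = suggestive_z_threshold()
# Keep SNPs whose IMPUTE2 Z-score passes the suggestive cutoff.
suggestive = [(d, i) for d, i in zip(distmix_z, impute2_z) if abs(i) > cutoff]
r2_suggestive = squared_correlation([d for d, _ in suggestive],
                                    [i for _, i in suggestive])
r2_all = squared_correlation(distmix_z, impute2_z)
```

Restricting to the suggestive SNPs before computing r² mirrors the caption's distinction between agreement at the strongest signals and agreement across all SNPs.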
Fig. 3. −log10(p) for the 105 LD-independent autosomal association PGC SCZ2 SNPs as a function of the rank of significance for DISTMIX P-values, imputation information and imputation method used.