Yunpeng Wang1,2,3,4, Wesley K Thompson5, Andrew J Schork6, Dominic Holland4, Chi-Hua Chen4,7, Francesco Bettella1,2, Rahul S Desikan4,7, Wen Li1,2, Aree Witoelar1,2, Verena Zuber1,2, Anna Devor3,4, Markus M Nöthen8, Marcella Rietschel9, Qiang Chen10, Thomas Werge11, Sven Cichon12, Daniel R Weinberger10, Srdjan Djurovic1,13, Michael O'Donovan14, Peter M Visscher15,16, Ole A Andreassen1,2, Anders M Dale3,4,5,7.
Abstract
Most of the genetic architecture of schizophrenia (SCZ) has not yet been identified. Here, we apply a novel statistical algorithm called Covariate-Modulated Mixture Modeling (CM3), which incorporates auxiliary information (heterozygosity, total linkage disequilibrium, genomic annotations, pleiotropy) for each single nucleotide polymorphism (SNP) to enable more accurate estimation of replication probabilities, conditional on the observed test statistic ("z-score") of the SNP. We use a multiple logistic regression on z-scores to combine the auxiliary information into a "relative enrichment score" for each SNP. For each stratum of these relative enrichment scores, we obtain nonparametric estimates of posterior expected test statistics and replication probabilities as a function of discovery z-scores, using a resampling-based approach that repeatedly and randomly partitions meta-analysis sub-studies into training and replication samples. We fit a scale mixture of two Gaussians model to each stratum, obtaining parameter estimates that minimize the sum of squared differences between the scale-mixture model and the stratified nonparametric estimates. We apply this approach to the recent genome-wide association study (GWAS) of SCZ (n = 82,315), obtaining a good fit between the model-based and observed effect sizes and replication probabilities. We observed that SNPs with low enrichment scores replicate with a lower probability than SNPs with high enrichment scores, even when both are genome-wide significant (p < 5×10^-8). There were 693 and 219 independent loci with model-based replication rates ≥80% and ≥90%, respectively. Compared to analyses not incorporating relative enrichment scores, CM3 increased out-of-sample yield for SNPs that replicate at a given rate. This demonstrates that replication probabilities can be more accurately estimated using prior enrichment information with CM3.
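As a rough illustration of how a two-component Gaussian mixture turns a discovery z-score into a posterior expected replication z-score, the Python sketch below makes three assumptions: a unit-variance null component, a non-null component with variance 1 + s2, and discovery and replication samples of equal size. The names `pi1` and `s2`, and these modelling choices, are illustrative assumptions and not necessarily the parameterization used in CM3.

```python
import math

def normal_pdf(x, var):
    """Density of a zero-mean Gaussian with variance `var` at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)

def posterior_nonnull(z, pi1, s2):
    """P(SNP is non-null | discovery z) under the assumed two-component mixture.

    pi1: assumed prior proportion of non-null SNPs in the stratum.
    s2:  assumed variance of true effects (on the z scale) for non-null SNPs.
    """
    null = (1.0 - pi1) * normal_pdf(z, 1.0)      # null: sampling noise only
    alt = pi1 * normal_pdf(z, 1.0 + s2)          # non-null: effect + noise
    return alt / (null + alt)

def expected_replication_z(z, pi1, s2):
    """Posterior expected z-score in an equally sized replication sample."""
    shrink = s2 / (1.0 + s2)                     # Gaussian shrinkage factor
    return posterior_nonnull(z, pi1, s2) * shrink * z

# Same discovery z-score, two hypothetical strata with different priors.
low = expected_replication_z(5.0, pi1=0.01, s2=4.0)
high = expected_replication_z(5.0, pi1=0.10, s2=4.0)
```

Under these assumptions, a stratum with a larger non-null proportion shrinks the same discovery z-score less, which is the qualitative behaviour the abstract describes for high versus low enrichment scores.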
Year: 2016 PMID: 26808560 PMCID: PMC4726519 DOI: 10.1371/journal.pgen.1005803
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Fig 1. Mean replication z-scores stratified by genomic annotation, pleiotropy and heterozygosity.
The conditional mean z-scores in the replication sample (y axis) were plotted against the z-scores in the discovery sample (x axis). The shrinkage of replication z-scores is differentiated A.) by genomic annotation categories (All SNPs; Intergenic; 5' untranslated region, 5' UTR; Intron; Exon; and 3' untranslated region, 3' UTR), B.) by heterozygosity (H) intervals, C.) by associations with bipolar disorder (BIP; All SNPs; -log10 p ≥ 1.0; -log10 p ≥ 2.0; and -log10 p ≥ 3.0) and D.) by total LD (TLD) intervals. All plots were generated by randomly assigning 26 of the PGC Schizophrenia sub-studies as the discovery sample and 26 as the replication sample (split half). The average value over 500 iterations is shown.
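The split-half procedure described in this caption can be sketched in Python as follows. The partition into 26 discovery and 26 replication sub-studies follows the caption. The way per-study z-scores are combined (sample-size-weighted, Stouffer-style) is an assumption made for illustration, as are the function names and the toy inputs.

```python
import math
import random

def split_half(study_ids, rng):
    """Randomly partition sub-studies into equal discovery and replication halves."""
    ids = list(study_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]

def combine_z(z_by_study, n_by_study, ids):
    """Combine per-study z-scores for one SNP, weighting by sqrt(sample size).

    This weighting is an assumed, commonly used scheme, not necessarily the
    meta-analysis method applied to the PGC data.
    """
    num = sum(math.sqrt(n_by_study[s]) * z_by_study[s] for s in ids)
    den = math.sqrt(sum(n_by_study[s] for s in ids))
    return num / den

rng = random.Random(0)                      # fixed seed for reproducibility
studies = range(52)                         # 26 + 26 sub-studies
discovery, replication = split_half(studies, rng)

# Toy per-study inputs for a single SNP (hypothetical values).
z = {s: 1.0 for s in studies}
n = {s: 100 for s in studies}
z_disc = combine_z(z, n, discovery)
z_rep = combine_z(z, n, replication)
```

Repeating the partition many times (500 in the caption) and averaging the replication z-scores within bins of the discovery z-score would give curves of the kind shown in the figure.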
Fig 2. Enrichment of SNP associations with schizophrenia conditioned on predicted enrichment scores.
The conditional Q-Q plot shows the enrichment of SNP associations with schizophrenia stratified by predicted relative enrichment scores A.) based on LD-weighted annotation categories, heterozygosity and total LD score and B.) based on LD-weighted annotation categories, heterozygosity, total LD score and SNP association with bipolar disorder. The predicted enrichment scores are equally divided into 10 disjoint intervals or bins (from the least enriched stratum, Bin1, to the most enriched stratum, Bin10). The dashed line indicates the null distribution and the dotted line indicates all SNPs, i.e., not stratified. Different colors indicate different intervals of predicted enrichment scores. The leftward shift of each curve compared to the null line indicates the relative enrichment. SNPs in the MHC region were excluded, and the remaining SNPs were pruned based on the LD structure from the 1000 Genomes European subpopulation at r2 < 0.8.
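A minimal Python sketch of how the points of one Q-Q curve could be computed for a single stratum is given below. It assumes the usual layout for such plots, with the empirical quantile on the x axis and the nominal p value on the y axis, both on a -log10 scale. The function name and the toy p values are hypothetical.

```python
import math

def qq_points(pvalues):
    """Return (x, y) points for a Q-Q curve of one stratum of SNPs.

    x: -log10 of the empirical quantile (rank / n) of each p value.
    y: -log10 of the nominal p value itself.
    Under the null (uniform p values) the points fall on the line y = x.
    """
    ps = sorted(pvalues)
    n = len(ps)
    return [(-math.log10((i + 1) / n), -math.log10(p)) for i, p in enumerate(ps)]

null_like = qq_points([0.25, 0.5, 0.75, 1.0])     # evenly spaced p values
enriched = qq_points([0.001, 0.01, 0.1, 1.0])     # excess of small p values
```

With this layout, a stratum enriched for association has y greater than x at small quantiles, so its curve sits above and to the left of the null line.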
Fig 3. Mean replication z-score and replication rate stratified by enrichment scores.
A.) The observed (solid lines) and predicted (dotted lines) mean z-scores in the replication sample (y axis) were plotted against the z-scores in the discovery sample (x axis). The shrinkage of replication z-scores is differentiated by disjoint intervals of relative enrichment scores. B.) The observed (solid lines) and predicted (dotted lines) replication probabilities were plotted against the negative common logarithm of nominal p values of schizophrenia SNPs in the discovery sample (x axis). Colors indicate the 10 disjoint intervals of relative enrichment scores, ranging from the least enriched (Bin1) to the most enriched (Bin10). All data were generated by randomly assigning 26 of the PGC schizophrenia sub-studies as the discovery sample and 26 as the replication sample (split half). The average value over 500 iterations is shown.
Fig 4. CM3 improves power of identifying gene loci.
The average empirical cumulative replication rates (y axis) are plotted against the number of SNPs replicating at that rate > 0.5 (x axis), after removing MHC region SNPs and pruning at LD r2 < 0.1. A.) The full PGC sample was used to estimate the predicted replication probability (pred repl prob). For each iteration, 26 PGC schizophrenia sub-studies were randomly assigned to the discovery sample, and the rest to the replication sample (split half). The average values over 500 iterations are shown. B.) Half of the PGC sample (26 sub-studies) was used to estimate the predicted replication probability. For each iteration, 26 PGC schizophrenia sub-studies were randomly assigned to the discovery sample, and the rest to the replication sample (split half). Then, the predicted replication probability was estimated by applying the CM3 method to the discovery sample, with 50 iterations. The p values (computed by meta-analysis) of the discovery sample and the predicted replication probabilities (computed by CM3) were used to sort SNPs in the replication sample, which consists of the remaining sub-studies. The average replication rates across 50 iterations are shown. Colors indicate different sorting criteria (green: sorted by predicted replication probability; blue: sorted by nominal p values).
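The comparison of sorting criteria in this caption can be illustrated with a small Python sketch: rank SNPs by a score, then track the fraction of top-ranked SNPs that replicate. The function name, the toy values, and the boolean replication indicator are hypothetical; how replication is defined for each SNP is not specified here and is left to the caller.

```python
def cumulative_replication_rate(scores, replicated, descending=True):
    """Cumulative replication rate of SNPs ranked by `scores`.

    scores:     one ranking value per SNP (e.g. a predicted replication
                probability, sorted descending, or a nominal p value,
                sorted ascending with descending=False).
    replicated: one boolean per SNP, True if the SNP replicated.
    Returns a list whose k-th entry is the fraction of the top-k SNPs
    that replicated.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=descending)
    rates, hits = [], 0
    for k, i in enumerate(order, start=1):
        hits += 1 if replicated[i] else 0
        rates.append(hits / k)
    return rates

# Toy example with three SNPs (hypothetical values).
pred_prob = [0.9, 0.1, 0.5]
pvals = [1e-6, 1e-3, 1e-8]
repl = [True, False, True]
by_prob = cumulative_replication_rate(pred_prob, repl)                  # sort high to low
by_p = cumulative_replication_rate(pvals, repl, descending=False)       # sort low to high
```

Plotting each list against the running count of replicating SNPs, one curve per sorting criterion, gives a comparison of the kind described in the caption.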
Fig 5. Relative importance of sources for enrichment.
The relative importance of different sources of enrichment (x axis) for explaining SNP association with schizophrenia was measured by Nagelkerke's R2. The enrichment sources were: total linkage disequilibrium (TLD); the squared z-scores of SNP association with bipolar disorder (BIP); the LD-weighted genomic annotation scores (Annot); and heterozygosity (H).
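For reference, Nagelkerke's R2 can be computed from the log-likelihoods of a fitted logistic regression and of its intercept-only counterpart. The Python sketch below uses the standard definition (the Cox-Snell R2 rescaled by its maximum attainable value). The function name and the toy log-likelihoods are hypothetical, and how the per-source values in the figure were derived from the fitted models is not specified here.

```python
import math

def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke's R2 from log-likelihoods.

    ll_null:  log-likelihood of the intercept-only model.
    ll_model: log-likelihood of the model including the predictor(s).
    n:        number of observations.
    """
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)   # upper bound of Cox-Snell R2
    return cox_snell / max_cox_snell

# Toy example: 100 observations with a balanced binary outcome.
ll0 = 100 * math.log(0.5)               # intercept-only log-likelihood
r2 = nagelkerke_r2(ll0, -60.0, 100)     # hypothetical fitted log-likelihood
```

A model that fits no better than the intercept-only model gives 0, and a model with log-likelihood 0 (perfect fit) gives 1.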