Xinyuan Dong, Yu-Ru Su, Richard Barfield, Stephanie A Bien, Qianchuan He, Tabitha A Harrison, Jeroen R Huyghe, Temitope O Keku, Noralane M Lindor, Clemens Schafmayer, Andrew T Chan, Stephen B Gruber, Mark A Jenkins, Charles Kooperberg, Ulrike Peters, Li Hsu.
Abstract
Genome-wide association studies (GWAS) have successfully identified tens of thousands of genetic variants associated with various phenotypes, but together they explain only a fraction of heritability, suggesting that many variants have yet to be discovered. Recently it has been recognized that incorporating functional information on genetic variants can improve power for identifying novel loci. For example, S-PrediXcan and TWAS test the association of predicted gene expression with phenotypes based on GWAS summary statistics by leveraging information on the genetic regulation of gene expression, and they have found many novel loci. However, because genetic variants may affect more than one gene and act through different mechanisms, these methods likely capture only part of the total effects of these variants. In this paper, we propose a summary statistics-based mixed effects score test (sMiST) that tests for the total effect, combining the effect of the mediator, obtained by imputing genetically predicted gene expression as in S-PrediXcan and TWAS, with the direct effects of individual variants. It allows for multiple functional annotations and multiple genetically predicted mediators. It can also perform conditional association analysis while adjusting for other genetic variants (e.g., known loci for the phenotype). Extensive simulation and real data analyses demonstrate that sMiST yields p-values that agree well with those obtained from individual level data but with substantially improved computational speed. Importantly, a broad application of sMiST to GWAS is possible, as only summary statistics of genetic variant associations are required. We apply sMiST to a large-scale GWAS of colorectal cancer using summary statistics from ∼120,000 study participants and gene expression data from the Genotype-Tissue Expression (GTEx) project. We identify several novel and secondary independent genetic loci.
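The idea of testing predicted gene expression from summary statistics alone can be illustrated with a minimal sketch. It is not the sMiST implementation: it only shows the standard burden-type Z-score used by S-PrediXcan-style tests, assuming standardized genotypes so that the reference-panel LD matrix is a correlation matrix. The function name `predicted_expression_z` and its argument names are hypothetical, and the variance component and the combined test are not covered.

```python
import math


def predicted_expression_z(z_snp, weights, ld):
    """Burden-type Z-score for genetically predicted expression of one gene.

    z_snp   : per-variant GWAS Z-scores (beta / standard error), length p
    weights : eQTL prediction weights for the gene, length p
    ld      : p x p variant correlation (LD) matrix from a reference panel

    Assumption (not from the source): genotypes are standardized, so the
    statistic is w'z / sqrt(w' R w), with R the LD correlation matrix.
    """
    p = len(z_snp)
    if len(weights) != p or len(ld) != p or any(len(row) != p for row in ld):
        raise ValueError("z_snp, weights and ld must have matching dimensions")

    # Numerator: weighted sum of the per-variant Z-scores.
    numer = sum(w * z for w, z in zip(weights, z_snp))

    # Denominator: variance of the weighted sum under the null, w' R w.
    var = sum(weights[i] * ld[i][j] * weights[j]
              for i in range(p) for j in range(p))
    if var <= 0:
        raise ValueError("w' R w must be positive; check weights and LD matrix")

    z_gene = numer / math.sqrt(var)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z_gene) / math.sqrt(2.0))
    return z_gene, p_value


# Toy example with two uncorrelated variants (made-up numbers).
z_gene, p_value = predicted_expression_z(
    z_snp=[2.0, 1.0],
    weights=[1.0, 1.0],
    ld=[[1.0, 0.0], [0.0, 1.0]],
)
```

Only per-variant Z-scores, prediction weights, and a reference LD matrix enter the calculation, which is why methods of this kind can be applied broadly without individual level data.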
Year: 2020 PMID: 32833970 PMCID: PMC7470748 DOI: 10.1371/journal.pgen.1008947
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Novel CRC-associated genes and secondary genes.
| Gene | R2 | N SNPs | chr | Pred Exp (unadjusted) | Var Comp (unadjusted) | sMiST (unadjusted) | Pred Exp (adjusted) | Var Comp (adjusted) | sMiST (adjusted) |
| | 0.35 | 52 | 3 | 0.96 | 1.92e-06 | 3.96e-06 | 0.95 | 2.03e-06 | 4.38e-06 |
| | 0.25 | 36 | 17 | 0.25 | 1.42e-06 | 2.89e-06 | 0.29 | 9.12e-07 | 1.99e-06 |
| | 0.04 | 12 | 22 | 0.99 | 1.41e-06 | 3.19e-06 | 0.99 | 1.41e-06 | 3.19e-06 |
| Gene | R2 | N SNPs | chr | Pred Exp (unadjusted) | Var Comp (unadjusted) | sMiST (unadjusted) | Pred Exp (adjusted) | Var Comp (adjusted) | sMiST (adjusted) |
| | 0.06 | 36 | 13 | 0.48 | 1.13e-06 | 2.36e-06 | 0.43 | 2.39e-05 | 5.45e-05 |
1 The unadjusted and adjusted p-values are computed without and with adjustment for the known CRC loci on the same chromosome as the gene.
2 Column names: R2 is the proportion of variation in gene expression explained by eQTLs in the PrediXcan model; N SNPs is the number of variants in the gene; chr is the chromosome number; Pred Exp is the p-value for predicted gene expression; Var Comp is the p-value for the variance component; sMiST is the combined p-value of the predicted gene expression and variance component tests, obtained using an optimally weighted linear combination.
Fig 1. Scatter plots of −log10(p-values) for testing the mediation effect and variance component for sMiST compared with individual level data based MiST in the presence of confounding.
Fig 2. Performance of sMiST when there are two mediators.
Fig 3. Comparison of −log10(p-values) from summary statistics based sMiST-Score, sMiST-Standardized Score, and sMiST-Wald vs. individual level data based MiST under the complete null hypothesis (top panel) and under the alternative hypothesis (bottom panel).
Fig 4. Comparison of sMiST using summary statistics from GECCO and LD matrices from CORECT with MiST with individual level data in GECCO.
Fig 5. Effect of sample sizes in calculating the genotype covariance matrix on the mediation and variance component p-values for sMiST without regularization (top panel) and with regularization (bottom panel).
Fig 6. Scatter plots of −log10(p-values) from summary statistics-based methods sMiST mediation, S-PrediXcan, and TWAS vs. −log10(p-values) based on individual level data.