Yu Jiang, Sai Chen, Daniel McGuire, Fang Chen, Mengzhen Liu, William G Iacono, John K Hewitt, John E Hokanson, Kenneth Krauter, Markku Laakso, Kevin W Li, Sharon M Lutz, Matthew McGue, Anita Pandit, Gregory J M Zajac, Michael Boehnke, Goncalo R Abecasis, Scott I Vrieze, Xiaowei Zhan, Bibo Jiang, Dajiang J Liu.
Abstract
Meta-analysis of genetic association studies increases sample size and the power for mapping complex traits. Existing methods are mostly developed for datasets without missing values, i.e. the summary association statistics are measured for all variants in contributing studies. In practice, genotype imputation is not always effective. This may be the case when targeted genotyping/sequencing assays are used or when the un-typed genetic variant is rare. Therefore, contributed summary statistics often contain missing values. Existing methods for imputing missing summary association statistics and using imputed values in meta-analysis, approximate conditional analysis, or simple strategies such as complete case analysis all have theoretical limitations. Applying these approaches can bias genetic effect estimates and lead to seriously inflated type-I or type-II errors in conditional analysis, which is a critical tool for identifying independently associated variants. To address this challenge and complement imputation methods, we developed a method to combine summary statistics across participating studies and consistently estimate joint effects, even when the contributed summary statistics contain large amounts of missing values. Based on this estimator, we proposed a score statistic called PCBS (partial correlation based score statistic) for conditional analysis of single-variant and gene-level associations. Through extensive analysis of simulated and real data, we showed that the new method produces well-calibrated type-I errors and is substantially more powerful than existing approaches. We applied the proposed approach to one of the largest meta-analyses to date for the cigarettes-per-day phenotype. Using the new method, we identified multiple novel independently associated variants at known loci for tobacco use, which were otherwise missed by alternative methods. 
Together, the phenotypic variance explained by these variants was 1.1%, improving that of previously reported associations by 71%. These findings illustrate the extent of locus allelic heterogeneity and can help pinpoint causal variants.
Year: 2018 PMID: 30016313 PMCID: PMC6063450 DOI: 10.1371/journal.pgen.1007452
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Power and type I errors of meta-analysis of single variant tests in the presence of missing data for continuous outcomes.
Datasets were simulated according to the genetic and phenotype model described in METHODS. Meta-analysis was performed to combine 20 cohorts with 1500 individuals each. For each replicate, summary association statistics were generated, and a certain fraction of the generated summary statistics were masked as missing. Scenarios with different combinations of known variant effects, candidate variant effects and fractions of missingness were considered. Six analysis strategies were considered: 1) PCBS; 2) SYN+; 3) ImpG+Meta; 4) COJO; 5) DISCARD and 6) REPLACE0. Type I error and power were evaluated using 10^5 replicates under the significance threshold of α = 0.005.
| Conditioned Variant Effect | Candidate Variant Effect | Fraction of Missing Data | PCBS | SYN+ | ImpG+Meta | COJO | DISCARD | REPLACE0 | Analyze the Full Dataset [Gold Standard] |
|---|---|---|---|---|---|---|---|---|---|
| Type I Error | |||||||||
| 0.04 | 0 | 0.1 | 5.0 × 10−3 | 4.4 × 10−3 | 5.2 × 10−3 | 0.065 | 4.1 × 10−3 | 9.5 × 10−3 | 4.9 × 10−3 |
| 0.04 | 0 | 0.3 | 5.4 × 10−3 | 4.0 × 10−3 | 0.015 | 0.57 | 3.8 × 10−3 | 0.14 | 5.4 × 10−3 |
| 0.04 | 0 | 0.5 | 5.2 × 10−3 | 3.5 × 10−3 | 0.021 | 0.61 | 1.8 × 10−3 | 0.46 | 5.1 × 10−3 |
| 0.08 | 0 | 0.1 | 5.0 × 10−3 | 3.0 × 10−3 | 9.3 × 10−3 | 0.25 | 2.0 × 10−3 | 0.025 | 4.8 × 10−3 |
| 0.08 | 0 | 0.3 | 5.6 × 10−3 | 1.7 × 10−3 | 0.12 | 0.61 | 2.0 × 10−3 | 0.45 | 4.4 × 10−3 |
| 0.08 | 0 | 0.5 | 5.2 × 10−3 | 1.3 × 10−3 | 0.25 | 0.65 | 9.3 × 10−4 | 0.60 | 4.9 × 10−3 |
| Power | |||||||||
| 0.04 | 0.04 | 0.1 | 0.22 | 0.20 | - | - | 0.092 | - | 0.22 |
| 0.04 | 0.04 | 0.3 | 0.21 | 0.18 | - | - | 0.021 | - | |
| 0.04 | 0.04 | 0.5 | 0.20 | 0.17 | - | - | 4.5 × 10−3 | - | |
| 0.08 | 0.04 | 0.1 | 0.21 | 0.17 | - | - | 0.063 | - | 0.21 |
| 0.08 | 0.04 | 0.3 | 0.21 | 0.12 | - | - | 0.013 | - | |
| 0.08 | 0.04 | 0.5 | 0.19 | 0.11 | - | - | 3.2 × 10−3 | - | |
| 0.04 | 0.08 | 0.1 | 0.88 | 0.87 | - | - | 0.57 | - | 0.88 |
| 0.04 | 0.08 | 0.3 | 0.87 | 0.85 | - | - | 0.12 | - | |
| 0.04 | 0.08 | 0.5 | 0.86 | 0.83 | - | - | 0.017 | - | |
| 0.08 | 0.08 | 0.1 | 0.88 | 0.84 | - | - | 0.49 | - | 0.88 |
| 0.08 | 0.08 | 0.3 | 0.86 | 0.76 | - | - | 0.083 | - | |
| 0.08 | 0.08 | 0.5 | 0.83 | 0.74 | - | - | 0.011 | - | |
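The type I error and power entries above are empirical rejection rates over simulated replicates. A minimal sketch of how such a rate and its Monte Carlo uncertainty could be computed is given below. It assumes the replicate count in the caption is 10^5, and the function names (`rejection_rate`, `monte_carlo_se`, `is_consistent_with_alpha`) are illustrative rather than taken from the paper's software.

```python
import math

def rejection_rate(p_values, alpha=0.005):
    """Fraction of replicates whose p-value falls below alpha."""
    rejected = sum(1 for p in p_values if p < alpha)
    return rejected / len(p_values)

def monte_carlo_se(rate, n_replicates):
    """Binomial standard error of an empirical rejection rate."""
    return math.sqrt(rate * (1.0 - rate) / n_replicates)

def is_consistent_with_alpha(rate, alpha, n_replicates, z=1.96):
    """True if the empirical rate lies within z standard errors of alpha.

    The standard error is evaluated at the nominal level alpha, i.e. under
    the hypothesis that the test is perfectly calibrated.
    """
    return abs(rate - alpha) <= z * monte_carlo_se(alpha, n_replicates)

if __name__ == "__main__":
    n = 10**5  # assumed replicate count
    print(monte_carlo_se(0.005, n))                    # about 2.2e-4
    print(is_consistent_with_alpha(5.0e-3, 0.005, n))  # PCBS, first row
    print(is_consistent_with_alpha(0.065, 0.005, n))   # COJO, first row
```

Under this assumption the standard error at α = 0.005 is roughly 2.2 × 10−4, so a reported rate such as 0.065 lies far outside sampling noise, whereas rates close to 5 × 10−3 do not. The table values are rounded, so checks near the boundary should be read loosely.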
Power and type I errors of meta-analysis of gene-level tests in the presence of missing data.
Datasets were simulated according to the genetic and phenotype model described in METHODS. Within the gene region, 20% of the variant sites are deemed causal. Meta-analysis was performed to combine 10 cohorts with 2000 individuals each. For each replicate, summary association statistics were generated, and a certain fraction (10%, 30% or 50%) of the generated summary statistics were masked as missing. Scenarios with different combinations of known variant effect, candidate variant effects and fractions of missingness were considered. To evaluate the power loss due to missing data, we also analyzed the full dataset as a gold standard. Type I errors and power were evaluated for three rare variant tests (simple burden, SKAT and VT) using 1 million replicates under the significance threshold of α = 0.005.
| Conditioned Variant Effect | Candidate Variant Effect | Fraction of Missing Data | PCBS: Type I Error/Power for Burden/SKAT/VT (α = 0.005) | Analyze the Full Dataset [Gold Standard]: Type I Error/Power for Burden/SKAT/VT (α = 0.005) |
|---|---|---|---|---|
| 0.05 | 0 | 0.1 | 4.5 × 10−3/3.1 × 10−3/3.8 × 10−3 | 4.8 × 10−3/4.1 × 10−3/4.5 × 10−3 |
| 0.05 | 0 | 0.3 | 4.7 × 10−3/4.4 × 10−3/3.4 × 10−3 | 4.7 × 10−3/4.4 × 10−3/6.0 × 10−3 |
| 0.05 | 0 | 0.5 | 6.4 × 10−3/4.0 × 10−3/3.4 × 10−3 | 4.7 × 10−3/5.0 × 10−3/4.4 × 10−3 |
| 0.1 | 0 | 0.1 | 3.3 × 10−3/2.6 × 10−3/4.9 × 10−3 | 5.3 × 10−3/5.9 × 10−3/5.3 × 10−3 |
| 0.1 | 0 | 0.3 | 6.0 × 10−3/4.7 × 10−3/4.1 × 10−3 | 4.7 × 10−3/5.4 × 10−3/4.1 × 10−3 |
| 0.1 | 0 | 0.5 | 6.3 × 10−3/6.7 × 10−3/6.3 × 10−3 | 5.8 × 10−3/5.9 × 10−3/4.9 × 10−3 |
| 0.05 | 0.1 | 0.1 | 0.21/0.21/0.19 | 0.22/0.23/0.21 |
| 0.05 | 0.1 | 0.3 | 0.19/0.19/0.17 | |
| 0.05 | 0.1 | 0.5 | 0.17/0.16/0.14 | |
| 0.1 | 0.1 | 0.1 | 0.22/0.22/0.20 | |
| 0.1 | 0.1 | 0.3 | 0.20/0.20/0.18 | |
| 0.1 | 0.1 | 0.5 | 0.17/0.16/0.14 | |
| 0.05 | 0.2 | 0.1 | 0.59/0.60/0.58 | 0.60/0.61/0.59 |
| 0.05 | 0.2 | 0.3 | 0.57/0.57/0.55 | |
| 0.05 | 0.2 | 0.5 | 0.54/0.53/0.52 | |
| 0.1 | 0.2 | 0.1 | 0.59/0.60/0.58 | |
| 0.1 | 0.2 | 0.3 | 0.58/0.58/0.56 | |
| 0.1 | 0.2 | 0.5 | 0.54/0.53/0.52 | |
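For readers unfamiliar with the gene-level tests named in the table, the simple burden test can be written directly in terms of per-variant score statistics and their covariance matrix. The sketch below is the generic textbook form for complete data, not the PCBS procedure and not its handling of missing summary statistics. The function name `burden_test` and its arguments are hypothetical.

```python
import math

def burden_test(scores, cov, weights=None):
    """Weighted burden score test from per-variant score statistics.

    scores:  list of score statistics U_j, one per variant in the gene
    cov:     covariance matrix V of the scores (list of lists)
    weights: per-variant weights w_j (defaults to 1 for every variant)
    Returns (z, p), where p is a two-sided normal p-value.
    """
    m = len(scores)
    w = weights if weights is not None else [1.0] * m
    # Burden statistic: weighted sum of scores, w'U.
    numerator = sum(w[j] * scores[j] for j in range(m))
    # Its variance under the null: w'Vw.
    variance = sum(w[i] * cov[i][j] * w[j]
                   for i in range(m) for j in range(m))
    z = numerator / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

if __name__ == "__main__":
    # Toy input: three uncorrelated variants with equal score variance.
    z, p = burden_test([2.0, 1.0, 1.0],
                       [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]])
    print(z, p)
```

In a conditional gene-level analysis, `scores` and `cov` would be replaced by their counterparts adjusted for the conditioned variants. How those are estimated when some summary statistics are missing is the contribution of the paper and is not reproduced here.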
Independently associated variants identified using sequential forward selection with PCBS method.
Sequential conditional analyses for the 9 loci were conducted, where we iteratively performed conditional analysis, conditioning on the top variants from earlier rounds. Top association signals at each iteration are shown. The sequential conditional analysis stops when the top association signal is no longer significant under the genome-wide significance threshold α = 5 × 10−8.
| RS | REF | ALT | AF | PVALUE | BETA | SE | N | ANNO | POS | GENE | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Locus rs2072659 Marginal association analysis | |||||||||||
| rs2072659 | C | G | 0.1 | 1.9 × 10−8 | -0.041 | 7.3 × 10−3 | 134862 | Utr3 | |||
| Locus rs550432263 Marginal association analysis | |||||||||||
| rs550432263 | G | A | 2.8 × 10−6 | 3.6 × 10−8 | 71 | 13 | 34858 | Intergenic | |||
| Locus rs9366836 Marginal association analysis | |||||||||||
| rs9366836 | A | G | 0.17 | 3.3 × 10−8 | 0.028 | 5.2 × 10−3 | 134862 | Intron | |||
| Locus rs215600 Marginal association analysis | |||||||||||
| rs215600 | G | A | 0.64 | 4.8 × 10−11 | -0.027 | 4.0 × 10−3 | 134862 | Intron | |||
| Locus rs58379124 Marginal association analysis | |||||||||||
| rs58379124 | T | C | 0.77 | 4.4 × 10−14 | 0.035 | 4.6 × 10−3 | 134862 | Intron | |||
| Locus rs1217106 Marginal association analysis | |||||||||||
| rs1217106 | A | G | 0.78 | 2.2 × 10−9 | -0.028 | 4.4 × 10−3 | 134862 | Intergenic | |||
| Locus rs56116178 Marginal association analysis | |||||||||||
| rs56116178 | A | G | 0.11 | 2.5 × 10−9 | 0.038 | 6.3 × 10−3 | 134862 | Intergenic | |||
| Locus rs11852372 Marginal association analysis | |||||||||||
| rs11852372 | A | C | 0.34 | 7.7 × 10−115 | 0.096 | 4.2 × 10−3 | 128249 | Intron | |||
| Conditional on rs11852372 | |||||||||||
| rs1317286 | A | G | 0.34 | 1.7 × 10−22 | 0.027 | 2.8 × 10−3 | 128249 | Intron | |||
| Conditional on rs11852372 and rs1317286 | |||||||||||
| rs7181245 | C | T | 0.21 | 2.5 × 10−13 | -0.032 | 4.4 × 10−3 | 128249 | Intron | |||
| Conditional on rs11852372, rs1317286 and rs7181245 | |||||||||||
| rs8040868 | T | C | 0.40 | 2.2 × 10−11 | 0.020 | 2.9 × 10−3 | 128249 | Synonymous | |||
| Conditional on rs11852372, rs1317286, rs7181245 and rs8040868 | |||||||||||
| rs2089162 | A | G | 0.33 | 3.5 × 10−8 | 0.011 | 2.0 × 10−3 | 128249 | Intron | |||
| Locus rs56113850 Marginal association analysis | |||||||||||
| rs56113850 | T | C | 0.58 | 6.6 × 10−67 | 0.070 | 4.0 × 10−3 | 128249 | Intron | |||
| Conditional on rs56113850 | |||||||||||
| rs117824460 | A | G | 0.029 | 6.2 × 10−23 | -0.13 | 0.013 | 128249 | Intergenic | |||
| Conditional on rs56113850 and rs117824460 | |||||||||||
| rs117540499 | G | A | 0.023 | 2.4 × 10−17 | -0.11 | 0.013 | 128249 | Intergenic | |||
| Conditional on rs56113850, rs117824460 and rs117540499 | |||||||||||
| rs7246742 | T | G | 0.13 | 1.9 × 10−8 | -0.033 | 5.9 × 10−3 | 128249 | Intergenic | |||
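The sequential forward selection described in the caption can be sketched as a loop that picks the top signal, conditions on it, and stops once nothing passes the threshold. The sketch below uses the standard conditional score test on complete summary statistics (score vector and covariance matrix). This is an assumption for illustration: it is not the PCBS statistic, and it does not handle missing values. All names are hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def forward_selection(scores, cov, alpha=5e-8, max_steps=10):
    """Sequential conditional analysis on complete summary statistics.

    scores: score statistics U_j; cov: their covariance matrix V.
    Returns [(variant_index, conditional_p), ...] in selection order.
    """
    m = len(scores)
    u = list(scores)
    v = [list(row) for row in cov]
    selected, chosen = [], set()
    for _ in range(max_steps):
        best, best_p = None, 1.0
        for j in range(m):
            # Skip selected variants and those with no residual variance
            # (e.g. in perfect LD with an already selected variant).
            if j in chosen or v[j][j] <= 1e-12:
                continue
            p = two_sided_p(u[j] / math.sqrt(v[j][j]))
            if p < best_p:
                best, best_p = j, p
        # Stop when the top signal is no longer significant.
        if best is None or best_p >= alpha:
            break
        selected.append((best, best_p))
        chosen.add(best)
        # Condition on the selected variant k (Schur complement update):
        # U <- U - V[:,k] U_k / V_kk,  V <- V - V[:,k] V[k,:] / V_kk
        k = best
        col = [v[i][k] for i in range(m)]
        uk, vkk = u[k], v[k][k]
        u = [u[i] - col[i] * uk / vkk for i in range(m)]
        v = [[v[i][j] - col[i] * col[j] / vkk for j in range(m)]
             for i in range(m)]
    return selected

if __name__ == "__main__":
    # Toy locus: variants 0 and 1 are correlated (r = 0.5), variant 2 is independent.
    V = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(forward_selection([10.0, 6.0, 7.0], V))
```

In the toy input, variant 1 is significant marginally but its signal is explained by variant 0 once that variant is conditioned on, so only variants 0 and 2 are selected. Applying the update once per selected variant is equivalent to conditioning jointly on all variants selected so far.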