Wan-Yu Lin1,2, Wei J Chen3,4,5,6, Chih-Min Liu7, Hai-Gwo Hwu3,7, Steven A McCarroll8,9,10, Stephen J Glatt11, Ming T Tsuang12,13.
Abstract
Multi-marker association tests can be more powerful than single-locus analyses because they aggregate the variant information within a gene/region. However, combining the association signals of multiple markers within a gene/region may cause noise due to the inclusion of neutral variants, which usually compromises the power of a test. To reduce noise, the "adaptive combination of P-values" (ADA) method removes variants with larger P-values. However, when both rare and common variants are considered, it is not optimal to truncate variants according to their P-values. An alternative summary measure, the Bayes factor (BF), is defined as the ratio of the probability of the data under the alternative hypothesis to that under the null hypothesis. The BF quantifies the "relative" evidence supporting the alternative hypothesis. Here, we propose an "adaptive combination of Bayes factors" (ADABF) method that can be directly applied to variants with a wide spectrum of minor allele frequencies. The simulations show that ADABF is more powerful than single-nucleotide polymorphism (SNP)-set kernel association tests and burden tests. We also analyzed 1,109 case-parent trios from the Schizophrenia Trio Genomic Research in Taiwan. Three genes on chromosome 19p13.2 were found to be associated with schizophrenia at the suggestive significance level of 5 × 10−5.
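The abstract names the adaptive combination but does not spell out the algorithm. The following minimal Python sketch shows one way such a procedure could work, under stated assumptions: variants are ranked by their BF, the k largest log(BF) values are summed for every truncation point k, each sum is compared with sums from resampled (null) data, and the smallest P-value over k is itself calibrated against the same resamples. The function name `adaptive_combination`, the use of log(BF) sums, and the toy inputs are illustrative assumptions, not the authors' implementation.

```python
import random


def adaptive_combination(obs_log_bf, null_log_bf):
    """Hypothetical sketch of an adaptive combination of Bayes factors.

    obs_log_bf  : per-variant log(BF) values for the observed data
    null_log_bf : list of such lists, one per resampled (null) data set,
                  each with the same number of variants
    Returns a resampling P-value for the gene/region.
    """
    def cumulative_top_k(log_bfs):
        # S_k = sum of the k largest log(BF) values, for k = 1..m
        sums, running = [], 0.0
        for value in sorted(log_bfs, reverse=True):
            running += value
            sums.append(running)
        return sums

    # Pool the observed data (index 0) with the null replicates.
    pool = [cumulative_top_k(obs_log_bf)]
    pool += [cumulative_top_k(rep) for rep in null_log_bf]
    n, m = len(pool), len(obs_log_bf)

    # For every replicate: smallest P-value over the truncation points k.
    min_p = []
    for stats in pool:
        p_k = [sum(other[k] >= stats[k] for other in pool) / n
               for k in range(m)]
        min_p.append(min(p_k))

    # Adjust for having picked the best k: compare min-P across the pool.
    return sum(p <= min_p[0] for p in min_p) / n


# Toy usage with made-up log(BF) values (assumption: 5 variants, 199 resamples).
rng = random.Random(1)
null = [[rng.gauss(-0.5, 1.0) for _ in range(5)] for _ in range(199)]
print(adaptive_combination([5.0, 4.0, -1.0, -1.0, -1.0], null))
```

Because the truncation point is chosen by the data, the final comparison of minimum P-values is what keeps the test calibrated; skipping it would report the most favorable k without correction.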
Year: 2017 PMID: 29066733 PMCID: PMC5654754 DOI: 10.1038/s41598-017-13177-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Type I error rates in 1,000,000 simulation replications.
| Significance level | ADABF | ADABF1 | ADA | TK | TLC |
|---|---|---|---|---|---|
| 2,000 case-parent trios | | | | | |
| | 0.05012 | 0.05023 | 0.05041 | 0.05099 | 0.05062 |
| | 0.00919 | 0.00925 | 0.00908 | 0.01033 | 0.01016 |
| | 2 × 10−6 | 10−6 | 10−6 | 10−6 | 2 × 10−6 |
| 1,000 unrelated cases and 1,000 unrelated controls | | | | | |
| | 0.04999 | 0.04999 | 0.04785 | 0.04983 | 0.05030 |
| | 0.00915 | 0.00919 | 0.00879 | 0.00994 | 0.01017 |
| | 2 × 10−6 | 3 × 10−6 | 2 × 10−6 | 10−6 | 2 × 10−6 |
Figure 1. Simulation results of the case-parent trios. Top row: OR = 1.5 for a deleterious allele and OR = 0.67 for a protective allele; bottom row: OR = 1.25 for a deleterious allele and OR = 0.8 for a protective allele. Left column: all causal variants were deleterious; right column: ~50% of the causal variants were deleterious and the other ~50% were protective. The x-axis shows the number of causal variants, whereas the y-axis shows the power (given a significance level of 2.5 × 10−6).
Figure 2. Simulation results of the unrelated cases and controls. Top row: OR = 1.5 for a deleterious allele and OR = 0.67 for a protective allele; bottom row: OR = 1.25 for a deleterious allele and OR = 0.8 for a protective allele. Left column: all causal variants were deleterious; right column: ~50% of the causal variants were deleterious and the other ~50% were protective. The x-axis shows the number of causal variants, whereas the y-axis shows the power (given a significance level of 2.5 × 10−6).
Average computation time (in seconds) for each test in our simulations.
| Number of causal variants | ADABF* | ADABF1* | ADA* | TK | TLC |
|---|---|---|---|---|---|
| 2,000 case-parent trios | | | | | |
| | 0.313 | 0.312 | 0.318 | 2.665 | 2.290 |
| | 56.225 | 55.136 | 52.148 | | |
| | 89.127 | 88.149 | 85.243 | | |
| 1,000 unrelated cases and 1,000 unrelated controls | | | | | |
| | 0.128 | 0.127 | 0.125 | 0.086 | 0.084 |
| | 36.572 | 35.581 | 33.236 | | |
| | 64.145 | 64.132 | 63.259 | | |
*We used the sequential resampling approach[26] to compute the significance for ADABF, ADABF1, and ADA. The minimum and maximum numbers of resampling replicates were set to 10^2 and 10^7, respectively.
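The footnote gives only the lower and upper bounds on the number of resampling replicates. As a rough illustration of why computation time grows with the strength of the signal, here is a minimal Python sketch of one common sequential scheme: start with the minimum number of replicates and multiply by ten until enough null statistics exceed the observed one or the maximum is reached. The stopping rule (`min_exceed`), the tenfold schedule, and the function name are assumptions for illustration and are not taken from reference [26].

```python
import random


def sequential_resampling_pvalue(observed, draw_null_stat,
                                 min_resamples=10**2, max_resamples=10**7,
                                 min_exceed=10, rng=None):
    """Hypothetical sequential Monte Carlo P-value.

    draw_null_stat(rng) must return one test statistic simulated under
    the null hypothesis. Returns (p_value, number_of_resamples_used).
    """
    rng = rng or random.Random()
    total = 0      # resamples drawn so far
    exceed = 0     # null statistics at least as large as the observed one
    target = min_resamples
    while True:
        while total < target:
            if draw_null_stat(rng) >= observed:
                exceed += 1
            total += 1
        # Stop once the estimate is stable enough (assumed rule) or the cap is hit.
        if exceed >= min_exceed or target >= max_resamples:
            return (exceed + 1) / (total + 1), total
        target = min(target * 10, max_resamples)


# Toy usage: a standard-normal null and a capped budget to keep the run short.
rng = random.Random(0)
print(sequential_resampling_pvalue(0.0, lambda r: r.gauss(0, 1),
                                   max_resamples=10**4, rng=rng))
```

Under such a scheme, regions with no signal stop after the first batch, while strongly associated regions run to the cap, which is consistent with the longer times reported for ADABF, ADABF1, and ADA when causal variants are present.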
Rejection rates when analyzing Q4 (no causal variants exist) and Q1 (causal variants exist) in the GAW 17 data
| Trait | Analysis gene | Causal percentage1 | No. of common causal variants (MAF and effect size)2 | The mean effect size of causal variants3 | Rejection rates4 | ADABF | ADABF1 | ADA | TK | TLC | VW-TOW |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Q4 | All 3205 genes | 0 | 0 | 0 | Type I error rates | 1.7 × 10−5 | 1.7 × 10−5 | 1.9 × 10−5 | 1.0 × 10−4 | 1.1 × 10−4 | 1.2 × 10−5 |
| Q1 | | | 1 (MAF = 16.5%, | | Power | 0.915 | 0.845 | 0.955 | 0.525 | 0.965 | 0.940 |
| | | | 2 (MAF = 6.7%, | | | 1.000 | 1.000 | 1.000 | 0.975 | 0.815 | 1.000 |
| | | | 0 | | | 0.130 | 0.130 | 0.080 | 0.180 | 0.090 | 0.000 |
|
|
| 1.975 | 2.035 | 1.680 | 1.870 | 1.940 | |||||
1Causal percentage = #(causal variants)/#(total variants).
2Following Ionita-Laza et al.[4], here, we define variants with MAF as common, where n = 697 is the sample size in the GAW 17 data. The effect size, β, is the displacement in mean levels of Q1 for each copy of the minor allele[27].
3 is the arithmetic mean of the β’s for the causal variants in the gene.
4The rejection rates given the significance level = , where 3205 is the number of genes in the GAW 17 data set. When analyzing Q4, in which no causal variants were simulated, the rejection rates were type I error rates. When analyzing Q1, which was influenced by certain causal variants, the rejection rates represented power.
Figure 3. Manhattan plots of the Schizophrenia Trio Genomic Research in Taiwan (S-TOGET) data. Red lines indicate the genome-wide significance levels, i.e., 2.5 × 10−6 for the gene-based analyses and 5 × 10−8 for the single-locus analysis. Blue lines mark the suggestive significance levels, i.e., 5 × 10−5 for the gene-based analyses and 10−6 for the single-locus analysis. The three points surpassing the suggestive significance threshold represent the signals of the three genes (EVI5L, PRR36, and LYPLA2P2), although only the most significant gene (PRR36) is labeled.
Three genes on chromosome 19p13.2 detected to be associated with schizophrenia at the suggestive significance level of 5 × 10−5.
| Gene | Chr. | Analysis region1 (Base pairs) | #(variants) | ADABF2 | ADABF12 | ADA2 | TK | TLC |
|---|---|---|---|---|---|---|---|---|
| EVI5L | 19 | 7865161–7959862 | 11 | 1.82 × 10−5 | 1.77 × 10−5 | 1.95 × 10−5 | 2.40 × 10−5 | 0.05741 |
| PRR36 | 19 | 7903605–7969326 | 8 | 1.27 × 10−5 | 1.21 × 10−5 | 1.35 × 10−5 | 1.71 × 10−5 | 0.01646 |
| LYPLA2P2 | 19 | 7913504–7975117 | 8 | 2.61 × 10−5 | 2.39 × 10−5 | 3.14 × 10−5 | 2.70 × 10−5 | 0.00015 |
1The analysis regions were based on the human GRCh37/hg19 assembly. Following Song et al.[38], we also grouped the variants within ± 30 kb flanking regions of a gene into a multi-marker analysis.
2The P-values of ADABF, ADABF1, and ADA were obtained with 10^7 resampling replicates.
The 13 SNPs in the EVI5L-PRR36-LYPLA2P2 region.
| SNP | Position (bp) | EVI5L | PRR36 | LYPLA2P2 | Alleles | MAF2 | b | c | OR3 | χ² | P-value | BF4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs12980113 | 7868715 | V | | | T/C | 0.442 | 583 | 519 | 1.123 | 3.72 | 0.05386 | 1.58 |
| rs580984 | 7881030 | V | | | G/A | 0.483 | 607 | 484 | 1.254 | 13.87 | 0.000196 | 161.52 |
| rs4804827 | 7898541 | V | | | T/C | 0.032 | 67 | 73 | 0.918 | 0.26 | 0.6121 | 0.70 |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| rs537188 | 7921623 | V | V | V | A/G | 0.102 | 203 | 215 | 0.944 | 0.34 | 0.5572 | 0.51 |
| rs747990 | 7931525 | V | V | V | A/G | 0.430 | 481 | 604 | 0.796 | 13.94 | 0.000188 | 167.14 |
| | | | | | | | | | | | | |
| rs483808 | 7957481 | V | V | V | C/T | 0.419 | 514 | 551 | 0.933 | 1.29 | 0.2569 | 0.53 |
| rs533822 | 7959480 | V | V | V | G/A | 0.450 | 552 | 525 | 1.051 | 0.68 | 0.4107 | 0.40 |
| exm1417450 | 7963948 | | V | V | A/G | 0.097 | 208 | 181 | 1.149 | 1.87 | 0.171 | 0.95 |
| rs4804833 | 7970635 | | | V | A/G | 0.411 | 496 | 526 | 0.943 | 0.88 | 0.348 | 0.45 |
1The analysis for the EVI5L gene contained 11 variants spanning from 7865161 to 7959862 base pair (bp), and the four SNPs shown in bold type were prioritized by ADABF, ADABF1, and ADA. The analysis for the PRR36 gene included 8 variants from 7903605 to 7969326 bp, and rs1651016, rs555609, and rs525420 were prioritized. The analysis for the LYPLA2P2 gene contained 8 variants from 7913504 to 7975117 bp, and rs555609 and rs525420 were prioritized.
2The minor allele frequencies (MAFs) were calculated according to the founder genotypes.
3The odds ratio of the minor allele compared with the major allele, b/c, where b is the number of transmissions of the minor allele from heterozygous parents to affected offspring, and c is the number of transmissions of the major allele.
4The prior distribution of log(ORs) was assumed to be a normal distribution with a mean of 0 and a standard deviation of 0.2.
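The footnotes define the odds ratio as b/c and give the prior on log(OR), but not the formulas behind the test statistic or the BF. The Python sketch below computes, from the transmission counts b and c, the odds ratio, the McNemar-type statistic (b − c)²/(b + c) with its 1-df P-value, and an approximate Bayes factor. The BF formula is an assumption: a normal-approximation (Wakefield-style) BF with estimate log(b/c), variance 1/b + 1/c, and prior variance 0.2². With these inputs it comes out close to the tabulated values for the rows checked, for example about 161.5 for rs580984. The function name `tdt_summary` is hypothetical.

```python
import math


def tdt_summary(b, c, prior_sd=0.2):
    """Summaries for one SNP from transmission counts of heterozygous parents.

    b: transmissions of the minor allele; c: transmissions of the major allele.
    Returns (odds_ratio, chi_square, p_value, bayes_factor).
    """
    odds_ratio = b / c
    chi_square = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))

    # Assumed normal approximation: log(OR) estimate and its variance.
    beta_hat = math.log(odds_ratio)
    v = 1.0 / b + 1.0 / c
    w = prior_sd ** 2  # prior variance of log(OR), N(0, 0.2^2) per the footnote
    z2 = beta_hat ** 2 / v
    # Approximate BF of the alternative versus the null hypothesis.
    bayes_factor = math.sqrt(v / (v + w)) * math.exp(0.5 * z2 * w / (v + w))
    return odds_ratio, chi_square, p_value, bayes_factor


# Counts for rs580984 as listed in the table (b = 607, c = 484).
print(tdt_summary(607, 484))
```

Under this approximation the BF depends on the variance of the estimate as well as on the test statistic, so two SNPs with the same P-value but different allele frequencies can receive different BFs.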
Figure 4. Ranking by Bayes factor vs. P-value. We performed 200,000 simulations to compare the rankings of a causal variant using the Bayes factor (x-axis) and the P-value (y-axis). The chromosomal region included ~150 rare or common variants, and one of these variants was specified as the causal variant. The scatter plot was stratified according to the MAF of the causal variant. The black line in each plot represents x = y.
Ranking of a causal variant (a smaller rank is better) in 200,000 replications.
| | | | | |
|---|---|---|---|---|
| Mean rank of the causal variant by BF | 17.9 | 16.6 | 10.3 | 4.1 |
| Mean rank of the causal variant by P-value | 50.3 | 29.8 | 12.0 | 4.1 |
| # (replications where BF ranking was superior)* | 46,027 | 33,271 | 13,665 | 1,551 |
| # (replications where BF ranking was identical to P-value ranking)* | 773 | 4,747 | 25,832 | 47,160 |
| # (replications where BF ranking was inferior)* | 3,200 | 11,982 | 10,503 | 1,289 |
| Total | 50,000 | 50,000 | 50,000 | 50,000 |
*Let R_BF and R_P be the ranking of the causal variant by the BF and P-value, respectively. The following three outcomes could be obtained: (1) the BF ranking was superior if R_BF < R_P; (2) the BF ranking was identical to the P-value ranking if R_BF = R_P; and (3) the BF ranking was inferior if R_BF > R_P.
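To make the comparison of rankings concrete, the short Python sketch below ranks the SNPs whose rows survive in the EVI5L-PRR36-LYPLA2P2 table, once by P-value (smallest first) and once by BF (largest first). This is only an illustration on observed data: none of these SNPs is known to be causal, and the helper name `ranks` is hypothetical.

```python
# (SNP, MAF, P-value, BF) for the rows of the SNP table available in this copy.
snps = [
    ("rs12980113", 0.442, 0.05386, 1.58),
    ("rs580984", 0.483, 0.000196, 161.52),
    ("rs4804827", 0.032, 0.6121, 0.70),
    ("rs537188", 0.102, 0.5572, 0.51),
    ("rs747990", 0.430, 0.000188, 167.14),
    ("rs483808", 0.419, 0.2569, 0.53),
    ("rs533822", 0.450, 0.4107, 0.40),
    ("exm1417450", 0.097, 0.171, 0.95),
    ("rs4804833", 0.411, 0.348, 0.45),
]


def ranks(records, key, reverse=False):
    """Map each SNP name to its 1-based rank under the given sort key."""
    ordered = sorted(records, key=key, reverse=reverse)
    return {rec[0]: i + 1 for i, rec in enumerate(ordered)}


rank_by_p = ranks(snps, key=lambda r: r[2])                  # small P first
rank_by_bf = ranks(snps, key=lambda r: r[3], reverse=True)   # large BF first

for name, maf, _, _ in snps:
    print(f"{name:12s} MAF={maf:.3f} "
          f"rank by BF={rank_by_bf[name]} rank by P={rank_by_p[name]}")
```

The two orderings agree for the strongest signals but diverge for the low-frequency SNP rs4804827 (MAF 0.032), which ranks fifth by BF and ninth by P-value among these nine SNPs.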