Abstract
BACKGROUND: The power of genome-wide association studies (GWAS) begins to decline when the minor allele frequency (MAF) falls below 0.05. Here, we proposed the use of Cohen's h for detecting disease-associated rare variants. Cohen's h is generated by the arcsine square-root transformation of MAFs, and the variance-stabilizing effect of this transformation contributed to the statistical power of rare-variant analysis. We re-analyzed two published datasets, one microarray-based and one sequencing-based, and used simulation to compare the performance of Cohen's h with that of the risk difference (RD) and odds ratio (OR).
Year: 2014 PMID: 25294186 PMCID: PMC4198687 DOI: 10.1186/1471-2164-15-875
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
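The three effect sizes compared in the abstract can be sketched as follows. This is a minimal illustration using the standard textbook definitions (Cohen's h as the difference of arcsine square-root transformed proportions), not necessarily the authors' exact implementation. The function names are hypothetical; the two MAF values are taken from the rs17042882 row of the replication table.

```python
import math


def risk_difference(maf_cases, maf_controls):
    """Risk difference: plain difference of allele frequencies."""
    return maf_cases - maf_controls


def odds_ratio(maf_cases, maf_controls):
    """Allelic odds ratio; undefined when maf_controls is 0 or either MAF is 1."""
    odds_cases = maf_cases / (1 - maf_cases)
    odds_controls = maf_controls / (1 - maf_controls)
    return odds_cases / odds_controls


def cohens_h(maf_cases, maf_controls):
    """Cohen's h: difference of 2*arcsin(sqrt(p)) between the two groups."""
    def phi(p):
        return 2 * math.asin(math.sqrt(p))
    return phi(maf_cases) - phi(maf_controls)


# MAFs as printed (rounded) for rs17042882 in the replication table.
maf_controls, maf_cases = 0.028, 0.061

rd = risk_difference(maf_cases, maf_controls)
or_ = odds_ratio(maf_cases, maf_controls)
h = cohens_h(maf_cases, maf_controls)
print(f"RD={rd:.3f}  OR={or_:.3f}  h={h:.3f}")
```

For this row the rounded MAFs give values that agree with the tabulated RD, OR, and Cohen's h to three decimals. Rows whose printed MAFs are heavily rounded will not necessarily agree as closely.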
Biases, MSEs and type I error rates for RD, Cohen’s h and OR
| ES | Type of SNP | No. SNPs | Bias | MSE | Min | Max | Type I error rate |
|---|---|---|---|---|---|---|---|
| RD | Common | 360839 | 0.00004 | 0.00012 | -0.046 | 0.044 | 0.050 |
| RD | Rare | 52220 | 0.00002 | 0.00001 | -0.015 | 0.015 | 0.051 |
| Cohen’s h | Common | 360839 | 0.00008 | 0.00068 | -0.103 | 0.097 | 0.050 |
| Cohen’s h | Rare | 52220 | 0.00018 | 0.00072 | -0.091 | 0.092 | 0.056 |
| log(OR) | Common | 360839 | 0.00021 | 0.00477 | -0.346 | 0.346 | 0.050 |
| log(OR) | Rare | 52220 | 0.00178 | 0.11395 | -2.707 | 2.739 | 0.048 |
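The summary columns above can be read with the usual definitions of bias, mean squared error, and empirical type I error rate. The sketch below assumes the true effect is zero under the null and uses made-up numbers purely for illustration; how the authors generated their null estimates is not stated in this excerpt.

```python
def bias(estimates, true_value=0.0):
    """Mean deviation of the estimates from the true effect."""
    return sum(e - true_value for e in estimates) / len(estimates)


def mse(estimates, true_value=0.0):
    """Mean squared deviation of the estimates from the true effect."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)


def type_i_error_rate(p_values, alpha=0.05):
    """Fraction of null tests rejected at the nominal level alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


# Hypothetical null estimates and p-values (not data from the study).
null_estimates = [0.02, -0.01, 0.00, 0.03, -0.04]
null_p_values = [0.40, 0.03, 0.75, 0.20, 0.61, 0.09, 0.52, 0.88, 0.31, 0.97]

print(bias(null_estimates), mse(null_estimates), type_i_error_rate(null_p_values))
```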
Figure 1. Mean of empirical type I error rates for risk difference (RD), Cohen’s h and log(OR) in each autosome.
Empirical power for tests at nominal level 0.05 based on 1000 replicates
| Fixed gene length (bp) | Mean no. of rare SNPs | Adjustment | RD | OR | Cohen’s h | CMC | WSS | VT |
|---|---|---|---|---|---|---|---|---|
| 250 | 46.2 | | | | | 0.491 | 0.644 | 0.501 |
| | | Unadj. | 0.142 | 0.107 | 0.178 | | | |
| | | BH | 0.042 | 0.037 | 0.051 | | | |
| | | Bonf. | 0.037 | 0.030 | 0.043 | | | |
| 500 | 96 | | | | | 0.878 | 0.931 | 0.882 |
| | | Unadj. | 0.465 | 0.393 | 0.521 | | | |
| | | BH | 0.106 | 0.080 | 0.135 | | | |
| | | Bonf. | 0.087 | 0.064 | 0.113 | | | |
| 1000 | 187.5 | | | | | 0.992 | 0.998 | 0.992 |
| | | Unadj. | 0.584 | 0.509 | 0.652 | | | |
| | | BH | 0.136 | 0.109 | 0.162 | | | |
| | | Bonf. | 0.121 | 0.083 | 0.141 | | | |
| 2000 | 377.7 | | | | | 1 | 1 | 1 |
| | | Unadj. | 0.880 | 0.814 | 0.918 | | | |
| | | BH | 0.254 | 0.194 | 0.306 | | | |
| | | Bonf. | 0.211 | 0.143 | 0.256 | | | |
| 5000 | 944.3 | | | | | 1 | 1 | 1 |
| | | Unadj. | 0.973 | 0.94 | 0.987 | | | |
| | | BH | 0.370 | 0.265 | 0.451 | | | |
| | | Bonf. | 0.305 | 0.191 | 0.388 | | | |
Unadj.: Without adjustment for multiple testing. BH: Benjamini-Hochberg procedure. Bonf.: Bonferroni correction.
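The two multiple-testing adjustments named in the footnote can be sketched with their textbook definitions. The p-values below are hypothetical, and how per-SNP rejections were combined into the reported power figures is not stated in this excerpt.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject the hypotheses with the max_k smallest p-values.
    reject = [False] * m
    for idx in order[:max_k]:
        reject[idx] = True
    return reject


# Hypothetical per-SNP p-values within one gene (not data from the study).
p_values = [0.001, 0.012, 0.021, 0.045, 0.30]
print(bonferroni_reject(p_values))
print(benjamini_hochberg_reject(p_values))
```

With these inputs Bonferroni rejects only the smallest p-value, while Benjamini-Hochberg rejects the three smallest, which illustrates why the BH rows show somewhat higher power than the Bonf. rows.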
Figure 2. Relationship between power and required sample size based on OR and Cohen’s h for rare SNPs. (A) Line plot showing the power estimated by OR (bold line) and Cohen’s h (dotted line) at the same threshold. (B) Power curves for a fixed OR = 3 with the corresponding Cohen’s h at varying MAF in controls. (C) Power ratio at varying MAF in controls and varying sample size.
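The link between a fixed OR, the corresponding Cohen's h, and power in panel (B) can be illustrated with the standard normal approximation for a two-sided test on h, where the variance of each transformed proportion is roughly 1/n. This is a sketch under that assumption, not the authors' power calculation; the function names and the sample size of 1000 alleles per group are hypothetical.

```python
import math
from statistics import NormalDist


def maf_cases_from_or(odds_ratio, maf_controls):
    """MAF in cases implied by a given allelic OR and MAF in controls."""
    odds = odds_ratio * maf_controls / (1 - maf_controls)
    return odds / (1 + odds)


def cohens_h(p1, p2):
    """Difference of arcsine square-root transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def approx_power(h, n_cases, n_controls, alpha=0.05):
    """Approximate two-sided power; n_cases and n_controls count alleles."""
    se = math.sqrt(1 / n_cases + 1 / n_controls)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z = abs(h) / se
    return NormalDist().cdf(z - z_crit) + NormalDist().cdf(-z - z_crit)


# Fixed OR = 3 at several control MAFs, 1000 alleles per group (assumed).
results = []
for maf_controls in (0.005, 0.01, 0.03):
    h = cohens_h(maf_cases_from_or(3.0, maf_controls), maf_controls)
    results.append((maf_controls, h, approx_power(h, 1000, 1000)))

for maf_controls, h, power in results:
    print(f"MAF controls={maf_controls}: h={h:.3f}, power={power:.3f}")
```

At a fixed OR the implied h grows with the control MAF, so the approximate power grows as well, which is the pattern a power curve at fixed OR would be expected to show.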
Figure 3. Manhattan plot showing the significance of association between all rare SNPs and CAD. For all panels, the genome-wide significance threshold of 0.05/403,089 is shown. Distributions of -log10 p-values for (A) risk difference, (B) Cohen’s h and (C) log(OR).
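As a quick arithmetic check, the threshold quoted in the caption and its position on a -log10 axis can be computed directly. The variable names are illustrative only.

```python
import math

alpha = 0.05
n_tests = 403_089  # number of tests quoted in the Figure 3 caption

# Bonferroni-style genome-wide threshold and its height on a Manhattan plot.
threshold = alpha / n_tests
threshold_height = -math.log10(threshold)

print(f"threshold = {threshold:.3e}, -log10(threshold) = {threshold_height:.3f}")
```

The threshold is about 1.24 × 10⁻⁷, so a SNP clears the line on the plot when its -log10 p-value exceeds roughly 6.9.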
Significant SNPs for CAD under different ES measures at genome-wide significance levels
| Type of SNPs | Significant SNPs | OR | RD | Cohen’s h |
|---|---|---|---|---|
| Common | Number | 26 | 26 | 26 |
| Common | Median | 1.327 | 0.065 | 0.134 |
| Common | Range | (0.757, 6.104) | (-0.063, 0.354) | (-0.129, 0.795) |
| Rare | Number | 9 | 13 | 18 |
| Rare | Median | 2.144 | 0.021 | 0.119 |
| Rare | Range | (1.88, 2.41) | (-0.009, 0.038) | (-0.167, 0.179) |
Replication of rare SNPs showing statistically significant effect at genome-wide significance levels (1.2 × 10⁻⁷) for CAD
| Chr | ES | SNP | Nearest gene or SNP | Location | MAF in controls | MAF in cases | OR | P-value (OR) | RD | P-value (RD) | Cohen’s h | P-value (Cohen’s h) | Exact test P-value ᵇ | Association of SNP or proxy with other cardiovascular phenotypes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Genes within associated interval | | | | | | | | | | | | | | |
| 3 | All | rs17042882 | PLCL2 | 3p24.3 | 0.028 | 0.061 | 2.255 | 4.88 × 10⁻¹⁵ | 0.033 | 1.11 × 10⁻¹⁵ | 0.163 | 4.00 × 10⁻¹⁵ | --- | Heart failure, Arthritis |
| 3 | h | rs16827563 | VEPH1 | 3q24-q25 | 0.005 | 0 | NA | NA | -0.005 | 2.18 × 10⁻⁵ | -0.119 | 1.02 × 10⁻⁸ | 7.3 × 10⁻⁷ | Carotid artery disease, Diabetes Mellitus |
| 7 | RD | rs17146094 | EIF4H | 7q11.23 | 0.017 | 0.034 | 2.036 | 1.27 × 10⁻⁷ | 0.017 | 7.15 × 10⁻⁸ | 0.109 | 1.32 × 10⁻⁷ | --- | CAD |
| 8 | All | rs16891338 | SAMD12-AS1 | 8q24.12 | 0.023 | 0.043 | 1.908 | 4.11 × 10⁻⁸ | 0.02 | 2.50 × 10⁻⁸ | 0.113 | 4.66 × 10⁻⁸ | --- | Blood Pressure |
| 8 | All | rs16908145 | FLJ45872 | 8q24.23 | 0.022 | 0.043 | 1.998 | 6.54 × 10⁻⁹ | 0.021 | 3.46 × 10⁻⁹ | 0.12 | 7.08 × 10⁻⁹ | --- | |
| 15 | RD, h | rs7163007 | MAP2K5 | 15q23 | 0.002 | 0.011 | 5.551 | 2.13 × 10⁻⁷ | 0.009 | 5.33 × 10⁻⁹ | 0.121 | 5.85 × 10⁻⁹ | --- | BMI, Diabetes Mellitus |
| 16 | All | rs16955238 | GAN | 16q24.1 | 0.022 | 0.046 | 2.143 | 8.91 × 10⁻¹¹ | 0.024 | 3.41 × 10⁻¹¹ | 0.135 | 8.53 × 10⁻¹¹ | --- | Cholesterol |
| 16 | h | rs7197337 | ANKRD26P1 | 16q11.2 | 0.006 | 0 | NA | NA | -0.006 | 2.88 × 10⁻⁶ | -0.132 | 1.76 × 10⁻¹⁰ | 2.3 × 10⁻⁸ | |
| 19 | All | rs11671119 | MEF2B MEF2NB | 19p13.11 | 0.033 | 0.071 | 2.239 | 0 | 0.038 | 0 | 0.174 | 0 | --- | Diabetes Mellitus |
| SNPs near associated SNPs within 500 kb | | | | | | | | | | | | | | |
| 1 | RD, h | rs6674781 | rs6671793 | 2ᵃ | 0.002 | 0.011 | 5.55 | 2.13 × 10⁻⁷ | 0.009 | 5.33 × 10⁻⁹ | 0.121 | 5.85 × 10⁻⁹ | --- | Coronary disease |
| 3 | h | rs17064749 | rs7615788 | 10ᵃ | 0.008 | 0.001 | 0.124 | 8.36 × 10⁻⁵ | -0.007 | 2.84 × 10⁻⁶ | -0.116 | 2.28 × 10⁻⁸ | 4.2 × 10⁻⁷ | Cholesterol |
| 3 | h | rs10510375 | rs1450097 | 400ᵃ | 0.009 | 0.001 | 0.11 | 2.97 × 10⁻⁵ | -0.008 | 4.03 × 10⁻⁷ | -0.127 | 9.67 × 10⁻¹⁰ | 2.2 × 10⁻⁸ | Cholesterol, HDL |
| 3 | h | rs6805861 | rs10510197 | 250ᵃ | 0.007 | 0 | NA | NA | -0.007 | 3.84 × 10⁻⁷ | -0.145 | 2.92 × 10⁻¹² | 1.2 × 10⁻⁹ | Cholesterol, HDL |
| 4 | All | rs890447 | rs97669522 | 25ᵃ | 0.043 | 0.078 | 1.883 | 6.49 × 10⁻¹³ | 0.035 | 3.09 × 10⁻¹³ | 0.148 | 8.35 × 10⁻¹³ | --- | CAD |
| 5 | All | rs159171 | rs10520872 | 500ᵃ | 0.025 | 0.055 | 2.27 | 6.88 × 10⁻¹⁴ | 0.03 | 1.62 × 10⁻¹⁴ | 0.156 | 5.51 × 10⁻¹⁴ | --- | Cholesterol, LDL |
| 5 | h | rs41349146 | rs2431337 | 500ᵃ | 0.007 | 0 | NA | NA | -0.007 | 3.84 × 10⁻⁷ | -0.145 | 2.92 × 10⁻¹² | 1.2 × 10⁻⁹ | Arteries |
| 6 | h | rs41518850 | rs12190287 | 300ᵃ | 0.006 | 0 | NA | NA | -0.006 | 2.88 × 10⁻⁶ | -0.132 | 1.76 × 10⁻¹⁰ | 2.3 × 10⁻⁸ | CAD |
| 6 | h | rs4398751 | rs9397922 | 150ᵃ | 0.005 | 0 | NA | NA | -0.005 | 2.18 × 10⁻⁵ | -0.119 | 1.02 × 10⁻⁸ | 7.3 × 10⁻⁷ | Lipoprotein |
| 8 | All | rs16883114 | rs10503973 | 200ᵃ | 0.021 | 0.041 | 1.993 | 1.57 × 10⁻⁸ | 0.02 | 8.57 × 10⁻⁹ | 0.117 | 1.69 × 10⁻⁸ | --- | Cholesterol, LDL |
| 9 | RD, h | rs12343115 | rs2149998 | 300ᵃ | 0.009 | 0 | NA | NA | -0.009 | 6.97 × 10⁻⁹ | -0.167 | 6.66 × 10⁻¹⁶ | 3.6 × 10⁻¹² | Myocardial Infarction |
| 18 | All | rs41477147 | rs10502528 | 150ᵃ | 0.028 | 0.065 | 2.413 | 0 | 0.037 | 0 | 0.179 | 0 | --- | Arteries |
| (rs1595963) | | | | | | | | | | | | | | |
| 21 | h | rs7276641 | rs2829644 | 300ᵃ | 0.01 | 0.002 | 0.198 | 2.50 × 10⁻⁵ | -0.008 | 2.81 × 10⁻⁶ | -0.111 | 8.91 × 10⁻⁸ | --- | Coronary disease |
ᵃ Denotes the physical distance (in kb) to the nearest validated SNP. ᵇ Fisher’s exact test is only required when the asymptotic assumption does not hold. NA: not available; Chr., chromosome; MAF, minor allele frequency; location according to NCBI Build 37.5. Association of SNP or proxy with other cardiovascular phenotypes was based on the HuGE Navigator database (http://hugenavigator.net/HuGENavigator/startPagePubLit.do), dbSNP (NCBI website: http://www.ncbi.nlm.nih.gov/projects/SNP/) and MalaCards (http://www.malacards.org/pages/whatsmalacards).
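The ES column appears to record which effect-size measures reach genome-wide significance for each SNP. The sketch below assumes that reading and a threshold of 0.05/403,089; the helper name is hypothetical, and the p-values are copied from four rows of the table above, with NA represented as `None`.

```python
THRESHOLD = 0.05 / 403_089  # genome-wide threshold, about 1.24e-7


def significant_measures(p_or, p_rd, p_h, threshold=THRESHOLD):
    """Label which of OR, RD, and Cohen's h fall below the threshold."""
    candidates = (("OR", p_or), ("RD", p_rd), ("h", p_h))
    hits = [name for name, p in candidates if p is not None and p < threshold]
    return "All" if len(hits) == 3 else ", ".join(hits)


# (P-value for OR, P-value for RD, P-value for Cohen's h) per SNP.
rows = {
    "rs17042882": (4.88e-15, 1.11e-15, 4.00e-15),
    "rs16827563": (None, 2.18e-5, 1.02e-8),
    "rs17146094": (1.27e-7, 7.15e-8, 1.32e-7),
    "rs7163007": (2.13e-7, 5.33e-9, 5.85e-9),
}

labels = {snp: significant_measures(*p) for snp, p in rows.items()}
print(labels)
```

For these four SNPs the derived labels agree with the ES entries in the table (All, h, RD, and RD, h respectively). Note that rs17146094 is labeled RD only because its OR and Cohen's h p-values sit just above the threshold.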
Figure 4. Scatter plot of OR and Cohen’s h for rare and common SNPs in CAD. (A) Common SNPs and (B) rare SNPs. Although the OR criteria appear to flag more outliers, more of the outliers identified by the Cohen’s h criteria were statistically significant.
Proportions of SNPs with mild, moderate, and large effect for CAD GWAS data
| Type of SNPs | No. of SNPs | Mild effect (%), OR | Mild effect (%), Cohen’s h | Moderate effect (%), OR | Moderate effect (%), Cohen’s h | Large effect (%), OR | Large effect (%), Cohen’s h |
|---|---|---|---|---|---|---|---|
| Common | 360839 | 99.453 | 99.936 | 0.546 | 0.063 | 0.001 | 0.001 |
| Rare | 52220 | 69.772 | 73.416 | 23.932 | 26.505 | 6.296 | 0.079 |
For common SNPs, the respective OR thresholds for mild, moderate and large effect were |log(OR)| ≤ log(1.2), log(1.2) < |log(OR)| ≤ log(1.5), and |log(OR)| > log(1.5), whereas the respective Cohen’s h thresholds were |h| ≤ 0.075, 0.075 < |h| ≤ 0.15, and |h| > 0.15. For rare SNPs, the respective OR thresholds were |log(OR)| ≤ log(1.5), log(1.5) < |log(OR)| ≤ log(2), and |log(OR)| > log(2), whereas the respective Cohen’s h thresholds were |h| ≤ 0.05, 0.05 < |h| ≤ 0.1, and |h| > 0.1.
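The cut-offs in this footnote translate directly into a small grading helper. The function and dictionary names below are hypothetical; the cut-off values are the ones stated in the footnote, and the example inputs are effect sizes that appear in the tables above.

```python
import math

# (upper bound for mild, upper bound for moderate) per SNP type.
OR_CUTOFFS = {"common": (1.2, 1.5), "rare": (1.5, 2.0)}
H_CUTOFFS = {"common": (0.075, 0.15), "rare": (0.05, 0.1)}


def _grade(magnitude, mild_max, moderate_max):
    """Map an absolute effect size onto mild, moderate, or large."""
    if magnitude <= mild_max:
        return "mild"
    if magnitude <= moderate_max:
        return "moderate"
    return "large"


def grade_or(odds_ratio, snp_type):
    """Grade an odds ratio on the |log(OR)| scale."""
    mild_max, moderate_max = OR_CUTOFFS[snp_type]
    return _grade(abs(math.log(odds_ratio)), math.log(mild_max), math.log(moderate_max))


def grade_h(h, snp_type):
    """Grade Cohen's h on the |h| scale."""
    mild_max, moderate_max = H_CUTOFFS[snp_type]
    return _grade(abs(h), mild_max, moderate_max)


print(grade_or(1.883, "rare"), grade_h(0.148, "rare"))
print(grade_or(1.327, "common"), grade_h(0.134, "common"))
```

Working on |log(OR)| treats protective and risk effects symmetrically, so an OR of 0.124 is graded the same way as its reciprocal.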