Abstract
Many risk factors and interventions in epidemiologic and biomedical studies have minuscule effects. To detect such weak associations, one needs a study with a very large sample size (the number of subjects, n). The n of a study can be increased, but unfortunately only to an extent. Here, we propose a novel method that hinges on increasing sample size in a different direction: the total number of variables (p). We construct a p-based 'multiple perturbation test' (MPT) and conduct power calculations and computer simulations to show that it can achieve very high power to detect weak associations when p can be made very large. As a demonstration, we apply the method to a genome-wide association study of age-related macular degeneration and identify two novel genetic variants that are significantly associated with the disease. The p-based method may set the stage for a new paradigm of statistical tests.
Year: 2014 PMID: 24866319 PMCID: PMC4035575 DOI: 10.1038/srep05081
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Power of the MPT for the sharp null (solid lines: theoretical power assuming independent auxiliary variables, with perturbation proportion, from left to right, π = 1.0, 0.2, 0.1 and 0.05) and of the conventional test for the crude null (dashed line), for different numbers of subjects (a: n = 500; b: n = 1,000; c: n = 5,000) and numbers of auxiliary variables (p). The power of the n-based conventional test increases with n, but the gain is only 30 percentage points, from 8% (n = 500, a) to 38% (n = 5,000, c). The power of the p-based MPT increases with p in all scenarios that we considered and surpasses that of the conventional test when p ≈ 3,000 for π = 1, p ≈ 60,000 for π = 0.2, p ≈ 250,000 for π = 0.1 and p ≈ 1,000,000 for π = 0.05. Under π = 1, the power of the MPT can reach nearly 100% when p is sufficiently large (p > ~1,000,000 when n = 500; p > ~100,000 when n = 1,000; p > ~10,000 when n = 5,000). Under π < 1, ~100% power is also possible if p can be made even larger.
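The slow growth of the n-based power described in the caption can be illustrated with a generic calculation. The sketch below is not the authors' power formula: it assumes a two-sided z-test at α = 0.05 whose noncentrality grows as δ√n, and the function name `two_sided_z_power` and the standardized effect `delta = 0.0235` are hypothetical choices, the latter picked only so that the output lands near the 8% and 38% quoted above.

```python
from statistics import NormalDist


def two_sided_z_power(delta, n, alpha=0.05):
    """Approximate power of a two-sided z-test.

    delta is an assumed standardized per-subject effect (hypothetical,
    not taken from the source); the test statistic is assumed to be
    normal with mean delta * sqrt(n) and unit variance.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    shift = delta * n ** 0.5                    # noncentrality grows as sqrt(n)
    # Probability of rejecting in the upper tail or in the lower tail.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    delta = 0.0235  # hypothetical weak effect, chosen for illustration only
    for n in (500, 1_000, 5_000):
        print(f"n = {n:>5}: power = {two_sided_z_power(delta, n):.3f}")
```

Because the noncentrality scales with √n, a tenfold increase in subjects multiplies it by only about 3.2, which is why a weak effect remains hard to detect even at n = 5,000 under these assumptions.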
Top five SNPs on chromosome 1 with the smallest MPT P-values for the age-related macular degeneration data. The P-value for each SNP is obtained from 500,000 rounds of permutation. To adjust for multiple testing, the false discovery rate (FDR) is controlled at 0.05 and q-values are calculated (QVALUE software; ref. 13).
| Rank | RefSNP (rs) number | Minor allele frequency (%) | P-value of MPT | q-value | Odds ratio | P-value of Pearson chi-square test |
|---|---|---|---|---|---|---|
| 1 | rs2618034 | 7.19 | 4.00 × 10⁻⁶ | 0.026 | 0.53 | 0.201 |
| 2 | rs2014029 | 5.82 | 1.40 × 10⁻⁵ | 0.045 | 2.10 | 0.166 |
| 3 | rs437749 | 43.15 | 2.66 × 10⁻⁴ | 0.357 | 0.94 | 0.865 |
| 4 | rs3753298 | 5.82 | 2.74 × 10⁻⁴ | 0.357 | 1.84 | 0.241 |
| 5 | rs1749409 | 8.97 | 4.28 × 10⁻⁴ | 0.357 | 0.51 | 0.147 |
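The P-values in the table come from permutation. As a rough illustration of that general procedure, the sketch below shuffles case/control labels and counts how often the permuted statistic reaches the observed one. It is a minimal sketch under stated assumptions, not the MPT itself: the statistic `abs_mean_difference` is a simple stand-in, the names `permutation_p_value`, `genotypes` and `is_case` are hypothetical, and the add-one estimator is a common convention that the source does not confirm using.

```python
import random


def abs_mean_difference(genotypes, is_case):
    """Stand-in statistic (NOT the MPT statistic): absolute difference
    in mean genotype between cases and controls."""
    cases = [g for g, c in zip(genotypes, is_case) if c]
    controls = [g for g, c in zip(genotypes, is_case) if not c]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_p_value(genotypes, is_case, statistic, rounds=10_000, seed=0):
    """Estimate a permutation P-value by shuffling case/control labels."""
    rng = random.Random(seed)          # seeded for reproducibility
    observed = statistic(genotypes, is_case)
    labels = list(is_case)             # copy so the caller's list is untouched
    hits = 0
    for _ in range(rounds):
        rng.shuffle(labels)            # break any genotype-label association
        if statistic(genotypes, labels) >= observed:
            hits += 1
    # Add-one estimator (an assumption of this sketch): never returns 0.
    return (hits + 1) / (rounds + 1)


if __name__ == "__main__":
    # Toy data: minor-allele counts (0, 1 or 2) for 12 subjects.
    genotypes = [2, 1, 2, 1, 2, 2, 0, 0, 1, 0, 0, 1]
    is_case = [True] * 6 + [False] * 6
    print(permutation_p_value(genotypes, is_case, abs_mean_difference))
```

The smallest P-value such an estimate can resolve is on the order of one over the number of rounds, which is consistent with the table's smallest P-values (around 10⁻⁶) requiring several hundred thousand permutations.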
Figure 2. Fixation (a–c, for the first to third top SNPs on chromosome 1, respectively) and drifting (d–f, for three purposefully chosen middle-to-bottom-ranking SNPs on chromosome 1) of the MPT P-values when only a given number of perturbation SNPs are randomly incorporated, for the age-related macular degeneration data. Each panel includes three lines (solid, dashed and dotted) representing three random incorporation sequences.
Each P-value is obtained from 1,000,000 rounds of permutation. The P-values fluctuate considerably at first, when few perturbation SNPs have been incorporated. Beyond a certain point, however, the P-values become 'fixed' exactly on the abscissa (P-value = 0) (a and b), or almost so (P-value ≈ 0) (c). By comparison, the P-values of all three purposefully chosen middle-to-bottom-ranking SNPs keep 'drifting' throughout, without showing any sign of fixation (d–f).
Figure 3. Power curve when a researcher first includes the 100 informative variables (I = 0.02) known to him/her and then adds other low-informativity variables unselectively to the MPT (dotted lines, from left to right: I = 0.001, 0.00025 and 0.0001, respectively).