Konstantin Schildknecht, Sven Olek, Thorsten Dickhaus.
Abstract
Epigenetic research leads to complex data structures. Since parametric model assumptions for the distribution of epigenetic data are hard to verify, we introduce a nonparametric statistical framework for two-group comparisons. Furthermore, epigenetic analyses are often performed at many genetic loci simultaneously. Hence, an appropriate multiple testing correction is necessary in order to draw valid conclusions for specific loci. Finally, with technologies available for the simultaneous assessment of many interrelated biological parameters (such as gene arrays), statistical approaches also need to deal with a possibly unknown dependency structure in the data. Our statistical approach to the nonparametric comparison of two samples with independent multivariate observables is based on recently developed multivariate multiple permutation tests. We adapt their theory in order to cope with families of hypotheses regarding relative effects. Our results indicate that the multivariate multiple permutation test keeps the pre-assigned type I error level for the global null hypothesis. In combination with the closure principle, the family-wise error rate for the simultaneous test of the corresponding locus/parameter-specific null hypotheses can be controlled. In applications we demonstrate that group differences in epigenetic data can be detected reliably with our methodology.
Year: 2015 PMID: 25965389 PMCID: PMC4428829 DOI: 10.1371/journal.pone.0125587
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Type I error for the global hypothesis, moderate sample sizes.
| ρ | χ² | Perm | χ² | Perm | χ² | Perm |
|---|---|---|---|---|---|---|
| 0 | 0.0654 | 0.0428 | 0.1034 | 0.0432 | 0.2154 | 0.0480 |
| 0.2 | 0.0668 | 0.0432 | 0.1092 | 0.0478 | 0.2064 | 0.0408 |
| 0.4 | 0.0730 | 0.0488 | 0.1092 | 0.0482 | 0.2092 | 0.0476 |
| 0.6 | 0.0654 | 0.0426 | 0.1012 | 0.0494 | 0.1898 | 0.0468 |
| 0.8 | 0.0628 | 0.0460 | 0.0848 | 0.0410 | 0.1662 | 0.0448 |
Monte Carlo simulation results, based on K = 10,000 repetitions, for the type I error rate of the test of the global hypothesis in the moderate sample size regime (n₁ = 20, n₂ = 30), comparing the asymptotic χ²-based test (χ²) with the permutation test (Perm). The data were generated according to Model 1 with correlation parameter ρ. The nominal significance level was set to α = 5% in all simulations. The permutation test was carried out as a Monte Carlo permutation test employing 9,999 randomly chosen permutations of {1,…,N}, together with the identity permutation.
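The Monte Carlo permutation scheme named in the caption (9,999 random permutations plus the identity permutation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the test statistic (the sum over coordinates of the squared deviation of the estimated relative effect from 1/2) is a simple stand-in for the statistic used in the paper.

```python
import random


def relative_effect(x1, x2):
    """Estimate P(X1 < X2) + 0.5 * P(X1 == X2) from two univariate samples."""
    score = 0.0
    for a in x1:
        for b in x2:
            if a < b:
                score += 1.0
            elif a == b:
                score += 0.5
    return score / (len(x1) * len(x2))


def statistic(group1, group2):
    """Illustrative statistic (an assumption, not the paper's): sum over
    coordinates of the squared deviation of the relative effect from 1/2."""
    d = len(group1[0])
    total = 0.0
    for j in range(d):
        p_hat = relative_effect([row[j] for row in group1],
                                [row[j] for row in group2])
        total += (p_hat - 0.5) ** 2
    return total


def mc_permutation_pvalue(group1, group2, n_random=9999, seed=0):
    """Monte Carlo permutation p-value based on n_random random permutations
    of the pooled observations plus the identity permutation."""
    rng = random.Random(seed)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    observed = statistic(group1, group2)
    count = 1  # the identity permutation (observed data) always counts
    for _ in range(n_random):
        perm = pooled[:]
        rng.shuffle(perm)  # permutes whole observation vectors, keeping
        # the dependency between coordinates intact
        if statistic(perm[:n1], perm[n1:]) >= observed:
            count += 1
    return count / (n_random + 1)
```

Because the identity permutation is always counted, the smallest p-value this scheme can return with 9,999 random permutations is 1/10,000 = 0.0001. Whole observation vectors are shuffled between groups, so no assumption about the dependency between coordinates is needed.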
Type I error for the global hypothesis, large sample sizes.
| ρ | χ² | Perm | χ² | Perm | χ² | Perm |
|---|---|---|---|---|---|---|
| 0 | 0.0527 | 0.0464 | 0.0604 | 0.0448 | 0.0734 | 0.0460 |
| 0.2 | 0.0551 | 0.0456 | 0.0554 | 0.0396 | 0.0772 | 0.0500 |
| 0.4 | 0.0543 | 0.0453 | 0.0590 | 0.0440 | 0.0792 | 0.0476 |
| 0.6 | 0.0520 | 0.0440 | 0.0526 | 0.0396 | 0.0708 | 0.0458 |
| 0.8 | 0.0547 | 0.0486 | 0.0585 | 0.0460 | 0.0640 | 0.0468 |
Monte Carlo simulation results, based on K = 10,000 repetitions, for the type I error rate of the test of the global hypothesis in the large sample size regime (n₁ = 100, n₂ = 150), comparing the asymptotic χ²-based test (χ²) with the permutation test (Perm). The data were generated according to Model 1 with correlation parameter ρ. The nominal significance level was set to α = 5% in all simulations. The permutation test was carried out as a Monte Carlo permutation test employing 9,999 randomly chosen permutations of {1,…,N}, together with the identity permutation.
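When reading the empirical rejection rates in the two tables above, it helps to know how much of the deviation from the nominal 5% could be simulation noise. The sketch below computes the binomial standard error of a rejection rate estimated from K independent repetitions. It assumes the repetitions are independent Bernoulli trials, and the helper names are hypothetical.

```python
import math


def mc_standard_error(rate, repetitions):
    """Binomial standard error of a rejection rate estimated from
    `repetitions` independent simulation runs."""
    return math.sqrt(rate * (1.0 - rate) / repetitions)


def within_two_se(observed, nominal, repetitions):
    """True if the observed rate lies within two standard errors of the
    nominal level (standard error evaluated at the nominal level)."""
    return abs(observed - nominal) <= 2.0 * mc_standard_error(nominal, repetitions)
```

With K = 10,000 and α = 0.05 the standard error is about 0.0022. Under this rough criterion, a tabulated rate such as 0.0527 is compatible with the nominal level, whereas 0.0734 is not.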
Power for rejecting the global hypothesis, moderate sample sizes.
| | Test | 0.5 | 1 | 1.5 | 2 | 2.5 | 3 |
|---|---|---|---|---|---|---|---|
| | χ² | 0.1456 | 0.2696 | 0.4540 | 0.6512 | 0.7948 | 0.8984 |
| | Perm | 0.0682 | 0.1524 | 0.2948 | 0.4964 | 0.6674 | 0.8176 |
| | χ² | 0.1702 | 0.3384 | 0.5834 | 0.7986 | 0.9152 | 0.9700 |
| | Perm | 0.0890 | 0.2016 | 0.4148 | 0.6556 | 0.8270 | 0.9314 |
| | χ² | 0.1976 | 0.4108 | 0.6882 | 0.8780 | 0.9700 | 0.9926 |
| | Perm | 0.1008 | 0.2722 | 0.5354 | 0.7824 | 0.9178 | 0.9736 |
| | χ² | 0.2082 | 0.4744 | 0.7768 | 0.9296 | 0.9882 | 0.9982 |
| | Perm | 0.1098 | 0.3170 | 0.6402 | 0.8592 | 0.9642 | 0.9932 |
| | χ² | 0.2236 | 0.5182 | 0.8168 | 0.9580 | 0.9946 | 0.9992 |
| | Perm | 0.1188 | 0.3560 | 0.6894 | 0.9056 | 0.9806 | 0.9962 |
Monte Carlo simulation results, based on K = 10,000 repetitions, for the power of the test of the global hypothesis in the moderate sample size regime (n₁ = 20, n₂ = 30), comparing the asymptotic χ²-based test (χ²) with the permutation test (Perm). The data were generated according to Model 1 with correlation parameter ρ and d = 5. The nominal significance level was set to α = 5% in all simulations. The permutation test was carried out as a Monte Carlo permutation test employing 9,999 randomly chosen permutations of {1,…,N}, together with the identity permutation.
Power for rejecting the global hypothesis, large sample sizes.
| | Test | 0.5 | 1 | 1.5 | 2 | 2.5 | 3 |
|---|---|---|---|---|---|---|---|
| | χ² | 0.2574 | 0.7732 | 0.9850 | 0.9998 | 1 | 1 |
| | Perm | 0.2228 | 0.7326 | 0.9804 | 0.9996 | 1 | 1 |
| | χ² | 0.3624 | 0.9136 | 0.9990 | 1 | 1 | 1 |
| | Perm | 0.3202 | 0.8938 | 0.9974 | 1 | 1 | 1 |
| | χ² | 0.4494 | 0.9676 | 1 | 1 | 1 | 1 |
| | Perm | 0.4020 | 0.9524 | 0.9998 | 1 | 1 | 1 |
| | χ² | 0.5250 | 0.9848 | 1 | 1 | 1 | 1 |
| | Perm | 0.4760 | 0.9804 | 1 | 1 | 1 | 1 |
| | χ² | 0.5760 | 0.9924 | 1 | 1 | 1 | 1 |
| | Perm | 0.5258 | 0.9900 | 1 | 1 | 1 | 1 |
Monte Carlo simulation results, based on K = 10,000 repetitions, for the power of the test of the global hypothesis in the large sample size regime (n₁ = 100, n₂ = 150), comparing the asymptotic χ²-based test (χ²) with the permutation test (Perm). The data were generated according to Model 1 with correlation parameter ρ and d = 5. The nominal significance level was set to α = 5% in all simulations. The permutation test was carried out as a Monte Carlo permutation test employing 9,999 randomly chosen permutations of {1,…,N}, together with the identity permutation.
Empirical family-wise error rates.
| | Test | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|---|
| | χ² | 0.050 | 0.056 | 0.060 | 0.061 | 0.065 |
| | Perm | 0.021 | 0.024 | 0.032 | 0.036 | 0.049 |
| | χ² | 0.046 | 0.045 | 0.045 | 0.035 | 0.026 |
| | Perm | 0.021 | 0.016 | 0.018 | 0.017 | 0.011 |
| | χ² | 0.028 | 0.030 | 0.033 | 0.029 | 0.024 |
| | Perm | 0.020 | 0.020 | 0.024 | 0.022 | 0.018 |
Monte Carlo simulation results, based on K = 5,000 repetitions, for the family-wise error rate (FWER) of the asymptotic χ²-based multiple test (χ²) and the multiple permutation test (Perm). The data were generated according to Model 1 with correlation parameter ρ and d = 5. The nominal FWER level was set to α = 5% in all simulations. The permutation test was carried out as a Monte Carlo permutation test employing 9,999 randomly chosen permutations of {1,…,N}, together with the identity permutation.
Results for the first real data example.
| Locus | | | | | |
|---|---|---|---|---|---|
| χ² | 0.0046 | 0.0002 | 0.0002 | 0.0002 | 0.0002 |
| Perm | 0.0126 | 0.0029 | 0.0029 | 0.0029 | 0.0029 |
| Locus | | | | | |
| χ² | 0.0002 | 0.0047 | 0.0001 | 0.0002 | 0.0002 |
| Perm | 0.0029 | 0.0146 | 0.0029 | 0.0076 | 0.0029 |
Multiplicity-adjusted p-values of the tests for relative effects for the loci selected at the screening stage, based on the asymptotic χ² multiple test (χ²) and the multiple permutation test (Perm), each in combination with the closure principle. The multiplicity-adjusted p-value for locus ℓ is the smallest significance level at which the null hypothesis for locus ℓ is rejected for the actually observed data. The permutation test was carried out as a Monte Carlo permutation test employing 9,999 randomly chosen permutations of {1,…,N}, together with the identity permutation.
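The closure principle mentioned in the caption leads to a generic recipe for multiplicity-adjusted p-values: the adjusted p-value of an elementary hypothesis is the largest local p-value among all intersection hypotheses that contain it. The sketch below illustrates that recipe only. The function names are hypothetical, and the Bonferroni local test in the usage example is a stand-in (an assumption for illustration); in the setting of the paper, the local tests would be the χ²-based or permutation tests of the intersection hypotheses.

```python
from itertools import combinations


def closure_adjusted_pvalues(indices, local_pvalue):
    """Adjusted p-value for each index: the maximum of local_pvalue(S)
    over all non-empty subsets S of `indices` that contain the index.

    `local_pvalue` maps a frozenset of indices to the p-value of a
    level-alpha test of the corresponding intersection hypothesis.
    """
    indices = list(indices)
    adjusted = {i: 0.0 for i in indices}
    for size in range(1, len(indices) + 1):
        for subset in combinations(indices, size):
            p = local_pvalue(frozenset(subset))
            for i in subset:
                adjusted[i] = max(adjusted[i], p)
    return adjusted


# Usage with a stand-in local test (Bonferroni on the subset):
raw = {"a": 0.01, "b": 0.04, "c": 0.03}  # hypothetical marginal p-values


def bonferroni_local(subset):
    return min(1.0, len(subset) * min(raw[i] for i in subset))


adjusted = closure_adjusted_pvalues(raw, bonferroni_local)
```

With Bonferroni local tests the closure reduces to Holm's procedure, which gives a quick sanity check on the output. The number of intersection hypotheses grows as 2^m − 1, so exhaustive enumeration is only practical for a small number m of loci or parameters, such as the five loci per block in the table above.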
Results for the second real data example.
| Comparison | Test | Treg | tTL | immunoCRIT |
|---|---|---|---|---|
| Cancer indicator: healthy colon versus colorectal cancer | χ² | < 10⁻¹⁶ | 4.926 × 10⁻¹³ | < 10⁻¹⁶ |
| | Perm | 0.0001 | 0.0001 | 0.0001 |
| Cancerogenesis: healthy colon versus early stage cancer | χ² | 5.292 × 10⁻¹² | 0.0024 | < 10⁻¹⁶ |
| | Perm | 0.0001 | 0.0044 | 0.0001 |
| Cancer progression: early stage cancer versus late stage cancer | χ² | 0.9043 | 9.710 × 10⁻⁵ | 0.0002 |
| | Perm | 0.9044 | 0.0005 | 0.0011 |
Multiplicity-adjusted p-values of the tests for relative effects with respect to disease groups for three different immune-relevant parameters, based on the asymptotic χ² multiple test (χ²) and the multiple permutation test (Perm), each in combination with the closure principle. The multiplicity-adjusted p-value for parameter ℓ is the smallest significance level at which the null hypothesis for parameter ℓ is rejected for the actually observed data. The permutation test was carried out as a Monte Carlo permutation test employing 9,999 randomly chosen permutations of {1,…,N}, together with the identity permutation. Treg: number of regulatory T-cells; tTL: total number of T-cells; immunoCRIT: cellular ratio of immune tolerance.