Markus Neuhäuser, Tanja Boes, Karl-Heinz Jöckel.
Abstract
BACKGROUND: One important application of microarray experiments is to identify differentially expressed genes. Often, small and negative expression levels are clipped to an arbitrarily chosen cutoff value before a statistical test is carried out. There are then two types of data: truncated values and original observations. The truncated values are not just another point on the continuum of possible values and, therefore, it is appropriate to combine two statistical tests in a two-part model rather than to use standard statistical methods. A similar situation occurs when DNA methylation data are investigated. In that case, there are null values (undetectable methylation) and observed positive values. For these data, we propose a two-part permutation test.
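The abstract does not spell out the test statistic, so the following is only a minimal sketch of how a two-part permutation test could look. It assumes a common two-part form: a squared z-statistic comparing the proportions of truncated (or null) values, plus a squared pooled-variance t-statistic comparing the remaining observations. The function names, the choice of t-statistic for the second part, and the chi-square reference distribution with 2 degrees of freedom for the non-permutation version are all assumptions made for illustration, not details taken from the text.

```python
import math
import random


def two_part_statistic(x, y, cutoff=0.0):
    """Assumed two-part statistic: squared proportion z plus squared pooled t."""
    n1, n2 = len(x), len(y)
    # Values above the cutoff are treated as original (untruncated) observations.
    pos1 = [v for v in x if v > cutoff]
    pos2 = [v for v in y if v > cutoff]

    # Part 1: compare the proportions of truncated (or null) values.
    z1, z2 = n1 - len(pos1), n2 - len(pos2)
    p_pool = (z1 + z2) / (n1 + n2)
    b2 = 0.0
    if 0.0 < p_pool < 1.0:
        se2 = p_pool * (1.0 - p_pool) * (1.0 / n1 + 1.0 / n2)
        b2 = (z1 / n1 - z2 / n2) ** 2 / se2

    # Part 2: compare the untruncated values (set to 0 if not computable).
    m1, m2 = len(pos1), len(pos2)
    t2 = 0.0
    if m1 >= 1 and m2 >= 1 and m1 + m2 > 2:
        mean1, mean2 = sum(pos1) / m1, sum(pos2) / m2
        ss = sum((v - mean1) ** 2 for v in pos1) + sum((v - mean2) ** 2 for v in pos2)
        var = ss / (m1 + m2 - 2)
        if var > 0.0:
            t2 = (mean1 - mean2) ** 2 / (var * (1.0 / m1 + 1.0 / m2))
    return b2 + t2


def asymptotic_p_value(stat):
    """Upper tail of a chi-square distribution with 2 df (assumed reference)."""
    return math.exp(-stat / 2.0)


def two_part_permutation_test(x, y, cutoff=0.0, n_perm=20000, seed=0):
    """Estimate the permutation p-value from a random sample of permutations."""
    rng = random.Random(seed)
    observed = two_part_statistic(x, y, cutoff)
    pooled = list(x) + list(y)
    n1 = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = two_part_statistic(pooled[:n1], pooled[n1:], cutoff)
        # Small relative tolerance so that floating-point ties count as hits.
        if stat >= observed * (1.0 - 1e-9):
            hits += 1
    return hits / n_perm
```

For methylation-type data one would call `two_part_permutation_test(group1, group2, cutoff=0.0)`; for clipped expression data, `cutoff` would be the clipping value. Comparing the result with `asymptotic_p_value(two_part_statistic(group1, group2))` mirrors the comparison of the two tests reported in the tables.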
Year: 2005 PMID: 15725357 PMCID: PMC551601 DOI: 10.1186/1471-2105-6-35
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Table 1. p-values of the original two-part test and the two-part permutation test for seven regions in lung cancer cell lines; DNA methylation data from Siegmund et al. [4]
| Region | Original two-part test | Two-part permutation test¹ |
¹ Based on simple random samples of 20,000 permutations. With 20,000 permutations, a 95% confidence interval for the p-value is the observed p-value ± 0.007 when p = 0.5, or ± 0.003 when p = 0.05.
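The half-widths quoted in the footnote are consistent with the usual normal-approximation interval for a proportion estimated from independent random permutations. A short check, assuming that approximation and the helper name `mc_half_width` (both chosen here for illustration):

```python
import math


def mc_half_width(p, n_perm, z=1.96):
    """Half-width of a normal-approximation 95% interval for a Monte Carlo p-value."""
    return z * math.sqrt(p * (1.0 - p) / n_perm)


for p in (0.5, 0.05):
    print(p, round(mc_half_width(p, 20000), 3))
```

The same function shows how the precision would change with a different number of permutations, since the half-width shrinks with the square root of `n_perm`.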
Figure 1. Number of genes with the given number of truncations per gene (data from the microarray experiment of Tschentscher et al. [11]; only genes with at least one truncation)
Table 2. Frequencies of p-value size groups for the original two-part test and the two-part permutation test, for genes with at least one truncation; data from the microarray experiment of Tschentscher et al. [11]
| | ≤ 0.001 | > 0.001 and ≤ 0.01 | > 0.01 and ≤ 0.05 | > 0.05 and ≤ 0.1 | > 0.1 |
| ≤ 0.001 | 19 | 0 | 0 | 0 | 0 |
| > 0.001 and ≤ 0.01 | 39 | 56 | 0 | 0 | 0 |
| > 0.01 and ≤ 0.05 | 0 | 50 | 164 | 8 | 0 |
| > 0.05 and ≤ 0.1 | 0 | 0 | 49 | 112 | 17 |
| > 0.1 | 0 | 0 | 0 | 53 | 1648 |
Table 3. Differences between the p-values of the original two-part test and the two-part permutation test for genes with at least one truncation and small p-values (i.e. the p-value of the original test must satisfy the bound given under Condition). A positive difference means that the two-part permutation test has a smaller p-value than the original test. Data from the microarray experiment of Tschentscher et al. [11]
| Condition | Number of remaining genes | Mean difference (± SD) | Median difference | Quartiles of the difference |
| | 514 | 0.0065 (± 0.0126) | 0.0047 | 0.0012, 0.0124 |
| | 336 | 0.0057 (± 0.0064) | 0.0040 | 0.0018, 0.0095 |
| | 114 | 0.0025 (± 0.0018) | 0.0019 | 0.0010, 0.0035 |
| | 19 | 0.0006 (± 0.0003) | 0.0006 | 0.0003, 0.0009 |
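The entries in the "Number of remaining genes" column can be cross-checked against the frequency table of p-value size groups given earlier. The sketch below copies the counts from that table and accumulates them by row and by column. That the rows index the original test is an inference from this check, not something the extracted text states.

```python
# Counts copied from the frequency table of p-value size groups.
freq = [
    [19, 0, 0, 0, 0],      # <= 0.001
    [39, 56, 0, 0, 0],     # > 0.001 and <= 0.01
    [0, 50, 164, 8, 0],    # > 0.01 and <= 0.05
    [0, 0, 49, 112, 17],   # > 0.05 and <= 0.1
    [0, 0, 0, 53, 1648],   # > 0.1
]
thresholds = [0.001, 0.01, 0.05, 0.1]


def cumulative(totals):
    """Running sums of a list of counts."""
    out, running = [], 0
    for t in totals:
        running += t
        out.append(running)
    return out


row_cum = cumulative([sum(row) for row in freq])
col_cum = cumulative([sum(col) for col in zip(*freq)])

for thr, r, c in zip(thresholds, row_cum, col_cum):
    print(f"p <= {thr}: rows {r}, columns {c}")
# Prints, for the rows, 19, 114, 336 and 514, which are the gene counts
# listed in the difference table; the column sums (58, 164, 377, 550) are not.
```

Under this reading, the conditions in the difference table would correspond to original-test p-values of at most 0.1, 0.05, 0.01 and 0.001, from top to bottom.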
Table 4. Results of the simulation study: differences between the p-values of the original two-part test and the two-part permutation test for data sets with small p-values (i.e. p-value of the original test ≤ 0.1). A positive difference means that the two-part permutation test has a smaller p-value than the original test. p1 and p2 are the probabilities of zero values in groups 1 and 2, and the positive values in group 1 are shifted by μ; 5,000 data sets with n1 = n2 = 10 were generated for each configuration
| Configuration | Number of data sets | Mean difference (± SD) | Median difference | Quartiles of the difference |
| | 4538 | 0.0118 (± 0.0145) | 0.0057 | 0.0013, 0.0161 |
| | 4038 | 0.0008 (± 0.0069) | 0.0020 | 0.0008, 0.0033 |
| | 4346 | 0.0010 (± 0.0059) | 0.0019 | 0.0008, 0.0033 |
| | 4270 | 0.0011 (± 0.0056) | 0.0018 | 0.0006, 0.0030 |
| | 4883 | 0.0039 (± 0.0055) | 0.0020 | 0.0005, 0.0051 |
| | 4893 | 0.0040 (± 0.0053) | 0.0022 | 0.0007, 0.0049 |
| | 3427 | -0.0033 (± 0.0119) | 0.0006 | -0.0092, 0.0029 |
| cutoff value = 0.5, μ = 2.5 | 4621 | 0.0059 (± 0.0081) | 0.0034 | 0.0009, 0.0072 |
| cutoff value = 1, μ = 2.5 | 4860 | 0.0017 (± 0.0052) | 0.0013 | 0.0003, 0.0032 |
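The caption describes data sets in which each value is zero with probability p1 or p2 and the positive values in group 1 are shifted by μ. A rough sketch of such a generator follows. The distribution of the positive values is not given in the extracted text, so the standard lognormal used here is purely an assumption, as are the function names and the example parameter values.

```python
import random


def zero_inflated_sample(n, p_zero, shift=0.0, rng=None):
    """Draw n values: 0 with probability p_zero, else a positive value plus shift.

    The lognormal(0, 1) choice for the positive part is an assumption.
    """
    rng = rng or random.Random()
    return [
        0.0 if rng.random() < p_zero else rng.lognormvariate(0.0, 1.0) + shift
        for _ in range(n)
    ]


def clip_to_cutoff(values, cutoff):
    """Replace every value below the cutoff by the cutoff itself."""
    return [max(v, cutoff) for v in values]


# Hypothetical configuration: n1 = n2 = 10, illustrative p1, p2 and shift.
rng = random.Random(42)
group1 = zero_inflated_sample(10, p_zero=0.2, shift=1.0, rng=rng)
group2 = zero_inflated_sample(10, p_zero=0.5, rng=rng)
```

Repeating this 5,000 times per configuration and applying both tests to each pair of groups would give p-value differences of the kind summarised in the table; `clip_to_cutoff` mimics the configurations that specify a cutoff value.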
Table 5. Results of the simulation study: powers of the original two-part test and the two-part permutation test; 5,000 data sets with n1 = n2 = 10 were generated for each configuration (notation as in Table 4, significance level α = 0.05)
| Configuration | Original two-part test | Two-part permutation test |
| | 0.82 | 0.93 |
| | 0.67 | 0.67 |
| | 0.76 | 0.75 |
| | 0.74 | 0.74 |
| | 0.93 | 0.96 |
| | 0.95 | 0.97 |
| | 0.45 | 0.47 |
| cutoff value = 0.5, μ = 2.5 | 0.85 | 0.90 |
| cutoff value = 1, μ = 2.5 | 0.92 | 0.92 |