Wayne Wenzhong Xu, Clay J Carter.
Abstract
BACKGROUND: In microarray gene expression profiling experiments, differentially expressed genes (DEGs) are detected from among tens of thousands of genes on an array using statistical tests. It is important to control the number of false positives, or errors, present in the resulting DEG list. To date, more than 20 different multiple testing methods have been reported that compute overall Type I error rates in microarray experiments. However, these methods share the following dilemma: they have low power when only a small number of DEGs exist among a large number of total genes on the array.
Year: 2010 PMID: 20846437 PMCID: PMC2955048 DOI: 10.1186/1471-2105-11-465
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Parallel multiplicity model versus simultaneous multiplicity model. A and B show the p-value distributions, with p-values on the x-axis and counts of test statistics on the y-axis. C and D are the simultaneous multiplicity tests and parallel multiplicity tests, respectively. H0 and H1 represent the null hypotheses and alternative hypotheses, respectively.
Differentially expressed gene numbers reported by different multiple tests on three data sets.
| Test | Hyperinsulinemic | miRNA knockout | Colorectal cancers |
|---|---|---|---|
| raw p | 46 | 2350 | 10973 |
| PCER | 104 | 4351 | 18425 |
| PFER | 9 | 442 | 2025 |
| Bonferroni | 0 | 144 | 605 |
| Holm | 0 | 144 | 609 |
| Hochberg | 0 | 144 | 609 |
| SidakSD | 0 | 144 | 614 |
| BH | 0 | 407 | 5552 |
| BY | 0 | 227 | 2221 |
| qvalue | 0 | 407 | 6108 |
| SAM | 0 | 0 | 5330 |
| Bayes | 0 | 0 | 5705 |
| EDR | 5 | 593 | 4810 |
The raw CEL files of these three data sets [21,23,25] were downloaded from the NCBI GEO database (GSE7146, GSE7333, GSE4107) and were preprocessed by the GC-RMA method. The two groups in each data set were compared by a two-tailed t-test assuming equal variance. All multiple tests and the raw p-values were applied at the same significance level of α = 0.05.
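To make the pattern in the table concrete (Bonferroni never reports more genes than BH, and BH never more than the raw p-values), the following is a minimal pure-Python sketch that counts rejections at α = 0.05 under three of the listed procedures. It is not the implementation used for the table; the function names and the toy p-values are assumptions made for illustration only.

```python
def count_raw(pvals, alpha=0.05):
    """Count tests whose unadjusted p-value is <= alpha."""
    return sum(p <= alpha for p in pvals)


def count_bonferroni(pvals, alpha=0.05):
    """Count tests with p <= alpha / m (single-step FWER control)."""
    m = len(pvals)
    return sum(p <= alpha / m for p in pvals)


def count_bh(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up: find the largest rank k such that
    the k-th smallest p-value is <= k * alpha / m; reject the k smallest."""
    m = len(pvals)
    k_max = 0
    for k, p in enumerate(sorted(pvals), start=1):
        if p <= k * alpha / m:
            k_max = k
    return k_max


# Hypothetical p-values, not taken from any of the three data sets.
toy_pvals = [0.0001, 0.004, 0.012, 0.02, 0.04, 0.2, 0.5, 0.9]

counts = {
    "raw p": count_raw(toy_pvals),
    "Bonferroni": count_bonferroni(toy_pvals),
    "BH": count_bh(toy_pvals),
}
print(counts)
```

On real arrays with tens of thousands of genes, the Bonferroni threshold α/m becomes very small, which is consistent with the zero counts reported for the hyperinsulinemic data set.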
EDR calculation
| id | Gene | raw-p | x | f | N' | EDR |
|---|---|---|---|---|---|---|
| M72885_rna1_s_at | GOS2 | 0.00018432 | 309.453094 | 5.318301 | 2 | 0.00000028 |
| X15729_s_at | DDX5 | 0.00044674 | 2.773653 | 1.332583 | 4 | 0.00193717 |
| M34516_r_at | IGL@ | 0.00321646 | 10.316779 | 1.807361 | 27 | 0.01042631 |
| D11428_at | PMP22 | 0.00342428 | 20.312435 | 1.910159 | 30 | 0.00555663 |
| HG3514-HT3708_at | Tropomyosin | 0.00437053 | 41.545886 | 1.188196 | 33 | 0.01844629 |
| L20971_at | PDE4B | 0.00494670 | 3.348770 | 1.998387 | 42 | 0.06214142 |
| U33448_s_at | LTB4R | 0.00625588 | 0.582200 | 1.066264 | 53 | 1 |
EDR detection of the Hyperinsulinemic data set. The EDR of gene i is the raw p-value of this gene (p) multiplied by the number of negative gene controls (N'), that is, genes with a p-value equal to or greater than 1 - p, divided by the product of x and (f - 1): EDR = p × N' / (x × (f - 1)). Here x is the ratio of the maximum group mean of this gene to the median value of all genes, and f is the fold change.
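The definition above can be checked against the rows of the EDR table. The sketch below recomputes each EDR from p, N', x, and f. The function name `edr` is hypothetical, and capping the result at 1 is an assumption inferred from the last row of the table (LTB4R), where the uncapped ratio exceeds 1 but the reported value is 1.

```python
def edr(p, n_neg, x, f):
    """EDR = p * N' / (x * (f - 1)), capped at 1.

    p     : raw p-value of the gene
    n_neg : number of negative gene controls (N')
    x     : max group mean of the gene / median of all genes
    f     : fold change
    The cap at 1 is an assumption based on the LTB4R row.
    """
    denom = x * (f - 1.0)
    if denom <= 0:
        return 1.0  # assumption: no detectable change, treat as EDR = 1
    return min(1.0, p * n_neg / denom)


# (gene, raw-p, N', x, f, EDR as reported in the table)
rows = [
    ("GOS2", 0.00018432, 2, 309.453094, 5.318301, 0.00000028),
    ("DDX5", 0.00044674, 4, 2.773653, 1.332583, 0.00193717),
    ("IGL@", 0.00321646, 27, 10.316779, 1.807361, 0.01042631),
    ("PMP22", 0.00342428, 30, 20.312435, 1.910159, 0.00555663),
    ("Tropomyosin", 0.00437053, 33, 41.545886, 1.188196, 0.01844629),
    ("PDE4B", 0.00494670, 42, 3.348770, 1.998387, 0.06214142),
    ("LTB4R", 0.00625588, 53, 0.582200, 1.066264, 1.0),
]

for gene, p, n_neg, x, f, reported in rows:
    computed = edr(p, n_neg, x, f)
    print(f"{gene}: computed={computed:.8f} reported={reported:.8f}")
```

Small differences between computed and reported values are expected because the table inputs are rounded.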
Figure 2. Performance of all multiple tests on three different real data sets. (a) Low-S0 (proportion of changed genes) hyperinsulinemic data. (b) Moderate-S0 miRNA knockout data. (c) High-S0 colorectal cancer data.
TPR and FPR at significance of 0.05
| DEGs | 1070 | 606 | 1420 | 974 | 1734 | 1833 |
| TP | 90 | 63 | 101 | 83 | 106 | 109 |
| FP | 980 | 543 | 1319 | 891 | 1628 | 1724 |
| TN | 3774 | 4238 | 3424 | 3870 | 3110 | 3011 |
| FN | 91 | 118 | 80 | 98 | 75 | 72 |
| TPR | 0.4972 | 0.3481 | 0.5580 | 0.4586 | 0.5856 | 0.6022 |
| FPR | 0.2061 | 0.1136 | 0.2781 | 0.1871 | 0.3436 | 0.3641 |
The expression data set was downloaded from http://www.ambystoma.org and was preprocessed by the RMA method [38]. Differentially expressed genes (DEGs) were detected at the significance level of 0.05 by the EDR method and the other methods from the multtest package [39]. The resultant DEGs were compared with the true DEGs measured by digital expression; those confirmed are true positives (TP). False positives (FP) are DEGs that were not found to be differential in the digital expression analysis. True negatives (TN) are genes that are not differential on either platform.
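The TPR and FPR rows follow from the counts in the table by the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN), and the DEG count is TP + FP. The following sketch recomputes them; the column method names are not available in the table as extracted, so columns are simply numbered 1 to 6 here, and the function name `rates` is hypothetical.

```python
def rates(tp, fp, tn, fn):
    """Return (TPR, FPR) from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # sensitivity: fraction of true DEGs detected
    fpr = fp / (fp + tn)  # fraction of non-DEGs called differential
    return tpr, fpr


# (TP, FP, TN, FN) for the six table columns, in table order.
columns = [
    (90, 980, 3774, 91),
    (63, 543, 4238, 118),
    (101, 1319, 3424, 80),
    (83, 891, 3870, 98),
    (106, 1628, 3110, 75),
    (109, 1724, 3011, 72),
]

for i, (tp, fp, tn, fn) in enumerate(columns, start=1):
    tpr, fpr = rates(tp, fp, tn, fn)
    print(f"column {i}: DEGs={tp + fp} TPR={tpr:.4f} FPR={fpr:.4f}")
```

Each (FPR, TPR) pair computed this way corresponds to one point on an ROC curve.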
Figure 3. Receiver operating characteristic (ROC) curve. The true positive rate (TPR) and false positive rate (FPR) of differentially expressed genes (DEGs) detected by EDR [equation (5)], EDR-n [equation (3)], EDR-i [equation (2)], or other methods were plotted as ROC curves. The microarray data set tested [26] was confirmed by sequence digital expression.
Figure 4. Power comparison of all multiple tests on simulated data sets with different proportions of differentially expressed genes. All FWER methods have the same power and therefore fall on the same line.