Yuan-De Tan, Myriam Fornage, Hongyan Xu.
Abstract
BACKGROUND: Microarray technology provides an efficient means for globally exploring physiological processes governed by the coordinated expression of multiple genes. However, identifying differentially expressed genes in microarray experiments is challenging because of the potentially high type I error rate. Methods for large-scale statistical analyses have been developed, but most of them are applicable only to two-sample or two-condition data.
Year: 2008 PMID: 18325100 PMCID: PMC2323973 DOI: 10.1186/1471-2105-9-142
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Profile of FDR estimates for a series of thresholds. λ1 and λ2 are two threshold functions from simulations 1 and 2 and were used to construct an estimation interval for the estimate of the FDR at threshold Δ_k. λ_k and its estimate are the true and estimated FDRs at threshold Δ_k, respectively, where k = 1, 2, ..., L.
Figure 2. Dot plot of ranked F-values against ranked values from the random splitting approach. Both sets of values were obtained from the simulated microarray data of 3770 genes and then ranked. All ranked dots roughly fall on a diagonal line, as expected for two sets with the same ranked distribution.
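To make the rank-versus-rank comparison in this caption concrete, the following is a minimal sketch that computes a one-way ANOVA F-statistic per gene and pairs two ranked sets of null F-values. It is an illustration under stated assumptions, not the random splitting approach itself: both sets are drawn from independent null simulations, and the function names, the Gaussian noise model, and the 4 groups by 6 replicates layout (borrowed from the simulation table) are hypothetical choices.

```python
import random

def f_statistic(groups):
    """One-way ANOVA F-statistic for one gene.

    `groups` is a list of lists, one inner list of replicate values per group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def simulate_null_gene(rng, n_groups=4, n_reps=6):
    """Assumed null model: no treatment effect, standard normal noise."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_reps)] for _ in range(n_groups)]

rng = random.Random(0)
n_genes = 3770  # gene count taken from the figure caption

# Two independent sets of null F-values, each sorted (ranked).
set_a = sorted(f_statistic(simulate_null_gene(rng)) for _ in range(n_genes))
set_b = sorted(f_statistic(simulate_null_gene(rng)) for _ in range(n_genes))

# Pairing equal ranks gives the points of a rank-versus-rank dot plot.
ranked_pairs = list(zip(set_a, set_b))
```

If the two sets share the same distribution, the points in `ranked_pairs` should lie close to the diagonal, with the largest scatter expected in the sparse upper tail.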
Efficiencies of different methods in identifying genes differentially expressed among four groups each with 6 replicates in 30 simulated datasets
| Method | FDR | NGCS | | | ENFP | | | TNFP | | | Difference between ENFP and TNFP | | |
| | | Mean (SD) | Min | Max | Mean (SD) | Min | Max | Mean (SD) | Min | Max | | | |
| B procedure | | 59.6 (6.6) | 46 | 73 | 3.0 (0.3) | 2 | 4 | 0.0 (0.0) | 0 | 0 | 3.0 | | 100% |
| BH procedure | | 102.2 (9.9) | 81 | 119 | 4.8 (1.0) | 4 | 6 | 1.6 (1.4) | 0 | 6 | 3.2 | | 97% |
| SAM | 0.04 < | 111.5 (14.3) | 89 | 129 | 5.1 (0.6) | 5 | 6 | 5.6 (2.8) | 2 | 12 | 2.0 | 6.7 | 56.5% |
| SAM | 0.03 < | 106.8 (13.2) | 84 | 119 | 3.7 (0.6) | 3 | 5 | 3.8 (2.3) | 0 | 8 | 1.5 | 4.0 | 66.7% |
| SAM | 0.02 < | 96.2 (12.5) | 80 | 119 | 2.3 (0.6) | 1 | 3 | 3.1 (1.7) | 1 | 6 | 1.4 | 3.1 | 39.4% |
| SAM | 0.01 < | 91.0 (12.7) | 71 | 107 | 1.3 (0.47) | 1 | 2 | 1.6 (1.2) | 0 | 4 | 0.9 | 1.1 | 67.5% |
| SAM | 0.00 < | 98.7 (6.6) | 94 | 108 | 0.9 (0.1) | 1 | 1 | 1.5 (1.1) | 0 | 3 | 1.0 | 1.9 | 36.4% |
| SAM | | 82.9 (11.0) | 66 | 108 | 0.0 (0.0) | 0 | 0 | 1.0 (0.6) | 0 | 3 | 1.0 | 1.4 | 23.1% |
| RAF | 0.04 < | 115.1 (9.2) | 96 | 131 | 5.1 (0.4) | 4 | 6 | 4.4 (2.7) | 1 | 9 | 2.2 | 7.3 | 75.0% |
| RAF | 0.03 < | 110.6 (12.2) | 85 | 128 | 3.9 (0.6) | 3 | 5 | 3.2 (2.1) | 1 | 8 | 1.6 | 3.9 | 79.2% |
| RAF | 0.02 < | 103.6 (10.6) | 86 | 120 | 2.7 (0.5) | 2 | 3 | 2.1 (1.5) | 0 | 6 | 1.3 | 2.8 | 81.8% |
| RAF | 0.01 < | 100.7 (10.8) | 81 | 118 | 1.7 (0.5) | 1 | 2 | 1.1 (0.9) | 0 | 3 | 0.9 | 1.3 | 75.8% |
| RAF | 0.00 < | 100.8 (4.1) | 96 | 112 | 1.1 (0.2) | 1 | 2 | 0.7 (1.0) | 0 | 3 | 0.9 | 1.4 | 77.8% |
| RAF | | 83.8 (7.1) | 69 | 95 | 0.0 (0.0) | 0 | 0 | 0.1 (0.3) | 0 | 1 | 0.1 | 0.1 | 86.2% |
FDR, false discovery rate; NGCS, number of genes called significant; ENFP, estimated number of false positives; TNFP, true number of false positives.
Here d = ENFP − TNFP, and N is the number of the 30 simulations in which x < λ ≤ y; I = 1 if d ≥ 0 and I = 0 otherwise.
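For readers unfamiliar with the two baseline methods in the table above, here is a minimal sketch of the standard Bonferroni and Benjamini-Hochberg procedures, assuming that "B procedure" and "BH procedure" refer to these. The significance level `alpha = 0.05` and the example p-values are illustrative assumptions and are not taken from the source.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 for every p-value at or below alpha / m (controls family-wise error)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up procedure: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= (k / m) * alpha (controls the FDR)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected

# Illustrative p-values (hypothetical, not from the paper).
p_example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
n_bonferroni = sum(bonferroni(p_example))       # 1 rejection
n_bh = sum(benjamini_hochberg(p_example))       # 2 rejections
```

The Bonferroni rule is the more conservative of the two, which agrees with the table: the B procedure calls the fewest genes significant and yields no true false positives.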
Comparison between SAM and RAF in finding genes differentially expressed among four classes in a simulated data set of small sample size (n = 4)
| SAM | | | | RAF | | | | |
| Delta | Number of significances | Number of false positives | Estimated FDR | Delta | Number of significances | Number of false positives | Estimated FDR | True FDR |
| 0.037534 | 10 | 5.6 | 0.56 | 0.01253 | 16 | 6 | 0.375 | 0.125 |
| 0.044668 | 10 | 5.6 | 0.56 | 0.37608 | 13 | 4 | 0.308 | 0.077 |
| 0.045738 | 10 | 5.6 | 0.56 | 0.74013 | 13 | 3 | 0.231 | 0.077 |
| 0.050144 | 9 | 4.7 | 0.52 | 1.10516 | 12 | 2 | 0.167 | 0.083 |
| 0.052423 | 9 | 4.7 | 0.52 | 1.47167 | 11 | 1 | 0.091 | 0 |
| 0.055937 | 9 | 4.7 | 0.52 | 1.84017 | 10 | 1 | 0.100 | 0 |
| 0.059564 | 9 | 4.7 | 0.52 | 2.58527 | 9 | 0 | 0 | 0 |
| 0.060798 | 9 | 4.7 | 0.52 | | | | | |
| 0.062046 | 9 | 4.7 | 0.52 | | | | | |
| 0.063305 | 0 | 0 | 0 | | | | | |
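The estimated FDR values in this table are consistent with the ratio of the number of false positives to the number of significances. The short check below recomputes them from the tabulated counts; the function name is hypothetical, and treating the FDR as 0 when no gene is called significant is an assumption that matches the last SAM row.

```python
def estimated_fdr(n_false_positives, n_significant):
    """Estimated FDR = false positives / genes called significant.

    Returns 0.0 when nothing is called significant (assumed convention).
    """
    if n_significant == 0:
        return 0.0
    return n_false_positives / n_significant

# (number of significances, number of false positives, reported estimated FDR)
raf_rows = [(16, 6, 0.375), (13, 4, 0.308), (13, 3, 0.231), (12, 2, 0.167),
            (11, 1, 0.091), (10, 1, 0.100), (9, 0, 0.0)]
sam_rows = [(10, 5.6, 0.56), (9, 4.7, 0.52), (0, 0, 0.0)]

# RAF values are reported to three decimals, SAM values to two.
recomputed_raf = [round(estimated_fdr(fp, n), 3) for n, fp, _ in raf_rows]
recomputed_sam = [round(estimated_fdr(fp, n), 2) for n, fp, _ in sam_rows]
```

Every recomputed value matches the reported one after rounding, so the ratio reading of the "Estimated FDR" columns appears to hold for both methods.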
Figure 3. Scatter plot of observed F-values against values from the random splitting approach. The F-values were observed from the real microarray data set; the values yielded by the random splitting approach are an estimate of the null f-distribution.
Figure 4. Comparison between the observed (red) and simulated (blue) plots of F-values against values from the random splitting approach. F-values were observed from real (red) and simulated (blue) microarray data sets of 3770 genes and 6 replicates. The values yielded by the random splitting approach are an estimate of the null f-distribution. The F-distribution from the simulated data set without treatment effects is the null distribution. Ranked F-values correspond to the ranked values from the random splitting approach.
Results of RAF in identifying genes differentially expressed among HS-SHRSPs, LS-SHRSPs, HS-SHRSRs, and LS-SHRSRs.
| Delta | Number of genes called significant | Number of false discoveries | Estimated FDR |
| 0.01253 | 3543 | 1181 | 0.333 |
| 0.74013 | 1504 | 500 | 0.332 |
| 1.10516 | 1157 | 173 | 0.150 |
| 1.47167 | 944 | 117 | 0.124 |
| 1.84017 | 794 | 83 | 0.105 |
| 2.21118 | 668 | 59 | 0.088 |
| 2.58527 | 580 | 44 | 0.076 |
| 2.96301 | 515 | 34 | 0.066 |
| 3.34503 | 437 | 24 | 0.055 |
| 3.73199 | 392 | 19 | 0.048 |
| 4.12463 | 370 | 15 | 0.041 |
| 4.52373 | 338 | 12 | 0.036 |
| 4.93017 | 307 | 10 | 0.033 |
| 5.34493 | 269 | 7 | 0.026 |
| 5.7691 | 250 | 6 | 0.024 |
| 6.20391 | 229 | 5 | 0.022 |
| 6.65078 | 209 | 4 | 0.019 |
| 7.11135 | 194 | 3 | 0.015 |
| 7.58753 | 182 | 2 | 0.011 |
| 9.13461 | 145 | 1 | 0.007 |
| 11.6257 | 107 | 0 | <0.007 |
HS, high salt; LS, low salt; SHRSP, stroke-prone SHR/A3 (Heid) rats; SHRSR, stroke-resistant SHR/N (CRiv) rats.
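One practical way to read the RAF results table is to pick the smallest Delta whose estimated FDR does not exceed a chosen target. The sketch below does this with the tabulated counts; the function name and the target values are illustrative assumptions, and the estimated FDR is recomputed as false discoveries divided by genes called significant.

```python
# (Delta, number of genes called significant, number of false discoveries)
raf_results = [
    (0.01253, 3543, 1181), (0.74013, 1504, 500), (1.10516, 1157, 173),
    (1.47167, 944, 117), (1.84017, 794, 83), (2.21118, 668, 59),
    (2.58527, 580, 44), (2.96301, 515, 34), (3.34503, 437, 24),
    (3.73199, 392, 19), (4.12463, 370, 15), (4.52373, 338, 12),
    (4.93017, 307, 10), (5.34493, 269, 7), (5.7691, 250, 6),
    (6.20391, 229, 5), (6.65078, 209, 4), (7.11135, 194, 3),
    (7.58753, 182, 2), (9.13461, 145, 1), (11.6257, 107, 0),
]

def smallest_delta_for_fdr(rows, target_fdr):
    """Return (delta, n_significant) for the smallest Delta whose
    estimated FDR is at or below target_fdr, or None if no row qualifies."""
    for delta, n_sig, n_false in sorted(rows):
        if n_sig > 0 and n_false / n_sig <= target_fdr:
            return delta, n_sig
    return None

choice_5pct = smallest_delta_for_fdr(raf_results, 0.05)   # (3.73199, 392)
choice_10pct = smallest_delta_for_fdr(raf_results, 0.10)  # (2.21118, 668)
```

Scanning from the smallest Delta upward keeps the largest gene list that still meets the target, which is usually the goal when the estimated FDR falls steadily as Delta grows, as it does in this table.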