Magnus Astrand, Petter Mostad, Mats Rudemo.
Abstract
BACKGROUND: When analyzing microarray data, a primary objective is often to find differentially expressed genes. With empirical Bayes and penalized t-tests, the sample variances are adjusted towards a global estimate, producing more stable results than ordinary t-tests. However, for Affymetrix-type data a clear dependency between variability and intensity level generally exists, even for logged intensities. It is most pronounced at the probe level but is also present for probe-set summaries such as the MAS5 expression index. As a consequence, adjustment towards a global estimate results in a false positive rate that depends on intensity level.
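The variance adjustment described above can be sketched as a two-group moderated t-statistic in the form popularized by LIMMA: each gene's pooled sample variance s² with d degrees of freedom is averaged with a prior variance s0² that carries d0 prior degrees of freedom. In this minimal sketch, `s0_sq` and `d0` are supplied by the caller (empirical Bayes methods estimate them from all genes), and the function name is hypothetical. Passing an array of intensity-dependent prior variances instead of one global scalar only illustrates the idea of a local adjustment; it is not the authors' LMW method.

```python
import numpy as np

def moderated_t(x, y, s0_sq, d0):
    """Two-group moderated t-statistic for each row (gene) of x and y.

    x, y  : arrays of shape (n_genes, n_x) and (n_genes, n_y), logged intensities
    s0_sq : prior variance; a scalar (global estimate) or an array of length
            n_genes (e.g. an assumed intensity-dependent estimate)
    d0    : prior degrees of freedom, i.e. the weight given to s0_sq
    """
    n_x, n_y = x.shape[1], y.shape[1]
    d = n_x + n_y - 2  # residual degrees of freedom
    # Pooled sample variance per gene.
    s_sq = ((n_x - 1) * x.var(axis=1, ddof=1)
            + (n_y - 1) * y.var(axis=1, ddof=1)) / d
    # Shrink each sample variance towards the prior value.
    s_tilde_sq = (d0 * np.asarray(s0_sq) + d * s_sq) / (d0 + d)
    diff = x.mean(axis=1) - y.mean(axis=1)
    return diff / np.sqrt(s_tilde_sq * (1.0 / n_x + 1.0 / n_y))
```

With `d0 = 0` the statistic reduces to the ordinary t-test; as `d0` grows, every gene's variance is pulled towards `s0_sq`, which is exactly where an intensity-independent `s0_sq` can misbehave when true variability depends on intensity.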
Year: 2008 PMID: 18366694 PMCID: PMC2358895 DOI: 10.1186/1471-2105-9-156
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Probe-wise or probe-set-wise sample variances against sample means. Scatter plots of the sample variance s² (log base 2) against mean intensity for logged PM intensities and three expression indexes. The left and right panels show data sets A and B, respectively (see Section Data sets).
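The quantities plotted in Figure 1 can be computed per row of a probe (or probe-set) by array matrix. This sketch returns each row's mean and log2 sample variance and, as an added convenience not described in the caption, the median log2 variance within equal-count intensity bins, which makes an intensity-dependent trend easy to see. The function name and the binning are assumptions.

```python
import numpy as np

def mean_variance_trend(values, n_bins=10):
    """Per-row sample mean and log2 sample variance (the axes of Figure 1),
    plus the median log2 variance within equal-count intensity bins.

    values : (n_rows, n_arrays) logged intensities; rows are probes or probe sets
    """
    mean = values.mean(axis=1)
    log2_var = np.log2(values.var(axis=1, ddof=1))  # zero-variance rows give -inf
    # Equal-count bins along mean intensity (quantile edges).
    edges = np.quantile(mean, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, mean, side="right") - 1, 0, n_bins - 1)
    centers = np.array([np.median(mean[bins == b]) for b in range(n_bins)])
    trend = np.array([np.median(log2_var[bins == b]) for b in range(n_bins)])
    return mean, log2_var, centers, trend
```

A roughly flat `trend` would justify shrinking towards one global variance; a sloped one is the situation the abstract describes.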
Figure 2. False positive rate against mean intensity. The false positive rate (α) was calculated on re-sampled data and plotted against mean intensity. One hundred data sets of size 6 were sampled from the complete data set B (see Section Data sets) of 18 replicated arrays and pre-processed with the Affymetrix MAS5 algorithm. Each was then analyzed as a two-group comparison of 3+3 arrays using the moderated t-test in the R package LIMMA [3], applied both to logged MAS5 indexes and to indexes transformed with the variance stabilizing transformation in the R package vsn [21], and using the proposed method LMW on logged MAS5 indexes. False positive rates were obtained by averaging over the sampled data sets, using loess curves fitted to mean intensity and an indicator of significance (1 if the probe set is among the 5% of probe sets with the highest absolute statistic, 0 otherwise). The mean intensities of each data set are shifted to the range [0,15].
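The resampling scheme in the Figure 2 caption can be sketched as follows. Because all arrays are replicates, every gene is truly unchanged and every significance call is a false positive. Two simplifications are assumptions of this sketch: the loess smoothing is replaced by averaging the 0/1 indicators within equal-count bins, and genes are binned by their mean over all arrays rather than by the shifted per-resample mean. `stat_fn` is a hypothetical plug-in for whichever test statistic is being evaluated.

```python
import numpy as np

def fpr_by_intensity(data, stat_fn, n_resamples=100, n_per_group=3,
                     top_frac=0.05, n_bins=10, seed=0):
    """Estimate the false positive rate in each mean-intensity bin.

    data    : (n_genes, n_arrays) logged values from replicate arrays
    stat_fn : callable(x, y) -> per-gene statistic for a two-group comparison
    """
    rng = np.random.default_rng(seed)
    n_arrays = data.shape[1]
    # Bin genes by mean intensity over all arrays (equal-count bins).
    mean = data.mean(axis=1)
    edges = np.quantile(mean, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, mean, side="right") - 1, 0, n_bins - 1)
    counts = np.bincount(bins, minlength=n_bins).astype(float)
    hits = np.zeros(n_bins)
    for _ in range(n_resamples):
        # Draw 2 * n_per_group replicate arrays and split them into two groups.
        cols = rng.choice(n_arrays, size=2 * n_per_group, replace=False)
        x = data[:, cols[:n_per_group]]
        y = data[:, cols[n_per_group:]]
        stat = np.abs(stat_fn(x, y))
        # Indicator: 1 if the gene is among the top_frac with the largest |stat|.
        significant = (stat >= np.quantile(stat, 1.0 - top_frac)).astype(float)
        hits += np.bincount(bins, weights=significant, minlength=n_bins)
    # Every call is a false positive, so this is the per-bin false positive rate.
    return hits / (n_resamples * np.maximum(counts, 1.0))
```

Averaged over bins of equal size the rate is 5% by construction; what the figure compares is how unevenly that 5% is spread across intensities.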
Figure 3. ROC curves. ROC curves for a subset of the compared methods applied to RMA pre-processed data. The horizontal axis shows the number of false positives (FP) and the vertical axis the proportion of true positives found (TP).
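The tables below summarize each ROC curve by its area up to 100 false positives, with an optimum of 100. A minimal sketch of that summary follows. It assumes that genes are ranked by absolute statistic, that the curve is a step function (each false positive moves the curve one unit to the right at the current true positive proportion), and that the area is scaled by 100 / `max_fp`. Ties are broken by gene order, which the paper may handle differently; the function name is hypothetical.

```python
import numpy as np

def partial_roc_area(stat, is_spike, max_fp=100):
    """Area under the ROC curve (TP proportion vs. number of FPs) up to
    max_fp false positives, scaled so that a perfect ranking scores 100.

    stat     : per-gene statistic; larger |stat| means "more differential"
    is_spike : boolean array, True for genes known to be differentially expressed
    """
    order = np.argsort(-np.abs(stat), kind="stable")  # most significant first
    truth = np.asarray(is_spike, dtype=bool)[order]
    tp_prop = np.cumsum(truth) / truth.sum()  # TP proportion after each gene
    # Curve height at each false positive, i.e. the TPs found before it.
    heights = tp_prop[~truth][:max_fp]
    # If the list runs out before max_fp FPs, extend at the final height.
    area = heights.sum() + (max_fp - heights.size) * tp_prop[-1]
    return 100.0 * area / max_fp
```

Rounding the result with `round()` gives numbers on the same scale as the tables.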
Area under ROC curves up to 100 false positives, RMA and GCRMA
| Method | Pre-processing | Affymetrix U95 | Affymetrix 133A | Golden Spike | Gene Logic Tonsil | Gene Logic AML |
|---|---|---|---|---|---|---|
| PLW | RMA | 96(1) | 93(6) | 42(1) | 87(1) | 86(1) |
| LMW | RMA | 96(2) | 94(1) | 36(5) | 84(3) | 80(5) |
| LPE | RMA | 94(7) | 93(11) | 40(2) | 84(2) | 85(2) |
| combined-p | RMA | 95(4) | 92(12) | 39(3) | 83(4) | 81(4) |
| WAME | RMA | 95(5) | 94(2) | 33(7) | 81(7) | 78(8) |
| median-t | RMA | 95(3) | 93(10) | 39(4) | 82(6) | 80(6) |
| IBMT | RMA | 95(6) | 94(3) | 34(6) | 78(9) | 76(9) |
| Efron-t | RMA | 94(8) | 93(4) | 32(8) | 79(8) | 79(7) |
| FC | RMA | 92(12) | 93(5) | 29(12) | 83(5) | 85(3) |
| LIMMA | RMA | 94(9) | 93(7) | 32(9) | 76(10) | 75(10) |
| SAM | RMA | 94(10) | 93(8) | 32(11) | 74(12) | 74(11) |
| Shrink-t | RMA | 94(11) | 93(9) | 32(10) | 75(11) | 73(12) |
| t-test | RMA | 85(13) | 86(13) | 21(13) | 57(13) | 52(13) |
| PLW | GCRMA | 97(1) | 92(8) | 54(1) | 87(1) | 87(1) |
| LMW | GCRMA | 95(3) | 93(1) | 50(5) | 84(3) | 79(6) |
| median-t | GCRMA | 96(2) | 92(10) | 50(2) | 83(4) | 81(5) |
| combined-p | GCRMA | 95(5) | 91(12) | 50(4) | 86(2) | 81(4) |
| LPE | GCRMA | 95(6) | 91(11) | 50(3) | 82(6) | 86(2) |
| IBMT | GCRMA | 95(4) | 93(2) | 47(6) | 81(9) | 76(8) |
| Efron-t | GCRMA | 94(7) | 93(4) | 37(8) | 82(7) | 79(7) |
| WAME | GCRMA | 94(10) | 93(3) | 39(7) | 83(5) | 75(10) |
| FC | GCRMA | 93(12) | 93(7) | 30(13) | 81(8) | 86(3) |
| LIMMA | GCRMA | 94(9) | 93(5) | 37(9) | 80(10) | 73(11) |
| SAM | GCRMA | 94(8) | 93(6) | 36(11) | 79(11) | 76(9) |
| Shrink-t | GCRMA | 94(11) | 92(9) | 37(10) | 78(12) | 70(12) |
| t-test | GCRMA | 86(13) | 84(13) | 30(12) | 64(13) | 53(13) |
Area under the ROC curve up to 100 false positives, rounded to the nearest integer; the optimum is 100. Numbers within parentheses are within-data-set ranks of the compared methods (computed separately for RMA and GCRMA). Methods are ordered by mean rank across the five data sets. Results in the upper and lower parts are based on RMA and GCRMA pre-processed data, respectively.
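The ordering rule in the caption can be checked against the printed ranks. This sketch takes the within-data-set ranks of the RMA rows as printed (they were presumably computed from unrounded areas, since equal rounded values such as 96 and 96 receive different ranks) and sorts methods by mean rank. WAME and median-t tie at 5.8; here ties are broken by input order, an assumption that matches the table.

```python
# Within-data-set ranks of the RMA rows, in table order. Columns: Affymetrix U95,
# Affymetrix 133A, Golden Spike, Gene Logic Tonsil, Gene Logic AML.
RMA_RANKS = {
    "PLW": [1, 6, 1, 1, 1],
    "LMW": [2, 1, 5, 3, 5],
    "LPE": [7, 11, 2, 2, 2],
    "combined-p": [4, 12, 3, 4, 4],
    "WAME": [5, 2, 7, 7, 8],
    "median-t": [3, 10, 4, 6, 6],
    "IBMT": [6, 3, 6, 9, 9],
    "Efron-t": [8, 4, 8, 8, 7],
    "FC": [12, 5, 12, 5, 3],
    "LIMMA": [9, 7, 9, 10, 10],
    "SAM": [10, 8, 11, 12, 11],
    "Shrink-t": [11, 9, 10, 11, 12],
    "t-test": [13, 13, 13, 13, 13],
}

def order_by_mean_rank(ranks):
    """Return (method, mean rank) pairs sorted by increasing mean rank.
    Python's sort is stable, so tied methods keep their input order."""
    means = [(method, sum(r) / len(r)) for method, r in ranks.items()]
    return sorted(means, key=lambda pair: pair[1])

for method, mean_rank in order_by_mean_rank(RMA_RANKS):
    print(f"{method:12s} {mean_rank:.1f}")
```

The resulting order (PLW 2.0, LMW 3.2, LPE 4.8, and so on down to t-test 13.0) reproduces the row order of the RMA block.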
Area under ROC curves up to 100 false positives, MAS5, PPLR, BGX, and Logit-T
| Method | Pre-processing | Affymetrix U95 | Affymetrix 133A | Golden Spike | Gene Logic Tonsil | Gene Logic AML |
|---|---|---|---|---|---|---|
| LMW | MAS5 | 89(1) | 87(1) | 60(1) | 79(1) | 70(2) |
| IBMT | MAS5 | 87(2) | 87(2) | 59(2) | 77(3) | 69(3) |
| LPE | MAS5 | 84(3) | 84(3) | 57(3) | 78(2) | 79(1) |
| WAME | MAS5 | 71(6) | 81(5) | 34(5) | 69(4) | 54(8) |
| SAM | MAS5 | 74(4) | 81(4) | 11(8) | 67(5) | 54(9) |
| LIMMA | MAS5 | 71(7) | 81(6) | 31(6) | 67(6) | 54(6) |
| Shrink-t | MAS5 | 71(8) | 80(7) | 23(7) | 67(7) | 54(7) |
| t-test | MAS5 | 73(5) | 76(8) | 39(4) | 60(9) | 47(10) |
| Efron-t | MAS5 | 65(9) | 72(9) | 3(9) | 66(8) | 57(4) |
| FC | MAS5 | 56(10) | 61(10) | 0(10) | 58(10) | 55(5) |
| BGX | | - | - | 58 | - | |
| logit-T | | 94 | 92 | - | 80 | 79 |
| PPLR | multi-mgMOS | 88 | 90 | 57 | 71 | 69 |
| # of genes | | 12626 | 22029 | 14010 | 12626 | 12626 |
| # of spikes | | 16 | 42 | 1331 | 11 | 11 |
| # of groups | | 20 | 14 | 2 | 12 | 10 |
Area under the ROC curve up to 100 false positives, rounded to the nearest integer; the optimum is 100. Numbers within parentheses are within-data-set ranks of the compared methods, and methods are ordered by mean rank across the five data sets (within MAS5 only). *Results in italics are from the subset of 1011 probe sets of the Gene Logic AML data set.