Pingzhao Hu, Joseph Beyene, Celia M T Greenwood.
Abstract
BACKGROUND: Microarray data analysts commonly filter out genes based on a number of ad hoc criteria prior to any high-level statistical analysis. Such ad hoc approaches could lead to conflicting conclusions with no clear guidance as to which method is most likely to be reproducible. Furthermore, the number of tests performed with concomitant inflation in type I error also plagues the statistical analysis of microarray data, since the number of tested quantities in a study significantly affects the family-wise error rate. It would, therefore, be very useful to develop and adopt strategies that allow quantification of the quality of each probeset, to filter out or give little credence to low-quality or unexpressed probesets, and to incorporate these strategies into gene selection within a multiple testing framework.
Year: 2006 PMID: 16504060 PMCID: PMC1420292 DOI: 10.1186/1471-2164-7-33
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
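The abstract's remark that the number of tests inflates the family-wise error rate (FWER) can be made concrete with a short worked example. It assumes that all m null hypotheses are true and that the tests are independent, a simplification that microarray data do not strictly satisfy; the values of α and m are chosen for illustration only.

```latex
\mathrm{FWER} = 1 - (1 - \alpha)^{m}, \qquad
\alpha = 0.05,\; m = 100 \;\Rightarrow\; \mathrm{FWER} = 1 - 0.95^{100} \approx 0.994
```

Under these assumptions, testing even 100 probesets at the 0.05 level makes at least one false positive nearly certain, which is why filtering and weighting are discussed within a multiple testing framework.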
Test statistics and quality measures. For the quality-based tests, the summarized quality measure, across treatment groups, is defined by Q= max.
| 1.0 | 1.0 | No filtering | |
| All arrays must have present call to be included | |||
| N/A | Weighted means | ||
| N/A | N/A | Local pooled error test [18] |
Figure 1. Receiver Operator Characteristic (ROC) plots for tests of differential expression in the simulated data with treatment effect δg = 2.0. Six weighted tests and the local pooled error (LPE) test are compared. (a) p-values unadjusted for multiple testing, (b) p-values adjusted by the weighted Benjamini and Hochberg (WBH) multiple testing method.
Figure 2. Receiver Operator Characteristic (ROC) plots for tests of differential expression in the simulated data with treatment effect δg = 1.0. (a) p-values unadjusted for multiple testing, (b) p-values adjusted by the weighted Benjamini and Hochberg (WBH) multiple testing method.
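The figure captions refer to p-values adjusted by a weighted Benjamini and Hochberg (WBH) procedure, but this section does not give its formula. The following is a minimal sketch of one common weighting scheme, in which each p-value is divided by a weight rescaled to average 1 before the usual step-up adjustment. The function name `weighted_bh`, the weight normalization, and the choice of scheme are assumptions for illustration; the variant used in the paper may differ.

```python
import numpy as np

def weighted_bh(pvalues, weights):
    """Weighted BH-adjusted p-values via p-value weighting (assumed variant)."""
    p = np.asarray(pvalues, dtype=float)
    w = np.asarray(weights, dtype=float)
    if p.shape != w.shape or np.any(w <= 0):
        raise ValueError("need one positive weight per p-value")
    w = w / w.mean()                      # rescale so the weights average to 1
    q = np.minimum(p / w, 1.0)            # weighted p-values, capped at 1
    m = q.size
    order = np.argsort(q)
    ranked = q[order] * m / np.arange(1, m + 1)   # BH step-up scaling
    # Enforce monotonicity, working down from the largest weighted p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

# Hypothetical p-values; the first probeset is given a higher quality weight.
print(weighted_bh([0.01, 0.04, 0.03, 0.20], [2.0, 1.0, 1.0, 1.0]))
```

With equal weights the function reduces to the ordinary Benjamini and Hochberg adjustment, which gives a convenient sanity check.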
Area under the curves (AUCs) for five simulation models *
| Model | Multiple Testing | Quality Measures | | | | | | |
| Model 1 | Unadjusted | 0.777 | 0.380 | 0.925 | 0.892 | 0.871 | 0.743 | 0.924 |
| | WBH | 0.812 | 0.382 | 0.929 | 0.895 | 0.885 | 0.772 | 0.932 |
| Model 2 | Unadjusted | 0.840 | 0.387 | 0.934 | 0.907 | 0.898 | 0.809 | 0.932 |
| | WBH | 0.878 | 0.387 | 0.937 | 0.912 | 0.911 | 0.846 | 0.937 |
| Model 3 | Unadjusted | 0.895 | 0.394 | 0.933 | 0.918 | 0.922 | 0.868 | 0.932 |
| | WBH | 0.925 | 0.395 | 0.937 | 0.923 | 0.932 | 0.902 | 0.936 |
| Model 4 | Unadjusted | 0.918 | 0.399 | 0.921 | 0.913 | 0.926 | 0.889 | 0.915 |
| | WBH | 0.935 | 0.401 | 0.917 | 0.918 | 0.934 | 0.912 | 0.891 |
| Model 5 | Unadjusted | 0.876 | 0.408 | 0.875 | 0.876 | 0.884 | 0.850 | 0.850 |
| | WBH | 0.882 | 0.410 | 0.858 | 0.861 | 0.882 | 0.860 | 0.770 |
For and , the sensitivity parameter ν was set to 0.05.
* Simulated probe-level data were summarized using RMA.
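The AUC values in the table above summarize how well each test ranks truly differentially expressed probesets ahead of the rest. As an illustration, the AUC can be computed directly from p-values and known truth labels using the rank (Mann-Whitney) formulation. The function name `auc_from_pvalues` and the example values are hypothetical, and the paper's own AUC computation is not described in this section.

```python
import numpy as np

def auc_from_pvalues(pvalues, is_de):
    """AUC when probesets are ranked by ascending p-value; ties count one half."""
    p = np.asarray(pvalues, dtype=float)
    truth = np.asarray(is_de, dtype=bool)
    n_pos, n_neg = int(truth.sum()), int((~truth).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one DE and one non-DE probeset")
    score = -p  # a smaller p-value means stronger evidence of differential expression
    # Compare every (DE, non-DE) pair; this is O(n_pos * n_neg) in memory,
    # which is fine for a sketch but not for very large probeset counts.
    diff = score[truth][:, None] - score[~truth][None, :]
    correct = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(correct / (n_pos * n_neg))

# Hypothetical example: two DE and two non-DE probesets.
print(auc_from_pvalues([0.01, 0.02, 0.03, 0.04], [True, False, True, False]))
```

An AUC of 0.5 corresponds to random ranking, so values below 0.5 indicate a ranking that is worse than chance for the labels supplied.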
Agreement in probeset selections between our methods and Haslett et al. [16]: Given a chosen number of selected probesets, how many of the probesets selected by GFC were also selected by the methods in this paper (corresponding false discovery rate WBH-FDR)
| Summarization Method | Number of probesets selected | | | | | | | |
| MAS5 | 30ᵃ | 10ᵇ (3.6e-06ᶜ) | 6 (7.6e-06) | 10 (6.2e-04) | 10 (4.9e-06) | 10 (3.6e-06) | 11 (3.6e-06) | 17 (2.3e-21) |
| | 50 | 21 (1.4e-05) | 13 (1.6e-05) | 25 (2.3e-03) | 22 (1.0e-05) | 21 (1.2e-05) | 22 (7.8e-06) | 32 (6.0e-13) |
| | 100 | 56 (3.5e-05) | 40 (1.5e-04) | 59 (4.1e-03) | 57 (2.2e-05) | 56 (2.6e-05) | 56 (1.9e-05) | 61 (3.3e-09) |
| | 139 (2.3e-03) | 94 (1.1e-04) | 65 (3.3e-04) | 96 (9.2e-03) | 95 (8.4e-05) | 94 (9.3e-05) | 94 (6.1e-05) | 76 (4.1e-07) |
| RMA | 30 | 8ᵈ (1.3e-05) | 7 (2.5e-05) | 9 (1.9e-03) | 8 (7.5e-06) | 8 (9.7e-06) | 8 (5.6e-06) | 20 (1.8e-18) |
| | 50 | 24 (4.3e-05) | 13 (5.0e-05) | 26 (3.4e-03) | 24 (2.8e-05) | 24 (3.2e-05) | 24 (2.0e-05) | 30 (1.8e-11) |
| | 100 | 60 (1.4e-04) | 42 (3.3e-04) | 63 (1.1e-02) | 60 (9.8e-05) | 60 (1.3e-04) | 59 (8.0e-05) | 62 (3.5e-07) |
| | 139 | 85 (2.5e-04) | 69 (7.4e-04) | 92 (1.9e-02) | 92 (2.0e-04) | 89 (2.2e-04) | 85 (1.5e-04) | 77 (2.2e-05) |
a The top 30 probesets were selected by each method.
b 10 probesets overlapped between the top 30 probesets selected by GFC and the top 30 probesets selected by , when data were summarized using MAS5.
c The WBH-FDR for the top 30 probesets selected by was 3.6e-06.
d 8 probesets overlapped between the top 30 probesets selected by GFC (data normalized by MAS5) and the top 30 probesets selected by (data normalized by RMA).
e Sensitivity parameter ν = 0.05.
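The agreement counts in the table above are overlaps between the top-ranked probesets of two methods. A minimal sketch of that count follows. It assumes each method supplies a score per probeset in which smaller is better (for example a p-value); how GFC ranks probesets is not described in this section. The function name `top_k_overlap` and the probeset identifiers are hypothetical.

```python
def top_k_overlap(scores_a, scores_b, k):
    """Count probesets that appear in the top k of both rankings.

    scores_a, scores_b: dicts mapping probeset ID -> score, smaller = better.
    """
    top_a = set(sorted(scores_a, key=scores_a.get)[:k])
    top_b = set(sorted(scores_b, key=scores_b.get)[:k])
    return len(top_a & top_b)

# Hypothetical scores for four probesets under two methods.
method_a = {"ps1": 0.001, "ps2": 0.01, "ps3": 0.2, "ps4": 0.5}
method_b = {"ps1": 0.3, "ps2": 0.002, "ps3": 0.001, "ps4": 0.9}
print(top_k_overlap(method_a, method_b, k=2))  # only ps2 is in both top-2 lists
```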
Comparison of our methods and Haslett et al. [16]. Identification of differentially-expressed probesets validated by RT-PCR: given a chosen number of probesets selected by GFC, and the number of probesets validated by RT-PCR within this set, how many of the RT-PCR probesets were selected by the methods in this paper.
| Method | # of probesets selected: | | | | | | | |
| MAS5 | 30ᵃ : 8ᵇ | 5ᶜ | 1 | 5 | 5 | 5 | 5 | 4 |
| | 50 : 11 | 8 | 2 | 7 | 7 | 8 | 7 | 6 |
| | 100 : 12 | 11 | 3 | 10 | 10 | 11 | 10 | 11 |
| | 139 : 13 | 12 | 4 | 11 | 11 | 12 | 12 | 12 |
| RMA | 30 : N/A | 2ᵈ | 1 | 4 | 2 | 2 | 2 | 6 |
| | 50 : N/A | 6 | 2 | 7 | 6 | 6 | 6 | 7 |
| | 100 : N/A | 9 | 4 | 9 | 9 | 9 | 8 | 8 |
| | 139 : N/A | 10 | 4 | 11 | 10 | 10 | 10 | 11 |
a The top 30 probesets were selected by each method.
b 8 probesets in the top 30 were validated by RT-PCR in Haslett et al. [16].
c 5 RT-PCR probesets were among the top 30 probesets selected by . The 5 RT-PCR probesets are among the 8 RT-PCR probesets selected by GFC.
d 2 RT-PCR probesets were among the top 30 probesets selected by (data normalized by RMA). The 2 RT-PCR probesets are among the 8 RT-PCR probesets selected by GFC, using the MAS5 normalization.
e Sensitivity parameter ν = 0.05.
Figure 3. Receiver Operator Characteristic (ROC) plots for tests of differential expression in Choe's spiked-in data summarized by MAS5. (a) p-values unadjusted for multiple testing, (b) p-values adjusted by the weighted Benjamini and Hochberg (WBH) multiple testing method.
Figure 4. Receiver Operator Characteristic (ROC) plots for tests of differential expression in Choe's spiked-in data summarized by RMA. (a) p-values unadjusted for multiple testing, (b) p-values adjusted by the weighted Benjamini and Hochberg (WBH) multiple testing method.
Area under the curves (AUCs) of Choe's spiked-in data (ν = 0.05)
| Method | Multiple Testing | Quality Measures | | | | | | |
| RMA | Unadjusted | 0.800 | 0.881 | 0.885 | 0.882 | 0.850 | 0.804 | 0.854 |
| | WBH | 0.779 | 0.871 | 0.874 | 0.868 | 0.832 | 0.776 | 0.847 |
| MAS5 | Unadjusted | 0.815 | 0.889 | 0.901 | 0.896 | 0.869 | 0.814 | 0.922 |
| | WBH | 0.800 | 0.877 | 0.895 | 0.884 | 0.856 | 0.789 | 0.653 |
Sensitivity and specificity for detecting differentially expressed genes, as a function of the sensitivity parameter ν
| ν | Sensitivity | Specificity | Sensitivity | Specificity |
| 0.6 | 0.9996 | 0.7562 | 0.9998 | 0.7207 |
| 0.5 | 0.9995 | 0.7763 | 0.9997 | 0.7372 |
| 0.4 | 0.9981 | 0.7930 | 0.9994 | 0.7490 |
| 0.3 | 0.9967 | 0.8070 | 0.9990 | 0.7590 |
| 0.2 | 0.9922 | 0.8211 | 0.9984 | 0.7668 |
| 0.1 | 0.9831 | 0.8373 | 0.9956 | 0.7752 |
| 0.05 | 0.9660 | 0.8507 | 0.9936 | 0.7812 |
| 0.01 | 0.9167 | 0.8665 | 0.9413 | 0.7873 |
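Each row of the table above reports a sensitivity and specificity pair for one value of ν. For reference, the two quantities can be computed from a vector of calls and a vector of known truth as sketched below, using the standard definitions. The function name `sensitivity_specificity` and the example vectors are hypothetical, and the sketch does not model how ν determines the calls.

```python
import numpy as np

def sensitivity_specificity(called, truth):
    """Return (sensitivity, specificity) for boolean calls against known truth."""
    called = np.asarray(called, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(called & truth)      # differentially expressed and called
    fn = np.sum(~called & truth)     # differentially expressed but missed
    tn = np.sum(~called & ~truth)    # not differentially expressed, not called
    fp = np.sum(called & ~truth)     # not differentially expressed but called
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("truth must contain both positives and negatives")
    return float(tp / (tp + fn)), float(tn / (tn + fp))

# Hypothetical calls for five probesets.
sens, spec = sensitivity_specificity([1, 1, 0, 0, 1], [1, 0, 0, 0, 1])
print(sens, spec)
```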
Simulation structure of Affymetrix microarray data
| Group A | Group B | ||||||
| Array 1 | ...... | Array | Array 1 | ...... | Array | ||
| Up-regulated Gene Group | |||||||
| ... | |||||||
| Down-regulated Gene Group | |||||||
| ... | |||||||
| Non-differentially Expressed Gene Group | |||||||
| ... | |||||||
| Non-Expressed Gene Group | |||||||
| ... | |||||||