Abstract
Consider a gene expression array study comparing two groups of subjects, where the goal is to explore a large number of genes in order to select, for further investigation, a subset that appears to be differentially expressed. There has been much statistical research into the development of formal methods for designating genes as differentially expressed. These procedures control error rates such as the false detection rate or the family-wise error rate. We contend, however, that other statistical considerations are also relevant to the task of gene selection. These include the extent of differential expression and the strength of evidence for differential expression at a gene. Using real and simulated data, we first demonstrate that a proper exploratory analysis should evaluate these aspects as well as decision rules that control error rates. We propose a new measure, called the mp-value, that quantifies strength of evidence for differential expression. The mp-values are calculated with a resampling-based algorithm that takes into account the multiplicity and dependence encountered in microarray data. In contrast to traditional p-values, our mp-values do not depend on specification of a decision rule for their definition; they are simply descriptive in nature. We contrast the mp-values with multiple testing p-values in the context of data from a breast cancer prognosis study and from a simulation model.
Year: 2008 PMID: 19455259 PMCID: PMC2675859
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
The classic hypothesis testing framework for gene selection. Non-null hypotheses correspond to differentially expressed genes.
| #null hypotheses | | | |
| #non-null hypotheses | | | |
| total | | | |
Table 2. Simulated data for 50 cases and 50 controls. Genes are ranked using TPR(0.20). Differentially expressed genes are ranked 1 through 100; non-differentially expressed genes are ranked 101–2000. P-values based on the TPR statistic have superscript T, while those based on the AUC statistic have superscript A. The BHp-value uses the Benjamini-Hochberg (1995) rejection rule, while the Sp-value uses that of Storey (2002).
| 1 | 0.88 | 0 | 0 | 0 | 0.884 | 0 | 0 | 0 |
| 2 | 0.86 | 0 | 0 | 0 | 0.880 | 0 | 0 | 0 |
| 3 | 0.86 | 0 | 0 | 0 | 0.870 | 0 | 0 | 0 |
| 4 | 0.86 | 0 | 0 | 0 | 0.890 | 0 | 0 | 0 |
| 5 | 0.84 | 0 | 0 | 0 | 0.864 | 0 | 0 | 0 |
| 6 | 0.84 | 0 | 0 | 0 | 0.863 | 0 | 0 | 0 |
| 7 | 0.82 | 0 | 0 | 0 | 0.869 | 0 | 0 | 0 |
| 8 | 0.8 | 0 | 0 | 0 | 0.912 | 0 | 0 | 0 |
| 9 | 0.8 | 0 | 0 | 0 | 0.886 | 0 | 0 | 0 |
| 10 | 0.8 | 0 | 0 | 0 | 0.901 | 0 | 0 | 0 |
| 11 | 0.8 | 0 | 0 | 0 | 0.858 | 0 | 0 | 0 |
| 12 | 0.8 | 0 | 0 | 0 | 0.847 | 0 | 0 | 0 |
| 13 | 0.78 | 0 | 0 | 0 | 0.869 | 0 | 0 | 0 |
| 88 | 0.6 | 0.03 | 0.003 | 0.003 | 0.752 | 0.006 | 0 | 0 |
| 89 | 0.6 | 0.03 | 0 | 0 | 0.721 | 0.061 | 0.001 | 0.001 |
| 90 | 0.6 | 0.03 | 0.004 | 0.004 | 0.771 | 0.003 | 0 | 0 |
| 91 | 0.58 | 0.06 | 0.001 | 0.001 | 0.738 | 0.02 | 0 | 0 |
| 92 | 0.58 | 0.06 | 0.004 | 0.004 | 0.752 | 0.006 | 0 | 0 |
| 93 | 0.58 | 0.06 | 0.039 | 0.037 | 0.760 | 0.005 | 0 | 0 |
| 94 | 0.56 | 0.113 | 0.025 | 0.024 | 0.678 | 0.604 | 0.021 | 0.02 |
| 95 | 0.56 | 0.113 | 0.005 | 0.005 | 0.746 | 0.01 | 0 | 0 |
| 96 | 0.56 | 0.114 | 0.018 | 0.017 | 0.751 | 0.006 | 0 | 0 |
| 97 | 0.56 | 0.114 | 0.014 | 0.013 | 0.760 | 0.005 | 0 | 0 |
| 98 | 0.56 | 0.114 | 0.005 | 0.004 | 0.734 | 0.023 | 0.001 | 0.001 |
| 99 | 0.56 | 0.113 | 0.003 | 0.003 | 0.736 | 0.023 | 0.001 | 0 |
| 100 | 0.56 | 0.114 | 0.007 | 0.007 | 0.776 | 0.002 | 0 | 0 |
| 101 | 0.52 | 0.375 | 0.010 | 0.010 | 0.645 | 0.985 | 0.119 | 0.116 |
| 102 | 0.52 | 0.375 | 0.070 | 0.067 | 0.718 | 0.08 | 0.002 | 0.002 |
| 103 | 0.46 | 0.924 | 0.157 | 0.150 | 0.615 | 1 | 0.347 | 0.34 |
| 104 | 0.46 | 0.925 | 0.083 | 0.080 | 0.653 | 0.96 | 0.08 | 0.079 |
| 105 | 0.44 | 0.976 | 0.307 | 0.293 | 0.637 | 0.998 | 0.169 | 0.165 |
| 106 | 0.44 | 0.976 | 0.791 | 0.755 | 0.614 | 1 | 0.353 | 0.346 |
| 107 | 0.44 | 0.976 | 0.124 | 0.118 | 0.573 | 1 | 0.758 | 0.743 |
| 108 | 0.44 | 0.976 | 0.402 | 0.383 | 0.576 | 1 | 0.727 | 0.712 |
| 109 | 0.44 | 0.976 | 0.321 | 0.306 | 0.617 | 1 | 0.326 | 0.32 |
| 110 | 0.42 | 0.997 | 0.619 | 0.591 | 0.608 | 1 | 0.425 | 0.416 |
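
The TPR(0.20) and AUC columns above are per-gene summaries of how well expression separates cases from controls. As a rough illustration only, the following sketch computes empirical versions of both for a single gene. The function names, the tie handling, and the threshold convention are assumptions made for this example; the excerpt does not state the exact estimators the authors used.

```python
def empirical_auc(cases, controls):
    """Mann-Whitney estimate of P(case > control); ties count as 1/2 (assumed)."""
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def empirical_tpr(cases, controls, fpr=0.20):
    """Fraction of cases above a threshold that at most `fpr` of controls exceed."""
    ordered = sorted(controls, reverse=True)
    # Allow at most floor(fpr * n_controls) controls strictly above the threshold.
    k = int(fpr * len(controls) + 1e-9)
    if k >= len(ordered):
        return 1.0  # every control may be called positive, so every case is too
    threshold = ordered[k]  # the (k+1)-th largest control value
    return sum(1 for x in cases if x > threshold) / len(cases)


# Hypothetical expression values for one gene (not taken from the study).
cases = [3, 4, 5, 6, 7]
controls = [1, 2, 3, 4, 5]
auc = empirical_auc(cases, controls)
tpr20 = empirical_tpr(cases, controls, fpr=0.20)
```

With 50 cases, an empirical TPR of this kind can only take values that are multiples of 0.02, which is consistent with the granularity of the TPR(0.20) column in the table.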
Figure 1. Histogram of AUC for all genes from the breast cancer study (top); histogram of TPF(0.2) for all genes from the breast cancer study (bottom).
Results from the breast cancer prognosis study. Genes are ranked according to TPR(0.20), and results are displayed for the top 20. The same notation as in Table 2 is used.
| 1 | 196 | 0.676 | 0.009 | 0.525 | 0.245 | 0.743 | 0.169 | 0.04 | 0.007 |
| 2 | 2348 | 0.676 | 0.009 | 0.073 | 0.034 | 0.706 | 0.611 | 0.044 | 0.008 |
| 3 | 208 | 0.647 | 0.028 | 0.243 | 0.113 | 0.801 | 0.002 | 0.006 | 0.001 |
| 4 | 4106 | 0.647 | 0.028 | 0.175 | 0.082 | 0.792 | 0.01 | 0.006 | 0.001 |
| 5 | 732 | 0.647 | 0.028 | 0.083 | 0.039 | 0.791 | 0.01 | 0.006 | 0.001 |
| 6 | 1823 | 0.647 | 0.028 | 0.011 | 0.005 | 0.744 | 0.164 | 0.04 | 0.007 |
| 7 | 4682 | 0.647 | 0.028 | 0.152 | 0.071 | 0.724 | 0.363 | 0.042 | 0.007 |
| 8 | 1793 | 0.647 | 0.028 | 0.024 | 0.011 | 0.709 | 0.556 | 0.044 | 0.008 |
| 9 | 1051 | 0.618 | 0.085 | 0.191 | 0.089 | 0.735 | 0.251 | 0.042 | 0.007 |
| 10 | 3816 | 0.618 | 0.085 | 0.595 | 0.277 | 0.725 | 0.353 | 0.042 | 0.007 |
| 11 | 3502 | 0.618 | 0.085 | 0.286 | 0.134 | 0.723 | 0.368 | 0.042 | 0.007 |
| 12 | 3570 | 0.618 | 0.085 | 0.237 | 0.110 | 0.721 | 0.404 | 0.042 | 0.007 |
| 13 | 4610 | 0.618 | 0.085 | 0.191 | 0.089 | 0.711 | 0.529 | 0.043 | 0.008 |
| 14 | 2332 | 0.618 | 0.085 | 0.011 | 0.005 | 0.700 | 0.684 | 0.048 | 0.009 |
| 15 | 1899 | 0.618 | 0.085 | 0.065 | 0.030 | 0.697 | 0.716 | 0.049 | 0.009 |
| 16 | 2603 | 0.618 | 0.085 | 0.243 | 0.113 | 0.689 | 0.818 | 0.056 | 0.01 |
| 17 | 4048 | 0.618 | 0.085 | 0.073 | 0.034 | 0.686 | 0.854 | 0.057 | 0.01 |
| 18 | 4698 | 0.588 | 0.172 | 0.008 | 0.004 | 0.762 | 0.062 | 0.032 | 0.006 |
| 19 | 917 | 0.588 | 0.172 | 0.49 | 0.228 | 0.739 | 0.21 | 0.04 | 0.007 |
| 20 | 936 | 0.588 | 0.172 | 0.274 | 0.128 | 0.732 | 0.274 | 0.042 | 0.007 |
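
The BHp-value columns refer to the Benjamini-Hochberg (1995) rejection rule. A minimal sketch of the standard Benjamini-Hochberg step-up adjustment is given below for orientation. It assumes a plain list of per-gene p-values as input; the function name is hypothetical, and whether the tabulated BHp-values were computed in exactly this way (for example, from resampling-based p-values) is not stated in this excerpt.

```python
def bh_adjusted(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is the
    # minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four genes (not taken from the study).
adj = bh_adjusted([0.01, 0.04, 0.03, 0.005])
```

A gene is rejected at false discovery rate level q under this rule when its adjusted value is at most q, so the adjusted values can be read directly against a chosen threshold.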