Luca Abatangelo, Rosalia Maglietta, Angela Distaso, Annarita D'Addabbo, Teresa Maria Creanza, Sayan Mukherjee, Nicola Ancona.
Abstract
BACKGROUND: The analysis of high-throughput gene expression data with respect to sets of genes rather than individual genes has many advantages. A variety of methods have been developed for assessing the enrichment of sets of genes with respect to differential expression. In this paper we provide a comparative study of four of these methods: Fisher's exact test, Gene Set Enrichment Analysis (GSEA), Random-Sets (RS), and Gene List Analysis with Prediction Accuracy (GLAPA). The first three methods use associative statistics, while the fourth uses predictive statistics. We first compare all four methods on simulated data sets to verify that Fisher's exact test is markedly worse than the other three approaches. We then validate the other three methods on seven real data sets with known genetic perturbations, and finally compare them on two cancer data sets for which our a priori knowledge is limited.
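The abstract names the Random-Sets (RS) method without describing its statistic. As a rough illustration only, the sketch below assumes a random-set formulation in the spirit of Newton and colleagues: the per-gene scores inside a set are averaged, and that mean is standardized by its exact mean and variance when a set of the same size is drawn uniformly without replacement from all N genes, with a one-sided normal approximation for the p-value. The function names, the choice of per-gene score, and the normal approximation are assumptions; this is not the authors' implementation.

```python
import math
from typing import Sequence, Set


def random_set_z(scores: Sequence[float], gene_set: Set[int]) -> float:
    """Standardized mean score of a gene set under a random-set null (hypothetical helper).

    scores: one gene-level statistic per gene (e.g. a t-statistic), indexed 0..N-1.
    gene_set: indices of the genes belonging to the set.
    """
    n = len(scores)
    m = len(gene_set)
    if not 0 < m < n:
        raise ValueError("gene set size must satisfy 0 < m < N")
    mu = sum(scores) / n
    # Population variance of all gene-level scores (denominator N).
    var_pop = sum((s - mu) ** 2 for s in scores) / n
    # Variance of the mean of m scores drawn without replacement from N
    # (finite-population correction (N - m) / (N - 1)).
    var_mean = (var_pop / m) * (n - m) / (n - 1)
    set_mean = sum(scores[i] for i in gene_set) / m
    return (set_mean - mu) / math.sqrt(var_mean)


def upper_tail_p(z: float) -> float:
    """One-sided p-value P(Z >= z) under a standard normal approximation (assumed)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Usage: three strongly positive genes form the set of interest.
scores = [2.5, 2.0, 1.8, 0.1, -0.2, 0.0, -0.1, 0.3, -0.4, 0.2]
z = random_set_z(scores, {0, 1, 2})
print(round(z, 2), upper_tail_p(z))
```

Because the mean and variance are exact under the random-set null, the standardized statistic has mean 0 and variance 1 over all sets of a given size; only the tail probability relies on the normal approximation.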
Year: 2009 PMID: 19725948 PMCID: PMC2746222 DOI: 10.1186/1471-2105-10-275
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Data sets used in our experiments. The breast cancer data set is annotated by gene symbols.
| Data set | Source | Samples | Features (probes or genes) |
|---|---|---|---|
| Myc | […] | 10 vs 10 | 54675 |
| Ras | […] | 10 vs 10 | 54675 |
| E2F3 | […] | 9 vs 10 | 54675 |
| Src | […] | 7 vs 10 | 54675 |
|  | […] | 9 vs 10 | 54675 |
| P53 | NCI-60 | 12 vs 50 | 12625 |
| Hypoxia | […] | 6 vs 6 | 54675 |
| Breast | […] | 28 vs 32 | 15017 |
| Lung | […] | 45 vs 48 | 54675 |
Results of simulation study: comparison of RS, GSEA and GLAPA. P-values for the first gene set for the three methods (columns) and the five scenarios (rows) described in the text.
| Scenario | RS mean | RS se | GSEA mean | GSEA se | GLAPA p-value1 mean | GLAPA p-value1 se | GLAPA p-value2 mean | GLAPA p-value2 se |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.0177 | 0.0069 | 0.0678 | 0.0135 | 0.0173 | 0.0057 | 0.0002 | 0.0001 |
| 2 | 0.0005 | 0.0002 | 0.0152 | 0.0036 | 0.0007 | 0.0003 | 0.0002 | 0.0001 |
| 3 | 0.0004 | 0.0002 | 0.0064 | 0.0026 | 0.0003 | 0.0002 | 0 | 0 |
| 4 | 0.0126 | 0.0081 | 0.0073 | 0.0025 | 0.0002 | 9e-05 | 0.0166 | 0.0044 |
| 5 | 0 | 0 | 0.0001 | 8e-05 | 0 | 0 | 0.0877 | 0.0151 |
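Assuming, as the mean/se header of the Fisher's exact test table suggests, that the simulation tables summarize each p-value by its mean and standard error (se) over simulated replicates, the summary can be computed as below. The number of replicates and the use of the sample standard deviation (n − 1 denominator) divided by √n are assumptions, and `mean_and_se` is a hypothetical helper.

```python
import math
from typing import Sequence, Tuple


def mean_and_se(p_values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of replicate p-values (hypothetical helper).

    The standard error is assumed to be the sample standard deviation
    (Bessel's correction) divided by sqrt(n).
    """
    n = len(p_values)
    if n < 2:
        raise ValueError("need at least two replicates")
    mean = sum(p_values) / n
    var = sum((p - mean) ** 2 for p in p_values) / (n - 1)
    return mean, math.sqrt(var / n)


# Usage with made-up replicate p-values (not data from the study).
print(mean_and_se([0.012, 0.020, 0.009, 0.031]))
```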
Results of simulation study: Fisher's exact test. P-values for the first gene set in the five scenarios (rows) described in the text. Each pair of columns (mean, se) corresponds to a different significance level (α) adopted in the t-test used to select differentially expressed (DE) genes.
| Scenario | mean | se | mean | se | mean | se | mean | se | mean | se |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.4117 | 0.0901 | 0.2789 | 0.0773 | 0.1509 | 0.0411 | 0.1243 | 0.0406 | 0.1137 | 0.0427 |
| 2 | 0.1961 | 0.0795 | 0.0342 | 0.0171 | 0.0270 | 0.0217 | 0.0287 | 0.0265 | 0.0140 | 0.0120 |
| 3 | 0.0085 | 0.0033 | 0.0019 | 0.0011 | 0.0024 | 0.0010 | 0.0034 | 0.0020 | 0.0053 | 0.0034 |
| 4 | 0.0030 | 0.0017 | 0.0016 | 0.0006 | 0.0039 | 0.0018 | 0.0081 | 0.0037 | 0.0113 | 0.0039 |
| 5 | 2e-04 | 0.0002 | 1e-06 | 0 | 1e-06 | 0 | 6e-07 | 0 | 7e-07 | 0 |
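The Fisher's exact test results depend on a two-step procedure: genes are first called DE by a t-test at level α, and the gene set is then tested for over-representation of DE genes. The sketch below assumes per-gene t-test p-values are already computed and uses a one-sided test (the upper tail of the hypergeometric distribution with the 2×2 margins fixed); whether the study used a one- or two-sided test is not stated here, and `fisher_enrichment_p` is a hypothetical helper.

```python
from math import comb
from typing import Dict, Set


def fisher_enrichment_p(gene_p: Dict[str, float], gene_set: Set[str], alpha: float) -> float:
    """One-sided Fisher's exact test for over-representation of DE genes in a set.

    gene_p: per-gene p-values from a t-test (computed elsewhere).
    gene_set: identifiers of the genes belonging to the set.
    alpha: significance level used to call a gene differentially expressed (DE).
    """
    universe = set(gene_p)
    members = gene_set & universe
    de = {g for g, p in gene_p.items() if p < alpha}
    n_total = len(universe)   # N: all genes on the array
    n_de = len(de)            # K: genes called DE at level alpha
    n_set = len(members)      # n: genes in the set
    k = len(de & members)     # x: DE genes inside the set
    # P(X >= k) for X ~ Hypergeometric(N, K, n); math.comb returns 0 when
    # the lower argument exceeds the upper one, so impossible terms vanish.
    tail = sum(
        comb(n_de, x) * comb(n_total - n_de, n_set - x)
        for x in range(k, min(n_de, n_set) + 1)
    )
    return tail / comb(n_total, n_set)


# Usage: 10 genes, 3 of them strongly DE, all 3 inside a 4-gene set.
gene_p = {f"g{i}": (0.001 if i < 3 else 0.5) for i in range(10)}
print(fisher_enrichment_p(gene_p, {"g0", "g1", "g2", "g3"}, alpha=0.05))
```

Changing α changes K and x, which is one plausible reason the Fisher p-values in the table vary across the α columns.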
Results for the P53 gene sets in the Wild-Type/P53 mutant data set.
|  | 17 | 2 | 0.000 | 3 | 0.004 | 0.000 | 2 | 0.000 |
|  | 30 | 4 | 0.000 | 9 | 0.013 | 0.000 | 8 | 0.000 |
|  | 41 | 10 | 0.000 | 10 | 0.019 | 0.000 | 5 | 0.000 |
|  | 52 | 9 | 0.001 | 11 | 0.018 | 0.000 | 6 | 0.000 |
|  | 38 | 21 | 0.001 | 15 | 0.022 | 0.000 | 4 | 0.000 |
|  | 40 | 137 | 0.068 | 66 | 0.084 | 0.008 | 1 | 0.000 |
| P53_SIGNALING | 163 | 59 | 0.023 | 60 | 0.075 | 0.000 | 203 | 0.063 |
| P21_P53_EARLY_DN | 14 | 75 | 0.103 | 619 | 0.348 | 0.329 | 47 | 0.017 |
| P21_P53_ANY_DN | 46 | 470 | 0.294 | 1212 | 0.548 | 0.736 | 66 | 0.067 |
| P21_P53_LATE_DN | 11 | 661 | 0.357 | 1218 | 0.527 | 0.650 | 454 | 0.323 |
| P21_P53_MIDDLE_DN | 21 | 1381 | 0.744 | 1243 | 0.544 | 0.683 | 440 | 0.307 |
| KANNAN_P53_DN | 25 | 1478 | 0.893 | 1015 | 0.476 | 0.569 | 944 | 0.548 |
Results for the Hypoxia gene sets in the Hypoxia/normal data set.
|  | 342 | 1 | 0.000 | 19 | 0.074 | 0.000 | 2 | 0.000 |
|  | 51 | 3 | 0.000 | 18 | 0.081 | 0.004 | 9 | 0.000 |
|  | 14 | 31 | 0.000 | 1 | 0.054 | 0.000 | 4 | 0.000 |
|  | 305 | 11 | 0.001 | 30 | 0.083 | 0.000 | 7 | 0.004 |
|  | 196 | 7 | 0.003 | 42 | 0.083 | 0.000 | 11 | 0.002 |
|  | 38 | 10 | 0.000 | 31 | 0.086 | 0.000 | 26 | 0.014 |
|  | 587 | 4 | 0.000 | 64 | 0.086 | 0.000 | 6 | 0.000 |
|  | 98 | 8 | 0.004 | 66 | 0.088 | 0.006 | 37 | 0.013 |
|  | 220 | 5 | 0.002 | 116 | 0.111 | 0.005 | 8 | 0.000 |
|  | 85 | 38 | 0.001 | 135 | 0.093 | 0.016 | 59 | 0.011 |
|  | 105 | 2 | 0.002 | 241 | 0.128 | 0.037 | 29 | 0.010 |
| MANALO_HYPOXIA_DN | 211 | 133 | 0.178 | 253 | 0.125 | 0.032 | 126 | 0.094 |
| HIFPATHWAY | 42 | 49 | 0.018 | 611 | 0.196 | 0.142 | 31 | 0.008 |
| P53HYPOXIAPATHWAY | 57 | 285 | 0.070 | 249 | 0.120 | 0.053 | 238 | 0.126 |
| VHL_NORMAL_UP | 1251 | 26 | 0.083 | 328 | 0.134 | 0.008 | 522 | 0.081 |
| RCC_NL_UP | 1529 | 15 | 0.033 | 239 | 0.116 | 0.001 | 727 | 0.222 |
| VHL_RCC_UP | 288 | 374 | 0.188 | 130 | 0.118 | 0.003 | 539 | 0.229 |
| HYPOXIA_RCC_NOVHL_UP | 159 | 528 | 0.201 | 177 | 0.111 | 0.014 | 560 | 0.260 |
| HYPOXIA_RCC_UP | 330 | 716 | 0.304 | 415 | 0.136 | 0.092 | 561 | 0.177 |
Deregulation of the five oncogenes as measured by the three methods.
| Myc_up | 119 | 60 | 0.083 | 997 | 0.005 | 0.831 | 7 | 0.000 |
| Myc_down | 129 | 1099 | 0.629 | 242 | 0.006 | 0.199 | 6 | 0.000 |
| Ras_up | 195 | 1181 | 0.726 | 842 | 0.007 | 0.998 | 6 | 0.019 |
| Ras_down | 153 | 439 | 0.442 | 1216 | 0.004 | 0.991 | 5 | 0.006 |
| E2F3_up | 138 | 35 | 0.088 | 79 | 0.012 | 0.472 | 4 | 0.000 |
| E2F3_down | 160 | 994 | 0.619 | 1111 | 0.016 | 0.965 | 10 | 0.008 |
| Src_up | 28 | 182 | 0.186 | 1409 | 0.018 | 0.513 | 17 | 0.060 |
| Src_down | 45 | 41 | 0.104 | 781 | 0.019 | 0.303 | 9 | 0.032 |
|  | 43 | 231 | 0.198 | 1588 | 0.063 | 0.952 | 4 | 0.006 |
|  | 55 | 87 | 0.105 | 495 | 0.016 | 0.211 | 6 | 0.011 |
Figure 1. Overlaps of the ranks of gene sets across the three methods in a) P53, b) hypoxia, c) breast cancer and d) lung cancer data sets. The x-axis shows the number of top-ranked gene sets considered; the y-axis shows the overlap for each pairwise comparison of methods.
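Each curve in Figure 1 compares the top-ranked gene sets of two methods as the cut-off grows. A minimal sketch, assuming each method's output is a list of gene-set names sorted from most to least significant and that the overlap is the raw count of shared sets (the figure could equally report a fraction of k); the helper names are hypothetical.

```python
from typing import List, Sequence


def top_k_overlap(rank_a: Sequence[str], rank_b: Sequence[str], k: int) -> int:
    """Number of gene sets shared by the top-k lists of two methods."""
    return len(set(rank_a[:k]) & set(rank_b[:k]))


def overlap_curve(rank_a: Sequence[str], rank_b: Sequence[str], max_k: int) -> List[int]:
    """Overlap for k = 1..max_k, i.e. one pairwise curve of the figure."""
    return [top_k_overlap(rank_a, rank_b, k) for k in range(1, max_k + 1)]


# Usage with toy rankings (not the study's gene sets).
rs_rank = ["S1", "S2", "S3", "S4"]
gsea_rank = ["S2", "S1", "S4", "S3"]
print(overlap_curve(rs_rank, gsea_rank, 4))
```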
Number of statistically significant gene sets identified by RS (p-value < 0.05) and by GLAPA (p-value1, p-value2 < 0.05).
| P53 | 91 | 35 | 27 |
| Breast | 77 | 47 | 27 |
| Lung | 340 | 76 | 31 |
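The counts in the last table follow from simple thresholding of each method's p-values. The sketch below assumes a gene set is counted for GLAPA only when both p-value1 and p-value2 fall below 0.05; how the three numeric columns map onto these rules is not stated here, and `count_significant` is a hypothetical helper.

```python
from typing import Mapping, Tuple


def count_significant(
    rs_p: Mapping[str, float],
    glapa_p: Mapping[str, Tuple[float, float]],
    level: float = 0.05,
) -> Tuple[int, int]:
    """Count gene sets called significant by RS and by GLAPA.

    rs_p: gene set -> RS p-value.
    glapa_p: gene set -> (p-value1, p-value2) from GLAPA.
    A set counts for GLAPA only if both of its p-values are below `level` (assumed rule).
    """
    n_rs = sum(1 for p in rs_p.values() if p < level)
    n_glapa = sum(1 for p1, p2 in glapa_p.values() if p1 < level and p2 < level)
    return n_rs, n_glapa


# Usage with made-up p-values (not results from the study).
rs = {"A": 0.01, "B": 0.20, "C": 0.04}
glapa = {"A": (0.01, 0.03), "B": (0.01, 0.20), "C": (0.50, 0.01)}
print(count_significant(rs, glapa))
```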