Marit Ackermann, Korbinian Strimmer.
Abstract
BACKGROUND: Analysis of microarray and other high-throughput data on the basis of gene sets, rather than individual genes, is becoming more important in genomic studies. Correspondingly, a large number of statistical approaches for detecting gene set enrichment have been proposed, but both the interrelations and the relative performance of the various methods are still very much unclear.
Year: 2009 PMID: 19192285 PMCID: PMC2661051 DOI: 10.1186/1471-2105-10-47
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Overview of statistical algorithms for the analysis of gene set enrichment.
| Draghici | |
| Mootha | |
| Pavlidis | |
| Kim and Volsky | |
| Newton | |
| Efron | |
| Goeman | |
| Mansmann and Meister | |
| Kong | |
| Rahnenführer | |
| Goeman and Bühlmann | |
Figure 1. Schematic overview of the modular structure underlying procedures for gene set enrichment analysis.
The effect of choice of gene level statistic and of a corresponding transformation on the detection rate.
| set 1 | 0.94 | 0.94 | 0.94 | 0.63 | 0.67 | 0.74 | 0.84 | 0.84 | 0.84 |
| set 2 | 1.00 | 1.00 | 1.00 | 0.90 | 0.92 | 0.95 | 1.00 | 1.00 | 1.00 |
| set 3 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
| set 4 | 0.79 | 0.81 | 0.83 | 0.40 | 0.44 | 0.50 | 0.54 | 0.54 | 0.54 |
| set 5 | 0.95 | 0.96 | 0.95 | 0.34 | 0.41 | 0.44 | 0.38 | 0.41 | 0.38 |
| set 6 | 0.00 | 0.00 | 0.00 | 0.86 | 0.85 | 0.88 | 0.95 | 0.96 | 0.95 |
| set 7 | 0.01 | 0.01 | 0.00 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| set 8 | 0.00 | 0.00 | 0.00 | 0.70 | 0.70 | 0.75 | 0.74 | 0.75 | 0.74 |
| set 9 | 0.00 | 0.01 | 0.00 | 0.81 | 0.90 | 0.89 | 0.80 | 0.82 | 0.80 |
| set 1 | 0.62 | 0.61 | 0.55 | 0.49 | 0.48 | 0.46 | |||
| set 2 | 0.81 | 0.87 | 0.73 | 0.77 | 0.78 | 0.71 | |||
| set 3 | 0.01 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | |||
| set 4 | 0.38 | 0.44 | 0.34 | 0.32 | 0.36 | 0.32 | |||
| set 5 | 0.27 | 0.33 | 0.20 | 0.23 | 0.26 | 0.20 | |||
| set 6 | 0.83 | 0.85 | 0.76 | 0.71 | 0.66 | 0.69 | |||
| set 7 | 1.00 | 1.00 | 0.99 | 0.99 | 0.99 | 0.95 | |||
| set 8 | 0.63 | 0.66 | 0.58 | 0.61 | 0.60 | 0.55 | |||
| set 9 | 0.71 | 0.73 | 0.65 | 0.74 | 0.75 | 0.66 | |||
The values indicated are the proportion of significantly enriched gene sets (p-values ≤ 0.05) for various combinations of test statistics and their transformations, with fixed global statistic (= mean) and the use of resampling for computing the significance.
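The modular procedure summarized in this note (gene-level statistic, transformation, gene set statistic, significance assessment) can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: the ordinary Welch t-statistic stands in for the moderated t, the transformation is the square, the function names and the toy data are hypothetical, and the p-value uses a simple add-one correction.

```python
import random
from statistics import mean, variance


def t_statistic(x, y):
    """Welch two-sample t-statistic for one gene (a stand-in for the moderated t)."""
    se2 = variance(x) / len(x) + variance(y) / len(y)
    return (mean(x) - mean(y)) / se2 ** 0.5


def gene_set_score(expr, labels, gene_set):
    """Gene set statistic: mean of the squared gene-level t-statistics."""
    squared = []
    for gene in gene_set:
        group0 = [v for v, lab in zip(expr[gene], labels) if lab == 0]
        group1 = [v for v, lab in zip(expr[gene], labels) if lab == 1]
        squared.append(t_statistic(group0, group1) ** 2)
    return mean(squared)


def permutation_p_value(expr, labels, gene_set, n_perm=1000, seed=0):
    """Sample label permutation p-value for the gene set score."""
    rng = random.Random(seed)
    observed = gene_set_score(expr, labels, gene_set)
    permuted = list(labels)
    hits = 0
    for _ in range(n_perm):
        # One shuffle is applied to all genes, so gene-gene correlation is kept.
        rng.shuffle(permuted)
        if gene_set_score(expr, permuted, gene_set) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): 10 samples per class, genes g0..g2 shifted in class 1.
rng = random.Random(42)
labels = [0] * 10 + [1] * 10
expr = {
    f"g{i}": [rng.gauss(3.0 * lab if i < 3 else 0.0, 1.0) for lab in labels]
    for i in range(6)
}
print(permutation_p_value(expr, labels, ["g0", "g1", "g2"], n_perm=200))
print(permutation_p_value(expr, labels, ["g3", "g4", "g5"], n_perm=200))
```

Repeating such a test over many simulated data sets and recording the fraction of p-values ≤ 0.05 would give a detection rate of the kind tabulated above; swapping `t_statistic` or the squaring step corresponds to changing the gene-level statistic or its transformation.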
Impact of choice of gene set statistics on detecting gene set enrichment.
| set 1 | 0.67 | 0.67 | 0.82 | 0.56 | 0.68 | 0.84 |
| set 2 | 0.92 | 0.93 | 0.98 | 0.65 | 0.94 | 1.00 |
| set 3 | 0.01 | 0.01 | 0.00 | 0.00 | 0.01 | 0.01 |
| set 4 | 0.44 | 0.45 | 0.57 | 0.35 | 0.45 | 0.54 |
| set 5 | 0.41 | 0.41 | 0.32 | 0.22 | 0.42 | 0.41 |
| set 6 | 0.85 | 0.86 | 0.94 | 0.80 | 0.85 | 0.96 |
| set 7 | 1.00 | 1.00 | 1.00 | 0.99 | 1.00 | 1.00 |
| set 8 | 0.70 | 0.71 | 0.78 | 0.65 | 0.84 | 0.74 |
| set 9 | 0.90 | 0.86 | 0.81 | 0.69 | 0.99 | 0.80 |
The indicated values correspond to the proportion of p-values ≤ 0.05, using the squared moderated t-statistic on the gene level and resampling to assess significance.
Comparison of methods for assigning significance.
| set 1 | 0.82 | 0.67 | 0.59 | 0.66 |
| set 2 | 1.00 | 0.92 | 1.00 | 0.92 |
| set 3 | 0.03 | 0.01 | 0.06 | 0.01 |
| set 4 | 0.60 | 0.44 | 0.51 | 0.43 |
| set 5 | 0.85 | 0.41 | 0.90 | 0.41 |
| set 6 | 0.96 | 0.85 | 0.83 | 0.84 |
| set 7 | 1.00 | 1.00 | 1.00 | 1.00 |
| set 8 | 0.90 | 0.70 | 0.79 | 0.68 |
| set 9 | 0.99 | 0.90 | 1.00 | 0.87 |
The values indicated are the proportion of p-values ≤ 0.05, using as gene set statistic the mean of the individual squared moderated t-statistics.
Performance of the globaltest.
| set 1 | 0.59 | 0.66 | 0.61 | 0.63 |
| set 2 | 1.00 | 0.94 | 1.00 | 0.93 |
| set 3 | 0.02 | 0.00 | 0.05 | 0.00 |
| set 4 | 0.46 | 0.44 | 0.49 | 0.44 |
| set 5 | 0.85 | 0.42 | 0.91 | 0.43 |
| set 6 | 0.82 | 0.85 | 0.80 | 0.84 |
| set 7 | 1.00 | 1.00 | 1.00 | 1.00 |
| set 8 | 0.74 | 0.69 | 0.76 | 0.68 |
| set 9 | 0.99 | 0.86 | 1.00 | 0.89 |
The indicated values are the proportion of p-values ≤ 0.05.
Performance of the Hotelling approach using a shrinkage correlation matrix.
| set 1 | 0.09 | 0.18 | 0.08 |
| set 2 | 0.92 | 1.00 | 0.92 |
| set 3 | 0.01 | 0.05 | 0.01 |
| set 4 | 0.06 | 0.16 | 0.05 |
| set 5 | 0.41 | 0.91 | 0.40 |
| set 6 | 0.25 | 0.26 | 0.22 |
| set 7 | 1.00 | 1.00 | 1.00 |
| set 8 | 0.25 | 0.51 | 0.25 |
| set 9 | 0.89 | 1.00 | 0.87 |
The indicated values are the proportion of p-values ≤ 0.05.
Figure 2. Analysis of the p53 data set. Left: number of significant gene sets as a function of the p-value cutoff and the choice of gene set statistic. On the gene level, the squared moderated t-statistic was employed. Right: bar plot for a p-value cutoff of 0.01.
Figure 3. Distribution of correlation across the 290 gene sets investigated for the p53 data. Top: histogram of averaged pairwise correlations. Bottom: histogram of averaged absolute values of the pairwise correlations.
Figure 4. Analysis of the Hedenfalk data set. Left: number of significant gene sets as a function of the p-value cutoff and the choice of gene set statistic. On the gene level, the squared moderated t-statistic was employed. Right: bar plot for a p-value cutoff of 0.01.
Top scoring gene sets resulting from the analysis of the Hedenfalk data.
| Rank | Gene set | p-value |
| 1 | breast_cancer_estrogen_signalling | 0.000 |
| 2 | cell_surface_receptor_linked_signal_transduction | 0.000 |
| 3 | insulin_signalling | 0.000 |
| 4 | p53_signalling | 0.000 |
| 5 | pparaPathway | 0.000 |
| 6 | VOXPHOS | 0.000 |
| 7 | RAP_UP | 0.000 |
| 8 | PROLIF_GENES | 0.000 |
| 9 | UPREG_BY_HOXA9 | 0.000 |
| 10 | cell_adhesion | 0.001 |
| 11 | CR_CAM | 0.001 |
| 12 | CR_DEATH | 0.001 |
| 13 | il2rbPathway | 0.001 |
| 14 | ST_Tumor_Necrosis_Factor_Pathway | 0.001 |
| 15 | tcrPathway | 0.001 |
| 16 | HTERT_UP | 0.001 |
| 17 | CBF_LEUKEMIA_DOWNING_AML | 0.001 |
| 18 | CR_SIGNALLING | 0.002 |
| 19 | ghPathway | 0.002 |
| 20 | ST_B_Cell_Antigen_Receptor | 0.002 |
| 21 | tpoPathway | 0.002 |
| 22 | GO_0005739 | 0.002 |
| 23 | LEU_UP | 0.002 |
| 24 | Cell_Cycle | 0.003 |
| 25 | GLUT_UP | 0.003 |
| 26 | FRASOR_ER_DOWN | 0.003 |
| 27 | ANDROGEN_UP_GENES | 0.003 |
| 28 | fmlppathway | 0.004 |
| 29 | hivnefPathway | 0.004 |
| 30 | biopeptidesPathway | 0.005 |
| 31 | cell_adhesion_molecule_activity | 0.005 |
| 32 | SIG_InsulinReceptorPathwayInCardiacMyocytes | 0.005 |
| 33 | ST_Integrin_Signaling_Pathway | 0.005 |
| 34 | drug_resistance_and_metabolism | 0.006 |
| 35 | ST_Differentiation_Pathway_in_PC12_Cells | 0.007 |
| 36 | gleevecPathway | 0.008 |
| 37 | DNA_DAMAGE_SIGNALLING | 0.009 |
| 38 | ST_ERK1_ERK2_MAPK_Pathway | 0.009 |
| 39 | mRNA_splicing | 0.010 |
| 40 | nfatPathway | 0.010 |
| 41 | HUMAN_CD34_ENRICHED_TF_JP | 0.010 |
As gene set statistic we used the mean of the squared moderated t-statistics on the gene level; significance was assessed by sample label permutation.
Contingency table for testing gene set enrichment.
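The table body is not reproduced in this record. As a hedged illustration only, the sketch below assumes the conventional 2×2 layout for such a test (genes inside versus outside the gene set, crossed with significant versus non-significant genes) and computes a one-sided hypergeometric (Fisher-type) enrichment p-value. The function names and the example counts are hypothetical and are not taken from the paper.

```python
from math import comb


def contingency_table(n_total, n_sig, set_size, set_sig):
    """2x2 table: rows = in set / not in set, columns = significant / not."""
    return [
        [set_sig, set_size - set_sig],
        [n_sig - set_sig, n_total - n_sig - set_size + set_sig],
    ]


def enrichment_p_value(n_total, n_sig, set_size, set_sig):
    """One-sided p-value P(X >= set_sig) under the hypergeometric null."""
    upper = min(n_sig, set_size)
    tail = sum(
        comb(n_sig, k) * comb(n_total - n_sig, set_size - k)
        for k in range(set_sig, upper + 1)
    )
    return tail / comb(n_total, set_size)


# Hypothetical counts: 1000 genes, 100 significant, a set of 20 with 8 significant.
print(contingency_table(1000, 100, 20, 8))
print(enrichment_p_value(1000, 100, 20, 8))
```

This kind of test treats genes as the sampling units, which is a different null model from the sample label permutation used for the results reported above.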