Monther Alhamdoosh1, Milica Ng1, Nicholas J Wilson1, Julie M Sheridan2,3, Huy Huynh1, Michael J Wilson1, Matthew E Ritchie4,5.
Abstract
Motivation: Gene set enrichment (GSE) analysis allows researchers to efficiently extract biological insight from long lists of differentially expressed genes by interrogating them at a systems level. In recent years, there has been a proliferation of GSE analysis methods and hence it has become increasingly difficult for researchers to select an optimal GSE tool based on their particular dataset. Moreover, the majority of GSE analysis methods do not allow researchers to simultaneously compare gene set level results between multiple experimental conditions.
Year: 2017 PMID: 27694195 PMCID: PMC5408797 DOI: 10.1093/bioinformatics/btw623
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. A schematic overview of the EGSEA pipeline for gene set enrichment analysis.
EGSEA’s performance at different levels of differential expression
| FC | pCM | FDR (Mean) | FDR (Std) | Recall (Mean) | Recall (Std) | F1-measure (Mean) | F1-measure (Std) |
|---|---|---|---|---|---|---|---|
| 1.3 | FP | 0.1144 | 0.0442 | 1.0000 | 0.0000 | 0.9387 | 0.0250 |
| 1.3 | LP | 0.0164 | 0.0181 | 1.0000 | 0.0000 | 0.9917 | 0.0093 |
| 1.3 | MP | 0.0533 | 0.0282 | 1.0000 | 0.0000 | 0.9724 | 0.0149 |
| 1.3 | SP | 0.0550 | 0.0291 | 1.0000 | 0.0000 | 0.9715 | 0.0154 |
| 1.3 | SZ | 0.0257 | 0.0244 | 1.0000 | 0.0000 | 0.9868 | 0.0127 |
| 1.3 | vote | 0.0003 | 0.0025 | 0.9998 | 0.0025 | 0.9998 | 0.0025 |
| 1.3 | avg | 0.0005 | 0.0035 | 0.9995 | 0.0035 | 0.9995 | 0.0035 |
| 1.3 | med | 0.0000 | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.0000 |
| 1.5 | FP | 0.0992 | 0.0422 | 1.0000 | 0.0000 | 0.9473 | 0.0234 |
| 1.5 | MP | 0.0318 | 0.0237 | 1.0000 | 0.0000 | 0.9837 | 0.0123 |
| 1.5 | SP | 0.0334 | 0.0245 | 1.0000 | 0.0000 | 0.9828 | 0.0127 |
| 1.5 | SZ | 0.0262 | 0.0237 | 1.0000 | 0.0000 | 0.9866 | 0.0123 |
| 1.5 | WP | 0.0212 | 0.0224 | 1.0000 | 0.0000 | 0.9891 | 0.0115 |
| 1.5 | vote | 0.0000 | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.0000 |
| 1.5 | avg | 0.0010 | 0.0049 | 0.9990 | 0.0049 | 0.9990 | 0.0049 |
| 1.5 | med | 0.0000 | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.0000 |
| 1.8 | FP | 0.0869 | 0.0416 | 1.0000 | 0.0000 | 0.9541 | 0.0229 |
| 1.8 | LP | 0.0157 | 0.0171 | 1.0000 | 0.0000 | 0.9920 | 0.0087 |
| 1.8 | SP | 0.0095 | 0.0137 | 1.0000 | 0.0000 | 0.9952 | 0.0070 |
| 1.8 | SZ | 0.0159 | 0.0171 | 1.0000 | 0.0000 | 0.9919 | 0.0087 |
| 1.8 | WP | 0.0235 | 0.0245 | 1.0000 | 0.0000 | 0.9879 | 0.0127 |
| 1.8 | vote | 0.0000 | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.0000 |
| 1.8 | avg | 0.0000 | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.0000 |
| 1.8 | med | 0.0000 | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.0000 |
FC is the differential expression level. FP, LP, MP, SP, SZ and WP stand for the Fisher, logitp, average, summation, summation of Z and Wilkinson P-value combining methods (pCMs), respectively. The best performing pCM is highlighted in bold for each FC configuration.
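Two of the P-value combining methods named above can be illustrated with a minimal sketch. It uses the textbook definitions of Fisher's and Wilkinson's methods and is not taken from the EGSEA implementation. The function names and the choice `r=1` for Wilkinson's method are assumptions made for illustration.

```python
import math

def fisher_combine(pvalues):
    """Fisher's method: -2 * sum(ln p) follows chi-square with 2k d.f.

    Assumes every p-value is strictly greater than 0.
    """
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # test statistic divided by 2
    # Closed-form chi-square survival function for an even number (2k) of d.f.:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total

def wilkinson_combine(pvalues, r=1):
    """Wilkinson's method based on the r-th smallest p-value.

    Returns the probability that at least r of k independent uniform
    p-values are less than or equal to the observed r-th smallest one.
    r=1 is an illustrative default, not a documented EGSEA setting.
    """
    k = len(pvalues)
    p_r = sorted(pvalues)[r - 1]
    return sum(
        math.comb(k, s) * p_r**s * (1 - p_r) ** (k - s)
        for s in range(r, k + 1)
    )

# Hypothetical p-values for one gene set from three base methods.
example = [0.01, 0.5, 0.9]
print(fisher_combine(example), wilkinson_combine(example))
```

Both functions assume independent p-values. Base GSE methods run on the same data are usually correlated, so combined values from a sketch like this are better read as scores than as calibrated significance levels.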
EGSEA’s performance using a variable number of base methods with simulated FCs at the level of 1.3
| ID | FDR (Mean) | FDR (Std) | Recall (Mean) | Recall (Std) | F1-measure (Mean) | F1-measure (Std) |
|---|---|---|---|---|---|---|
| E1 | | | | | | |
| E2 | 0.0175 | 0.0210 | 1.0000 | 0.0000 | 0.9911 | 0.0108 |
| E3 | 0.0219 | 0.0231 | 1.0000 | 0.0000 | 0.9888 | 0.0119 |
| E4 | 0.0191 | 0.0221 | 1.0000 | 0.0000 | 0.9902 | 0.0114 |
| E5 | 0.0184 | 0.0212 | 1.0000 | 0.0000 | 0.9906 | 0.0109 |
Wilkinson’s method is used to combine P-values. Experiment E1 combines the eleven methods; E2 excludes ora; E3 excludes ora, gage and padog; E4 includes only camera, safe, zscore, ssgsea and fry; and E5 includes only camera, gsva, zscore and fry. The best performing configuration is highlighted in bold.
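The FDR, recall and F1-measure columns can be read with the usual definitions in mind. The sketch below assumes those standard definitions (FDR = FP / (TP + FP), recall = TP / (TP + FN), F1 as the harmonic mean of precision and recall). The function name and the gene set labels are hypothetical.

```python
def enrichment_metrics(predicted, truth):
    """Compare the gene sets called significant against the truly enriched ones."""
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)   # truly enriched sets that were reported
    fp = len(predicted - truth)   # reported sets that are not truly enriched
    fn = len(truth - predicted)   # truly enriched sets that were missed
    fdr = fp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN).
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return {"fdr": fdr, "recall": recall, "f1": f1}

# Hypothetical simulation run: two truly enriched sets, three reported.
print(enrichment_metrics(["gs1", "gs2", "gs3"], ["gs1", "gs2"]))
```

Because the tables report means over repeated runs, the mean F1-measure need not equal the F1 computed from the mean FDR and mean recall.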
Fig. 2. Multidimensional scaling plot based on the gene set rankings of the KEGG signalling and disease collections for ten GSE methods applied to the Human IL-13 versus control dataset. Methods that perform similarly on this dataset cluster together.
The top ten gene sets retrieved by EGSEA for the human PBMC data, based on the Average Rank scoring function
Columns 1 to 6 refer to the IL-13 Stimulated versus Control contrast; columns 7 to 12 refer to the IL-13R Antagonist versus IL-13 Stimulated contrast.
| Gene Set ID | Gene Set Name | Vote Rank | Avg. Rank | Med. Rank | Min. Rank | Gene Set ID | Gene Set Name | Vote Rank | Avg. Rank | Med. Rank | Min. Rank |
|---|---|---|---|---|---|---|---|---|---|---|---|
| hsa05310 | Asthma (1) | 15 | 26.2 | 13 | 9 | hsa04621 | NOD-like recept. (6) | 20 | 29.3 | 25.5 | 9 |
| hsa04672 | Immune net. (8) | 20 | 31.6 | 26.5 | 12 | hsa04620 | Toll-like recept. (7) | 15 | 31 | 17 | 4 |
| hsa05416 | Viral myocarditis (2) | 20 | 32.4 | 20.5 | 7 | hsa05310 | Asthma (1) | 20 | 32.1 | 17.5 | 9 |
| hsa05146 | Amoebiasis (4) | 65 | 32.9 | 26.5 | 3 | hsa05414 | Dilated cardiomyo. (13) | 45 | 37.5 | 27.5 | 5 |
| hsa04961 | Calcium reabsorp. (11) | 10 | 36.5 | 25.5 | 4 | hsa05166 | HTLV-I infection (3) | 35 | 37.8 | 30.5 | 3 |
| hsa05134 | Legionellosis (5) | 10 | 40.7 | 35.5 | 8 | hsa05416 | Viral myocarditis (2) | 20 | 37.9 | 19 | 2 |
| hsa05166 | HTLV-I infection (3) | 15 | 41.9 | 32.5 | 9 | hsa05134 | Legionellosis (5) | 10 | 40.1 | 26 | 9 |
| hsa05020 | Prion diseases (>20) | 20 | 42.8 | 45.5 | 5 | hsa04623 | DNA-sensing (>20) | 55 | 40.6 | 40.5 | 8 |
| hsa05205 | Proteoglycans (17) | 50 | 46.2 | 44.5 | 15 | hsa05144 | Malaria (9) | 5 | 40.8 | 17.5 | 1 |
| hsa05145 | Toxoplasmosis (15) | 35 | 46.9 | 37.5 | 2 | hsa04064 | NF-kappa B sig. (>20) | 60 | 43.3 | 55.5 | 7 |
Two experimental contrasts were evaluated in this dataset. The rank of each gene set in the comparative analysis of these two contrasts is given in parentheses after its name. The FDR is less than 0.05 for all sets.
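The Avg., Med. and Min. rank columns summarise how each gene set was ranked by the individual base methods. The sketch below shows one way such summaries could be computed and then sorted by average rank, as in the table above. The function name and the input data are hypothetical, and the Vote rank is left out because its definition is not given in this excerpt.

```python
from statistics import mean, median

def summarize_ranks(ranks_by_set):
    """Summarise per-method ranks for each gene set and order by average rank.

    ranks_by_set maps a gene set ID to the list of ranks it received
    from the individual base methods (1 = most enriched).
    """
    rows = [
        {
            "gene_set": gene_set,
            "avg": mean(ranks),
            "med": median(ranks),
            "min": min(ranks),
        }
        for gene_set, ranks in ranks_by_set.items()
    ]
    # A lower average rank means the set is ranked highly by most methods.
    return sorted(rows, key=lambda row: row["avg"])

# Hypothetical ranks from three base methods for two gene sets.
for row in summarize_ranks({"setB": [2, 1, 9], "setA": [1, 3, 2]}):
    print(row)
```

The gap between the minimum and average rank in the table (for example a set ranked 2nd by one method but 46.9 on average) shows why a single method's ranking can differ markedly from the ensemble view.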
Fig. 3. Visualization of the gene sets retrieved by EGSEA at different levels. (A) Summary plots of EGSEA on the human dataset. The IDs of the top ten pathways based on EGSEA average rank are highlighted in black font and the top five pathways based on EGSEA significance score whose average ranks are not in the top ten ranks are highlighted in blue font. The bubble size indicates the level of pathway significance. The red and blue colours indicate that the majority of gene set genes are up- or down-regulated, respectively. (B) Heat maps of the gene expression fold-changes in three selected gene sets.
Comparative analysis results for three contrasts from the mouse mammary cell dataset
| ID | Gene Set | Vote Rank | Avg. Rank | Med. Rank | Min. Rank | Significance Score | Regulation Direction |
|---|---|---|---|---|---|---|---|
| M2573 | Lim Mammary Stem Cell Up | 5 | 725.46 | 8 | 1 | 13.52 | Up |
| M2574 | Lim Mammary Stem Cell Dn | 5 | 464.46 | 8.5 | 1 | 58.29 | Down |
| M2578 | Lim Mammary Luminal Mature Up | 5 | 569.88 | 13.5 | 1 | 59.25 | Down |
| M2575 | Lim Mammary Luminal Progenitor Up | 5 | 734.83 | 61.5 | 1 | 63.04 | Down |
| M17299 | Charafe Breast Cancer Luminal versus Mesenchymal Up | 85 | 545.83 | 73.5 | 3 | 18.06 | Down |
| M6744 | Coldren Gefitinib Resistance Dn | 10 | 663 | 78 | 7 | 19.74 | Down |
| M2761 | Nakayama Soft Tissue Tumors Pca2 Up | 15 | 1054.67 | 84.5 | 4 | 24.06 | Up |
| M2580 | Lim Mammary Luminal Mature Dn | 5 | 480.54 | 105.5 | 3 | 30.93 | Up |
| M19391 | Liu Prostate Cancer Dn | 10 | 715.46 | 132 | 4 | 42.77 | Up |
| M4888 | Zhan Multiple Myeloma Pr Up | 15 | 1075.38 | 150 | 13 | 15.58 | Up |