Jignesh R Parikh, Bertram Klinger, Yu Xia, Jarrod A Marto, Nils Blüthgen.
Abstract
High-throughput gene-expression studies result in lists of differentially expressed genes. Most current meta-analyses of these gene lists include searching for significant membership of the translated proteins in various signaling pathways. However, such membership enrichment algorithms do not provide insight into which pathways caused the genes to be differentially expressed in the first place. Here, we present an intuitive approach for discovering upstream signaling pathways responsible for regulating these differentially expressed genes. We identify consistently regulated signature genes specific for signal transduction pathways from a panel of single-pathway perturbation experiments. An algorithm that detects overrepresentation of these signature genes in a gene group of interest is used to infer the signaling pathway responsible for regulation. We expose our novel resource and algorithm through a web server called SPEED: Signaling Pathway Enrichment using Experimental Data sets. SPEED can be freely accessed at http://speed.sys-bio.net/.
Year: 2010 PMID: 20494976 PMCID: PMC2896193 DOI: 10.1093/nar/gkq424
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Overview of the SPEED algorithm. The SPEED algorithm is based on the identification of signature genes that are consistently regulated by specific signaling pathways using publicly available microarray data. Gene-expression data sets from single-pathway perturbation experiments were manually selected from the GEO database. Next, gene-expression values from the selected database were automatically processed using custom R scripts, and expression changes were stored as Z-score rank percentiles in the SPEED database. The SPEED web server extracts signature genes per pathway on the fly based on two user-specified parameters: the level of differential expression (Z-score percentile; e.g. top 1%) and the level of consistency across experiments (percentage of experimental data sets in which a gene is differentially expressed; e.g. at least 20%). Users can compare their own gene sets against the extracted signature genes to identify modulated upstream signaling pathways.
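The extraction step described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the SPEED implementation: the function name, the data layout (a dictionary of per-experiment percentiles) and the toy gene names are assumptions, and the direction of regulation and the uniqueness constraint are ignored.

```python
from collections import defaultdict


def extract_signature_genes(percentiles, max_percentile=1.0, min_overlap=20.0):
    """Select genes that are differentially expressed in enough experiments.

    percentiles: dict of experiment id -> dict of gene -> Z-score rank
        percentile (0-100; smaller means more strongly regulated).
        This layout is an assumption made for the sketch.
    max_percentile: a gene counts as differentially expressed in one
        experiment if its percentile is <= this value (e.g. top 1%).
    min_overlap: minimum percentage of experiments in which the gene
        must be differentially expressed (e.g. at least 20%).
    """
    n_experiments = len(percentiles)
    if n_experiments == 0:
        return set()

    # Count, per gene, the experiments in which it passes the percentile cutoff.
    hits = defaultdict(int)
    for gene_to_percentile in percentiles.values():
        for gene, pct in gene_to_percentile.items():
            if pct <= max_percentile:
                hits[gene] += 1

    # Keep genes that pass the cutoff in a large enough share of experiments.
    return {
        gene
        for gene, count in hits.items()
        if 100.0 * count / n_experiments >= min_overlap
    }


# Toy data for one hypothetical pathway (six perturbation experiments).
toy = {
    "exp1": {"SOCS3": 0.5, "MYC": 40.0, "JUNB": 0.9},
    "exp2": {"SOCS3": 0.8, "MYC": 0.7, "JUNB": 12.0},
    "exp3": {"SOCS3": 0.2, "MYC": 55.0, "JUNB": 30.0},
    "exp4": {"SOCS3": 3.0, "MYC": 60.0, "JUNB": 25.0},
    "exp5": {"SOCS3": 0.4, "MYC": 70.0, "JUNB": 45.0},
    "exp6": {"SOCS3": 0.6, "MYC": 35.0, "JUNB": 0.3},
}

signature = extract_signature_genes(toy)
print(sorted(signature))
```

With the default thresholds, a gene that passes the 1% cutoff in only one of six experiments (about 17%) is dropped, while one that passes in two of six (about 33%) is kept. Tightening either parameter shrinks the signature, which is the sensitivity to parameters that Figure 2 summarizes.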
Figure 2. Average number of SPEED signature genes. SPEED signature genes are sensitive to user-specified parameters. The average number of signature genes per pathway, without the uniqueness constraint, is listed as a function of Z-score percentile and percent overlap across experiments. The heat map corresponds to log10 of the average number of signature genes. Sensitivity and specificity are noted in parentheses. The values for the default parameter set (Z-score percentile ≤1% and percent overlap ≥20%) are boxed. Our choice of default parameter set, as well as the recommended parameters (in black text), was determined ad hoc based on biological meaning, the resulting number of signature genes and performance metrics. Only the default pathways are considered here, and the bottom 50% of expressed genes are discarded for all calculations.
Figure 3. Signaling pathway crosstalk demonstrated by overlapping signature genes. The percent overlap of signature genes between all pairs of pathways is displayed as a heat map, with higher overlap suggesting greater crosstalk between the respective signaling pathways. Pathway similarity is calculated as the negative log of the percent overlap, and similar pathways are grouped using hierarchical clustering. Two major clusters emerge, separating pathways involved in immune response (JAK-STAT, TLR, IL-1 and TNF-α) from pathways controlling cell cycle.
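The overlap measure in the Figure 3 caption can be illustrated with a short sketch. Several details are assumptions, because the caption does not specify them: overlap is taken relative to the smaller of the two signature sets and expressed as a fraction (so identical sets score 0), the natural logarithm is used, and the function names and toy gene identifiers are hypothetical. Because the negative log decreases as overlap increases, the value behaves like a distance: smaller means more similar.

```python
import math
from itertools import combinations


def overlap_fraction(sig_a, sig_b):
    """Shared genes as a fraction of the smaller signature (assumed definition)."""
    if not sig_a or not sig_b:
        return 0.0
    return len(sig_a & sig_b) / min(len(sig_a), len(sig_b))


def pathway_distance(sig_a, sig_b):
    """Negative natural log of the overlap fraction; infinite when nothing is shared."""
    frac = overlap_fraction(sig_a, sig_b)
    if frac == 0.0:
        return math.inf
    return -math.log(frac)


def pairwise_distances(signatures):
    """Distance for every unordered pair of pathways in a name -> gene-set dict."""
    return {
        (a, b): pathway_distance(signatures[a], signatures[b])
        for a, b in combinations(sorted(signatures), 2)
    }


# Toy signatures with made-up gene identifiers.
toy_signatures = {
    "TLR": {"g1", "g2", "g3", "g4"},
    "TNF-α": {"g3", "g4", "g5"},
    "TGF-β": {"g8", "g9"},
}

for pair, dist in pairwise_distances(toy_signatures).items():
    print(pair, dist)
```

A matrix of such pairwise values is the kind of input a hierarchical clustering routine expects; infinite entries for disjoint signatures would need to be capped at a large finite value first.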
Validation of the SPEED algorithm on gene lists from independent literature sources
| Pathway | Test sets | Correct top-ranking events (sensitivity), SPEED (%) | Correct top-ranking events (sensitivity), GATHER (%) | Significantly overrepresented events (sensitivity), SPEED (%) | Significantly overrepresented events (sensitivity), GATHER (%) |
|---|---|---|---|---|---|
| JAK-STAT | 10 | 8 (80) | 1 ( | 10 (100) | 7 (70) |
| TGF-β | 5 | 4 (80) | 2 (40) | 5 (100) | 2 (40) |
| MAPK+PI3K | 6 | 4 (67) | 1 ( | 6 (100) | 6 (100) |
| TLR | 6 | 5 (83) | 3 (50) | 6 (100) | 3 (50) |
| Total | 27 | 21 (78) | 7 ( | 27 (100) | 18 (65) |
SPEED results from 27 literature-derived gene lists for the four default pathways are summarized. As compared to traditional signaling pathway membership analysis using GATHER, SPEED correctly predicts the perturbed signaling pathway as the top-ranking result at a higher rate (78% compared to 29%). SPEED also outperforms GATHER in identifying the correct pathway as one of the significantly overrepresented pathways (FDR ≤ 0.05 for SPEED and no threshold for GATHER).
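To make the ranking by FDR concrete, the sketch below scores a gene list against several signature sets. The source does not state which statistical test SPEED uses, so the one-sided hypergeometric test and the Benjamini-Hochberg correction here are assumed stand-ins, chosen because they are common for overrepresentation analysis. All function names, the background size and the toy genes are hypothetical, and every gene in the list is assumed to belong to the background.

```python
from math import comb


def hypergeom_pvalue(overlap, list_size, signature_size, background_size):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from a background that contains signature_size signature genes."""
    upper = min(list_size, signature_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(signature_size, k)
        * comb(background_size - signature_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues):
    """Map each name to its Benjamini-Hochberg adjusted p-value (FDR)."""
    ordered = sorted(pvalues.items(), key=lambda kv: kv[1])
    m = len(ordered)
    adjusted = {}
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        name, p = ordered[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[name] = running_min
    return adjusted


def rank_pathways(gene_list, signatures, background_size):
    """Return (pathway, FDR) pairs sorted from most to least significant."""
    genes = set(gene_list)
    pvalues = {
        name: hypergeom_pvalue(len(genes & sig), len(genes), len(sig), background_size)
        for name, sig in signatures.items()
    }
    fdr = benjamini_hochberg(pvalues)
    return sorted(fdr.items(), key=lambda kv: kv[1])


# Toy example: two hypothetical pathways, background of 1000 genes.
toy_signatures = {
    "pathway_A": {f"g{i}" for i in range(1, 11)},
    "pathway_B": {f"g{i}" for i in range(11, 21)},
}
query = ["g1", "g2", "g3", "g4", "g5", "g21"]

ranking = rank_pathways(query, toy_signatures, background_size=1000)
for name, fdr in ranking:
    print(name, fdr, "significant" if fdr <= 0.05 else "not significant")
```

In this toy case the query shares five genes with `pathway_A` and none with `pathway_B`, so `pathway_A` ranks first and passes the FDR ≤ 0.05 threshold. A "correct top-ranking event" in the table corresponds to the known perturbed pathway appearing first in such a ranking.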
Figure 4. SPEED analysis of downregulated genes in high-risk AML patients with CEBPa mutations. Screenshot of SPEED results shows TNF-α as the top-ranking upstream signaling pathway for target genes downregulated in high-risk AML patients with CEBPa mutations versus CEBPa wild-type. The pathways are sorted by FDR and graphical outputs aid in interpretation of the results.
Identification of signaling pathways upstream of transcription factors in THP-1 cells
| Top-ranked pathway | Transcription factor |
|---|---|
| MAPK+PI3K | CEBPA ( |
| JAK-STAT | FOXJ3, FOXP1, GFI1 ( |
| TLR | BMI1, CEBPG, GATA2 ( |
| TGF-β | MYB ( |
SPEED was run programmatically on lists of differentially expressed genes following 52 transcription factor knockdowns. For 24 transcription factors, the corresponding differentially expressed gene lists had an overrepresentation of at least one upstream signaling pathway. The top-ranking signaling pathway with a minimum FDR of 0.05 is listed for each of the 24 transcription factors. Literature references for transcription factors known to be associated with identified signaling pathways are noted.
(a) Although NRAS is not a transcription factor, the significant overlap with MAPK+PI3K signature genes serves as validation of the SPEED algorithm because NRAS is upstream of both the MAPK and PI3K signaling pathways (25,26).