Bahman Afsari, Donald Geman, Elana J Fertig.
Abstract
Analysis of gene sets can implicate activity in signaling pathways that is responsible for cancer initiation and progression but is not discernible from the analysis of individual genes. Multiple methods and software packages have been developed to infer pathway activity from expression measurements for the set of genes targeted by that pathway. Broadly, three major methodologies have been proposed: over-representation, enrichment, and differential variability. Over-representation and enrichment analyses are both effective techniques for inferring differentially regulated pathways from gene sets whose differentially expressed (DE) genes are relatively consistent. Specifically, these algorithms aggregate statistics computed for each gene in the pathway. However, they overlook multivariate patterns related to gene interactions and variations in expression. Therefore, analysis of the differential variability of multigene expression patterns can be essential to pathway inference in cancers. The corresponding methodologies and software packages for such multivariate variability analysis of pathways are reviewed here. We also introduce a new, computationally efficient algorithm, expression variation analysis (EVA), which has been implemented along with a previously proposed algorithm, Differential Rank Conservation (DIRAC), in an open source R package, gene set regulation (GSReg). EVA inferred pathways similar to those inferred by DIRAC at a reduced computational cost. Moreover, EVA inferred different dysregulated pathways from those identified by enrichment analysis.
Keywords: gene expression; gene set analysis; multivariate analysis; variability analysis
Year: 2014 PMID: 25392694 PMCID: PMC4218688 DOI: 10.4137/CIN.S14066
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
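The abstract names DIRAC without describing it. As a minimal sketch, following the commonly described DIRAC formulation (a per-phenotype "rank template" that records the majority ordering of every gene pair, a per-sample matching score against that template, and a rank conservation index averaged over the phenotype's samples), the idea can be illustrated as below. This is not the GSReg implementation; the function names and the one-dict-per-sample data layout are hypothetical, and significance testing is omitted.

```python
from itertools import combinations
from statistics import mean


def pair_orderings(sample, genes):
    """For each gene pair (a, b), True if gene a is expressed below gene b in this sample."""
    return [sample[a] < sample[b] for a, b in combinations(genes, 2)]


def rank_template(samples, genes):
    """Majority ordering of each gene pair across one phenotype's samples (ties -> False)."""
    orders = [pair_orderings(s, genes) for s in samples]
    n = len(samples)
    return [sum(column) * 2 > n for column in zip(*orders)]


def rank_matching_score(sample, template, genes):
    """Fraction of gene pairs whose ordering in `sample` agrees with the template."""
    orders = pair_orderings(sample, genes)
    agreements = sum(o == t for o, t in zip(orders, template))
    return agreements / len(template)


def rank_conservation_index(samples, genes):
    """Mean matching score of a phenotype's samples to that phenotype's own template.
    Values near 1 indicate tightly conserved (low-variability) gene orderings."""
    template = rank_template(samples, genes)
    return mean(rank_matching_score(s, template, genes) for s in samples)


def differential_rank_conservation(group1, group2, genes):
    """Difference in conservation; positive means group1 orderings are less variable."""
    return rank_conservation_index(group1, genes) - rank_conservation_index(group2, genes)
```

For instance, a phenotype whose samples all share one gene ordering scores 1.0, while adding samples with reversed orderings lowers the index; a large difference between phenotypes flags the pathway as a candidate for dysregulation.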
Examples of software available for gene set analysis, divided into three major families of algorithms: over-representation, enrichment, and differential variability analyses.
| ANALYSIS FAMILY | METHODS | AVAILABILITY | REFERENCE |
|---|---|---|---|
| Over-representation | GeneMAPP | | |
| | GoMiner | | |
| | GatiGo | | |
| | Gostat | | |
| | FunAssociate | | |
| | GOToolBox | | |
| | GeneMerge | | |
| | GOEAST | | |
| | ClueGo | | |
| | FunSpec | | |
| | Go:TermFinder | | |
| | WebGestalt | | |
| | agriGo | | |
| Enrichment | GSEA | | |
| | SAFE | Bioconductor (safe) | |
| | LIMMA | Bioconductor (LIMMA) | |
| | DAVID | | |
| | TopGO | Bioconductor (topGo) | |
| | Gage | Bioconductor (gage) | |
| | sigPathway | Bioconductor (sigPathway) | |
| Differential variability | GINEA | No implementation | |
| | IB-GSA | No implementation | |
| | MAVTgsa | CRAN | |
| | synergy | | |
Figure 1. Pathway analysis methodologies from gene expression. (A) Over-representation analysis first performs a statistical test for each gene, comparing expression values between phenotypes to identify a set of significantly DE genes of size $N_{DE}$. The procedure then counts the DE genes that are also annotated to a specified pathway ($N_{P,DE}$) and calculates a P-value for enrichment of that pathway by testing whether $N_{P,DE}$ is unusually high relative to $N_{DE}$ and $N_P$ (the number of genes in the pathway). (B) Enrichment analysis first assigns an individual DE score to each gene annotated to a pathway and aggregates these into a pathway score $Z$. A similar score, $Z_0$, is computed for a null distribution; for example, this null distribution may be defined empirically from the DE scores of alternative gene sets or from permuted sample labels. Enrichment analysis forms a pathway statistic by comparing the distribution of DE scores in $Z$ with that in $Z_0$. (C) Differential variability analysis defines a statistic measuring the variability of pathway-gene expression across the samples of a given phenotype, denoted $V_1$ and $V_2$ for phenotypes 1 and 2, respectively. If the difference in variability between the two phenotypes is significantly large (ie, $|V_1 - V_2| \gg 0$), the pathway is identified as dysregulated.
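Panels (A) and (B) of Figure 1 can be made concrete with a short sketch. The caption does not name the test used in over-representation analysis; this sketch assumes the common one-sided hypergeometric test on the overlap count, the DE-gene count, and the pathway size. For enrichment, it uses the mean absolute DE score of the pathway genes compared against random gene sets of the same size, which is one of the null options the caption mentions; it is a simplified stand-in, not the statistic of GSEA or any other listed tool. Function and parameter names are hypothetical.

```python
import math
import random
from statistics import mean


def ora_pvalue(n_total, n_de, n_pathway, n_overlap):
    """One-sided hypergeometric P-value P(X >= n_overlap), where X is the number of
    DE genes among n_pathway genes drawn from n_total genes, n_de of which are DE."""
    denom = math.comb(n_total, n_pathway)
    upper = min(n_de, n_pathway)
    # math.comb returns 0 when k > n, so impossible overlaps contribute nothing.
    tail = sum(
        math.comb(n_de, k) * math.comb(n_total - n_de, n_pathway - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denom


def enrichment_pvalue(de_scores, pathway, n_perm=1000, seed=0):
    """Empirical P-value comparing the pathway score Z (mean |DE score| of pathway
    genes) with null scores Z0 from random gene sets of the same size."""
    rng = random.Random(seed)
    genes = list(de_scores)
    z = mean(abs(de_scores[g]) for g in pathway)
    hits = 0
    for _ in range(n_perm):
        random_set = rng.sample(genes, len(pathway))
        z0 = mean(abs(de_scores[g]) for g in random_set)
        if z0 >= z:
            hits += 1
    # Add-one correction keeps the empirical P-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

For example, `ora_pvalue(10, 2, 2, 2)` equals 1/45: with 10 genes, 2 of them DE, a 2-gene pathway containing both DE genes is one of 45 equally likely pairs.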
Figure 2. Comparison of dysregulated pathways identified by (A) DIRAC and (B) EVA when comparing head and neck squamous cell carcinoma samples (y-axis) with normal samples (x-axis). Pathways shown above the line have significantly (P-value < 0.05) higher variability in tumor than in normal samples, and those below the line have significantly higher variability in normal samples.
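The tumor-versus-normal variability comparison in Figure 2 can be illustrated with a minimal sketch. It assumes that EVA's variability for a phenotype is the expected Kendall-tau-type dissimilarity (fraction of discordantly ordered gene pairs) between two randomly drawn samples of that phenotype, estimated by averaging over all sample pairs. The permutation test is a stand-in chosen for simplicity; GSReg may assess significance differently. Function names and the one-dict-per-sample layout are hypothetical; each group needs at least two samples and the pathway at least two genes.

```python
import random
from itertools import combinations
from statistics import mean


def kendall_dissimilarity(x, y, genes):
    """Fraction of gene pairs ordered differently in samples x and y."""
    pairs = list(combinations(genes, 2))
    discordant = sum((x[a] < x[b]) != (y[a] < y[b]) for a, b in pairs)
    return discordant / len(pairs)


def variability(samples, genes):
    """Average dissimilarity over all distinct sample pairs within one phenotype
    (a U-statistic estimate of the expected distance between two random samples)."""
    return mean(kendall_dissimilarity(x, y, genes) for x, y in combinations(samples, 2))


def variability_test(group1, group2, genes, n_perm=999, seed=0):
    """Return |V1 - V2| and a permutation P-value obtained by shuffling phenotype labels."""
    rng = random.Random(seed)
    observed = abs(variability(group1, genes) - variability(group2, genes))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = abs(variability(pooled[:n1], genes) - variability(pooled[n1:], genes))
        if stat >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

The sign of `variability(tumor, genes) - variability(normal, genes)` indicates which side of the line in Figure 2 a pathway would fall on.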
Figure 3. P-value comparison of DIRAC and EVA. Each circle represents a pathway; the x-axis and y-axis show DIRAC and EVA P-values, respectively.