Chengshu Xie, Shaurya Jauhari, Antonio Mora.
Abstract
BACKGROUND: Gene Set Analysis (GSA) is arguably the method of choice for the functional interpretation of omics results. This paper explores the popularity and the performance of all the GSA methodologies and software published during the 20 years since its inception. "Popularity" is estimated from each paper's citation counts, while "performance" is based on a comprehensive evaluation of the validation strategies used by papers in the field, as well as the consolidated results from the existing benchmark studies.
Keywords: Benchmark; GSEA; Gene set analysis; Pathway analysis
Year: 2021 PMID: 33858350 PMCID: PMC8050894 DOI: 10.1186/s12859-021-04124-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Statistics from GSARefDB v.1.0 (for more recent statistics, visit our website: https://gsa-central.github.io/gsarefdb.html). (a) Number of GSA publications per year. (b) Number of publications per type of GSA method. (c) Number of publications per programming language used. (d) Number of citations per programming language used. (e) Website availability. (f) Number of publications per reported validation method.
Benchmarking studies of GSA methods
| References | Scope | Size | Criteria | Best performing methods |
|---|---|---|---|---|
| Naeem et al. | ORA and FCS methods | 14 methods | Method’s AUC (evaluated by predicting targets of TFs and miRNAs) | ANOVA, Z-SCORE, and Wilcoxon’s rank sum (WRS) |
| Tarca et al. | ORA, FCS, and SS methods | 16 methods | Prioritization, sensitivity, and FPR | GLOBALTEST and PLAGE (sensitivity), PADOG and ORA (prioritization), and CAMERA (FPR). Authors’ general recommendation: PLAGE, GLOBALTEST, and PADOG |
| Bayerlova et al. | ORA, FCS, and PT methods | 7 methods | Sensitivity and prioritization (for benchmark), and sensitivity, specificity, and accuracy (for simulations of pathway overlap) | For benchmark: CePaGSA (sensitivity) and PathNet (prioritization). For simulation with original pathways: CePaGSA (sensitivity), WRS (specificity), and WRS (accuracy). For simulation with non-overlapping pathways: KS (sensitivity), and SPIA, CePaORA, CePaGSA, and PathNet (specificity and accuracy) |
| Jaakkola et al. | ORA, FCS, and PT methods | 5 methods | Consistency of significant pathways between datasets, and sensitivity | SPIA and CePaORA (consistency), and SPIA, CePaORA, and NetGSA (sensitivity). Authors’ general recommendation: SPIA |
| De Meyer et al. | ORA, FCS, and NI methods | 4 methods | Prioritization, sensitivity, and specificity | PADOG (specificity) and BinoX (sensitivity) |
| Lim et al. | SS/Pathway-activity methods | 13 methods | Classification performance, preservation of data structure, robustness to noise, and reproducibility between pathway databases | ESEA, Pathifier, SAS, and PADOG (classification tasks), Pathifier and PLAGE (data structure), ssGSEA (robustness), and individPath, Pathifier, and SAS (reproducibility). Authors’ general recommendation: Pathifier, SAS, and individPath |
| Nguyen et al. | ORA, FCS, and PT methods | 13 methods | In order of importance: number of biased pathways, prioritization, method’s AUC, and sensitivity (evaluated using both disease target pathways and KO data) | GSEA (bias), PADOG (prioritization), ROntoTools (AUC), and CePaGSA (p-values). Authors’ general recommendation: ROntoTools |
| Ma et al. | FCS, PT, and NI methods | 9 methods | Ranking of empirical powers | DEGraph, followed by PathNet and NetGSA |
| Zyla et al. | ORA, FCS, and SS methods | 9 methods | Sensitivity, FPR, prioritization, computational time, and reproducibility | PLAGE (sensitivity), ORA and PADOG (specificity/FPR), PADOG (prioritization), and CERNO (reproducibility) |
| Geistlinger et al. | ORA, FCS, and SS methods | 10 methods | Sensitivity, computational time, and phenotype relevance score | Authors’ general recommendation: ROAST and GSVA (for self-contained hypothesis), ORA and PADOG (for competitive hypothesis) |
Ten benchmark studies from 2012 to 2020, showing a plurality of scopes, sizes, and method recommendations. Details on each study can be found in Additional file 2.
ORA, over-representation analysis; FCS, functional class scoring; PT, pathway topology-based; SS, single-sample; NI, network interaction.
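Several of the studies in the table score methods on "prioritization" and "sensitivity" against a known target pathway. The sketch below shows one possible way to operationalize both criteria; it is an illustration under stated assumptions, not the procedure of any particular study. The function names, the 0.05 threshold, the pathway labels, and the p-values are all hypothetical.

```python
from statistics import median

def target_rank_percent(pvalues: dict[str, float], target: str) -> float:
    """Rank of the target pathway among all tested pathways, in percent.

    Lower is better. Ties are broken by insertion order (an assumption
    made for simplicity; a real benchmark may average tied ranks).
    """
    ranked = sorted(pvalues, key=pvalues.get)
    return 100.0 * (ranked.index(target) + 1) / len(ranked)

def summarize(results: list[tuple[dict[str, float], str]],
              alpha: float = 0.05) -> dict[str, float]:
    """Summarize one GSA method over several benchmark datasets.

    `results` holds one (pathway p-values, target pathway) pair per dataset.
    """
    ranks = [target_rank_percent(p, t) for p, t in results]
    hits = [p[t] <= alpha for p, t in results]
    return {
        # Prioritization proxy: median relative rank of the target pathway.
        "median_rank_percent": median(ranks),
        # Sensitivity proxy: share of datasets where the target is significant.
        "detection_rate": sum(hits) / len(hits),
    }

# Hypothetical p-values of one method on three datasets, each with one target.
results = [
    ({"A": 0.01, "B": 0.20, "C": 0.50, "D": 0.03}, "A"),
    ({"A": 0.30, "B": 0.04, "C": 0.60, "D": 0.01}, "B"),
    ({"A": 0.40, "B": 0.20, "C": 0.10, "D": 0.90}, "C"),
]
summary = summarize(results)
print(summary)
```

Running `summarize` once per method on the same datasets gives directly comparable numbers, which is what makes this kind of criterion easy to scale to more methods.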
Comparison of performance criteria
| Criteria | Objectivity | Reproducibility | Scalability | Drawbacks |
|---|---|---|---|---|
| Tool agreement | Low | High | High | Subjective |
| Consistency between similar samples | Low | High | High | Subjective |
| Consistency with biological knowledge (Literature search) | Low | Low | Low | Subjective |
| Benchmark (target pathways as gold standard) | High | High | High | Centered on true positives |
| Benchmark (pathway relevance ranking as gold standard) | High | High | High | – |
| Benchmark (KO/perturbation data as gold standard) | High | High | High | – |
| Simulations | Low | Low | High | Human-designed datasets may be unrealistic |
Objectivity, reproducibility, and scalability of the main performance criteria. Objectivity means that the results do not depend on human interpretation. Reproducibility means that any researcher can obtain the same results by following the same procedures. Scalability means that the procedure can easily be applied to an increasing number of methods and datasets.
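To make the "target pathways as gold standard" row concrete, the following minimal sketch computes sensitivity, specificity, and precision for a single method on a single dataset. It assumes that every tested pathway outside the target set counts as a true negative, which is the simplification behind the "centered on true positives" drawback: a non-target pathway may in fact be relevant. The function name and the pathway labels are hypothetical.

```python
def confusion_metrics(significant: set[str], targets: set[str],
                      universe: set[str]) -> dict[str, float]:
    """Compare a method's significant pathways with gold-standard targets.

    Assumption: any pathway in `universe` that is not in `targets`
    is treated as a negative.
    """
    tp = len(significant & targets)             # targets that were detected
    fp = len(significant - targets)             # non-targets called significant
    fn = len(targets - significant)             # targets that were missed
    tn = len(universe - significant - targets)  # non-targets left alone
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "precision": tp / (tp + fp) if tp + fp else nan,
    }

# Hypothetical example: 10 tested pathways, 2 targets, 3 significant calls.
universe = {f"P{i}" for i in range(1, 11)}
targets = {"P1", "P2"}
significant = {"P1", "P3", "P4"}

metrics = confusion_metrics(significant, targets, universe)
print(metrics)
```

Because target sets are usually small relative to the number of tested pathways, specificity tends to look high almost by construction, so precision is often the more informative companion to sensitivity in this setting.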
Fig. 2 Screenshots of our tools for popularity and performance analysis of the GSA field. (a) GSARefDB: a screenshot of the R/Shiny interface to GSARefDB, showing the options of searching by year, tool name, paper’s first author, title, type of GSA, and programming language. (b) GSA BenchmarKING: one Jupyter notebook containing an R workflow for benchmarking single-sample GSA methods, and one Shiny app with the same purpose. Both tools display sensitivity, specificity, and precision plots for all the methods under study. See: https://gsa-central.github.io/gsarefdb.html and https://gsa-central.github.io/benchmarKING.html