Da Wei Huang, Brad T Sherman, Richard A Lempicki.
Abstract
Functional analysis of large gene lists, derived in most cases from emerging high-throughput genomic, proteomic and bioinformatics scanning approaches, is still a challenging and daunting task. Gene-annotation enrichment analysis is a promising high-throughput strategy that increases the likelihood that investigators will identify the biological processes most pertinent to their study. This survey collects approximately 68 bioinformatics enrichment tools that are currently available in the community. The tools are categorized into three major classes according to their underlying enrichment algorithms. The comprehensive collection, the tool classification and the associated questions/issues provide a more comprehensive and up-to-date view of the advantages, pitfalls and recent trends at a simpler tool-class level rather than tool by tool. The survey will thus help tool designers/developers and experienced end users understand the underlying algorithms and pertinent details of particular tool categories/tools, enabling them to make the best choices for their particular research interests.
Year: 2008 PMID: 19033363 PMCID: PMC2615629 DOI: 10.1093/nar/gkn923
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
List of 68 enrichment tools
| Enrichment tool name | Year of release | Key statistical method | Category |
|---|---|---|---|
| FunSpec | 2002 | Hypergeometric | Class I |
| Onto-express | 2002 | Fisher's exact; hypergeometric; binomial; chi-square | Class I |
| EASE | 2003 | Fisher's exact (modified as EASE score) | Class I |
| FatiGO/FatiWise/FatiGO+ | 2003 | Fisher's exact | Class I |
| FuncAssociate | 2003 | Fisher's exact | Class I |
| GARBAN | 2003 | Hypergeometric | Class I |
| GeneMerge | 2003 | Hypergeometric | Class I |
| GoMiner | 2003 | Fisher's exact | Class I |
| MAPPFinder | 2003 | | Class I |
| CLENCH | 2004 | Hypergeometric; chi-square; binomial | Class I |
| GO::TermFinder | 2004 | Hypergeometric | Class I |
| GOAL | 2004 | Permutation | Class I |
| GOArray | 2004 | Hypergeometric; | Class I |
| GOStat | 2004 | Fisher's exact; chi-square | Class I |
| GoSurfer | 2004 | Chi-square | Class I |
| OntologyTraverser | 2004 | Hypergeometric; Fisher's exact | Class I |
| THEA | 2004 | Hypergeometric | Class I |
| BiNGO | 2005 | Hypergeometric; binomial | Class I |
| FACT | 2005 | Adopt GeneMerge and GO::TermFinder statistical modules | Class I |
| gfinder | 2005 | Fisher's exact | Class I |
| Gobar | 2005 | Hypergeometric | Class I |
| GOCluster | 2005 | Hypergeometric | Class I |
| GOSSIP | 2005 | Fisher's exact | Class I |
| L2L | 2005 | Binomial; hypergeometric | Class I |
| WebGestalt | 2005 | Hypergeometric | Class I |
| BayGO | 2006 | Bayesian; Goodman and Kruskal's gamma factor | Class I |
| eGOn/GeneTools | 2006 | Fisher's exact | Class I |
| Gene Class Expression | 2006 | | Class I |
| GOALIE | 2006 | Hidden Kripke model | Class I |
| GOFFA | 2006 | Fisher's inverse chi-square | Class I |
| GOLEM | 2006 | Hypergeometric | Class I |
| JProGO | 2006 | Fisher's exact; Kolmogorov–Smirnov test; Student's t-test | Class I |
| PageMan | 2006 | Fisher's exact; chi-square; Wilcoxon | Class I |
| STEM | 2006 | Hypergeometric | Class I |
| WEGO | 2006 | Chi-square | Class I |
| EasyGO | 2007 | Hypergeometric; chi-square; binomial | Class I |
| g:Profiler | 2007 | Hypergeometric | Class I |
| ProbCD | 2007 | Yule's Q; Goodman-Kruskal's gamma; Cramer's T | Class I |
| GOEAST | 2008 | Hypergeometric | Class I |
| GOHyperGAll | 2008 | Hypergeometric | Class I |
| CatMap | 2004 | Permutations | Class II |
| Godist | 2004 | Kolmogorov–Smirnov test | Class II |
| GO-Mapper | 2004 | Gaussian distribution; EQ-score | Class II |
| iGA | 2004 | Permutations; hypergeometric; | Class II |
| GSEA | 2005 | Kolmogorov–Smirnov-like statistic | Class II |
| MEGO | 2005 | | Class II |
| PAGE | 2005 | | Class II |
| T-profiler | 2005 | | Class II |
| FuncCluster | 2006 | Fisher's exact | Class II |
| FatiScan | 2007 | Fisher's exact | Class II |
| FINA | 2007 | Fisher's exact | Class II |
| GAzer | 2007 | | Class II |
| GeneTrail | 2007 | Hypergeometric; Kolmogorov–Smirnov | Class II |
| MetaGP | 2007 | | Class II |
| Ontologizer | 2004 | Fisher's exact | Class III |
| POSOC | 2004 | POSET (finite partially ordered set, from discrete mathematics) | Class III |
| topGO | 2006 | Fisher's exact | Class III |
| GO-2D | 2007 | Hypergeometric; binomial | Class III |
| GENECODIS | 2007 | Hypergeometric; chi-square | Class III |
| GOSim | 2007 | Resnik's similarity | Class III |
| PalS | 2008 | Percent | Class III |
| ProfCom | 2008 | Greedy heuristics | Class III |
| GOTM | 2004 | Hypergeometric | Class I,II |
| ermineJ | 2005 | Permutations; Wilcoxon rank-sum test | Class I,II |
| DAVID | 2003 | Fisher's exact (modified as EASE score) | Class I,III |
| GOToolBox | 2004 | Hypergeometric; Fisher's exact; Binomial | Class I,III |
| ADGO | 2006 | | Class II,III |
| FunNet | 2008 | Unclear | Unclear |
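
Most Class I tools in the list above rely on the hypergeometric distribution or on Fisher's exact test, whose one-sided p-value is the same upper-tail probability. The following is a minimal sketch of that calculation in plain Python, not the implementation of any listed tool. The function name, argument names and example numbers are assumptions made for illustration only.

```python
from math import comb


def hypergeom_upper_tail(list_hits: int, list_size: int,
                         pop_hits: int, pop_size: int) -> float:
    """Probability of seeing at least `list_hits` annotated genes when
    `list_size` genes are drawn without replacement from a background of
    `pop_size` genes, `pop_hits` of which carry the annotation term.

    This equals the one-sided (over-representation) Fisher's exact p-value.
    """
    if not (0 <= pop_hits <= pop_size and 0 <= list_size <= pop_size):
        raise ValueError("counts are inconsistent with the background size")
    if list_hits < 0:
        raise ValueError("list_hits must be non-negative")

    max_hits = min(list_size, pop_hits)   # largest overlap that is possible
    total = comb(pop_size, list_size)     # all ways to draw the gene list
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 300 genes in the list, 12 of them annotated with
    # a term that covers 40 genes in a 20000-gene background.
    p_value = hypergeom_upper_tail(12, 300, 40, 20000)
    print(f"enrichment p-value: {p_value:.3g}")
```

The choice of background (`pop_size` and `pop_hits`) changes the result directly, which is one reason tools using the same statistical test can report different p-values for the same gene list.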
Figure 1. The infrastructure of typical enrichment tools. Although enrichment analysis tools have distinct features, they can generally be described in terms of three major layers: backend annotation database, data mining, and result presentation. Each layer, not the statistical method alone, greatly influences the analytic results.
Categorization of enrichment analysis tools
| Tool category | Description | Indication and limitation | Sub-type of algorithms | Methods | Example tool |
|---|---|---|---|---|---|
| Class I: singular enrichment analysis (SEA) | Enrichment | Capable of analyzing any gene list, which could be selected from any high-throughput biological study/technology (e.g. microarray, ChIP-on-chip, ChIP-on-sequence, SNP array, exon array, large-scale sequencing, etc.). However, the deeper inter-relationships among the terms may not be fully captured in a linear-format report. | Global reference background | Fisher's exact; hypergeometric; chi-square; binomial | GOStat, GoMiner, GOTM, BiNGO, GOToolBox, gfinder, etc. |
| | | | Local reference background | Fisher's exact; hypergeometric; chi-square; binomial | DAVID, Onto-express, GARBAN, FatiGO, etc. |
| | | | Neural network | Bayesian | BayGO |
| Class II: gene set enrichment analysis (GSEA) | All genes (without pre-selection) and their associated experimental values are considered in the enrichment analysis. The unique features of this strategy are: (i) No need to pre-select interesting genes, as opposed to Classes I and III; (ii) Experimental values integrated into | Suitable for pair-wise biological studies (e.g. disease versus control). Currently, it may be difficult to apply to the diverse data structures derived from complex experimental designs and some of the new technologies (e.g. SNP, exon and promoter arrays). | Based on ranked gene list | Kolmogorov–Smirnov-like | GSEA, CatMap, etc. |
| | | | Based on continuous gene values | | FatiScan, ADGO, ermineJ, PAGE, iGA, GO-Mapper, Godist, FINA, T-profiler, MetaGP, etc. |
| Class III: modular enrichment analysis (MEA) | This strategy inherits the key spirit of SEA. However, the term–term/gene–gene relationships are considered into enrichment | Capable of analyzing any gene list, which could be selected from any high-throughput biological study/technology, like Class I. Emphasizes network relationships during analysis. ‘Orphan’ genes/terms (with few relationships to other genes/terms), which can sometimes be very interesting too, may be left out of the analysis. | Composite annotations | Measure enrichment on joint terms | ADGO, GENECODIS, ProfCom, etc. |
| | | | DAG structure | Measure enrichment by considering parent–child relationships | topGO, Ontologizer, POSOC, etc. |
| | | | Global annotation relationship | Measure term–term global similarity with kappa statistics, Czekanowski-Dice, Pearson's correlation | DAVID, GOToolBox, etc. |
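
Two of the ideas in the table can be illustrated with a short sketch: the Kolmogorov–Smirnov-like running sum used on ranked gene lists (Class II), and a kappa statistic for term–term similarity (Class III). The code below is only an illustration under stated assumptions. It uses the unweighted form of the running sum and Cohen's kappa on binary gene-membership vectors; the function names and the toy gene identifiers are hypothetical, and the listed tools may weight, normalize or threshold these quantities differently.

```python
def running_sum_score(ranked_genes, gene_set):
    """Unweighted KS-like statistic over a ranked gene list.

    Walk down the ranking, stepping up at members of `gene_set` and down
    at non-members; return the running-sum value furthest from zero.
    """
    members = set(gene_set)
    n_hits = sum(1 for gene in ranked_genes if gene in members)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        return 0.0  # no contrast between members and non-members
    step_up, step_down = 1.0 / n_hits, 1.0 / n_miss
    running = best = 0.0
    for gene in ranked_genes:
        running += step_up if gene in members else -step_down
        if abs(running) > abs(best):
            best = running
    return best


def term_kappa(term_a_genes, term_b_genes, all_genes):
    """Cohen's kappa between two annotation terms, treating each term as a
    yes/no label on every gene in `all_genes` (assumed background)."""
    universe = set(all_genes)
    a = set(term_a_genes) & universe
    b = set(term_b_genes) & universe
    n = len(universe)
    if n == 0:
        return 0.0
    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = n - both - only_a - only_b
    observed = (both + neither) / n
    # Agreement expected by chance from the two terms' marginal frequencies.
    expected = (len(a) * len(b) + (n - len(a)) * (n - len(b))) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case; returning 0 is a convention chosen here
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    ranking = ["g1", "g2", "g3", "g4", "g5", "g6"]  # hypothetical ranked list
    print(running_sum_score(ranking, {"g1", "g2"}))
    print(term_kappa({"g1", "g2", "g3"}, {"g2", "g3", "g4"}, ranking))
```

In a ranked-list analysis the raw score is not interpretable on its own; its significance is usually judged against scores obtained from permuted data, as the "Permutations" entries in the tool list suggest.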