Purvesh Khatri, Marina Sirota, Atul J Butte.
Abstract
Pathway analysis has become the first choice for gaining insight into the underlying biology of differentially expressed genes and proteins, as it reduces complexity and has increased explanatory power. We discuss the evolution of knowledge base-driven pathway analysis over its first decade, distinctly divided into three generations. We also discuss the limitations that are specific to each generation, and how they are addressed by successive generations of methods. We identify a number of annotation challenges that must be addressed to enable development of the next generation of pathway analysis methods. Furthermore, we identify a number of methodological challenges that the next generation of methods must tackle to take advantage of the technological advances in genomics and proteomics in order to improve specificity, sensitivity, and relevance of pathway analysis.
Year: 2012 PMID: 22383865 PMCID: PMC3285573 DOI: 10.1371/journal.pcbi.1002375
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
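Examples of pathway analysis tools in each generation.
| Generation | Name | Availability |
| --- | --- | --- |
| First: over-representation analysis (ORA) | Onto-Express | Web |
| First: over-representation analysis (ORA) | GenMAPP | Standalone |
| First: over-representation analysis (ORA) | GoMiner | Standalone, Web |
| First: over-representation analysis (ORA) | FatiGO | Web |
| First: over-representation analysis (ORA) | GOstat | Web |
| First: over-representation analysis (ORA) | FuncAssociate | Web |
| First: over-representation analysis (ORA) | GOToolBox | Web |
| First: over-representation analysis (ORA) | GeneMerge | Standalone, Web |
| First: over-representation analysis (ORA) | GOEAST | Web |
| First: over-representation analysis (ORA) | ClueGO | Standalone |
| First: over-representation analysis (ORA) | FunSpec | Web |
| First: over-representation analysis (ORA) | GARBAN | Web |
| First: over-representation analysis (ORA) | GO:TermFinder | Standalone |
| First: over-representation analysis (ORA) | WebGestalt | Web |
| First: over-representation analysis (ORA) | agriGO | Web |
| First: over-representation analysis (ORA) | GOFFA | Standalone, Web |
| First: over-representation analysis (ORA) | WEGO | Web |
| Second: functional class scoring (FCS) | GSEA | Standalone |
| Second: functional class scoring (FCS) | sigPathway | Standalone (BioConductor) |
| Second: functional class scoring (FCS) | Category | Standalone (BioConductor) |
| Second: functional class scoring (FCS) | SAFE | Standalone (BioConductor) |
| Second: functional class scoring (FCS) | GlobalTest | Standalone (BioConductor) |
| Second: functional class scoring (FCS) | PCOT2 | Standalone (BioConductor) |
| Second: functional class scoring (FCS) | SAM-GS | Standalone |
| Second: functional class scoring (FCS) | Catmap | Standalone |
| Second: functional class scoring (FCS) | T-profiler | Web |
| Second: functional class scoring (FCS) | FunCluster | Standalone |
| Second: functional class scoring (FCS) | GeneTrail | Web |
| Second: functional class scoring (FCS) | GAzer | Web |
| Third: pathway topology (PT)-based | ScorePAGE | No implementation available |
| Third: pathway topology (PT)-based | Pathway-Express | Web |
| Third: pathway topology (PT)-based | SPIA | Standalone (BioConductor) |
| Third: pathway topology (PT)-based | NetGSA | No implementation available |
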
Figure 1. Overview of existing pathway analysis methods using gene expression data as an example.
Note that this overview is equally applicable to molecular measurements from proteomics and any other high-throughput technology. The data generated by an experiment using a high-throughput technology (e.g., microarray, proteomics, metabolomics), along with functional annotations (pathway database) of the corresponding genome, are input to virtually all pathway analysis methods. While over-representation analysis (ORA) methods require a list of differentially expressed genes as input, functional class scoring (FCS) methods use the entire data matrix. In addition to functional annotations of a genome, pathway topology (PT)-based methods use the number and type of interactions between gene products, which may or may not be part of a pathway database. The result of every pathway analysis method is a list of pathways that are significant in the condition under study. DE, differentially expressed.
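To make the ORA input and output concrete, the following is a minimal sketch rather than the procedure of any tool listed above. It assumes an upper-tail hypergeometric test, which is one common choice for ORA but is not specified in this overview. The gene names, pathway names, and function names are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def ora_p_value(n_universe, n_pathway, n_de, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: number of measured genes (the background)
    n_pathway:  number of background genes annotated to the pathway
    n_de:       number of differentially expressed (DE) genes
    n_overlap:  number of DE genes that fall in the pathway
    """
    max_overlap = min(n_pathway, n_de)
    total = comb(n_universe, n_de)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_de - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


def run_ora(de_genes, universe, pathways):
    """Return (pathway, overlap size, p-value) tuples sorted by p-value."""
    universe = set(universe)
    de = set(de_genes) & universe  # ignore DE genes that were not measured
    results = []
    for name, members in pathways.items():
        members = set(members) & universe
        overlap = de & members
        p = ora_p_value(len(universe), len(members), len(de), len(overlap))
        results.append((name, len(overlap), p))
    return sorted(results, key=lambda row: row[2])


# Hypothetical toy data: 20 measured genes, two pathways, five DE genes.
universe = [f"g{i}" for i in range(20)]
pathways = {
    "pathway_A": ["g0", "g1", "g2", "g3", "g4"],
    "pathway_B": ["g10", "g11", "g12", "g13", "g14"],
}
de_genes = ["g0", "g1", "g2", "g3", "g15"]

results = run_ora(de_genes, universe, pathways)
for name, n_overlap, p in results:
    print(name, n_overlap, p)
```

Note that only the DE list and the annotations enter the calculation. The expression values themselves are discarded, which is the point of contrast with FCS methods that take the entire data matrix.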
Figure 2. Overview of low resolution, missing, and incomplete information.
Green arrows represent abundantly available information, and red arrows represent missing and/or incomplete information. The ultimate goal of pathway analysis is to analyze a biological system as a large, single network. However, the links between smaller individual pathways are not yet well known. Furthermore, the effects of a SNP on a given pathway are also missing from current knowledge bases. While some pathways are known to be related to a few diseases, it is not clear whether the changes in those pathways cause the diseases or are downstream effects of them.
Figure 3. Number of GO-annotated genes (left panel) and number of GO annotations (right panel) for human from January 2003 to November 2009.
The estimated number of known genes in the human genome was adjusted between January 2003 and December 2003, and annotation practices were modified between December 2004 and December 2005 and again between October 2008 and November 2009. These changes are the main reason that the number of annotated genes and the number of annotations decrease. One can nevertheless argue that the quality of annotations is improving, as demonstrated by the steady increase in non-IEA annotations (those not inferred from electronic annotation) and in the number of genes with non-IEA annotations. However, the number of genes with non-IEA annotations is growing very slowly. In almost 7 years, between January 2003 and November 2009, only 2,039 new genes received non-IEA annotations. Over the same period, the number of non-IEA annotations increased from 35,925 to 65,741, indicating a strong research bias toward a small number of genes.
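The figures quoted in this caption can be checked with a few lines of arithmetic. This is only an illustrative sketch: the variable names are hypothetical, and the per-gene ratio is a rough indicator rather than a statistic reported by the authors, since many of the new annotations presumably went to genes that were already annotated.

```python
# Counts taken from the caption (January 2003 vs. November 2009).
non_iea_annotations_2003 = 35_925
non_iea_annotations_2009 = 65_741
new_genes_with_non_iea = 2_039

# Absolute and relative growth in non-IEA annotations.
new_annotations = non_iea_annotations_2009 - non_iea_annotations_2003
relative_growth = new_annotations / non_iea_annotations_2003

# New annotations per newly annotated gene. A value far above 1 is
# consistent with annotation effort concentrating on few genes.
annotations_per_new_gene = new_annotations / new_genes_with_non_iea

print(f"new non-IEA annotations: {new_annotations}")
print(f"relative growth: {relative_growth:.1%}")
print(f"new annotations per newly annotated gene: {annotations_per_new_gene:.1f}")
```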