GO-Elite: a flexible solution for pathway and ontology over-representation.

Alexander C Zambon, Stan Gaj, Isaac Ho, Kristina Hanspers, Karen Vranizan, Chris T Evelo, Bruce R Conklin, Alexander R Pico, Nathan Salomonis.

Abstract

We introduce GO-Elite, a flexible and powerful pathway analysis tool for a wide array of species, identifiers (IDs), pathways, ontologies and gene sets. In addition to the Gene Ontology (GO), GO-Elite allows the user to perform over-representation analysis on any structured ontology annotations, pathway database or biological IDs (e.g. gene, protein or metabolite). GO-Elite exploits the structured nature of biological ontologies to report a minimal set of non-overlapping terms. The results can be visualized on WikiPathways or as networks. Built-in support is provided for over 60 species and 50 ID systems, covering gene, disease and phenotype ontologies, multiple pathway databases, biomarkers, and transcription factor and microRNA targets. GO-Elite is available as a web interface, a GenMAPP-CS plugin and a cross-platform application. AVAILABILITY: http://www.genmapp.org/go_elite

Year: 2012 PMID: 22743224 PMCID: PMC3413395 DOI: 10.1093/bioinformatics/bts366

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

The analysis of pathways, ontologies and other gene sets has become the preferred method for biologists looking to identify global trends from genomic datasets. Although myriad tools exist for pathway over-representation, few consider the structured nature of associated ontology data, alternative ontologies and diverse gene sets; few support a wide array of genomes or biological measurements, and they are often limited in scope (Huang et al.). Unlike ontologies, pathways provide valuable qualitative contexts (interactions, reactions, metabolites and cellular compartments) that highlight biological relevance. Although various pathway resources now exist (Soh et al.), most over-representation analysis (ORA) tools are limited to one resource that is often outdated. To address these deficiencies, GO-Elite was developed to provide an interchangeable and updatable model of pathway, ontology, species and gene ID system relationships. Using these relationships, GO-Elite performs ontology pruning to report a minimal, non-redundant set of results (Fig. 1). Multiple options for running GO-Elite exist: source code, cross-platform binaries, an Opal web service (Ren et al.), an online interface, or extensions to the programs GenMAPP-CS (http://www.genmapp.org/) and AltAnalyze (http://www.altanalyze.org). The stand-alone versions of GO-Elite provide an intuitive user interface and command-line control. As previously shown, GO-Elite can be applied to a broad range of biological applications and data types (Hochstenbach et al.; Lemay et al.).

Fig. 1

GO-Elite workflow and information sources. Before performing ORA, users create two text files containing a list of input IDs (e.g. regulated genes) and a denominator list (e.g. all genes examined), source ID type (e.g. Affymetrix) and numerical values (optional). These IDs are mapped to a primary ID system (EntrezGene, Ensembl, HMDB or custom) for ORA upon pathways, ontologies or loaded gene sets. Regulated genes and metabolites can be immediately viewed on WikiPathways using the stand-alone or GenMAPP-CS interface. Pathway or ontology summarized expression values can be clustered and visualized outside of GO-Elite.

2 METHODS AND IMPLEMENTATION

2.1 Database architecture

Users working with GO-Elite can create their own databases (species, ID systems, relationships) or download official GO-Elite species databases available for each release of Ensembl. The official databases are created primarily from the Ensembl database, which includes all external ID systems related to Ensembl (e.g. EntrezGene, UniProt, EMBL) as well as supported microarray platforms (e.g. Affymetrix, Agilent, Codelink, Illumina). The database is augmented with relationships directly from NCBI EntrezGene and Affymetrix. Currently, relationships to multiple biological ontologies [Gene Ontology (GO), Disease and Phenotype], pathway resources (WikiPathways, PathwayCommons, KEGG) and gene set resources (e.g. PAZAR, Amadeus, miRanda, RNAhybrid, InterProt and Lineage Biomarkers) are supported (Supplementary Methods). In addition to gene relationships, metabolomics analyses are available for WikiPathways and KEGG. Although only a select few ID systems link directly to pathway and ontology annotations by default [Ensembl, EntrezGene and HMDB (http://www.hmdb.ca)], all secondary ID systems (e.g. Affymetrix, RefSeq, MGI and Symbol) connect to these through relationship tables. Thus, users can import and analyze ID lists for dozens of supported or user-added ID systems. All resources and annotations provided by GO-Elite can be easily updated or further customized using built-in importers. These importers connect online to the various resources (e.g. WikiPathways, GO and Ensembl) or import local relationships from multiple file formats (e.g. GPML, BioPax and GMT). Alternative ontologies can also be added in GO-Elite by specifying the URL for any OBO ontology file and importing a species-specific ontology ID relationship file through the user interface.
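The role of relationship tables can be illustrated with a minimal Python sketch. It is not GO-Elite's code or file format: the function names, the in-memory table layout and every ID below are assumptions invented for illustration. It shows only the general idea of translating secondary IDs (here, made-up probe IDs) into primary IDs before any term counting takes place.

```python
from collections import defaultdict

# Hypothetical relationship rows (secondary ID, primary ID). These IDs are
# invented for illustration and are not taken from a GO-Elite database.
PROBE_TO_GENE = [
    ("probe_001", "GENE_A"),
    ("probe_002", "GENE_A"),  # two secondary IDs can point to one primary ID
    ("probe_003", "GENE_B"),
    ("probe_003", "GENE_C"),  # one secondary ID can map to several primary IDs
]


def build_relationship_table(rows):
    """Index (secondary, primary) pairs as secondary ID -> set of primary IDs."""
    table = defaultdict(set)
    for secondary_id, primary_id in rows:
        table[secondary_id].add(primary_id)
    return dict(table)


def map_to_primary(input_ids, table):
    """Return the set of primary IDs reached and the list of unmapped inputs."""
    mapped, unmapped = set(), []
    for secondary_id in input_ids:
        hits = table.get(secondary_id)
        if hits:
            mapped.update(hits)
        else:
            unmapped.append(secondary_id)
    return mapped, unmapped


table = build_relationship_table(PROBE_TO_GENE)
genes, missing = map_to_primary(
    ["probe_001", "probe_002", "probe_003", "probe_999"], table
)
print(sorted(genes), missing)
```

Collecting the mapped IDs in a set means that two probes for the same gene contribute a single primary ID in this sketch; whether and how GO-Elite collapses such cases is not stated in the text above.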

2.2 Optimized pathway over-representation

For ORA, ontologies, pathways and gene sets are analyzed by a method similar to the program MAPPFinder (Doniger et al.). GO-Elite ranks each analyzed term according to a Z-score, calculated with a normal approximation to the hypergeometric distribution, along with a permutation or a Fisher's exact test P-value. False-discovery rate adjusted P-values are calculated using a Benjamini–Hochberg correction (Reiner et al., 2003). The ontology ORA results from this step are further evaluated by a simple yet robust pruning method. Pruning occurs by importing these ORA statistics (Z-score, P-values and gene counts), matching user-defined or default filtering options and building all unique branch paths of these results based on the ontology tree structure. Branch paths are pruned to obtain the nodes with the largest Z-score relative to all corresponding child and parent nodes, to report the most informative, highest scoring term for a network of related terms (Supplementary Methods). The compared scores can be optionally weighted based on the number of IDs associated with each term. This adjustment can result in more or fewer reported results by favoring higher-level parent nodes with more associated genes, yielding up to an 80% reduction in the number of reported terms (Supplementary Table). Since several alternative ORA methods exist, such as GSEA (Huang et al.), users wishing to load results from such algorithms can restrict their analysis to this pruning step.
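The statistics and the pruning idea described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not GO-Elite's implementation. The Z-score follows the form published for MAPPFinder. The one-sided hypergeometric tail stands in for the Fisher's exact test. The pruning rule is a deliberate simplification: it keeps a term only if no filtered ancestor or descendant scores higher, it does not enumerate branch paths, and it omits the optional weighting by ID count. All function names and term names are hypothetical.

```python
import math


def z_score(r, n, R, N):
    """Normal approximation to the hypergeometric (MAPPFinder-style form).

    r: changed IDs in the term   n: measured IDs in the term
    R: changed IDs overall       N: measured IDs overall (denominator)
    """
    p = R / N
    variance = n * p * (1 - p) * (1 - (n - 1) / (N - 1))
    return (r - n * p) / math.sqrt(variance)


def hypergeom_upper_tail(r, n, R, N):
    """One-sided exact P-value: chance of r or more changed IDs in the term."""
    ways = sum(math.comb(R, k) * math.comb(N - R, n - k)
               for k in range(r, min(n, R) + 1))
    return ways / math.comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def ancestors(term, parents):
    """All terms reachable by following parent links upward from `term`."""
    seen, stack = set(), list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def prune(z_by_term, parents):
    """Keep terms whose Z-score is not beaten by a related filtered term."""
    anc = {t: ancestors(t, parents) for t in z_by_term}
    kept = []
    for term, z in z_by_term.items():
        related = {a for a in anc[term] if a in z_by_term}
        related |= {d for d in z_by_term if term in anc[d]}
        if all(z >= z_by_term[other] for other in related):
            kept.append(term)
    return sorted(kept)


# Hypothetical ontology fragment: child term -> list of parent terms.
parents = {"child_a": ["root"], "grandchild_a": ["child_a"], "child_b": ["root"]}
scores = {"root": 2.1, "child_a": 4.0, "grandchild_a": 3.2, "child_b": 2.5}
print(prune(scores, parents))
```

In this toy tree, "root" and "grandchild_a" are dropped because "child_a" on the same branch scores higher, while "child_b" survives because it outscores its only related term. The `z_score` function assumes a non-degenerate table (0 < R < N and n < N); a production version would need to guard against a zero variance.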

2.3 Data representation

From these analysis steps, multiple results files are produced. The most informative of these is the pruned summary report, which includes all summary term statistics and associated gene or metabolite symbols for both ontology and non-ontology terms. Gene content redundancy between reported terms is also provided, to highlight unrelated terms with similar or identical gene content. When numerical values, such as fold changes, are included with each input ID, GO-Elite will also report mean and standard deviation ontology/pathway-level values in this summary file, analogous to GO-Quant (Yu et al.), allowing for downstream pathway-level expression clustering (Supplementary Methods). In addition, a full list of ontology and pathway statistics, associated IDs (e.g. gene symbol and associated Ensembl), a comparison of reported ontology/pathway statistics between input files (where applicable) and additional files focused on gene redundancy are provided. Regulated genes and metabolites can also be immediately visualized on WikiPathways in the stand-alone interface or following GenMAPP-CS analysis. Relationships between all regulated IDs and ORA terms can also be easily visualized as networks in Cytoscape using the output files produced (Supplementary Methods). This application should be of considerable interest to the genomics community, as it represents a highly customizable, simple-to-use and powerful framework for minimal ontology/pathway reporting. As GO-Elite is agnostic to the type of data input (e.g. gene, protein or metabolite), source ontology, pathway or gene set, we hope to rely further on community-contributed content to improve the utility of this tool in the years to come.
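The term-level summaries described above can be sketched as follows. This is an illustration only. The function names, the use of the sample standard deviation and the overlap measure (shared IDs divided by the size of the smaller term) are assumptions, since the text does not specify how GO-Elite computes these values. All IDs and fold changes are made up.

```python
from statistics import mean, stdev


def summarize_terms(term_to_ids, value_by_id):
    """Return term -> (mean, standard deviation, count) of its input values."""
    summary = {}
    for term, ids in term_to_ids.items():
        values = [value_by_id[i] for i in ids if i in value_by_id]
        if not values:
            continue  # no input ID with a value falls in this term
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[term] = (mean(values), spread, len(values))
    return summary


def shared_fraction(ids_a, ids_b):
    """Fraction of the smaller term's IDs that also occur in the other term."""
    a, b = set(ids_a), set(ids_b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Hypothetical fold changes and term memberships, for illustration only.
fold_changes = {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 3.0, "GENE_D": -1.5}
terms = {
    "term_1": ["GENE_A", "GENE_B", "GENE_C"],
    "term_2": ["GENE_B", "GENE_C"],
    "term_3": ["GENE_X"],  # no measured values, so it is skipped
}

summary = summarize_terms(terms, fold_changes)
overlap = shared_fraction(terms["term_1"], terms["term_2"])
print(summary, overlap)
```

Here "term_2" is fully contained in "term_1", so the overlap is 1.0. This is the kind of similar or identical gene content between reported terms that a redundancy report is meant to surface.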

REFERENCES

1. A system-based approach to interpret dose- and time-dependent microarray data: quantitative integration of gene ontology analysis for risk assessment.

Authors: Xiaozhong Yu; William C Griffith; Kristina Hanspers; James F Dillman; Hansel Ong; Melinda A Vredevoogd; Elaine M Faustman
Journal: Toxicol Sci Date: 2006-04-06 Impact factor: 4.849

2. Opal web services for biomedical applications.

Authors: Jingyuan Ren; Nadya Williams; Luca Clementi; Sriram Krishnan; Wilfred W Li
Journal: Nucleic Acids Res Date: 2010-06-06 Impact factor: 16.971

3. Transcriptomic profile indicative of immunotoxic exposure: in vitro studies in peripheral blood mononuclear cells.

Authors: Kevin Hochstenbach; Danitsja M van Leeuwen; Hans Gmuender; Solvor B Stølevik; Unni C Nygaard; Martinus Løvik; Berit Granum; Ellen Namork; Joost H M van Delft; Henk van Loveren
Journal: Toxicol Sci Date: 2010-08-11 Impact factor: 4.849

4. Consistency, comprehensiveness, and compatibility of pathway databases.

Authors: Donny Soh; Difeng Dong; Yike Guo; Limsoon Wong
Journal: BMC Bioinformatics Date: 2010-09-07 Impact factor: 3.169

5. The bovine lactation genome: insights into the evolution of mammalian milk.

Authors: Danielle G Lemay; David J Lynn; William F Martin; Margaret C Neville; Theresa M Casey; Gonzalo Rincon; Evgenia V Kriventseva; Wesley C Barris; Angie S Hinrichs; Adrian J Molenaar; Katherine S Pollard; Nauman J Maqbool; Kuljeet Singh; Regan Murney; Evgeny M Zdobnov; Ross L Tellam; Juan F Medrano; J Bruce German; Monique Rijnkels
Journal: Genome Biol Date: 2009-04-24 Impact factor: 13.583

6. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists.

Authors: Da Wei Huang; Brad T Sherman; Richard A Lempicki
Journal: Nucleic Acids Res Date: 2008-11-25 Impact factor: 16.971

7. MAPPFinder: using Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data.

Authors: Scott W Doniger; Nathan Salomonis; Kam D Dahlquist; Karen Vranizan; Steven C Lawlor; Bruce R Conklin
Journal: Genome Biol Date: 2003-01-06 Impact factor: 13.583
