Hagai Levi1, Nima Rahmanian2, Ran Elkon3,4, Ron Shamir1. 1. The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel. 2. University of California, Berkeley, CA, USA. 3. Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, 69978, Israel. 4. Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, 69978, Israel.
Abstract
MOTIVATION: Active module identification (AMI) is an essential step in many omics analyses. AMI algorithms receive a gene network and a gene activity profile as input and report subnetworks that show significant over-representation of accrued activity signal ("active modules"). Such modules can point out key molecular processes in the analyzed biological conditions.
RESULTS: We recently introduced a novel AMI algorithm called DOMINO, and demonstrated that it detects active modules that capture biological signals with a markedly improved rate of empirical validation. Here, we provide an online server that executes DOMINO, making it more accessible and user-friendly. To help the interpretation of solutions, the server provides GO enrichment analysis, module visualizations, and accessible output formats for customized downstream analysis. It also enables running DOMINO with various gene identifiers of different organisms.
AVAILABILITY: The server is available at http://domino.cs.tau.ac.il. Its codebase is available at https://github.com/Shamir-Lab.
1 Introduction
High-throughput omics data analysis frequently utilizes biological networks. In these networks each node represents a cellular subunit (e.g. a gene or its protein product) and each edge represents a relationship between two subunits (e.g. a physical interaction between two proteins). Integrated analysis of a biological network and a molecular profile measuring gene activity levels under a certain condition can greatly boost the functional interpretation of the data (Mitra ). Activity levels can be calculated by measuring differential expression between two conditions or samples (Chuang ; Ideker ), by providing a set of genes associated with a disease as inferred from a GWAS (Chang ; Fernández-Tajes ; Nakka ), or by estimating the mutation load of genes in cancer patients (Cerami ). Active Module Identification (AMI) methods seek ‘active modules’, i.e. connected subnetworks that show a marked over-representation of high activity levels (Ideker ; Leiserson ). Such modules can reveal biological processes involved in the probed condition. A popular way to infer these biological processes is by conducting Gene Ontology (GO) enrichment analysis on each module (The Gene Ontology Consortium, 2019).
Recently, we evaluated six popular AMI algorithms and analyzed the GO terms that were enriched in their modules. We observed a high rate of non-specific calls of enriched GO terms in most algorithms, calling into question their capacity to illuminate processes that are specifically relevant to the probed conditions (Levi ). Furthermore, we introduced DOMINO, a novel AMI algorithm with a markedly higher rate of empirically validated calls (Levi ). Of note, similar results were observed by a recent independent benchmark study, which also reported that DOMINO’s modules had substantially higher biological signals than modules found by other AMI algorithms (Lazareva ).
The original DOMINO tool requires download and installation on the user’s machine.
Here, in order to make DOMINO more accessible to researchers, we provide an online service requiring no installation. The server also enables GO-term enrichment analysis on each module, module visualization, standard output formats for downstream analysis, and options to run DOMINO with other organisms.
2 Materials and methods
2.1 Input files: active gene sets and network file
The input for DOMINO is a set of active genes and a network. Note that DOMINO uses only binary gene scores (active/not active under the probed condition) and not real-valued scores. For the network, the user can upload a custom network file or choose a pre-loaded network. The available networks include DIP (Xenarios ), HuRI (Luck ) and STRING (Szklarczyk ) with edge confidence score > 900, and PCNet (Huang ). The pre-loaded networks use a cache mechanism (detailed in Levi ) for faster runtime. Runs on the DIP, STRING and HuRI networks (<250K edges and <20K nodes) typically require up to 2 min, and runs on PCNet (∼2.5M edges and ∼20K nodes) require ∼10 min. For custom networks (<10 MB; <300K edges) the runtime can be up to 5 min. In addition, the user can provide several active gene sets (e.g. for different diseases) in order to analyze and compare the results in a single execution.
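Because DOMINO takes binary activity scores, real-valued scores have to be thresholded before upload. The following is a minimal sketch of that preparation step, not part of the server itself: the function names, the q-value cutoff of 0.05 and the way edges are counted are all illustrative assumptions. Only the 300K-edge limit for custom networks comes from the text above.

```python
from typing import Dict, Iterable, Set, Tuple


def binarize_scores(qvalues: Dict[str, float], threshold: float = 0.05) -> Set[str]:
    """Return the 'active' genes: those whose q-value is at most the threshold.

    The 0.05 cutoff is an illustrative assumption, not a DOMINO default.
    """
    return {gene for gene, q in qvalues.items() if q <= threshold}


def count_unique_edges(edges: Iterable[Tuple[str, str]]) -> int:
    """Count undirected edges, ignoring self-loops and duplicate pairs."""
    unique = {frozenset((u, v)) for u, v in edges if u != v}
    return len(unique)


def fits_custom_network_limit(edges: Iterable[Tuple[str, str]],
                              max_edges: int = 300_000) -> bool:
    """Check the edge-count limit stated for custom networks (<300K edges)."""
    return count_unique_edges(edges) < max_edges


if __name__ == "__main__":
    # Toy differential-expression q-values (made-up numbers).
    toy_qvalues = {"GENE_A": 0.01, "GENE_B": 0.20, "GENE_C": 0.05}
    print(sorted(binarize_scores(toy_qvalues)))

    toy_edges = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_A"), ("GENE_B", "GENE_C")]
    print(fits_custom_network_limit(toy_edges))
```

Thresholding before upload keeps the choice of cutoff explicit and in the user's hands, so the effect of different cutoffs can be compared by submitting several active gene sets in one execution.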
2.2 Resulting modules
After providing the input files and clicking the ‘execute’ button, a request is sent to the server to run DOMINO and perform additional analyses. The resulting modules are ranked by DOMINO’s internal scores, and visualized using Cytoscape.js (Franz ). Gene symbols are shown and right-clicking on a gene's node brings up its GeneCards page (Stelzer ). Alongside the module, the genes it comprises are shown. The user can navigate between different modules and solutions.
2.3 GO enrichment analysis
GO enrichment analysis is performed on each module using the GOATOOLS library (Klopfenstein ), and the resulting P-values are FDR-corrected for multiple testing. The background genes used for this analysis are those comprising the input network. A list of GO terms and their enrichment scores is reported in a table alongside the visualized module.
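To illustrate what such an analysis computes, the sketch below scores each term with a one-sided hypergeometric test against the network background and then applies the Benjamini-Hochberg FDR procedure. It is a standalone illustration under stated assumptions, not the server's code and not the API of the library the server uses: the choice of test, the function names and the toy annotations are all hypothetical.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail probability P(X >= k) for a module of n genes drawn from a
    background of N genes, K of which carry the annotation."""
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return BH-adjusted q-values in the same order as the input P-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def enrich(module: Set[str], background: Set[str],
           annotations: Dict[str, Set[str]]) -> List[Tuple[str, float, float]]:
    """Return (term, pval, qval) for every term, using the network as background."""
    module = module & background
    terms = sorted(annotations)
    pvals = []
    for term in terms:
        annotated = annotations[term] & background
        overlap = len(module & annotated)
        pvals.append(hypergeom_pvalue(overlap, len(module), len(annotated), len(background)))
    return list(zip(terms, pvals, benjamini_hochberg(pvals)))


if __name__ == "__main__":
    background = {f"g{i}" for i in range(10)}
    module = {"g0", "g1"}
    toy_annotations = {"term_x": {"g0", "g1", "g2"}, "term_y": {"g5", "g6"}}
    for term, p, q in enrich(module, background, toy_annotations):
        print(term, round(p, 4), round(q, 4))
```

Restricting both the module and the annotated genes to the network's nodes mirrors the background choice described above: genes that cannot appear in any module do not inflate the enrichment.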
2.4 Downloading results
To enable further use of the solution, results can be downloaded by the user. Each module can be downloaded in two forms: (i) HTML (with the visualization and other results as they are shown in the DOMINO website), and (ii) text files of the list of genes in the modules and a table summarizing the results of the GO enrichment analysis. This enables additional customized downstream analyses of modules and enriched GO terms.
2.5 Analyzing other organisms and gene identifiers
By default, DOMINO uses human ENSEMBL or gene symbol identifiers. The website provides two options to run DOMINO with a list of non-human gene identifiers: (i) If the active gene list contains mouse ENSEMBL identifiers and one of the pre-loaded networks is chosen, the genes in the active gene set are converted to their corresponding human orthologs. In this case, GO enrichment analysis is applied to the resulting modules. (ii) If a custom network is supplied by the user, DOMINO matches the gene identifiers in the active gene list to those in the network. Here the gene identifiers need not be taken from ENSEMBL and can be of any species, but GO enrichment analysis is not executed.
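The two options can be summarized as a small decision procedure. The sketch below is only an illustration of the logic described above, not the server's implementation: the function name, the ortholog table passed in as a dictionary, the toy identifiers and the choice to drop mouse genes without a mapped ortholog are all assumptions. The `ENSMUSG` prefix is the standard prefix of mouse ENSEMBL gene identifiers.

```python
from typing import Dict, Optional, Set, Tuple


def prepare_active_genes(active: Set[str],
                         network_nodes: Set[str],
                         is_custom_network: bool,
                         mouse_to_human: Optional[Dict[str, str]] = None
                         ) -> Tuple[Set[str], bool]:
    """Return (active genes usable with the network, whether GO enrichment runs).

    Custom network: identifiers are matched to the network as-is, no GO analysis.
    Pre-loaded network: mouse ENSEMBL identifiers are mapped to human orthologs,
    and GO analysis is run on the resulting modules.
    """
    if is_custom_network:
        return active & network_nodes, False

    mapping = mouse_to_human or {}
    converted = set()
    for gene in active:
        if gene.startswith("ENSMUSG"):
            # Assumption: mouse genes without a known ortholog are dropped.
            if gene in mapping:
                converted.add(mapping[gene])
        else:
            converted.add(gene)
    return converted & network_nodes, True


if __name__ == "__main__":
    # Toy identifiers and a toy ortholog table (hypothetical, for illustration).
    orthologs = {"ENSMUSG_TOY1": "ENSG_TOY1"}
    nodes = {"ENSG_TOY1", "ENSG_TOY2", "ENSG_TOY3"}
    print(prepare_active_genes({"ENSMUSG_TOY1", "ENSG_TOY2"}, nodes, False, orthologs))
    print(prepare_active_genes({"geneX", "geneY"}, {"geneY", "geneZ"}, True))
```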
2.6 API calls for automated pipelines
To enable scripts to perform automatic calls to the server, we exposed a web-API for the execution of DOMINO. Details are provided on the landing page of the website.
2.7 Supported browsers
We extensively tested the website under Firefox (version 94.0.1) and Chrome (version 96.0.4664.110). It can be run on other browsers as well.
3 Results
As a showcase, we uploaded an input set of 155 genes related to autism spectrum disorder (ASD) [taken from The SPARK Consortium (2018)]. We ran the tool with the pre-loaded STRING network. DOMINO detected eight modules in this run. Figure 1 shows the two largest modules. Reassuringly, they correspond to two distinct fundamental biological processes that are known to be severely disrupted in the brains of ASD patients (Satterstrom ): (i) chromatin remodeling and regulation of transcription (Cunniff ; LaSalle, 2013) (Fig. 1A and C) and (ii) neuronal trans-synaptic signaling (Guang ) (Fig. 1B and D).
Fig. 1.
Two modules reported by DOMINO web-server on a set of 155 ASD related genes using the preloaded STRING network. The website runs the DOMINO algorithms and provides visualizations of the resulting modules (A, B) along with the most enriched GO terms found on each (C, D). The red nodes indicate the module’s genes that are included in the input set of active genes (here the set of ASD genes). Ontology: molecular function (MF), biological process (BP) or cellular component (CC); pval: nominal P-value; qval: FDR corrected P-value
We repeated the analysis on three other networks: DIP, HuRI and PCNet. Results are available in the project’s GitHub repository. Several modules generated using different networks showed high overlap in genes and GO terms. A recent systematic evaluation of different networks reported that interactions that appear in more than one network are informative for module identification and recommended the usage of PCNet (Huang ). As the optimal network also depends on the input set of active genes, users may wish to run DOMINO with several biological networks.
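When comparing runs on several networks, the overlap between modules can be quantified from the downloaded gene lists. The sketch below uses the Jaccard index for this purpose. This is one reasonable choice and an assumption on our part, since the overlap measure is not specified here, and the function names and toy modules are hypothetical.

```python
from typing import List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matches(modules_a: List[Set[str]],
                 modules_b: List[Set[str]]) -> List[Tuple[int, int, float]]:
    """For each module in run A, report the most similar module in run B as
    (index in A, index in B, Jaccard index). Ties keep the first match."""
    matches = []
    for i, mod_a in enumerate(modules_a):
        best_j, best_score = -1, -1.0
        for j, mod_b in enumerate(modules_b):
            score = jaccard(mod_a, mod_b)
            if score > best_score:
                best_j, best_score = j, score
        if best_j >= 0:
            matches.append((i, best_j, best_score))
    return matches


if __name__ == "__main__":
    # Toy modules standing in for gene lists from two different networks.
    run_a = [{"A", "B", "C"}, {"X", "Y"}]
    run_b = [{"X", "Y", "Z"}, {"B", "C", "D"}]
    for i, j, score in best_matches(run_a, run_b):
        print(f"module {i} (run A) best matches module {j} (run B): {score:.2f}")
```

Modules that recur with high overlap across networks are less likely to be artifacts of a single interaction resource, which is in line with the recommendation above to try several networks.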
Funding
This study was supported in part by the German-Israeli Project [DFG RE 4193/1-1 to R.S., R.E.]; the Israel Science Foundation [1339/18 to R.S., 2118/19 to R.E.]; Len Blavatnik and the Blavatnik Family Foundation [to R.S.]; and the Koret-UC Berkeley-Tel Aviv University Initiative in Computational Biology and Bioinformatics [to R.E., R.S.]. H.L. was supported in part by a fellowship from the Edmond J. Safra Center for Bioinformatics at Tel Aviv University. R.E. is a Faculty Fellow of the Edmond J. Safra Center for Bioinformatics at Tel Aviv University.
Conflict of Interest: none declared.
Authors: Mark D M Leiserson; Fabio Vandin; Hsin-Ta Wu; Jason R Dobson; Jonathan V Eldridge; Jacob L Thomas; Alexandra Papoutsaki; Younhun Kim; Beifang Niu; Michael McLellan; Michael S Lawrence; Abel Gonzalez-Perez; David Tamborero; Yuwei Cheng; Gregory A Ryslik; Nuria Lopez-Bigas; Gad Getz; Li Ding; Benjamin J Raphael Journal: Nat Genet Date: 2014-12-15 Impact factor: 38.330
Authors: D V Klopfenstein; Liangsheng Zhang; Brent S Pedersen; Fidel Ramírez; Alex Warwick Vesztrocy; Aurélien Naldi; Christopher J Mungall; Jeffrey M Yunes; Olga Botvinnik; Mark Weigel; Will Dampier; Christophe Dessimoz; Patrick Flick; Haibao Tang Journal: Sci Rep Date: 2018-07-18 Impact factor: 4.379
Authors: Juan Fernández-Tajes; Kyle J Gaulton; Martijn van de Bunt; Jason Torres; Matthias Thurner; Anubha Mahajan; Anna L Gloyn; Kasper Lage; Mark I McCarthy Journal: Genome Med Date: 2019-03-26 Impact factor: 15.266
Authors: Katja Luck; Dae-Kyum Kim; Luke Lambourne; Kerstin Spirohn; Bridget E Begg; Wenting Bian; Ruth Brignall; Tiziana Cafarelli; Francisco J Campos-Laborie; Benoit Charloteaux; Dongsic Choi; Atina G Coté; Meaghan Daley; Steven Deimling; Alice Desbuleux; Amélie Dricot; Marinella Gebbia; Madeleine F Hardy; Nishka Kishore; Jennifer J Knapp; István A Kovács; Irma Lemmens; Miles W Mee; Joseph C Mellor; Carl Pollis; Carles Pons; Aaron D Richardson; Sadie Schlabach; Bridget Teeking; Anupama Yadav; Mariana Babor; Dawit Balcha; Omer Basha; Christian Bowman-Colin; Suet-Feung Chin; Soon Gang Choi; Claudia Colabella; Georges Coppin; Cassandra D'Amata; David De Ridder; Steffi De Rouck; Miquel Duran-Frigola; Hanane Ennajdaoui; Florian Goebels; Liana Goehring; Anjali Gopal; Ghazal Haddad; Elodie Hatchi; Mohamed Helmy; Yves Jacob; Yoseph Kassa; Serena Landini; Roujia Li; Natascha van Lieshout; Andrew MacWilliams; Dylan Markey; Joseph N Paulson; Sudharshan Rangarajan; John Rasla; Ashyad Rayhan; Thomas Rolland; Adriana San-Miguel; Yun Shen; Dayag Sheykhkarimli; Gloria M Sheynkman; Eyal Simonovsky; Murat Taşan; Alexander Tejeda; Vincent Tropepe; Jean-Claude Twizere; Yang Wang; Robert J Weatheritt; Jochen Weile; Yu Xia; Xinping Yang; Esti Yeger-Lotem; Quan Zhong; Patrick Aloy; Gary D Bader; Javier De Las Rivas; Suzanne Gaudet; Tong Hao; Janusz Rak; Jan Tavernier; David E Hill; Marc Vidal; Frederick P Roth; Michael A Calderwood Journal: Nature Date: 2020-04-08 Impact factor: 49.962