Literature DB >> 33095781

Multiscale community detection in Cytoscape.

Akshat Singhal1, Song Cao2, Christopher Churas2, Dexter Pratt2, Santo Fortunato3, Fan Zheng2, Trey Ideker1,2.   

Abstract

Detection of community structure has become a fundamental step in the analysis of biological networks with application to protein function annotation, disease gene prediction, and drug discovery. This recent impact creates a need to make these techniques and their accompanying visualization schemes available to a broad range of biologists. Here we present a service-oriented, end-to-end software framework, CDAPS (Community Detection APplication and Service), that integrates the identification, annotation, visualization, and interrogation of multiscale network communities, accessible within the popular Cytoscape network analysis platform. With novel design principles, CDAPS addresses unmet new challenges, such as identifying hierarchical community structures, comparison of outputs generated from diverse network resources, and easy deployment of new algorithms, to facilitate community-sourced science. We demonstrate that the CDAPS framework can be applied to high-throughput protein-protein interaction networks to gain novel insights, such as the identification of putative new members of known protein complexes.

Entities:  

Mesh:

Year:  2020        PMID: 33095781      PMCID: PMC7584444          DOI: 10.1371/journal.pcbi.1008239

Source DB:  PubMed          Journal:  PLoS Comput Biol        ISSN: 1553-734X            Impact factor:   4.475


This is a PLOS Computational Biology Software paper.

Introduction

One of the fundamental features of a complex network is the notion of community, which can be defined as a group of nodes that are more densely connected with each other than they are to the rest of the network. Community detection is a class of pattern recognition methods that assign network nodes to groups, or communities, based on the network‘s structural organization. As an important technique to probe the structural organization of a complex network, it has been successfully applied to many problem domains in systems biology, such as identifying protein complexes[1-4], cataloging ‘omics profiles[5-8], and prioritizing new disease genes[9-11]. Clustering, which relates to community detection at a single scale, is a well-established technique in network analysis and is supported by many applications such as Clustermaker2[12], CytoCluster[13], ClusterViz[14], and MCODE[15], which have been made available in bioinformatics environments like Cytoscape[16]. Recent work in multiscale network community detection makes it possible to build hierarchical representations of biological structure and function directly from networks[17-19]. These developments create a new challenge: to make multiscale community detection techniques and their accompanying visualization schemes available to a broad range of biologists. Here, we present a software infrastructure to address these challenges, termed CDAPS (Community Detection APplication and Service), deployed as an App in the Cytoscape platform for network analysis. CDAPS has several novel aspects in its design relative to existing applications in the field of bioinformatics. First, beyond assigning nodes in an input network to a set of clusters, multiscale community detection in CDAPS summarizes a network’s underlying multiscale structure in a separate hierarchical ontology, where each node represents a community. Second, CDAPS invokes community detection and functional enrichment algorithms via REST (REpresentational State Transfer) queries to a remote service outside of the Java-based Cytoscape desktop environment (). On that service, community detection algorithms are encapsulated using Docker containers[20], enabling them to run in their native language and environment. Further, the use of remote servers allows incorporation of new algorithms without the need to update the App, and it removes the computational overhead of running these algorithms in the Cytoscape desktop application. As a whole, CDAPS provides an end-to-end pipeline that creates, annotates, visualizes, and interrogates hierarchical models based on multiscale patterns in networks.

Overview of the CDAPS workflow.

A flowchart illustrating various components of the CDAPS framework and the data flow between them. The CD app can send edges of an interaction network for community detection to the CD service. The service then launches the corresponding Docker image which executes the required task and sends back a hierarchy network to the CD app for visualization and community labeling.

Design and implementation

The CDAPS framework contains two major components: CD App and CD service. CD App is part of the Cytoscape desktop application and acts as a client to the CD service. The CD service, as a typical service-oriented architecture [21], performs the community detection computation on a remote server. In the CD App, the user chooses a community detection algorithm to be applied to a given interaction network and then the pertinent data is sent to the CD service, where a Docker image hosting the algorithm is invoked to produce a hierarchical network. The results are returned to the CD App, which generates a new hierarchy network, or ontology, which contains information of the resulting communities (as nodes) and their hierarchical relationships (as edges). The whole framework provides efficient interfaces among different modular components of the pipeline ().

CD App

CD App is an App for Cytoscape [16] which acts as a client for the CD service, providing a convenient user interface through menu options. Below we briefly describe its functions: Community detection: In this initial release, we provide four algorithms: Louvain [22], Infomap [23,24], OSLOM [25], and CliXO [26]. Louvain, Infomap, and OSLOM are popular community detection algorithms which are highly scalable and capable of organizing multiscale communities as a hierarchy. We also include our previously described CliXO algorithm, which uniquely organizes communities hierarchically in the form of a directed acyclic graph (DAG), i.e. a community is allowed to be part of different overlapping communities. These algorithms have previously been implemented without graphical user interfaces, making it difficult for new users to tune their parameters. Through CDAPS, we provide an interactive interface to tune parameters, such as the “resolution parameters” in Louvain (via Reichardt-Bornholdt configuration model [27] implemented in http://github.com/vtraag/louvain-igraph), Infomap (via the Markov time parameter in the Infomap package [28]), and OSLOM that control the granularity of the resulting communities. Because community detection is an active research field, and many new algorithms are proposed every year [29], the CD app is designed to be easily extended by packaging new algorithms as Docker images in a standardized format and adding them to the CD service (see details in "CD service"). Functional enrichment: One of the key downstream analyses of community detection is to annotate the resultant protein communities by determining their overlap with gene sets associated with known functional or biological entities. We currently provide interfaces with the following external functional enrichment tools: g:Profiler [30], Enrichr [31], and iQuery (iquery.ndexbio.org). These tools can be applied to all or a subset of communities. Interaction sub-networks: We also provide a feature to enable interrogation of the communities in the hierarchical model by retrieving the subgraph of the interaction network corresponding to the genes in a given community. Functional enrichment and sub-network interrogation can be invoked by menu options associated with any node (i.e. a community) in the hierarchy network. The CDAPS framework considers a hierarchy network and its source interaction network as two separate entities. We link these networks by keeping a simple reference to the source interaction network as an attribute of the hierarchy network. For a given interaction network, users can generate many alternative hierarchies with different algorithms and parameter settings, and they can save and manage them independently. For reproducibility, the CD App stores the name of the algorithms and the accompanying parameters as an attribute of the hierarchy network.

CD service

CD service is a REST web service that acts as a bridge between the Cytoscape desktop app and the hierarchical community detection algorithms. The service provides a platform-independent endpoint that is accessible to CD App and other tools such as a website or Jupyter notebook [32]. To facilitate easy installation and reproducibility, community detection algorithms are packaged as Docker images that are called by the CD Service on remote servers (). An additional advantage of using remote servers is to avoid problems on platforms that do not support direct invocation of Docker images. The infrastructure of the CD service facilitates updates and extends the capabilities of the CD app by enabling new Docker images to be incorporated into the CD service to support additional algorithms. Updates to the CD service are transparent to users because the CD App automatically updates its interface by querying the service to find the latest algorithms and their parameters. In addition to the community detection algorithms, the CD Service interfaces to external functional enrichment tools to find annotations that can help label a community and provide insight into its function (see the “CD App” section above).

Results

As proof of concept, we explored a workflow for creating a hierarchical model from a large protein interaction network in Cytoscape. Without loss of generality, we chose to use OSLOM [25] as the community detection algorithm and BioPlex 2.0 [4] as the interaction network. BioPlex is a physical protein-protein interaction network containing 10,961 proteins and 56,553 interactions determined by the affinity-purification mass-spectrometry (AP-MS) technique. The goal of the analysis was to reveal meaningful structures of this complex network, as well as to discover de novo protein modules or novel extensions of previously identified protein modules. We then used a functional enrichment tool, g:Profiler [30], to generate labels for communities based on significant overlap with gene sets in databases. We describe the steps of this workflow as follows. Download and install Cytoscape from http://www.cytoscape.org/. Start Cytoscape and instantiate a new session. Install the app. Apps > App Manager > Select “CyCommunityDetection” > Install. Import the example network. Click NDEx icon > Import Network from NDEx > Search “BioPlex” > Select "BioPlex 2.0 (56,000 interactions)" > Close. The DOI of this network is http://doi.org/10.18119/N91597. Execute OSLOM for BioPlex (). Click Apps > Community Detection > Run Community Detection > Select “OSLOM” from the algorithm drop-down > Set “coverage parameter” to 0.2 > Set “random number seed” to 1 > Run. This step creates a hierarchy network. Note that community detection algorithms are in general stochastic, and thus they may generate different results if the random seed parameter is not fixed. The parameters of community detection, including (if applicable) the random seed, are stored as a property of this hierarchical network to ensure the reproducibility of this workflow (column “description” of network table). Run functional enrichment on the hierarchy network. Click Apps > Community Detection > Run Functional Enrichment > Select g:Profiler from the algorithm drop-down > Set “minimum overlap” to 0.05 > Run. This step adds annotations to the hierarchy network (). Without loss of generality, with the hierarchy network in the viewport, type “mediator” in the search box at the upper-right corner of Cytoscape to highlight the two nodes annotated as “mediator complex”. This is because the “mediator complex” is the best-matched gene set in reference databases of g:Profiler for both communities (). Between these two communities, the smaller one with 51 genes better matches the reference version of the “mediator complex” (CORUM:230; Jaccard index 0.64; relevant information can be found in columns “CD_AnnotatedMembers_SourceTerm” and “CD_AnnotatedMembers_Overlap”). For this node, copy the content in the attribute “CD_NonAnnotatedMembers” in the “Node Table” of the hierarchy network, which represents a subset of proteins in the community that is not recorded in the database entry of “mediator complex” queried by g:Profiler (i.e. CORUM:230 in this example). Right-click the node > Apps > Community Detection > View Interactions for Selected Node. It creates a sub-network of the BioPlex 2.0, which contains only the proteins in this community (51 nodes, 269 edges) (). With the interaction sub-network in the viewport, paste the string copied from Step 8 in the search box above the upper right corner of the viewport. It highlights some nodes in the interaction sub-network. From the results, it can be noticed that several proteins that have not been known as members of the mediator complex (e.g. BIRC5, LZIC, NR1I3, TCL1B) have many interactions with known mediator complex members [33] (). These proteins are putative novel components of the mediator complex based on the BioPlex 2.0 network.

Run community detection on a protein-protein interaction network.

A Cytoscape example screenshot showing the BioPlex 2.0 network [4] imported from NDEx (http://doi.org/10.18119/N91597), along with settings for running OSLOM [25]. Related to steps 4–5 in the example workflow.

Results of hierarchical community detection and downstream analyses.

(A) A hierarchy derived from the BioPlex 2.0 [4] interaction network with OSLOM [25] and annotated by g:Profiler. Red rectangle: a subregion in the hierarchy related to the mediator complex. (B) The menu to set up functional annotation with g:Profiler. (C) A zoom-in view of the sub-hierarchy highlighted in panel A. Red rectangle: a community to be further explored. (D) The interaction subnetwork of the community highlighted in panel C. Proteins, not known as members of the mediator complex, are highlighted (yellow). Related to steps 6–11 in the example workflow. Node color: The overlap (Jaccard Index) between a community and the gene set corresponding to its annotation (panels A and C). Networks can be found on NDEx: http://www.ndexbio.org/#/network/380d5904-bd9f-11ea-aaef-0ac135e8bacf (panel A). http://www.ndexbio.org/#/network/28fdd535-bbe5-11ea-aaef-0ac135e8bacf (panel D). In support of their association with Mediator, BIRC5 and NR1I3 have reported roles as transcriptional activators [34,35], in agreement with the function of the Mediator complex which transduces signals from transcriptional activators bound to enhancers to the basal transcription machinery (including RNA polymerase II) [33], suggesting that these physical interactions are functionally relevant. To test the performance of the CDAPS framework, we ran three community detection algorithms, Louvain, Infomap, and OSLOM [22,23,25], available through CDAPS, on popular biological networks and recorded the time taken by the algorithms inside the CD service along with the runtime corresponding to overhead (time taken transmitting the network and visualizing/constructing the resultant hierarchy). We observed that the fraction of time spent on overhead decreases sharply as the total runtime increases, suggesting that our implementation of CDAPS efficiently integrates the algorithms into the Cytoscape interface. For instance, for the Bioplex 2.0 network [4], the total runtime with OSLOM is about 43.85 seconds, including the CDAPS overhead time of 0.99 seconds, 2.3% of the total runtime (). In general, for any network for which the total runtime exceeds 50 seconds, the time overhead of running a community detection algorithm through CDAPS has a median of less than 1%. The CD service is currently hosted on an Intel(R) Xeon(R) E5-2687W v3 processor with 32 GB of memory. All the Docker containers are run with the default configuration and, thus, each container has all the resources of the host processor. *T1 indicates the time taken by a community detection algorithm inside the CD Service. †T2 indicates T1 plus the network transmission time to and from the CD Service. ‡T3 indicates T2 plus network creation time in Cytoscape.

Availability and future directions

Flexibility and extensibility are the two main strengths of the CDAPS framework, allowing us and others to continue to grow and evolve the repertoire of community detection algorithms available to users. The use of Docker images to encapsulate algorithms will reduce the need for customization and simplify their deployment to the service. As new algorithms are deployed to CDAPS, the CD App will fetch the list of currently available algorithms from the CD service and update the CD app interface in Cytoscape without requiring any action on the part of users. To enable a web-based, interactive investigation of hierarchical models of network communities, a compelling future direction is to create a pipeline that connects the CDAPS to our HiView platform [36]. HiView provides an intuitive visualization of hierarchical/nested structures with a powerful “circle packing” layout [37], as well as the ability to use any community-defined gene list to probe the support of the same community in orthogonal network datasets hosted on the NDEx database [38]. We will also apply workflows that use the results of community detection to select particular communities of high interest, such as those that significantly aggregate the disease-related statistical signals of individual genes/proteins extracted from large-scale genomic studies including GWAS (Genome-Wide Association Studies) [11,39] and cancer whole-exome sequencing. The source code of CD App, along with the user documentation, is available at https://github.com/cytoscape/cy-community-detection. CD App can be directly installed through the Cytoscape Application Manager. The codebase of CD service and its documentation is available at https://github.com/cytoscape/communitydetection-rest-server.

Bioplex CDAPS cytoscape session.

This file contains the Bioplex 2.0 interaction network, the derived hierarchy network, and the interaction sub-network used in the worked example. (CYS) Click here for additional data file.

CD App binary.

The java archive file can be directly installed in Cytoscape. (JAR) Click here for additional data file.

CD App user document.

This file contains the steps required to compile CD App source code, install the binary in Cytoscape, and use the application. (PDF) Click here for additional data file.

Test data.

This file contains a list of popular networks, publicly available through NDEx, that were used to test the CDAPS framework. Each test in the file provides the details of parameters used for that particular test and the NDEx URLs for both the input and output networks. (PDF) Click here for additional data file.
Table 1

Run-time performance of CDAPS.

Input Network (#nodes/ #edges)Louvain Runtime (T1*/T2/T3 in s)Infomap Runtime (T1*/T2/T3 in s)OSLOM Runtime (T1*/T2/T3 in s)
Drugbank Database—v4.1 (7,093/13,239)1.09/2.16/2.310.97/1.10/1.4820.79/21.21/21.60
BioPlex 2.0 (10,961/56,553)1.58/2.09/2.381.35/2.17/2.7042.86/43.30/43.85
BioGRID: Protein-Protein Interactions (D. melanogaster) (9,337/61,188)1.92/2.19/2.393.26/4.18/4.4273.15/73.40/73.65
HumanNet—PI (15,351/158,499)2.60/3.18/3.383.34/4.21/4.62382.15/382.66/383.23
STRING—Human Protein Links—High Confidence (Score > = 0.7) (17,185/420,534)4.01/4.39/4.6311.57/12.42/12.89191.02/192.25/193.97
Human Gene Regulatory Network of Mesothelioma (17,490/631,556)8.03/8.56/8.8820.35/21.58/22.11421.13/422.28/422.80
DisGeNet—Gene-Disease-PMID Associations (41,710/1,573,979)11.76/13.29/13.8814.80/16.15/17.10922.33/924.54/940.62

*T1 indicates the time taken by a community detection algorithm inside the CD Service.

†T2 indicates T1 plus the network transmission time to and from the CD Service.

‡T3 indicates T2 plus network creation time in Cytoscape.

  33 in total

1.  Statistical mechanics of community detection.

Authors:  Jörg Reichardt; Stefan Bornholdt
Journal:  Phys Rev E Stat Nonlin Soft Matter Phys       Date:  2006-07-18

Review 2.  Maps of random walks on complex networks reveal community structure.

Authors:  Martin Rosvall; Carl T Bergstrom
Journal:  Proc Natl Acad Sci U S A       Date:  2008-01-23       Impact factor: 11.205

3.  Hierarchical structure and the prediction of missing links in networks.

Authors:  Aaron Clauset; Cristopher Moore; M E J Newman
Journal:  Nature       Date:  2008-05-01       Impact factor: 49.962

Review 4.  Transcription regulation by the Mediator complex.

Authors:  Julie Soutourina
Journal:  Nat Rev Mol Cell Biol       Date:  2017-12-06       Impact factor: 94.444

5.  Integrating single-cell transcriptomic data across different conditions, technologies, and species.

Authors:  Andrew Butler; Paul Hoffman; Peter Smibert; Efthymia Papalexi; Rahul Satija
Journal:  Nat Biotechnol       Date:  2018-04-02       Impact factor: 54.908

6.  clusterMaker: a multi-algorithm clustering plugin for Cytoscape.

Authors:  John H Morris; Leonard Apeltsin; Aaron M Newman; Jan Baumbach; Tobias Wittkop; Gang Su; Gary D Bader; Thomas E Ferrin
Journal:  BMC Bioinformatics       Date:  2011-11-09       Impact factor: 3.307

7.  High-resolution mapping of cancer cell networks using co-functional interactions.

Authors:  Evan A Boyle; Jonathan K Pritchard; William J Greenleaf
Journal:  Mol Syst Biol       Date:  2018-12-20       Impact factor: 11.429

8.  g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update).

Authors:  Uku Raudvere; Liis Kolberg; Ivan Kuzmin; Tambet Arak; Priit Adler; Hedi Peterson; Jaak Vilo
Journal:  Nucleic Acids Res       Date:  2019-07-02       Impact factor: 16.971

9.  Inferring gene ontologies from pairwise similarity data.

Authors:  Michael Kramer; Janusz Dutkowski; Michael Yu; Vineet Bafna; Trey Ideker
Journal:  Bioinformatics       Date:  2014-06-15       Impact factor: 6.937

10.  CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.

Authors:  Min Li; Dongyan Li; Yu Tang; Fangxiang Wu; Jianxin Wang
Journal:  Int J Mol Sci       Date:  2017-08-31       Impact factor: 5.923

View more
  11 in total

1.  Restriction factor compendium for influenza A virus reveals a mechanism for evasion of autophagy.

Authors:  Laura Martin-Sancho; Shashank Tripathi; Ariel Rodriguez-Frandsen; Lars Pache; Maite Sanchez-Aparicio; Michael J McGregor; Kelsey M Haas; Danielle L Swaney; Thong T Nguyen; João I Mamede; Christopher Churas; Dexter Pratt; Sara B Rosenthal; Laura Riva; Courtney Nguyen; Nish Beltran-Raygoza; Stephen Soonthornvacharin; Guojun Wang; David Jimenez-Morales; Paul D De Jesus; Hong M Moulton; David A Stein; Max W Chang; Chris Benner; Trey Ideker; Randy A Albrecht; Judd F Hultquist; Nevan J Krogan; Adolfo García-Sastre; Sumit K Chanda
Journal:  Nat Microbiol       Date:  2021-09-23       Impact factor: 30.964

2.  MicroRNA expression changes in Parkinson's disease (PD) patients' leukocytes prior to and following deep brain stimulation (DBS).

Authors:  Soreq Lilach; Bergman Hagai; Israel Zvi; Soreq Hermona; Mohamed Wael
Journal:  Am J Neurodegener Dis       Date:  2021-06-15

Review 3.  Mapping the multiscale structure of biological systems.

Authors:  Leah V Schaffer; Trey Ideker
Journal:  Cell Syst       Date:  2021-06-16       Impact factor: 11.091

4.  HiDeF: identifying persistent structures in multiscale 'omics data.

Authors:  Fan Zheng; She Zhang; Christopher Churas; Dexter Pratt; Ivet Bahar; Trey Ideker
Journal:  Genome Biol       Date:  2021-01-07       Impact factor: 13.583

5.  Functional landscape of SARS-CoV-2 cellular restriction.

Authors:  Laura Martin-Sancho; Mary K Lewinski; Lars Pache; Charlotte A Stoneham; Xin Yin; Mark E Becker; Dexter Pratt; Christopher Churas; Sara B Rosenthal; Sophie Liu; Stuart Weston; Paul D De Jesus; Alan M O'Neill; Anshu P Gounder; Courtney Nguyen; Yuan Pu; Heather M Curry; Aaron L Oom; Lisa Miorin; Ariel Rodriguez-Frandsen; Fan Zheng; Chunxiang Wu; Yong Xiong; Matthew Urbanowski; Megan L Shaw; Max W Chang; Christopher Benner; Thomas J Hope; Matthew B Frieman; Adolfo García-Sastre; Trey Ideker; Judd F Hultquist; John Guatelli; Sumit K Chanda
Journal:  Mol Cell       Date:  2021-04-13       Impact factor: 17.970

6.  Multivariate analysis of 1.5 million people identifies genetic associations with traits related to self-regulation and addiction.

Authors:  Richard Karlsson Linnér; Travis T Mallard; Peter B Barr; Sandra Sanchez-Roige; James W Madole; Morgan N Driver; Holly E Poore; Ronald de Vlaming; Andrew D Grotzinger; Jorim J Tielbeek; Emma C Johnson; Mengzhen Liu; Sara Brin Rosenthal; Trey Ideker; Hang Zhou; Rachel L Kember; Joëlle A Pasman; Karin J H Verweij; Dajiang J Liu; Scott Vrieze; Henry R Kranzler; Joel Gelernter; Kathleen Mullan Harris; Elliot M Tucker-Drob; Irwin D Waldman; Abraham A Palmer; K Paige Harden; Philipp D Koellinger; Danielle M Dick
Journal:  Nat Neurosci       Date:  2021-08-26       Impact factor: 24.884

7.  Transcriptome-wide changes in gene expression, splicing, and lncRNAs in response to a live attenuated dengue virus vaccine.

Authors:  Eun-Young Kim; Yan Che; Hansi J Dean; Ramon Lorenzo-Redondo; Michael Stewart; Caroline K Keller; Daniel Whorf; Dawson Mills; Nikita N Dulin; Tiffany Kim; Megan Votoupal; Miriam Walter; Ana Fernandez-Sesma; Heejin Kim; Steven M Wolinsky
Journal:  Cell Rep       Date:  2022-02-08       Impact factor: 9.423

8.  Mapping the gene network landscape of Alzheimer's disease through integrating genomics and transcriptomics.

Authors:  Sara Brin Rosenthal; Hao Wang; Da Shi; Cin Liu; Ruben Abagyan; Linda K McEvoy; Chi-Hua Chen
Journal:  PLoS Comput Biol       Date:  2022-02-25       Impact factor: 4.475

9.  A BioID-derived proximity interactome for SARS-CoV-2 proteins.

Authors:  Danielle G May; Laura Martin-Sancho; Valesca Anschau; Sophie Liu; Rachel J Chrisopulos; Kelsey L Scott; Charles T Halfmann; Ramon Díaz Peña; Dexter Pratt; Alexandre R Campos; Kyle J Roux
Journal:  bioRxiv       Date:  2021-09-21

10.  A BioID-Derived Proximity Interactome for SARS-CoV-2 Proteins.

Authors:  Danielle G May; Laura Martin-Sancho; Valesca Anschau; Sophie Liu; Rachel J Chrisopulos; Kelsey L Scott; Charles T Halfmann; Ramon Díaz Peña; Dexter Pratt; Alexandre R Campos; Kyle J Roux
Journal:  Viruses       Date:  2022-03-15       Impact factor: 5.048

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.