SBEToolbox: A Matlab Toolbox for Biological Network Analysis.

Kranti Konganti, Gang Wang, Ence Yang, James J Cai.

Abstract

We present SBEToolbox (Systems Biology and Evolution Toolbox), an open-source Matlab toolbox for biological network analysis. It takes a network file as input, calculates a variety of centralities and topological metrics, clusters nodes into modules, and displays the network using different graph layout algorithms. Its straightforward implementation and high-level functions allow its functionality to be easily extended or tailored by developing custom plugins. SBEGUI, a menu-driven graphical user interface (GUI) for SBEToolbox, gives programmers and non-programmers alike easy access to various network and graph algorithms. All source code and sample data are freely available at https://github.com/biocoder/SBEToolbox/releases.

Keywords:  Matlab toolbox; biological network; network evolution; node centrality

Year: 2013; PMID: 24027418; PMCID: PMC3767578; DOI: 10.4137/EBO.S12012

Source DB: PubMed; Journal: Evol Bioinform Online; ISSN: 1176-9343; Impact factor: 1.625


Introduction

The complexity of biological systems represents an enormous intellectual challenge to researchers, who still rely mostly on correlative approaches to biology. The volume of network data gathered in systems biology has outpaced researchers’ ability to assimilate it into real knowledge, and the need for software tools to analyze network data continues to grow. Network analysis software usually transforms network data into a graph framework so that it can take advantage of techniques developed in graph theory, engineering, and computer science. In our opinion, good network analysis software should implement a sufficient number of graph algorithms efficiently as functions that are flexible enough to be extended easily and are accessible through easy-to-use user interfaces (UIs). With such software, users without programming expertise can directly relate specific biological interactions to network properties and dynamics. In the following sections, we describe how we designed the Systems Biology and Evolution Toolbox (SBEToolbox), discuss the algorithms implemented for network analysis, and finally consider the advantages and potential drawbacks of our software and outline our future research roadmap.

Implementation

SBEToolbox is implemented using a combination of native Matlab code and MEX/C functions based on the Boost Graph Library (Fig. 1) [1]. Functions developed in native code fall into two classes: (1) core functions that execute the tasks organized under SBEToolbox’s menu; and (2) helper functions that automate repetitive tasks and assist core functions. SBEToolbox can read and write network information in three commonly used network file formats: tab-delimited, SIF, and Pajek. For each working session, it saves the network on disk in a Matlab MAT-file as an n × n sparse adjacency matrix representing a network of n nodes. The adjacency matrix is loaded into the variable sbeG, and the node information is stored in the cell string vector sbeNode. SBEGUI is a figure window in which user-operated controls are organized as a drop-down menu that gives access to the core functions of SBEToolbox (Fig. 2). All major functions can be accessed from the menu, so end-users can easily load a network file, compute statistics for the network and its nodes, detect highly connected node clusters (or modules) using graph clustering algorithms, and evolve the network. Graph layout algorithms were either implemented natively (Random, Circle, and Tree Ring) or incorporated from Matlab BGL (Kamada-Kawai Spring, Gürsoy-Atun, and Fruchterman-Reingold), so that small- to medium-sized networks can be plotted and manipulated as standard figures using Matlab’s built-in figure controls. Graphs can also be exported to external network analysis tools such as Cytoscape [2] and Pajek [3] for further analysis, and to visualization libraries such as Protovis and Sigmajs for further illustration. Because network information is written to disk and the network variables remain unchanged between sessions, SBEToolbox is highly customizable through SBE-plugins. New functions can be developed as SBE-plugins from built-in template code.
We define SBE-plugins as custom functions that can easily access the network information of the current working session, work on the command line, and be incorporated into SBEGUI using the built-in plugin management tools. The README file distributed with the software provides links to screencasts on (1) how to install SBEToolbox and (2) how to detect, visualize, and export network modules, as well as links to wiki pages describing SBE-plugin development.
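SBEToolbox keeps a session’s network as an n × n sparse adjacency matrix (sbeG) plus a cell string vector of node names (sbeNode). The Python sketch below illustrates the same idea for the SIF format by assigning each node name an index and recording undirected edges. It is not SBEToolbox code. The function name read_sif, the dictionary-of-sets structure standing in for a sparse matrix, and the choice to drop self-loops and duplicate edges are all assumptions made for illustration.

```python
# Illustrative sketch (not SBEToolbox code): parse SIF text into a node list
# and an undirected adjacency structure, analogous to sbeNode and sbeG.
from typing import Dict, List, Set, Tuple


def read_sif(text: str) -> Tuple[List[str], Dict[int, Set[int]]]:
    """Parse SIF lines of the form "source <type> target [target ...]".

    Returns (names, adj): names[i] is the name of node i (like sbeNode), and
    adj[i] is the set of neighbor indices of node i, ie the non-zero columns
    of row i of an n x n adjacency matrix (like sbeG). Edges are treated as
    undirected and unweighted; self-loops and duplicate edges are dropped.
    """
    names: List[str] = []
    index: Dict[str, int] = {}
    adj: Dict[int, Set[int]] = {}

    def node_id(name: str) -> int:
        # Assign the next free index the first time a node name is seen.
        if name not in index:
            index[name] = len(names)
            names.append(name)
            adj[index[name]] = set()
        return index[name]

    for line in text.splitlines():
        # SIF fields may be tab-delimited (needed if names contain spaces).
        parts = line.split("\t") if "\t" in line else line.split()
        fields = [f.strip() for f in parts if f.strip()]
        if not fields:
            continue  # skip blank lines
        src = node_id(fields[0])
        # fields[1] is the interaction type; all remaining fields are targets.
        # A line holding only a node name declares an isolated node.
        for target in fields[2:]:
            dst = node_id(target)
            if dst != src:
                adj[src].add(dst)
                adj[dst].add(src)
    return names, adj


names, adj = read_sif("A\tpp\tB\nA pp C\nB pp C\nD\n")
n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
print(names, n_edges)  # ['A', 'B', 'C', 'D'] 3
```

Each undirected edge appears in two neighbor sets, just as it occupies two symmetric entries of an adjacency matrix. That is why the edge count divides the total by two.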
Figure 1

Structure and organization of SBEToolbox and related software components.

Figure 2

SBEGUI—the main interface of SBEToolbox (left) and an overview visualization of detected network modules (ie, clusters of highly connected nodes) (right). The modules were detected by using the MCL algorithm.

Results

Our main strategy for developing this new network analysis tool was to use Matlab, which has long been one of the default programming languages for efficient matrix manipulation. Matlab itself is a “merging” language that integrates seamlessly with many other languages, such as C and Java, and functions written in Matlab have features that lie on the continuum between low- and high-level languages. In addition, Matlab has a sophisticated plotting library, and software tools developed in Matlab inherit these characteristics. However, to our knowledge, a Matlab toolbox with graphical user interfaces that is specifically designed for biologists to conduct comprehensive network analyses has been missing. SBEToolbox intends to fill this gap.

Main functions

SBEToolbox covers a wide range of algorithms for computing network statistics. These include commonly used measures, such as betweenness centrality, clustering coefficient, and closeness centrality, as well as newly developed ones, such as bridging centrality [4], Soffer’s clustering coefficient [5], and the brokering coefficient [6]. Other statistics that SBEToolbox can compute include local average connectivity, core number, graph mean distance, graph diameter, graph efficiency, current information flow [7], neighborhood connectivity [8], participation coefficient [9], and rich-club coefficient [10], among others. Random networks can be generated using Erdős-Rényi, small-world, and ring lattice algorithms. SBEToolbox’s module-detection functions cluster nodes into highly connected subnetworks, or modules, using three algorithms: MCODE [11], ClusterOne [12], and MCL [13]. MCODE weights vertices by local neighborhood density and traverses outward from a locally dense seed node to isolate dense regions, whereas MCL simulates stochastic flow in a graph. ClusterOne generates overlapping clusters and has been shown to outperform MCODE and MCL in predicting members of protein complexes [12]. Our toolbox includes all three algorithms so that users can compare the predictions of different algorithms. Cluster membership of nodes is displayed in the output window, and detected modules can be plotted. SBEToolbox also contains a collection of general-purpose functions that facilitate applying specific methods to real datasets. For example, input and output functions convert files between formats, and plot functions visualize networks through interfaces to external programs such as Cytoscape [2] and Pajek [3] (http://pajek.imfm.si).
SBEToolbox also uses the JavaScript libraries Protovis (http://mbostock.github.com/protovis) and Sigmajs (http://sigmajs.org) to render scalable vector graphics (SVG) plots that can be displayed in web browsers. These third-party applications are sandboxed with the software to maintain integrity and minimize failures. Each function is implemented linearly to solve a particular task, and functions are linked only through their input and output variables. This simple design makes it easier for users to extend the functionality and to implement and integrate their own functions as SBE-plugins. For example, we developed a plugin for the network motif detection tool mfinder using the built-in plugin management tools. Helper functions that communicate with external programs can be customized and reused to incorporate other external programs as each user requires.
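Betweenness centrality is one of the commonly used statistics listed above. For each node, it sums over all other pairs of nodes the fraction of their shortest paths that pass through that node. A standard way to compute it for unweighted graphs is Brandes’ algorithm, sketched below in Python. This is an illustration only, not SBEToolbox’s implementation. The function name is hypothetical, and the graph is assumed to be a dict mapping each node to a set of neighbors.

```python
# Illustrative sketch: Brandes' algorithm for betweenness centrality of an
# unweighted, undirected graph stored as {node: set_of_neighbors}.
from collections import deque
from typing import Dict, Hashable, List, Set


def betweenness_centrality(adj: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, float]:
    """Return unnormalized betweenness centrality for every node."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma) and
        # remembering each node's predecessors on those paths.
        order: List[Hashable] = []
        preds: Dict[Hashable, List[Hashable]] = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:              # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # edge v-w lies on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies from the farthest nodes inward.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair (s, t) is counted from both ends.
    return {v: b / 2.0 for v, b in bc.items()}


star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(betweenness_centrality(star))  # {0: 3.0, 1: 0.0, 2: 0.0, 3: 0.0}
```

The hub of the star lies on the only shortest path between each of the three leaf pairs, so its score is 3. To rescale scores to [0, 1] in an undirected graph, divide by (n − 1)(n − 2)/2.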

Network evolution

Using SBEToolbox, users can simulate the evolution of a network as a stochastic process involving node duplication, node loss, and edge rewiring (Fig. 3). Users control the simulated evolutionary process with three parameters: (1) the number of generations, g, which is the number of nonoverlapping steps for which the simulation runs; (2) the evolutionary rate, r, which is the rate of node duplication (or node loss) with respect to the total number of nodes or, in the case of edge rewiring, with respect to the total number of edges; and (3) the fixation probability, p, which is the likelihood that a designated evolutionary event (for example, node duplication, node loss, or edge rewiring) becomes fixed in each generation.
Figure 3

Four modes of network evolution implemented in the SBEToolbox: (1) preferential attachment; (2) node loss; (3) node duplication; and (4) edge-rewiring.

Feature comparison between network analysis toolboxes

Many valuable software tools have been developed both within and outside the discipline of systems biology. Matlab toolboxes similar to SBEToolbox include the Functional Genomics Assistant (FUGA) [14], the Brain Connectivity Toolbox (BCT) [15], and the Mathworks Bioinformatics Toolbox (MBT). The current version of MBT has a few basic graph theory algorithms but no functions for any kind of network statistical analysis. BCT and FUGA have a good number of statistical analysis functions, and FUGA also infers networks through expression analysis and provides annotation. However, neither toolbox offers a unified solution that integrates graphical user interfaces, network evolution, and the ability for users to prototype and share custom functions through plugin distribution. A comparison of the major features of these toolboxes is given in Table 1. Several non-Matlab-based tools also exist for network analysis and visualization. We believe that Matlab’s built-in functions allow rapid prototyping of new algorithms, and that Matlab’s efficient data manipulation can be easily leveraged and extended using SBEToolbox. A feature comparison between SBEToolbox and other non-Matlab-based network analysis tools is provided in Supplementary Table 1.
Table 1

Feature comparison between SBEToolbox and relevant Matlab-based toolboxes FUGA (Functional Genomics Assistant), BCT (Brain Connectivity Toolbox), and MBT (Mathworks Bioinformatics Toolbox).

Feature | SBEToolbox | FUGA | BCT | MBT
Centrality calculation
Module detection
Node (gene) annotation
Network evolution
GUI
Programmable plugins
Plugin management tools

Scalability and performance of SBEToolbox

Efficient implementation of SBEToolbox’s functions keeps memory and disk usage low when working with a network file, allowing users to handle moderately large network data on a standard desktop computer. Analyzing a global human physical protein interaction network [16] containing about 10,000 nodes and more than 80,000 edges required about 850 MB of memory, and the network file stored for the working session occupied about 300 KB of disk space. By comparison, a random Erdős-Rényi network containing 10,000 nodes and more than 450,000 edges required about 2 GB of memory to complete all analyses, while the network file stored on disk for the session occupied approximately 1 MB. For both networks, all core functions finished in less than 10 minutes. For even larger networks, Matlab’s parallel computing capabilities can be leveraged to meet the higher demand for computational resources. We also compared SBEToolbox with the similar Matlab toolboxes listed in Table 1 in terms of scalability and performance. Because these toolboxes have different feature sets, we chose two common and important topological metrics, betweenness centrality and clustering coefficient, to test memory usage and computation time on networks with varying numbers of nodes and edges. MBT was excluded from the test because it lacks functions for computing these statistics. SBEToolbox, FUGA, and BCT used similar amounts of memory to compute these two statistics, ranging from approximately 40 MB to 200 MB. Timing with Matlab’s built-in timing functions, however, revealed major differences in computation time: in both tests, SBEToolbox was much faster than FUGA and BCT (Supplementary Fig. 1).
BCT could not finish computing betweenness centrality for a small network of about 1,000 nodes within a reasonable time. All analyses and tests were run on a Macintosh OS X (10.7 Lion) laptop computer with 4 GB of RAM and a 1.7 GHz Intel 64-bit processor.
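Simple arithmetic helps put the memory figures reported above in perspective. The sketch below makes two assumptions. First, it assumes a compressed-sparse-column layout with 8-byte values and 8-byte indices, a common description of how 64-bit Matlab stores sparse double matrices that should be treated as an approximation. Second, it assumes each undirected edge appears twice in a symmetric adjacency matrix. The function names are hypothetical, and the results are estimates, not measurements of SBEToolbox.

```python
# Back-of-the-envelope memory estimates (assumed storage model, not measured).
def sparse_csc_bytes(n: int, nnz: int, value_bytes: int = 8, index_bytes: int = 8) -> int:
    """Compressed sparse column storage for an n x n matrix: one value and one
    row index per stored entry, plus n + 1 column pointers."""
    return nnz * (value_bytes + index_bytes) + (n + 1) * index_bytes


def dense_bytes(n: int, value_bytes: int = 8) -> int:
    """A full n x n matrix of 8-byte doubles, eg an all-pairs distance matrix."""
    return n * n * value_bytes


n = 10_000
ppi = sparse_csc_bytes(n, 2 * 80_000)   # symmetric: each edge stored twice
er = sparse_csc_bytes(n, 2 * 450_000)
dist = dense_bytes(n)
print(f"{ppi / 1e6:.1f} MB, {er / 1e6:.1f} MB, {dist / 1e6:.0f} MB")
# prints: 2.6 MB, 14.5 MB, 800 MB
```

Under these assumptions, the adjacency matrix of either test network needs only a few megabytes. In contrast, any analysis that materializes a full n × n matrix of doubles, such as all-pairs shortest-path distances, would need about 800 MB at n = 10,000.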

Numerical validation of native Matlab code for the MCL algorithm

Taking advantage of Matlab’s built-in matrix functions, we were able to implement the MCL algorithm [13] natively in fewer than 50 lines of code. To validate our implementation, we compared the clustering results obtained with the native code with those obtained using the mcl program (written in C) from the MCL-edge software (http://micans.org/mcl/index.html). The comparison was performed on a network of 330 nodes from the study of Ideker et al. [17]; this dataset is provided as the .sif file galFiltered_330_nodes.sif in the example_dataset folder distributed with SBEToolbox. First, the MCL-edge source code was downloaded and compiled on Mac OS X with i64 architecture. The mcxload binary was used to convert the .sif file to .mci format, and mcl was run with default options, yielding a total of 97 clusters. Next, MCL was executed from SBEGUI, which also found exactly 97 clusters, with a minor difference in cluster composition. The number of n-node clusters identified by each of the two implementations is given in Supplementary Table 2. The two implementations produced nearly identical results: SBEToolbox’s mcl.m produced 38 two-node clusters and 28 three-node clusters, whereas MCL-edge’s mcl produced 39 two-node clusters and 27 three-node clusters. SBEToolbox’s mcl.m found an extra three-node cluster consisting of nodes YER079W, YKL204W, and YNL154C. Node YER079W is also shared with another three-node cluster (YER079W, YHR135C, and YNL116W), which was also identified by MCL-edge’s mcl (Supplementary Fig. 2). The two-node cluster (YNL154C, YKL204W) identified by MCL-edge’s mcl was absent from the results of SBEToolbox’s mcl.m. Apart from this minor difference, the number of n-node clusters and the nodes participating in each cluster were identical in both cases.
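MCL repeatedly applies two operations to a column-stochastic matrix derived from the adjacency matrix. Expansion (matrix squaring) spreads flow along paths. Inflation (raising entries to a power and renormalizing) strengthens strong flows and weakens weak ones. Repeating both until the matrix stops changing separates the graph into clusters. The Python sketch below illustrates this with plain lists, together with a tally of n-node clusters of the kind compared in Supplementary Table 2.

This is not SBEToolbox’s mcl.m. The following are assumptions:

- the function names;
- an inflation value of 2.0;
- the added self-loops;
- the convergence tolerance;
- the rule for reading clusters from attractor rows.

The sketch also omits the pruning steps used in production implementations such as MCL-edge’s mcl, so it suits only small graphs and may not reproduce either program’s output on real data.

```python
# Minimal Markov Cluster (MCL) sketch on a dense matrix; suitable only for
# small graphs. Not SBEToolbox's mcl.m and not the MCL-edge mcl program.
from collections import Counter
from typing import Dict, List, Set


def mcl(adj: Dict[int, Set[int]], inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-9) -> List[Set[int]]:
    nodes = sorted(adj)
    pos = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    if n == 0:
        return []

    def normalize(a: List[List[float]]) -> List[List[float]]:
        # Scale each column to sum to 1 (column-stochastic matrix).
        for j in range(n):
            s = sum(a[i][j] for i in range(n))
            if s > 0:
                for i in range(n):
                    a[i][j] /= s
        return a

    # Adjacency matrix with self-loops added (an assumed, commonly used choice).
    m = [[0.0] * n for _ in range(n)]
    for v in nodes:
        m[pos[v]][pos[v]] = 1.0
        for w in adj[v]:
            m[pos[v]][pos[w]] = m[pos[w]][pos[v]] = 1.0
    m = normalize(m)

    for _ in range(max_iter):
        # Expansion: square the matrix (two steps of random-walk flow).
        sq = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
              for i in range(n)]
        # Inflation: elementwise power, then renormalize columns.
        new = normalize([[x ** inflation for x in row] for row in sq])
        change = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if change < tol:
            break

    # Rows of attractors (non-zero diagonal) list the nodes they attract.
    clusters: List[Set[int]] = []
    for i in range(n):
        if m[i][i] > tol:
            members = {nodes[j] for j in range(n) if m[i][j] > tol}
            if members not in clusters:
                clusters.append(members)
    return clusters


def cluster_size_counts(clusters: List[Set[int]]) -> Counter:
    """Number of n-node clusters, for each n."""
    return Counter(len(c) for c in clusters)


two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
found = mcl(two_triangles)
print(found, cluster_size_counts(found))  # [{0, 1, 2}, {3, 4, 5}] Counter({3: 2})
```

Because MCL works on the matrix as a whole, a compact matrix-oriented implementation is natural in a language with built-in matrix operations. The inner list comprehensions here stand in for a single matrix product.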

Example application of SBEToolbox in characterizing disease genes

Genes that underlie human inherited diseases are important subjects in systems biology research. We have previously used SBEToolbox to analyze human protein–protein interaction data [16] and demonstrated in detail that revealing the network properties of disease genes can provide new insights into the origin and etiology of disease [6]. Here, we briefly summarize the key points of those findings. We introduced a new statistical measure, the brokering coefficient, and used it to discern disease genes from nondisease genes based on their distinct network properties. The brokering coefficient is a composite metric: for each node, it is calculated as log(d) − log(c), the difference between the log-transformed degree, d, and the log-transformed clustering coefficient, c. A node (or gene) with a large brokering coefficient tends to have many neighbors, while the number of connections among those neighbors tends to be small. Our analysis showed that disease genes have higher degrees and lower clustering coefficients (ie, larger brokering coefficients) than nondisease genes [6]. Thus, disease genes are more likely to be broker genes in networks, in that they connect many other proteins that would not otherwise be connected.
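The brokering coefficient log(d) − log(c) is straightforward to compute once each node’s degree and local clustering coefficient are known. The Python sketch below is illustrative, not the implementation used in SBEToolbox. The text does not state the log base, so the natural logarithm is assumed; a different base would rescale all values by the same constant factor. Because log(0) is undefined, the sketch returns +inf for nodes with c = 0 and NaN for isolated nodes. Both are hypothetical conventions.

```python
# Sketch of the brokering coefficient log(d) - log(c) described above.
# Assumptions (not stated in the text): natural logarithm, and nodes whose
# clustering coefficient is 0 get +inf; isolated nodes get NaN.
import math
from typing import Dict, Set


def clustering_coefficient(adj: Dict[int, Set[int]], v: int) -> float:
    """Fraction of pairs of v's neighbors that are linked (0 if degree < 2)."""
    nbrs = list(adj[v])
    d = len(nbrs)
    if d < 2:
        return 0.0
    links = sum(1 for i in range(d) for j in range(i + 1, d)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (d * (d - 1))


def brokering_coefficient(adj: Dict[int, Set[int]], v: int) -> float:
    """log(degree) - log(clustering coefficient) for node v."""
    d = len(adj[v])
    if d == 0:
        return math.nan
    c = clustering_coefficient(adj, v)
    if c == 0.0:
        return math.inf
    return math.log(d) - math.log(c)


# A triangle (0, 1, 2) with an extra node 3 attached to node 0.
g = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
for v in g:
    print(v, f"{brokering_coefficient(g, v):.3f}")
# 0 2.197
# 1 0.693
# 2 0.693
# 3 inf
```

Node 0 has the most neighbors, and only one of its three neighbor pairs is linked, so it scores highest. This matches the description of a broker node above.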

Availability and System Requirements

All versions of SBEToolbox can be freely downloaded from https://github.com/biocoder/SBEToolbox/releases. Users can submit bugs and follow the development cycle of the toolbox at https://github.com/biocoder/SBEToolbox/issues. The minimum requirements for the software are:
Matlab: SBEToolbox was developed in Matlab version R2012b and makes use of the improvements made to core Matlab. Although the codebase also works in earlier versions, some newer features may be incompatible.
Disk space: Approximately 200 MB of disk space is needed for installation, most of which is taken up by sandboxed third-party applications and annotation databases.
Memory: We recommend at least 4 GB of random-access memory for faster computations, although this is not mandatory.
Central processing unit: 1.5 GHz processor or better.
Operating system: Windows XP or newer, Mac OS X 10.6 or newer with i64 architecture, or Linux.

Conclusion

In our view, SBEToolbox is flexible, easy to use, and highly customizable, and it provides researchers with an interactive tool to explore biological networks and to compute centralities and topological statistics for them. The output of a function is displayed in the output window and can be saved as a file, copied to the clipboard, or exported as a variable to the Matlab workspace. Matlab’s extensive plotting capabilities allow users to create publication-quality plots. We strongly believe that the extensibility of the software through a standardized SBE-plugin protocol will increase the functionality of the toolbox and make it a valuable resource for systems biology researchers who use Matlab to develop new algorithms and generate hypotheses from network datasets. SBEToolbox currently supports only undirected networks; as part of ongoing work, we plan to add support for weighted networks.

Supplementary Materials

Supplementary Data: SupplementaryData.pdf.
Supplementary Table 1: Feature comparison between SBEToolbox and non-Matlab-based network analysis software.
Supplementary Table 2: Comparison of MCL results obtained by using native Matlab code and the MCL-edge program.
Supplementary Figure 1: Comparison of computation times between SBEToolbox, BCT, and FUGA for betweenness centrality and clustering coefficient.
Supplementary Figure 2: Demonstration of the different clustering of five nodes in SBEToolbox’s mcl.m versus MCL-edge.
Selected references

Sergei Maslov; Kim Sneppen. Specificity and stability in topology of protein networks. Science. 2002-05-03.

Paul Shannon; Andrew Markiel; Owen Ozier; Nitin S Baliga; Jonathan T Wang; Daniel Ramage; Nada Amin; Benno Schwikowski; Trey Ideker. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003-11.

Roger Guimerà; Luís A Nunes Amaral. Functional cartography of complex metabolic networks. Nature. 2005-02-24.

Sara Nadiv Soffer; Alexei Vázquez. Network clustering coefficient without degree-correlation biases. Phys Rev E Stat Nonlin Soft Matter Phys. 2005-05-13.

Mikail Rubinov; Olaf Sporns. Complex network measures of brain connectivity: uses and interpretations. Neuroimage. 2009-10-09.

Tamás Nepusz; Haiyuan Yu; Alberto Paccanaro. Detecting overlapping protein complexes in protein-protein interaction networks. Nat Methods. 2012-03-18.

Gary D Bader; Christopher W V Hogue. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics. 2003-01-13.

Ignat Drozdov; Christos A Ouzounis; Ajay M Shah; Sophia Tsoka. Functional Genomics Assistant (FUGA): a toolbox for the analysis of complex biological networks. BMC Res Notes. 2011-10-28.

Patrycja Vasilyev Missiuro; Kesheng Liu; Lihua Zou; Brian C Ross; Guoyan Zhao; Jun S Liu; Hui Ge. Information flow analysis of interactome networks. PLoS Comput Biol. 2009-04-10.

Alice Bossi; Ben Lehner. Tissue specificity and the human protein interaction network. Mol Syst Biol. 2009-04-07.