
CatbNet: A Multi Network Analyzer for Comparing and Analyzing the Topology of Biological Networks.

Ehsan Pournoor¹, Naser Elmi¹, Ali Masoudi-Nejad¹.

Abstract

BACKGROUND: The complexity and dynamic nature of biological events call for comprehensive and holistic approaches. With current advances in omics data generation, network-based approaches are used frequently in different areas of computational biology and bioinformatics to solve problems in a systematic way. There are also many applications and tools for network data analysis and manipulation, whose goal is to improve our understanding of inter- and intracellular interactions.
METHODS: In this article, we introduce CatbNet, a multi-network analyzer application designed for network comparison.
RESULT AND CONCLUSION: CatbNet uses many topological features of networks to compare their structure and foundations. One of the most prominent properties of this application is classified network analysis, in which groups of networks are compared with each other.


Keywords: Bioinformatics; Biological researches; Network biology; Network comparing; Python; Topological features

Year: 2019 PMID: 31015793 PMCID: PMC6446483 DOI: 10.2174/1389202919666181213101540

Source DB: PubMed Journal: Curr Genomics ISSN: 1389-2029 Impact factor: 2.236

Introduction

Great technological advances in sequencing methods have driven a broad shift in biological research from traditional low-throughput methods to massively parallel techniques in fields such as genomics and proteomics. At the same time, the booming amount of generated data raises problems in extracting and interpreting meaningful information from a wealth of raw data. It is well established that representing large amounts of biological data as networks is not only useful but, in many cases, vital. Generally, using networks to represent and analyze biological data is called network biology. In network biology, the vertices of a network can be different elements such as genes, proteins, metabolites and diseases, and the edges between nodes can be based on a wide spectrum of relations such as correlation, physical interaction and reaction. For instance, in a disease interactome network, nodes are proteins associated with that disease and edges are physical interactions between them. Finding the most influential or important node in a large biological network has been a central question for many researchers in this field. Different terms such as hubs, key players, essentiality, lethality and centrality have been used to discuss this concept. Historically, the concept originates in social science, but it has been extended to biology in recent decades [1-3]. To date, many different centrality measures have been developed to calculate the importance of each node within a network from different perspectives. For instance, Jalili et al. [4] gathered and described a wide variety of centrality measures in a web server (CentiServer), which is freely accessible and also includes the most recent centrality measures. 
Using this server or the CentiServer R package, different centrality measures can be calculated for each network. Owing to the importance and ever-increasing use of centrality measures in applications such as biomarker discovery, disease pathogenesis discovery and drug repurposing, many packages and software tools have been developed in recent years to compute centrality measures for networks. The Java-based software Cytoscape [5] is among the most prevalent and user-friendly tools among biologists and bioinformaticians. Many plugins have been developed for Cytoscape to broaden its applicability to network biology. For example, the Cytoscape plugin NetworkAnalyzer [6] computes and visualizes a comprehensive set of topological parameters such as network diameter, density, heterogeneity, clustering coefficient and shortest path lengths. Another Cytoscape plugin, cytoHubba [7], ranks nodes in a network by their features. CytoNCA [8] supports eight different centrality measures and accepts both weighted and unweighted biological networks as input. It can also upload biological information for nodes and edges and integrate it with topological parameters of the network to detect specific nodes. In addition to these Cytoscape plugins, standalone Java programs such as CentiBiN [9] have been developed for ranking vital nodes of biological networks based on centrality measures and for visualizing the network and the distribution of centrality measures. Also, owing to the widespread use of the R language, many R packages have been developed in recent years to calculate network topological parameters and visualize networks. For instance, the CentiServer [4] R package, introduced in 2015, can calculate 55 centrality measures in the R environment or through its web server (http://www.centiserver.org). 
This web server includes valuable and comprehensive information about more than a hundred centrality measures and the packages and tools available to visualize or analyze networks. Recently, another R package, CINNA [10], was created to identify the central informative nodes in a network by integrating different centrality measures. More recently, a comprehensive web tool called Network Analysis Provider (NAP) was developed using R and Shiny to automate network construction and intra/inter-network topology comparison between a pair of networks [11]. This web tool can rank nodes and edges based on different centrality measures, compare two networks, extract their intersection and provide high-quality plots. NetworkX [12], a Python library, is among the most powerful libraries for analyzing large networks in a reasonable time. The package was originally created in 2002 by Hagberg et al., but the first public release was in 2005. NetworkX is a package for the creation, manipulation and analysis of the structure, dynamics and function of complex networks. It can import a wide variety of standard and nonstandard data formats. Furthermore, generating different types of random and classic networks, analyzing their structure, designing new network algorithms and drawing networks are all straightforward. To develop CatbNet, we used the Python language and version 1.9.1 of the NetworkX library for network manipulation and analysis. As mentioned above, one of the important questions in network biology is how to find the most prominent or vital node. However, this is not the most essential application of network biology. Different biological networks have different topological features, which are the cornerstone of a new field in systems biology called network medicine [13, 14]. 
In network medicine, the features of biological networks such as the human protein interactome [15], the human disease interactome network [16], and PPI or gene co-expression networks of disease stages can be used to segregate them into meaningful sets. For instance, researchers have shown that network features differ significantly between case and control samples or between different tumor stages [17-19]. Hence, some recent studies have focused on network mining and its integration with machine learning methods for early diagnosis and prognosis of disease. For instance, Jalili et al. [20] used centrality measures and machine learning algorithms to discriminate Alzheimer's disease patients from healthy controls based on their EEG networks. More recently, a review [21] examined the applications of network-based measures and machine learning techniques in precision oncology. It described how graph theory algorithms, integrated with omics data and machine learning techniques, can be used for decoding tumor-specific molecular mechanisms, finding candidate targets and drug repositioning. In another study, network features, sequence features and functional features were integrated with classification algorithms to identify novel Alzheimer genes [22]. To achieve acceptable performance and accuracy with machine learning approaches, a large number of samples and carefully selected features are needed. Hence, there is a clear need for network manipulation and analysis tools that can analyze a number of networks automatically and generate statistical measures for selecting appropriately discriminating features from the many available topological features. 
Furthermore, not all researchers in this field are programming and network science experts; hence, there is a clear need for a tool with a user-friendly graphical user interface to facilitate the use of network biology in biological problems in the omics era. To meet these requirements, we created a graph mining tool that computes the most important network topological features, from node-based measures such as different centralities to whole-network measures such as network density and diameter, for analytical and comparative purposes. We developed CatbNet, which provides a graphical user interface for biologists with options ranging from the choice of input data to the choice of output results. With the aid of CatbNet, users can analyze and compare multiple networks at once, with further statistical and graphical analyses.

Methods and Implementation

Programming Language and Utilized Packages

Nowadays, most statistical tools and utilities are created using Very High-Level Programming Languages (VHLLs). VHLLs are languages with strong abstraction from computer details and architecture. Programs written in VHLLs are mostly independent of a particular type of computer and are easy to read, write and maintain. They resemble human natural language and are suitable for rapid application development. These languages are mainly used for specific tasks and purposes and are often called domain-specific languages. In computational biology and bioinformatics research, depending on the goal and conditions, languages such as R, Perl and Python are frequently used to create packages and tools. Python (https://www.python.org/) is a very high-level programming language used in both academic and commercial applications. It is a free, open-source, platform-independent, object-oriented and fast scripting language with clean syntax that is easy to learn, and it has a large collection of packages and libraries for different areas of science. These capabilities led us to use Python as our primary language, together with the libraries Networkx [12], Numpy [23], Matplotlib [24], Pandas [25], Scipy [26] and PyQt4 (https://pypi.python.org/pypi/PyQt4), to develop CatbNet. Networkx is a powerful Python library for complex network generation, analysis and manipulation. We used Networkx for network handling operations such as loading networks and topological calculations. Scipy is a Python package for mathematics, science and engineering. For statistical and mathematical operations such as the one-way ANOVA test, we used the Numpy and Scipy libraries. Matplotlib is another package, developed for graphical visualization tasks. It is used for 2-D plotting of charts, histograms and diagrams. In our project, we used Matplotlib integrated with Pandas data structures to draw boxplots of network features. 
PyQt4 was used to create the graphical user interface of the application. PyQt4 is a comprehensive set of Python bindings for Digia’s Qt cross-platform GUI toolkit.

Load and Initialization

The application can load network data from four well-known network file formats: GML, GraphML, edge list and Pajek. Users can import their data in any of these formats for further analysis. Imported networks may be weighted, directed or undirected, but within a single run it is recommended to import networks with the same properties (for example, all weighted). If multiple network types are loaded, only the features common to all of them will be available in the output. In the load step, the user must choose the directory in which all network files are stored. All files should be in the same file format, which the user must select. In this phase, the user must also specify whether the networks are weighted and whether they are directed. In bioinformatics and systems biology tasks, users may have groups of networks. For example, one group could be co-expression networks reconstructed from patients and a second could be networks of a healthy control group. To explore such groups, CatbNet supports loading grouped networks. In this case, the option ‘network files are classified in groups’ must be checked. Furthermore, grouped files must follow a file naming convention: each file name must consist of the class name followed by three underscore characters and the network name (e.g. Class1___network1, class2___network1). If file names do not obey this rule, the networks are treated individually, with no grouping. Based on the files selected by the user, the network data are loaded and the analysis task is initialized.
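The naming convention described above can be illustrated with a short sketch. This is not CatbNet's own code: the function names `split_group_and_name` and `group_files` and the example file names are hypothetical, and treating each non-conforming file as ungrouped is an assumption about how the rule is applied.

```python
from collections import defaultdict
from pathlib import Path

SEPARATOR = "___"  # three underscores, as described in the text


def split_group_and_name(file_name):
    """Return (group, network_name); group is None if the convention is not met."""
    stem = Path(file_name).stem  # drop the extension, e.g. ".gml"
    group, sep, network_name = stem.partition(SEPARATOR)
    if not sep or not group or not network_name:
        return None, stem
    return group, network_name


def group_files(file_names):
    """Map each group label to its network names; None collects ungrouped files."""
    groups = defaultdict(list)
    for file_name in file_names:
        group, network_name = split_group_and_name(file_name)
        groups[group].append(network_name)
    return dict(groups)


# Hypothetical file names, for illustration only.
files = ["class1___net1.gml", "class1___net2.gml", "class2___net1.gml", "orphan.gml"]
grouped = group_files(files)
```

Here `grouped` maps `class1` to two networks and `class2` to one, while `orphan.gml` lands under the `None` key because its name contains no triple underscore.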

Topological Features of Networks

CatbNet calculates 24 network topological features (Fig. ) for every network and uses the results for comparison. Users can, however, select a subset of them for calculation based on prior knowledge or interest. We included many features that are meaningful in biological networks: Number of Nodes, Number of Edges, Largest Connected Component (LCC) size, Avg. Degree centrality, Avg. Closeness Centrality, Avg. Betweenness Centrality, Avg. Load Centrality, Avg. Communicability Centrality, Network Clustering Coefficient, Avg. Connected Component Size, Transitivity, Density, Max Clique size, Degree Assortativity Coefficient, Avg. Degree Connectivity, Avg. Edge Betweenness Centrality, Network Edge Connectivity, Avg. Katz Centrality, Number of Connected Components, Network Diameter, Avg. Eccentricity, Radius, Avg. PageRank and Avg. Shortest Path. As mentioned before, if the networks are of different types (for instance, multigraph, directed graph, undirected graph or bipartite graph), only the common features will be computed and compared. There may also be cases in which some properties are not available (for example, a network may be disconnected, so that its diameter is infinite). In such cases, only the remaining networks with valid values are taken into account.
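As a rough illustration of how a few of these network-based features behave, including the undefined diameter of a disconnected network, the following is a minimal standard-library sketch. CatbNet itself relies on NetworkX; the helper names, the toy edge lists and the choice of `None` to mark an unavailable value are assumptions made here, and the sketch assumes simple, undirected, non-empty networks without self-loops.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def bfs_distances(adjacency, source):
    """Hop distances from source to every node reachable from it."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def basic_features(edges):
    """Return a few network-based features; diameter is None when undefined."""
    adjacency = build_adjacency(edges)
    n = len(adjacency)
    m = sum(len(neighbours) for neighbours in adjacency.values()) // 2
    reach = {node: bfs_distances(adjacency, node) for node in adjacency}
    # Nodes of one component share the same reachable set.
    components = {frozenset(d) for d in reach.values()}
    connected = len(components) == 1
    return {
        "nodes": n,
        "edges": m,
        "density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
        "components": len(components),
        "lcc_size": max(len(c) for c in components),
        "diameter": max(max(d.values()) for d in reach.values()) if connected else None,
    }


# Toy edge lists, for illustration only.
networks = {
    "connected": [("a", "b"), ("b", "c")],
    "split": [("a", "b"), ("b", "c"), ("d", "e")],
}
features = {name: basic_features(edges) for name, edges in networks.items()}
# Keep only networks where the diameter is defined before comparing it.
valid_diameters = {
    name: f["diameter"] for name, f in features.items() if f["diameter"] is not None
}
```

Only the `connected` network contributes a diameter; the `split` network is still described by its other features, such as its two components and its largest component of three nodes.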

Examinations

The main goal of CatbNet is to compare many networks with each other and highlight their differences. To this end, the application provides two analyses: ordinary analysis and group analysis (described in the next two subsections). Furthermore, the measures mentioned in the previous section are of two types: network-based measures (e.g. network density, diameter) and node-based measures (e.g. node betweenness centrality, node degree). To compare networks with each other, all measures, both network-based and node-based, must be computed.

Ordinary Analysis

In the ordinary analysis, each network is compared with all other networks. To compare two networks, all measures are calculated for both and their values are then compared. Network-based measures are directly comparable, but node-based measures are not. For this reason, we convert each node-based measure to a network-based measure by taking the arithmetic mean over the nodes. Suppose that m is a node-based measure. For a network N, the resulting network-based measure is

$$\bar{m}_N = \frac{1}{n} \sum_{i=1}^{n} m_i$$

where $m_i$ is the value of the node-based measure for node i and n is the number of nodes in network N. With these average values, the networks can be compared.
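The averaging step can be sketched for one node-based measure. The example below uses degree centrality (degree divided by n - 1) computed with the standard library only; the function names and the star-shaped toy network are illustrative assumptions, not part of CatbNet, which uses NetworkX for these calculations.

```python
def degree_centrality(edges):
    """Degree centrality per node: degree / (n - 1), for a simple undirected graph."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    n = len(neighbours)
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


def network_mean(node_values):
    """Arithmetic mean of a node-based measure over all nodes of one network."""
    return sum(node_values.values()) / len(node_values)


# Toy star network: one hub connected to three leaves.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
per_node = degree_centrality(star)
# Mean of (1, 1/3, 1/3, 1/3), i.e. 0.5 up to floating-point rounding.
avg_degree_centrality = network_mean(per_node)
```

The single value `avg_degree_centrality` plays the role of the network-based measure derived from the node-based one, and is what would be compared across networks.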

Group Analysis

If grouped data files are imported, so that we have groups of networks (for example, a group of patient PPI networks and a group of healthy PPI networks), group analysis can be used to check whether any of the measures differ between the groups. In this case, the measures of one group are compared against those of the other. For network-based measures we therefore consider the average of the network values, and for node-based measures we consider the average of the per-network averages. Let G be a group of p networks. For a network-based measure m:

$$\bar{m}_G = \frac{1}{p} \sum_{i=1}^{p} m_i$$

where $m_i$ is the value of the network-based feature for network i in group G. For a node-based measure m', the mean over the group of the per-network averages must be calculated as follows:

$$\bar{m}'_G = \frac{1}{p} \sum_{j=1}^{p} \left( \frac{1}{q_j} \sum_{i=1}^{q_j} m'_{ij} \right)$$

where $m'_{ij}$ is the node-based value for node i in network j, $q_j$ is the number of nodes in network j and p is the number of networks in group G. The standard deviation of node-based and network-based measures is straightforward to calculate.
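The "average of averages" for a node-based measure can be sketched as follows. The values are invented toy numbers, `group_summary` is a hypothetical helper, and the use of the sample standard deviation is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, stdev


def group_summary(node_values_by_network):
    """Mean and sample SD of the per-network averages for one group."""
    per_network_means = [mean(values) for values in node_values_by_network.values()]
    spread = stdev(per_network_means) if len(per_network_means) > 1 else 0.0
    return mean(per_network_means), spread


# Invented node-based values for two networks of one group.
patients = {
    "class1___net1": [0.2, 0.4, 0.6],  # per-network mean 0.4
    "class1___net2": [0.5, 0.7],       # per-network mean 0.6
}
group_mean, group_sd = group_summary(patients)

# For contrast: pooling all nodes weights larger networks more heavily.
pooled_mean = mean(v for values in patients.values() for v in values)
```

The group mean is about 0.5, whereas the pooled mean over all five nodes is about 0.48. Averaging per network first gives every network the same weight regardless of its size, which matches the formula above.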

Output

After a CatbNet run, depending on the user's selection, the results will contain:
1- A *.tsv (tab-separated values) file of the common node-based measures in all networks (Fig. (2)). This file lists the node-based feature values for all nodes in all networks.
2- Another *.tsv file of all network features, including network-based measures and the means of node-based topological measures. This file lists all common measures for every network (Fig. (3)). The input may contain both directed and undirected networks at the same time; to avoid inconsistencies, only the attributes common to all networks are calculated. Using these files, users can compare networks.
3- Boxplots for each of the common node-based features (Fig. (4)). CatbNet creates one boxplot per feature across all networks. A boxplot is an informative graphical tool for showing the dispersion of a feature.
4- Boxplots for all measures of groups. For every measure, the dispersion over the networks of each group is plotted in a boxplot.
5- A one-way ANOVA test for each node-based measure. For example, to examine whether the difference in closeness centrality between networks is statistically significant, CatbNet provides a text file containing the one-way ANOVA test results for all node-based measures.
6- A one-way ANOVA test for group comparison.
Before running the application, the user can specify which outputs are of interest for further study. To clarify the steps in CatbNet, a schematic is shown in Fig. (6).

Fig. (6)

Schematic of steps in CatbNet, from loading data to saving the results.
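The one-way ANOVA outputs listed in this section rest on the F statistic, which compares the variance between samples with the variance within them. The sketch below computes it with the standard library for toy closeness centrality values; the function name and the numbers are illustrative assumptions, and the sketch assumes at least two samples with non-zero within-sample variance.

```python
from statistics import mean


def one_way_anova_f(samples):
    """Return (F, df_between, df_within) for a list of samples (lists of floats)."""
    all_values = [x for sample in samples for x in sample]
    grand_mean = mean(all_values)
    # Variation of the sample means around the grand mean.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Variation of the values around their own sample mean.
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented closeness centrality values for the nodes of three networks.
closeness = [
    [0.1, 0.2, 0.3],
    [0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7],
]
# F is 13 (up to floating-point rounding) with 2 and 6 degrees of freedom.
f_stat, df_between, df_within = one_way_anova_f(closeness)
```

Turning F into a P-value requires the F distribution, which the standard library does not provide. The paper states that NumPy and SciPy were used for this test; SciPy's `scipy.stats.f_oneway` returns both the statistic and the P-value.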

Applications

For the case studies, the disease interactome network from [16] was used, which contains subnetworks for diseases in the human interactome network. This set contains a diverse range of networks of different sizes, all in the edge list format. The number of nodes ranged from 245 to 3746 and the number of edges from 238 to 6548. These disease networks were divided into three distinct classes with different class numbers. The data set can be downloaded from the GitHub page of the application (https://github.com/LBBSoft/CatbNet/blob/master/test-data.rar).

Extracting Topological Features for a Set of Networks

After importing the input data, which contained a set of networks, selecting the features under study from the options tab of CatbNet (Fig. ) and running the program, the values of the calculated features were obtained. For our case study, a small part of the result is shown in Figs. (2) and (3).

Fig. (2)

Node-based features of network nodes in a sample execution. For each node in every network, this file lists the node-based features common to all networks (only some nodes of class1___net1 are visible in the snapshot).

Fig. (3)

All common features of the networks. Network-based features are calculated directly, and node-based features are reported as averages over nodes (for example, Avg. Betweenness Centrality).

Comparing Different Networks or Network Groups Based on Topological Features

For our case study, Fig. (4) shows a boxplot of the closeness centrality dispersion of 24 different networks. According to this figure, the closeness centrality distribution differed between the networks, and the significance of the difference was tested with the one-way ANOVA test of CatbNet. The distribution for the four groups is plotted as a boxplot in Fig. (5). Class 4 clearly has larger values of Avg. Closeness Centrality.

Fig. (4)

Comparison of data dispersion between networks for closeness centrality, shown as boxplots. CatbNet creates such boxplot charts for every common node-based feature.

Fig. (5)

Group comparison for the measure Avg. Closeness Centrality. If the loaded network data are classified, CatbNet provides group comparison boxplots. In this case, all network-based and node-based attributes are compared.

Conclusion

CatbNet is a user-friendly multi-network analyzer application. It has been developed using a set of Python packages for network analysis and comparison. The application has a graphical user interface to help researchers from different fields, especially computational biologists, analyze networks. It calculates 24 centrality measures and topological features for a set of networks simultaneously. It accepts several network formats, including GML, GraphML, edge list and Pajek, and different types of networks, such as directed, undirected and weighted networks. There are numerous valuable packages and applications for calculating topological features and centrality measures, but to the best of our knowledge, no other package takes a set of networks or groups of networks as input and calculates their features simultaneously. Another novelty of this application is that it compares different networks or groups of networks with each other based on the features in its feature list and calculates the statistical significance of each comparison, with a P-value, using the ANOVA test. The significance level of each feature in separating networks or network classes can be used as a guide for choosing, among numerous features, those that discriminate appropriately between them. These properties make CatbNet a suitable and useful application for network mining, network medicine and machine learning applications in systems biology.

References

1. Lethality and centrality in protein networks.

Authors: H Jeong; S P Mason; A L Barabási; Z N Oltvai
Journal: Nature Date: 2001-05-03 Impact factor: 49.962

2. Exploration of biological network centralities with CentiBiN.

Authors: Björn H Junker; Dirk Koschützki; Falk Schreiber
Journal: BMC Bioinformatics Date: 2006-04-21 Impact factor: 3.169

3. Cytoscape: a software environment for integrated models of biomolecular interaction networks.

Authors: Paul Shannon; Andrew Markiel; Owen Ozier; Nitin S Baliga; Jonathan T Wang; Daniel Ramage; Nada Amin; Benno Schwikowski; Trey Ideker
Journal: Genome Res Date: 2003-11 Impact factor: 9.043

4. Computing topological parameters of biological networks.

Authors: Yassen Assenov; Fidel Ramírez; Sven-Eric Schelhorn; Thomas Lengauer; Mario Albrecht
Journal: Bioinformatics Date: 2007-11-15 Impact factor: 6.937

5. CytoNCA: a cytoscape plugin for centrality analysis and evaluation of protein interaction networks.

Authors: Yu Tang; Min Li; Jianxin Wang; Yi Pan; Fang-Xiang Wu
Journal: Biosystems Date: 2014-11-15 Impact factor: 1.973

6. Network medicine: a network-based approach to human disease.

Authors: Albert-László Barabási; Natali Gulbahce; Joseph Loscalzo
Journal: Nat Rev Genet Date: 2011-01 Impact factor: 53.242

7. A proteome-scale map of the human interactome network.

Authors: Thomas Rolland; Murat Taşan; Benoit Charloteaux; Samuel J Pevzner; Quan Zhong; Nidhi Sahni; Song Yi; Irma Lemmens; Celia Fontanillo; Roberto Mosca; Atanas Kamburov; Susan D Ghiassian; Xinping Yang; Lila Ghamsari; Dawit Balcha; Bridget E Begg; Pascal Braun; Marc Brehme; Martin P Broly; Anne-Ruxandra Carvunis; Dan Convery-Zupan; Roser Corominas; Jasmin Coulombe-Huntington; Elizabeth Dann; Matija Dreze; Amélie Dricot; Changyu Fan; Eric Franzosa; Fana Gebreab; Bryan J Gutierrez; Madeleine F Hardy; Mike Jin; Shuli Kang; Ruth Kiros; Guan Ning Lin; Katja Luck; Andrew MacWilliams; Jörg Menche; Ryan R Murray; Alexandre Palagi; Matthew M Poulin; Xavier Rambout; John Rasla; Patrick Reichert; Viviana Romero; Elien Ruyssinck; Julie M Sahalie; Annemarie Scholz; Akash A Shah; Amitabh Sharma; Yun Shen; Kerstin Spirohn; Stanley Tam; Alexander O Tejeda; Shelly A Trigg; Jean-Claude Twizere; Kerwin Vega; Jennifer Walsh; Michael E Cusick; Yu Xia; Albert-László Barabási; Lilia M Iakoucheva; Patrick Aloy; Javier De Las Rivas; Jan Tavernier; Michael A Calderwood; David E Hill; Tong Hao; Frederick P Roth; Marc Vidal
Journal: Cell Date: 2014-11-20 Impact factor: 41.582
