Peng Zhang, Lin Tao, Xian Zeng, Chu Qin, Shangying Chen, Feng Zhu, Zerong Li, Yuyang Jiang, Weiping Chen, Yu-Zong Chen.
Abstract
Genetic, proteomic, disease and pharmacological studies have generated rich data on protein interactions, disease regulation and drug activities that are useful for systems-level study of biological, disease and drug therapeutic processes. These studies are facilitated by established and emerging computational methods. More recently, network descriptors developed in other disciplines have been increasingly used for studying protein-protein, gene regulation, metabolic and disease networks. Coverage of these useful network features in public web servers is inadequate. We therefore introduced up to 313 literature-reported network descriptors in the PROFEAT web server for describing the topological, connectivity and complexity characteristics of undirected unweighted (uniform binding constants and molecular levels), undirected edge-weighted (varying binding constants), undirected node-weighted (varying molecular levels), undirected edge-node-weighted (varying binding constants and molecular levels) and directed unweighted (oriented process) networks. The usefulness of the PROFEAT-computed network descriptors is illustrated by their literature-reported applications in studying protein-protein, gene regulatory, gene co-expression, protein-drug and metabolic networks. PROFEAT is accessible free of charge at http://bidd2.nus.edu.sg/cgi-bin/profeat2016/main.cgi.
Keywords: biological network; network descriptor; network feature; web server
Year: 2017 PMID: 27542402 PMCID: PMC5862332 DOI: 10.1093/bib/bbw071
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
List of the network descriptors provided by existing publicly accessible tools that do not require programming skills
| Tool (number of descriptors) | Node-level descriptors | Network-level descriptors |
| Cytoscape (23) | Degree, in/out-degree, number of self-loops, clustering coefficient, topological coefficient, neighborhood connectivity, avg shortest path length, eccentricity, radiality, closeness centrality, betweenness centrality, stress | Number of nodes/edges/self-loops, density, diameter, radius, centralization, heterogeneity, avg number of neighbors, characteristic path length |
| NAViGaTOR (13) | Clustering coefficient, degree centrality, betweenness centrality | Number of nodes/edges, density, min/avg/max degree, diameter, avg clustering coefficient, characteristic path length |
| Gephi (10) | Degree, clustering coefficient, betweenness centrality, closeness centrality, eigenvector centrality, PageRank centrality, HITS | Diameter, density, avg clustering coefficient, avg shortest path length |
| VANESA (10) | Degree, avg/max shortest path length | Min/avg/max degree, avg shortest path length, density, centralization, clustering coefficient |
| Pajek (9) | Degree, avg shortest path length, degree centrality, closeness centrality, betweenness centrality | Diameter, degree centralization, closeness centralization, betweenness centralization |
| SpectralNET (9) | Degree, clustering coefficient, min/avg/max shortest path length | Number of nodes, diameter, avg clustering coefficient, avg shortest path length |
| PINA (8) | Degree, shortest path length, clustering coefficient, closeness centrality, betweenness centrality, degree centrality, eigenvector centrality | Diameter |
| Hubba (6) | Degree, bottleneck, subgraph centrality, edge percolation component, max neighborhood component, density of max neighborhood | N.A. |
| GraphWeb (4) | Betweenness centrality | Number of nodes/edges, density |
| tYNA (4) | Degree, clustering coefficient, eccentricity, betweenness centrality | N.A. |
| VisANT (3) | Degree, shortest path length, clustering coefficient | N.A. |
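Several of the network-level descriptors listed above (density, average number of neighbors, diameter, radius, characteristic path length) follow directly from an edge list. The sketch below is a minimal pure-Python illustration on a hypothetical five-node network. The edge list, the function names and the assumption of a connected, undirected, unweighted graph without self-loops are ours, and the code does not reflect how any of the listed tools is implemented.

```python
from collections import deque

# Hypothetical toy network (not from the paper): chain A-B-C-D plus triangle B-C-E.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "E"), ("C", "E")]


def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected, no self-loops)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Shortest-path length (number of edges) from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def network_summary(edges):
    """Basic network-level descriptors; assumes a connected graph with n >= 2."""
    adj = build_adjacency(edges)
    n, m = len(adj), len(edges)
    all_dist = {u: bfs_distances(adj, u) for u in adj}
    # Eccentricity of a node: its largest shortest-path distance to any other node.
    eccentricity = {u: max(d.values()) for u, d in all_dist.items()}
    # Sum of distances over all ordered node pairs.
    total = sum(sum(d.values()) for d in all_dist.values())
    return {
        "nodes": n,
        "edges": m,
        "density": 2 * m / (n * (n - 1)),
        "avg_neighbors": 2 * m / n,
        "diameter": max(eccentricity.values()),
        "radius": min(eccentricity.values()),
        "characteristic_path_length": total / (n * (n - 1)),
    }


summary = network_summary(EDGES)
print(summary)
```

Disconnected networks would need a per-component treatment, which this sketch deliberately leaves out.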
List of the node-level descriptors provided by PROFEAT and their typical applications in systems biology studies
| Descriptor category | Node-level descriptors | Typical applications |
| Connectivity to immediate neighbors | Degree, scaled connectivity, number of self-loops, number of triangles, Z score | Clustering coefficient used to illustrate the hierarchical architecture of metabolism |
| Connectivity to next immediate neighbors | Clustering coefficient, neighborhood connectivity, topological coefficient, interconnectivity, bridging coefficient | |
| Distance relationships to all other nodes | Average shortest path length, distance sum, eccentricity, eccentric, deviation, distance deviation, radiality | Eccentricity and distance deviation used to prioritize the metabolic biomarkers in obesity |
| Centrality measure based on distance to all other nodes | Closeness centrality (avg, sum), eccentricity centrality, harmonic centrality, residual centrality | Betweenness centrality, degree centrality, bridging centrality and other centrality measures used to expose the relationship between network topology and system function of proteins |
| Centrality measure based on shortest paths passing through the studied node | Stress centrality, betweenness centrality, normalized betweenness, bridging centrality | |
| Centrality measure based on degree and/or neighbors’ centrality | Degree centrality, PageRank centrality, eigenvector centrality | PageRank centrality used to identify protein targets in metabolic networks |
| Edge-weighted descriptor | Strength, assortativity, disparity, geometric mean of triangles, edge-weighted local clustering coeff (Barrat's, Onnela's, Zhang's, Holme's) | The strength of the associations between genes was used as the edge weight in gene co-expression analysis |
| Node-weighted descriptor | Node weight, node-weighted cross degree, node-weighted local clustering coeff | |
| Directed and unweighted descriptor | In-degree, out-degree, directed local clustering coefficient, neighborhood connectivity (only-in), neighborhood connectivity (only-out), neighborhood connectivity (in and out), average directed neighbor degree | In/out-degree have been applied to five directed biological networks to identify and rank the regulators in the networks |
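Three of the simpler node-level descriptors in the table above can be sketched in a few lines: strength (the sum of the weights of a node's incident edges), neighborhood connectivity (the average degree of a node's neighbors) and in/out-degree. The example below is an illustration only. The gene and regulator names, the weights and the function names are hypothetical, and the definitions used are the common textbook ones, which may differ in detail from the PROFEAT implementation.

```python
# Hypothetical edge-weighted network: (node_u, node_v, weight),
# e.g. the strength of a co-expression association.
WEIGHTED_EDGES = [("g1", "g2", 0.9), ("g1", "g3", 0.4), ("g2", "g3", 0.7), ("g3", "g4", 0.2)]
# Hypothetical directed network: (regulator, target).
DIRECTED_EDGES = [("tf1", "geneA"), ("tf1", "geneB"), ("tf2", "geneB"), ("geneB", "geneC")]


def node_strength(weighted_edges):
    """Strength: sum of the weights of all edges incident to a node."""
    strength = {}
    for u, v, w in weighted_edges:
        strength[u] = strength.get(u, 0.0) + w
        strength[v] = strength.get(v, 0.0) + w
    return strength


def neighborhood_connectivity(weighted_edges):
    """Average degree of a node's neighbors (edge weights are ignored)."""
    adj = {}
    for u, v, _ in weighted_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return {n: sum(len(adj[k]) for k in nbrs) / len(nbrs) for n, nbrs in adj.items()}


def in_out_degree(directed_edges):
    """Return {node: (in_degree, out_degree)} for a directed edge list."""
    indeg, outdeg = {}, {}
    for src, dst in directed_edges:
        for node in (src, dst):
            indeg.setdefault(node, 0)
            outdeg.setdefault(node, 0)
        outdeg[src] += 1
        indeg[dst] += 1
    return {node: (indeg[node], outdeg[node]) for node in indeg}


strength = node_strength(WEIGHTED_EDGES)
connectivity = neighborhood_connectivity(WEIGHTED_EDGES)
degrees = in_out_degree(DIRECTED_EDGES)
# Rank putative regulators by out-degree, highest first.
regulators = sorted(degrees, key=lambda node: degrees[node][1], reverse=True)
print(strength, connectivity, degrees, regulators, sep="\n")
```

Ranking by out-degree is only one simple way to nominate regulators; the studies cited in the table may have used more elaborate criteria.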
List of the network-level descriptors provided by PROFEAT and their typical applications in systems biology studies
| Descriptor category | Network-level descriptors | Typical applications |
| Global connectivity profiles | Number of nodes/edges/self-loops, max/min connectivity, avg number of neighbors, total adjacency, network density, average clustering coefficient, transitivity, heterogeneity, degree centralization, central point dominance | Density, heterogeneity, degree centralization and global clustering coefficient used to compare and study the PPI networks of Drosophila and yeast |
| Network measure based on shortest paths | Total distance, diameter, radius, shape coefficient, characteristic path length, network eccentricity, avg eccentricity, network eccentric, eccentric connectivity, unipolarity, integration, variation, avg distance, mean distance deviation, centralization, global efficiency | Characteristic path length and global efficiency used to describe the brain neuro-connectivity network |
| Topological index based on connectivity | Edge complexity index, Randic connectivity index, atom-bond connectivity index, Zagreb index (1, 2, modified, augmented, variable), Narumi index, Narumi geometric index, Narumi harmonic index, alpha index, beta index, pi index, eta index, hierarchy, robustness, medium articulation | Randic connectivity index and Zagreb indices applied to assess complexity in chemistry and biology |
| Topological index based on shortest paths | Complexity index (A, B), Wiener index, hyper-Wiener index, Harary index (1, 2), compactness index, superpendentic index, hyper-distance-path index, BalabanJ index, BalabanJ-like indices (1, 2, 3), geometric arithmetic indices (1, 2, 3), product of row sums, topological index (Schultz, Gutman), Szeged index, efficiency complexity | Wiener index, BalabanJ index and graph complexity index used to assess complexity in chemistry and biology |
| Entropy-based complexity | Shannon’s entropy-derived information content of (degree equality/edge equality/edge magnitude/distance degree/distance degree equality), radial centric information index, distance degree compactness, distance degree centric index, graph distance complexity, information layer index, Bonchev information index (1, 2, 3), Balaban-like information index (1, 2) | Radial centric information index used to classify the metabolic networks from three domains of life |
| Eigenvalue-based complexity | Graph energy, Laplacian energy, spectral radius, Estrada index, Laplacian Estrada index, quasi-Wiener index, Mohar index (1, 2), graph index complexity, 50 Dehmer’s eigenvalue properties based on matrices of (adjacency/Laplacian/distance/distance path/augmented vertex degree/extended adjacency/vertex connectivity/random walk Markov/weighted structural function 1/weighted structural function 2) | Dehmer proposed a set of 50 eigenvalue-based descriptors, which possess high discriminative power to capture structural information of graphs, to predict biological and pharmacological properties |
| Edge-weighted descriptor | Weighted transitivity, edge-weighted global clustering coeff (Barrat's, Onnela's, Zhang's, Holme's) | Weighted transitivity used to describe the brain neuro-connectivity network |
| Node-weighted descriptor | Total node weight, node-weighted global clustering coeff | |
| Directed and unweighted descriptor | In-degree (max, avg, min), out-degree (max, avg, min), directed global clustering coefficient | In/out-degree applied to directed biological networks to identify and rank the regulators in the networks |
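A few of the topological indices named in the table above have short closed-form definitions: the Wiener index (sum of shortest-path lengths over all node pairs), the first and second Zagreb indices (sum of squared degrees, and sum of degree products over edges), the Randic connectivity index (sum of 1/sqrt(d_u d_v) over edges) and the global efficiency (mean inverse shortest-path length). The sketch below computes them on a hypothetical toy network. It assumes a connected, undirected, unweighted graph and uses the standard literature definitions, which are not guaranteed to match the exact variants implemented in PROFEAT.

```python
from collections import deque
from math import sqrt

# Hypothetical toy network (not from the paper): chain A-B-C-D plus triangle B-C-E.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "E"), ("C", "E")]


def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected, no self-loops)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def pair_distances(adj):
    """Shortest-path length for every unordered node pair (connected graph assumed)."""
    dist = {}
    for s in adj:
        seen = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for t, d in seen.items():
            if s < t:  # keep each unordered pair once
                dist[(s, t)] = d
    return dist


def topological_indices(edges):
    """Standard definitions of a few connectivity- and distance-based indices."""
    adj = build_adjacency(edges)
    deg = {u: len(nbrs) for u, nbrs in adj.items()}
    dist = pair_distances(adj)
    return {
        "wiener": sum(dist.values()),
        "zagreb_1": sum(k * k for k in deg.values()),
        "zagreb_2": sum(deg[u] * deg[v] for u, v in edges),
        "randic": sum(1 / sqrt(deg[u] * deg[v]) for u, v in edges),
        "global_efficiency": sum(1 / d for d in dist.values()) / len(dist),
    }


indices = topological_indices(EDGES)
print(indices)
```

The entropy-based and eigenvalue-based families in the table need degree or distance partitions and matrix eigenvalues respectively, and are outside the scope of this sketch.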
The number of network descriptors, the supported network types and the visualization features of PROFEAT and other publicly accessible tools
| PROFEAT | up to 313 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | x | x |
| NetworkX [33] | ∼100 | ✓ | ✓ | x | x | ✓ | x | ✓ | x |
| igraph [34] | ∼100 | ✓ | ✓ | x | x | ✓ | x | ✓ | x |
| QuACN [35] | ∼100 | ✓ | x | x | x | x | x | ✓ | x |
| Cytoscape [22] | ∼23 | ✓ | x | x | x | ✓ | x | x | ✓ |
| NAViGaTOR [23] | ∼13 | ✓ | ✓ | x | x | x | x | x | ✓ |
| Gephi [24] | ∼10 | ✓ | x | x | x | x | x | x | ✓ |
| VANESA [25] | ∼10 | ✓ | ✓ | x | x | ✓ | x | x | ✓ |
| Pajek [26] | ∼9 | ✓ | ✓ | x | x | x | x | x | ✓ |
| SpectralNET [27] | ∼9 | ✓ | ✓ | ✓ | ✓ | x | x | x | ✓ |
| PINA [28] | ∼8 | ✓ | x | x | x | x | x | x | ✓ |
| Hubba [29] | ∼6 | ✓ | ✓ | x | x | x | x | x | ✓ |
| GraphWeb [30] | ∼4 | ✓ | ✓ | x | x | ✓ | x | x | ✓ |
| tYNA [31] | ∼4 | ✓ | x | x | x | ✓ | x | x | ✓ |
| VisANT [32] | ∼3 | ✓ | x | x | x | x | x | x | ✓ |
Figure 1. Graphic illustration of the network descriptors degree, triangle, clustering coefficient, closeness centrality and betweenness centrality in a hypothetical network. A colour version of this figure is available at BIB online: https://academic.oup.com/bib.
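The five descriptors illustrated in Figure 1 can also be worked through in code. The sketch below is a pure-Python illustration on a hypothetical five-node network, not the network drawn in the figure. It uses common definitions: closeness as (n - 1) divided by the sum of shortest-path lengths, and betweenness computed with Brandes' algorithm and normalized by (n - 1)(n - 2). These normalizations are assumptions on our part, since the source does not state which ones PROFEAT applies.

```python
from collections import deque

# Hypothetical toy network: chain A-B-C-D plus triangle B-C-E.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "E"), ("C", "E")]


def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected, no self-loops)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles(adj, v):
    """Number of triangles through v: adjacent pairs among v's neighbors."""
    return sum(len(adj[v] & adj[u]) for u in adj[v]) // 2


def clustering(adj, v):
    """Local clustering coefficient; defined as 0 for nodes with degree < 2."""
    k = len(adj[v])
    return 0.0 if k < 2 else 2 * triangles(adj, v) / (k * (k - 1))


def closeness(adj, v):
    """(n - 1) divided by the sum of shortest-path lengths from v (connected graph)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return (len(adj) - 1) / sum(dist.values())


def betweenness(adj):
    """Normalized betweenness centrality via Brandes' algorithm (unweighted graph)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Each unordered pair was counted twice, so divide by (n - 1)(n - 2).
    return {v: b / ((n - 1) * (n - 2)) for v, b in bc.items()} if n > 2 else bc


adj = build_adjacency(EDGES)
degree = {v: len(adj[v]) for v in adj}
tri = {v: triangles(adj, v) for v in adj}
cc = {v: clustering(adj, v) for v in adj}
close = {v: closeness(adj, v) for v in adj}
between = betweenness(adj)
```

In this toy network, nodes B and C lie on every shortest path that crosses the middle of the graph, which is why they share the highest betweenness.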
Figure 2. The computational flowchart of PROFEAT network descriptors, where ‘node:’ gives the number of node-level descriptors and ‘net:’ gives the number of network-level descriptors. A colour version of this figure is available at BIB online: https://academic.oup.com/bib.
Figure 3. The input and output of a sample undirected unweighted network, where (A, B … K) are the labels of individual nodes.
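The five network types named in the abstract differ only in whether edges are oriented and whether weights are attached to edges, nodes or both. As a rough illustration of how such inputs could be held in memory and told apart, the sketch below defines a hypothetical `Network` container and a `network_type` helper. The class, the field names and the example edges are invented for this example and say nothing about the actual PROFEAT input format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Network:
    """Hypothetical in-memory container for one input network."""
    edges: List[Tuple[str, str]]
    directed: bool = False
    # Optional edge weights, e.g. binding constants, keyed by (node_u, node_v).
    edge_weights: Optional[Dict[Tuple[str, str], float]] = None
    # Optional node weights, e.g. molecular levels, keyed by node label.
    node_weights: Optional[Dict[str, float]] = None


def network_type(net: Network) -> str:
    """Classify a network into one of the five types listed in the abstract."""
    has_edge_w = bool(net.edge_weights)
    has_node_w = bool(net.node_weights)
    if net.directed:
        if has_edge_w or has_node_w:
            raise ValueError("directed weighted networks are not among the five listed types")
        return "directed unweighted"
    if has_edge_w and has_node_w:
        return "undirected edge-node-weighted"
    if has_edge_w:
        return "undirected edge-weighted"
    if has_node_w:
        return "undirected node-weighted"
    return "undirected unweighted"


# Undirected unweighted sample with single-letter node labels; the edges are invented.
sample = Network(edges=[("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
# The same topology with invented edge and node weights attached.
weighted = Network(
    edges=sample.edges,
    edge_weights={("A", "B"): 2.5, ("A", "C"): 1.0, ("B", "C"): 0.3, ("C", "D"): 4.2},
    node_weights={"A": 10.0, "B": 3.0, "C": 7.5, "D": 1.2},
)
print(network_type(sample), "|", network_type(weighted))
```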
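CPU time in computing the complete set of network descriptors for 30 GO biological process-specific human PPI networks of five different network types
| GO ID | GO term | Number of nodes | Number of edges | Undirected unweighted (min) | Undirected edge-weighted (min) | Undirected node-weighted (min) | Undirected edge-node-weighted (min) | Directed unweighted (min) |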
| GO:0002376 | Immune system process | 52 | 51 | 0.006 | 0.010 | 0.007 | 0.011 | 0.004 |
| GO:0008219 | Cell death | 76 | 81 | 0.009 | 0.020 | 0.011 | 0.022 | 0.004 |
| GO:0030705 | Cytoskeleton intracellular transport | 107 | 123 | 0.014 | 0.044 | 0.019 | 0.048 | 0.005 |
| GO:0006091 | Generation of metabolites and energy | 125 | 134 | 0.022 | 0.070 | 0.030 | 0.077 | 0.005 |
| GO:0006259 | DNA metabolic process | 141 | 152 | 0.025 | 0.093 | 0.035 | 0.104 | 0.005 |
| GO:0006913 | Nucleocytoplasmic transport | 197 | 219 | 0.055 | 0.236 | 0.084 | 0.267 | 0.005 |
| GO:0048646 | Anatomical structure formation | 242 | 246 | 0.092 | 0.423 | 0.147 | 0.446 | 0.006 |
| GO:0006629 | Lipid metabolic process | 271 | 353 | 0.130 | 0.602 | 0.205 | 0.666 | 0.006 |
| GO:0000902 | Cell morphogenesis | 347 | 377 | 0.288 | 1.25 | 0.402 | 1.42 | 0.007 |
| GO:0009790 | Embryo development | 390 | 415 | 0.357 | 1.74 | 0.583 | 1.97 | 0.007 |
| GO:0007005 | Mitochondrion organization | 435 | 513 | 0.456 | 2.46 | 0.768 | 2.68 | 0.008 |
| GO:0048870 | Cell motility | 461 | 534 | 0.545 | 2.86 | 0.915 | 3.01 | 0.009 |
| GO:0006397 | mRNA processing | 489 | 681 | 0.679 | 3.66 | 1.12 | 3.93 | 0.009 |
| GO:0016192 | Vesicle-mediated transport | 494 | 630 | 0.670 | 3.49 | 1.13 | 3.75 | 0.009 |
| GO:0034641 | Cellular nitrogen metabolic process | 563 | 756 | 1.01 | 5.44 | 1.70 | 5.57 | 0.011 |
| GO:0006950 | Response to stress | 590 | 772 | 1.16 | 6.23 | 1.92 | 6.82 | 0.012 |
| GO:0007010 | Cytoskeleton organization | 606 | 799 | 1.23 | 6.70 | 2.09 | 7.53 | 0.012 |
| GO:0006464 | Cellular protein modification process | 627 | 751 | 1.40 | 7.60 | 2.32 | 8.24 | 0.013 |
| GO:0006605 | Protein targeting | 642 | 860 | 1.49 | 8.22 | 2.47 | 9.07 | 0.014 |
| GO:0006457 | Protein folding | 670 | 842 | 1.69 | 8.94 | 2.82 | 9.80 | 0.013 |
| GO:0006412 | Translation | 772 | 996 | 2.52 | 13.77 | 4.27 | 15.47 | 0.017 |
| GO:0006914 | Autophagy | 825 | 1001 | 2.94 | 16.33 | 5.03 | 18.71 | 0.018 |
| GO:0006810 | Transport | 872 | 1089 | 3.41 | 19.55 | 6.01 | 21.92 | 0.020 |
| GO:0005975 | Carbohydrate metabolic process | 1014 | 1329 | 5.44 | 30.78 | 9.41 | 35.32 | 0.026 |
| GO:0007267 | Cell–cell signaling | 1202 | 1737 | 8.99 | 50.74 | 15.52 | 56.93 | 0.033 |
| GO:0007049 | Cell cycle | 1513 | 2262 | 17.77 | 102.71 | 30.99 | 117.12 | 0.051 |
| GO:0007568 | Aging | 1692 | 2637 | 24.92 | 144.16 | 43.78 | 158.62 | 0.062 |
| GO:0030154 | Cell differentiation | 1752 | 2742 | 27.82 | 163.10 | 48.25 | 178.31 | 0.068 |
| GO:0007155 | Cell adhesion | 1865 | 3356 | 34.26 | 194.14 | 58.84 | 216.88 | 0.076 |
| GO:0008283 | Cell proliferation | 2616 | 4664 | 78.80 | 431.63 | 160.43 | 491.17 | 0.146 |
Figure 4. CPU time (mins) in computing the complete set of network descriptors for the networks described in Table 5 with respect to the number of nodes (left) and the number of edges (right). A colour version of this figure is available at BIB online: https://academic.oup.com/bib.
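One way to summarize the trend plotted in Figure 4 is to fit a straight line to log(CPU time) against log(number of nodes); the slope is then an empirical scaling exponent. The sketch below does this by ordinary least squares for six rows taken from the first CPU-time column of the table above. The choice of rows is arbitrary and the fit is our own illustration, not an analysis reported in the source.

```python
from math import log

# (number of nodes, CPU time) pairs copied from the first CPU-time column of the
# CPU-time table above; which six rows to include is an arbitrary choice.
POINTS = [(52, 0.006), (107, 0.014), (271, 0.130), (606, 1.23), (1014, 5.44), (2616, 78.80)]


def loglog_slope(points):
    """Least-squares slope of log(time) versus log(nodes)."""
    xs = [log(n) for n, _ in points]
    ys = [log(t) for _, t in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


exponent = loglog_slope(POINTS)
print(f"empirical scaling exponent: {exponent:.2f}")
```

For these six rows the fitted exponent lies between 2 and 3, that is, the CPU time grows faster than quadratically but slower than cubically with the number of nodes. This is a descriptive fit over a limited size range and should not be read as a formal complexity bound.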
Comparison of the computed network descriptor values and the job execution time for the PPI network GO:0030705 cytoskeleton intracellular transport (107 nodes and 123 edges) by PROFEAT and the other popular tools NetworkX, Cytoscape and Gephi
| Computed network descriptor value | PROFEAT | NetworkX | Cytoscape | Gephi |
| Degree | 42 (TUBB) | 42 (TUBB) | 42 (TUBB) | 42 (TUBB) |
| Number of triangles | 4 (KIF5A) | 4 (KIF5A) | N.A. | 4 (KIF5A) |
| Local clustering coefficient | 1 (SDC3) | 1 (SDC3) | 1 (SDC3) | 1 (SDC3) |
| Closeness centrality | 0.486 (TUBB) | 0.485 (TUBB) | 0.485 (TUBB) | 0.484 (TUBB) |
| Betweenness centrality | 0.816 (TUBB) | 0.816 (TUBB) | 0.816 (TUBB) | 0.816 (TUBB) |
| Global clustering coefficient | 0.025 | 0.025 | 0.025 | 0.110 |
| Connectivity centralization | 0.382 | N.A. | 0.382 | N.A. |
| Heterogeneity | 2.046 | N.A. | 2.045 | N.A. |
| Job execution time | ∼5 s | ∼5 s | ∼30 s | ∼30 s |
The first five descriptors are node-level properties, for which the maximum value and the gene name of the corresponding node are given in the form ‘max. value (gene name)’. The next three descriptors are network-level properties that describe the entire network globally.
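The table reports a global clustering coefficient of 0.025 from three tools and 0.110 from Gephi. The source does not explain the gap. One general point worth keeping in mind when comparing tools is that several definitions of a network-wide clustering coefficient are in common use, and they give different numbers on the same graph. The sketch below contrasts three of them on a hypothetical toy network. It is an illustration of the definitions only, and we make no claim about which definition any of the four tools uses.

```python
# Hypothetical toy network: chain A-B-C-D plus triangle B-C-E.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "E"), ("C", "E")]


def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected, no self-loops)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def node_triangles(adj, v):
    """Number of triangles that pass through node v."""
    return sum(len(adj[v] & adj[u]) for u in adj[v]) // 2


def transitivity(adj):
    """3 x (number of triangles) / (number of connected triples)."""
    closed = sum(node_triangles(adj, v) for v in adj)  # equals 3 x triangle count
    triples = sum(len(adj[v]) * (len(adj[v]) - 1) // 2 for v in adj)
    return closed / triples if triples else 0.0


def average_clustering(adj, include_low_degree=True):
    """Mean local clustering coefficient.

    With include_low_degree=True, nodes of degree < 2 contribute 0 to the mean;
    with False, they are left out of the mean altogether.
    """
    values = []
    for v in adj:
        k = len(adj[v])
        if k < 2:
            if include_low_degree:
                values.append(0.0)
            continue
        values.append(2 * node_triangles(adj, v) / (k * (k - 1)))
    return sum(values) / len(values) if values else 0.0


adj = build_adjacency(EDGES)
trans = transitivity(adj)
avg_all = average_clustering(adj, include_low_degree=True)
avg_high = average_clustering(adj, include_low_degree=False)
print(f"transitivity={trans:.3f}  mean(all nodes)={avg_all:.3f}  mean(degree>=2)={avg_high:.3f}")
```

On this toy graph the three definitions give three different values, so a table that compares a "global clustering coefficient" across tools is only meaningful once the definition behind each number is known.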