Jennifer Chang, Hui-Hsien Chou, Hyejin Cho.
Abstract
BACKGROUND: Heterogeneous biological data such as sequence matches, gene expression correlations, protein-protein interactions, and biochemical pathways can be merged and analyzed via graphs, or networks. Existing software for network analysis has limited scalability to large data sets or is only accessible to software developers as libraries. In addition, the polymorphic nature of the data sets requires a more standardized method for integration and exploration.
Keywords: 3D visualization; Biological pathway analysis; Graph mathematics; Heterogeneous data integration; Systems biology
Year: 2016 PMID: 27489569 PMCID: PMC4971676 DOI: 10.1186/s13040-016-0105-5
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Comparison of graph visualization software
| Software | Code | Graph analysis features | Visualization | Limitations |
|---|---|---|---|---|
| Cytoscape (v. 3.2.1) | Java | · Many algorithms for systems biology · Can add GO or KEGG attributes · Plug-ins available | · 2D predetermined layout · 3D predetermined layout (via plug-in) | · Can only merge 2 graphs at a time · 6 min to load a network with 4 M links, but no visualization afterward |
| Gephi (v. 0.8.2) | Java | · Intuitive graph statistics · Automated graph algorithm citation · Generalized for all types of graphs · Plug-ins available | · 2D and 3D layouts, but graphs cannot be rotated in 3D · Graph layout animation helps maintain the mental map | · Cannot display multiple graphs on one screen · Limited by JVM constraints; cannot load a network with 4 M links |
| GUESS | Java | · GYTHON, a language for graph analysis · Can map information attributes to visual attributes | · 2D layout only · Updates in response to user commands | · Cannot be run on MacOS 10.9, Windows 7, or Redhat Linux 6.0 |
| Graphviz | C | · No graph analysis capabilities | · Rich set of predetermined 2D layouts · Streamlined command line interface | · Not an interactive system · Cannot efficiently handle graphs over 100 nodes |
| Neo4j (v. 2.1.7) | Java | · Graph database system · Cypher graph query language · Queries are based on a combination of topology and attributes | · Relies on JSON for visualization · 2D layouts only · Must click a node or link to see its attributes in a separate panel | · Designed as a backend for database support rather than for visualization · Nodes are only labeled by numbers · The whole database is one huge graph |
| Tulip (v. 4.6.1) | C++ | · A set of C++ libraries for graph analysis · Can also be run as a stand-alone program · Plug-ins can be created in Python | · 2D visualization · 3D available through a plug-in · Has some 3D layout algorithms | · More useful to users who program in C++ or Python directly · More analysis than visualization features |
| NetworkX (v. 1.6.1) | Python | · Python module for graph analysis · Rich set of network algorithms | · Must export to other software or modules for visualization | · Useful only as an analysis tool |
| Mango (v. 1.10) | C++ | · Provides general graph mathematics · Heterogeneous graph analysis with ease · Takes ∼30 s to load a 4 M link network | · Interactive 3D layouts and controls · Real-time large graph visualization · User-customizable visual attributes | · Does not yet have a plug-in feature · Does not yet use GPU speedup · Limited set of preset layouts |
Benchmarks were performed on a 2010 Mac mini with 8 GB of RAM and a 2.4 GHz Intel Core 2 Duo processor, running 64-bit MacOS X 10.9. All software was run with its default configuration.
Fig. 1 Mango user interface. The main window is divided into four areas: data list (left), graph canvases (middle, 3D visualizations), Gel editor (bottom left), and Gel command console (bottom right). The graph canvas area shows the following networks. Left column: the WGCNA correlation network, the KEGG biological pathway network, and their combined network. Middle column: a crown-plot of the intersection network between the correlation and pathway networks, and the extracted hub-gene subnetwork. Right column: hub and in-betweener genes laid out in a bipartite graph, with nodes labeled by gene name.
Fig. 2 System architecture. The Mango software is made up of multiple code layers stacked to form the stand-alone program. The GPU speedup layer is not included in some Mango versions.
Fig. 3 Graph Exploration Language examples. a Graphs A and B have different node attributes. Graph C is the result of attribute merging and promotion of A and B. b Graph mathematics. Given two graphs A and B, the dotted addition A .+ B combines the nodes and links of graphs A and B. The non-dotted addition A + B combines graph A with only those links of graph B whose end nodes are already contained in graph A. Graph subtraction works similarly. The results of graph mathematics depend on operand order; attribute merging and promotion are handled automatically as described in the main text but are not shown in this figure.
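The two additions in panel b can be sketched in plain Python. This is a minimal illustration, not Mango's Gel implementation: it assumes undirected links stored as frozensets of node names and ignores node and link attributes (so there is no attribute merging or promotion). The names `Graph`, `link`, `dotted_add`, and `plain_add` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy undirected graph (hypothetical): node names plus links as 2-element frozensets."""
    nodes: set = field(default_factory=set)
    links: set = field(default_factory=set)


def link(u, v):
    """Undirected link between nodes u and v."""
    return frozenset((u, v))


def dotted_add(a, b):
    """A .+ B: every node and every link from both graphs."""
    return Graph(a.nodes | b.nodes, a.links | b.links)


def plain_add(a, b):
    """A + B: graph A plus only those links of B whose end nodes are both already in A."""
    extra = {lk for lk in b.links if lk <= a.nodes}  # subset test on the link's end nodes
    return Graph(set(a.nodes), a.links | extra)


A = Graph({"a", "b", "c"}, {link("a", "b")})
B = Graph({"b", "c", "d"}, {link("b", "c"), link("c", "d")})
print(sorted(dotted_add(A, B).nodes))                      # ['a', 'b', 'c', 'd']
print(sorted(sorted(lk) for lk in plain_add(A, B).links))  # [['a', 'b'], ['b', 'c']]
print(sorted(sorted(lk) for lk in plain_add(B, A).links))  # [['b', 'c'], ['c', 'd']]
```

In this attribute-free sketch only `+` is order-dependent: A + B keeps A's nodes and picks up the b-c link, while B + A keeps B's nodes and rejects the a-b link because node a is not in B.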
Fig. 4 KEGG Connect. (Left) The KEGG Connect dialog lists the organisms and pathways currently available in the KEGG database. Users can fetch multiple pathways individually or merge them into one network by checking the “Merge Fetched Pathways” box. (Middle) Mango preserves the x-y coordinates of the KEGG website drawing and colors nodes red (pathway maps), green (enzymes), blue (compounds), and yellow (orthologs). (Right) The corresponding KEGG website drawing of the same pathway.
Summary of 4 large heterogeneous biological networks for E. coli
| Network | Nodes | Links | Node attribute(s) | Link attribute(s) |
|---|---|---|---|---|
| corr | 4,454 | 4,408,269 | gene name | WGCNA correlation weight |
| path | 2,353 | 6,703 | gene name | none |
| go | 3,764 | 2,208,090 | gene name | count and string of shared GO terms |
| ppi | 2,042 | 3,888 | gene name | source of evidence (Y2H, LIT or both) |
Unconnected nodes and duplicate links have been removed from some of the networks. In all four networks, nodes are identified by gene names; the networks differ in their link attributes.
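A minimal sketch of this kind of clean-up, assuming links are undirected (gene_a, gene_b) pairs so that (a, b) and (b, a) count as the same link. `clean_network` is a hypothetical helper, not part of Mango, link attributes are ignored, and the gene names below are toy inputs rather than data from the study.

```python
def clean_network(nodes, links):
    """Return (nodes, links) with duplicate links and unconnected nodes removed.

    Assumes undirected links given as (gene_a, gene_b) pairs; link attributes are ignored.
    """
    seen = set()
    unique_links = []
    for a, b in links:
        key = (a, b) if a <= b else (b, a)  # canonical order for an undirected link
        if key not in seen:
            seen.add(key)
            unique_links.append(key)
    connected = {gene for pair in unique_links for gene in pair}
    kept_nodes = [n for n in nodes if n in connected]  # keep input order
    return kept_nodes, unique_links


nodes, links = clean_network(
    ["thrA", "thrB", "thrC", "lacZ"],
    [("thrA", "thrB"), ("thrB", "thrA"), ("thrB", "thrC")],
)
print(nodes)  # ['thrA', 'thrB', 'thrC']
print(links)  # [('thrA', 'thrB'), ('thrB', 'thrC')]
```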
Fig. 5 Biological network comparisons. Link intersections among the corr, path, go, and ppi networks. The intersections were worked out using Gel commands. WGCNA is the gene-to-gene correlation network corr computed from E. coli microarray data. PPI is the protein-protein interaction network ppi of E. coli. GO is the network go that connects any two E. coli genes sharing at least one gene ontology term. KEGG is the entire KEGG biological pathway network path of E. coli.
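The caption states that these intersections were computed with Gel commands; the plain-Python sketch below only illustrates the underlying idea of matching links across networks by the unordered pair of gene names they connect. Treating links as undirected and ignoring link attributes are assumptions, `link_set` and `intersections` are hypothetical names, and the toy networks are placeholders, not the study's data.

```python
from itertools import combinations


def link_set(pairs):
    """Undirected links keyed by gene names (assumption): each link is a frozenset."""
    return {frozenset(p) for p in pairs}


def intersections(networks):
    """Map every combination of two or more network names to the links they all share."""
    names = sorted(networks)
    result = {}
    for k in range(2, len(names) + 1):
        for combo in combinations(names, k):
            result[combo] = set.intersection(*(networks[n] for n in combo))
    return result


nets = {
    "corr": link_set([("a", "b"), ("b", "c"), ("c", "d")]),
    "path": link_set([("b", "a"), ("c", "d")]),
    "ppi": link_set([("a", "b")]),
}
for combo, shared in intersections(nets).items():
    print(" & ".join(combo), len(shared))
# corr & path 2
# corr & ppi 1
# path & ppi 1
# corr & path & ppi 1
```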
Benchmarking the speed of Gel mathematics on massive graphs
| Gel operation | Time per run (s) | Average (s) |
|---|---|---|
| 4 | 0.92, 0.35, 0.27, 0.60, 0.56 | 0.54 |
| 8 | 1.25, 1.15, 1.03, 1.02, 1.02 | 1.09 |
| 4 | 0.52, 0.33, 0.62, 0.33, 0.25 | 0.41 |
| 8 | 1.09, 1.28, 1.09, 1.16, 1.19 | 1.16 |
| 4 | 0.69, 0.60, 0.57, 0.31, 0.40 | 0.51 |
| 8 | 12.06, 12.09, 12.05, 12.23, 12.32 | 12.15 |
| 4 | 0.55, 0.41, 0.25, 0.26, 0.32 | 0.36 |
| 8 | 0.90, 0.85, 0.83, 0.98, 0.74 | 0.86 |
| 4 | 22.94, 23.74, 23.35, 22.98, 23.03 | 23.21 |
| 8 | 36.75, 35.33, 35.23, 35.38 | 35.67 |
|  | 7.90, 7.76, 7.85, 7.73, 7.87 | 7.82 |
|  | 0.30, 0.52, 0.45, 0.34, 0.29 | 0.38 |
The 4 M link network is the gene correlation network generated by WGCNA. The 8 K link network is the combined KEGG pathway network. Benchmarks were performed consecutively on a 2010 Mac mini with 8 GB of RAM and a 2.4 GHz Intel Core 2 Duo processor, running 64-bit MacOS X 10.10. The time to copy the networks is also listed. All operations, including the copy operation, were performed in RAM using a single thread.
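The timing protocol described above (consecutive runs of each operation in a single thread, reported individually and as an average) could be reproduced with a harness like the following. It is a sketch under assumptions: `benchmark` is a hypothetical helper, the set union merely stands in for a Gel operation, and measured times depend on the machine.

```python
import time
from statistics import mean


def benchmark(operation, *args, runs=5):
    """Run operation(*args) `runs` times back to back; return per-run times and their mean."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        operation(*args)  # result discarded; only elapsed wall-clock time is recorded
        times.append(time.perf_counter() - start)
    return times, mean(times)


# Hypothetical stand-in for a graph operation: union of two link sets.
links_a = {(i, i + 1) for i in range(100_000)}
links_b = {(i, i + 2) for i in range(100_000)}
times, avg = benchmark(lambda a, b: a | b, links_a, links_b)
print(", ".join(f"{t:.2f}" for t in times), f"{avg:.2f}")

# The "Average" column is the arithmetic mean of each row's runs, e.g. the fifth 4 M row:
print(f"{mean([22.94, 23.74, 23.35, 22.98, 23.03]):.2f}")  # 23.21
```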
Fig. 6 Gene expression combined with KEGG. A 3D KEGG network visualization comparing E. coli gene expression values obtained under a treatment condition and a control condition. In addition to coloring and resizing the genes (i.e., nodes) of the network based on their expression changes relative to the control, pathway links are highlighted in green or red depending on whether the genes they connect are up- or down-regulated. The highlighted links make it easy to see whether a whole pathway is up- or down-regulated.
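A sketch of one way to derive such a styling from two expression profiles. Several details are assumptions not stated in the caption: expression change is measured as a log2 fold change with a cut-off of 1, node size grows with the absolute change, green marks up-regulation and red down-regulation, and a link is highlighted only when both genes it connects change in the same direction. `style_network` is a hypothetical helper, and expression values must be positive.

```python
import math

# Assumed color scheme: green = up-regulated, red = down-regulated.
UP, DOWN, NEUTRAL = "green", "red", "gray"


def style_network(treatment, control, links, threshold=1.0):
    """Return per-gene node styles and per-link colors from two expression profiles."""
    node_style = {}
    for gene, value in treatment.items():
        if gene not in control:
            continue  # no control value, so no change can be computed
        lfc = math.log2(value / control[gene])  # log2 fold change relative to control
        if lfc >= threshold:
            color = UP
        elif lfc <= -threshold:
            color = DOWN
        else:
            color = NEUTRAL
        node_style[gene] = {"color": color, "size": 1.0 + abs(lfc)}
    link_color = {}
    for a, b in links:
        ca = node_style.get(a, {}).get("color", NEUTRAL)
        cb = node_style.get(b, {}).get("color", NEUTRAL)
        # Highlight a link only when both end genes move in the same direction.
        link_color[(a, b)] = ca if ca == cb else NEUTRAL
    return node_style, link_color


treatment = {"a": 8.0, "b": 4.0, "c": 1.0}
control = {"a": 2.0, "b": 1.0, "c": 4.0}
nodes, link_colors = style_network(treatment, control, [("a", "b"), ("b", "c")])
print(link_colors)  # {('a', 'b'): 'green', ('b', 'c'): 'gray'}
```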