Juli Petereit, Sebastian Smith, Frederick C Harris, Karen A Schlauch.
Abstract
BACKGROUND: Networks provide effective models for studying complex biological systems, such as gene and protein interaction networks. With the advent of new sequencing technologies, many life scientists are looking for user-friendly methods and tools to examine biological components at the whole-systems level. Gene co-expression network analysis is frequently used to associate genes with biological processes and shows great potential for yielding further insight into gene function, which has made it a standard approach in Systems Biology. The objective here is to construct biologically meaningful and statistically strong co-expression networks, to identify research-dependent subnetworks, and to present self-contained results.
Keywords: Parameter-free algorithm; R; Scale-free; Small-world; Whole omics-approach
Year: 2016 PMID: 27490697 PMCID: PMC4977474 DOI: 10.1186/s12918-016-0298-8
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Fig. 1 Sample network graph. Blue vertices (genes) are connected by an edge if a pre-defined association between the vertex pair is determined. A group of yellow vertices is highlighted; the corresponding genes have very similar expression profiles over 28 measurements
Fig. 2 petal’s workflow. Illustration of the computational pipeline implemented in R; grey rectangles indicate data; white rectangles specify code; output files are written in Courier New
Fig. 3 petal’s histogram and Q-Q plot. Presentation of the output of graphHistQQFromFile
Sample sorted measure table used to define threshold list
| Index | Gene1 | Gene2 | Measure value |
|---|---|---|---|
| 1 | | | 1 |
| 2 | | | 0.98 |
| ⋮ | ⋮ | ⋮ | ⋮ |
| | | | |
| ⋮ | ⋮ | ⋮ | ⋮ |
| 150× | | | |
| ⋮ | ⋮ | ⋮ | ⋮ |
| | | | -1 |
Sorted measure table for correlation values in the range [−1, 1]; a value of 1 represents the strongest correlation. m is the number of genes; p, q, r, s, t, u, w are values within [1, m]; t and t are the first and last threshold values of the threshold list, respectively
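The sorted measure table can be illustrated with a small sketch. This is not petal's R code: it is a plain-Python illustration that assumes Pearson correlation as the measure, and the helper names (`pearson`, `sorted_measure_table`) and the toy expression data are hypothetical. How petal derives its threshold list from this table is not shown here.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def sorted_measure_table(expr):
    """Return rows (index, gene1, gene2, value), strongest correlation first."""
    pairs = [(g1, g2, pearson(expr[g1], expr[g2]))
             for g1, g2 in combinations(sorted(expr), 2)]
    pairs.sort(key=lambda row: row[2], reverse=True)
    return [(i, g1, g2, v) for i, (g1, g2, v) in enumerate(pairs, start=1)]

# Hypothetical toy data: three genes measured under four conditions.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # falls as geneA rises
}
table = sorted_measure_table(expr)
for index, gene1, gene2, value in table:
    print(index, gene1, gene2, round(value, 2))
```

With m genes the table has m(m−1)/2 rows, so a full table for thousands of genes is large; this sketch keeps everything in memory and is only meant for small inputs.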
Sample network threshold table
| thresh | R² | slope/power | meanCC | meanPath | %used | %bigComp |
|---|---|---|---|---|---|---|
| 0.878 | 0.94 | -2.02 | 0.36 | 6.74 | 21 | 32 |
| 0.834 | 0.93 | -1.75 | 0.38 | 7.71 | 46 | 91 |
| 0.789 | 0.91 | -1.58 | 0.40 | 5.70 | 68 | 97 |
| 0.745 | 0.87 | -1.42 | 0.41 | 4.62 | 82 | 99 |
| 0.700 | 0.84 | -1.26 | 0.42 | 3.91 | 91 | 99 |
| 0.656 | 0.78 | -1.09 | 0.43 | 3.40 | 95 | 99 |
| 0.611 | 0.70 | -0.92 | 0.44 | 3.02 | 98 | 99 |
Each row represents a network model based on the threshold in Column 1. R² and slope/power are used to determine whether the model is scale-free, meanCC and meanPath are used to determine whether it is small-world, %used indicates how much of the original dataset is maintained, and %bigComp gives the percentage of vertices that are in the biggest component of the network model
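The columns of the threshold table can be approximated for a single threshold with a short sketch. It is a plain-Python illustration, not petal's implementation, and it rests on several assumptions: an edge is drawn when the measure value is at least the threshold, the scale-free check is a least-squares line through the log-log degree counts, %used is the share of all genes that keep at least one edge, and %bigComp is the share of those retained vertices in the biggest component. All function names and the toy pair list are hypothetical.

```python
from collections import deque
from math import log10

def build_graph(pairs, threshold):
    """Adjacency sets for gene pairs whose measure value meets the threshold."""
    adj = {}
    for g1, g2, value in pairs:
        if value >= threshold:
            adj.setdefault(g1, set()).add(g2)
            adj.setdefault(g2, set()).add(g1)
    return adj

def bfs(adj, source):
    """Shortest-path distances from source to every reachable vertex."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def mean_clustering(adj):
    """Average local clustering coefficient; degree < 2 counts as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
            total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def biggest_component(adj):
    """Vertex set of the largest connected component."""
    seen, best = set(), set()
    for v in adj:
        if v not in seen:
            comp = set(bfs(adj, v))
            seen |= comp
            best = max(best, comp, key=len)
    return best

def mean_path_length(adj, component):
    """Mean shortest-path length over ordered vertex pairs of one component."""
    lengths = [d for v in component for w, d in bfs(adj, v).items() if w != v]
    return sum(lengths) / len(lengths) if lengths else 0.0

def power_law_fit(adj):
    """R^2 and slope of a least-squares line through log-log degree counts."""
    counts = {}
    for nbrs in adj.values():
        counts[len(nbrs)] = counts.get(len(nbrs), 0) + 1
    xs = [log10(k) for k in counts]
    ys = [log10(c) for c in counts.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:  # degenerate case: the fit is undefined
        return None, None
    return sxy ** 2 / (sxx * syy), sxy / sxx

def network_stats(pairs, n_genes, threshold):
    """One row of a threshold table, under the assumptions stated above."""
    adj = build_graph(pairs, threshold)
    if not adj:
        return None
    big = biggest_component(adj)
    r2, slope = power_law_fit(adj)
    return {"thresh": threshold, "R2": r2, "slope": slope,
            "meanCC": mean_clustering(adj),
            "meanPath": mean_path_length(adj, big),
            "pct_used": 100.0 * len(adj) / n_genes,
            "pct_bigComp": 100.0 * len(big) / len(adj)}

# Hypothetical toy input: (gene1, gene2, measure value) for seven genes.
pairs = [("a", "b", 0.95), ("a", "c", 0.92), ("b", "c", 0.90),
         ("c", "d", 0.88), ("e", "f", 0.85), ("a", "e", 0.20),
         ("a", "g", 0.10)]
stats = network_stats(pairs, n_genes=7, threshold=0.8)
print(stats)
```

Calling `network_stats` once per candidate threshold would give a table of the same shape as the one above, which can then be scanned for models that look both scale-free and small-world.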
Sample vicinity network table
| VN | VNsize | numGoI | density |
|---|---|---|---|
| 1 | 2 | 1 | 1.00 |
| 2 | 36 | 1 | 0.53 |
| 3 | 8 | 1 | 0.82 |
| 4 | 20 | 1 | 0.52 |
| ⋮ | ⋮ | ⋮ | ⋮ |
| 52 | 24 | 2 | 0.72 |
| 53 | 63 | 2 | 0.68 |
| 54 | 17 | 3 | 0.86 |
| 55 | 10 | 4 | 0.93 |
| 56 | 27 | 5 | 0.89 |
Each row represents a particular vicinity network (VN). Column 1 shows the index of the VN, VNsize gives the number of vertices within the VN, numGoI is the number of genes of interest within the VN, and density indicates how well the VN is intra-connected
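A minimal sketch of how such a table could be computed follows. It assumes, as a simplification that may differ from petal's definition, that the vicinity network of a gene of interest consists of that gene and its direct neighbours, and that density is the number of edges inside the VN divided by the number of possible edges. The function names and the toy adjacency data are hypothetical.

```python
def vicinity_network(adj, gene):
    """Assumed definition: the gene of interest plus its direct neighbours."""
    return {gene} | adj.get(gene, set())

def density(adj, vertices):
    """Edges present among the vertices divided by edges possible."""
    n = len(vertices)
    if n < 2:
        return 0.0
    edges = sum(1 for v in vertices for w in adj.get(v, ())
                if w in vertices and v < w)
    return 2.0 * edges / (n * (n - 1))

def vn_table(adj, genes_of_interest):
    """One row per gene of interest: VN index, size, numGoI, density."""
    goi = set(genes_of_interest)
    rows = []
    for index, gene in enumerate(genes_of_interest, start=1):
        vn = vicinity_network(adj, gene)
        rows.append({"VN": index, "VNsize": len(vn),
                     "numGoI": len(vn & goi),
                     "density": round(density(adj, vn), 2)})
    return rows

# Hypothetical toy network as adjacency sets.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c"}, "e": {"f"}, "f": {"e"},
}
rows = vn_table(adj, ["c", "e"])
for row in rows:
    print(row)
```

A VN of size 2 with density 1.00, as in the first row of the table above, corresponds in this sketch to a gene of interest with a single neighbour.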
Empirical evaluation of petal’s runtime and memory requirement
| Dimension of dataset (genes × measures) | Metric | Runtime [hour] | Max memory [GB] |
|---|---|---|---|
| 5,000×7 | PE | 1.35 | 1.0 |
| 5,000×7 | SP | 2.07 | 1.0 |
| 11,342×16 | PE | 2.62 | 7.0 |
| 11,342×16 | SP | 4.42 | 7.5 |
| 15,137×12 | SP | 9.20 | 13.5 |
| 15,137×16 | SP | 9.13 | 15.0 |
| 15,137×28 | SP | 9.15 | 13.0 |
Each row is a separate run on a server with 256 GB RAM and 2.5 GHz processors, of which petal used one. Datasets of different sizes were supplied to createSWSFnetFromFile to monitor the runtime and memory usage of the function. In two runs PE was specified as the metric to demonstrate its fast computing time compared to SP: createSWSFnetFromFile("myData.txt", "PE")
NetworkStats.txt obtained from petal for the mountain pine beetle dataset
| thresh | R² | slope/power | meanCC | meanPath | %used | %bigComp |
|---|---|---|---|---|---|---|
| 0.956 | 0.84 | -1.71 | 0.44 | 6.89 | 21 | 22 |
| 0.919 | 0.90 | -1.62 | 0.37 | 11.13 | 50 | 85 |
| 0.882 | 0.89 | -1.45 | 0.38 | 7.19 | 72 | 94 |
| 0.845 | 0.86 | -1.24 | 0.40 | 5.66 | 86 | 97 |
| 0.808 | 0.82 | -1.05 | 0.42 | 4.71 | 94 | 99 |
| 0.771 | 0.77 | -0.93 | 0.44 | 4.04 | 98 | 100 |
| 0.734 | 0.71 | -0.85 | 0.47 | 3.55 | 99 | 100 |
Network parameters for each considered network model. Here, the threshold 0.808 yields the ‘best’ network
Fig. 4 Subnetwork and grouped gene expression profiles. The subnetwork represents the 28 genes of interest extracted from a genomic network of the mountain pine beetle. Purple gene expression profiles are the intersection of VN11 and VN12. Orange gene expression profiles are the intersection of VN6 and VN13