Wenbin Guo (1, 2), Cristiane P G Calixto (2), Nikoleta Tzioutziou (2), Ping Lin (3), Robbie Waugh (2, 4), John W S Brown (2, 4), Runxuan Zhang (5).
Abstract
BACKGROUND: Co-expression has been widely used to identify novel regulatory relationships from high-throughput measurements such as microarray and RNA-seq data. Evaluation studies of co-expression network analysis methods mostly focus on small or medium-sized networks of up to a few hundred nodes. For large networks, simulated expression data usually consist of hundreds or thousands of profiles with different perturbations or knock-outs. Real experiments rarely have such designs because of their cost and the amount of work required. Thus, the performance of co-expression network analysis methods remains largely uncharacterized for large networks of a few thousand nodes that are inferred from only a small number of profiles with a single perturbation, a setting that more accurately reflects normal experimental conditions.
Keywords: Gene co-expression networks; Gene regulatory networks; Network method evaluation; Partial correlation; Synthetic data
Year: 2017 PMID: 28629365 PMCID: PMC5477119 DOI: 10.1186/s12918-017-0440-2
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Fig. 1 Indirect edge and RLowPC network. a An indirect association $X_1 \rightarrow X_2$ could arise from the regulatory structure $X_1 \rightarrow X_3 \rightarrow X_2$. b RLowPC network inference. First, only the top-ranked edges of a pre-inferred partial correlation (PC) network are kept. Then, for each pair of genes, only their immediate neighbours are regressed out in the PC calculation. In this example, only the top 6 of 10 edges with the highest correlations are kept, and the PC between $X_1$ and $X_2$ is recalculated by removing the effects of the two immediate neighbouring nodes ($X_3$, $X_5$). Correlation values are represented by edge thickness.
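The two-step procedure in panel b can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the authors' implementation. The function name `rlowpc` and the generic `weights` matrix standing in for the pre-inferred PC network are hypothetical. The example also assumes that the conditioning set is the union of both genes' neighbours in the reduced network and that neighbours are regressed out by ordinary least squares with an intercept. NumPy is assumed to be available.

```python
import numpy as np


def rlowpc(expr, weights, top_k):
    """Sketch of RLowPC-style inference (assumptions noted in comments).

    expr:    samples x genes expression matrix
    weights: genes x genes symmetric matrix of pre-inferred edge weights
             (hypothetical input standing in for the pre-inferred PC network)
    top_k:   number of top-weighted edges kept in the reduced network
    Returns {(i, j): partial correlation} for the retained edges, with i < j.
    """
    n_samples, n_genes = expr.shape

    # Step 1: rank all unordered gene pairs by |weight| and keep the top_k.
    iu, ju = np.triu_indices(n_genes, k=1)
    order = np.argsort(-np.abs(weights[iu, ju]))[:top_k]
    kept = list(zip(iu[order].tolist(), ju[order].tolist()))

    # Adjacency list of the reduced network.
    neighbours = {g: set() for g in range(n_genes)}
    for i, j in kept:
        neighbours[i].add(j)
        neighbours[j].add(i)

    # Step 2: for each kept pair, regress out only the immediate neighbours.
    result = {}
    for i, j in kept:
        # Assumption: condition on neighbours of either gene, excluding the pair.
        z = sorted((neighbours[i] | neighbours[j]) - {i, j})
        xi, xj = expr[:, i], expr[:, j]
        if z:
            design = np.column_stack([np.ones(n_samples), expr[:, z]])
            xi = xi - design @ np.linalg.lstsq(design, xi, rcond=None)[0]
            xj = xj - design @ np.linalg.lstsq(design, xj, rcond=None)[0]
        # Partial correlation = Pearson correlation of the residuals.
        result[(i, j)] = float(np.corrcoef(xi, xj)[0, 1])
    return result
```

In a quick check, absolute Pearson correlations can stand in for the pre-inferred network, which is a simplification. For a simulated chain $X_0 \rightarrow X_2 \rightarrow X_1$, the indirect pair (0, 1) should then receive a partial correlation close to zero once $X_2$ is regressed out. The direct pair (0, 2) should keep a clearly non-zero value.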
Summary of the evaluated network inference methods
| Category | Methods | Cor-based | MI-based | Ref. |
|---|---|---|---|---|
| Deal with indirect edges explicitly | RLowPC | Yes | | |
| | PC | Yes | | [ |
| | PCIT | Yes | Yes | [ |
| | MRNET | | Yes | [ |
| | MRNETB | | Yes | [ |
| | ARACNE | | Yes | [ |
| Do not deal with indirect edges | Cor | Yes | | |
| | CLR | | Yes | [ |
| | Random | | | |
Nine network inference methods were compared and evaluated in this study. These include correlation-based, mutual information (MI)-based and random methods. The methods are classified into two main groups: those that deal with indirect edges explicitly and those that do not.
Source network structures and synthetic datasets
| Network name | TF-gene network | Source network | Gene No. | Edge No. | Network density |
|---|---|---|---|---|---|
| GNW100 | GNW100_1 | DREAM4 in silico size 100 | 100 | 176 | 0.0356 |
| | GNW100_2 | | 100 | 249 | 0.0503 |
| | GNW100_3 | | 100 | 195 | 0.0394 |
| | GNW100_4 | | 100 | 211 | 0.0426 |
| | GNW100_5 | | 100 | 193 | 0.0390 |
| GNW500 | GNW500_1 | | 500 | 1365 | 0.0109 |
| | GNW500_2 | | 500 | 867 | 0.0069 |
| | GNW500_3 | | 500 | 1107 | 0.0089 |
| | GNW500_4 | | 500 | 947 | 0.0076 |
| | GNW500_5 | | 500 | 1272 | 0.0102 |
| GNW1000 | GNW1000_1 | | 1000 | 2337 | 0.0047 |
| | GNW1000_2 | | 1000 | 2455 | 0.0049 |
| | GNW1000_3 | | 1000 | 2089 | 0.0042 |
| | GNW1000_4 | | 1000 | 2171 | 0.0043 |
| | GNW1000_5 | | 1000 | 2249 | 0.0045 |
| GNW2000 | GNW2000_1 | Yeast | 2000 | 4738 | 0.0024 |
| | GNW2000_2 | | 2000 | 4467 | 0.0022 |
| | GNW2000_3 | | 2000 | 5055 | 0.0025 |
| | GNW2000_4 | | 2000 | 5283 | 0.0026 |
| | GNW2000_5 | | 2000 | 4817 | 0.0024 |
| GNW3000 | GNW3000_1 | Yeast | 3000 | 7515 | 0.0017 |
| | GNW3000_2 | | 3000 | 7998 | 0.0018 |
| | GNW3000_3 | | 3000 | 7626 | 0.0017 |
| | GNW3000_4 | | 3000 | 8075 | 0.0018 |
| | GNW3000_5 | | 3000 | 7333 | 0.0016 |

Data generator (all datasets): The TF-gene reference networks were subsets of source networks in GNW. In each dataset, one third of the genes were randomly selected and perturbed. Each experiment was sampled at 21 time points. Three replicates were generated by adding different amounts of noise, simulated by GNW. All parameters were set to GNW defaults.
Data type (all datasets): time-series data with multifactorial perturbation [
Directed network structures were generated from source networks provided by GeneNetWeaver (GNW). The table lists the name, gene number and edge number of each structure. Network density is defined as the number of true edges divided by the number of all possible gene pairs, N(N-1)/2 for N genes. The network structures were used with GNW to simulate the time-series datasets.
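The densities in the table are consistent with a denominator of unordered gene pairs, N(N-1)/2, even though the source networks are directed. The short check below illustrates this by recomputing the first network of each size from its gene and edge numbers. The function name `network_density` is only for illustration.

```python
def network_density(n_genes, n_edges):
    """True edges divided by all possible unordered gene pairs, N(N-1)/2."""
    return n_edges / (n_genes * (n_genes - 1) / 2)


# (gene number, edge number, reported density) copied from the table above.
rows = {
    "GNW100_1": (100, 176, 0.0356),
    "GNW500_1": (500, 1365, 0.0109),
    "GNW1000_1": (1000, 2337, 0.0047),
    "GNW2000_1": (2000, 4738, 0.0024),
    "GNW3000_1": (3000, 7515, 0.0017),
}

for name, (n, e, reported) in rows.items():
    print(name, round(network_density(n, e), 4), reported)
```

Dividing by the N(N-1) ordered pairs instead would halve every value and no longer match the table.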
Fig. 2 Comparison of pAUPR values across methods and network structures. Each bar represents the mean pAUPR (partial area under the precision-recall curve), computed from the top 1000 edge predictions. Error bars represent standard errors. Differences in pAUPR between RLowPC and each of the other eight methods were tested with pairwise Student's t-tests. P-values are shown above the bars when they are less than 0.05.
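A partial AUPR restricted to the top-ranked predictions can be sketched as follows. The sketch treats edges as unordered gene pairs and integrates step-wise by summing precision times recall increment, which is one common convention. The paper's exact integration rule and any normalisation of the partial area are not stated here and may differ, for example if trapezoidal integration was used. The function name `partial_aupr` is hypothetical.

```python
def partial_aupr(ranked_edges, true_edges, top_k):
    """Area under the precision-recall curve over the top_k ranked predictions.

    ranked_edges: predicted (gene_a, gene_b) pairs, strongest first
    true_edges:   reference (gene_a, gene_b) pairs; direction is ignored
    Uses step-wise integration: sum of precision * increase in recall.
    """
    truth = {frozenset(e) for e in true_edges}
    tp = 0
    prev_recall = 0.0
    area = 0.0
    for rank, edge in enumerate(ranked_edges[:top_k], start=1):
        if frozenset(edge) in truth:
            tp += 1
        precision = tp / rank
        recall = tp / len(truth)
        area += precision * (recall - prev_recall)
        prev_recall = recall
    return area


predictions = [("a", "b"), ("x", "y"), ("c", "d")]
reference = [("b", "a"), ("c", "d")]
print(partial_aupr(predictions, reference, top_k=3))
```

For a study like this one, `top_k` would be set to 1000 and the resulting values averaged across the five networks of each size.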
Fig. 3 Precision within groups of the top 1000 predicted edges. The top 1000 predicted edges are divided into three rank groups: 1–100, 101–500 and 501–1000. Each bin shows the distribution of precision for the corresponding method, rank group and network structure.
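Precision within each rank group can be computed as the fraction of true edges among the predictions at those ranks. The sketch below assumes undirected matching of edges. It uses 1-based, inclusive rank windows, with the three groups from the caption as defaults. The function name is hypothetical.

```python
def precision_by_rank_group(ranked_edges, true_edges,
                            groups=((1, 100), (101, 500), (501, 1000))):
    """Precision of predictions falling in each 1-based, inclusive rank window."""
    truth = {frozenset(e) for e in true_edges}
    hits = [frozenset(e) in truth for e in ranked_edges]
    out = {}
    for start, end in groups:
        window = hits[start - 1:end]
        # True positives in the window divided by the number of predictions in it.
        out[(start, end)] = sum(window) / len(window) if window else float("nan")
    return out


ranked = [("a", "b"), ("c", "d"), ("x", "y"), ("e", "f")]
truth = [("a", "b"), ("d", "c")]
print(precision_by_rank_group(ranked, truth, groups=((1, 2), (3, 4))))
```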
Fig. 4 Evaluation of network analysis methods within WGCNA co-expression modules on GNW3000 networks. a Bar plots of average pAUPR for the different methods. Error bars represent standard errors of the pAUPR values across the top 1000 predictions. A Student's t-test was used to assess the significance of differences in pAUPR between RLowPC and each of the other eight methods. P-values are shown above the bars when they are less than 0. b Box plots of precision in different groups of the top 1000 edge predictions. The red and blue dashed lines show the mean precision within WGCNA modules (0.0057) and before WGCNA clustering (0.0031), respectively.
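One plausible way to evaluate "within modules" is to keep only candidate gene pairs whose two genes were assigned to the same WGCNA module. The sketch below is an assumption about the procedure, not a description of the authors' pipeline. The module labels are a hypothetical input, such as WGCNA colour labels, and it treats every label, including any "grey" unassigned module, as an ordinary module. The function names are also hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def within_module_pairs(module_of):
    """All unordered gene pairs whose two genes share a module label.

    module_of: dict mapping gene -> module label (hypothetical input)
    """
    members = defaultdict(list)
    for gene, module in module_of.items():
        members[module].append(gene)
    pairs = set()
    for genes in members.values():
        pairs.update(frozenset(p) for p in combinations(sorted(genes), 2))
    return pairs


def restrict_to_modules(ranked_edges, module_of):
    """Keep ranked predictions whose endpoints lie in the same module, preserving order."""
    allowed = within_module_pairs(module_of)
    return [e for e in ranked_edges if frozenset(e) in allowed]


modules = {"g1": "blue", "g2": "blue", "g5": "blue", "g3": "turquoise", "g4": "turquoise"}
print(restrict_to_modules([("g1", "g3"), ("g2", "g1"), ("g4", "g3")], modules))
```

Restricting to within-module pairs shrinks the candidate space from N(N-1)/2 pairs to the sum of m(m-1)/2 over module sizes m. This reduction is one reason precision baselines can differ before and after clustering.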
Fig. 5 Other factors that influence network inference precision. a Box plots of the precision of the PC and RLowPC methods on datasets with 1, 2, 3, 5, 10 and 20 experiments with different perturbations. b Relationship between network density and the average precision of all network inference methods used in this study. The network names shown on the plot are listed in Table 2.
Average computational time of RLowPC for different sizes of the reduction space (number of top-weighted edges kept)
| Top-weighted edges | 1500 | 2000 | 3000 | 5000 | 8000 | 10,000 | 50,000 | 100,000 |
|---|---|---|---|---|---|---|---|---|
| Time | 4.71 s | 6.69 s | 11.42 s | 22.62 s | 42.00 s | 54.39 s | 12.97 min | 53.09 min |
Computational times were measured on a Dell PC running 64-bit Windows 7 with 16.0 GB RAM and an Intel(R) Core(TM) i7-4790 CPU @ 3.60 GHz.
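Converting the timings to a common unit and dividing by the number of kept edges makes the scaling easier to read. The cost per edge rises from about 3 ms at 1500 edges to about 32 ms at 100,000 edges. A plausible explanation is that larger reduction spaces give each pair more neighbours to regress out. This is an interpretation, and the table itself does not state it.

```python
# Timings from the table, in seconds (the last two columns were reported in minutes).
timings_s = {
    1500: 4.71, 2000: 6.69, 3000: 11.42, 5000: 22.62,
    8000: 42.00, 10_000: 54.39, 50_000: 12.97 * 60, 100_000: 53.09 * 60,
}

# Average time spent per kept edge, in milliseconds.
per_edge_ms = {k: 1000 * t / k for k, t in timings_s.items()}

for k, ms in per_edge_ms.items():
    print(f"{k:>7} edges: {ms:.2f} ms per edge")
```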