Saurabh Vashishtha, Gordon Broderick, Travis J A Craddock, Mary Ann Fletcher, Nancy G Klimas.
Abstract
There is a growing appreciation for the network biology that regulates the coordinated expression of molecular and cellular markers; however, questions persist regarding the identifiability of these networks. Here we explore some of the issues relevant to recovering directed regulatory networks from time course data collected under experimental constraints typical of in vivo studies. NetSim simulations of sparsely connected biological networks were used to evaluate two simple feature selection techniques used in the construction of linear Ordinary Differential Equation (ODE) models, namely truncation of terms versus latent vector projection. Performance was compared with ODE-based Time Series Network Identification (TSNI) integral and the information-theoretic Time-Delay ARACNE (TD-ARACNE). Projection-based techniques and TSNI integral outperformed truncation-based selection and TD-ARACNE on aggregate networks with edge densities of 10-30%, i.e. transcription factor, protein-protein cliques and immune signaling networks. All were more robust to noise than truncation-based feature selection. Performance was comparable on the in silico 10-node DREAM 3 network, a 5-node yeast synthetic network designed for In vivo Reverse-engineering and Modeling Assessment (IRMA) and a 9-node human HeLa cell cycle network of similar size and edge density. Performance was more sensitive to the number of time courses than to sample frequency and extrapolated better to larger networks by grouping experiments. In all cases performance declined rapidly in larger networks with lower edge density. The limited recovery and high false positive rates obtained overall bring into question our ability to generate informative time course data rather than the design of any particular reverse engineering algorithm.
Year: 2015 PMID: 25984725 PMCID: PMC4435750 DOI: 10.1371/journal.pone.0127364
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Impact of sample size, experimental noise and algorithm selection on network recovery.
| Effect | Sum Sq. | d.f. | Mean Sq. | F | Null p |
|---|---|---|---|---|---|
| | | | | | |
| Time points | 0.0224 | 2 | 0.0112 | 2.66 | 0.07 |
| Method | 0.0772 | 4 | 0.0193 | 4.57 | |
| Method x Time points | 0.0313 | 8 | 0.0039 | 0.93 | 0.49 |
| | | | | | |
| Time points | 0.0046 | 2 | 0.0023 | 0.377 | 0.69 |
| Method | 0.1986 | 4 | 0.0497 | 8.055 | |
| Method x Time points | 0.0494 | 8 | 0.0062 | 1.001 | 0.44 |
Results of a two-way ANOVA for the F score obtained by applying the 5 reverse engineering methods to single time course simulations of 20 different sparse and modular 10-node biological networks sampled at 10, 25 and 50 time points, both with and without 20% experimental noise (S1 Fig)
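The arithmetic behind the table columns can be checked with a minimal sketch. It assumes the standard ANOVA relations: the fourth column is each effect's sum of squares divided by its degrees of freedom, and F is that value divided by the error mean square. The function names are illustrative and not from the paper, and the error mean square is back-calculated here because the table does not list it.

```python
def mean_square(sum_sq: float, df: int) -> float:
    """Sum of squares divided by degrees of freedom."""
    return sum_sq / df


def implied_error_mean_square(sum_sq: float, df: int, f_stat: float) -> float:
    """Error mean square implied by F = MS_effect / MS_error (back-calculated)."""
    return mean_square(sum_sq, df) / f_stat


# First three data rows of the table: (Sum Sq., d.f., F)
rows = {
    "Time points": (0.0224, 2, 2.66),
    "Method": (0.0772, 4, 4.57),
    "Method x Time points": (0.0313, 8, 0.93),
}

for effect, (ss, df, f_stat) in rows.items():
    ms = mean_square(ss, df)
    err = implied_error_mean_square(ss, df, f_stat)
    print(f"{effect}: mean square = {ms:.4f}, implied error mean square = {err:.4f}")
```

All three rows imply an error mean square of roughly 0.0042, which is what one would expect if they share a single error term. Null p values cannot be recovered this way because the error degrees of freedom are not reported in the table.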
Impact of network edge density and algorithm selection on network recovery.
| Effect | Sum Sq. | d.f. | Mean Sq. | F | Null p |
|---|---|---|---|---|---|
| Method | 0.662 | 4 | 0.165 | 61.14 | |
| Node degree/Edge density | 9.312 | 5 | 1.862 | 688.44 | |
| Method x Node degree | 0.598 | 20 | 0.029 | 11.06 | |
Two-way ANOVA of F score values corresponding to the recovery of random reference biological networks composed of 5, 10, 15, 20, 30 and 50 nodes with 40, 21, 11, 8, 5 and 3% edge densities respectively, all with similar properties. Each network was used to generate 20 simulated time course experiments, sampled at 50 time points, where 20% Gaussian noise was added to mimic experimental noise (S2 Fig, S2 Table).
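To see why edge density falls as the networks grow, the densities in the caption can be converted into approximate edge counts. This is a rough sketch under an assumption that is not stated in the source: density is taken as the number of directed edges divided by all n × n possible edges, self-loops included. The function name `implied_edges` is hypothetical.

```python
def implied_edges(n_nodes: int, density: float, allow_self_loops: bool = True) -> float:
    """Edge count implied by a density, under the assumed definition of density."""
    if allow_self_loops:
        possible = n_nodes * n_nodes
    else:
        possible = n_nodes * (n_nodes - 1)
    return density * possible


# Node counts and edge densities as listed in the caption
sizes = [5, 10, 15, 20, 30, 50]
densities = [0.40, 0.21, 0.11, 0.08, 0.05, 0.03]

for n, d in zip(sizes, densities):
    edges = implied_edges(n, d)
    print(f"{n:>2} nodes: ~{edges:.1f} edges, ~{edges / n:.2f} edges per node")
```

Under this assumed definition the number of edges per node stays between about 1.5 and 2.1 across all six sizes, so the falling density reflects a roughly constant node degree spread over a quadratically growing number of possible edges.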
Fig 1. Recovery of an example 10-node network.
(A) Median inference by TD-ARACNE using a single time course, (B) best possible inference by TD-ARACNE using a group of 15 simulated time courses for the same network, (C) median inference by broken stick using a single time course and (D) best possible inference by broken stick using a group of 15 simulated time courses for the same network.
Fig 2. Median performance calculated across subsets from 10 different 10-node simulated networks recovered using (A) Broken stick, (B) Bartlett’s and (C) TSNI integral methods applied to groups of time courses.
Each network was used to generate groups of 1, 5, 10, 15 and 20 simulated time courses each sampled at 50 time points. All simulations included 20% experimental noise.
Impact of time course aggregation into subject groups.
| Effect | Sum Sq. | d.f. | Mean Sq. | F | Null p |
|---|---|---|---|---|---|
| Method | 0.0726 | 2 | 0.0363 | 4.96 | |
| Group size | 0.3306 | 4 | 0.083 | 11.3 | |
| Method x Group size | 0.0245 | 8 | 0.003 | 0.42 | 0.91 |
Two-way ANOVA of F score values corresponding to the recovery of 10 different networks of 10 nodes with similar properties of sparseness and modularity and with edge densities similar to those of biological networks (10–20%) using groups of 1, 5, 10, 15 and 20 time courses. Each time course was simulated at 50 time points and 20% Gaussian noise was added to mimic experimental noise.
Fig 3. Median F scores calculated on 10 different simulated 10-node networks recovered using (A) Broken stick, (B) Bartlett’s and (C) TSNI integral methods on groups of expression profiles.
Each network was used to generate groups of 1, 5, 10, 15 and 20 simulated time courses each sampled at 5 (blue), 10 (red) and 50 (black) time points. All simulations included 20% experimental noise.
Recovering a 10-node network from a DREAM-3 data set.
| Method | Predicted | Correct | PPV | Recall | F score |
|---|---|---|---|---|---|
| E.coli1 (11 interactions) | | | | | |
| Yip et al. noise model | 11 | 7 | 0.64 | 0.64 | 0.64 |
| Yip et al. linear/nonlinear model | 6 | 1 | 0.16 | 0.09 | 0.12 |
| Broken stick | 70 | 9 | 0.13 | 0.82 | 0.22 |
| Bartlett's method | 77 | 10 | 0.13 | 0.91 | 0.23 |
| TSNI integral | 34 | 4 | 0.11 | 0.36 | 0.17 |
| | | | | | |
| Yip et al. noise model | 16 | 12 | 0.75 | 0.8 | 0.77 |
| Yip et al. linear/nonlinear model | 5 | 1 | 0.2 | 0.07 | 0.1 |
| Broken stick | 73 | 12 | 0.16 | 0.8 | 0.27 |
| Bartlett's method | 82 | 14 | 0.17 | 0.9 | 0.28 |
| TSNI integral | 31 | 5 | 0.16 | 0.33 | 0.22 |
| | | | | | |
| Yip et al. noise model | 11 | 9 | 0.82 | 0.9 | 0.86 |
| Yip et al. linear/nonlinear model | 5 | 0 | 0 | 0 | 0 |
| Broken stick | 72 | 8 | 0.11 | 0.8 | 0.19 |
| Bartlett's method | 83 | 10 | 0.12 | 1 | 0.22 |
| TSNI integral | 26 | 3 | 0.11 | 0.25 | 0.15 |
| | | | | | |
| Yip et al. noise model | 13 | 9 | 0.69 | 0.36 | 0.47 |
| Yip et al. linear/nonlinear model | 5 | 1 | 0.2 | 0.04 | 0.07 |
| Broken stick | 71 | 19 | 0.26 | 0.74 | 0.38 |
| Bartlett's method | 83 | 23 | 0.28 | 0.9 | 0.42 |
| TSNI integral | 29 | 9 | 0.33 | 0.36 | 0.34 |
| | | | | | |
| Yip et al. noise model | 12 | 8 | 0.67 | 0.36 | 0.47 |
| Yip et al. linear/nonlinear model | 5 | 4 | 0.8 | 0.18 | 0.29 |
| Broken stick | 70 | 14 | 0.2 | 0.61 | 0.3 |
| Bartlett's method | 80 | 18 | 0.22 | 0.8 | 0.34 |
| TSNI integral | 31 | 7 | 0.23 | 0.32 | 0.27 |
Comparison of the performance obtained in inferring a 10-node network using the generic methods presented here versus the best performing methods in the DREAM 3 sub-challenge, namely the basic noise model and the combined linear/nonlinear model.
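The PPV, Recall and F score columns follow from the Predicted and Correct counts together with the number of true interactions in the reference network. The sketch below uses the standard definitions (PPV = correct / predicted, recall = correct / true edges, F = harmonic mean of the two); the function name `edge_recovery_scores` is illustrative and not taken from the paper.

```python
from typing import Tuple


def edge_recovery_scores(predicted: int, correct: int, true_edges: int) -> Tuple[float, float, float]:
    """Return (PPV, recall, F score) for a recovered edge list.

    predicted  : number of edges the method reported
    correct    : number of reported edges present in the reference network
    true_edges : number of edges in the reference network
    """
    ppv = correct / predicted if predicted else 0.0
    recall = correct / true_edges if true_edges else 0.0
    # Harmonic mean of PPV and recall, written as 2*TP / (predicted + true)
    denom = predicted + true_edges
    f_score = 2 * correct / denom if denom else 0.0
    return ppv, recall, f_score


# Broken stick on E.coli1: 70 predicted, 9 correct, 11 true interactions
ppv, recall, f_score = edge_recovery_scores(predicted=70, correct=9, true_edges=11)
print(f"PPV={ppv:.2f} Recall={recall:.2f} F={f_score:.2f}")
```

For the Broken stick row on E.coli1 this reproduces the tabulated 0.13, 0.82 and 0.22. It also makes the trade-off in the table concrete: predicting 70 edges out of a possible 10-node network recovers most true interactions but drives PPV, and with it the F score, down.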
Reconstruction of 5-node synthetic Yeast IRMA network.
| Method | Switch on: PPV | Switch on: Recall | Switch on: F score | Switch off: PPV | Switch off: Recall | Switch off: F score |
|---|---|---|---|---|---|---|
| Broken stick | 0.40(0.5) | 0.67(0.67) | 0.50(0.57) | 0.67(0.67) | 0.33(0.33) | 0.44(0.44) |
| Bartlett | 0.60(0.75) | 0.50(0.5) | 0.60(0.55) | 0.56(0.56) | 0.83(0.83) | 0.67(0.67) |
| TSNI integral | 0.40(0.63) | 0.83(0.83) | 0.53(0.71) | 0.29(0.44) | 0.67(0.67) | 0.40(0.53) |
| TSNI in Cantone et al. | - (1.00) | - (0.67) | - (0.80) | - (0.75) | - (0.50) | - (0.60) |
Broken stick, Bartlett’s and TSNI integral were evaluated on the dynamic data of the 5-node synthetic yeast IRMA network against the TSNI performance reported in Cantone et al. 2009 [62]. Values in parentheses show the performance when self-regulation is not considered in the inference, as in the DREAM 3 challenge.
Fig 4. Reconstruction of human HeLa cell cycle network.
Directed graphs recovered using (i) Broken stick, (ii) Bartlett’s feature selection and (iii) TSNI integral methods applied to the BIOGRID reference network reported in Sambo et al. 2008 and Lozano et al. 2009 (A and B). Solid lines represent correctly inferred interactions (true positives) whereas dashed lines represent incorrectly inferred connections (false positives).
Reconstruction of 9-gene BIOGRID network related to Human HeLa cell cycle.
| Network | PPV | Recall | F score |
|---|---|---|---|
| | | | |
| | 0.36 | 0.44 | 0.40 |
| Broken stick method | 0.36 | 0.44 | 0.40 |
| Bartlett's method | 0.50 | 0.44 | 0.47 |
| TSNI Integral | 0.10 | 0.44 | 0.17 |
| | | | |
| | 0.50 | 0.72 | 0.59 |
| Broken stick method | 0.63 | 0.50 | 0.56 |
| Bartlett's method | 0.65 | 0.55 | 0.60 |
| TSNI Integral | 0.22 | 0.30 | 0.26 |
Recovery of the 9-gene BIOGRID network involved in the human HeLa cell cycle by applying the broken stick and Bartlett’s feature selection methods, compared to TSNI integral. The two versions of the BIOGRID network were used to assess the methods proposed in Sambo et al. 2008 and Lozano et al. 2009 respectively. The reported performance of these methods is also included.