Mukesh Bansal, Vincenzo Belcastro, Alberto Ambesi-Impiombato, Diego di Bernardo.
Abstract
Inferring, or 'reverse-engineering', gene networks can be defined as the process of identifying gene interactions from experimental data through computational analysis. Gene expression data from microarrays are typically used for this purpose. Here we compared different reverse-engineering algorithms for which ready-to-use software was available and that had been tested on experimental data sets. We show that reverse-engineering algorithms are indeed able to correctly infer regulatory interactions among genes, at least when one performs perturbation experiments complying with the algorithm requirements. These algorithms are superior to classic clustering algorithms for the purpose of finding regulatory interactions among genes, and, although further improvements are needed, have reached a discreet performance for being practically useful.
Year: 2007 PMID: 17299415 PMCID: PMC1828749 DOI: 10.1038/msb4100120
Source DB: PubMed Journal: Mol Syst Biol ISSN: 1744-4292 Impact factor: 11.429
Figure 1. Flowchart for choosing the most suitable network inference algorithm according to the problem to be addressed. (*): check for independence of time points (see text for details); (BN): Bayesian networks; (DBN): dynamic Bayesian networks.
Features of the network inference algorithms reviewed in this tutorial
| Software | Download | Data type | Command line | Notes |
|---|---|---|---|---|
| BANJO | | S/D | java -jar banjo.jar settingsFile=mysettings.txt | Good performance if a large data set is available (M≫N) |
| ARACNE | | S/D | aracne -i inputfile -o outputfile [options] | Good performance even for M⩽N. Not useful for short time series |
| NIR/MNI | | S | MATLAB | NIR: good performance but requires knowledge of the perturbed genes. MNI: good performance for inferring targets of a perturbation |
| Hierarchical clustering | | S/D | GUI | Useful for finding coexpressed genes, but not for network inference |
Abbreviations: D: dynamic time-series; N: number of genes; M: number of experiments; S: steady-state.
a: Predicts only targets of a perturbation (see text for details).
Figure 2. Bayesian networks: A is conditionally independent of D and E given B and C. Information-theoretic networks: mutual information is 0 for statistically independent variables, and the Data Processing Inequality helps prune the network. Ordinary differential equations: a deterministic approach, where the rate of transcription of gene A is a function (f) of the level of its direct causal regulators.
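As a rough illustration of the information-theoretic idea in the Figure 2 caption, the sketch below computes mutual information for already-discretized expression values and then applies the Data Processing Inequality to every fully connected gene triplet, dropping its weakest edge. It is not ARACNE's implementation: the plug-in estimator, the requirement that inputs are discrete labels, and the names `mutual_information` and `dpi_prune` are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations
from math import log

def mutual_information(x, y):
    """Plug-in mutual information (in nats) of two equally long discrete sequences."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    # sum over observed pairs of p(a,b) * log(p(a,b) / (p(a) * p(b)))
    return sum(c / n * log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())

def dpi_prune(mi):
    """mi maps frozenset({gene_a, gene_b}) to mutual information.

    Returns the edges that survive after the weakest edge of every
    fully connected triplet has been marked for removal.
    """
    genes = sorted({g for edge in mi for g in edge})
    drop = set()
    for triplet in combinations(genes, 3):
        edges = [frozenset(pair) for pair in combinations(triplet, 2)]
        if not all(mi.get(e, 0) > 0 for e in edges):
            continue  # the triplet is not fully connected
        weakest = min(edges, key=mi.get)
        if mi[weakest] < min(mi[e] for e in edges if e != weakest):
            drop.add(weakest)  # interpreted as an indirect interaction
    return {e for e, v in mi.items() if v > 0 and e not in drop}

# Hypothetical chain A -> B -> C: the A-C edge is the weakest of the triplet.
mi = {frozenset("AB"): 0.8, frozenset("BC"): 0.7, frozenset("AC"): 0.3}
network = dpi_prune(mi)
```

On real, continuous expression data the values would first have to be binned, and finite samples give estimates slightly above 0 even for independent genes, so a significance threshold on mutual information is normally applied before pruning.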
Results of the application of network inference algorithms on the simulated data set
| Data set | ARACNE PPV | ARACNE Se | BANJO PPV | BANJO Se | NIR PPV | NIR Se | Clustering PPV | Clustering Se | Random PPV |
|---|---|---|---|---|---|---|---|---|---|
| 10 × 10 | 0.37 | 0.40 | 0.41 | 0.49 | 0.34 | 0.71 | 0.40 | 0.38 | 0.36 |
| 0.25 | 0.17 | 0.18 | 0.45 | 0.20 | |||||
| 0.16 | 0.05 | 0.09 | 0.22 | 0.10 | |||||
| 10 × 100 | 0.37 | 0.44 | 0.36 | 0.70 | 0.36 | 0.36 | 0.36 | ||
| 0.20 | 0.46 | 0.20 | |||||||
| 0.09 | 0.21 | 0.10 | |||||||
| 100 × 10 | 0.19 | 0.11 | 0.19 | 0.04 | 0.18 | 0.09 | 0.19 | 0.11 | 0.19 |
| 0.10 | 0.02 | 0.10 | 0.05 | 0.10 | |||||
| 0.06 | 0.00 | 0.05 | 0.02 | 0.05 | |||||
| 100 × 100 | 0.19 | 0.17 | 0.19 | 0.19 | 0.19 | 0.11 | 0.19 | ||
| 0.10 | 0.10 | 0.10 | |||||||
| 0.05 | 0.05 | 0.05 | |||||||
| 100 × 1000 | 0.19 | 0.26 | 0.20 | 0.19 | 0.19 | 0.11 | 0.19 | ||
| 0.10 | 0.09 | 0.10 | |||||||
| 0.05 | 0.05 | 0.05 | |||||||
| 1000 × 1000 | 0.02 | 0.10 | — | — | — | — | 0.02 | 0.01 | 0.02 |
| 10 × 10 | 0.41 | 0.50 | 0.39 | 0.38 | 0.36 | ||||
| 0.25 | 0.18 | 0.20 | |||||||
| 0.15 | 0.05 | 0.10 | |||||||
| 100 × 100 | 0.19 | ||||||||
| 0.10 | |||||||||
| 0.05 | |||||||||
| 1000 × 1000 | — | — | — | — | 0.02 | ||||
| 10 × 10 | 0 | 0.39 | 0.36 | 0.35 | — | — | 0.35 | 0.33 | 0.36 |
| 0.22 | 0.21 | 0.20 | |||||||
| 0.00 | 0.00 | 0.10 | |||||||
| 10 × 100 | 0.35 | 0.43 | 0.36 | 0.29 | — | — | 0.35 | 0.33 | 0.36 |
| 0.21 | 0.16 | 0.20 | |||||||
| 0.25 | 0.00 | 0.10 | |||||||
| 100 × 10 | 0.19 | 0.10 | 0.18 | 0.08 | — | — | 0.19 | 0.12 | 0.19 |
| 0.10 | 0.04 | 0.10 | |||||||
| 0.06 | 0.00 | 0.05 | |||||||
| 100 × 100 | 0.19 | 0.15 | 0.19 | 0.05 | — | — | 0.19 | 0.11 | 0.19 |
| 0.10 | 0.02 | 0.10 | |||||||
| 0.04 | 0.00 | 0.05 | |||||||
| 100 × 1000 | 0.19 | 0.19 | 0.19 | 0.04 | — | — | 0.19 | 0.11 | 0.19 |
| 0.10 | 0.02 | 0.10 | |||||||
| 0.05 | 0.00 | 0.05 | |||||||
| 1000 × 1000 | 0.02 | 0.10 | — | — | — | — | 0.02 | 0.01 | 0.02 |
Abbreviations: PPV: positive predictive value; Se: sensitivity.
In bold are the algorithms that perform significantly better than random, using as a random model a Binomial distribution.
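For readers unfamiliar with the two scores, the following minimal sketch uses the standard definitions, PPV = TP / (TP + FP) and Se = TP / (TP + FN), where TP, FP and FN count true-positive, false-positive and false-negative edges. The function name `ppv_and_sensitivity`, the edge-set representation, and the `directed` flag are assumptions for illustration and are not taken from the paper.

```python
def ppv_and_sensitivity(predicted, true, directed=False):
    """Return (PPV, Se) for a predicted edge list against a reference edge list.

    Edges are (regulator, target) pairs; when directed is False the
    orientation of each edge is ignored.
    """
    def normalize(edges):
        return {tuple(e) if directed else tuple(sorted(e)) for e in edges}

    pred, ref = normalize(predicted), normalize(true)
    tp = len(pred & ref)                              # correctly predicted edges
    ppv = tp / len(pred) if pred else float("nan")    # TP / (TP + FP)
    se = tp / len(ref) if ref else float("nan")       # TP / (TP + FN)
    return ppv, se

# Hypothetical four-gene example
true_edges = [("A", "B"), ("B", "C"), ("C", "D")]
predicted_edges = [("B", "A"), ("A", "C")]
undirected_scores = ppv_and_sensitivity(predicted_edges, true_edges)
directed_scores = ppv_and_sensitivity(predicted_edges, true_edges, directed=True)
```

In this toy case the edge B→A counts as correct only when direction is ignored, which is why scores for undirected and directed networks can differ for the same prediction.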
Experimental data sets used as examples
| ID | Cell/organism | Type | Samples | Genes | Reference | True network |
|---|---|---|---|---|---|---|
| A | Human B cells | S | 254 | 7907 | | 26 Myc targets |
| B | | S | 300 | 6312 | | 844 TF–gene interactions |
| C | Human B cells | S | 254 | 23 | | 11 Myc targets + 11 non-targets |
| D | | S | 300 | 90 | | Subset of TF–gene interactions |
| E | | S | 9 | 9 | | Nine-gene network |
| F | | T | 6 | 9 | | Nine-gene network |
Abbreviations: S: steady-state; T: time-series.
Results of the application of network inference algorithms on the experimental data sets
| Data sets | ARACNE PPV | ARACNE Se | BANJO PPV | BANJO Se | NIR PPV | NIR Se | Clustering PPV | Clustering Se | Random PPV |
|---|---|---|---|---|---|---|---|---|---|
| A | — | — | — | — | 0.02 | 1.00 | 0.00 | ||
| B | 0.00 | 0.01 | — | — | — | — | 0.00 | 0.21 | 0.00 |
| C | 0.60 | 0.27 | — | — | 0.45 | 0.91 | 0.48 | ||
| D | — | — | |||||||
| E | 0.69 | 0.34 | 0.78 | 0.44 | 0.8 | 0.63 | 0.71 | ||
| 0.67 | 0.24 | 0.63 | |||||||
| 0.50 | 0.02 | 0.32 | |||||||
| F | 0.75 | 0.37 | 0.73 | 0.69 | — | — | 0.71 | ||
| 0.61 | 0.39 | 0.63 | |||||||
| 0.00 | 0.00 | 0.32 | |||||||
Abbreviations: PPV: positive predictive value; Se: sensitivity.
In bold are the algorithms that perform significantly better than random (P-value⩽0.1) using as a random model a Binomial distribution.
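One way such a binomial comparison could be set up is sketched below: if an algorithm predicts a given number of edges and each would be correct with probability equal to the random PPV, the P-value is the chance of obtaining at least the observed number of correct edges. The source states only that a binomial distribution served as the random model, so this one-sided construction and the name `binomial_p_value` are assumptions.

```python
from math import comb

def binomial_p_value(true_positives, n_predicted, p_random):
    """P(X >= true_positives) for X ~ Binomial(n_predicted, p_random)."""
    return sum(
        comb(n_predicted, k) * p_random**k * (1 - p_random) ** (n_predicted - k)
        for k in range(true_positives, n_predicted + 1)
    )

# Hypothetical case: 8 of 10 predicted edges are correct, random PPV is 0.36.
p_value = binomial_p_value(8, 10, 0.36)
significant = p_value <= 0.1  # threshold quoted in the table note
```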