Séverine Affeldt, Louis Verny, Hervé Isambert.
Abstract
BACKGROUND: The reconstruction of reliable graphical models from observational data is important in bioinformatics and other computational fields applying network reconstruction methods to large, yet finite datasets. The main network reconstruction approaches are either based on Bayesian scores, which enable the ranking of alternative Bayesian networks, or rely on the identification of structural independencies, which correspond to missing edges in the underlying network. Bayesian inference methods typically require heuristic search strategies, such as hill-climbing algorithms, to sample the super-exponential space of possible networks. By contrast, constraint-based methods, such as the PC and IC algorithms, are expected to run in polynomial time on sparse underlying graphs, provided that a correct list of conditional independencies is available. Yet, in practice, conditional independencies need to be ascertained from the available observational data, based on adjustable statistical significance levels, and are not robust to sampling noise from finite datasets.
Year: 2016 PMID: 26823190 PMCID: PMC4959376 DOI: 10.1186/s12859-015-0856-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Inference of v-structures versus non-v-structures by 3-point information from observational data. (a) Isolated v-structures are predicted for I(x;y;z) < 0, and (b–d) isolated non-v-structures for I(x;y;z) > 0. (e) Generalized v-structures are predicted for I(x;y;z|{u_i}) < 0, and (f–h) generalized non-v-structures for I(x;y;z|{u_i}) > 0. In addition, because I(x;y;z|{u_i}) is invariant under permutations of x, y and z, the global orientation of v-structures and non-v-structures also requires finding the most likely base of the xyz triple. Choosing the base xy with the lowest conditional mutual information, i.e., I(x;y|{u_i}) = min over s,t ∈ {x,y,z} of I(s;t|{u_i}), is found to be consistent with the Data Processing Inequality expected for (generalized) non-v-structures in the limit of an infinite dataset (see main text). In practice, given a finite dataset, (generalized) v-structures can be distinguished from non-v-structures by replacing the 2-point and 3-point information terms, I(x;y|{u_i}) and I(x;y;z|{u_i}), with shifted equivalents, I′(x;y|{u_i}) and I′(x;y;z|{u_i}), that include finite-size corrections (see text, Eqs. 23 & 24)
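The sign rule in the Fig. 1 caption can be illustrated on toy data. The sketch below is not the 3off2 implementation: it assumes the standard definition I(x;y;z) = I(x;y) − I(x;y|z), uses plain plug-in entropy estimates on binary variables, handles only the isolated case (no conditioning set {u_i}), and leaves out the finite-size corrections of Eqs. 23 & 24, which are not reproduced in this record. All function and variable names are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(samples):
    """Plug-in Shannon entropy (in nats) of a sequence of hashable symbols."""
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())


def mutual_info(x, y):
    """2-point information I(x;y) = H(x) + H(y) - H(x,y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def cond_mutual_info(x, y, z):
    """Conditional information I(x;y|z) = H(x,z) + H(y,z) - H(x,y,z) - H(z)."""
    xz, yz, xyz = list(zip(x, z)), list(zip(y, z)), list(zip(x, y, z))
    return entropy(xz) + entropy(yz) - entropy(xyz) - entropy(z)


def three_point_info(x, y, z):
    """3-point information I(x;y;z) = I(x;y) - I(x;y|z) (assumed definition)."""
    return mutual_info(x, y) - cond_mutual_info(x, y, z)


def most_likely_base(variables):
    """Pair of variable names with the lowest 2-point information."""
    names = sorted(variables)
    pairs = [(s, t) for i, s in enumerate(names) for t in names[i + 1:]]
    return min(pairs, key=lambda p: mutual_info(variables[p[0]], variables[p[1]]))


rng = random.Random(0)
n = 5000


def noisy_copy(bits, flip=0.1):
    """Copy a binary sequence, flipping each bit with probability `flip`."""
    return [bit ^ int(rng.random() < flip) for bit in bits]


# Toy v-structure x -> z <- y: two independent causes, z = x OR y.
x = [rng.randint(0, 1) for _ in range(n)]
y = [rng.randint(0, 1) for _ in range(n)]
z = [p | q for p, q in zip(x, y)]

# Toy non-v-structure (chain) a -> b -> c built from noisy copies.
a = [rng.randint(0, 1) for _ in range(n)]
b = noisy_copy(a)
c = noisy_copy(b)

print(three_point_info(x, y, z))  # negative sign: v-structure
print(three_point_info(a, c, b))  # positive sign: non-v-structure
print(most_likely_base({"x": x, "y": y, "z": z}))
```

In both toy triples the base with the lowest 2-point information is the unlinked pair, which matches the Data Processing Inequality argument in the caption for the chain.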
Fig. 2 CHILD network. [20 nodes, 25 links, 230 parameters, Average degree 2.5, Maximum in-degree 2]. Precision, Recall and F-score for skeletons (dashed lines) and CPDAGs (solid lines). The results are given for Aracne (black), PC (blue), Bayesian Hill-Climbing (green) and 3off2 (red)
Fig. 3 ALARM network. [37 nodes, 46 links, 509 parameters, Average degree 2.49, Maximum in-degree 4]. Precision, Recall and F-score for skeletons (dashed lines) and CPDAGs (solid lines). The results are given for Aracne (black), PC (blue), Bayesian Hill-Climbing (green) and 3off2 (red)
Fig. 4 INSURANCE network. [27 nodes, 52 links, 984 parameters, Average degree 3.85, Maximum in-degree 3]. Precision, Recall and F-score for skeletons (dashed lines) and CPDAGs (solid lines). The results are given for Aracne (black), PC (blue), Bayesian Hill-Climbing (green) and 3off2 (red)
Fig. 5 BARLEY network. [48 nodes, 84 links, 114,005 parameters, Average degree 3.5, Maximum in-degree 4]. Precision, Recall and F-score for skeletons (dashed lines) and CPDAGs (solid lines). The results are given for Aracne (black), PC (blue), Bayesian Hill-Climbing (green) and 3off2 (red)
Fig. 6 HEPAR II network. [70 nodes, 123 links, 1,453 parameters, Average degree 3.51, Maximum in-degree 6]. Precision, Recall and F-score for skeletons (dashed lines) and CPDAGs (solid lines). The results are given for Aracne (black), PC (blue), Bayesian Hill-Climbing (green) and 3off2 (red)
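The Precision, Recall and F-score reported in Figs. 2–6 are not defined in this record. The sketch below assumes the usual definitions, applied to skeletons only, with each link treated as an unordered pair of nodes. How orientations enter the CPDAG scores is not specified here, so that case is left out. The function name and the toy edge lists are hypothetical.

```python
def skeleton_scores(true_edges, inferred_edges):
    """Precision, recall and F-score of an inferred skeleton.

    Edges are compared as unordered node pairs, so (A, B) matches (B, A).
    """
    true_set = {frozenset(edge) for edge in true_edges}
    inferred_set = {frozenset(edge) for edge in inferred_edges}
    true_positives = len(true_set & inferred_set)

    precision = true_positives / len(inferred_set) if inferred_set else 0.0
    recall = true_positives / len(true_set) if true_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy example: 3 true links, 4 inferred links, 2 of them correct.
true_links = [("A", "B"), ("B", "C"), ("C", "D")]
inferred_links = [("B", "A"), ("C", "D"), ("A", "D"), ("A", "C")]

precision, recall, f_score = skeleton_scores(true_links, inferred_links)
print(precision, recall, f_score)
```

With 2 true positives, 2 false positives and 1 false negative, precision is 2/4, recall is 2/3, and the F-score is their harmonic mean, 4/7.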
Fig. 7 Hematopoietic subnetwork reconstructed by 3off2. The dataset [36] concerns 18 transcription factors, 597 single cells and 5 different hematopoietic progenitor types. Red and blue edges correspond to experimentally proven activations and repressions, respectively, as reported in the literature (Table 1), while grey links indicate regulatory interactions for which no clear evidence has been established so far. Thinner arrows highlight 3off2 misorientations
Table 1 Interactions reconstructed by 3off2 and alternative methods for a subnetwork of hematopoiesis regulation. → indicates a successfully recovered interaction, including its direction, as reported in the literature (see References). ← corresponds to a successfully recovered interaction whose direction is, however, opposite to that reported in the literature. ⌿ indicates that no direct regulatory interaction has been inferred, while — corresponds to an undirected link. Note in particular that Aracne does not infer edge direction. See Additional file 1: Table S1 for supplementary statistics

| 11 known regulatory interactions | References | 3off2 | PC | PC | MMHC | MMHC | Bayes hc | Bayes hc | Aracne |
|---|---|---|---|---|---|---|---|---|---|
| Gata2 → Gfi1b | [ | → | ← | — | ⌿ | ⌿ | → | ⌿ | ⌿ |
| Gfi1 → Gata2 | [ | → | → | — | → | ← | → | ← | — |
| Gfi1b | [ | ← | ← | — | ← | ← | ← | ← | — |
| Gfi1 → PU.1 | [ | → | → | ⌿ | ⌿ | ⌿ | → | → | — |
| Lyl1 → Gfi1 | [ | → | ← | ⌿ | ⌿ | ⌿ | → | ← | — |
| Ldb1 → Meis1 | [ | → | ⌿ | ⌿ | ⌿ | ⌿ | ← | ⌿ | ⌿ |
| Ldb1 → Lyl1 | [ | → | ⌿ | ⌿ | ⌿ | ⌿ | ⌿ | ⌿ | ⌿ |
| Erg → Lyl1 | [ | → | ← | — | → | → | → | ← | — |
| Gata2 → Scl | [ | ← | → | — | → | → | → | → | — |
| Gfi1b → Meis1 | [ | ← | ← | — | → | → | → | → | — |
| Gata1 → Gata2 | [ | ← | ← | — | → | → | → | → | — |
| Correct edges (out of 11) | (→/←/—) | 11 | 9 | 7 | 6 | 6 | 10 | 8 | 8 |
| - Correct orientations | (→) | 7 | 3 | 0 | 5 | 4 | 8 | 4 | 0 |
| - Mis/non-orientations | (←/—) | 4 | 6 | 7 | 1 | 2 | 2 | 4 | 8 |
| Missing links | (⌿) | 0 | 2 | 4 | 5 | 5 | 1 | 3 | 3 |
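The summary rows of the table can be cross-checked with simple arithmetic. The sketch below assumes that the number of correct edges for each method is the sum of its "Correct orientations" and "Mis/non-orientations" counts, and that each of the 11 known interactions is either recovered or missing. The two count lists are copied from the table in column order; the variable names are hypothetical.

```python
# Column order as in the table header (PC, MMHC and Bayes hc each appear twice).
methods = ["3off2", "PC", "PC", "MMHC", "MMHC", "Bayes hc", "Bayes hc", "Aracne"]
correct_orientations = [7, 3, 0, 5, 4, 8, 4, 0]
mis_or_non_orientations = [4, 6, 7, 1, 2, 2, 4, 8]
N_KNOWN_INTERACTIONS = 11

# Assumption: recovered edges = correctly oriented + mis/non-oriented edges.
correct_edges = [
    oriented + other
    for oriented, other in zip(correct_orientations, mis_or_non_orientations)
]
# Assumption: every known interaction not recovered counts as a missing link.
missing_links = [N_KNOWN_INTERACTIONS - edges for edges in correct_edges]

for name, edges, missing in zip(methods, correct_edges, missing_links):
    print(f"{name:>8}: {edges:2d} correct edges, {missing} missing links")
```

Under these assumptions, 3off2 is the only method that recovers all 11 interactions as edges, with 7 of them correctly oriented.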