Weiwei Yin, Swetha Garimalla, Alberto Moreno, Mary R Galinski, Mark P Styczynski.
Abstract
BACKGROUND: There are increasing efforts to bring high-throughput systems biology techniques to bear on complex animal model systems, often with a goal of learning about underlying regulatory network structures (e.g., gene regulatory networks). However, complex animal model systems typically have significant limitations on cohort sizes, number of samples, and the ability to perform follow-up and validation experiments. These constraints are particularly problematic for many current network learning approaches, which require large numbers of samples and may predict many more regulatory relationships than actually exist.
Year: 2015 PMID: 26310492 PMCID: PMC4551520 DOI: 10.1186/s12918-015-0194-7
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Fig. 1 Representation of the topological constraints of two tree-like Bayesian networks. (a) The topology of the previous tree-like Bayesian network classifier (TN-BL) was constrained to three levels: a root, children of the root, and the terminal grandchildren of the root (leaf nodes). Construction of this network did not account for conditional mutual information between siblings. (b) The proposed tree-like Bayesian structure learning algorithm (TL-BSLA) has no constraints on maximum depth of the network and considers the mutual information and conditional mutual information between siblings when creating the network structure
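The caption above refers to mutual information and conditional mutual information between siblings. As a minimal sketch, assuming discrete-valued features and the standard empirical (plug-in) definitions, these two quantities could be estimated as follows. The function names are hypothetical, and this is an illustration of the quantities only, not the authors' implementation of TL-BSLA.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))   # counts of each (x, y) pair
    count_x = Counter(x)         # marginal counts of x
    count_y = Counter(y)         # marginal counts of y
    # Sum over observed pairs of p(x,y) * log2( p(x,y) / (p(x) p(y)) ).
    return sum(
        (c / n) * log2(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in joint.items()
    )


def conditional_mutual_information(x, y, z):
    """Empirical I(X;Y|Z) in bits: I(X;Y) within each value of Z, weighted by p(z)."""
    n = len(x)
    total = 0.0
    for z_value, z_count in Counter(z).items():
        idx = [i for i in range(n) if z[i] == z_value]
        total += (z_count / n) * mutual_information(
            [x[i] for i in idx], [y[i] for i in idx]
        )
    return total


# Two "siblings" that both copy their parent share one bit of information,
# but none once the parent is known.
parent = [0, 0, 1, 1]
sibling_a = list(parent)
sibling_b = list(parent)
mi = mutual_information(sibling_a, sibling_b)
cmi = conditional_mutual_information(sibling_a, sibling_b, parent)
```

In this toy case the siblings look strongly related when considered alone, yet the relationship vanishes after conditioning on the parent, which is the kind of distinction that conditional mutual information between siblings makes visible.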
Tree-like Bayesian Structure Learning Algorithm (TL-BSLA)
RootSelection subroutine of TL-BSLA
StructureLayerNode subroutine of TL-BSLA
Fig. 2 TL-BSLA performs consistently better than SCA in four example systems. The true positive rate (TPR), false positive rate (FPR), and positive predictive value (PPV) are shown for four representative networks. Black lines show performance of TL-BSLA; blue lines show performance of SCA. Dashed lines represent calculations that ignore the direction of connections when assessing their correctness. TL-BSLA is almost universally better than SCA, with the exception of TPR for the Asia and Alarm networks when directionality is not accounted for in assessing correctness. In these cases, the much higher FPR of SCA outweighs its potentially better coverage of true positives, as evidenced by the superior PPV curves for TL-BSLA. For PPV, the differences between TL-BSLA and SCA are statistically significant for all networks, directed and undirected (p < 0.05, two-tailed t-test), except at the 50 and 150 sample sizes for the undirected Asia network. Error bars represent one standard deviation
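To make the three metrics in this caption concrete, the following is a minimal sketch of how TPR, FPR, and PPV might be computed for a learned edge list against a known network, with and without regard to edge direction. The function name `edge_metrics` is hypothetical, and the FPR denominator (all possible node pairs that are not true edges) is an assumption, since the caption does not state how negatives were counted.

```python
def edge_metrics(true_edges, learned_edges, n_nodes, directed=True):
    """Return (TPR, FPR, PPV) for learned edges versus true edges.

    Edges are (parent, child) tuples. With directed=False, an edge counts
    as correct even if its direction is reversed.
    """
    if directed:
        truth = set(true_edges)
        learned = set(learned_edges)
        possible = n_nodes * (n_nodes - 1)        # ordered node pairs
    else:
        truth = {frozenset(e) for e in true_edges}
        learned = {frozenset(e) for e in learned_edges}
        possible = n_nodes * (n_nodes - 1) // 2   # unordered node pairs

    tp = len(truth & learned)          # learned edges that are real
    fp = len(learned - truth)          # learned edges that are not real
    negatives = possible - len(truth)  # assumed pool of non-edges

    tpr = tp / len(truth) if truth else 0.0
    fpr = fp / negatives if negatives else 0.0
    ppv = tp / len(learned) if learned else 0.0
    return tpr, fpr, ppv


# Toy 4-node example: one edge is reversed and one is spurious.
true_edges = [(0, 1), (1, 2), (1, 3)]
learned_edges = [(1, 0), (1, 2), (2, 3)]
directed_scores = edge_metrics(true_edges, learned_edges, n_nodes=4, directed=True)
undirected_scores = edge_metrics(true_edges, learned_edges, n_nodes=4, directed=False)
```

In the toy example the reversed edge counts as an error under the directed assessment but as a hit under the undirected one, which is why the dashed (undirected) curves can sit above the solid ones.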
Fig. 3 Sensitivity analysis of ffc shows less significant impact than random variability. The Child network was analyzed with 100 samples, 10 times each for ffc parameter values ranging from 0.2 to 0.4. The variability induced by changing ffc (range of TPR and FPR across all parameter values) is smaller than the variability from different random datasets being used for structure learning (error bars for any given ffc value). This suggests that there is a broad optimum of ffc values and that the value used in this work is a reasonable one (and perhaps not even optimal). TPR: true positive rate; FPR: false positive rate. Error bars represent one standard deviation
Fig. 4 The tree-like Bayesian Structure Learning Algorithm can select a root for structure learning in tree-like or non-tree-like networks. Roots were selected automatically for two representative networks across a range of sample size limitations: (a) the tree-like Child network and (b) the non-tree-like Alarm network. Any node ever selected as a root has a red outline, where increasing line width indicates increasing frequency of selection as a root. Nodes never selected as a root have blue outlines of fixed width. (c) A quantitative summary of the root nodes selected, as a function of sample size. Selection from a tree-like structure is straightforward and consistent; from a non-tree-like structure there is increased variability, but reasonable roots (excluding directionality) are typically chosen. Feature 24 was used as the root for previous Alarm network learning work. It is worth noting that selection of a better root could improve the TL-BSLA’s TPR and PPV even further
Features in the Child network selected for model inclusion using a dimensional-reduction screening procedure, with node 2 (selected automatically) as the root
| Sample size | Real features identified as significant | Noisy features identified as significant | Fraction of real features selected | Fraction of noisy features selected |
|---|---|---|---|---|
| 50 | 3,4,5,6,7,8,9,12,14,15,20 | 96,113,119,174,175 | 58 % | 2.5 % |
| 100 | 1,4,5,6,7,8,9,11,12,14,15,20 | 161 | 63 % | 0.5 % |
| 150 | 3,4,5,6,7,8,9,11,12,14,15,20 | 70,79,116,208 | 63 % | 2 % |
| 200 | 3,4,5,6,7,8,9,11,12,14,15,20 | 75,94 | 63 % | 1 % |
| 300 | 3,4,5,6,7,8,9,10,11,12,14,15,20 | 121,126,198,205,207,209,211 | 68 % | 3.5 % |
| 400 | 1,3,4,5,6,7,8,9,10,11,12,14,15,20 | 31,55,94,113,195 | 74 % | 2.5 % |
| 500 | 1,3,4,5,6,7,8,9,10,11,12,13,14,15,19,20 | 22,122,152,157,166,192,218 | 84 % | 3.5 % |
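The percentages in the table are consistent with 19 candidate real features (the Child network nodes other than the root) and 200 added noisy features. Those two denominators are inferred from the reported fractions rather than stated in this excerpt, so the sketch below treats them as assumptions; the function name is hypothetical.

```python
def selection_fractions(real_selected, noisy_selected, n_real=19, n_noisy=200):
    """Fractions of real and noisy features that passed the screening step.

    n_real and n_noisy are assumed denominators inferred from the table.
    """
    return len(real_selected) / n_real, len(noisy_selected) / n_noisy


# Row for a sample size of 50, copied from the table.
real_50 = [3, 4, 5, 6, 7, 8, 9, 12, 14, 15, 20]
noisy_50 = [96, 113, 119, 174, 175]
real_frac, noisy_frac = selection_fractions(real_50, noisy_50)
real_pct = round(100 * real_frac)        # nearest whole percent
noisy_pct = round(100 * noisy_frac, 1)   # one decimal place
```

With these assumed denominators, the same calculation matches the reported percentages for every row of the table.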
Fig. 5 Tree-like Bayesian networks learned from transcriptional data of a malaria challenge experiment in Macaca mulatta. Networks were learned using blood informative transcripts [21] to focus on potential Axes of variation in the transcriptional data. (a) Using the ten blood informative transcripts as originally published, two branches emerge that best describe the root (selected automatically and which is from Axis 3), consisting of other genes from Axis 3 and a combination of multiple genes from Axes 2, 4, and 7. (b) Using the top 25 genes from each Axis to build a network based on the same root, the relationship between the Axes becomes even more evident, as both Axis 2 and Axis 7 contribute the dominant genes in parallel branches of the tree, suggesting significant but distinct mutual information with their parent and ultimately with the root. These relationships were not evident using standard multivariate and clustering analyses, and were not expected a priori based on previous descriptions of the axes of variation and the fact that the gene lists were derived from whole blood, not bone marrow aspirate, transcriptional profiling analyses