Abstract
Inferring gene regulatory networks from expression data is difficult, yet it is common and often useful. Most network problems are under-determined (there are more parameters than data points), so reducing the data or the parameter set is often necessary. Correlation between variables in the model further confounds inference of the network coefficients. In this paper, we present an algorithm that uses integrated, probabilistic clustering to ease the problems of under-determination and correlated variables within a fully Bayesian framework. Specifically, ours is a dynamic Bayesian network with integrated Gaussian mixture clustering, which we fit using variational Bayesian methods. We show, using public, simulated time-course data sets from the DREAM4 Challenge, that our algorithm outperforms non-clustering methods in many cases (7 out of 25) with fewer samples, rarely underperforms (1 out of 25), and often selects a non-clustering model if it better describes the data. Source code (GNU Octave) for BAyesian Clustering Over Networks (BACON) and sample data are available at: http://code.google.com/p/bacon-for-genetic-networks.
Year: 2013 PMID: 23935862 PMCID: PMC3720774 DOI: 10.1371/journal.pone.0068358
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Algorithm results comparison for the DREAM4 networks.
| Algorithm | Data set 1 | Data set 2 | Data set 3 | Data set 4 | Data set 5 |
| --- | --- | --- | --- | --- | --- |
| AUROC | | | | | |
| | | | 0.72 | 0.81 | 0.88 |
| | | | 0.72 | 0.81 | 0.88 |
| | 0.73 | 0.64 | 0.68 | | |
| | 0.73 | 0.66 | | 0.80 | 0.84 |
| AUPR | | | | | |
| | | 0.36 | | 0.49 | 0.57 |
| | | 0.36 | | 0.49 | 0.57 |
| | 0.37 | 0.34 | 0.45 | | |
| | 0.38 | | 0.49 | 0.46 | 0.64 |
The area under the receiver operating characteristic (AUROC) curve and area under precision-recall (AUPR) curve for each of the five data sets. Here, we included BACON without clustering in order to establish that the plain DBN algorithm is generally as good as the other two DBN algorithms. The scores for G1DBN and VBSSM were taken from [7]. The best score for each data set is shown in bold.
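As a rough illustration of the two scores named in this caption, the sketch below computes AUROC and AUPR for a ranked list of candidate edges against a gold-standard network. It is not taken from BACON or from the DREAM4 scoring scripts: the function names `auroc` and `aupr`, the toy scores, and the toy labels are all hypothetical, and AUPR is approximated here by average precision, which may differ slightly from the interpolation used in the original evaluation.

```python
def auroc(scores, labels):
    """Probability that a random true edge is scored above a random non-edge.

    Ties between a true edge and a non-edge count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one true edge and one non-edge")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def aupr(scores, labels):
    """Average precision: mean precision measured at the rank of each true edge."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one true edge")
    # Rank candidate edges from highest to lowest confidence.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Hypothetical edge confidences and gold-standard labels (1 = true edge).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 0, 1, 0, 0, 1]

print(round(auroc(toy_scores, toy_labels), 2))
print(round(aupr(toy_scores, toy_labels), 2))
```

The pairwise AUROC loop is quadratic in the number of candidate edges, which is acceptable for small networks; a rank-based formulation would be preferable for larger ones.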
Results of BACON on individual DREAM4 time series.
| Time-series | Data set 1 | Data set 2 | Data set 3 | Data set 4 | Data set 5 |
| --- | --- | --- | --- | --- | --- |
| AUROC | | | | | |
| 1 | 0.68 ( | 0.61 (0.61) | 0.64 (0.64) | 0.62 (0.62) | 0.57 (0.57) |
| 2 | | 0.70 (0.70) | | 0.77 (0.77) | 0.64 (0.64) |
| 3 | | 0.62 (0.62) | | | 0.65 (0.65) |
| 4 | | 0.66 (0.66) | 0.59 (0.59) | | |
| 5 | | 0.64 (0.64) | 0.59 (0.59) | 0.76 (0.76) | 0.78 (0.78) |
| AUPR | | | | | |
| 1 | 0.24 ( | 0.24 (0.24) | 0.32 (0.32) | 0.19 (0.19) | 0.23 (0.23) |
| 2 | | 0.34 (0.34) | 0.28 ( | 0.26 (0.26) | 0.32 (0.32) |
| 3 | | 0.21 (0.21) | | 0.15 ( | 0.24 (0.24) |
| 4 | | 0.38 (0.38) | 0.21 (0.21) | | |
| 5 | | 0.20 (0.20) | 0.17 (0.17) | 0.34 (0.34) | 0.33 (0.33) |
For each of five individual time-series in each of the five data sets, the area under the receiver operating characteristic (AUROC) curve and area under precision-recall (AUPR) curve. For each time series, we give two of each score, one for BACON with clustering and one for BACON without clustering (in parentheses). The higher of the two scores appears in bold. If the two scores are identical, neither is in bold.
Results comparison: with vs without clustering.
| Higher AUROC (rows) / Higher AUPR (columns) | with clustering | equal | without |
| --- | --- | --- | --- |
| with clustering | 7 | 0 | 2 |
| equal | 0 | 15 | 0 |
| without | 0 | 0 | 1 |
Among the five individual time-series in each of the five data sets (25 time series in total), we tally how often BACON with clustering outperformed BACON without clustering, how often the reverse held, and how often the two scores were equal, for both AUROC and AUPR.
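The tally described in this caption can be sketched as a small cross-tabulation over per-series score pairs. The helper names `outcome` and `tally` and the example scores are hypothetical, chosen only for illustration; they are not the paper's values. Each pair is assumed to hold the score with clustering first and the score without clustering second, compared as reported (for example, rounded to two decimals).

```python
from collections import Counter


def outcome(with_score, without_score):
    """Label which variant scored higher for one time series."""
    if with_score > without_score:
        return "with clustering"
    if with_score < without_score:
        return "without"
    return "equal"


def tally(auroc_pairs, aupr_pairs):
    """Count (AUROC outcome, AUPR outcome) combinations across time series."""
    if len(auroc_pairs) != len(aupr_pairs):
        raise ValueError("AUROC and AUPR lists must cover the same time series")
    return Counter(
        (outcome(*a), outcome(*p)) for a, p in zip(auroc_pairs, aupr_pairs)
    )


# Hypothetical (with clustering, without clustering) scores for three series.
example_auroc = [(0.70, 0.68), (0.61, 0.61), (0.66, 0.69)]
example_aupr = [(0.30, 0.33), (0.24, 0.24), (0.20, 0.25)]

counts = tally(example_auroc, example_aupr)
for (auroc_result, aupr_result), n in sorted(counts.items()):
    print(f"AUROC: {auroc_result:16s} AUPR: {aupr_result:16s} count: {n}")
```

Keying the counter on the pair of outcomes keeps the off-diagonal cases visible, such as a series where clustering raises AUROC but lowers AUPR.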