Kenneth Lo, Adrian E Raftery, Kenneth M Dombek, Jun Zhu, Eric E Schadt, Roger E Bumgarner, Ka Yee Yeung.
Abstract
BACKGROUND: Inference about regulatory networks from high-throughput genomics data is of great interest in systems biology. We present a Bayesian approach to infer gene regulatory networks from time series expression data by integrating various types of biological knowledge.
Year: 2012 PMID: 22898396 PMCID: PMC3465231 DOI: 10.1186/1752-0509-6-101
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Figure 1. Overview of iBMA-prior with a highlight of our main contributions.
Different regression-based methods applied to the time-series gene expression data to construct gene regulatory networks

| Method | Data used | Description |
| --- | --- | --- |
| iBMA-prior | Gene expression + external data | Our proposed methodology that incorporates prior model probabilities in BMA. These prior probabilities were computed using external data sources. |
| iBMA-shortlist | Gene expression + external data | Iterative BMA that uses external knowledge to shortlist |
| Network A from Yeung et al. [ | Gene expression + external data | This method is the same as iBMA-shortlist, but uses the earlier version of the supervised step described in Yeung et al. [ |
| LASSO-shortlist | Gene expression + external data | LASSO [ |
| LAR-shortlist | Gene expression + external data | LAR [ |
| iBMA-size | Gene expression data only | A simplified version of iBMA-prior that disregards external knowledge, except for setting |
| iBMA-noprior | Gene expression data only | Iterative BMA without any use of external knowledge. |
| LASSO-noprior | Gene expression data only | LASSO without any use of external knowledge. |
| LAR-noprior | Gene expression data only | LAR without any use of external knowledge. |
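Summary of the assessment results for different network construction methods on the time-series gene expression data

| Method | Data used | Edges inferred | p-value | TPR (%) | Misclassified cases | TP | O/E ratio |
| --- | --- | --- | --- | --- | --- | --- | --- |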
| iBMA-prior | Gene expression + external data | 21951 | <1.00E-320 | 18.00 | 19282 | 593 | 4.11 |
| iBMA-shortlist | Gene expression + external data | 67440 | <1.00E-320 | 12.78 | 24673 | 1287 | 2.92 |
| Network A from Yeung et al. | Gene expression + external data | 65122 | 1.68E-111 | 9.98 | 22485 | 662 | 2.28 |
| LASSO-shortlist | Gene expression + external data | 255293 | <1.00E-320 | 11.07 | 46482 | 4169 | 2.53 |
| LAR-shortlist | Gene expression + external data | 242495 | <1.00E-320 | 11.28 | 44765 | 4017 | 2.57 |
| iBMA-size | Gene expression data only | 17202 | 5.75E-56 | 16.84 | 17622 | 114 | 3.84 |
| iBMA-noprior | Gene expression data only | 63026 | 1.75E-23 | 8.85 | 18903 | 186 | 2.02 |
| LASSO-noprior | Gene expression data only | 564321 | 2.56E-10 | 5.20 | 38399 | 1231 | 1.19 |
| LAR-noprior | Gene expression data only | 194687 | 1.38E-40 | 7.71 | 22777 | 511 | 1.76 |
The p-value of Pearson’s chi-square test measures the strength of association between an inferred network and the Yeastract database.
True positive rate (TPR) is defined as the proportion of inferred regulatory relationships that are documented in Yeastract.
The number of misclassified cases is the sum of false positives and false negatives.
The O/E ratio is the observed number of recovered relationships (i.e., TP) divided by the number expected to be recovered by chance.
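The four quantities defined in these notes can all be derived from a 2x2 contingency table of inferred versus documented regulatory relationships. The following is a minimal sketch, not the authors' code: the function name `assess_network`, the toy edge lists, and the choice of `n_pairs` (the number of candidate regulator-target pairs) are illustrative assumptions, the chi-square statistic is computed without a continuity correction, and every margin of the table is assumed to be non-zero.

```python
import math


def assess_network(inferred, documented, n_pairs):
    """Score inferred edges against a reference set of documented edges.

    n_pairs is the number of candidate (regulator, target) pairs considered,
    i.e. the grand total of the 2x2 contingency table (an assumption here).
    """
    inferred, documented = set(inferred), set(documented)
    tp = len(inferred & documented)   # inferred and documented
    fp = len(inferred - documented)   # inferred but not documented
    fn = len(documented - inferred)   # documented but not inferred
    tn = n_pairs - tp - fp - fn       # neither inferred nor documented

    # Overlap expected by chance if inference and documentation were independent.
    expected_tp = (tp + fp) * (tp + fn) / n_pairs

    # Pearson chi-square statistic for a 2x2 table (no continuity correction).
    chi2 = n_pairs * (tp * tn - fp * fn) ** 2 / (
        (tp + fp) * (fn + tn) * (tp + fn) * (fp + tn)
    )

    return {
        "tp": tp,
        # TPR as defined in the notes: share of inferred edges that are documented.
        "tpr": tp / (tp + fp),
        "misclassified": fp + fn,
        "oe_ratio": tp / expected_tp,
        "chi2": chi2,
        # Upper tail of a chi-square distribution with 1 degree of freedom.
        "p_value": math.erfc(math.sqrt(chi2 / 2)),
    }


# Hypothetical toy network: 4 inferred edges, 3 documented edges, 20 candidate pairs.
toy_inferred = [("A", "x"), ("A", "y"), ("B", "x"), ("B", "z")]
toy_documented = [("A", "x"), ("B", "x"), ("C", "z")]
result = assess_network(toy_inferred, toy_documented, n_pairs=20)
print(result)
```

Note that the function returns TPR as a fraction, whereas the table reports it as a percentage.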
Number of transcription factors with gene sets containing their known binding sites enriched by the different methods in comparison

| Method | Data used | Number of transcription factors |
| --- | --- | --- |
| iBMA-prior | Gene expression + external data | 38 |
| iBMA-shortlist | Gene expression + external data | 30 |
| LASSO-shortlist | Gene expression + external data | 41 |
| LAR-shortlist | Gene expression + external data | 44 |
| iBMA-size | Gene expression data only | 4 |
| iBMA-noprior | Gene expression data only | 9 |
| LASSO-noprior | Gene expression data only | 13 |
| LAR-noprior | Gene expression data only | 10 |
FDR was controlled at 10%.
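The note above does not say which procedure was used to control the FDR. As an illustration only, the sketch below applies the Benjamini-Hochberg step-up procedure at a 10% level; the choice of this procedure, the function name `benjamini_hochberg`, and the example p-values are all assumptions rather than details taken from the source.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Return a list of booleans marking which hypotheses are rejected.

    Implements the Benjamini-Hochberg step-up rule: find the largest rank k
    such that the k-th smallest p-value is <= k * fdr / m, then reject the
    k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * fdr / m:
            cutoff_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


# Hypothetical enrichment p-values for five transcription factors.
example_p = [0.001, 0.02, 0.03, 0.5, 0.9]
print(benjamini_hochberg(example_p, fdr=0.10))
```

Because the rule is step-up, a p-value that fails its own threshold can still be rejected when a larger p-value passes at a higher rank.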
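Comparison of iBMA-prior, iBMA-shortlist and Lirnet in network construction on the Brem data

| Method | Edges inferred | p-value | TPR (%) | Misclassified cases | TP | O/E ratio |
| --- | --- | --- | --- | --- | --- | --- |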
| iBMA-prior | 8000 | 7.75E-65 | 15.62 | 10198 | 323 | 2.41 |
| iBMA-shortlist | 35995 | 1.02E-59 | 10.99 | 14581 | 818 | 1.70 |
| Lirnet | 10491 | 1.90E-03 | 8.42 | 10080 | 132 | 1.30 |
The p-value of Pearson’s chi-square test measures the strength of association between an inferred network and the Yeastract database.
True positive rate (TPR) is defined as the proportion of inferred regulatory relationships that are documented in Yeastract.
The number of misclassified cases is the sum of false positives and false negatives.
The O/E ratio is the observed number of recovered relationships (i.e., TP) divided by the number expected to be recovered by chance.
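Assessment results for the different methods applied to data sets generated in the simulation study

| Method | Data used | Edges inferred | p-value | TPR (%) | Misclassified cases | TP |
| --- | --- | --- | --- | --- | --- | --- |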
| iBMA-prior | Generated data + prior probability matrix | 14011 | <1.00E-320 | 71.13 | 16029 | 9966 |
| iBMA-shortlist | Generated data + prior probability matrix | 30753 | <1.00E-320 | 47.23 | 23652 | 14526 |
| iBMA-size | Generated data only | 9349 | <1.00E-320 | 20.31 | 27503 | 1899 |
| iBMA-noprior | Generated data only | 29393 | <1.00E-320 | 8.55 | 46317 | 2513 |
The p-value of Pearson’s chi-square test measures the strength of association between an inferred network and the true network for the simulation study.
True positive rate (TPR) is defined as the proportion of correctly inferred regulatory relationships.
The number of misclassified cases is the sum of false positives and false negatives.
Remark: The values reported in the table were averaged across the 20 replications. The true network for the simulation study contained a total of 21951 edges.
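The simulation figures can be cross-checked arithmetically. The sketch below reads the numeric columns of the simulation table as edges inferred, p-value, TPR (%), misclassified cases, and TP; this reading is an interpretation, adopted because it reproduces the reported values given the 21951 true edges stated in the remark. Since the table holds averages over 20 replications, agreement is only expected up to rounding.

```python
TRUE_EDGES = 21951  # size of the true network, as stated in the remark

# method: (edges inferred, TPR in %, misclassified cases, TP), as reported
reported = {
    "iBMA-prior": (14011, 71.13, 16029, 9966),
    "iBMA-shortlist": (30753, 47.23, 23652, 14526),
    "iBMA-size": (9349, 20.31, 27503, 1899),
    "iBMA-noprior": (29393, 8.55, 46317, 2513),
}


def recompute(edges, tp, true_edges=TRUE_EDGES):
    """Derive TPR (%) and the misclassified count from edges inferred and TP."""
    fp = edges - tp        # inferred edges absent from the true network
    fn = true_edges - tp   # true edges that were not inferred
    return 100 * tp / edges, fp + fn


for method, (edges, tpr, misclassified, tp) in reported.items():
    tpr_calc, misclassified_calc = recompute(edges, tp)
    print(
        f"{method}: TPR {tpr_calc:.2f}% (reported {tpr}), "
        f"misclassified {misclassified_calc} (reported {misclassified})"
    )
```

Under this reading, each recomputed TPR matches the reported value to two decimals and each misclassified count matches to within one case.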
Figure 2. The expected number of regulators per target gene in accordance with external knowledge. Histograms of the expected number of regulators per target gene (A) without and (B) with a correction for the difference in sampling rates between positive and negative examples at the supervised learning stage.