Francesca Petralia, Pei Wang, Jialiang Yang, Zhidong Tu.
Abstract
MOTIVATION: Gene regulatory network (GRN) inference based on genomic data is one of the most actively pursued computational biological problems. Because different types of biological data usually provide complementary information regarding the underlying GRN, a model that integrates big data of diverse types is expected to increase both the power and accuracy of GRN inference. Towards this goal, we propose a novel algorithm named iRafNet: integrative random forest for gene regulatory network inference.
Year: 2015 PMID: 26072483 PMCID: PMC4542785 DOI: 10.1093/bioinformatics/btv268
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. iRafNet schematics. For each gene g_j, we determine a ranked list of potential regulators via iRafNet. Based on each supporting data type, we derive weights measuring the prior belief in each regulatory relationship. Using expression data, we run random forest to find genes regulating g_j. At each node, instead of sampling a random subset of genes from the entire set of genes, we randomly choose an integer indexing one of the supporting data types and sample genes according to the corresponding weights. The final network is derived by ranking potential regulators based on the random forest importance score.
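The node-level sampling step in Fig. 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `sample_candidate_regulators`, the weight layout `weights[k][i]` (prior weight that gene i regulates the current target, derived from data type k), the uniform choice of the data-type index and sampling without replacement are all assumptions of this sketch.

```python
import random
from typing import List, Sequence


def sample_candidate_regulators(
    weights: Sequence[Sequence[float]], mtry: int, rng: random.Random
) -> List[int]:
    """Pick `mtry` distinct candidate regulators for one tree node (sketch).

    weights[k][i] is a hypothetical prior weight that gene i regulates the
    current target gene, derived from supporting data type k. The target gene
    itself should be given weight 0 so it is never proposed.
    """
    # Randomly choose an integer k, i.e. one supporting data type
    # (a uniform choice is an assumption of this sketch).
    k = rng.randrange(len(weights))
    remaining = list(range(len(weights[k])))
    w = [float(x) for x in weights[k]]
    if mtry > len(remaining):
        raise ValueError("mtry exceeds the number of genes")

    chosen: List[int] = []
    for _ in range(mtry):
        if sum(w) <= 0:
            raise ValueError("not enough genes with positive weight")
        # Draw one gene with probability proportional to its weight, then
        # remove it so the candidates are distinct (without replacement).
        pos = rng.choices(range(len(remaining)), weights=w, k=1)[0]
        chosen.append(remaining.pop(pos))
        w.pop(pos)
    return chosen


# Usage: two hypothetical data types over five genes; gene 0 is the target.
rng = random.Random(0)
weights = [
    [0.0, 1.0, 1.0, 0.1, 0.1],  # data type 0 favours genes 1 and 2
    [0.0, 0.2, 0.2, 2.0, 2.0],  # data type 1 favours genes 3 and 4
]
print(sample_candidate_regulators(weights, mtry=2, rng=rng))
```

Genes with zero weight (here the target) are never proposed, while the data type drawn at a given node determines which regulators are most likely to be considered for the split there.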
Comparison between iRafNet, GENIE3 and the best performer in the challenge in terms of the AUC and AUPR for experiments from the DREAM 4 in-silico size 100 challenge
| Network | GENIE3 AUC | GENIE3 AUPR | iRafNet AUC | iRafNet AUPR | Best performer AUC | Best performer AUPR |
|---|---|---|---|---|---|---|
| Net 1 | 0.864 | 0.338 | 0.901 (0.870,0.932) | 0.552 (0.548,0.556) | 0.914 | 0.536 |
| Net 2 | 0.748 | 0.309 | 0.799 (0.765,0.834) | 0.337 (0.333,0.341) | 0.801 | 0.377 |
| Net 3 | 0.782 | 0.277 | 0.835 (0.798,0.873) | 0.414 (0.410,0.418) | 0.833 | 0.39 |
| Net 4 | 0.808 | 0.267 | 0.847 (0.813,0.881) | 0.421 (0.417,0.426) | 0.842 | 0.349 |
| Net 5 | 0.720 | 0.114 | 0.792 (0.751,0.832) | 0.298 (0.294,0.301) | 0.759 | 0.213 |
For iRafNet, 95% confidence intervals are given in parentheses after the corresponding AUC and AUPR values.
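The AUC and AUPR values reported throughout score a ranked list of predicted edges against a gold-standard network. A minimal sketch of that scoring follows; it is an illustration, not the DREAM evaluation script. It assumes the ranking covers every candidate edge exactly once with no ties, and it computes AUPR as average precision (precision evaluated at the rank of each true edge), which can differ slightly from the interpolation used by the official DREAM scorers. The name `auroc_aupr` is hypothetical.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # directed edge (regulator, target)


def auroc_aupr(ranked_edges: List[Edge], gold: Set[Edge]) -> Tuple[float, float]:
    """AUROC and AUPR of a ranked edge list against a gold-standard edge set.

    Assumes `ranked_edges` is sorted by decreasing confidence, lists every
    candidate edge once (no ties) and contains both true and false edges.
    """
    labels = [edge in gold for edge in ranked_edges]
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one true and one false edge")

    tp = fp = 0
    auroc = 0.0
    precision_sum = 0.0
    for is_true_edge in labels:
        if is_true_edge:
            tp += 1
            # Precision at the rank of each true edge (average precision).
            precision_sum += tp / (tp + fp)
        else:
            fp += 1
            # Each false edge moves the ROC curve right by 1/n_neg at the
            # current true-positive rate, adding that rectangle of area.
            auroc += (tp / n_pos) / n_neg
    return auroc, precision_sum / n_pos


ranked = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
gold = {("A", "B"), ("B", "C")}
print(auroc_aupr(ranked, gold))
```

For this toy ranking the function returns an AUROC of 0.75 (three of the four true/false pairs are ordered correctly) and an AUPR of 5/6 ≈ 0.833.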
Comparison between iRafNet, GENIE3, Meta 1 and COMMUNITY in terms of AUC and AUPR, with corresponding 95% confidence intervals, for synthetic experiments from the DREAM 5 challenge. Exp, gene expression data; KO, knockout data; TS, time-series data.
| Method | Data | Network 1 AUC | Network 1 AUPR | Network 3 AUC | Network 3 AUPR |
|---|---|---|---|---|---|
| GENIE3 | Exp | 0.815 (0.807,0.823) | 0.291 (0.289,0.295) | 0.617 (0.607,0.627) | 0.093 (0.091,0.106) |
| Meta 1 | KO | 0.736 (0.727,0.745) | 0.276 (0.274,0.277) | 0.614 (0.604,0.624) | 0.087 (0.085,0.089) |
| Community | Exp, KO, TS | 0.809 (0.801,0.817) | 0.327 (0.326,0.329) | 0.65 (0.639,0.660) | 0.09 (0.090,0.105) |
| iRafNet | Exp, KO | 0.812 (0.804,0.82) | 0.364 (0.361,0.364) | 0.638 (0.629,0.651) | 0.113 (0.110,0.115) |
| iRafNet | Exp, KO, TS | 0.813 (0.804,0.819) | 0.364 (0.360,0.366) | 0.641 (0.63,0.651) | 0.112 (0.109,0.114) |
Fig. 2. ROC curves produced by various methods for the estimation of Network 1 and Network 3 from the DREAM 5 challenge. Community is an ensemble algorithm that derives a consensus network by integrating the predictions of GENIE3 and of the other 34 teams participating in the challenge. iRafNet infers the GRN by integrating all knockout, time-series and steady-state gene expression data.
Networks output by GENIE3 and iRafNet
| Method | No. of undirected edges | No. of directed edges | No. of shared edges | No. of shared directed edges | No. of enriched GO terms (P < 0.05) | No. of enriched GO terms (P < 0.01) |
|---|---|---|---|---|---|---|
| GENIE3 | 156 359 | 200 000 | 102 501 | 126 009 | 51 | 44 |
| iRafNet | 163 886 | 200 000 | 102 501 | 126 009 | 61 | 51 |
For both GENIE3 and iRafNet, we consider the set of the 200 000 highest-scoring directed edges. Within this set, the number of unique undirected edges a − b was 156 359 for GENIE3 and 163 886 for iRafNet. For each method, we also show the number of GO categories with significant enrichment at two P-value thresholds (0.05 and 0.01).
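The edge counts in this table come from simple set operations on two ranked edge lists. In the sketch below (helper names hypothetical), the top-N directed edges of each method are collapsed to undirected edges by treating a → b and b → a as the same pair a − b, and shared edges are set intersections. It assumes each ranked list is sorted by decreasing score and contains no duplicate edges.

```python
from typing import Dict, FrozenSet, Sequence, Set, Tuple

Edge = Tuple[str, str]  # directed edge (regulator, target)


def top_directed(ranked_edges: Sequence[Edge], n: int) -> Set[Edge]:
    """Return the n highest-scoring directed edges (list sorted by score)."""
    return set(ranked_edges[:n])


def to_undirected(edges: Set[Edge]) -> Set[FrozenSet[str]]:
    """Collapse a -> b and b -> a into the single undirected edge a - b."""
    return {frozenset(edge) for edge in edges}


def edge_counts(
    ranked_a: Sequence[Edge], ranked_b: Sequence[Edge], n: int
) -> Dict[str, int]:
    """Counts analogous to the table columns, for two hypothetical methods."""
    top_a, top_b = top_directed(ranked_a, n), top_directed(ranked_b, n)
    und_a, und_b = to_undirected(top_a), to_undirected(top_b)
    return {
        "undirected_a": len(und_a),
        "undirected_b": len(und_b),
        "shared_undirected": len(und_a & und_b),
        "shared_directed": len(top_a & top_b),
    }


method_a = [("x", "y"), ("y", "x"), ("x", "z"), ("z", "w")]
method_b = [("y", "x"), ("z", "x"), ("w", "z")]
print(edge_counts(method_a, method_b, n=3))
```

With n = 3, method_a keeps both directions of x − y, so its three directed edges yield only two undirected edges; this is why the undirected counts in the table are below 200 000.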
Prediction performance of TF regulations
| Method | Data | AUC | AUPR |
|---|---|---|---|
| GENIE3 | Expression | 0.547 (0.537,0.566) | 0.542 (0.537,0.548) |
| iRafNet | Multiple weights | 0.624 (0.613,0.636) | 0.565 (0.561,0.569) |
| iRafNet | Expression and KO | 0.657 (0.645,0.673) | 0.567 (0.562,0.574) |
| iRafNet | Expression and TS | 0.543 (0.528,0.557) | 0.536 (0.530,0.541) |
| iRafNet | Expression and PPI | 0.574 (0.562,0.591) | 0.557 (0.551,0.561) |
For each model, the AUC and AUPR are reported with their 95% confidence intervals.
Prediction of TF regulations using different cutoffs
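| Method | ne (th = 60 000) | nd (th = 60 000) | ne (th = 80 000) | nd (th = 80 000) | ne (th = 100 000) | nd (th = 100 000) |
|---|---|---|---|---|---|---|
| iRafNet | 64 | 49 | 85 | 64 | 103 | 77 |
| GENIE3 | 28 | 7 | 34 | 11 | 44 | 13 |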
Cardinality (ne, nd) of sets (Re, Rd). Consider the th directed edges with the highest scores, with th ∈ {60 000; 80 000; 100 000}. Then, Re is defined as the set of these top-th directed edges found to be significant (P < 0.01) by Lee et al., while Rd is defined as the set of directed edges in Re for which the opposite direction is not included among the top-th edges.
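The sets Re and Rd can be computed as follows. This sketch assumes that Re is restricted to the top-th edges and that the opposite-direction check is made against the same top-th set; `significant` is a hypothetical stand-in for the (regulator, target) pairs reported as significant by Lee et al., and the function name is also hypothetical.

```python
from typing import Sequence, Set, Tuple

Edge = Tuple[str, str]  # directed edge (regulator, target)


def tf_regulation_sets(
    ranked_edges: Sequence[Edge], th: int, significant: Set[Edge]
) -> Tuple[Set[Edge], Set[Edge]]:
    """Return (Re, Rd) for cutoff th (sketch under the stated assumptions).

    Re: top-th edges that also appear in `significant`.
    Rd: edges of Re whose reverse edge is not among the top-th edges, i.e.
        regulations whose direction the ranking resolves unambiguously.
    """
    top = set(ranked_edges[:th])
    r_e = top & significant
    r_d = {(a, b) for (a, b) in r_e if (b, a) not in top}
    return r_e, r_d


ranked = [("TF1", "g1"), ("g1", "TF1"), ("TF2", "g2"), ("TF3", "g3"), ("TF4", "g4")]
significant = {("TF1", "g1"), ("TF2", "g2"), ("TF4", "g4")}
r_e, r_d = tf_regulation_sets(ranked, th=4, significant=significant)
print(len(r_e), len(r_d))  # ne and nd for this toy ranking
```

Here TF1 → g1 counts toward ne but not nd, because its reverse g1 → TF1 is also ranked in the top four, and TF4 → g4 falls outside the cutoff.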
GRN inference performance using different numbers of trees in random forest learning
| Number of trees | 500 | 1000 | 5000 |
|---|---|---|---|
| AUC | 0.810 | 0.815 | 0.813 |
| AUPR | 0.290 | 0.291 | 0.294 |
Results are for Network 1 of the DREAM 5 challenge, with performance measured by AUC and AUPR.