Abstract
BACKGROUND: Bayesian Network (BN) modeling is a powerful approach to reconstructing genetic regulatory networks from gene expression data. However, expression data alone suffer from high noise and lack of power. Incorporating prior biological knowledge can improve performance. Because each type of prior knowledge may on its own be incomplete or limited by quality issues, it is desirable to integrate multiple sources of prior knowledge and exploit their consensus.
Year: 2011 PMID: 21884587 PMCID: PMC3203352 DOI: 10.1186/1471-2105-12-359
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The framework of our BN modeling to incorporate the quantitative information in prior knowledge.
Implementation of the new BN structure learning algorithm
| n: number of nodes in the network. |
| D: discretized expression data matrix. |
| BurnIn: number of steps to take before drawing sample networks for evaluation. Default value: 50 times the size of the sampling reservoir. |
| n_iteration: number of iterations. Default value: 80 times the size of the sampling reservoir. |
| Δ_samples: interval of sample networks being collected from the chain after burn-in. Default value: 1000. |
| maxFanIn: maximum number of parents of a node. |
| A set of DAGs after reaching the max iteration step. |
| An average DAG in the form of a matrix. |
| 1. Create a sampling edge reservoir based on |
| 2. Set all elements of the adjacency matrix for the initial DAG to 0. |
| 3. for loop_index = 1: n_iteration do |
| (1) randomly select an element edge(i,j) from the edge sampling reservoir, corresponding to gene pair (i,j). |
| (2) if edge(i,j) exists in the current DAG, delete the edge; else if edge(j,i) exists in the current DAG, reverse edge(j,i) to edge(i,j); else add edge(i,j). We name these operations "delete", "reverse" and "add", respectively. |
| (3) check whether the newly proposed DAG remains acyclic and satisfies the maxFanIn rule for nodes (i,j). If not, keep the current DAG, discard the proposed DAG, and go to (1). |
| (4) calculate the log value of the marginal likelihood (LL)* of the expression data D of node j and its parents given the current DAG (LL_old) or the proposed DAG (LL_new) and define bf1 = exp(LL_new - LL_old). |
| (5) if the operation is "delete" or "add", bf2 = 1; if the operation is "reverse", calculate bf2 for node i in the same way as for node j in (4). |
| (6) calculate the prior probability* of current DAG (prior_old) and propose DAG (prior_new); calculate the Metropolis-Hastings ratio ( |
| (7) when loop_index>BurnIn and (loop_index-BurnIn) is exactly divisible by |
| 4. End of loop, calculate the average DAG in the form of a matrix, where the elements are given by the averaged edges of all recorded DAGs weighted by their posterior probabilities. |
*Details of the definition of marginal likelihood, and how to calculate LL, prior probability of DAG, can be found in [10,31].
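The proposal and acceptance steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, `adj[i][j] == 1` is assumed to mean an edge from i to j, and the acceptance ratio is assumed to be bf1 × bf2 × prior_new / prior_old with a symmetric proposal, because the exact ratio is cut off in step (6). The marginal likelihood and prior computations are left to the caller.

```python
import math
import random

def creates_cycle(adj, src, dst):
    """Return True if adding src->dst would close a directed cycle,
    i.e. if src is already reachable from dst."""
    stack, seen = [dst], set()
    while stack:
        node = stack.pop()
        if node == src:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(k for k, has_edge in enumerate(adj[node]) if has_edge)
    return False

def propose(adj, i, j, max_fan_in):
    """Apply step (2) to a copy of adj. Return (operation, new_adj),
    or None when the proposal fails the checks of step (3)."""
    new = [row[:] for row in adj]
    if new[i][j]:
        new[i][j] = 0
        return "delete", new
    if new[j][i]:
        op = "reverse"
        new[j][i] = 0
    else:
        op = "add"
    # Both "reverse" and "add" insert the edge i->j, so node j gains a parent.
    if creates_cycle(new, i, j):
        return None
    if sum(new[k][j] for k in range(len(new))) + 1 > max_fan_in:
        return None
    new[i][j] = 1
    return op, new

def accept(ll_new, ll_old, log_prior_new, log_prior_old, rng=random):
    """Metropolis-Hastings acceptance in log space (assumed ratio, see lead-in).
    ll_new - ll_old is log(bf1 * bf2) summed over the affected nodes."""
    log_ratio = (ll_new - ll_old) + (log_prior_new - log_prior_old)
    return rng.random() < math.exp(min(0.0, log_ratio))
```

Working in log space avoids overflow when LL_new and LL_old differ by a large amount, which is why the ratio is capped at 0 before exponentiating.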
GO and PubMed citation contain information of functional linkage
| GO similarity interval | LLS, yeast | LLS, mouse | -log10(pPubMed) interval | LLS, yeast | LLS, mouse |
|---|---|---|---|---|---|
| [1, 1] | 1.51 | 1.62 | (4, ∞) | 0.25 | 0.37 |
| [0.2, 1) | -0.71 | -0.99 | (3, 4] | 0.13 | 0.14 |
| [0, 0.2) | -1.61 | -2.2 | (1, 3] | 0.07 | 0.19 |
| | | | [0, 1] | -3.4 | -3.6 |
Log likelihood scores (LLS) of functional linkage in yeast and mouse, for gene pairs in different value intervals of GO similarity and PubMed co-citation significance. Gene pairs with higher GO similarity or more significant co-citation are more likely to be functionally linked.
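As an illustration of how the table is read, the sketch below maps a GO similarity value or a -log10(pPubMed) value to its LLS by interval lookup. The function and constant names are hypothetical, and the placement of the final table row (interval [0, 1]) under the PubMed columns is an assumption based on the intervals it completes. How the scores are subsequently combined is not covered here.

```python
import bisect

# LLS values copied from the table, ordered from lowest to highest interval.
GO_LLS = {"yeast": (-1.61, -0.71, 1.51), "mouse": (-2.2, -0.99, 1.62)}
PUBMED_LLS = {"yeast": (-3.4, 0.07, 0.13, 0.25), "mouse": (-3.6, 0.19, 0.14, 0.37)}

def go_lls(similarity, species):
    """LLS for a GO similarity in [0, 1]: intervals [0, 0.2), [0.2, 1), [1, 1]."""
    low, mid, top = GO_LLS[species]
    if similarity >= 1:
        return top
    if similarity >= 0.2:
        return mid
    return low

def pubmed_lls(neg_log10_p, species):
    """LLS for -log10(pPubMed): intervals [0, 1], (1, 3], (3, 4], (4, inf)."""
    upper_bounds = (1, 3, 4)  # right-closed interval ends
    idx = bisect.bisect_left(upper_bounds, neg_log10_p)
    return PUBMED_LLS[species][idx]
```

For example, `go_lls(0.5, "mouse")` falls in [0.2, 1) and `pubmed_lls(3, "mouse")` falls in (1, 3], because `bisect_left` keeps a value equal to a bound inside the right-closed interval.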
Figure 2. Distribution of functional linkage probability for all possible gene pairs, and for predicted interactions with and without prior knowledge.
Figure 3. ROC curve indicating that functional linkage contains information about interaction. Plotted is the performance of p as a classifier for identifying yeast TF-target pairs defined by ChIP-chip.
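For readers unfamiliar with the ROC analysis mentioned in the caption, the following is a minimal sketch of how ROC points and the area under the curve can be computed from per-pair scores and benchmark labels. The function names and any inputs passed to them are hypothetical; this is not the authors' evaluation code.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs obtained by
    lowering a threshold over the scores. labels are 1 (benchmark pair) or 0."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for idx, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a group of tied scores.
        if idx + 1 == len(ranked) or ranked[idx + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

An area of 0.5 corresponds to a classifier no better than random ranking, and 1.0 to one that ranks every benchmark pair above every non-benchmark pair.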
Figure 4. Convergence of simulation. (A) Acceptance ratio versus the number of MCMC steps. (B) Scatter plot of the marginal posterior probabilities of the edges, obtained from two separate MCMC simulations of the yeast cell cycle data.
The improvement in network modeling with the addition of prior knowledge
| Data set | Simulated data | Yeast cell cycle study, benchmark from BIND | Yeast cell cycle study, benchmark from ChIP-chip | Mouse pancreas study |
|---|---|---|---|---|
| Number of genes | 76 | 107 | 107 | 36 |
| Number of established regulations | 124 | 114 | 190 | 24 |
| Number of possible regulations | 76*75 = 5700 | 107*106/2 = 5671* | 9*106 = 954 | 36*35 = 1260 |
| Number of known regulations recovered with (without) prior knowledge | 21 (14) | 26 (13) | 23 (11) | 12 (6) |
| Total number of regulations predicted, with (without) prior knowledge | 503 (440) | 436 (387) | 58 (33) | 322 (297) |
| Improvement over plain BN | χ² = 0.36, | χ² = 2.28, p < 0.13 | χ² = 0.04, p ~ 0.84 | χ² = 0.98, |
| Improvement over random selection | χ² = 7.32, | χ² = 24.5, p < 0.001 | χ² = 6.71, p < 0.01 | χ² = 2.87, |
| Plain BN over random | χ² = 1.58, | χ² = 2.42, p ~ 0.11 | χ² = 1.6, p ~ 0.2 | χ² = 0.01, p ~ 0.8 |
* We ignored edge direction when comparing to BIND, since it contains both directed and undirected interactions.
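To make the counts in the table easier to interpret, the sketch below computes the precision of a prediction set and its fold enrichment over random selection. The function name is hypothetical, and this is plain arithmetic on the table entries rather than the χ² test reported by the authors, whose exact construction is not given here.

```python
def enrichment(recovered, predicted, known, possible):
    """Return (precision, expected hits under random selection, fold enrichment).

    recovered: known regulations among the predictions
    predicted: total number of predicted regulations
    known:     number of established regulations in the benchmark
    possible:  number of possible regulations
    """
    precision = recovered / predicted
    expected = predicted * known / possible
    return precision, expected, recovered / expected

# Yeast cell cycle study, BIND benchmark, counts taken from the table.
with_prior = enrichment(26, 436, 114, 5671)
without_prior = enrichment(13, 387, 114, 5671)
```

With these counts, random selection of 436 pairs would be expected to recover about 8.8 known regulations, against the 26 recovered with prior knowledge.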
Predicted yeast gene regulatory relationships that are annotated in BIND
| CLN2→CLN3 | |||
| ASF1→HHF1 | GAS1→KRE6 | CLN3→CLB6 | CDC14→SIC1 |
| SWI4→MBP1 | MSH6→POL30 | CLB6→CLN1 | SWI4→CHS3 |
| KAR3→NUM1 | HHF1→HHT1 | MOB1→DBF2 | RFA1→RFA3 |
| CLB1→CLB3 | CLN1→CLN3 | CDC45→CDC6 | CLB1→CLB5 |
| HHF1→HTB2 | HPR5→RAD54 | ||
| CLN3→CLN2 | |||
| DBF4→CDC5 | CDC8→CIK1 | CDC6→CDC45 | CLB3→CDC6 |
Relationships in bold font are predicted both with and without prior knowledge.
Predicted yeast gene regulatory relationships that are confirmed by ChIP-chip
| FKH1→SWE1 | FKH2→CDC6 | FKH2→SWI4 | SWI4→PSA1 |
| SWI6→HHT1 | SWI5→ASH1 | SWI6→CLN2 | FKH2→SWE1 |
| FKH2→HPR5 | SWI6→RAD54 | FKH1→RAD51 | SWI6→HHF1 |
| SWI6→AGA1 | SWI4→AGA1 | SWI4→MBP1 | |
| FKH1→CDC6 | FKH1→CDC20 | ||
Relationships in bold font are predicted both with and without prior knowledge.
Figure 5. ROC curves for the network modeling of the yeast cell cycle data using plain BN (A), Werhli and Husmeier's algorithm (B), and our algorithm (C). ChIP-chip binding data were used as benchmark. Adding prior knowledge significantly improved BN performance at identifying the TF-target pairs.
Established pancreas gene regulatory relationships that are identified by BN modeling
| Known regulatory relationship | Identified by BN modeling with prior knowledge | Identified by the plain BN without prior knowledge |
|---|---|---|
| Hes1→Neurog3 | √ | |
| Hnf4a→Tcf1 | √ | √ |
| Pdx1→Gck | √ | |
| Pdx1→Hnf4a | ||
| Pdx1→Iapp | ||
| Pdx1→Ins2 | √ | |
| Pdx1→Nr5a2 | √ | √ |
| Mafb→Ins2 | ||
| Mafb→Pdx1 | ||
| Neurog3→Nkx2-2 | √ | |
| Nkx2-2→Gck | √ | √ |
| Nkx2-2→Iapp | ||
| Nkx2-2→Ins2 | ||
| Onecut1→Pdx1 | ||
| Onecut1→Neurog3 | ||
| Onecut1→Tcf1 | √ | |
| Pax6→Gck | ||
| Pax6→Iapp | √ | √ |
| Pax6→Ins2 | √ | √ |
| Pax6→Pdx1 | √ | √ |
| Tcf1→Hnf4a | ||
| Tcf1→Pdx1 | ||
| Tcf1→Pklr | ||
| Tcf1→Slc2a2 | √ | |
BN with prior knowledge recovers half (12 of 24) of the experimentally confirmed transcriptional regulations during mouse pancreas development, twice as many as the plain BN without prior knowledge (6 of 24).
Figure 6. The pancreas development network already established by existing experiments (A), predicted by the plain BN (B), and by BN + prior knowledge (C). The bold edges in (B) and (C) are those that overlap with the edges in (A).
Figure 7. ROC curves for the pancreas development data with plain BN (A), Werhli and Husmeier's algorithm (B), and our approach (C). 24 experimentally confirmed interactions were used as benchmark.
The 107 yeast cell cycle genes that were simulated for their network structure
| ACE2 (850822) | CLB6 (853003) | HHF2 (855701) | MSH6 (851671) | RFA3 (853266) |
| AGA1 (855780) | CLN1 (855239) | HHT1 (852295) | MST1 (853640) | RME1 (852935) |
| ASE1 (854223) | CLN2 (855819) | HHT1 (855700) | NDD1 (854554) | RNR1 (856801) |
| ASF1 (853327) | CLN3 (851191) | HHT2 (852295) | NUM1 (851727) | RNR3 (854744) |
| ASF2 (851330) | CTS1 (850992) | HHT2 (855700) | PCL1 (855427) | SED1 (851649) |
| ASH1 (853650) | CWP1 (853766) | HO (851371) | PCL2 (851430) | SIC1 (850768) |
| CDC14 (850585) | CWP2 (853765) | HSL1 (853760) | PCL9 (851375) | SPC42 (853824) |
| CDC20 (852762) | DBF2 (852984) | HTA1 (851811) | PDS1 (851691) | SPO12 (856557) |
| CDC21 (854241) | DBF4 (851623) | HTA2 (852283) | PMS1 (855642) | SST2 (851173) |
| CDC45 (850793) | DPB2 (856305) | HTB1 (851810) | POL1 (855621) | STE2 (850518) |
| CDC5 (855013) | DPB3 (852580) | HTB2 (852284) | POL12 (852245) | SWE1 (853252) |
| CDC6 (853244) | EGT2 (855389) | KAR3 (856263) | POL2 (855459) | SWI4 (856847) |
| CDC8 (853520) | FAR1 (853283) | KAR4 (850303) | POL30 (852385) | SWI5 (851724) |
| CDC9 (851391) | FKH1 (854675) | KIN3 (851273) | PRI1 (854825) | SWI6 (850879) |
| CHS1 (855529) | FKH2 (855656) | KRE6 (856287) | PRI2 (853821) | TEC1 (852377) |
| CHS3 (852311) | FKS1 (851055) | MBP1 (851503) | PSA1 (851504) | TIP1 (852359) |
| CIK1 (855238) | FUS1 (850330) | MCD1 (851561) | RAD17 (854550) | TIR1 (856729) |
| CLB1 (853002) | GAS1 (855355) | MCM1 (855060) | RAD27 (853747) | UNG1 (854987) |
| CLB2 (856236) | GIC2 (851904) | MFA2 (855577) | RAD51 (856831) | YRO2 (852343) |
| CLB3 (851400) | HHF1 (852294) | MNN1 (856718) | RAD54 (852713) | |
| CLB4 (850907) | HHF1 (855701) | MOB1 (854700) | RFA1 (851266) | |
| CLB5 (856237) | HHF2 (852294) | MSH2 (854063) | RFA2 (855404) |
In parentheses are the corresponding gene IDs.
The 36 mouse genes chosen to reconstruct interaction networks during pancreas development and growth
| Acvr1 (11477) | Hes1 (15205) | Nfe2l2 (18024) | Pdx1 (18609) |
| Anxa4 (11746) | Hnf4a (15378) | Nfkb1 (18033) | Pklr (18770) |
| Bmp2 (12156) | Iapp (15874) | Nfkbia (18035) | Ppib (19035) |
| Cbfb (12400) | Ins2 (16334) | Nkx2-2 (18088) | Psen2 (19165) |
| Chuk (12675) | Isl1 (16392) | Npm1 (18148) | Rps4x (20102) |
| Cryz (12972) | Mafb (16658) | Nr5a2 (26424) | Slc2a2 (20526) |
| Foxa2 (15376) | Myo6 (17920) | Nrp1 (18186) | Stat3 (20848) |
| Foxa3 (15377) | Nckap1 (50884) | Onecut1 (15379) | Tcf1 (21405) |
| Gck (103988) | Neurog3 (11925) | Pax6 (18508) | Ugcg (22234) |
In parentheses are the corresponding gene IDs.