Fei Luo, Juan Liu, Jinyan Li.
Abstract
BACKGROUND: Proteins that interact with each other as a complex play an important role in many molecular processes and functions. Detecting protein complexes directly is still costly, whereas protein-protein interaction (PPI) maps are available for many model organisms owing to the rapid development of high-throughput PPI detection techniques. These binary PPI data provide fundamental and abundant information for inferring new protein complexes. However, PPI data from different experiments usually overlap very little. The main reason is that proteins become functionally active only in certain environments or in response to certain stimuli. In short, PPIs are condition-specific. Specifying the conditions under which complexes are present is therefore necessary for a deep understanding of their behaviour. Meanwhile, proteins use various modes of interaction and control mechanisms to form different kinds of complexes. The discovery of a particular type of complex should therefore rely on its own distinct biological or topological characteristics, and we do not attempt to find all kinds of complexes with a single set of features. Here, we integrate transcription regulation (TR) data, gene expression (GE) data and protein-protein interaction data at the systems biology level to discover a special kind of protein complex called a conditional co-regulated protein complex. A conditional co-regulated protein complex has three notable features: the coding genes of the member proteins share the same transcription factor (TF); under a certain condition the coding genes are expressed coordinately; and the member proteins interact with one another as a complex to carry out a common biological function.
Year: 2010 PMID: 20840731 PMCID: PMC2982691 DOI: 10.1186/1752-0509-4-S2-S4
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Figure 1. Schematic overview of conditional co-regulated protein complex identification. Given a condition, the active TFs are identified first; for each active TF, all of its target genes and their protein products are then retrieved from the TR data. These proteins and the protein-protein interactions among them form a sub-network of the global PPI network. This sub-network is decomposed into several weakly connected components (WCCs). A greedy search is then used to find candidate groups, each of which is a connected sub-network within a WCC. We calculate a coherent score for each candidate. The higher the score, the more likely the candidate is to be a co-regulated complex. Finally, the relationship between TF activity and candidate coherence across changing conditions is investigated, which can be used to judge the impact of a TF on the function of its products. Candidates with a high coherent score and a number of nodes exceeding a threshold are used as seeds to search for additional members and to predict new TRs.
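The decomposition step in Figure 1 can be illustrated with a minimal sketch. It assumes the PPI data are an undirected edge list and that a TF's target proteins are given as a plain list; the function name `tf_subnetwork_components` and the toy identifiers are hypothetical and do not come from the paper. Because the edges are treated as undirected, weakly connected components reduce to ordinary connected components here.

```python
from collections import defaultdict


def tf_subnetwork_components(targets, ppi_edges):
    """Split the PPI sub-network induced by one TF's target proteins
    into connected components (hypothetical helper, not the authors' code)."""
    targets = set(targets)
    adjacency = defaultdict(set)
    for a, b in ppi_edges:
        # Keep only interactions whose two endpoints are both target products.
        if a in targets and b in targets and a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collects one component.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


# Toy data (made up): X is not a target of the TF, so the edge E-X is ignored.
toy_targets = ["A", "B", "C", "D", "E"]
toy_edges = [("A", "B"), ("B", "C"), ("D", "E"), ("E", "X")]
toy_components = tf_subnetwork_components(toy_targets, toy_edges)
```

Target proteins without any interaction inside the sub-network are dropped in this sketch, since a single isolated node cannot form a candidate group.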
Figure 2. Inferring new TR interactions. From the existing TR interaction data, genes a, b, c and d are known to be target genes of a TF, and A, B, C and D stand for their protein products. If we observe from the PPI network and GE data that the proteins A, B, C, D and an additional protein E interact intensively with one another, and that their coding genes a, b, c, d and e are expressed coordinately, we can predict that the TF also regulates e under the same condition. The reason is that E is so similar to the co-regulated protein group A, B, C, D at the levels of gene expression and protein-protein interaction that e can be inferred to have a TR interaction with the TF just as a, b, c and d do, even though the known TR dataset does not indicate this.
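The inference in Figure 2 can be sketched as a simple filter over the PPI neighbours of a known co-regulated group. This is only an illustration under stated assumptions: "interact intensively" is approximated by the fraction of group members a candidate touches, "express coordinately" by the mean Pearson correlation with the group, and the thresholds `min_links` and `min_corr` are arbitrary placeholders rather than values from the paper.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def predict_extra_targets(group, ppi_edges, expression,
                          min_links=0.75, min_corr=0.8):
    """Return proteins outside `group` that look like additional TF targets.
    Both thresholds are assumptions made for this sketch."""
    group = set(group)
    neighbours = defaultdict(set)
    for a, b in ppi_edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    candidates = set().union(*(neighbours[g] for g in group)) - group
    predicted = []
    for cand in sorted(candidates):
        if cand not in expression:
            continue
        link_fraction = len(neighbours[cand] & group) / len(group)
        mean_corr = sum(pearson(expression[cand], expression[g])
                        for g in group) / len(group)
        if link_fraction >= min_links and mean_corr >= min_corr:
            predicted.append(cand)
    return predicted


# Toy data (made up): E binds every group member and co-expresses; F does neither.
toy_group = ["A", "B", "C", "D"]
toy_ppi = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D"),
           ("E", "A"), ("E", "B"), ("E", "C"), ("E", "D"), ("F", "A")]
toy_expression = {
    "A": [1, 2, 3, 4], "B": [2, 3, 4, 6], "C": [1, 3, 4, 5],
    "D": [0, 2, 3, 5], "E": [1, 2, 4, 5], "F": [5, 3, 2, 1],
}
toy_prediction = predict_extra_targets(toy_group, toy_ppi, toy_expression)
```

In this toy example only E passes both filters, mirroring the role of protein E in the figure.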
Pseudocode of our search method
| Input: | |
| Output: | |
| step1: | |
| step2: | for |
| step2.1: | calculate |
| step2.2: | |
| step2.3: | randomly choose a vertex |
| step2.4: | if ( |
| step2.5: | calculate |
| step2.6: | Δ = |
| step2.7: | if (Δ > 0) |
| step3: | end |
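The surviving steps (a loop, a randomly chosen vertex, a score difference Δ, and acceptance when Δ > 0), together with the Tstart, Tend and N parameters reported with Figure 5, suggest a simulated-annealing style search. The sketch below is a guess at that general shape and not the authors' procedure: the geometric cooling schedule, the Metropolis acceptance rule for Δ ≤ 0, the toggle move, and the names `anneal_search`, `neighbours` and `score` are all assumptions.

```python
import math
import random


def anneal_search(seed, neighbours, score,
                  t_start=1.0, t_end=0.01, n_steps=3000, rng=None):
    """Grow or shrink a vertex set around `seed` to maximise `score`.
    Assumed design: geometric cooling from t_start to t_end over n_steps."""
    rng = rng or random.Random(0)
    seed = frozenset(seed)
    current = set(seed)
    current_score = score(current)
    best, best_score = set(current), current_score
    cooling = (t_end / t_start) ** (1.0 / max(n_steps - 1, 1))
    temperature = t_start

    for _ in range(n_steps):
        # Vertices adjacent to the current set can be added;
        # non-seed members can be removed.
        frontier = set().union(*(neighbours.get(v, set()) for v in current)) - current
        removable = current - seed
        pool = sorted(frontier | removable)
        if not pool:
            break
        vertex = rng.choice(pool)
        proposal = current ^ {vertex}  # toggle membership of the chosen vertex
        proposal_score = score(proposal)
        delta = proposal_score - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / T).
        if delta > 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = proposal, proposal_score
            if current_score > best_score:
                best, best_score = set(current), current_score
        temperature *= cooling
    return best, best_score


# Toy problem (made up): members of `good` add 1 to the score, others subtract 1.
good = {"A", "B", "C"}
toy_neighbours = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "X"}, "X": {"C"}}
toy_best, toy_best_score = anneal_search(
    {"A"}, toy_neighbours, lambda s: sum(1 if v in good else -1 for v in s))
```

For brevity the sketch does not check that removing a vertex keeps the group connected, which a search for linked sub-networks would need to enforce.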
Figure 3. Distribution of active TFs over the three conditions. Of the 142 TFs in total, 88, 76 and 75 are identified as active under the Cell Cycle, Diauxic Shift and DNA Damaging conditions, respectively.
Conditional co-regulated complex seeds under three conditions
| | YHR047C,YHR128W, | 0.13 | 1.93 | 1.92 | 500.40.20 | F | T | T |
| YOL108C | | 0.71 | 3.17 | 2.08 | 170 | T | T | T |
| | | 3.18 | -0.49 | 0.25 | 550.1.212 | T | F | F |
| | YER159C, | 1.90 | 0.82 | 0.80 | 550.1.196 | T | F | F |
| YDR501W | | 2.55 | 3.51 | 1.65 | 550.1.109 | T | F | F |
| YEL009C | YOR108W, | 1.93 | 2.31 | -0.4 | 550.2.327 | T | T | T |
| | YKR070W, | 2.74 | 1.12 | 0.06 | 360.10.10 | T | F | F |
| | | 2.42 | 3.07 | 1.92 | 420.3 | T | T | T |
| YDL056W | | 4.48 | 1.29 | 2.08 | 475.05 | T | T | T |
| YKL062W | | 2.55 | 2.37 | -1.1 | 550.1.29 | T | T | T |
| | | 2.03 | 2.08 | 1.91 | 310 | T | T | T |
| YDL056W | | 5.21 | 1.95 | -0.6 | 550.1.202 | T | T | T |
| | | 4.30 | 4.64 | 2.36 | 360.10.20 | T | T | T |
| | | 3.35 | 0.56 | -0.1 | 470.30.10 | T | F | F |
| YML007W | YGR209C,YLR043C,YLR109W, | 3.04 | 2.36 | 0.58 | 550.1.41 | T | T | T |
| YKL112W | | 3.58 | 1.91 | 1.48 | 500.60.10 | T | T | T |
| | YEL037C, | 3.03 | 3.15 | 1.95 | 550.1.41 | T | T | T |
| | YNL255C,YNR038W, | 7.84 | 5.22 | 4.54 | 550.1.149 | T | T | T |
| | | 3.05 | 1.35 | 1.00 | 320 | T | F | F |
| YNL216W | YEL054C, | 3.78 | 1.5 | 0 | 500.40.10 | T | F | F |
| YPR104C | YEL054C, | 3.78 | 1.5 | 0 | 500.40.10 | T | F | F |
| YER111C | | 3.26 | 2.38 | 1.89 | 320 | T | F | F |
| YEL009C | | 2.83 | 1.96 | 0.22 | 550.1.195 | T | T | T |
| YGL073W | YAL005C,YLL024C, | 11.1 | 5.75 | -0.4 | 550.2.360 | T | T | T |
| YBR049C | | 4.31 | 1.12 | 1.92 | 510.1 | T | T | T |
| YBR049C | YDL213C,YGL120C, | 2.75 | 0.88 | 1.91 | 550.1.103 | T | T | T |
| | YBL099W,YDR298C, | 6.35 | 0.67 | 4.09 | 420.3 | T | F | T |
| | | 3.03 | 2.79 | 1.91 | 550.1.213 | T | T | T |
| | YDL141W,YDR412W, | 3.76 | 1.91 | 1.03 | 550.1.125 | T | T | F |
| YBR049C | | 3.49 | 1.25 | 2.57 | 500.40.10 | T | T | T |
Figure 4. Distribution histograms of T(L2edges_random) over 10000 random samplings, and the corresponding expression profiles of the complex seed consisting of YDL156W, YAR007C and YJL115W with 2 PPI edges. (a) Cell Cycle, (b) DNA Damaging, (c) Diauxic Shift. In the left panels, the X axis is the T() score in intervals of 0.1 and the Y axis is the frequency of T() in each interval. The right panels show the expression profiles of YDL156W, YAR007C and YJL115W under the corresponding conditions.
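A null distribution of the kind shown in Figure 4 can be approximated by scoring many randomly drawn gene groups and binning the scores in intervals of 0.1. The definition of T() is not recoverable from this text, so the sketch below substitutes the mean pairwise Pearson correlation as a stand-in statistic; that substitution, the function names, and the omission of the constraint on the number of PPI edges are all assumptions.

```python
import math
import random
from collections import Counter
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def mean_pairwise_corr(profiles):
    """Stand-in for the paper's T() score (an assumption): average correlation
    over all pairs of profiles in the group."""
    pairs = list(combinations(profiles, 2))
    return sum(pearson(p, q) for p, q in pairs) / len(pairs)


def random_null(expression, group_size, n_samples=10000, rng=None):
    """Score `n_samples` random gene groups of the given size."""
    rng = rng or random.Random(0)
    genes = sorted(expression)
    return [mean_pairwise_corr([expression[g] for g in rng.sample(genes, group_size)])
            for _ in range(n_samples)]


def histogram(values, width=0.1):
    """Count values per interval, keyed by the interval's lower edge."""
    return Counter(round(math.floor(v / width) * width, 10) for v in values)


def empirical_p(observed, null):
    """Fraction of null scores at least as large as the observed one (add-one smoothed)."""
    return (1 + sum(v >= observed for v in null)) / (1 + len(null))


# Toy expression matrix (made up): five genes, four time points.
toy_expression = {
    "g1": [1, 2, 3, 4], "g2": [2, 3, 5, 6], "g3": [4, 3, 2, 1],
    "g4": [1, 3, 2, 4], "g5": [2, 2, 3, 1],
}
toy_null = random_null(toy_expression, group_size=3, n_samples=100)
toy_hist = histogram(toy_null)
```

Comparing a seed's observed score against `toy_null` with `empirical_p` gives a rough significance estimate; a faithful reproduction would also restrict the random groups to the same number of PPI edges as the seed.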
Figure 5. The variation of Score( (a) Tstart=2, Tend=0.01, N=3000; (b) Tstart=1, Tend=0.01, N=3000; (c) Tstart=0.05, Tend=0.01, N=3000.
Figure 6. The topology of the seed and the final result for Hsf1 (YGL073W). The upper panel (a) shows the topology of the seed and of the final result of the extensive search. Hexagons are nodes added in the final result and circles are nodes in the seed. The lower panels (b) and (c) show the expression profiles of the coding genes of the seed proteins and of the added proteins in the final result under the Cell Cycle and DNA Damaging conditions, respectively.
Figure 7. Conserved motifs found in the seed and predicted target genes. Two motifs were predicted by MEME [36] from the 600 bp upstream regions of the five coding genes in the seed. The first motif is consistent with a known consensus motif (GAAXXTTCXXGAA) for Hsf1 in TRANSFAC.
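Checking an upstream region against a consensus such as GAAXXTTCXXGAA can be sketched with a regular expression in which each X matches any nucleotide. This is a plain string scan for illustration, not a replacement for MEME or TRANSFAC; the function names and the test sequence are made up, and only the forward strand is scanned.

```python
import re


def consensus_to_regex(consensus, wildcard="X"):
    """Turn a consensus string into a regex, treating `wildcard` as any base."""
    return "".join("[ACGT]" if ch == wildcard else ch for ch in consensus.upper())


def find_motif(sequence, consensus):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [match.start() for match in pattern.finditer(sequence.upper())]


# Made-up upstream fragment containing one instance of the Hsf1 consensus.
toy_upstream = "TTGAAGCTTCAAGAATT"
toy_hits = find_motif(toy_upstream, "GAAXXTTCXXGAA")
```

Scanning the reverse complement as well would be needed to catch sites on the opposite strand.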
Predicted TRs (extra members) for perfect complexes
| Condition | TF | Predicted Target Genes |
|---|---|---|
| c2,c3 | | YLR441C, YIL148W |
| c1 | | YOL090W, YPL153C, YDR097C, YER095W, YMR078C, YNL312W, YMR200W, YPR080W, YJL173C |
| c1 | | YML063W1, YGL048C2, YMR078C3, YOL012C3, YBR111W-A3 |
| c1,c2,c3 | | YEL024W, YJR121W, YGR183C |
| c1,c2,c3 | | YDR394W, YOR261C, YIL075C, YMR276W |
| c1 | | YKL178C |
| c1,c2,c3 | | YLR421C1, YFR052W1, YOL041C2, YER006W2, YGR103W2, YNL061W2, YER126C2, YMR128W2, YKL009W2, YDL150W3 |
| c1,c3 | | YKR065C, YCR012W |
| c1,c2 | | YPL093W, YLR222C |
| c1,c2,c3 | | YNL178W |