Haoyu Cheng, Lihua Jiang, Maoying Wu, Qi Liu.
Abstract
Combining heterogeneous data sources to reliably predict transcriptional regulation remains a challenge. Here we present a simple but powerful method to integrate chromatin immunoprecipitation (ChIP)-chip and knock-out data. Because these two types of data provide complementary (physical and functional) information about transcription, a method that combines them is expected to achieve high detection rates and very low false positive rates. We seek the optimal integration of the two data types using the hypergeometric distribution. We evaluate our method on yeast data and compare our predictions with YEASTRACT, high-quality ChIP-chip data, and the literature. The results show that, even when using low-quality ChIP-chip data, our method uncovers more relations than were previously inferred from high-quality data. Furthermore, our method achieves a low false positive rate. We find experimental and computational evidence in the literature for most of the transcription factor (TF)-gene relations uncovered by our method.
Keywords: ChIP; P-value threshold; cooperativity; hypergeometric distribution; knock-out data; regulatory interaction; transcription factor
Year: 2009 PMID: 20140075 PMCID: PMC2808186 DOI: 10.4137/bbi.s3445
Source DB: PubMed Journal: Bioinform Biol Insights ISSN: 1177-9322
Figure 1. Schematic diagram of the method. The method starts from ChIP binding data and TF knockout data (the data sources shown on the left). For each TF, two thresholds are selected: one for the ChIP binding data and one for the TF deletion data. A gene whose binding P value is less than the binding threshold is considered a binding target. Similarly, a gene whose effect P value in a deletion experiment is less than its assigned threshold is defined as an affected target. Both thresholds range from 0.001 to 0.05 in increments of 0.001. An overlapping significance is calculated from the binding target set, the affected target set, and their intersection (the intersecting ovals in the middle). This process is repeated for all combinations of thresholds, and the combination giving the maximal overlapping significance is retained (procedures and formulas are shown on the right).
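The threshold search described in the Figure 1 caption can be sketched in a few lines. The exact "overlapping significance" formula appears only in the figure, so this sketch assumes it is the upper-tail hypergeometric P-value of the overlap (a smaller P-value means a higher significance) and that the background is every gene present in both data sets. The names `hypergeom_sf`, `best_threshold_pair`, and `GRID` are hypothetical, and the toy data are invented for illustration only.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """Upper-tail hypergeometric P(X >= k).

    X counts the overlap when n affected targets are drawn from N background
    genes, K of which are binding targets.
    """
    upper = min(K, n)
    if k > upper:
        return 0.0
    # Exact integer arithmetic; math.comb returns 0 when n - i exceeds N - K.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Threshold grid from the Figure 1 caption: 0.001 to 0.05 in steps of 0.001.
GRID = [i / 1000 for i in range(1, 51)]


def best_threshold_pair(binding_p: dict[str, float], effect_p: dict[str, float]):
    """Return ((Pb*, Pe*), p) minimizing the overlap P-value over GRID x GRID.

    Hypothetical helper: the background is the set of genes in both inputs.
    """
    genes = set(binding_p) & set(effect_p)
    N = len(genes)
    # Affected sets depend only on Pe, so build them once.
    affected_at = {pe: {g for g in genes if effect_p[g] < pe} for pe in GRID}
    best_pair, best_p = None, float("inf")
    for pb in GRID:
        bound = {g for g in genes if binding_p[g] < pb}
        for pe in GRID:
            affected = affected_at[pe]
            k = len(bound & affected)
            p = hypergeom_sf(k, N, len(bound), len(affected))
            if p < best_p:  # strict: ties keep the earliest (most stringent) pair
                best_pair, best_p = (pb, pe), p
    return best_pair, best_p


# Toy data: 100 genes; g0-g9 are affected by the deletion, g0-g4 bind strongly
# and g5-g9 bind only weakly (P = 0.02).
binding = {f"g{i}": (0.0005 if i < 5 else 0.02 if i < 10 else 0.5) for i in range(100)}
effect = {f"g{i}": (0.0005 if i < 10 else 0.5) for i in range(100)}
pair, p = best_threshold_pair(binding, effect)
print(pair)  # (0.021, 0.001): relaxing Pb just past 0.02 captures g5-g9
```

Scanning Pb in the outer loop and using a strict comparison means that, among equally significant pairs, the one with the smallest binding threshold (then the smallest deletion threshold) is kept. For genome-scale inputs, a library routine such as `scipy.stats.hypergeom.sf` would be faster than the exact integer sum used here.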
Figure 2. Comparison with YEASTRACT. For each of the 30 TFs, the figure shows the number of target genes identified with the stringent P-value threshold pair (Pb = 0.001, Pe = 0.001) (blue), the number of target genes inferred by our method with the optimal threshold pair (Pb*, Pe*) (green), and the number of our predictions supported by YEASTRACT (red).
Figure 3. Comparison with high-quality ChIP-chip data. Oval nodes denote genes identified with stringent P-value cutoffs (Pb = 0.001, Pe = 0.001); rectangular nodes denote additional genes identified with the optimal relaxed threshold pair found by our method. Nodes with a solid red border denote relations supported by YEASTRACT; all other nodes have a dashed black border. Solid nodes denote genes supported by high-quality ChIP-chip data. A) 126 target genes identified for RAP1. Of the 56 additional target genes (rectangular), 51 (rectangular with solid red border) are supported by YEASTRACT and 34 (solid rectangular) are supported by high-quality ChIP-chip data. B) 48 target genes identified for SWI4, including SWI4 itself; SWI4-SWI4 self-regulation is shown by the arrow pointing back to SWI4. Among SWI4 and the 37 other additional target genes identified with the optimal relaxed threshold pair (rectangular), 28 (rectangular with solid red border) plus SWI4 are supported by YEASTRACT, and 16 (solid rectangular) plus SWI4 are supported by high-quality ChIP-chip data.
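The counts in the Figure 3 caption can be turned into support rates for the targets added by the relaxed thresholds. This is a minimal sketch under one stated assumption: SWI4's self-target is counted among its additional genes, so SWI4 has 37 + 1 = 38 additional targets, of which 28 + 1 = 29 are supported by YEASTRACT and 16 + 1 = 17 by high-quality ChIP-chip data. `FIG3_COUNTS` and `support_rates` are hypothetical names.

```python
from fractions import Fraction

# Counts transcribed from the Figure 3 caption. Assumption: SWI4's self-target
# is one of its additional (rectangular) genes.
FIG3_COUNTS = {
    "RAP1": {"total": 126, "additional": 56, "yeastract": 51, "chip_hq": 34},
    "SWI4": {"total": 48, "additional": 38, "yeastract": 29, "chip_hq": 17},
}


def support_rates(counts: dict[str, int]) -> dict[str, object]:
    """Split totals into stringent vs. additional targets and compute support rates."""
    extra = counts["additional"]
    return {
        "stringent": counts["total"] - extra,  # oval nodes: found at Pb = Pe = 0.001
        "yeastract_rate": Fraction(counts["yeastract"], extra),
        "chip_hq_rate": Fraction(counts["chip_hq"], extra),
    }


for tf, counts in FIG3_COUNTS.items():
    r = support_rates(counts)
    print(f"{tf}: {r['stringent']} stringent, "
          f"YEASTRACT {float(r['yeastract_rate']):.1%}, "
          f"high-quality ChIP {float(r['chip_hq_rate']):.1%} of additional targets")
```

Under these assumptions, 51 of 56 (91.1%) additional RAP1 targets and 29 of 38 (76.3%) additional SWI4 targets are supported by YEASTRACT, while high-quality ChIP-chip data support 60.7% and 44.7%, respectively.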
List of SWI6 targets with computational evidence.
| SWI6 | YBR071W | x | x | ||
| SWI6 | CHA1 | x | x | ||
| SWI6 | HTA1 | x | x | x | |
| SWI6 | YER079W | ||||
| SWI6 | PUP3 | x | x | x | |
| SWI6 | SWI4 | x | x | ||
| SWI6 | FTR1 | ||||
| SWI6 | CIS3 | ||||
| SWI6 | RPS4A | x | x | ||
| SWI6 | HMS2 | x | x | ||
| SWI6 | CWP2 | x | x | x | |
| SWI6 | EXG1 | x | x | x | |
| SWI6 | YOX1 | x | x | x | x |
| SWI6 | YMR144W | ||||
| SWI6 | SCW10 | x | x | x | x |
| SWI6 | PLB3 | ||||
| SWI6 | HTZ1 | ||||
| SWI6 | SKM1 | x | x | x | |
| SWI6 | SRL1 | x | x | x | x |
| SWI6 | YOR248W | ||||
| SWI6 | OPY2 | x | x | x | x |
List of DIG1 targets with computational evidence.
| DIG1 | UBC4 | x | x | x | |
| DIG1 | TEC1 | x | x | ||
| DIG1 | KAR4 | x | x | x | |
| DIG1 | YDR042C | ||||
| DIG1 | YDR210C-D | ||||
| DIG1 | MFA1 | ||||
| DIG1 | STE2 | x | x | x | |
| DIG1 | AGA2 | ||||
| DIG1 | BAR1 | x | x | x | |
| DIG1 | ARO7 | x | x | x | |
An ‘x’ marks an overlap between our prediction and the corresponding evidence source. The last column combines the three columns to its left, indicating whether there is any evidence from YEASTRACT, MacIsaac KD et al. (ref. 53), or Pham TH et al. (ref. 54).
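Because the combined last column is the logical OR of the three evidence columns, a target has some evidence exactly when at least one ‘x’ appears in its row. The sketch below counts such rows for the DIG1 table; `has_any_evidence` and `summarize` are hypothetical helper names, and the rows are copied from the DIG1 table.

```python
# Rows copied from the DIG1 table: TF, target gene, then evidence marks.
DIG1_ROWS = """\
| DIG1 | UBC4 | x | x | x | |
| DIG1 | TEC1 | x | x | ||
| DIG1 | KAR4 | x | x | x | |
| DIG1 | YDR042C | ||||
| DIG1 | YDR210C-D | ||||
| DIG1 | MFA1 | ||||
| DIG1 | STE2 | x | x | x | |
| DIG1 | AGA2 | ||||
| DIG1 | BAR1 | x | x | x | |
| DIG1 | ARO7 | x | x | x | |
"""


def has_any_evidence(row: str) -> bool:
    """True if any cell after the TF and target columns holds an 'x'."""
    cells = [cell.strip().lower() for cell in row.strip().strip("|").split("|")]
    return "x" in cells[2:]


def summarize(table: str) -> tuple[int, int]:
    """Return (rows with any evidence, total rows) for a pipe-delimited table."""
    rows = [line for line in table.splitlines() if line.strip()]
    supported = sum(has_any_evidence(r) for r in rows)
    return supported, len(rows)


supported, total = summarize(DIG1_ROWS)
print(f"DIG1: {supported}/{total} predicted targets have evidence")
```

For DIG1 this prints `DIG1: 6/10 predicted targets have evidence`; the same function applies unchanged to the SWI6 rows.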
List of some enriched GO annotations.
| Regulator function | Targets | Enriched GO term (annotated/analyzed targets) | P-value |
|---|---|---|---|
| High level transcriptional activation of genes encoding ribosomal proteins and glycolytic enzymes | 126 | (86/126) structural constituent of ribosome | 2.66E-100 |
| | | (91/126) translation | 8.79E-83 |
| Mitotic repression of middle sporulation-specific genes, general replication initiation | 45 | (16/45) sporulation | 9.93E-15 |
| Regulates the transcription of genes encoding enzymes involved in branched-chain amino acid synthesis | 10 | (6/10) branched chain family amino acid biosynthetic process | 1.45E-13 |
| Transcriptional regulator of early meiotic genes, transcriptional regulation of genes involved in arginine catabolism | 44 | (2/43) arginine catabolic process | 0.00571 |
| | | (7/43) meiosis | 0.00797 |
| Required for full Ty1 expression, Ty1-mediated gene activation | 20 | (17/20) transposition, RNA-mediated | 3.85E-25 |
| Involved in the regulation of arginine-responsive genes | 9 | (6/9) arginine metabolic process | 2.78E-13 |
| Controls expression of many ribosome biogenesis genes in response to nutrients and stress, regulates G2/M transitions during mitotic cell cycle and DNA-damage response | 66 | (42/66) structural constituent of ribosome | 8.62E-45 |
| | | (46/66) translation | 2.90E-39 |
| Involved in DNA damage and replication checkpoint pathway | 6 | (3/6) deoxyribonucleotide biosynthetic process | 2.18E-07 |
| Transcriptional activators of glycolytic genes | 50 | (10/50) glycolysis | 5.94E-14 |
| Involved in the expression of genes encoding enzymes acting in the histidine, purine, and pyrimidine biosynthetic pathways | 12 | (4/12) purine ribonucleoside monophosphate biosynthetic process | 1.28E-07 |
| Involved in regulation of arginine-responsive genes | 5 | (4/5) arginine biosynthetic process | 7.63E-10 |
| Involved in the regulation of mating-specific genes, inhibits pheromone-responsive transcription | 13 | (8/13) sexual reproduction | 1.89E-09 |
| | | (8/13) response to pheromone | 2.93E-10 |
| Overexpression confers hyperfilamentous growth | 23 | (15/23) cytosolic part | 1.02E-15 |
| Activates transcription of genes expressed in the G1 phase | 12 | (4/12) cytokinesis, completion of separation | 8.06E-08 |
| Activates the transcription of anti-oxidant genes in response to oxidative stress | 8 | (4/8) response to oxidative stress | 5.32E-05 |
| Negative regulation of phospholipid biosynthetic genes | 3 | (2/3) fatty acid synthase complex | 2.54E-06 |
| Cytosolic and nuclear protein involved in osmotic and oxidative stress responses | 9 | (2/9) structural constituent of cell wall | 0.00094 |
| Transcriptional activator in Ty1-mediated gene expression, binds E-boxes of glycolytic genes and contributes to their activation | 24 | (9/24) transposition, RNA-mediated | 7.74E-08 |
| | | (4/24) glycolysis | 9.21E-05 |
| Involved in transcriptional regulation in response to galactose | 6 | (4/6) galactose metabolic process | 1.86E-09 |
| Transcriptional activators of glycolytic genes | 6 | (6/6) glycolysis | 5.08E-14 |
| Involved in sterol uptake; involved in induction of hypoxic gene expression | 13 | (3/13) structural constituent of cell wall | 1.37E-05 |
| Required for derepression of phospholipid biosynthetic genes in response to inositol depletion | 5 | (4/5) lipid biosynthetic process | 2.77E-05 |
Functional descriptions of the regulators are taken from the Saccharomyces Genome Database (SGD).
Gene Ontology analysis was performed with GO Term Finder in SGD on Aug 31, 2008, using a background set of 5952 genes and a P-value cutoff of 0.01.
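GO Term Finder is generally described as scoring each term with a hypergeometric test. As a sketch under that assumption (not a reproduction of the reported P-values, because the number of background genes annotated to each term is not listed here), the probability of observing at least k annotated genes among a regulator's n targets is:

$$
P(X \ge k) = \sum_{i=k}^{\min(n,\,M)} \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}}
$$

where N = 5952 is the background set size, n is the number of targets analyzed for the regulator (for example, n = 126 in the first row), k is the number of those targets annotated to the term (k = 86 for structural constituent of ribosome), and M is the number of background genes annotated to the term.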