Abstract
In eukaryotic cells, transcriptional regulation of gene expression is usually achieved by cooperative transcription factors (TFs). Therefore, identifying cooperative TFs is the first step toward uncovering the molecular mechanisms of gene expression regulation. Many algorithms based on different rationales have been proposed to predict cooperative TF pairs in yeast. Although various rationales have been used in the existing algorithms, functional coherence has not yet been used. This prompted us to develop a new algorithm that identifies cooperative TF pairs in yeast based on the functional coherence and similarity of their target gene sets. The proposed algorithm predicted 40 cooperative TF pairs. Three of them (Pdc2-Thi2, Hot1-Msn1 and Leu3-Met28) are novel predictions that no existing algorithm has made. Strikingly, two of these three novel predictions (Pdc2-Thi2 and Hot1-Msn1) have been experimentally validated, demonstrating the power of the proposed algorithm. Moreover, we show that the predictions of the proposed algorithm are more biologically meaningful than those of 17 existing algorithms under four evaluation indices. In summary, our study suggests that new algorithms based on novel rationales are worth developing for detecting previously unidentifiable cooperative TF pairs.
Year: 2016 PMID: 27623007 PMCID: PMC5021274 DOI: 10.1371/journal.pone.0162931
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
The rationales of 17 existing algorithms.
| Authors | The rationale of the existing algorithm for predicting cooperative TF pairs (CTFPs) | # of predicted CTFPs |
|---|---|---|
| Banerjee and Zhang [ | For a CTFP, the genes bound by both TFs should be more co-expressed than are the genes bound by either TF alone. | 31 |
| Harbison et al. [ | For a CTFP, their binding sites should co-occur more often within the same promoters than would be expected by chance. | 94 |
| Nagamine et al. [ | For a CTFP, the genes bound by both TFs should be closer in the protein-protein interaction network than are the genes bound by either TF alone. | 24 |
| Tsai et al. [ | For a CTFP, their interaction effect (estimated using ANOVA) should significantly influence the expression of genes bound by both TFs. | 18 |
| Chang et al. [ | A stochastic system model is developed to assess TF cooperativity. | 55 |
| He et al. [ | The multivariate statistical method ANOVA is used to test whether the expression of the target genes is significantly influenced by the cooperative effect of their TFs. | 30 |
| Wang [ | Pairwise mixed graphical models or Gaussian graphical models are used for identifying combinatorial regulation of TFs. | 14 |
| Yu et al. [ | An algorithm called Motif-PIE is developed for predicting interacting TF pairs based on the co-occurrence of their binding motifs and the distance between the motifs in promoter sequences. | 300 |
| Elati et al. [ | A data mining technique called LICORN is developed for deriving cooperative regulations. | 20 |
| Datta and Zhao [ | Log-linear models are used to study cooperative bindings among TFs. | 25 |
| Chuang et al. [ | For a CTFP, the distance between their binding sites (in the promoters of their common target genes) should be significantly shorter than expected by chance. | 13 |
| Wang et al. [ | A Bayesian network framework is presented to reconstruct a high-confidence whole-genome map of transcriptional cooperativity in Saccharomyces cerevisiae by integrating a comprehensive list of 15 genomic features. | 159 |
| Yang et al. [ | CTFPs are predicted by identifying the most statistically significant overlap of target genes regulated by two TFs in ChIP-chip data and TF knockout data. | 186 |
| Chen et al. [ | A method called simTFBS is developed for inferring TF-TF interactions by incorporating motif discovery as a fundamental step when detecting overlapping targets of TFs based on ChIP-chip data. | 221 |
| Lai et al. [ | For a CTFP, (i) the two TFs should have a significantly higher number of common target genes than random expectation and (ii) their binding sites (in the promoters of their common target genes) should tend to be co-depleted of nucleosomes in order to make these binding sites simultaneously accessible to TF binding. | 27 |
| Wu and Lai [ | For a CTFP, the overlap of the targets (defined by TF binding and TF perturbation data) of these two TFs should be higher than random expectation. | 50 |
| Spivak and Stormo [ | For a CTFP, the distribution of nucleotide spacings between their binding sites should deviate significantly from random expectation. | 1399 |
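Several rationales in the table above (for example, criterion (i) of Lai et al. and the target-overlap criterion of Wu and Lai) ask whether two TFs share more target genes than random expectation. One common way to quantify this is a one-sided hypergeometric test. The sketch below assumes that null model, which the table does not specify for any individual algorithm; the function names `overlap_pvalue` and `expected_overlap` and the gene counts are hypothetical.

```python
from math import comb


def overlap_pvalue(n_genes: int, n_targets1: int, n_targets2: int, n_common: int) -> float:
    """One-sided hypergeometric p-value P(X >= n_common).

    Assumed null model: TF2's n_targets2 targets are drawn at random, without
    replacement, from n_genes genes, of which n_targets1 are TF1 targets.
    """
    total = comb(n_genes, n_targets2)
    upper = min(n_targets1, n_targets2)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(n_targets1, k) * comb(n_genes - n_targets1, n_targets2 - k)
        for k in range(n_common, upper + 1)
    )
    return tail / total


def expected_overlap(n_genes: int, n_targets1: int, n_targets2: int) -> float:
    """Mean of the hypergeometric null: the overlap expected by chance."""
    return n_targets1 * n_targets2 / n_genes


# Hypothetical counts, for illustration only.
print(expected_overlap(6000, 100, 80))
print(overlap_pvalue(6000, 100, 80, 10))
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow in the individual terms; only the final ratio is converted to a float.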
Fig 1. The proposed two-step procedure of calculating the cooperativity score of a TF pair (TF1-TF2).
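This excerpt only names the two-step procedure; it does not give its formulas. Purely as an illustration of how the two rationales in the abstract (similarity of target gene sets and functional coherence) might be combined, the sketch below assumes Jaccard similarity for both steps. Functional coherence is proxied by the overlap of GO terms associated with each TF's targets, and the two values are multiplied. All names (`jaccard`, `cooperativity_score`), inputs and the combination rule are hypothetical and are not the paper's method.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cooperativity_score(
    targets1: set[str],
    targets2: set[str],
    go_terms1: set[str],
    go_terms2: set[str],
) -> float:
    # Step 1 (assumed): similarity of the two TFs' target gene sets.
    target_similarity = jaccard(targets1, targets2)
    # Step 2 (assumed): functional coherence, proxied here by shared GO terms
    # annotated to each TF's targets.
    functional_coherence = jaccard(go_terms1, go_terms2)
    # Assumed combination: a product, so a pair scores high only if both are high.
    return target_similarity * functional_coherence


# Hypothetical inputs, for illustration only.
score = cooperativity_score(
    {"g1", "g2", "g3"}, {"g2", "g3", "g4"}, {"GO:a", "GO:b"}, {"GO:a", "GO:b"}
)
print(score)  # 0.5
```

A product is one simple way to require both criteria at once; a weighted sum or a rank-based combination would be equally plausible choices under these assumptions.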
Five types of validation of the 40 predicted cooperative TF pairs (PCTFPs) from the proposed algorithm.
| PCTFP: TF1 | PCTFP: TF2 | Algorithm Evidence | Physical/Genetic Evidence | Co-citations | # of Common GO Terms | # of Common Targets |
|---|---|---|---|---|---|---|
| Arg80 | Arg81 | 7 | 5 | 47 | 5 | 8 |
| Ifh1 | Sfp1 | 1 | 0 | 16 | 4 | 82 |
| Met28 | Met31 | 2 | 1 | 29 | 8 | 11 |
| Hap2 | Hap4 | 5 | 3 | 100 | 8 | 18 |
| Met32 | Met4 | 5 | 6 | 43 | 6 | 30 |
| Met31 | Met32 | 6 | 8 | 54 | 14 | 18 |
| Hap3 | Hap5 | 5 | 5 | 65 | 10 | 4 |
| Met31 | Met4 | 5 | 5 | 42 | 6 | 14 |
| Met28 | Met4 | 3 | 7 | 35 | 10 | 13 |
|  |  | 0 | 0 | 6 | 5 | 2 |
| Met28 | Met32 | 3 | 0 | 33 | 9 | 14 |
| Mig1 | Mig2 | 3 | 7 | 67 | 14 | 4 |
| Ifh1 | Rap1 | 1 | 1 | 22 | 4 | 105 |
| Gcr1 | Gcr2 | 3 | 10 | 26 | 7 | 8 |
| Hap3 | Hap4 | 1 | 3 | 93 | 8 | 7 |
| Fhl1 | Ifh1 | 1 | 7 | 33 | 7 | 26 |
| Rap1 | Sfp1 | 7 | 0 | 36 | 6 | 113 |
| Hap2 | Hap3 | 3 | 4 | 118 | 9 | 7 |
| Hap4 | Hap5 | 3 | 2 | 59 | 8 | 6 |
| Aft1 | Aft2 | 5 | 9 | 63 | 6 | 15 |
| Stp1 | Stp2 | 2 | 5 | 40 | 9 | 2 |
| Mbp1 | Swi6 | 13 | 12 | 147 | 7 | 14 |
|  |  | 0 | 0 | 13 | 3 | 2 |
| Gal4 | Gal80 | 3 | 34 | 185 | 6 | 2 |
| Gcr2 | Tye7 | 1 | 5 | 7 | 3 | 6 |
| Pdr1 | Pdr3 | 5 | 12 | 187 | 10 | 30 |
| Dal81 | Stp2 | 1 | 1 | 17 | 3 | 2 |
| Ino2 | Ino4 | 6 | 13 | 117 | 10 | 10 |
| Cbf1 | Met4 | 5 | 5 | 39 | 7 | 24 |
| Ace2 | Swi5 | 12 | 3 | 99 | 9 | 30 |
| Oaf1 | Pip2 | 4 | 6 | 57 | 13 | 13 |
| Cbf1 | Met32 | 5 | 1 | 38 | 8 | 23 |
| Dal80 | Dal81 | 1 | 0 | 28 | 7 | 4 |
| Dal81 | Gln3 | 2 | 0 | 23 | 7 | 9 |
| Msn2 | Sok2 | 2 | 2 | 36 | 6 | 150 |
| Ste12 | Tec1 | 6 | 12 | 114 | 9 | 171 |
| Msn2 | Yap1 | 3 | 2 | 114 | 7 | 143 |
|  |  | 0 | 0 | 13 | 6 | 3 |
| Swi4 | Swi6 | 14 | 29 | 256 | 7 | 21 |
| Bas1 | Pho2 | 3 | 7 | 52 | 6 | 7 |
A PCTFP in boldface is a novel CTFP predicted by the proposed algorithm. “Algorithm Evidence” gives the number of existing algorithms that predict the PCTFP. “Physical/Genetic Evidence” gives the number of experimental papers suggesting that the two TFs of the PCTFP interact physically or genetically. “Co-citations” gives the number of experimental papers that study the biological roles of both TFs of the PCTFP. More details are available at http://cosbi2.ee.ncku.edu.tw/40TFI/.
Fig 2. The performance comparison of the proposed algorithm and 17 existing algorithms in the literature.
Performance comparison of the proposed algorithm and 17 existing algorithms using four existing evaluation indices. The comparison results using (a) index 1, (b) index 2, (c) index 3 and (d) index 4 are shown, where Rj means that the algorithm is ranked j among the 18 compared algorithms. For example, the proposed algorithm ranks first (R1) under evaluation index 4 because it has the largest index 4 score. (e) The average rank summarizes the overall performance of an algorithm across the four evaluation indices; it is the mean of the algorithm's ranks under those four indices. For example, the average rank of the proposed algorithm is 1.5 = (2+2+1+1)/4 and the average rank of WangY’s algorithm is 4 = (1+4+6+5)/4. The smaller the average rank, the better the overall performance. Since the proposed algorithm has the smallest average rank, its overall performance is the best among all 18 compared algorithms.
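The ranking and averaging described in this caption can be sketched as follows. The sketch assumes that a higher index score means a better rank (as in the index 4 example) and breaks ties by input order, which the caption does not address; the function names are hypothetical. The usage example reproduces the two average ranks quoted in the caption.

```python
def ranks_by_score(scores: dict[str, float]) -> dict[str, int]:
    """Rank algorithms under one evaluation index: the largest score gets rank 1.

    Ties keep their input order (an assumption; the caption does not cover ties).
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


def average_rank(ranks_per_index: list[dict[str, int]]) -> dict[str, float]:
    """Mean rank of each algorithm across all evaluation indices."""
    names = ranks_per_index[0]
    return {
        name: sum(ranks[name] for ranks in ranks_per_index) / len(ranks_per_index)
        for name in names
    }


# Ranks quoted in the caption for two of the 18 algorithms (indices 1 to 4).
quoted_ranks = [
    {"proposed": 2, "WangY": 1},
    {"proposed": 2, "WangY": 4},
    {"proposed": 1, "WangY": 6},
    {"proposed": 1, "WangY": 5},
]
print(average_rank(quoted_ranks))  # {'proposed': 1.5, 'WangY': 4.0}
```

In a full comparison, `ranks_by_score` would be applied once per index to the scores of all 18 algorithms, and the four resulting rank dictionaries passed to `average_rank`.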
Fig 3. Robustness analysis of the proposed algorithm.
The average rank of the proposed algorithm when the top N TF pairs of the ranked list of 11325 TF pairs are taken as its PCTFPs, where (a) N = 30, (b) N = 35, (c) N = 45 and (d) N = 50. Regardless of the value of N, the proposed algorithm always has the smallest average rank. That is, the PCTFPs from the proposed algorithm are always more biologically meaningful than those from the 17 existing algorithms. This suggests that the proposed algorithm is robust to the number of chosen PCTFPs.
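A ranked list of 11325 TF pairs is consistent with scoring every unordered pair of 151 TFs, since 151 * 150 / 2 = 11325; the caption does not state the TF count, so treat that as an inference. The robustness check then re-selects the top N pairs for each cutoff before re-running the four evaluation indices. A minimal sketch of those two steps, with hypothetical function names and a toy score dictionary standing in for the algorithm's real scores:

```python
from itertools import combinations
from math import comb


def all_tf_pairs(tfs: list[str]) -> list[tuple[str, str]]:
    """Every unordered TF pair; n TFs yield n * (n - 1) / 2 pairs."""
    return list(combinations(sorted(tfs), 2))


def top_n_pairs(scores: dict[tuple[str, str], float], n: int) -> list[tuple[str, str]]:
    """The n highest-scoring pairs, taken as the PCTFPs for cutoff N."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


print(comb(151, 2))  # 11325

# Hypothetical scores; a real run would score all pairs with the algorithm itself.
toy_scores = {("A", "B"): 0.2, ("A", "C"): 0.9, ("B", "C"): 0.5}
for n in (1, 2, 3):
    # For each cutoff, the selected pairs would be passed to the four evaluation indices.
    print(n, top_n_pairs(toy_scores, n))
```

With the paper's actual ranked list, the loop would run over N = 30, 35, 40, 45 and 50 instead of the toy cutoffs.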
Fig 4. The scores of the four evaluation measures (shown in (a), (b), (c) and (d)) for different values of top N (N = 30, 35, 40, …) chosen for the proposed algorithm.