
Identifying a Transcription Factor's Regulatory Targets from its Binding Targets.

Fred Lai, Julie S Chang, Wei-Sheng Wu.

Abstract

ChIP-chip data, which show the binding of transcription factors (TFs) to promoter regions in vivo, are widely used by biologists to identify the regulatory targets of TFs. However, the binding of a TF to a gene does not necessarily imply regulation. Thus, it is important to develop computational methods that can extract a TF's regulatory targets from its binding targets. We developed a method, called REgulatory Targets Extraction Algorithm (RETEA), which uses partial correlation analysis on gene expression data to extract a TF's regulatory targets from its binding targets inferred from ChIP-chip data. We applied RETEA to yeast cell cycle microarray data and identified the plausible regulatory targets of eleven known cell cycle TFs. We validated our predictions by checking the enrichments for cell cycle-regulated genes, common cellular processes and common molecular functions. Finally, we showed that RETEA performs better than three published methods (MA-Network, TRIA and Garten et al's method).


Keywords:  ChIP-chip data; binding targets; regulatory targets; transcription factors

Year:  2010        PMID: 21245946      PMCID: PMC3020039          DOI: 10.4137/GRSB.S6458

Source DB:  PubMed          Journal:  Gene Regul Syst Bio        ISSN: 1177-6250


Introduction

A cell responds to environmental and physiological changes through reorganization of genomic expression. This kind of regulation is realized by transcriptional regulatory networks (TRNs), which are mainly controlled by transcription factors (TFs). Therefore, identifying the sophisticated architecture of TRNs would reveal the fundamental aspects of the mechanisms involved in the maintenance of life and adaptation to new environments.1–5 The first step toward reconstructing TRNs is to identify the target genes of known TFs.6–10 Genome-wide transcription factor binding analysis, also called ChIP-chip analysis, was developed to fulfill this goal.11,12 ChIP-chip analysis can be used to identify physical interactions between TFs and the promoter regions to which they bind. Simon et al13 performed ChIP-chip experiments to identify the binding targets of nine major cell cycle TFs. Lee et al14 performed ChIP-chip experiments to investigate how 106 yeast TFs bind to promoter sequences across the genome. Harbison et al15 conducted genome-wide transcription factor binding assays for 203 TFs in yeast to construct an initial map of the yeast’s transcriptional regulatory code. All three studies are experiment-based and provide direct evidence of TF-promoter binding relationships. However, TF-promoter binding relationships are not equivalent to TF-gene regulatory relationships. A TF may bind to the promoter of a gene but have no regulatory effect on that gene’s expression. Hence, additional information is required to resolve this ambiguity inherent in ChIP-chip data. Gene expression data have been widely used for this purpose. Exploiting the additional information provided by gene expression data, several algorithms have been developed to identify a TF’s regulatory targets from its binding targets (inferred from the ChIP-chip analysis).
For instance, Garten et al’s method6 used co-expression analysis, MA-Network9 used multivariate regression analysis, and TRIA7 used time-lagged correlation analysis on gene expression data to classify a TF’s binding targets (inferred from the ChIP-chip analysis) into regulatory and non-regulatory targets. In this paper, we develop a new method, called REgulatory Targets Extraction Algorithm (RETEA), which applies partial correlation analysis between a TF and all those pairs of its binding targets that are highly co-expressed. Partial correlation analysis has been widely used to determine whether the association between two variables is due to the effect of a third variable.16,17 Here partial correlation is used to measure the residual correlation between two co-expressed binding targets of a TF after removing the TF’s regulatory effect. Low partial correlation means that the co-expression between the two binding targets of the TF is mainly due to that TF’s regulatory effect. That is, this co-expressed pair of binding targets can be regarded as a co-regulated pair of the TF. Therefore, RETEA assigns a pair of the TF’s binding targets as the TF’s regulatory targets if these two binding targets have high correlation but low partial correlation. The flowchart of RETEA is shown in Figure 1.
Figure 1

The flowchart of RETEA.

Note: In the figure, g1 to g5 represent the five binding targets of Abf1. Among them, only g1, g2 and g3 are identified by RETEA as the regulatory targets of Abf1.

Methods

Datasets

Four data sources were used in this study. First, the ChIP-chip data of the cell cycle TFs in the rich media growth condition were downloaded from Harbison et al’s paper.15 Second, the gene expression data of the yeast cell cycle process were downloaded from Pramila et al’s paper.18 Samples for all genes in the yeast genome were collected every 5 minutes for 25 time points, which cover two cell cycles. Third, the mutant data of the TFs under study were downloaded from Hu et al’s paper.8 They grew each of 263 TF knockout strains as replicates and compared the mRNA expression of each strain with that of a wild-type strain using microarrays to identify the target genes whose expression profiles are affected when a TF has been knocked out. Fourth, the genome-wide distribution of the high-confidence TFBSs of many TFs in yeast was downloaded from MacIsaac et al’s paper.19 The high-confidence TFBSs were derived by using six motif discovery methods, with the requirement for conservation across at least two of four related yeast species.

REgulatory Targets Extraction Algorithm (RETEA)

We first define B+ as the set of genes that are significantly bound by a TF. Three previous papers13–15 used a statistical error model to assign a P-value to the binding relationship of a TF-promoter pair. They found that if the P-value is ≤0.001, the binding relationship of a TF-promoter pair is of high confidence and can usually be confirmed by promoter-specific PCR. Therefore, we include a gene in the set B+ if the P-value for the binding of the TF to the promoter of the gene is ≤0.001. RETEA is then used to classify B+ (binding targets of a TF) into B+R+ (regulatory targets of a TF) and B+R− (non-regulatory targets of a TF). Two genes in B+ are assigned to B+R+ if they have high expression correlation but low partial expression correlation (ie, low residual expression correlation after removing the regulatory effect of the TF). Genes in B+ that do not belong to B+R+ are assigned to B+R−.

The details of RETEA are as follows. Let $\vec{x} = (x_1, \ldots, x_N)$ and $\vec{y} = (y_1, \ldots, y_N)$ be the gene expression time profiles of two genes x and y retrieved from the cell cycle microarray data,18 where N is the number of time points. Let $\vec{z} = (z_1, \ldots, z_N)$ be the protein activity time profile of TF z. Since the protein activity profiles of TFs are not available in the public domain, they need to be estimated by computational methods. In this study, we combine the mutant and gene expression data to do this task. The protein activity time profile of TF z is estimated as the average of the gene expression time profiles of all the genes whose expression is affected by the deletion of TF z (inferred from the mutant data).8

Assume that the genes x and y are in the set B+ of TF z. Compute the Pearson correlation $r_{xy}$ between genes x and y and the partial correlation $r_{xy|z}$ between the genes x and y given the TF z as follows:

$$r_{xy} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}}, \qquad r_{xy|z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}$$

where $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$ (and likewise $\bar{y}$), and $r_{xz}$ (or $r_{yz}$) is the Pearson correlation between the gene expression time profile of gene x (or y) and the protein activity time profile of TF z.

The genes x and y are then assigned to B+R+ of TF z if $r_{xy}$ > Th1 and $r_{xy|z}$ < Th2, where Th1 and Th2 are the given thresholds. That is, the genes x and y are regarded as the regulatory targets of TF z if they have high expression correlation but low residual expression correlation after removing the regulatory effect of the TF z. All remaining genes in B+ form B+R−. We claim that the genes in B+R+ are more likely to be the TF’s regulatory targets than are the genes in B+R−.
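The pairwise rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`pearson`, `partial_corr`, `retea_split`), the toy profiles and the fixed thresholds 0.8 and 0.3 are all hypothetical, whereas the paper derives Th1 and Th2 from percentiles of genome-wide distributions.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation between two equal-length, non-constant profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def retea_split(profiles, tf_activity, th1, th2):
    """Split binding targets (B+) into (B+R+, B+R-) using the pairwise rule."""
    regulated = set()
    for gx, gy in combinations(sorted(profiles), 2):
        x, y = profiles[gx], profiles[gy]
        # High co-expression that largely disappears once the TF is accounted for.
        if pearson(x, y) > th1 and partial_corr(x, y, tf_activity) < th2:
            regulated.update((gx, gy))
    return regulated, set(profiles) - regulated

# Toy data (hypothetical): g1 and g2 follow the TF activity plus independent
# noise, g3 is unrelated to it.
tf_activity = [1, 2, 3, 4, 5, 4, 3, 2]
profiles = {
    "g1": [1.1, 1.9, 3.1, 3.9, 5.1, 3.9, 3.1, 1.9],
    "g2": [1.1, 2.1, 2.9, 3.9, 5.1, 4.1, 2.9, 1.9],
    "g3": [3, 1, 4, 1, 5, 9, 2, 6],
}
regulated, non_regulated = retea_split(profiles, tf_activity, th1=0.8, th2=0.3)
```

In this toy case g1 and g2 are strongly correlated, but their partial correlation given the TF activity is essentially zero, so both land in B+R+ while g3 falls into B+R−. The sketch follows the rule as stated in the text, comparing the signed partial correlation with Th2.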

Results

Only a subset of a TF’s binding targets are identified as its regulatory targets

Since the cell cycle is one of the most thoroughly investigated cellular processes in yeast, we applied our method to identify the plausible regulatory targets of known cell cycle TFs (according to the MIPS database).20 Eleven cell cycle TFs whose B+ sets contain more than 65 genes are considered in this study. The numbers of genes in B+R+ and B+R− are listed in Table 1. On average, 60% of a TF’s binding targets are identified as its regulatory targets, which is similar to the results of MA-Network9 (58%) and TRIA7 (55%). The following two analyses were performed to validate our results.
Table 1

The numbers of genes in B+, B+R+ and B+R− for each of the eleven cell cycle TFs under study.

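| TF | B+ | B+R+ | B+R− |
|---|---|---|---|
| Abf1 | 213 | 129 | 84 |
| Swi4 | 134 | 84 | 50 |
| Swi6 | 134 | 88 | 46 |
| Cin5 | 127 | 55 | 72 |
| Rap1 | 125 | 87 | 38 |
| Fkh1 | 116 | 83 | 33 |
| Mbp1 | 114 | 88 | 26 |
| Fkh2 | 107 | 66 | 41 |
| Ume6 | 100 | 39 | 61 |
| Swi5 | 90 | 45 | 45 |
| Mcm1 | 67 | 35 | 32 |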

First validation: Enrichment for cell cycle-regulated genes in B+R+ and B+R−

Since the function of a cell cycle TF is to regulate the expression of the cell cycle-regulated genes, the regulatory targets of a cell cycle TF should be enriched in cell cycle-regulated genes. Therefore, our predictions are validated if the cell cycle-regulated genes are more enriched in B+R+ than in B+R−. We first compute the proportions of genes in B+R+ and B+R− that belong to the 666 cell cycle-regulated genes identified by Pramila et al.18 We then test whether the enrichment of the cell cycle-regulated genes in B+R+ is significantly higher than that in B+R−. The cumulative hypergeometric distribution is used to assign a P-value for determining the statistical significance (see Appendix for details). In most cases (9/11, the exceptions being Rap1 and Ume6), the cell cycle-regulated genes are more enriched in B+R+ than in B+R− with P-value <0.005 (see Table 2). This result suggests that our criterion for distinguishing regulatory from non-regulatory targets of a cell cycle TF is reliable.
Table 2

The enrichment of the cell cycle-regulated genes in B+R+ and B+R−.

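| TF | B+R+ | B+R− | P-value |
|---|---|---|---|
| Abf1 | 23/129 | 4/84 | 3.37E-03 |
| Swi4 | 55/84 | 6/50 | 5.73E-10 |
| Swi6 | 61/88 | 7/46 | 1.32E-09 |
| Cin5 | 16/55 | 3/72 | 1.05E-04 |
| Rap1 | 12/87 | 4/38 | 4.28E-01 |
| Fkh1 | 41/83 | 5/33 | 4.63E-04 |
| Mbp1 | 56/88 | 1/26 | 1.70E-08 |
| Fkh2 | 49/66 | 6/41 | 1.00E-09 |
| Ume6 | 8/39 | 9/61 | 3.14E-01 |
| Swi5 | 22/45 | 8/45 | 1.65E-03 |
| Mcm1 | 26/35 | 7/32 | 1.83E-05 |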
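A cumulative hypergeometric P-value of the kind used for Table 2 can be sketched as follows. The exact test design is given in the paper's Appendix, which is not reproduced here, so the setup below is an assumption: the population is taken to be a TF's B+ set, the "successes" are the cell cycle-regulated genes within B+, and the draws are the genes in B+R+. The function name `hypergeom_tail` is hypothetical.

```python
from math import comb

def hypergeom_tail(k, population, successes, draws):
    """P(X >= k) when `draws` items are sampled without replacement from a
    population of size `population` containing `successes` marked items."""
    total = comb(population, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, min(successes, draws) + 1)
    )
    return hits / total

# Abf1 row of Table 2 under the assumed setup:
# |B+| = 213, cell cycle-regulated genes in B+ = 23 + 4 = 27,
# |B+R+| = 129, cell cycle-regulated genes observed in B+R+ = 23.
p_abf1 = hypergeom_tail(23, population=213, successes=27, draws=129)
```

A small P-value means that B+R+ holds more cell cycle-regulated genes than would be expected if they were spread at random over B+. Whether this exact setup reproduces the reported figures has not been checked here.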

Second validation: Enrichment for the common cellular processes and common molecular functions in B+R+ and B+R−

Because genes in B+R+ are regulated by the same TF, they are likely to be involved in the same cellular process or even have the same molecular function. Therefore, our predictions are validated if B+R+ is more enriched than B+R− for common cellular processes and common molecular functions. Using GO term finder in SGD21 with FDR <0.05, we found that in all cases (11/11), the number of enriched common cellular processes in B+R+ is larger than that in B+R− (see Fig. 2A). In addition, using the same tool and FDR cutoff, we found that in most cases (9/11, the exceptions being Fkh1 and Fkh2), the number of enriched common molecular functions in B+R+ is larger than that in B+R− (see Fig. 2B). This result suggests that our criterion for distinguishing a TF’s regulatory from non-regulatory targets is reliable, because co-regulated genes are more likely than non-co-regulated genes to share cellular processes and molecular functions.
Figure 2

Testing for the enrichment for the common cellular processes and common molecular functions in B+R+ and B+R− for eleven cell cycle TFs.

Taken together, the two validations mentioned above convincingly demonstrate that RETEA is capable of extracting a TF’s regulatory targets from its binding targets.

Discussions

Performance comparison with three published methods

To identify the regulatory targets of a TF, Gao et al9 developed MA-Network, which uses multivariate regression analysis on gene expression data, and Wu et al7 developed TRIA, which identifies a temporal relationship between a TF and its target genes. In addition, Garten et al6 developed a method to identify a TF’s regulatory targets by integrating the ChIP-chip, promoter sequence, and gene expression data. In their approach, gene i is said to be regulated by TF j if it is a binding target of the TF j (inferred from the ChIP-chip data) and it also has the following four kinds of evidence strengthening this assignment: 1) significant expression coherence in at least one condition, 2) TFBS-containment in the promoter of gene i, 3) significant colocalization of the TF j with another TF where gene i is the binding target of both TFs, and 4) synergy of TF j with another TF where gene i is the binding target of both TFs. Since our method and the three published methods mentioned above were developed for the same task, a performance comparison is warranted. Since a TF has to bind to its regulatory targets in order to regulate their expression, enrichment of the high-confidence TFBS among the identified regulatory targets of that TF can be used as a criterion for performance comparison. The high-confidence TFBS were downloaded from MacIsaac et al’s paper,19 in which they were derived using six binding motif discovery methods together with the requirement for conservation across at least two of the four related yeast species. The details of the performance comparison are as follows. Let S1 (S2, S3) be the set of regulatory targets of a TF that are identified by RETEA but not by MA-Network (TRIA, Garten et al’s method) and T1 (T2, T3) be the set of regulatory targets of a TF that are identified by MA-Network (TRIA, Garten et al’s method) but not by RETEA. We tested overrepresentation of the high-confidence TFBS in Sj and Tj for j = 1,2,3.
The cumulative hypergeometric distribution is used to assign a P-value to the TFBS enrichment (see Appendix for details). Since only five TFs (Abf1, Fkh2, Mbp1, Mcm1 and Swi4) were investigated in both RETEA and MA-Network, we used these five TFs for performance comparison. We found that in all five cases (5/5) the high-confidence TFBS are enriched in S1 with P-value <0.001, whereas they are enriched in T1 in only three of the five cases (3/5) (see Table 3). This result suggests that RETEA is better able than MA-Network to identify the regulatory targets of a TF. Similarly, as shown in Tables 4 and 5, RETEA compares favorably with TRIA (TFBS enrichment with P-value <0.001 in 5/8 vs. 4/8 TFs) and with Garten et al’s method (7/8 vs. 6/8 TFs) in extracting regulatory targets from the binding targets of a TF.
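The construction of the comparison sets described above amounts to two set differences followed by a count of genes carrying a high-confidence TFBS. The sketch below illustrates this with made-up gene identifiers; the function names `exclusive_targets` and `tfbs_fraction` and all data are hypothetical, and the P-value step is left out.

```python
def exclusive_targets(retea_targets, other_targets):
    """Return (S, T): targets called only by RETEA and only by the other method."""
    return retea_targets - other_targets, other_targets - retea_targets

def tfbs_fraction(genes, genes_with_tfbs):
    """Fraction of `genes` whose promoter carries a high-confidence TFBS."""
    if not genes:
        return 0.0
    return len(genes & genes_with_tfbs) / len(genes)

# Hypothetical predictions for one TF from RETEA and from another method.
retea_targets = {"g1", "g2", "g3", "g4"}
other_targets = {"g3", "g4", "g5"}
genes_with_tfbs = {"g1", "g2", "g9"}  # hypothetical high-confidence TFBS set

s_set, t_set = exclusive_targets(retea_targets, other_targets)
s_frac = tfbs_fraction(s_set, genes_with_tfbs)
t_frac = tfbs_fraction(t_set, genes_with_tfbs)
```

Genes predicted by both methods drop out of both sets, so the comparison focuses on where the two methods disagree. The observed fractions are then compared with the genome-wide background (the "Expected" column of Tables 3 to 5).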
Table 3

Performance comparison of RETEA with MA-Networker using TFBS data.

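| TF | Expected | Observed S1 | P-value | Observed T1 | P-value |
|---|---|---|---|---|---|
| Abf1 | 870/6229 | 42/53 | <1.00E-12 | 56/62 | <1.00E-12 |
| Fkh2 | 916/6229 | 13/24 | 7.13E-06 | 13/19 | 1.59E-07 |
| Mbp1 | 792/6229 | 16/40 | 1.29E-05 | 7/17 | 3.20E-03 |
| Mcm1 | 148/6229 | 6/15 | 6.82E-07 | 13/22 | 1.71E-11 |
| Swi4 | 1731/6229 | 27/33 | 1.51E-10 | 11/24 | 4.46E-02 |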
Table 4

Performance comparison of RETEA with TRIA using TFBS data.

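| TF | Expected | Observed S2 | P-value | Observed T2 | P-value |
|---|---|---|---|---|---|
| Abf1 | 870/6229 | 42/51 | 3.36E-12 | 56/66 | 8.56E-12 |
| Cin5 | 986/6229 | 10/23 | 1.50E-03 | 14/37 | 9.59E-04 |
| Fkh1 | 1431/6229 | 17/23 | 3.04E-07 | 25/36 | 3.69E-09 |
| Fkh2 | 916/6229 | 6/11 | 2.37E-03 | 12/35 | 3.02E-03 |
| Rap1 | 515/6229 | 20/41 | 1.20E-11 | 23/36 | 3.54E-12 |
| Swi4 | 1731/6229 | 13/20 | 5.62E-04 | 12/20 | 2.51E-03 |
| Swi5 | 2918/6229 | 13/26 | 4.48E-01 | 16/23 | 2.35E-02 |
| Swi6 | 2206/6229 | 33/48 | 2.47E-06 | 6/9 | 5.65E-02 |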
Table 5

Performance comparison of RETEA with Garten et al’s method using TFBS data.

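| TF | Expected | Observed S3 | P-value | Observed T3 | P-value |
|---|---|---|---|---|---|
| Cin5 | 986/6229 | 15/34 | 8.35E-05 | 21/47 | 2.52E-06 |
| Fkh2 | 916/6229 | 20/34 | 3.30E-09 | 9/16 | 1.33E-04 |
| Mbp1 | 792/6229 | 46/77 | 2.74E-12 | 2/13 | 5.06E-01 |
| Mcm1 | 148/6229 | 6/13 | 2.43E-07 | 15/24 | 1.29E-11 |
| Rap1 | 515/6229 | 19/30 | 8.06E-12 | 43/66 | 9.36E-12 |
| Swi4 | 1731/6229 | 28/34 | 5.93E-11 | 21/34 | 3.44E-05 |
| Swi5 | 2918/6229 | 16/27 | 1.35E-01 | 21/41 | 3.42E-01 |
| Swi6 | 2206/6229 | 43/57 | 7.14E-10 | 24/29 | 2.12E-07 |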

Determination of the thresholds used in correlation and partial correlation analysis

The thresholds Th1 and Th2 are defined as percentile cutoffs of empirical distributions. For Th1, we compute the Pearson correlations of all possible gene pairs in the yeast genome to form a distribution of the expression correlation between two genes; Th1 is the correlation value at the top 1% of this distribution. For Th2, we take all the gene pairs whose correlations are larger than Th1 and, for each of these gene pairs, compute the partial correlation of the pair given each of the 203 TFs in yeast. The collected partial correlations form a distribution, and Th2 is the partial correlation value at the top 10% of this distribution. These percentile settings were selected as follows. We ran RETEA using 12 different settings of the correlation threshold (Th1) and partial correlation threshold (Th2) values. The results are summarized in Table 6. In the table, 9/11 means that for nine of the eleven cell cycle TFs, B+R+ is more enriched in cell cycle-regulated genes than B+R− (with P-value <0.005). RETEA performs well (8/11 or 9/11) when Th1 is chosen at the top 1% of the correlation distribution, regardless of which Th2 is used. However, when Th1 is chosen at the top 3% or 5%, the performance drops sharply (at most 4/11). Therefore, we used Th1 (top 1%) and Th2 (top 10%) as the default parameter setting for RETEA.
Table 6

Performance comparison of RETEA using different correlation threshold (Th1) and partial correlation threshold (Th2) values.

| | Th1 (top 1%) | Th1 (top 3%) | Th1 (top 5%) |
|---|---|---|---|
| Th2 (top 5%) | 9/11 | 4/11 | 0/11 |
| Th2 (top 10%) | 9/11 | 4/11 | 0/11 |
| Th2 (top 15%) | 9/11 | 4/11 | 0/11 |
| Th2 (top 20%) | 8/11 | 2/11 | 0/11 |
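The percentile-based choice of Th1 can be sketched as below. This is an illustration only: the helper names (`pearson`, `pairwise_correlations`, `top_percent_cutoff`) and the toy profiles are hypothetical, and "the value at the top q%" is read here as the smallest value still inside the top q% of the ranked list, which is one possible interpretation of the text. The same cutoff helper could be applied to the collected partial correlations to obtain Th2.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation between two equal-length, non-constant profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

def pairwise_correlations(profiles):
    """Pearson correlation of every unordered gene pair."""
    return [
        pearson(profiles[g], profiles[h])
        for g, h in combinations(sorted(profiles), 2)
    ]

def top_percent_cutoff(values, top_percent):
    """Smallest value that still lies within the top `top_percent` % of `values`."""
    ranked = sorted(values, reverse=True)
    keep = max(1, math.floor(len(ranked) * top_percent / 100))
    return ranked[keep - 1]

# Toy expression profiles (hypothetical); a real run would use every gene pair.
toy_profiles = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1]}
th1 = top_percent_cutoff(pairwise_correlations(toy_profiles), top_percent=1)
```

For a genome of several thousand genes the number of pairs runs into the tens of millions, so a vectorized library would be the practical choice; the pure-Python version is kept here only for clarity.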

Factors that affect the performance of RETEA

Two kinds of factors can affect the performance of RETEA. The first is the threshold values used in RETEA. We tried 12 different settings of the correlation threshold (Th1) and partial correlation threshold (Th2) and found one (Th1 at the top 1% and Th2 at the top 10%) that enables RETEA to extract the plausible regulatory targets from the binding targets of 11 cell cycle TFs. The second is the protein activity profiles of TFs. Since the protein activity profiles of TFs are not available in the public domain, they need to be estimated by computational methods. In this study, the protein activity time profile of a TF is estimated as the average of the gene expression time profiles of all the genes whose expression is affected by the deletion of that TF (inferred from the mutant data). This way of estimating the protein activity profiles of TFs may not be optimal, and there is still much room for improvement. However, this issue will become minor once experimental technology for measuring protein activity profiles becomes available.
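The activity estimate described above is a per-time-point average over the genes affected by the TF's deletion. A minimal sketch, assuming expression profiles are stored in a dictionary keyed by gene name; the function name `estimate_tf_activity` and the data are hypothetical:

```python
def estimate_tf_activity(expression, affected_genes):
    """Average the expression time profiles of the genes affected by a TF knockout."""
    # Genes without an expression profile are skipped.
    profiles = [expression[g] for g in affected_genes if g in expression]
    if not profiles:
        raise ValueError("no expression profile available for the affected genes")
    # zip(*profiles) iterates over time points; each column holds one value per gene.
    return [sum(column) / len(profiles) for column in zip(*profiles)]

# Toy data (hypothetical): three time points, two genes affected by the knockout.
expression = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [3.0, 4.0, 5.0],
    "g3": [10.0, 10.0, 10.0],
}
activity = estimate_tf_activity(expression, ["g1", "g2", "gX"])
```

Averaging treats every affected gene equally and ignores the sign of the TF's effect, which is one reason the estimate may be suboptimal.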

Applying RETEA to identify plausible regulatory targets of oxidative stress-response TFs

In this paper, RETEA is applied to identify regulatory targets of eleven cell cycle TFs. To show the generality of RETEA, we also tested it on regulators unrelated to the cell cycle. Specifically, we applied RETEA to identify regulatory targets of TFs that are involved in the oxidative stress response. The genome-wide gene expression and ChIP-chip data under oxidative stress were downloaded from Gasch et al’s paper22 and Harbison et al’s paper,15 respectively. Using GO term finder in SGD21 with FDR <0.05, we found that in most cases (8/11), the number of enriched common cellular processes in B+R+ is larger than that in B+R− (see Fig. 3A). In addition, using the same tool and FDR cutoff, we found that in most cases (9/11), the number of enriched common molecular functions in B+R+ is larger than that in B+R− (see Fig. 3B). This result suggests that RETEA performs well not only for cell cycle TFs but also for TFs unrelated to the cell cycle.
Figure 3

Testing for the enrichment for the common cellular processes and common molecular functions in B+R+ and B+R− for eleven oxidative stress-response TFs.

Conclusions

In this study, an algorithm called RETEA is developed to identify the plausible regulatory targets of a TF from its binding targets. Since the binding of a TF to a gene does not necessarily imply regulation, algorithms like RETEA are needed to resolve this ambiguity. We validated the effectiveness of RETEA by checking the enrichments for cell cycle-regulated genes, common cellular processes and common molecular functions. Moreover, RETEA was shown to perform better than three published methods (MA-Network, TRIA, and Garten et al’s method). We also showed that RETEA performs well not only for cell cycle TFs but also for TFs unrelated to the cell cycle. Taken together, we are confident that RETEA can find biologically relevant results and can be useful in systems biology studies.

References (10 of 22 shown)

1.  Conserved homeodomain proteins interact with MADS box protein Mcm1 to restrict ECB-dependent transcription to the M/G1 phase of the cell cycle.

Authors:  Tata Pramila; Shawna Miles; Debraj GuhaThakurta; Dave Jemiolo; Linda L Breeden
Journal:  Genes Dev       Date:  2002-12-01       Impact factor: 11.361

2.  Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data.

Authors:  Eran Segal; Michael Shapira; Aviv Regev; Dana Pe'er; David Botstein; Daphne Koller; Nir Friedman
Journal:  Nat Genet       Date:  2003-06       Impact factor: 38.330

3.  Discovery of meaningful associations in genomic data using partial correlation coefficients.

Authors:  Alberto de la Fuente; Nan Bing; Ina Hoeschele; Pedro Mendes
Journal:  Bioinformatics       Date:  2004-07-29       Impact factor: 6.937

4.  Transcriptional regulatory code of a eukaryotic genome.

Authors:  Christopher T Harbison; D Benjamin Gordon; Tong Ihn Lee; Nicola J Rinaldi; Kenzie D Macisaac; Timothy W Danford; Nancy M Hannett; Jean-Bosco Tagne; David B Reynolds; Jane Yoo; Ezra G Jennings; Julia Zeitlinger; Dmitry K Pokholok; Manolis Kellis; P Alex Rolfe; Ken T Takusagawa; Eric S Lander; David K Gifford; Ernest Fraenkel; Richard A Young
Journal:  Nature       Date:  2004-09-02       Impact factor: 49.962

5.  Transcriptional regulatory networks in Saccharomyces cerevisiae.

Authors:  Tong Ihn Lee; Nicola J Rinaldi; François Robert; Duncan T Odom; Ziv Bar-Joseph; Georg K Gerber; Nancy M Hannett; Christopher T Harbison; Craig M Thompson; Itamar Simon; Julia Zeitlinger; Ezra G Jennings; Heather L Murray; D Benjamin Gordon; Bing Ren; John J Wyrick; Jean-Bosco Tagne; Thomas L Volkert; Ernest Fraenkel; David K Gifford; Richard A Young
Journal:  Science       Date:  2002-10-25       Impact factor: 47.728

6.  Genetic reconstruction of a functional transcriptional regulatory network.

Authors:  Zhanzhi Hu; Patrick J Killion; Vishwanath R Iyer
Journal:  Nat Genet       Date:  2007-04-08       Impact factor: 38.330

7.  Quantitative characterization of the transcriptional regulatory network in the yeast cell cycle.

Authors:  Hong-Chu Chen; Hsiao-Ching Lee; Tsai-Yun Lin; Wen-Hsiung Li; Bor-Sen Chen
Journal:  Bioinformatics       Date:  2004-03-25       Impact factor: 6.937

8.  Extraction of transcription regulatory signals from genome-wide DNA-protein interaction data.

Authors:  Yael Garten; Shai Kaplan; Yitzhak Pilpel
Journal:  Nucleic Acids Res       Date:  2005-01-31       Impact factor: 16.971

9.  An improved map of conserved regulatory sites for Saccharomyces cerevisiae.

Authors:  Kenzie D MacIsaac; Ting Wang; D Benjamin Gordon; David K Gifford; Gary D Stormo; Ernest Fraenkel
Journal:  BMC Bioinformatics       Date:  2006-03-07       Impact factor: 3.169

10.  Defining transcriptional networks through integrative modeling of mRNA expression and transcription factor binding data.

Authors:  Feng Gao; Barrett C Foat; Harmen J Bussemaker
Journal:  BMC Bioinformatics       Date:  2004-03-18       Impact factor: 3.169
