Larry N Singh, Li-San Wang, Sridhar Hannenhalli.
Abstract
A transcriptional module (TM) is a collection of transcription factors (TFs) that, as a group, co-regulate multiple functionally related genes. Identifying TMs is an important biological challenge. Because TFs belong to evolutionarily and structurally related families, TF family members often bind to similar DNA motifs and can confound sequence-based approaches to TM identification. A previous approach to TM detection addresses this issue by pre-selecting a single representative from each TF family. One problem with this approach is that closely related transcription factors can still target sufficiently distinct genes in a biologically meaningful way, so pre-selecting a single family representative may in principle miss certain TMs. Here we report a method, TREMOR (Transcriptional Regulatory Module Retriever). This method uses the Mahalanobis distance to assess the validity of a TM and automatically incorporates the inter-TF binding similarity without resorting to pre-selecting family representatives. The application of TREMOR to human muscle-specific, liver-specific and cell-cycle-related genes reveals TFs and TMs that were validated from the literature and also reveals additional related genes.
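To make the role of the Mahalanobis distance concrete, the following is a minimal sketch, not the TREMOR implementation. It assumes each gene is summarized by a vector of per-TF scores, that `v0` is a reference vector, and that `cov` is the covariance matrix of those scores across TFs; the helper names `solve` and `mahalanobis` are illustrative only.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def mahalanobis(v: Vector, v0: Vector, cov: Matrix) -> float:
    """sqrt((v - v0)^T cov^{-1} (v - v0)), computed without forming cov^{-1}."""
    diff = [a - b for a, b in zip(v, v0)]
    y = solve(cov, diff)
    return sum(d * yi for d, yi in zip(diff, y)) ** 0.5


# Hypothetical covariance for two TFs with very similar binding motifs.
similar_tfs = [[1.0, 0.9], [0.9, 1.0]]
print(mahalanobis([1.0, 1.0], [0.0, 0.0], similar_tfs))
```

With a covariance of 0.9 between the two scores, a vector one unit above the reference on both TFs is about 1.03 units away, compared with about 1.41 under Euclidean distance. Correlated evidence from similar motifs is therefore discounted rather than counted twice, which is the property the abstract relies on.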
Year: 2007 PMID: 17962303 PMCID: PMC2189735 DOI: 10.1093/nar/gkm885
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Procedure for computing the covariance of the percentile scores of two TFs (see Methods section).
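The Methods section is not reproduced here, so the following is only a rough sketch of what a covariance of percentile scores could look like. It assumes that each TF has one raw score per sequence, that a percentile score is the fraction of that TF's scores less than or equal to the given score, and that the sample covariance is taken over the same sequences; the function names are hypothetical.

```python
from bisect import bisect_right
from typing import List


def percentile_scores(raw: List[float]) -> List[float]:
    """Map each raw score to the fraction of scores that are <= it."""
    ordered = sorted(raw)
    n = len(raw)
    # bisect_right returns the number of values <= x in the sorted list.
    return [bisect_right(ordered, x) / n for x in raw]


def covariance(x: List[float], y: List[float]) -> float:
    """Sample covariance of two equally long score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


# Hypothetical raw scores of two TFs on the same four sequences.
tf_a = [0.2, 0.9, 0.5, 0.7]
tf_b = [0.1, 0.8, 0.6, 0.7]
print(covariance(percentile_scores(tf_a), percentile_scores(tf_b)))
```

Working on percentiles rather than raw scores puts TFs with different score scales on a common footing before their similarity is measured.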
Figure 2. A single iteration of the method for computing TMs. Starting with single PWMs (TM of size 1), in each iteration, the top scoring TMs are retained and all extensions are assessed in the next iteration. DP and DN refer to distances from vector V0 of positive and negative vectors. A Mann–Whitney test is performed with the null hypothesis that median (DP) ≤ median (DN).
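The two ingredients of this iteration can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' code: the one-sided Mann–Whitney P-value uses a plain normal approximation without tie or continuity correction, and `extend_modules` assumes that a TM is extended by one PWM not already in it. Both function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def mann_whitney_greater(dp: Sequence[float], dn: Sequence[float]) -> float:
    """One-sided P-value against the null that dp is not larger than dn.

    Normal approximation, no tie or continuity correction (a simplification).
    """
    n1, n2 = len(dp), len(dn)
    # U counts pairs (p, n) with p > n; ties contribute one half.
    u = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in dp for n in dn)
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    # Upper-tail probability of a standard normal at z.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def extend_modules(retained: Sequence[Tuple[str, ...]],
                   pwms: Sequence[str]) -> List[Tuple[str, ...]]:
    """All distinct candidates formed by adding one unused PWM to a retained TM."""
    candidates = set()
    for tm in retained:
        for pwm in pwms:
            if pwm not in tm:
                candidates.add(tuple(sorted(tm + (pwm,))))
    return sorted(candidates)


# Retained size-1 TMs are extended to size-2 candidates for the next iteration.
print(extend_modules([("E2F",), ("NF-Y",)], ["CREB", "E2F", "NF-Y"]))
```

Retaining only the top scoring TMs before extending keeps the number of candidates manageable, since the space of all TF combinations grows combinatorially with module size.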
Summary of the results for human cell cycle data
| TM-1 | TM-2 | TM-3,4 | ||
|---|---|---|---|---|
| TREMOR | • 41 significant TFs detected | • 98 TM-2s detected. | • 63 TM-3s detected. | |
| • 15 TFs have | • 35 include E2F and 36 include NF-Y. | • All include NF-Y and 15 include E2F. | ||
| • NF-Y is at rank 1, E2F at rank 2 and CREB at rank 12 | • Rank-1 TM-2: (NF-Y c-Myc:Max). These TFs form a complex in cell cycle regulation. | • 158 TM-4s detected. | ||
| • Of the remaining 12, 7 have evidence of potential cell cycle involvement: HOXA7 ( | • Rank-3 TM-2: (NF-Y E2F). These TFs are known to interact (42). | • Of these, 150 include NF-Y, and 71 include both E2F and NF-Y. | ||
| • (NF-Y AP-2) has a | ||||
| oPOSSUM2 | TM-2 | TM-3 | ||
| • 20 significant ( | oPOSSUM2 detected 33 size-3 TMs (64 by TREMOR). Surprisingly, the three cell cycle regulators (E2F, NF-Y and CREB) were not included in any of the TMs returned by oPOSSUM2, while a majority of TREMOR TM-3s include these key TFs. | |||
| • Only two include E2F, in combination with NF-Y and Gfi. Both combinations are detected by TREMOR. | | |||
| • (CREB E2F) not detected. | ||||
| • Only two TM-2s include NF-Y. | ||||
| • In general, oPOSSUM2 detects fewer and more varied TMs, while TREMOR TMs revolve around the primary TFs E2F, NF-Y and CREB. | ||||
| CREME | • On our gene set, CREME yielded a total of two TM-3s: (AREB6, STAT4, TCF1P) and (SRY, STAT4, TCF1P, EGR). These do not include the three well-known cell cycle TFs E2F, CREB and NF-Y. | |||
| • CREME was applied to a slightly different cell cycle data in the original paper ( | ||||
For oPOSSUM2 and CREME, the results are compared to TREMOR. Unless otherwise specified, for TREMOR we only mention the TMs whose P-values were lower than the lowest P-value for the corresponding randomly permuted set (see text).
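The reporting rule in this note can be written as a small filter. The sketch below assumes the observed P-values are keyed by module name and that the permuted-set P-values are available as a flat list; the function name and all numbers are made up for illustration and are not results from the paper.

```python
from typing import Dict, Iterable, List


def significant_modules(observed: Dict[str, float],
                        permuted_p_values: Iterable[float]) -> List[str]:
    """Return module names whose P-value beats every permuted-set P-value."""
    cutoff = min(permuted_p_values)
    hits = [name for name, p in observed.items() if p < cutoff]
    # Report the most significant modules first.
    return sorted(hits, key=lambda name: observed[name])


# Made-up numbers for illustration only.
observed = {"(A B)": 1e-6, "(A C)": 3e-4, "(B C)": 2e-2}
permuted = [5e-3, 1e-3, 4e-2]
print(significant_modules(observed, permuted))
```

Using the smallest P-value from the permuted set as the cutoff is a conservative, data-driven threshold: a module is reported only if it scores better than anything seen on the randomized data.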
Figure 3. Distributions (probability density functions) of expression coherence (at 95th percentile threshold; see Methods section) in cell cycle data. The plot (green) on the left, based on a random gene set, provides a baseline. The plot (black) on the right is for the top 100 target genes identified by each of the 158 significant cell cycle TMs of size 4. The blue plot in the middle is based on the target genes identified by the TMs using Euclidean distance.
Summary of the results for human liver expression data
| TM-1 | TM-2 | TM-3,4,5 | |
|---|---|---|---|
| TREMOR | • 17 TM-1s detected—HNF-1, HNF-3alpha, STAT5A, GFI1B, IRF1, MEIS1A, AIRE, NF-AT, Pbx1b, HNF-4alpha1, POU3F2, NF-Y, MEF-2, AFP1, SRF, C/EBP, TCF-4. | • 31 TM-2s detected (however, not significant after multiple testing correction) | • 217, 224 and 266 TM-3s, TM-4s and TM-5s |
| • Well-known liver TFs HNF1-3,4 and C/EBP are included among these. | • 1 involved HNF-3 and 25 involved HNF-1. | • 185, 222 and 266 involved HNF. | |
| • 8 of the remaining 11 TFs have evidence of involvement in transcription in liver—STAT5 ( | • Rank-1 TM-2 was (HNF-1 TATA). These TFs are known to interact ( | ||
| • TM-2s at rank 2, (HNF-1 HNF-4), and at rank 4, (HNF-1 NF-Y) are supported ( | |||
| oPOSSUM2 | TM-2 | TM-3 | |
| • 3 TM-2s detected | • 31 TM-3s detected, but none of them include FORKHEAD TFs, while 188 of 234 TREMOR TM-3s include FORKHEAD factors. | ||
| • These combine zinc finger TF X2H2 with FoxA2, FoxD3 and FoxI1. | |||
| • TREMOR detects many more TMs but 81 of the 226 include FORKHEAD factors. | |||
| CREME | Even at the least stringent settings, namely, using the lowest matrix score threshold = 0.8 and the largest module length = 500 bp, as well as requiring only two TFs in the TM, CREME did not yield any TMs. | ||
For oPOSSUM2 and CREME, the results are compared to TREMOR. Unless otherwise specified, for TREMOR we only mention the TMs whose P-values were lower than the lowest P-value for the corresponding randomly permuted set (see text).
Summary of the results for human muscle expression data
| TM-1 | TM-2 | TM-3,4,5 | |
|---|---|---|---|
| TREMOR | • 19 TM-1s detected. | • 87 TM-2s detected | • 218 TM-3s detected. SRF, MEF-2 and MyoD are included in 119, 41 and 19 |
| • The top 2 TM-1s were SRF and MEF-2. | • 21 included SRF, 11 included MEF-2 and 7 included MyoD. | • Most core muscle TFs tend to group with non-core TFs in a TM. | |
| • MyoD was at rank 4. | • Rank-1 TM-2 is (SRF MEF-2). | • Rank 16 TM-3 is (SRF MEF-2 SMAD). | |
| • We did not detect SP1, probably because it is a ubiquitous signal. TEF-1 had a | • Rank-8 TM-2 includes the two SRF PWMs (see text). | • Rank 28 TM-3 is (SRF MEF-2 MyoD). | |
| • Seven of the remaining 14 TFs have evidence of involvement in muscle gene regulation: SMAD ( | • TEF-1 is part of a significant TM-2 with SRF and with SMAD. | • 244 TM-4s detected. SRF, MEF-2 and MyoD are part of 177, 75 and 17 TM-3s. | |
| • Additional factor RREB was also detected previously by oPOSSUM2 in a slightly different dataset ( | • 283 TM-5s detected. SRF is part of 232. | ||
| • TEF-1 yielded a | |||
| oPOSSUM2 | TM-2 | TM-3 | |
| • 293 TM-2s detected. SRF, MEF-2 and Myf (same as MyoD) were part of 37, 42 and 37. Compare with TREMOR above. | • In our dataset, 764 TM-3s were detected. | ||
| • See text for additional comparative discussion | • In the original oPOSSUM2 publication, using three different muscle datasets, the authors reported the top 5 TM-2s in each dataset. More than half of these were detected by TREMOR, notably (YY1 SRF), (YY1 Myf), (SRF E47) and (SRF MEF-2). | ||
| CREME | With the default setting, CREME did not yield any TMs. However, at the least stringent setting it detected 1 TM-2—(SRF SRF). TREMOR also detected a size-2 TM with two SRF motifs, in addition to several others. | ||
For oPOSSUM2 and CREME, the results are compared to TREMOR. Unless otherwise specified, for TREMOR we only mention the TMs whose P-values were lower than the lowest P-value for the corresponding randomly permuted set (see text).