Helge G Roider, Thomas Manke, Sean O'Keeffe, Martin Vingron, Stefan A Haas.
Abstract
MOTIVATION: A major challenge in regulatory genomics is the identification of associations between functional categories of genes (e.g. tissues, metabolic pathways) and their regulating transcription factors (TFs). While the regulating TFs are already known for a limited number of categories, the responsible factors remain to be elucidated for many others.
Year: 2008 PMID: 19073590 PMCID: PMC2642637 DOI: 10.1093/bioinformatics/btn627
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. The PASTAA workflow.
Fig. 2. Cut-off space for the hypergeometric test. (A) The −log hypergeometric P-values (indicated by colour) for ABF1_01 and the Abf1 in vitro dataset, depending on the cut-off combination employed for the predicted affinity and PBM binding values. The most significant target enrichment (P-value 7.3 × 10^−253) is found when using the top 800 genes according to PBM and the top 900 genes according to affinity. The steepest increase in −log P-values is found at the origin of the plot. (B) Same analysis as in (A) but for the factor PHO4_01 and the Pho4 ChIP–chip dataset (phosphate-deprived condition). Consistent with Pho4 having far fewer targets than Abf1, an optimal hypergeometric P-value of 7.9 × 10^−20 is found when using only the top 300 genes according to ChIP–chip data and the top 100 genes according to affinity.
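The cut-off scan described in this caption can be illustrated with a minimal sketch. It assumes two rankings of the same gene set (one by predicted affinity, one by measured binding) and, for every pair of cut-offs, computes the hypergeometric tail probability of the observed overlap between the two top lists, keeping the most significant pair. The function names `log_choose`, `log_hypergeom_sf` and `scan_cutoffs` are hypothetical and chosen for illustration; this is not the PASTAA implementation, and it omits any correction for scanning many cut-off pairs.

```python
import math


def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_hypergeom_sf(overlap, total, n_a, n_b):
    """Natural log of P(X >= overlap), where X is the overlap between a fixed
    set of n_a genes and a random draw of n_b genes out of `total` genes."""
    lower = max(overlap, n_a + n_b - total, 0)  # smallest overlap counted
    upper = min(n_a, n_b)                       # largest possible overlap
    if lower > upper:
        return float("-inf")
    log_denominator = log_choose(total, n_b)
    terms = [
        log_choose(n_a, k) + log_choose(total - n_a, n_b - k) - log_denominator
        for k in range(lower, upper + 1)
    ]
    # Log-sum-exp keeps very small P-values representable.
    peak = max(terms)
    log_p = peak + math.log(sum(math.exp(t - peak) for t in terms))
    return min(0.0, log_p)


def scan_cutoffs(ranked_by_affinity, ranked_by_binding, cutoffs_a, cutoffs_b):
    """Return (log_p, cutoff_a, cutoff_b, overlap) for the most significant
    cut-off pair. Both rankings are assumed to list the same genes."""
    total = len(ranked_by_affinity)
    best = None
    for a in cutoffs_a:
        top_a = set(ranked_by_affinity[:a])
        for b in cutoffs_b:
            overlap = len(top_a & set(ranked_by_binding[:b]))
            log_p = log_hypergeom_sf(overlap, total, a, b)
            if best is None or log_p < best[0]:
                best = (log_p, a, b, overlap)
    return best
```

Working in log space is a design choice of this sketch: values as small as those reported in the caption would underflow ordinary floating-point arithmetic if the probabilities were summed directly.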
Top associated PFMs for the HNF and MYC target gene sets
Top ranking PFMs for the HNF1, HNF4 and the HNF6 ChIP–chip datasets and the cMYC ChIP–PET dataset. Matching PFMs are indicated in red. Matrices for E2F, a co-regulator of MYC genes, are indicated in yellow.
Results for tissues with known TF associations
Top ranking PFMs according to PASTAA and three alternative approaches. Predictions corresponding to experimentally characterized TF–tissue associations are shown in red. Associations in blue correspond to matrices for the general factor SP1 and the basal TATA box. The last two columns indicate PASTAA's association scores as well as the corresponding resampling P-values.
*JASPAR matrices (Sandelin et al., 2004) used only by PAP.
Fig. 3. TFs are over-expressed in their top ranking tissues. Height of bins indicates the number of TFs expressed in the associated tissue of given rank based on the real sequence data (dark blue) or on the results obtained from 10 random sequence sets (light blue). Error bars show the 95% confidence interval for the results obtained from the 10 random sequence sets. Tissues top ranking for a given TF express the factor more often than expected, while bottom ranking tissues express the TF equally or less often than expected. The enrichment is particularly significant for the first three bins corresponding to all three top ranking TF–tissue associations (P-value of enrichment for bins 1–3 combined: 2.2 × 10^−12). The general trend in the light blue bins indicates the technical bias caused by the different number of ESTs in each tissue category.
Top ranking tissues for a selected group of PFMs
Associations supported extensively by literature or by specific expression of the TF in the respective tissue are indicated in yellow and red, respectively.