Matti Kankainen, Petri Pehkonen, Päivi Rosenstöm, Petri Törönen, Garry Wong, Liisa Holm.
Abstract
We present POXO, a comprehensive suite of tools for discovering transcription factor binding sites from co-expressed genes (www.bioinfo.biocenter.helsinki.fi/poxo). POXO handles tasks such as functional evaluation and grouping of genes, sequence retrieval, pattern discovery and pattern verification. It also lets users assemble analytical pipelines from these tools with single mouse clicks. A typical POXO pipeline begins by examining the biological functions in which a set of co-expressed genes is involved. In this examination, the functional coherence of the gene set is evaluated and representative functions are associated with the gene set. If several biological processes are affected in the experiment, the same examination can be used to group the genes into functionally similar subsets. The next step in the pipeline is to discover over-represented nucleotide patterns in the upstream sequences of the selected gene sets, which makes it possible to investigate whether the genes are co-regulated by common cis-elements. If over-represented patterns are found, similar ones can then be clustered together and verified. The performance of POXO is demonstrated by analysing expression data from pathogen-treated Arabidopsis thaliana. In this example, POXO detected activated gene sets and suggested transcription factors responsible for their regulation.
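Whether a nucleotide pattern is over-represented in the upstream sequences of a gene set can be framed as a comparison of how many promoters carry the pattern in the gene set versus a background. The sketch below uses a one-sided hypergeometric test for that comparison. This is an assumption for illustration only: the abstract does not say which statistic POXO uses, and the function name `hypergeom_sf`, its parameters and the numbers in the usage line are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n), with k >= 0.

    N: promoters in the background (hypothetical parameter)
    K: background promoters that contain the pattern
    n: promoters in the co-expressed gene set
    k: gene-set promoters that contain the pattern
    """
    # Sum the probability of drawing i pattern-bearing promoters into the
    # gene set, for every i from k up to the largest value that is possible.
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the paper: 20 of 50 gene-set
    # promoters carry the pattern, versus 150 of 1000 background promoters.
    print(f"P = {hypergeom_sf(20, 1000, 150, 50):.2e}")
```

A small P-value means that this many pattern-bearing promoters would rarely end up in a gene set of this size by chance; when many candidate patterns are screened, the P-values would typically also need a multiple-testing correction.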
Year: 2006 PMID: 16845065 PMCID: PMC1538773 DOI: 10.1093/nar/gkl296
Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971
Figure 1. Different tools in POXO. The computational pipeline and the tools used here to analyze the experimental data are highlighted by light blue arrows and boxes. Gray boxes and arrows indicate other available tools in POXO.
Table 1. The seven clustered patterns over-represented in the activated response to stimulus (GO:0050896) gene set and under-represented in the repressed response to stimulus gene set
| Pattern | Activated occ | Activated pro | Repressed occ | Repressed pro | P-value | min P | Ac |
|---|---|---|---|---|---|---|---|
| TGGAAd/TTCCA | 651 | 139 | 508 | 139 | 5E−05 | 5E−05 | S000403 |
| GGAAAANG/CNTTTTCC | 62 | 50 | 23 | 23 | 2E−04 | 2E−04 | S000453 |
| CATNNCGG/CCGNNATG | 36 | 34 | 9 | 9 | 3E−04 | 3E−04 | S000250 |
| TGCGANNC/GNNTCGCA | 38 | 36 | 11 | 11 | 4E−04 | 4E−04 | |
| GTVATCCT/AGGATBAC | 26 | 23 | 5 | 4 | 1E−03 | 1E−04 | S000470 |
| TNCNAGG/CCTNGNA | 200 | 100 | 127 | 86 | 7E−04 | 7E−04 | |
| CTGAGGAA/TTCCTCAG | 14 | 14 | 1 | 1 | 3E−03 | 3E−04 | S000473 |
In the table, occ is the number of pattern occurrences, pro is the number of promoters containing the pattern, P-value is the significance of the clustered pattern, min P is the minimum P-value among the original patterns used in clustering, and Ac is the accession code of the best match found in PLACE (25).
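The difference between the occ and pro columns can be illustrated with a small counting routine. In this sketch an occurrence is any start position where either strand matches the IUPAC-coded motif, and a promoter counts toward pro if it has at least one occurrence. These counting conventions, the helper names `iupac_to_regex` and `count_occ_pro`, and the toy sequences are assumptions for illustration, not POXO's documented behaviour.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC-coded motif into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def count_occ_pro(pattern: str, promoters: list[str]) -> tuple[int, int]:
    """Count occurrences (occ) and promoters with the pattern (pro).

    `pattern` uses the tables' "FORWARD/REVERSE" notation, so both strands
    are searched. A zero-width lookahead lets overlapping sites count, and
    a position matched by both strands is counted once.
    """
    forward, reverse = pattern.split("/")
    site = re.compile(
        "(?=(?:" + iupac_to_regex(forward) + "|" + iupac_to_regex(reverse) + "))"
    )
    occ = pro = 0
    for seq in promoters:
        hits = sum(1 for _ in site.finditer(seq.upper()))
        occ += hits
        pro += hits > 0  # a promoter counts once, however many hits it has
    return occ, pro


toy_promoters = ["AATGGAACC", "TTCCATGGAA", "GGGG"]  # hypothetical sequences
print(count_occ_pro("TGGAA/TTCCA", toy_promoters))  # (3, 2)
```

Because one promoter can contain several sites, occ is always at least as large as pro under this convention, as in most rows of the tables.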
Table 2. The six clustered patterns under-represented in the activated response to stimulus gene set and over-represented in the repressed response to stimulus gene set
| Pattern | Activated occ | Activated pro | Repressed occ | Repressed pro | P-value | min P | Ac |
|---|---|---|---|---|---|---|---|
| TNGGTCC/GGACCNA | 34 | 27 | 79 | 62 | 4E−04 | 4E−04 | S000360 |
| CTTTGCNT/ANGCAAAG | 21 | 19 | 64 | 47 | 5E−04 | 5E−04 | S000354 |
| GNANTATA/TATANTNC | 144 | 84 | 229 | 107 | 5E−04 | 5E−04 | |
| TGTGATTGG/CCAATCACA | 3 | 3 | 14 | 13 | 1E−02 | 1E−04 | S000143 |
| CAWTKATTG/CAATMAWTG | 18 | 18 | 26 | 23 | 5E−01 | 3E−04 | S000371 |
| TTTTGTCAC/GTGACAAAA | 3 | 3 | 5 | 6 | 1E+00 | 7E−05 | S000337 |
Notation as in Table 1.
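In both tables each Pattern entry appears to give a motif and its reverse complement separated by a slash, written with IUPAC ambiguity codes; for example, V (A, C or G) complements to B (C, G or T). This reading is ours rather than stated in the table notes, but it can be checked with a short helper. The name `reverse_complement` is hypothetical, and the first Table 1 entry, TGGAAd/TTCCA, is left out of the check because the meaning of its trailing lowercase d is not explained in the source.

```python
# Complement of each IUPAC code: A<->T, C<->G, R<->Y, K<->M, B<->V, D<->H;
# S, W and N are their own complements.
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif: str) -> str:
    """Return the reverse complement of an IUPAC-coded DNA motif."""
    return motif.upper().translate(IUPAC_COMPLEMENT)[::-1]


# Pattern entries copied from Tables 1 and 2.
entries = [
    "GGAAAANG/CNTTTTCC",
    "GTVATCCT/AGGATBAC",
    "CTGAGGAA/TTCCTCAG",
    "CTTTGCNT/ANGCAAAG",
    "CAWTKATTG/CAATMAWTG",
    "TTTTGTCAC/GTGACAAAA",
]
for entry in entries:
    forward, reverse = entry.split("/")
    print(entry, reverse_complement(forward) == reverse)  # prints True for each
```

Searching a motif together with its reverse complement is what makes a pattern strand-independent, which matters because a binding site in a promoter may lie on either DNA strand.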