Abstract
BACKGROUND: Computational methods for characterizing novel transcription factor binding sites search for sequence patterns or "motifs" that appear repeatedly in genomic regions of interest. Correlation-based motif finding strategies are used to identify motifs that correlate with expression data and do not rely on promoter sequences from a pre-determined set of genes.
Year: 2008 PMID: 19040743 PMCID: PMC2626603 DOI: 10.1186/1471-2105-9-506
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Summary of results comparing c-REDUCE and f-REDUCE oligo predictions for ChIP-chip yeast data (part 1).
| GGRGK/MCYCC | ||||
| TTACTAA/TTAGTAA | 4: WCHAA 12.2 | |||
| YTGACT/AGTCAR | ||||
| GATAA/TTATC | 4: TTAKM 2.1 | |||
| AAAAGCCGCGGGCGGATT/ | ||||
| AATCCCGCCCGCGGCTTTT | ||||
| 2: RCGGC 13.5 | ||||
| 3: AAAAR 9.0 | ||||
| CGGN(11)CCG | ||||
| GGCTCCWC/ | ||||
| GWGGAGCC | ||||
| GATAAGATAAG/ | ||||
| CTTATCTTATC | ||||
| KGMCAGCGTGTC/ | 3: AYACK 4.4 | |||
| GACACGCTGKCM | ||||
| CCAAT/ATTGG | 1: ATTGGY 3.0 | 5: CCAATCA 6.7 | ||
| 1: CCAATCA 17.2 | ||||
| 4: ATTGGY 3.9 | ||||
| CCAAT/ATTGG | 2: YCAAD 4.8 | |||
| CCAAT/ATTGG | 1: ATTGGY 7.6 | |||
| 2: YCAAD 4.1 | ||||
| 3: CCAATCA 5.0 | ||||
| GAGCAAA/TTTGCTC | 1: GAGCAAA 26.6 | 4: GAGCAAA 3.5 | ||
| 2: TTTGCTC 16.4 | ||||
| 1: GAGCAAA 12.3 | ||||
| 2: TTTGCTC 6.4 | ||||
| AAACTGTGG/ | 4: CACAGT 2.7 | |||
| CCACAGTTT | ||||
| AAACTGTGG/ | 1: CACAGTT 10.2 | |||
| CCACAGTTT | 1: CACAGTT 32.3 | |||
| 4: AACTGTG 8.3 | ||||
| YAGGYA/TRCCTR | ||||
| MAGGGG/CCCCTK | ||||
| 1: CCCCT 16.7 | ||||
| 3: AAGGGG 4.1 | ||||
| TCGAAYC/GRTTCGA | 2: GTTCGA 6.2 | |||
| TCCGCGGA/TCCGCGGA | ||||
The 37 transcription factor (TF) motifs not discovered by the methods applied in [20] are listed in Tables 1 and 2. Only exact matches to the motifs (see Methods) are considered. The first and second columns list the transcription factor and the known motif given in Supplementary Table 3 of [20]. The third column lists the environmental conditions examined (YPD: Rich medium, HEAT: Elevated temperature, SM: Amino acid starvation, H2O2Hi: Highly hyperoxic, H2O2Lo: Moderately hyperoxic, BUT14: Filamentation inducing, RAPA: Nutrient deprived and GAL: Galactose medium). The fourth and fifth columns list the results for c-REDUCE and f-REDUCE respectively. For example, "1: ATTGGY 3.0" for HAP2 indicates that ATTGGY was the first predicted oligo under the YPD condition, with a -log10(p-value) of 3.0. The degenerate symbols are R = (A, G), Y = (C, T), M = (A, C), K = (G, T), S = (C, G), W = (A, T), B = (C, G, T), D = (A, G, T), H = (A, C, T), V = (A, C, G) and N = (A, C, G, T).
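The degenerate symbols and the "motif/reverse complement" pairs used in these tables can be illustrated with a short Python sketch. It is only an illustration of the notation, not part of c-REDUCE or f-REDUCE; the function names are hypothetical, and the gapped notation N(11) is not handled.

```python
import re

# Degenerate nucleotide symbols exactly as listed in the table caption.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of each symbol (e.g. R = A/G pairs with Y = T/C).
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "M": "K", "K": "M", "S": "S", "W": "W",
    "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}


def reverse_complement(oligo):
    """Reverse complement of a (possibly degenerate) oligo."""
    return "".join(COMPLEMENT[s] for s in reversed(oligo))


def oligo_to_regex(oligo):
    """Compile a degenerate oligo into a regex over A, C, G, T."""
    return re.compile("".join("[" + IUPAC[s] + "]" for s in oligo))


def p_value_from_score(neg_log10_p):
    """Convert a -log10(p-value) score back to a p-value."""
    return 10.0 ** (-neg_log10_p)


print(reverse_complement("GGRGK"))                            # MCYCC
print(bool(oligo_to_regex("ATTGGY").fullmatch("ATTGGC")))     # True
print(bool(oligo_to_regex("ATTGGY").fullmatch("ATTGGA")))     # False
```

Under this reading, the score 3.0 reported for ATTGGY corresponds to a p-value of about 0.001.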
Summary of results comparing c-REDUCE and f-REDUCE oligo predictions for ChIP-chip yeast data (part 2).
| CGGN(11)CCG | 1: GVVCG 35.2 | |||
| 2: CVCVG 15.1 | ||||
| CGGANNA/TNNTCCG | 2: CCHCV 10.8 | |||
| TCGGAAG/CTTCCGA | ||||
| CTAWWWWTAG/ | 1: TATTT 11.8 | 1: DTTWA 24.4 | ||
| CTAWWWWTAG | 2: AARAW 7.5 | 2: AAVHTA 10.9 | ||
| 3: RAWTT 5.7 | ||||
| 5: TTTYY 4.0 | ||||
| YSYATTGTT/AACAATRSR | ||||
| CCCCTTAAGG/ | ||||
| CCTTAAGGGG | ||||
| 1: CCCCT 16.1 | ||||
| 3: AAGGGG 3.8 | ||||
| GGTCAC/GTGACC | ||||
| 4: RTGAC 2.6 | ||||
| ACGTCA/TGACGT | 2: ACGTCAT 5.1 | 1: VCGBC 19.0 | ||
| 4: ATGACGT 3.1 | ||||
| ACTACTAWWWWTAG/ | 3: TTAATAG 6.3 | 2: TTTHA 9.4 | ||
| CTAWWWWTAGTAGT | ||||
| RCGGCNNNRCGGC/ | ||||
| GCCGYNNNNGCCGY | 2: CGGCAY 3.0 | |||
| 4: TMAGR 2.7 | ||||
| 5: RCGGY 2.2 | ||||
| KGCTGR/YCAGCM | 3: TGGCTGG 2.5 | 2: CBDGC 4.7 | ||
| 3: GKSTG 1.4 | ||||
| CCGNNNNCGG | ||||
| CTTCGAG/CTCGAAG | ||||
| 1: TCGAG 27.9 | ||||
| 5: TCGAR 7.3 | ||||
| TTACTAA/TTAGTAA | ||||
| TTACTAA/TTAGTAA | ||||
| TTACTAA/TTAGTAA | ||||
| TAATTG/CAATTA | ||||
| YAATA/TATTR | ||||
See Table 1 for details.
Summary of results on ChIP-chip yeast data.
| | Harbison | c-REDUCE (exact) | c-REDUCE (1 MM/S) |
| Recovered (44) | 44 | 38 | 40 |
| Not-recovered (37) | 0 | 18 | 24 |
| Total (81) | 44 (~54%) | 53 (~65%) | 64 (~79%) |
For each transcription factor motif, c-REDUCE results are listed for the complete Harbison et al. [20] data, counting either exact matches only or matches with at most one mismatch or a shifted match ("1 MM/S"; see Methods).
Summary of comparisons of c-REDUCE (1 MM/S) with other programs on 37 "not-recovered" cases.
| c-REDUCE | PhyloGibbs [ | |||
| Total (21) | 16 | 16 | ||
| Total (15) | c-REDUCE | Tree Gibbs Sampler [ | ||
| True Positives | 11 | 8 | ||
| False Positives | 3/14 (21.4%) | 5/13 (38.5%) | ||
| c-REDUCE | PhyloCon [ | Converge [ | PhyloCon & Converge [ | |
| Total (35) | 22 | 9 | 14 | 20 |
The sub-tables list comparisons between c-REDUCE and several other methods. The total number of transcription factor datasets evaluated (Total) is not the same in each sub-table because results are not always reported for the complete set of 37 "Not-recovered" cases. For Tree Gibbs Sampler, the authors report all motif predictions, so the false positive rates can be compared with those of c-REDUCE. For that sub-table, "True positives" indicates the number of correct predictions and "False positives" indicates the number of incorrect predictions out of all predictions.
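The false positive percentages in the Tree Gibbs Sampler sub-table follow directly from the counts: incorrect predictions divided by all predictions (correct plus incorrect). A minimal Python check, with a hypothetical helper name, reproduces both figures:

```python
def false_positive_fraction(true_positives, false_positives):
    """Fraction of all predictions that are incorrect."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("no predictions to evaluate")
    return false_positives / total


# Counts taken from the sub-table above.
c_reduce = false_positive_fraction(11, 3)     # 3 of 14 predictions
tree_gibbs = false_positive_fraction(8, 5)    # 5 of 13 predictions

print(f"c-REDUCE: {100 * c_reduce:.1f}%")              # c-REDUCE: 21.4%
print(f"Tree Gibbs Sampler: {100 * tree_gibbs:.1f}%")  # Tree Gibbs Sampler: 38.5%
```

Because the denominator is the number of predictions rather than the number of true negatives, this quantity is a proportion of incorrect predictions, which is how the caption defines it.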
c-REDUCE results on Dorsal expression study.
| Experiment | Oligo | Rank | -log10(p-value) |
| high vs none | ATRT | 1 | 20.1 |
| 2 | 8.4 | ||
| 3 | 6.5 | ||
| high vs low | ATRT | 1 | 43.1 |
| 2 | 33.7 | ||
| low vs none | 1 | 5.5 | |
| 4 | 2.7 | ||
The first column shows the pair-wise mutant comparisons (see text). Columns 2–4 list predicted oligos that match the Dorsal motif, their rank and -log10(p-value) respectively. Positions in bold indicate matches to the flanking GGG/CCC or KGG/CCM part of the consensus Dorsal motifs (GGGWWWWCCM or GGGWDWWWCCM [25]).
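One way to see how a short predicted oligo such as ATRT can match the Dorsal consensus is to slide it along the consensus and test each position for compatibility. The Python sketch below assumes that two degenerate symbols are compatible when they share at least one nucleotide; this matching rule and the function names are illustrative assumptions, not the criterion defined in the paper's Methods, and only the forward strand is checked.

```python
# Degenerate nucleotide symbols (same code table as in the Table 1 caption).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def compatible(a, b):
    """Assumed rule: symbols are compatible if they share a nucleotide."""
    return bool(set(IUPAC[a]) & set(IUPAC[b]))


def match_offsets(oligo, consensus):
    """Offsets at which the oligo fits entirely inside the consensus."""
    n = len(oligo)
    return [
        i
        for i in range(len(consensus) - n + 1)
        if all(compatible(o, c) for o, c in zip(oligo, consensus[i:i + n]))
    ]


print(match_offsets("ATRT", "GGGWWWWCCM"))   # [3]
print(match_offsets("ATRT", "GGGWDWWWCCM"))  # [3, 4]
```

Under this assumption, ATRT aligns with the W-rich core of both consensus forms rather than with the flanking GGG/CCM positions.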
Examples of consensus construction.
| Position | |||||||
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | |
| Scer | A | A | G | C | A | A | - |
| Spar | A | A | G | N | N | - | T |
| Smik | A | A | - | N | N | * | * |
| Skud | A | T | - | N | - | * | * |
| Sbay | A | T | - | N | - | * | * |
| Consensus | |||||||
Sequences (rows) are from the yeast species, S. cerevisiae (Scer), S. paradoxus (Spar), S. mikatae (Smik), S. kudriavzevii (Skud) and S. bayanus (Sbay). An asterisk indicates that there are no sequences from that species that could be aligned.