Randy Z Wu, Christina Chaivorapol, Jiashun Zheng, Hao Li, Shoudan Liang.
Abstract
BACKGROUND: The precision of transcriptional regulation is made possible by the specificity of physical interactions between transcription factors and their cognate binding sites on DNA. A major challenge is to decipher transcription factor binding sites from sequence and functional genomic data using computational means. While current methods can detect strong binding sites, they are less sensitive to degenerate motifs.
Year: 2007 PMID: 17941998 PMCID: PMC2174516 DOI: 10.1186/1471-2105-8-399
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The fREDUCE algorithm. A set of possible IUPAC strings is generated from the input sequence. For each IUPAC string, we compute a pseudo-Pearson coefficient, which is an estimate of, and an upper bound on, the true Pearson coefficient. After the vast majority of motifs are filtered out using the pseudo-Pearson value, we compute true Pearson coefficients for the remaining motifs and select the top motif. The residual expression value is then used to iteratively derive subsequent motifs.
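The scoring and residual steps described in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not the fREDUCE implementation: it omits the pseudo-Pearson filter (whose formula is not given here) and computes the true Pearson coefficient for every candidate directly. The function names, the single-strand overlapping match count, the ranking by absolute correlation, and the simple linear fit used for the residual are all assumptions made for illustration.

```python
import math
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def count_matches(motif, seq):
    """Count overlapping matches of an IUPAC motif on one strand (assumption)."""
    body = "".join(IUPAC[c] for c in motif.upper())
    # A zero-width lookahead lets overlapping occurrences be counted.
    return len(re.findall("(?=" + body + ")", seq.upper()))


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_motif(motifs, seqs, expr):
    """Pick the candidate whose match counts correlate best with expression."""
    def score(m):
        return abs(pearson([count_matches(m, s) for s in seqs], expr))
    return max(motifs, key=score)


def residual(motif, seqs, expr):
    """Subtract the least-squares linear fit of expression on motif counts."""
    counts = [count_matches(motif, s) for s in seqs]
    n = len(counts)
    mx, my = sum(counts) / n, sum(expr) / n
    sxx = sum((c - mx) ** 2 for c in counts)
    slope = 0.0 if sxx == 0 else sum(
        (c - mx) * (e - my) for c, e in zip(counts, expr)) / sxx
    return [e - (my + slope * (c - mx)) for c, e in zip(counts, expr)]


# Toy data (hypothetical): four short sequences and their expression values.
seqs = ["ACGTGACGTG", "TTTTTTTT", "CACGTGAA", "GGGGGG"]
expr = [2.0, 0.0, 1.0, 0.0]
top = best_motif(["ACGYG", "TTTT"], seqs, expr)
remaining = residual(top, seqs, expr)  # input for the next iteration
```

In an iterative run, `remaining` would replace `expr` when searching for the next motif, mirroring the residual step in the caption.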
fREDUCE motif predictions from yeast ChIP-chip
| rTCAyt....Acg | YPD | rTGATm | 22.4 | √ | √ | |
| tGCTGGT | YPD | kGCTGGy | 6.2 | √ | ||
| GGGTGy | H2O2Lo | rGGTGy | 91.5 | √ | √ | |
| YwTTkcKkTyyckgykky | YPD | mTTTTw | 14.8 | |||
| TGACTC | YPD | TGACTCCG | 37.2 | √ | √ | |
| mTTAsTmAkC | YPD | GmTTAsTA | 4.2 | √ | √ | |
| tCACGTG | YPD | CACGTG | 90.7 | √ | √ | |
| TTAygTAA | YPD | TTAyrTAA | 59.4 | √ | √ | |
| GATAAGa | RAPA | GATAAG | 9.4 | √ | ||
| TgAAAca | YPD | TGAAACA | 18 | √ | ||
| rTGTayGGrtg | YPD | GTAyGGrT | 141.2 | √ | √ | |
| tTgTTTac | YPD | yTGTTkAC | 28.8 | √ | ||
| aaa.GTAAACAa | YPD | GTAAACA | 23.7 | √ | √ | |
| CGG...........cCg | YPD | TTCGGAGC | 4.9 | √ | ||
| aGATAAG | RAPA | GATAAG | 13.3 | √ | ||
| TGAsTCa | YPD | rTGAsTCA | 166.7 | √ | √ | |
| GATAAGa.a | RAPA | GATAAG | 38.2 | √ | ||
| GGmraTA.CGs | YPD | kTTATCGG | 60.3 | √ | √ | |
| g.CcAAtcA | YPD | CCAATsAr | 21.7 | √ | √ | |
| TTCya.....TTC | H2O2Hi | TTCyrGAA | 109.5 | √ | √ | |
| H2O2Hi | ||||||
| CAcaTGc | YPD | kCACATGC | 12.8 | √ | ||
| CATGTGaaaa | YPD | CAyrTG | 89.2 | √ | √ | |
| cCGgtacCGG | YPD | CGGkACCG | 10.8 | √ | √ | |
| rACGCGt | YPD | ACGCGT | 126.9 | √ | √ | |
| tttCC.rAt..gg | Alpha | yTTCCTAA | 5.7 | √ | ||
| RMmAwsTGKSgyGsc | SM | CrCGyG | 14.8 | |||
| mAGGGGsgg | H2O2Hi | rGGGGy | 20.8 | √ | ||
| tt.CC.rAw..GG | YPD | CTCGAGGC | 12.3 | √ | ||
| GGaCCCT | YPD | AGGGTCs | 11.3 | √ | √ | |
| ccGCCgRAwra | YPD | CCrwACAT | 11.4 | |||
| sc.GC.gg | YPD | mTGCAk | 21.1 | √ | ||
| SGTGCGsygyG | Pi- | |||||
| CACGTGs | Pi- | sCACGTGs | 14.1 | √ | ||
| tGyayGGrtg | SM | GyrTGGGT | 57.1 | √ | √ | |
| ggGTGca.t | H2O2Lo | GGGTGCA | 43.6 | √ | √ | |
| kCGGCCGa | H2O2Hi | TCCGCGG | 35.6 | √ | ||
| CGGGTAA | YPD | CGGGTAAy | 136.7 | √ | √ | |
| TTgccATggCAAC | YPD | GTCGTCCG | 3.2 | √ | ||
| ATTTTCttCwTt | YPD | |||||
| TTTGCCACC | H2O2Lo | TyGCCACC | 109.8 | √ | √ | |
| ayCcrtACay | SM | yCCrTACA | 31.6 | √ | √ | |
| ArGmAwCrAmAA | H2O2Hi | |||||
| CGG.y.AATGGrr | SM | CTCGGCCC | 58.4 | |||
| G.C..GsCs | H2O2Lo | GsCyGGCC | 37.7 | √ | ||
| yGGCGCTAyca | YPD | GrTAGCGC | 96.1 | √ | √ | |
| tGCAg..a | BUT14 | GGTrCAGA | 5.6 | |||
| ymtGTmTytAw | YPD | TkyATA | 6.2 | |||
| rAAATsaA | YPD | wTkAAA | 25.1 | |||
| rracGCsAaa | YPD | wCGCGT | 4 | √ | ||
| TCGg..CGA | YPD | CGGryCGA | 7.1 | √ | √ | |
| CGGwstTAta | YPD | CGGkGwTA | 24 | √ | ||
| tgAAACa | YPD | TGAAACA | 38.9 | √ | √ | |
| gyGwCAswaaw | YPD | GyGTCAs | 25.0 | √ | √ | |
| gcsGsg..sG | YPD | wCkCCG | 49.8 | |||
| raCgCsAAA | YPD | CGCsAAAA | 12.6 | √ | √ | |
| tttcGCGt | YPD | TTTCsk | 11.6 | √ | ||
| rrGAATG | YPD | rrGAATGT | 22.4 | √ | ||
| gmAAcy.twAgA | Thi- | GGAAACyS | 4.5 | √ | ||
| tCACGTGAy | YPD | TCACGTGr | 70.8 | √ | √ | |
| taGCCGCCsa | YPD | GCsGCy | 154.3 | √ | √ | |
| TTaGTmAGc | YPD | mTkACTAA | 13.6 | √ | √ | |
| mTkAsTmAk | H2O2Hi | mTTAsTAA | 121.9 | √ | √ | |
| ttTACCCGGm | YPD | CCGGGTAA | 23.2 | √ | √ | |
| ACCCTmAAGGTyrT | YPD | wAyATT | 16.5 |
fREDUCE predictions from 65 yeast ChIP-chip experiments of Harbison et al. Check marks (√) indicate that fREDUCE matched the IUPAC string corresponding to the benchmark logo. The results of a similar analysis with AlignACE are given in the right column.
Figure 2. Comparison of fREDUCE to six other algorithms on 65 yeast ChIP-chip benchmarks. AlignACE* indicates results of running AlignACE from scratch, while the performance of the other methods was compiled from the Harbison et al. supporting website.
Figure 3. fREDUCE predictions in comparison to non-degenerate predictions made by REDUCE. Benchmark logos and their corresponding motifs are shown for reference. P-values are shown as -log10 values.
fREDUCE predictions for regulators with poorly characterized specificities
| TTYTCY | 34.3 | CYNYYAANKRMAR | |
| AYTTKA | 9.1 | ||
| TYAAWA | 7.0 | ||
| WTTSAA | 16.7 | ||
| GCRSCC | 16.2 | TCGTATA | |
| TWTTSA | 8.4 | ||
| WTRAAG | 11.3 | ||
| CCTSGGC | 15.2 | ||
| TTCAWW | 5.0 | CTTCC | |
| WTTRAA | 14.7 | ||
| WTTRAA | 22.0 | ACGCTAAA | |
| YACACAC | 17.8 | ||
| CCASSG | 11.6 | ||
| GCRCAS | 13.8 | ||
| WTTCAA | 8.2 | ||
| TTTRAY | 5.9 | ||
| MMCCCA | 3.8 | ||
| CGCASY | 4.9 | CGGNNNTNAN(9–12)CCG | |
| CSGSCC | 27.1 | ||
| ATYTRA | 10.3 | ||
| WTCAAW | 7.6 | ||
| WTGWAG | 3.9 | ||
| CAAGGYC | 3.1 | ||
| TATSAW | 5.6 | ||
| AARMTT | 24.1 | ||
| RCACMC | 20.7 | ||
| MATSAA | 4.5 | ||
| TYAAGW | 6.6 | ||
| WATAYT | 16.8 | ||
| AWTGAW | 3.5 | ||
| AKYACT | 3.9 | ||
| CAARTW | 3.1 | ||
| WTCAAK | 3.6 | ||
| TTYAAW | 4.6 | ||
| WGTTRA | 6.3 | ||
| KTTMAA | 7.2 | ||
| WCAAMT | 3.7 | ||
| TCAARTA | 2.4 | ||
| WTCAAW | 10.3 |
We searched the literature for evidence supporting our motif predictions and the matching examples are highlighted. *The annotated motifs for Rgt1p.
Figure 4. fREDUCE elicitation of the HNF-4 binding site from human hepatocyte expression data.