Valentina Boeva, Julien Clément, Mireille Régnier, Mikhail A Roytberg, Vsevolod J Makeev.
Abstract
BACKGROUND: cis-Regulatory modules (CRMs) of eukaryotic genes often contain multiple binding sites for transcription factors. The phenomenon that binding sites form clusters in CRMs is exploited in many algorithms to locate CRMs in a genome. This gives rise to the problem of calculating the statistical significance of the event that multiple sites, recognized by different factors, would be found simultaneously in a text of a fixed length. The main difficulty comes from overlapping occurrences of motifs. So far, no tools have been developed allowing the computation of p-values for simultaneous occurrences of different motifs which can overlap.
Year: 2007 PMID: 17927813 PMCID: PMC2174486 DOI: 10.1186/1748-7188-2-13
Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.405
Figure 1. Tree for the set {AAA, AAC, ACA, ACC, CCT}. Dashed colored links represent the δ function for internal node (5), in red, and for marked node (7), which corresponds to the word AAC of the set, in purple.
Values of the δ function for the set {AAA, AAC, ACA, ACC, CCT}.
| q | A | C | G | T |
| 0 | 1 | 2 | 0 | 0 |
| 1 | 3 | 4 | 0 | 0 |
| 2 | 1 | 5 | 0 | 0 |
| 3 | 6 | 7 | 0 | 0 |
| 4 | 8 | 9 | 0 | 0 |
| 5 | 1 | 5 | 0 | 10 |
| 6 | 6 | 7 | 0 | 0 |
| 7 | 8 | 9 | 0 | 0 |
| 8 | 3 | 4 | 0 | 0 |
| 9 | 1 | 5 | 0 | 10 |
| 10 | 1 | 2 | 0 | 0 |
Values of the function δ(q, α) for q ∈ Q and α ∈ {A, C, G, T}, constructed for the set {AAA, AAC, ACA, ACC, CCT}.
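The transition table above can be reproduced from the word set alone. The following is a minimal sketch, not the authors' implementation. It assumes that nodes are numbered level by level, alphabetically within each level (an assumption that happens to match the table), and it computes each transition directly from the definition of the Aho-Corasick transition function (the longest suffix of the extended label that is still a trie node) rather than through failure links. The names `build_delta` and `ALPHABET` are illustrative only.

```python
ALPHABET = "ACGT"


def build_delta(words, alphabet=ALPHABET):
    """Return (delta, labels) for the pattern-matching automaton of `words`.

    labels[q] is the string spelled on the path from the root to node q;
    delta[q][i] is the state reached from q on reading alphabet[i].
    """
    # Trie nodes are exactly the prefixes of the words; the root is "".
    prefixes = {w[:i] for w in words for i in range(len(w) + 1)}

    # Assumed numbering: by depth, then alphabetically within a depth.
    rank = {ch: i for i, ch in enumerate(alphabet)}
    labels = sorted(prefixes, key=lambda s: (len(s), [rank[c] for c in s]))
    state = {s: q for q, s in enumerate(labels)}

    delta = []
    for s in labels:
        row = []
        for ch in alphabet:
            t = s + ch
            # Drop leading letters until t is a trie node; "" always is.
            while t not in state:
                t = t[1:]
            row.append(state[t])
        delta.append(row)
    return delta, labels


if __name__ == "__main__":
    delta, labels = build_delta(["AAA", "AAC", "ACA", "ACC", "CCT"])
    print("q", *ALPHABET)
    for q, row in enumerate(delta):
        print(q, *row)
```

The naive suffix-trimming loop is quadratic in the word length, which is acceptable for short motifs; a production implementation would normally use failure links computed in breadth-first order.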
Figure 2. Tree for the set {AAA, AAC, ACA, ACC, CCT} under a Markov model of order 1. Dashed colored links represent the δ function for internal node (8), in red, and for marked node (10), which corresponds to the word AAC of the set, in purple.
Comparison of p-values calculated by the AHOPRO program, by Monte Carlo simulations and by compound Poisson distribution formula under the M0 model
| MOTIF, CUTOFF | OCC. | AHOPRO | MONTE CARLO | POISSON | AHOPRO/MC | AHOPRO/POISSON |
| | 3 | 0.012 | 0.012 | 0.010 | 1.00 | 1.10 |
| | 4 | 0.0044 | 0.0044 | 0.0033 | 1.01 | 1.34 |
| | 2 | 0.013 | 0.013 | 0.012 | 0.99 | 1.04 |
| | 3&4 | 0.00025 | 0.00026 | 3.6E-05 | 0.99 | 7.10 |
| | 3&4&2 | 6.54E-06 | 5.8E-06 | 4.34E-07 | 1.13 | 7.13 |
Comparison of p-values calculated for the Markov(0) model by the AHOPRO program with p-values calculated by Monte Carlo simulations and by Poisson formula for motifs of D. melanogaster developmental transcription factors bicoid, kruppel and hunchback.
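To make concrete what an exact p-value with overlapping occurrences means, the sketch below computes the probability of at least `k` occurrences of a word set in a random text of length `n` by dynamic programming over automaton states. It is a simplified illustration under stated assumptions, not the AHOPRO algorithm: it handles a single set of non-empty words rather than several motifs, assumes independent letters (an M0-style model), and counts an occurrence at every text position where some word of the set ends. The function name `p_at_least` and its parameters are hypothetical. It cannot reproduce the table values, which depend on PWMs and cutoffs not given here.

```python
def p_at_least(words, k, n, probs):
    """P(at least k occurrences of `words` in a random text of length n).

    probs maps each letter to its probability (letters are independent).
    An occurrence is a position at which some word of the set ends, so
    overlapping occurrences are counted separately.
    """
    alphabet = sorted(probs)

    # Automaton states are the prefixes of the words; the root is "".
    prefixes = {w[:i] for w in words for i in range(len(w) + 1)}
    labels = sorted(prefixes, key=lambda s: (len(s), s))
    state = {s: q for q, s in enumerate(labels)}

    def step(s, ch):
        # Longest suffix of s + ch that is still a trie node.
        t = s + ch
        while t not in state:
            t = t[1:]
        return state[t]

    delta = [[step(s, ch) for ch in alphabet] for s in labels]
    # hit[q] is True when some word of the set ends at state q.
    hit = [any(s.endswith(w) for w in words) for s in labels]

    # dist[q][c]: probability of being in state q with c occurrences so far,
    # where c is capped at k because only "at least k" matters.
    dist = [[0.0] * (k + 1) for _ in labels]
    dist[0][0] = 1.0
    for _ in range(n):
        new = [[0.0] * (k + 1) for _ in labels]
        for q, row in enumerate(dist):
            for c, p in enumerate(row):
                if p == 0.0:
                    continue
                for i, ch in enumerate(alphabet):
                    r = delta[q][i]
                    c2 = min(k, c + (1 if hit[r] else 0))
                    new[r][c2] += p * probs[ch]
        dist = new
    return sum(row[k] for row in dist)


if __name__ == "__main__":
    uniform = {ch: 0.25 for ch in "ACGT"}
    # Two overlapping occurrences of AA in a text of length 3 require AAA.
    print(p_at_least(["AA"], 2, 3, uniform))  # 0.015625
```

The running time is proportional to n times the number of states times k times the alphabet size. The self-overlapping word AA shows why overlaps matter: the text AAA contains two occurrences, which an approximation that treats occurrences as independent would not account for.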
Comparison of p-values calculated by the AHOPRO program, by Monte Carlo simulation and by compound Poisson distribution formula under the M1 model
| MOTIF, CUTOFF | OCC. | AHOPRO | MONTE CARLO | POISSON | AHOPRO/MC | AHOPRO/POISSON |
| | 3 | 0.013 | 0.014 | 0.012 | 0.998 | 1.11 |
| | 4 | 0.011 | 0.011 | 0.008 | 1.01 | 1.43 |
| | 2 | 0.14 | 0.14 | 0.11 | 0.9987 | 1.25 |
| | 3&4 | 0.00051 | 0.00051 | 9.62E-05 | 0.9991 | 5.34 |
| | 3&4&2 | 6.9E-05 | 6.97E-05 | 1.08E-05 | 0.9889 | 6.36 |
Comparison of p-values calculated by the AHOPRO program for the Markov(1) model with those calculated by Monte Carlo simulations and by Poisson formula for motifs of D. melanogaster developmental transcription factors bicoid, kruppel, and hunchback.
Figure 3. P-value distribution. Distribution of log10(P-value) calculated for the M1 model as a function of the cutoff values of the PWMs for BICOID and KRUPPEL in the even-skipped stripe 2 enhancer (A) and in a random sequence (B). View from above: eve2 sequence (C), random sequence (D).
Comparison of p-values and cutoff for different sets of DNA sequences
| Regulatory regions regulated by bicoid | Minimal p-value | Cutoff | Regulatory regions not regulated by bicoid | Minimal p-value | Cutoff | Random seq. | Minimal p-value | Cutoff |
| Btd crm | 3.24E-05 | 3.4 | Gt p. enh. | 0.023 | 2.7 | seq. 1 | 0.16 | 2.6 |
| Hb P2 | 4.13E-05 | 3.7 | Hb upstream enh. | 0.053 | 4.4 | seq. 2 | 0.12 | 1.7 |
| Kni cis element | 0.01 | 5.3 | Eve stripe 4+6 enh. | 0.41 | 3.6 | seq. 3 | 0.25 | 1.2 |
| Kr CD-1 enh. | 0.0001 | 5.1 | Eve stripe 3+7 enh. | 0.58 | 2.5 | seq. 4 | 0.065 | 1.6 |
| Otd early enh. | 0.024 | 5 | Ftz upstream enh. | 0.037 | 5.8 | seq. 5 | 0.11 | 1 |
| Sal blastoder. enh. | 8.62E-04 | 6.5 | Ftz | 0.28 | 3.3 | seq. 6 | 0.0087 | 3.8 |
| Tll PD enh. | 0.26 | 4.2 | Ubx PBX enh. | 0.196 | 6.7 | seq. 7 | 0.024 | 2.9 |
| Tll AD+PD enh. | 0.025 | 8.1 | Ubx BXD enh. | 0.698 | 4.6 | seq. 8 | 0.17 | 3.4 |
| Eve stripe 2 enh. | 4.04E-05 | 5.1 | Ubx BX enh. (BRE) | 0.05 | 7.5 | seq. 9 | 0.092 | 2.8 |
| Eve stripe 1 enh. | 8.09E-06 | 5.2 | Ems upstream enh. | 0.276 | 4.4 | seq. 10 | 0.052 | 3.6 |
| Eve stripe 5 enh. | 0.27 | 3.8 | En stripe enh. (intr. 1) | 0.049 | 5 | seq. 11 | 0.13 | 1.7 |
| Median | 8.62E-04 | 5.1 | Median | 0.196 | 4.4 | Median | 0.1128 | 2.6 |
Comparison of minimal p-values and best found cutoffs for bicoid PWM calculated (i) in regulatory regions which are regulated by bicoid, (ii) in regulatory regions which are not regulated by bicoid, and (iii) in random sequences of the same length and with the same dinucleotide distribution as in the even-skipped stripe 2 enhancer.