Timothy L Bailey, Mikael Bodén, Tom Whitington, Philip Machanick.
Abstract
BACKGROUND: Position-specific priors have been shown to be a flexible and elegant way to extend the power of Gibbs sampler-based motif discovery algorithms. Information of many types, including sequence conservation, nucleosome positioning, and negative examples, can be converted into a prior over the location of motif sites, which then guides the sequence motif discovery algorithm. This approach has been shown to confer many of the benefits of conservation-based and discriminative motif discovery approaches on Gibbs sampler-based motif discovery methods, but it has not previously been studied with methods based on expectation maximization (EM).
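As a concrete illustration of turning per-base information into a prior over site locations, the sketch below converts nonnegative per-position scores (for example, conservation scores) into a probability for each possible site start in one sequence. The averaging-and-normalizing rule and the function name `scores_to_psp` are hypothetical simplifications chosen for illustration, not the construction used in the paper.

```python
def scores_to_psp(scores, width):
    """Turn nonnegative per-base scores into a prior over site start positions.

    Hypothetical rule: a site starting at j covers bases j..j+width-1 and gets
    the mean score of those bases; the values are then normalized to sum to 1.
    """
    if any(s < 0 for s in scores):
        raise ValueError("scores must be nonnegative")
    m = len(scores) - width + 1  # number of possible start positions
    if m < 1:
        raise ValueError("sequence is shorter than the motif width")
    window = [sum(scores[j:j + width]) / width for j in range(m)]
    total = sum(window)
    if total == 0:
        return [1.0 / m] * m  # no information: fall back to a uniform prior
    return [w / total for w in window]
```

For example, `scores_to_psp([0, 0, 1, 1, 0, 0], 2)` puts the most prior mass on the start position whose window covers both high-scoring bases.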
Year: 2010 PMID: 20380693 PMCID: PMC2868008 DOI: 10.1186/1471-2105-11-179
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Definition of terms used in describing the MEME algorithm
| Description | Symbol |
|---|---|
| number of input sequences | |
| length of input sequences | |
| the set of | |
| width of a MEME motif | |
| number of positions for a site | |
| probability of a site in any sequence | |
| PSPM model of motif; | |
| position-specific prior (PSP) | |
| width for which input PSP is defined | |
| missing information variables for | |
| expectation of | |
| prior probability given PSP & model | |
| model parameters at EM iteration | |
| all sequence model parameters | |
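
To show how a position-specific prior can enter an EM iteration, here is a minimal sketch of an OOPS-style E-step for one DNA sequence: the posterior for a site starting at each position is the site's likelihood ratio (PSPM versus background) times the prior for that position, normalized over all positions. It assumes a zero-order background, a PSPM given as one A/C/G/T probability list per column, and a prior with one entry per start position (here taken to be L - W + 1). The function names are hypothetical, and the paper's exact formulation may differ.

```python
import math

ALPHABET = "ACGT"

def log_likelihood_ratio(seq, start, pspm, background):
    """log P(W-mer | PSPM) - log P(W-mer | background) for the site at `start`."""
    total = 0.0
    for k, column in enumerate(pspm):
        b = ALPHABET.index(seq[start + k])
        total += math.log(column[b]) - math.log(background[b])
    return total

def oops_e_step(seq, pspm, background, prior):
    """Posterior probability that the single (OOPS) site starts at each position.

    `prior[j]` is a hypothetical position-specific prior for a start at j;
    it must have one entry per start position, m = len(seq) - W + 1.
    """
    m = len(seq) - len(pspm) + 1
    if m < 1 or len(prior) != m:
        raise ValueError("prior must have len(seq) - W + 1 entries")
    # Unnormalized log posterior = log likelihood ratio + log prior.
    logs = [
        log_likelihood_ratio(seq, j, pspm, background) + math.log(p) if p > 0 else -math.inf
        for j, p in enumerate(prior)
    ]
    top = max(logs)
    if top == -math.inf:
        raise ValueError("prior assigns zero probability to every position")
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(v - top) for v in logs]
    z = sum(weights)
    return [w / z for w in weights]
```

With a uniform prior (`[1 / m] * m`) this reduces to the usual OOPS E-step; a nonuniform prior shifts posterior mass toward the positions it favors, which is how external information can guide site selection.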
Performance of motif discovery algorithms on yeast TF ChIP-chip datasets.
| Algorithm | Description | Motifs found (of 156) | Success rate |
|---|---|---|---|
| PhyloCon | local alignment of conserved regions | 19 | 12% |
| PhyME | alignment-based; uses EM | 21 | 13% |
| MEME_c | MEME run with non-conserved bases masked | 49 | 31% |
| PhyloGibbs | similar to PhyME but uses Gibbs sampling | 54 | 35% |
| Kellis | alignment-based | 56 | 36% |
| Converge | alignment-based; uses EM | 66 | 42% |
| PRIORITY- | Gibbs sampler with conservation-based priors | 69 | 44% |
| PRIORITY- | Gibbs sampler with discriminative conservation-based priors | 76 | 49% |
| MEME: OOPS | MEME with OOPS model | 36 | 23% |
| MEME: ZOOPS | MEME with ZOOPS model | 39 | 25% |
| MEME: OOPS- | MEME with OOPS model and | 73 | 47% |
| MEME: ZOOPS- | MEME with ZOOPS model and | 81 | 52% |
| PRIORITY- | Gibbs sampler with discriminative conservation-based priors | 69 (3) | 44% |
The table shows the number of motifs (out of 156) successfully discovered by the named algorithms. The results in the top half of the table are taken from Gordân et al. [8]; results in the bottom half are from new experiments performed by us. Each algorithm is allowed to report one motif, and success is declared if the scaled Euclidean distance to the known PSPM is < 0.25. Success proportions (out of 156) are rounded to the nearest whole percent.
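The success criterion and the percentage column can be sketched as follows. The exact scaled Euclidean distance is defined in the paper's Additional file 1; the version here is an assumption for illustration only: it compares two equal-width PSPMs column by column without searching over alignment offsets, averages the per-column Euclidean distances, and divides by sqrt(2), the largest possible distance between two probability vectors, so the result lies in [0, 1]. The function names are hypothetical.

```python
import math

def scaled_euclidean(pspm_a, pspm_b):
    """Assumed distance between two equal-width PSPMs, in [0, 1].

    Mean per-column Euclidean distance divided by sqrt(2); no alignment
    offsets are tried (a simplification, not the paper's definition).
    """
    if not pspm_a or len(pspm_a) != len(pspm_b):
        raise ValueError("PSPMs must be non-empty and of equal width")
    total = 0.0
    for col_a, col_b in zip(pspm_a, pspm_b):
        total += math.sqrt(sum((p - q) ** 2 for p, q in zip(col_a, col_b)))
    return total / (len(pspm_a) * math.sqrt(2))

def is_success(found, known, threshold=0.25):
    """Success if the found motif lies within `threshold` of the known PSPM."""
    return scaled_euclidean(found, known) < threshold

def success_percent(n_found, n_total=156):
    """Proportion of datasets solved, rounded to the nearest whole percent."""
    return round(100 * n_found / n_total)
```

For example, `success_percent(81)` returns 52, matching the "MEME: ZOOPS-" row.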
Performance of motif discovery algorithms on mouse TF ChIP-seq datasets.
| TF | Pair 1 p-value | Pair 2 p-value | Pair 3 p-value | Pair 4 p-value |
|---|---|---|---|---|
| Nanog | 3.4e-01 | 1.3e-03 | 7.7e-03 | 2.1e-09 |
| Oct4 | 1.0e-01 | 3.4e-01 | 5.8e-07 | 4.5e-14 |
| Sox2 | 1.6e-01 | 1.6e-02 | 4.4e-01 | 1.3e-03 |
| Smad1 | 1.0e-01 | 1.6e-01 | 2.4e-01 | 1.6e-01 |
| E2f1 | 2.4e-01 | 4.5e-05 | 4.5e-05 | 4.4e-01 |
| Tcfcp2l1 | 1.0e-01 | 7.7e-03 | 1.6e-02 | 1.9e-11 |
| Ctcf | 4.4e-01 | 2.4e-01 | 4.4e-01 | 8.9e-16 |
| Zfx | 1.0e-01 | 1.3e-03 | 1.6e-01 | 2.2e-10 |
| Stat3 | 3.3e-03 | 4.4e-01 | 6.0e-02 | 1.6e-01 |
| Klf4 | 1.6e-01 | 6.0e-02 | 1.0e-01 | 1.6e-01 |
| Esrrb | 6.0e-02 | 6.0e-02 | 3.3e-03 | 4.5e-14 |
| c-Myc | 3.3e-02 | 3.3e-03 | 2.4e-01 | 4.5e-05 |
| n-Myc | 1.5e-04 | 4.5e-05 | 1.6e-02 | 1.6e-08 |
| Total (W / L / T) | 3 / 0 / 10 | 4 / 3 / 6 | 4 / 2 / 7 | 6 / 3 / 4 |
The table compares the relative accuracy of pairs of motif discovery algorithms. Relative accuracy is measured on held-out sets of sequences as the Spearman rank correlation between sequence ranks based on ChIP-seq peak scores and sequence ranks based on the motif-based AMA score. A check in the "win" ("W") column indicates that the motifs found by the first algorithm had a significantly better Spearman rank correlation than those found by the second; a check in the "loss" ("L") column indicates the reverse. Significance is judged by the sign test on the 50 random repeats (p-value < 0.05). A check in the "tie" ("T") column indicates that there was no significant difference. The "Total" line shows the totals using the sign test to judge significance. OOPS and ZOOPS refer to MEME with those models and no prior; OOPS- and ZOOPS- refer to MEME with those models and the prior.
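The evaluation described above can be sketched in two parts: a Spearman rank correlation (Pearson correlation of average ranks, so ties are handled) for each held-out repeat, and a sign test over the per-repeat wins and losses. The source does not say whether the sign test is one- or two-sided; this sketch assumes a one-sided binomial tail P(X >= k) with X ~ Binomial(n, 1/2), applied once in each direction. With n = 50 this gives 0.5^50 ≈ 8.9e-16 for 50 wins and 51 / 2^50 ≈ 4.5e-14 for 49 wins, which appear consistent with the smallest p-values in the table. All function names are hypothetical.

```python
from math import comb, sqrt

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def sign_test_p(wins, n):
    """Assumed one-sided sign test: P(X >= wins) for X ~ Binomial(n, 1/2)."""
    if not 0 <= wins <= n:
        raise ValueError("wins must lie in [0, n]")
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n

def outcome(wins, losses, alpha=0.05):
    """Label one TF 'W', 'L' or 'T'; repeats with equal correlations are dropped."""
    n = wins + losses
    if n == 0:
        return "T"
    if sign_test_p(wins, n) < alpha:
        return "W"
    if sign_test_p(losses, n) < alpha:
        return "L"
    return "T"

def compare_repeats(rhos_a, rhos_b, alpha=0.05):
    """Compare per-repeat Spearman correlations of algorithm A against B."""
    wins = sum(a > b for a, b in zip(rhos_a, rhos_b))
    losses = sum(a < b for a, b in zip(rhos_a, rhos_b))
    return outcome(wins, losses, alpha)
```

In use, `rhos_a[r]` would be `spearman(peak_scores, ama_scores_a)` on the held-out sequences of repeat `r`, and likewise for algorithm B.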
Figure 1. Comparison of motifs found in mouse ChIP-seq datasets. The figure shows the motifs reported by Chen et al. [11] and those found by MEME in sequences identified as bound to the given transcription factor in 13 ChIP-seq experiments. The MEME motifs were found using 100 randomly chosen bound sequences and the OOPS- prior. The inter-motif distance (scaled Euclidean distance) is computed as described in Additional file 1.