Alastair M Kilpatrick, Bruce Ward, Stuart Aitken.
Abstract
MOTIVATION: The Expectation-Maximization (EM) algorithm has been successfully applied to the problem of transcription factor binding site (TFBS) motif discovery and underlies the most widely used motif discovery algorithms. In the wider field of probabilistic modelling, the stochastic EM (sEM) algorithm has been used to overcome some of the limitations of the EM algorithm; however, the application of sEM to motif discovery has not been fully explored.
Year: 2014 PMID: 24931999 PMCID: PMC4058950 DOI: 10.1093/bioinformatics/btu286
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Realistic synthetic data: classification results
| Conservation (mean bits/col) | Deterministic EM | SEAM | MITSU | ||||||
|---|---|---|---|---|---|---|---|---|---|
| AUC | AUC | AUC | |||||||
| 2.00 | 0.84 | 0.25 | — | 0.70 | 0.74 | 0.97 | |||
| 1.49 | 0.26 | 0.07 | 0.98 | 0.93 | — | 0.90 | |||
| 1.08 | 0.02 | 0.01 | 0.96 | 0.49 | 0.49 | — | |||
| 0.76 | 0.00 | 0.00 | 0.09 | 0.09 | — | ||||
| 0.51 | 0.00 | 0.00 | 0.06 | 0.06 | — | ||||
Note: sSn, sPPV and AUC for five collections of realistic synthetic data with varying levels of motif conservation. Best results are printed in bold. In these tests, motif discovery was carried out only at the known motif width.
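The site-level sensitivity (sSn) and positive predictive value (sPPV) reported above can be illustrated with a minimal sketch. It assumes the usual definitions, sSn = sTP / (sTP + sFN) and sPPV = sTP / (sTP + sFP), and an overlap rule in which a prediction matches a known site when the two share at least a fraction `min_overlap_frac` of the known site's width. The function name, the tuple layout and the default threshold of 0.25 are assumptions made for illustration; this extract does not state the exact criterion used in the paper.

```python
def site_level_metrics(known_sites, predicted_sites, min_overlap_frac=0.25):
    """Return (sSn, sPPV) for sites given as (seq_id, start, width) tuples.

    min_overlap_frac is an assumed matching rule, not taken from the paper.
    """
    def overlaps(known, pred):
        if known[0] != pred[0]:
            return False  # sites on different sequences never match
        shared = min(known[1] + known[2], pred[1] + pred[2]) - max(known[1], pred[1])
        return shared >= min_overlap_frac * known[2]

    # A known site is a true positive if any prediction overlaps it enough.
    s_tp = sum(any(overlaps(k, p) for p in predicted_sites) for k in known_sites)
    s_fn = len(known_sites) - s_tp
    # A prediction is a false positive if it matches no known site.
    s_fp = sum(not any(overlaps(k, p) for k in known_sites) for p in predicted_sites)

    s_sn = s_tp / (s_tp + s_fn) if known_sites else 0.0
    s_ppv = s_tp / (s_tp + s_fp) if (s_tp + s_fp) else 0.0
    return s_sn, s_ppv


known = [("seq1", 10, 8), ("seq2", 20, 8)]
predicted = [("seq1", 12, 8), ("seq2", 50, 8)]
s_sn, s_ppv = site_level_metrics(known, predicted)
```

In this toy input one of the two known sites is recovered and one of the two predictions is spurious, so both metrics come out at one half.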
Escherichia coli data: classification results
| Conservation (mean bits/col) | Deterministic EM | SEAM | MITSU | ||||||
|---|---|---|---|---|---|---|---|---|---|
| AUC | AUC | AUC | |||||||
| ‘High’ (1.36) | 0.22 | 0.96 | 0.67 | 0.67 | — | 0.54 | |||
| ‘Low’ (0.78) | 0.63 | 0.41 | 0.96 | 0.65 | — | 0.57 | |||
| Overall (1.13) | 0.30 | 0.96 | 0.66 | 0.66 | — | 0.55 | |||
Note: sSn, sPPV and AUC for 20 datasets created using previously characterized E. coli TFBS sequences. Best results are printed in bold. In these tests, motif discovery was carried out only at the experimentally determined motif width.
Diverse prokaryotic data: classification results
| Conservation (mean bits/col) | Deterministic EM | SEAM | MITSU | ||||||
|---|---|---|---|---|---|---|---|---|---|
| AUC | AUC | AUC | |||||||
| 0.99 | 0.75 | 0.67 | 0.99 | 0.86 | 0.86 | — | |||
Note: sSn, sPPV and AUC for nine datasets created using real prokaryotic data determined through ChIP experiments. Best results are printed in bold. In these tests, motif discovery was carried out only at the experimentally determined motif width.
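The conservation column in these tables is given in mean bits per column. As a rough illustration of how such a figure can be computed, the sketch below averages the per-column information content of a position weight matrix relative to a background distribution. A uniform DNA background is assumed here, which gives a maximum of 2 bits for a perfectly conserved column; the function name and the dictionary-based matrix layout are hypothetical, and the paper may compute conservation differently.

```python
import math


def mean_bits_per_column(pwm, background=None):
    """Mean information content (bits) per column of a PWM.

    pwm: list of dicts mapping each base to its probability in that column.
    background: base frequencies; uniform over ACGT if omitted (an assumption).
    """
    if background is None:
        background = {base: 0.25 for base in "ACGT"}
    total = 0.0
    for column in pwm:
        for base, p in column.items():
            if p > 0:  # the limit of p * log2(p) as p -> 0 is 0
                total += p * math.log2(p / background[base])
    return total / len(pwm)


conserved = [{"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}] * 8
two_way = [{"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0}]
bits_conserved = mean_bits_per_column(conserved)
bits_two_way = mean_bits_per_column(two_way)
```

A perfectly conserved motif scores 2 bits per column under this assumption, and a column split evenly between two bases scores 1 bit.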
Fig. 1. ROC curves (plotted for 0 ≤ sFPR ≤ 0.5) for the E. coli TorR motif discovered by the deterministic EM algorithm (left) and MITSU (right). Curve colour illustrates the threshold, from highest (red) to lowest (blue)
Fig. 2. Sequence logos representing the E. coli TorR motif as discovered by the deterministic EM algorithm (top) and MITSU (bottom)
Fig. 3. Energy traces for two runs of both the deterministic EM algorithm (blue) and MITSU (red) on a synthetic dataset containing a perfectly conserved motif of width 8 bp. Algorithm convergence is marked with ‘×’ in both cases. We note that the sEM algorithm allows MITSU to escape local maxima of the likelihood function, which can trap deterministic EM (top)
Fig. 4. CRP motif sequence logos. From top: logo constructed from the 24 binding sites contained in the CRP dataset; logo representing the motif discovered by MITSU; logo representing the motif discovered by MEME when the number of known sites was not provided
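The contrast drawn in the Fig. 3 caption, between deterministic EM and the sEM algorithm used by MITSU, can be sketched in general terms. In a deterministic E-step every candidate motif start position contributes with its posterior weight, whereas a stochastic E-step draws a single start position from that posterior, which injects the randomness that can help a run leave a local maximum. The sketch below assumes a one-occurrence-per-sequence model, a fixed background distribution, and a motif matrix with no zero entries; all function names are hypothetical, and this is not MITSU's actual implementation, which is not described in this extract.

```python
import random


def start_posteriors(seq, pwm, background):
    """Posterior over motif start positions, assuming one site per sequence."""
    width = len(pwm)
    scores = []
    for i in range(len(seq) - width + 1):
        ratio = 1.0
        for j in range(width):
            base = seq[i + j]
            ratio *= pwm[j][base] / background[base]  # motif vs background odds
        scores.append(ratio)
    total = sum(scores)
    return [s / total for s in scores]


def deterministic_e_step(seq, pwm, background):
    # Deterministic EM: keep the full posterior as soft weights.
    return start_posteriors(seq, pwm, background)


def stochastic_e_step(seq, pwm, background, rng):
    # Stochastic EM: sample one start position and return a one-hot weighting.
    post = start_posteriors(seq, pwm, background)
    start = rng.choices(range(len(post)), weights=post, k=1)[0]
    return [1.0 if i == start else 0.0 for i in range(len(post))]


background = {b: 0.25 for b in "ACGT"}
# Toy motif for "ACG" with small pseudo-probabilities so no entry is zero.
pwm = [{b: (0.85 if b == target else 0.05) for b in "ACGT"} for target in "ACG"]
seq = "TTTACGTTT"
soft = deterministic_e_step(seq, pwm, background)
hard = stochastic_e_step(seq, pwm, background, random.Random(0))
```

The M-step would then re-estimate the motif matrix from either the soft weights or the sampled positions; that step is omitted here to keep the sketch short.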