Ngoc Tam L Tran, Chun-Hsi Huang.
Abstract
BACKGROUND: Previous studies have demonstrated that using multiple tools and methods improves the accuracy of motif detection. Over the past years, numerous motif discovery pipelines have been developed. However, they typically report only the top-ranked results, either from individual motif finders or from a combination of multiple tools and algorithms.
Keywords: Binding sites; DNA motif; Motif clustering; Motif detection tool; Motif discovery pipeline; Motif similarity detection
Year: 2018 PMID: 30340511 PMCID: PMC6194616 DOI: 10.1186/s12864-018-5148-1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Characteristics of some existing motif discovery pipelines
| Pipeline | Components | Function | Input Format | Reference Database | Target Species | Platform | Year | Ref. |
|---|---|---|---|---|---|---|---|---|
| W-ChIPMotifs | MEME, MaMF, Weeder, STAMP | Predict motifs from ChIP-Seq data | FASTA | TRANSFAC, Jaspar | Mouse, Human | Web portal | 2009 | [ |
| CompleteMOTIFS | MEME, Weeder, ChIPMunk, Patser, STAMP | Predict motifs from ChIP-Seq data | FASTA, BED, GFF | TRANSFAC, Jaspar, User-defined file | Unspecified | Web portal | 2011 | [ |
| GimmeMotifs | BioProspector, GADEM, Improbizer, MDmodule, MEME, MoAn, MotifSampler, Trawler, Weeder | Predict motifs from ChIP-Seq data | BED, FASTA | Jaspar | Unspecified | Standalone application | 2011 | [ |
| MEME-ChIP | MEME, DREME, CentriMo, TOMTOM, SpaMo | Predict motifs from ChIP-Seq data | FASTA | Jaspar, UniProbe, User-defined file, etc. | Unspecified | Web portal, Web-services, Command line tool | 2011 | [ |
| RSAT peak-motifs | Oligo-analysis, Position-analysis, Local-word analysis, Dyad-analysis | Predict motifs from ChIP-Seq data | FASTA | Jaspar, UniProbe, REGULONDB, User-defined file, etc. | Unspecified | Web portal, Standalone application | 2012 | [ |
| MotifLab | AlignAce, BioProspector, ChIPMunk, MEME, MotifSampler, Priority, Weeder | Analyze regulatory sequence regions, Predict binding site motifs | FASTA, BED, etc. | TRANSFAC, Jaspar, ScerTF | Unspecified | Standalone application | 2013 | [ |
| Promzea | BioProspector, MEME, Weeder, PSCAN, FIMO, Clover | Predict co-regulatory motifs | cDNA FASTA, microarray probe-set ID, BED | None | Maize, Rice, | Web portal | 2013 | [ |
Fig. 1 Workflow of MODSIDE. The pipeline takes DNA input sequences in FASTA format. The motif discovery module comprises ChIPMunk, MEME, Weeder, and XXmotif, and any combination of at least two of these tools can be run. Significant motifs are selected using P-value ≤ 0.05 for ChIPMunk, E-value ≤ 0.05 for MEME and XXmotif, and the built-in significance score in Weeder. The selected motifs are subsequently fed into MOTIFSIM for comparison. The comparison results include the global (common) significant motifs, the global and local significant motifs, and the best matches for each motif in the motif collection from multiple tools. MOTIFSIM also provides options for generating motif trees, merging similar motifs, and verifying the predicted motifs against the reference database.
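The selection step described in the Fig. 1 caption can be illustrated with a minimal Python sketch. It is not the MODSIDE implementation: the `Motif` record, the `select_significant` function, and the example values are hypothetical, and Weeder is left out because the caption does not state a numeric cutoff for its built-in score.

```python
from dataclasses import dataclass

# Cutoffs from the Fig. 1 caption. Weeder is omitted on purpose: its
# built-in significance score has no numeric threshold in the caption.
THRESHOLDS = {"ChIPMunk": 0.05, "MEME": 0.05, "XXmotif": 0.05}


@dataclass
class Motif:
    """Hypothetical record for one motif reported by one tool."""
    tool: str
    name: str
    score: float  # P-value for ChIPMunk, E-value for MEME and XXmotif


def select_significant(motifs):
    """Keep motifs whose score is at or below the cutoff for their tool."""
    tools = {m.tool for m in motifs}
    if len(tools) < 2:
        # The caption says at least two tools are run together.
        raise ValueError("expected motifs from at least two tools")
    return [
        m for m in motifs
        if m.tool in THRESHOLDS and m.score <= THRESHOLDS[m.tool]
    ]


if __name__ == "__main__":
    # Made-up values, for illustration only.
    candidates = [
        Motif("ChIPMunk", "motif_a", 0.01),
        Motif("MEME", "motif_b", 0.20),
        Motif("XXmotif", "motif_c", 0.05),
    ]
    for motif in select_significant(candidates):
        print(motif.tool, motif.name, motif.score)
```

In this sketch the surviving motifs would be the input to the comparison stage; how MOTIFSIM consumes them is not modeled here.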
Sixteen benchmark sequence datasets [27]
| Sequence Dataset | Dataset Type | Species | Transcription Factor | Number of Sequences | Sequence Length |
|---|---|---|---|---|---|
| hm01g | Generic | Human | AP-1 | 18 | 2000 |
| hm04g | Generic | Human | c-Jun | 13 | 2000 |
| hm08m | Markov | Human | CREB | 15 | 500 |
| hm15g | Generic | Human | NF-1 | 4 | 2000 |
| hm17g | Generic | Human | NF-kappaB | 11 | 500 |
| hm19g | Generic | Human | Sp1 | 5 | 500 |
| hm22g | Generic | Human | USF1 | 6 | 500 |
| hm22m | Markov | Human | USF1 | 6 | 500 |
| mus09g | Generic | Mouse | POU2F1 | 2 | 500 |
| mus10g | Generic | Mouse | Sp1 | 13 | 1000 |
| mus11m | Markov | Mouse | Sp1 | 12 | 500 |
| yst01g | Generic | Yeast | ABF1 | 9 | 1000 |
| yst02g | Generic | Yeast | GAL04 | 4 | 500 |
| yst03m | Markov | Yeast | GCN4 | 8 | 500 |
| yst06g | Generic | Yeast | MCM1 | 7 | 500 |
| yst09g | Generic | Yeast | CAR1 | 16 | 1000 |
The datasets are grouped by species. Each dataset has one embedded transcription factor, and the datasets differ in the number of sequences and in sequence length.
Characteristics of MEME-ChIP, RSAT peak-motifs, and MODSIDE
| Pipeline | Components | Function | Input Format | Reference Database | Target Species | Sequence Limit | File Size Limit | Approach | Platform |
|---|---|---|---|---|---|---|---|---|---|
| MEME-ChIP | MEME, DREME, CentriMo, TOMTOM, SpaMo | Predict motifs from ChIP-Seq data | FASTA | Jaspar, UniProbe, User-defined file, etc. | N/A | None | None | Profile-based method | Web portal, Web-services, Command line tool |
| RSAT peak-motifs | Oligo-analysis, Position-analysis, Local-word analysis | Predict motifs from ChIP-Seq data | FASTA | Jaspar, UniProbe, REGULONDB, User-defined file, etc. | N/A | None | None | Word-based method | Web portal, Standalone application |
| MODSIDE | ChIPMunk, MEME, Weeder, XXmotif, MOTIFSIM | Predict motifs in general and motifs from ChIP-Seq data | FASTA | Jaspar, TRANSFAC, UniPROBE | N/A | None | None | Profile-based method | Web portal |
Fig. 2 Average statistics for ChIPMunk, MEME, Weeder, XXmotif, and MODSIDE on the sixteen benchmark datasets. The four nucleotide-level statistics are Sensitivity (nSn), Positive Predictive Value (nPPV), Specificity (nSp), and Correlation Coefficient (nCC). The two site-level statistics are Sensitivity (sSn) and Positive Predictive Value (sPPV) [27]. MODSIDE achieves better accuracy than the individual tools.
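The six statistics named in the Fig. 2 caption can be sketched in Python as follows. The caption does not give formulas, so this assumes the conventional confusion-matrix definitions: sensitivity as TP/(TP+FN), positive predictive value as TP/(TP+FP), specificity as TN/(TN+FP), and the correlation coefficient as the Pearson (Matthews) coefficient. The function names and the choice to return `None` for an undefined ratio are illustrative, not taken from the paper.

```python
import math


def safe_div(num, den):
    """Return num / den, or None when the ratio is undefined (den == 0)."""
    return num / den if den else None


def nucleotide_stats(tp, fp, tn, fn):
    """Nucleotide-level statistics from counts of TP, FP, TN, and FN."""
    # Denominator of the correlation coefficient; zero if any margin is empty.
    cc_den = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    return {
        "nSn": safe_div(tp, tp + fn),    # sensitivity
        "nPPV": safe_div(tp, tp + fp),   # positive predictive value
        "nSp": safe_div(tn, tn + fp),    # specificity
        "nCC": safe_div(tp * tn - fn * fp, cc_den),
    }


def site_stats(tp, fp, fn):
    """Site-level statistics from counts of predicted and known sites."""
    return {
        "sSn": safe_div(tp, tp + fn),
        "sPPV": safe_div(tp, tp + fp),
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(nucleotide_stats(tp=6, fp=2, tn=90, fn=2))
    print(site_stats(tp=3, fp=1, fn=1))
```

Averaging such per-dataset values over the sixteen datasets would give figures of the kind plotted in Fig. 2, although how undefined values are handled in the average is an assumption left open here.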
Fig. 3 Average statistics for MEME-ChIP, RSAT peak-motifs, and MODSIDE on the sixteen benchmark datasets. MEME-ChIP has lower accuracy than RSAT peak-motifs and MODSIDE, which achieve similar accuracy to each other.