Fatma A Hashim1, Mai S Mabrouk2, Walid Al-Atabany1.
Abstract
DNA motif discovery is a primary step in many systems for studying gene function. It plays a vital role in identifying Transcription Factor Binding Sites (TFBSs), which help in learning the mechanisms that regulate gene expression. Over the past decades, different algorithms have been used to design fast and accurate motif discovery tools. These algorithms are generally classified into consensus or probabilistic approaches, many of which are time-consuming and easily trapped in local optima. Nature-inspired algorithms and many combinatorial algorithms have recently been proposed to overcome these problems. This paper presents a general classification of motif discovery algorithms with new sub-categories that facilitate building a successful motif discovery algorithm. It also presents a summary comparison between them.
Keywords: Algorithms; Bioinformatics; Consensus; Gene expression regulation; Nucleotide motif; Protein binding
Year: 2019 PMID: 31057715 PMCID: PMC6490410
Source DB: PubMed Journal: Avicenna J Med Biotechnol ISSN: 2008-2835
Figure 1. General block diagram of motif discovery technique.
Motif discovery algorithms
| No. | Algorithm | Category |
| 1 | YMF | |
| 2 | DREME | Simple word-based |
| 3 | oligonucleotide analysis | |
| 4 | CisFinder | |
| 5 | By Thomas | Simple word-based with clustering technique |
| 6 | POSMO | |
| 7 | Weeder | |
| 8 | FMotif | |
| 9 | By G. Pavesi | |
| 10 | MITRA | Tree-based |
| 11 | CENSUS | |
| 12 | RISOTTO | |
| 13 | SLI-REST | |
| 14 | DRIMust | |
| 15 | MCES | Tree-based with clustering technique |
| 16 | WINNOWER | |
| 17 | Pruner | |
| 18 | cWINNOWER | |
| 19 | By Sze | |
| 20 | RecMotif | Graph-theoretic |
| 21 | ListMotif | |
| 22 | TreeMotif | |
| 23 | GWM | |
| 24 | GWM2 | |
| 25 | Voting | |
| 26 | PMS1 | |
| 27 | PMS2 | |
| 28 | PMS3 | |
| 29 | By Sze | |
| 30 | PMSi | |
| 31 | PMSP | Fixed candidates |
| 32 | Stemming | |
| 33 | PMS4 | |
| 34 | PMS5 | |
| 35 | PMS6 | |
| 36 | PairMotif | |
| 37 | iTriplet | |
| 38 | PMSPrune | |
| 39 | Pampa | |
| 40 | PMS3p | |
| 41 | Provable | Modified candidate |
| 42 | qPMSPruneI | |
| 43 | qPMS7 | |
| 44 | By Tanaka | |
| 45 | Random projection | |
| 46 | Uniform projection | Hashing |
| 47 | Low-dispersion projection | |
| 48 | MULTIPROFILER | Extended sample-driven (ESD) |
| 49 | Pattern Branching | |
| 50 | Ref Select | Reference selection |
| Probabilistic approach | | |
| 51 | MEME | |
| 52 | STEME | EM |
| 53 | EXTREME | |
| 54 | Profile Branching | |
| 55 | APMotif | EM with clustering |
| 56 | AlignACE | |
| 57 | SPWDM | |
| 58 | By Lawrence | Gibbs sampling |
| 59 | Motif-Sampler | |
| 60 | BioProspector | Gibbs sampling with hidden Markov |
| 62 | MITSU | |
| 63 | MCEMDA | Stochastic Expectation Maximization (sEM) |
| 64 | SEAM | |
| 65 | By Jensen | |
| 66 | LOGOS | |
| 67 | BaMM | |
| 68 | By Jääskinen | |
| 69 | By Frith | Bayesian approach |
| 70 | SBaSeTraM | |
| 71 | By Wakefield | |
| 72 | MotifCut | |
| 73 | MCL-WMR | Graphic based |
| 74 | EPP | Entropy-based position projection |
| 75 | CONSENSUS | Greedy algorithm |
| 76 | By Huang | Heuristic algorithm |
| GA | | |
| 77 | St-GA | |
| 78 | GAMI | |
| 79 | FMGA | Simple GA |
| 80 | MDGA | |
| 81 | By Paul | |
| 82 | By Vijayvargiya | Clustering |
| 83 | By Gutierrez | |
| 84 | GARPS | |
| 85 | GAEM | |
| 86 | GADEM | |
| 87 | CompareProspector | |
| 88 | By Fatemeh | Hybrid |
| 89 | GEMFA | |
| 90 | MRPGA | |
| 91 | By Xiaochun | |
| 92 | GAME | |
| 93 | By Yetian | |
| 94 | By Li | Others |
| 95 | MOGAMOD | |
| PSO | | |
| 96 | PMbPSO | Standard PSO |
| 97 | LPBS | |
| 98 | PSOMF | |
| 99 | Lei | |
| 100 | Lei | |
| 101 | DSAPSO | Modified PSO |
| 102 | By Karabulut | |
| 103 | Lei | |
| 104 | Hardin | |
| 105 | GSA-PSO | Hybrid |
| 106 | SPSO-Lk | |
| ABC algorithm | | |
| 107 | Multiobjective ABC | |
| 108 | MO-ABC/DE | ABC |
| 109 | Consensus ABC | |
| ACO algorithm | | |
| 110 | Machhi | ACO with Gibbs sampling |
| 111 | MFACO | |
| 112 | Cheng | ACO with EM |
| CS algorithm | | |
| 113 | MACS | CS |
| Combinatorial | | |
| 114 | STGEMS | Enumerative and probabilistic approaches |
| 115 | MDScan | |
| 116 | MUSA | Probabilistic and machine learning approaches |
| 117 | EMD | Multiple algorithms |
| 118 | MobyDick | Dictionary |
| 119 | WordSpy | |
Figure 2. Classification of motif discovery algorithms as enumerative, probabilistic, nature-inspired and combinatorial types.
ant
on the character c at position l. Many methods apply the ACO algorithm to motif discovery. Machhi et al. 109 combined ACO with a Gibbs sampling algorithm: instead of random initialization, the ACO algorithm finds better starting positions in the sequences, which are then passed to the Gibbs sampler. The algorithm starts with each ant choosing a path that constructs a sample of motif length (m), guided by the pheromone probabilities. Each ant then compares its sample with every substring of the input sequences to obtain the set of best-matching substrings. Next, the fitness function is calculated for each selected set. The pheromone amounts are then updated, and the process iterates until no further change occurs.
Real datasets of DNA motifs
| TRANSFAC is the database of eukaryotic TFs, their genomic binding sites, and DNA-binding profiles | ||
| A public dataset of motifs for multicellular eukaryotes | ||
| PROSITE includes documentation sections describing protein domains, families and functional sites in addition to related patterns and profiles to recognize them | ||
| It contains predicted TFs for S. cerevisiae. | ||
| Provides curated information on the transcriptional regulatory network of E. coli and contains both computational as well as experimental data of predicted objects | ||
| It contains a list of >160,000 predicted TFs from >300 species | ||
| It contains TFs for Bacillus subtilis |
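The ACO seeding loop described earlier (ants build a length-m sample from pheromone probabilities, match it against every substring, score the matched set, then update the pheromone) can be sketched roughly as follows. This is only an illustrative sketch, not the implementation of Machhi et al.: the Hamming-distance matching, the fitness function (total matching characters), the evaporation rate `rho`, the `patience` stopping rule, and the choice to reinforce only the best sample found so far are all assumptions, since the text does not specify them. All function and parameter names are hypothetical.

```python
import random

ALPHABET = "ACGT"


def hamming(a, b):
    """Number of mismatching characters between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_match(sample, seq):
    """Return (start, distance) of the substring of seq closest to sample."""
    m = len(sample)
    candidates = ((i, hamming(sample, seq[i:i + m]))
                  for i in range(len(seq) - m + 1))
    return min(candidates, key=lambda t: t[1])


def aco_seed_positions(seqs, m, n_ants=20, max_iters=50,
                       rho=0.1, patience=5, seed=0):
    """Suggest one start position per sequence, e.g. to seed a Gibbs sampler."""
    rng = random.Random(seed)
    # pheromone[l][c]: pheromone on character c at position l
    pheromone = [{c: 1.0 for c in ALPHABET} for _ in range(m)]
    best_fit, best_sample, best_positions = -1, None, None
    stale = 0
    for _ in range(max_iters):
        improved = False
        for _ in range(n_ants):
            # Each ant builds a length-m sample, one character per position,
            # with probability proportional to the pheromone.
            sample = "".join(
                rng.choices(ALPHABET,
                            weights=[pheromone[l][c] for c in ALPHABET])[0]
                for l in range(m))
            # Best-matching substring in every input sequence.
            matches = [best_match(sample, s) for s in seqs]
            # Assumed fitness: total number of matching characters.
            fitness = sum(m - d for _, d in matches)
            if fitness > best_fit:
                best_fit, best_sample = fitness, sample
                best_positions = [i for i, _ in matches]
                improved = True
        # Evaporate, then reinforce the best sample found so far.
        for l in range(m):
            for c in ALPHABET:
                pheromone[l][c] *= (1.0 - rho)
            pheromone[l][best_sample[l]] += rho * best_fit / (m * len(seqs))
        # Assumed reading of "until no change": stop after `patience`
        # consecutive iterations without improvement.
        if improved:
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_positions, best_sample, best_fit


if __name__ == "__main__":
    toy = ["TTACGTACGG", "GGGACGTACA", "ACGTACTTTT"]  # made-up sequences
    print(aco_seed_positions(toy, m=6))
```

The returned positions would then replace the random starting positions of a Gibbs sampler; that sampler is not shown here.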
Comparison of approaches
| Enum. | Prob. | Nat. | Com. | |
| × | ✓ | ✓ | ? | |
| × | ✓ | ? | ? | |
| × | ✓ | ✓ | ✓ | |
| ✓ | × | ✓ | ? | |
| × | ✓ | ? | ? | |
| Consensus | PWM | Flexible | Flexible | |
| ✓ | × | ? | ? | |
| × | ✓ | ? | ? | |
| Limited | Flexible | Flexible | ? | |
| Flexible | Flexible | Flexible | Flexible | |
| ✓ | × | ✓ | ? | |
| × | × | ✓ | ? | |
Note: A question mark means the property may or may not hold. Enum., Prob., Nat., and Com. stand for Enumerative, Probabilistic, Nature-inspired, and Combinatorial, respectively.