Satarupa Mohanty1, Prasant Kumar Pattnaik1, Ahmed Abdulhakim Al-Absi2, Dae-Ki Kang3.
Abstract
Personalized diagnosis of chronic disease requires capturing the continual pattern across the biological sequence. In medical science, this repeating pattern is called a "motif". Motifs are short, recurring patterns in biological sequences that are supposed to signify some health disorder. They identify the binding sites for transcription factors that modulate and synchronize gene expression. These motifs are important for the analysis and interpretation of various health issues such as human disease, gene function, drug design, and patients' conditions. Searching for these patterns is an important step in unraveling the mechanisms of gene expression in order to properly diagnose and treat chronic disease. Thus, motif identification has a vital role in healthcare studies and attracts many researchers. Numerous approaches have been characterized for the motif discovery process. This article reviews and analyzes fifty-four of the most frequently found motif discovery processes/algorithms from different approaches and summarizes the discussion with their strengths and weaknesses.
Keywords: evolutionary approach; hashing; local search approach; mismatch tree; probabilistic approach; search tree; suffix tree; tries
Year: 2022 PMID: 35161949 PMCID: PMC8838483 DOI: 10.3390/s22031204
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Hierarchy of PMS pattern-driven algorithms.
Summary of Motif Discovery Algorithms based on Probabilistic Approach.
| S/N | Algorithm(s) | Commonness | Conclusion |
|---|---|---|---|
| 1 | Expectation Maximization (EM) and BioProspector | Both algorithms are based on a local optimization approach, so the motif they find may be confined to a local optimum | BioProspector is better due to its applicability to heterogeneous sequences |
| 2 | Gibbs Sampling and Motif Sampler | Both use Markov Chain Monte Carlo approach and are time consuming | Motif Sampler is more robust and applicable to noisy datasets |
| 3 | MEME and CONSENSUS | Both algorithms align a group of related binomial sequences to extract the common sequence motif | MEME consumes more space than CONSENSUS. CONSENSUS is applicable to unknown alignment. |
| 4 | MCEMDA | This algorithm uses position weight matrix (PWM) and is the best in probabilistic approach | Globally optimum |
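
Several of the probabilistic methods above are described as operating on a position weight matrix (PWM). As a rough illustration only, the following minimal sketch builds a log-odds PWM from a handful of aligned sites and uses it to scan a sequence. The function names, the uniform background, and the pseudocount of 1 are assumptions made for this example and are not taken from any of the reviewed algorithms.

```python
import math


def build_pwm(sites, alphabet="ACGT", pseudocount=1.0):
    """Return one {base: log-odds} dict per motif column.

    Assumes all sites have equal length and a uniform background model.
    """
    length = len(sites[0])
    background = 1.0 / len(alphabet)
    total = len(sites) + pseudocount * len(alphabet)
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in alphabet
        })
    return pwm


def score(pwm, window):
    """Sum the per-column log-odds of a window as wide as the PWM."""
    return sum(col[base] for col, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return the highest-scoring window of the sequence."""
    width = len(pwm)
    windows = (sequence[i:i + width] for i in range(len(sequence) - width + 1))
    return max(windows, key=lambda w: score(pwm, w))


# Hypothetical aligned binding sites, used only for illustration.
sites = ["ACGT", "ACGT", "ACGA", "TCGT"]
pwm = build_pwm(sites)
print(best_hit(pwm, "TTACGTTT"))
```

EM- and Gibbs-style methods would re-estimate such a matrix iteratively; this sketch shows only the matrix and the scoring step.
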
Summary of Motif Discovery Algorithms based on Local Search Approach.
| S/N | Algorithm(s) | Commonness | Conclusion |
|---|---|---|---|
| 1 | WINNOWER, cWINNOWER, MotifCut | All are graph-based algorithms | cWINNOWER is relatively fast and able to identify fuzzy motifs |
| 2 | MULTIPROFILER, Pattern Branching and Profile Branching | Based on profile-based local search techniques | Efficiency in finding the motif with numerous degenerate positions increases from MULTIPROFILER to Profile Branching |
| 3 | SP-STAR and Random Projection | Use the sum-of-pairs scoring method (SP-STAR) and a global search procedure (Random Projection) | Memory efficient, faster algorithm and can reach the better seed |
| 4 | TEIRESIAS | Uses regular grammar to output each pattern that is present in the minimum number of sequences | Quasi-linear running time and output sensitive |
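
The sum-of-pairs scoring idea mentioned for SP-STAR can be illustrated with a small sketch. The version below scores a set of candidate motif instances by summing the Hamming distances over all pairs, so a lower total means a tighter set. Using Hamming distance as the pairwise measure, and the helper names, are assumptions for this example; the exact scoring used by SP-STAR is not given in the tables above.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def sum_of_pairs_distance(instances):
    """Total pairwise Hamming distance over a set of candidate instances."""
    return sum(hamming(a, b) for a, b in combinations(instances, 2))


def closest_lmer(seed, sequence):
    """Return the window of the sequence closest to the seed."""
    l = len(seed)
    windows = (sequence[i:i + l] for i in range(len(sequence) - l + 1))
    return min(windows, key=lambda w: hamming(seed, w))


# Hypothetical instances, one per input sequence.
print(sum_of_pairs_distance(["ACGT", "ACGA", "TCGT"]))
print(closest_lmer("ACGT", "TTACGATT"))
```

A local search would typically pick a seed, collect the closest window from every other sequence, and keep the seed whose collection has the lowest total.
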
Summary of Motif Discovery Algorithms based on Machine Learning Approach.
| S/N | Algorithm(s) | Commonness | Conclusion |
|---|---|---|---|
| 1 | FMGA, GAME | Both are based on Position Weight Matrix and Random Selection | GAME is better than FMGA but more complex in its implementation. |
| 2 | GEMFA, MOGAMOD, and MHABBO | All are genetic-algorithm-based approaches | All have the common stumbling block of using more search space. MHABBO is more efficient in this group. |
| 3 | KEGRU and DESSO | Both are based on neural networks and can predict unknown motifs with shape feature detection in less time and at lower cost | For some fixed K value, the prediction is more appropriate |
| 4 | GAMOT, GARPS, and AMDILM | Based on Random Projection Strategy where it uses total distance as scoring function | AMDILM is more efficient in this group as it works with consistency and high accuracy |
| 5 | GENMOTIF and Hierarchical LSTM | These are time series motif discovery algorithms. Uses more search space. | Flexible enough to accommodate task characteristics and all type of motif specification and reach to achieve optimization Function |
Summary of Motif Discovery Algorithms based on Exact Approach.
| S/N | Algorithm(s) | Commonness | Conclusion |
|---|---|---|---|
| Based on Enumeration of Patterns | | | |
| 1 | PMS0, PMS1, PMSi, PMSP | Algorithms of Rajasekaran and his group which are based on building neighborhoods of a group of | Implementation is easy to compare with other methods but takes more search space |
| 2 | Improved Pattern-driven, Pair-Motif and Stemming | Novel process of neighborhood generation through the pairing concept. Pair-Motif is the best algorithm in this group. | Efficiently decreases the search space of the previous group. |
| 3 | PMS5 and PMS6 | Advance version of the first group of algorithms which includes hashing and ILP techniques | Use of ILP intensifies the space but gives accurate result |
| 4 | PMS4 | Speedup Method which can actuate any motif search algorithm | Not self-sufficient. Needs a supportive algorithm to run |
| Based on Search Tree Approach | | | |
| 5 | PMSprune, qPMSprune, PMS3p and qPMS7 | Incorporate the efficient pruning technique into the search tree of input (quorum) sequence | Efficiently handle the challenging instances with remarkable reduction in the search space. qPMS7 is the robust algorithm in this group |
| 6 | PAMPA | A branch and bound algorithm | Efficiently manages the search space of the motif |
| 7 | Provable | Employs the extended closest string | Solves the challenging instances on the real data set |
| 8 | PMS8 and qPMS9 | Based on state-of-the-art string reordering mechanism and novel pruning method | PMS8 is efficient for smaller sequences. qPMS9 is the most efficient and most recent parallel quorum algorithm |
| Based on Different Tree Data Structures and Hashing Search Process | | | |
| 9 | SPELLER, SMILE, PSMILE, RISO, RISOTTO, SLI-REST and EXMOTIF | All are based on a suffix tree data structure | Save time and reduce search space. In this group, EXMOTIF is the finest to carry out approximate as well as exact matching |
| 10 | MITRA | Uses a mismatch tree and the pruning technique for minimizing the space | Combinatorial approach of Winnower and Smile |
| 11 | CENSUS | Uses tries to give a solution to the space component of the generalized suffix tree | Eliminates any sharing of information |
| 12 | VOTING | Uses easy tracking process of votes through hashing search process | Computation time and storage requirement are higher |
List of Motif Discovery Algorithms based on Probabilistic Approach.
| S/N | Algorithm | Operating Principle | Strengths | Weakness | Author and Reference |
|---|---|---|---|---|---|
| 1 | Expectation Maximization (EM) | Local optimization approach | Chances of getting biologically suitable motifs are higher | Sensitive to the initial position and does not guarantee convergence to a global optimum | Lawrence et al. [ |
| 2 | Gibbs Sampling | MCMC approach | Global over a parameterized distribution | Suffers a notable computational cost | Lawrence et al. [ |
| 3 | MEME | Alignment of motif | Applicable to unaligned biological sequences with insufficient prior knowledge | More space required | Bailey et al. [ |
| 4 | CONSENSUS | Alignment of a group of related binomial sequences | Applicable to unknown alignment | Uses Greedy algorithm | Hertz et al. [ |
| 5 | Motif Sampler or Gibbs Sampler | Iterative procedure of the Gibbs sampling | Robust and applicable to noisy datasets | Time consuming | Thijs, G. et al. [ |
| 6 | BioProspector | Expectation-Maximization (EM) method | Applicable to heterogeneous data | Confined to a local optimum | Sinha et al. [ |
| 7 | MCEMDA | Position weight matrix (PWM) | Globally optimum | - | Bi CP. et al. [ |
List of Motif Discovery Algorithms based on Local Search Approach.
| S/N | Algorithm | Operating Principle | Strengths | Weakness | Author and Reference |
|---|---|---|---|---|---|
| 1 | TEIRESIAS | Uses regular grammar | Outputs each pattern that is present in the minimum number of sequences | Quasi-linear running time and output sensitive | Rigoutsos et al. [ |
| 2 | WINNOWER | Graph-based algorithm | Convert motif search to a clique finding problem | Requires considerable computational resources and is thus relatively slow | Pevzner et al. [ |
| 3 | SP-STAR | Sum-of-pairs scoring method | Memory efficient, faster algorithm | Heuristic-based approach | Pevzner et al. [ |
| 4 | cWINNOWER | Uses consensus constraint on graph designing | Identifies the fuzzy motifs | More space requirement | Liang et al. [ |
| 5 | Random Projection | Global search procedure | Reach the better seed | - | Buhler et al. [ |
| 6 | MULTIPROFILER | Profile-based approach | Better for the synthetic models | - | Keich et al. [ |
| 7 | Pattern Branching | Local search techniques | Faster than profile-based approach | Fails to find the motif with numerous degenerate positions | Price et al. [ |
| 8 | Profile Branching | Extends the pattern branching | Able to find the motif with numerous degenerate positions | Slower | Price et al. [ |
| 9 | MotifCut | Graph-theoretic approach | Different from the frequently used PWM | - | Fratkin et al. [ |
List of Motif Discovery Algorithms based on Evolutionary Approach.
| S/N | Algorithm | Operating Principle | Strengths | Weakness | Author and Reference |
|---|---|---|---|---|---|
| 1 | FMGA | Position Weight Matrix and weighted wheel method | Avoids getting trapped in a local minimum | Uses random generation-based prediction | Falcon et al. [ |
| 2 | GAME | Position Weight Matrix and Random Selection | Eliminates the reliance on additional motif-finding programs | More complex | Wei et al. [ |
| 3 | GEMFA | EM based genetic algorithm | Escapes from the locally minimal solution | Uses more search space due to heuristic search | Chengpeng et al. [ |
| 4 | GAMOT | Uses total distance as scoring function | Fast Motif Discovery with smaller search space | Not tested on real data set. | Pevzner et al. [ |
| 5 | MOGAMOD | Multi objective genetic approach | Flexibility exists in the selection process | Can only work with data set with sequential character | Kaya et al. [ |
| 6 | GARPS | Random Projection Strategy | Optimize the heuristic method. | Process fails for the skewed nucleotide distribution | Hongwei Huo et al. [ |
| 7 | AMDILM | Iteratively process the increased length motifs | Works with consistency and high accuracy. | Time consuming in finding fitness score iteratively | Yetian Fan et al. [ |
| 8 | GENMOTIF | Time series motif discovery | Flexible enough to accommodate task characteristics and all type of motif specification | More search space required | Joan Serra et al. [ |
| 9 | MHABBO | Differential evolution and multi-objective optimization | Diversity in population is maintained | Algorithm has not been applied to multi-objective motif discovery problem | Siling Feng et al. [ |
| 10 | KEGRU | Recurrent Neural Network (RNN) | Inexpensive and fast | Prediction done for some fixed 'K' values | Shen Z et al. [ |
| 11 | DESSO | Binomial distribution and deep neural network | Prediction with shape feature detection of unknown motif | Time complexity | Yang J et al. [ |
| 12 | Hierarchical LSTM and attention network | Hierarchical LSTM and attention mechanism | Reach to achieve optimization Function | - | Shen Z et al. [ |
List of Exact Motif Discovery Algorithms based on Enumeration of Patterns.
| S/N | Algorithm | Operating Principle | Strengths | Weakness | Author and Reference |
|---|---|---|---|---|---|
| 1 | PMS0 | Comparison of neighbor distances | Easy to implement and understand | More Space complexity and time complexity | Rajasekaran et al. [ |
| 2 | PMS1 | Radix Sort | Easy to implement and track the neighbors | More Space complexity and time complexity | Rajasekaran et al. [ |
| 3 | PMSi | Computes the pairwise relationship among the neighbors | Pairwise approach reduces computational time and space | Computes the common neighbors of all | Davila et al. [ |
| 4 | PMSP | Based on building neighborhood of all | Can solve previously unsolvable instances | Space consuming | Davila et al. [ |
| 5 | Improved Pattern-driven | Fundamental pattern-driven approach | Handles the motif with higher l and d values | More time complexity | Sze et al. [ |
| 6 | Stemming | Novel process of neighborhood generation | Reduces the computational search space | Not able to reduce time of computation | Kuksa et al. [ |
| 7 | PMS4 | Speedup method | Actuates any motif search algorithm | Not self-sufficient. Needs a supportive algorithm to run | Rajasekaran et al. [ |
| 8 | PMS5 | Fusion of PMS1 and PMSprune. | More time complexity | Use of ILP intensifies the space | Dinh et al. [ |
| 9 | PMS6 | Introduces improved pre-processing steps and hashing technique | Skillfully reduces the searching time and space requirement | Higher number of equivalence classes are used | Bandyopadhyay et al. [ |
| 10 | Pair-Motif | Introduces the state of the art of pairing concept | Efficiently decreases the search space. | - | Yu, Q. et al. [ |
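
The neighborhood-building idea behind the PMS family can be sketched for the planted (l, d) motif problem: collect every string within Hamming distance d of some l-mer in each sequence, then keep the strings common to all sequences. The sketch below is a simplified illustration, not an implementation of any listed algorithm. In particular, it intersects Python sets where the table lists radix sort for PMS1, and the function names are hypothetical. It is practical only for small l and d.

```python
from itertools import combinations, product


def neighbors(lmer, d, alphabet="ACGT"):
    """All strings within Hamming distance d of lmer."""
    result = {lmer}
    for k in range(1, d + 1):
        for positions in combinations(range(len(lmer)), k):
            for letters in product(alphabet, repeat=k):
                candidate = list(lmer)
                for pos, letter in zip(positions, letters):
                    candidate[pos] = letter
                result.add("".join(candidate))
    return result


def planted_motifs(sequences, l, d):
    """Strings that have a d-neighbor l-mer in every input sequence."""
    common = None
    for seq in sequences:
        seq_neighbors = set()
        for i in range(len(seq) - l + 1):
            seq_neighbors |= neighbors(seq[i:i + l], d)
        # Intersect this sequence's neighborhood with the running result.
        common = seq_neighbors if common is None else common & seq_neighbors
    return common or set()


# Hypothetical toy input: each sequence is a single 4-mer.
print(planted_motifs(["ACGT", "ACGA", "TCGT"], l=4, d=1))
```

The neighborhood of one l-mer grows quickly with d, which is consistent with the space and time weaknesses the table lists for the early PMS algorithms.
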
List of Exact Motif Discovery Algorithms based on Search Tree Approach.
| S/N | Algorithm | Operating Principle | Strengths | Weakness | Author and Reference |
|---|---|---|---|---|---|
| 1 | PMSprune | Incorporates the efficient pruning technique on the search tree | Remarkable reduction in the search space | Pruning happens after tree construction rather than during it | Davila et al. [ |
| 2 | qPMSprune | Search is applied on quorum of input sequences | First efficient exact quorum algorithm to solve challenging instances | - | Dinh et al. [ |
| 3 | PAMPA | Branch and bound algorithm | Efficiently manages the search space of the motif | Theoretical time and space requirements are the same as PMSprune | Davila et al. [ |
| 4 | PMS3p | Efficiently integrates the idea of PMS3 with the feature of PMSprune | Proficiency to handle the challenging instances is very high | Computation time exceeds that of some of the previous algorithms | Sharma et al. [ |
| 5 | Provable | Employs the extended closest string | Solves the challenging instances on the real data set | - | Chen et al. [ |
| 6 | qPMS7 | Extends the idea of qPMSprune | Robust algorithm in the group of quorum | More space requirements | Dinh et al. [ |
List of Exact Motif Discovery Algorithms based on Suffix tree, Mismatch tree, Tries, and Hashing Search Process.
| S/N | Algorithm | Operating Principle | Strengths | Weakness | Author and Reference |
|---|---|---|---|---|---|
| 1 | SPELLER | Suffix Tree | Speeds up the motif search process | Extra pre-processing is required | Sagot et al. [ |
| 2 | SMILE | Suffix Tree | Uses generalized suffix tree in contrast to the classical suffix tree | More space required due to the use of node-wise bit vector | Marsan et al. [ |
| 3 | PSMILE | Suffix Tree | Efficiently extracts the structured motif | For some exceptional instances, it does not work properly. | Carvalho et al. [ |
| 4 | RISO | Suffix Tree | Saves time and reduces search space | - | Carvalho et al. [ |
| 5 | RISOTTO | Box-link data structure with suffix tree | Upgrades the proficiency of RISO | Supplementary space is required | Pisanti et al. [ |
| 6 | EXMOTIF | Suffix Tree | Outperforms RISO in approximate matching as well as exact matching | - | Zhang et al. [ |
| 7 | SLI-REST | Reverse engineering method on the suffix tree and links | Fast response | Explores the feasible Eulerian routes | Cazaux et al. [ |
| 8 | MITRA | Mismatch tree | Uses pruning technique to minimize the space | Combinatorial approach of Winnower and Smile | Eskin et al. [ |
| 9 | CENSUS | Tries | Gives a solution to the space component of the generalized suffix tree | Eliminates any sharing of information | Evans et al. [ |
| 10 | VOTING | Hashing search process | Uses easy tracking process of votes | Computation time and storage requirement are higher in simple voting | Chin et al. [ |
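
The vote-tracking idea listed for VOTING can be sketched with a hash table: every l-mer in a sequence casts a vote for each string within Hamming distance d of it, each sequence votes at most once per candidate, and candidates that collect a vote from every sequence are reported. This is a minimal sketch under those assumptions, using a Python `Counter` as the hash table; the function names are hypothetical and the details of the published algorithm may differ.

```python
from collections import Counter


def neighbors(lmer, d, alphabet="ACGT"):
    """Recursively build all strings within Hamming distance d of lmer."""
    if not lmer:
        return {""}
    head, tail = lmer[0], lmer[1:]
    result = set()
    for letter in alphabet:
        cost = 0 if letter == head else 1
        if cost <= d:
            result |= {letter + rest for rest in neighbors(tail, d - cost, alphabet)}
    return result


def voting_motifs(sequences, l, d):
    """Candidates that receive a vote from every sequence."""
    votes = Counter()  # hash table: candidate -> number of voting sequences
    for seq in sequences:
        voted = set()  # a sequence votes at most once per candidate
        for i in range(len(seq) - l + 1):
            voted |= neighbors(seq[i:i + l], d)
        votes.update(voted)
    return {cand for cand, n in votes.items() if n == len(sequences)}


# Hypothetical toy sequences with a planted variant of ACGT in each.
result = voting_motifs(["TTACGTTT", "GGACGAGG", "CCTCGTCC"], l=4, d=1)
print("ACGT" in result)
```

The vote table can hold a large share of all possible l-mers, which matches the storage weakness noted in the table for simple voting.
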