Multiple Sequence Alignment Based on a Suffix Tree and Center-Star Strategy: A Linear Method for Multiple Nucleotide Sequence Alignment on Spark Parallel Framework.

Wenhe Su¹, Xiangke Liao¹, Yutong Lu¹, Quan Zou², Shaoliang Peng¹,³.

Abstract

Multiple sequence alignment (MSA) is an essential prerequisite and a dominant method for deducing biological facts from a set of molecular biological sequences. It refers to a family of algorithmic solutions for aligning evolutionarily related sequences while taking into account evolutionary events such as mutations, insertions, deletions, and rearrangements under certain conditions. These methods can be applied to DNA, RNA, or protein sequences. In this work, we use a center-star strategy to reduce the MSA problem to pairwise alignments, and we use a suffix tree to match identical substrings between the two sequences of each pair. Multiple sequence alignment based on a suffix tree and center-star strategy (MASC) can accomplish MSA in O(mn) time, where m is the number of sequences and n is the average sequence length; that is, the running time is linear in the total input size. Furthermore, we run our method on the Spark distributed parallel framework to handle ever-increasing massive data sets. We demonstrate experimentally that our method is significantly faster than previous techniques, with no loss in accuracy for highly similar nucleotide sequences such as homologous sequences. Compared with mainstream MSA tools (e.g., MAFFT), MASC could finish the alignment of 67,200 sequences, each longer than 10,000 bp, in 9 minutes, a task that takes MAFFT more than 3.5 days.
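
The center-star reduction described in the abstract can be illustrated with a minimal sketch. This is not the MASC implementation: plain Needleman-Wunsch dynamic programming is swapped in for the suffix-tree substring matching, so each pairwise step here is quadratic rather than linear, and nothing is distributed over Spark. The function names (global_align, center_star), the scoring values, and the rule for choosing the center (highest summed pairwise score) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> Tuple[str, str, int]:
    """Needleman-Wunsch global alignment (stand-in for suffix-tree matching)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def center_star(seqs: List[str]) -> List[str]:
    """Reduce MSA to pairwise alignments against one center sequence."""
    k = len(seqs)
    # Assumed center rule: the sequence with the highest summed pairwise score.
    totals = [0] * k
    for i in range(k):
        for j in range(i + 1, k):
            s = global_align(seqs[i], seqs[j])[2]
            totals[i] += s
            totals[j] += s
    c = max(range(k), key=totals.__getitem__)
    center = seqs[c]

    # Align every other sequence to the center.
    pairs: Dict[int, Tuple[str, str]] = {
        i: global_align(center, seqs[i])[:2] for i in range(k) if i != c
    }

    # gaps[p] = widest gap run any pairwise alignment opens before center[p]
    # (p == len(center) covers trailing gaps).
    gaps = [0] * (len(center) + 1)
    for aligned_center, _ in pairs.values():
        pos = run = 0
        for ch in aligned_center:
            if ch == "-":
                run += 1
            else:
                gaps[pos] = max(gaps[pos], run)
                pos += 1
                run = 0
        gaps[pos] = max(gaps[pos], run)

    def expand(aligned_center: str, aligned_other: str) -> str:
        """Pad one pairwise alignment so it fits the shared gap layout."""
        out, idx = [], 0
        for pos in range(len(center) + 1):
            run = 0
            while idx + run < len(aligned_center) and aligned_center[idx + run] == "-":
                run += 1
            out.append(aligned_other[idx:idx + run])   # residues inserted here
            out.append("-" * (gaps[pos] - run))        # pad to the shared width
            idx += run
            if pos < len(center):
                out.append(aligned_other[idx])         # column of center[pos]
                idx += 1
        return "".join(out)

    return [expand(center, center) if i == c else expand(*pairs[i])
            for i in range(k)]


if __name__ == "__main__":
    for row in center_star(["ACGT", "ACGGT", "ACGT"]):
        print(row)
```

The merge step follows the usual "once a gap, always a gap" rule: a gap opened in the center by any pairwise alignment is propagated to every row, which is what lets the m - 1 pairwise alignments be combined without realigning.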

Keywords: Spark; center-star strategy; multiple sequence alignment; suffix tree

Substances:
Proteins
RNA
DNA

Year: 2017 PMID: 29116822 DOI: 10.1089/cmb.2017.0040

Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479

Cited

6 in total

1. Twenty years of bioinformatics research for protease-specific substrate and cleavage site prediction: a comprehensive revisit and benchmarking of existing methods.

Authors: Fuyi Li; Yanan Wang; Chen Li; Tatiana T Marquez-Lago; André Leier; Neil D Rawlings; Gholamreza Haffari; Jerico Revote; Tatsuya Akutsu; Kuo-Chen Chou; Anthony W Purcell; Robert N Pike; Geoffrey I Webb; A Ian Smith; Trevor Lithgow; Roger J Daly; James C Whisstock; Jiangning Song
Journal: Brief Bioinform Date: 2019-11-27 Impact factor: 11.622

2. Identification of Inhibitors of MMPS Enzymes via a Novel Computational Approach.

Authors: Jian Song; Jijun Tang; Fei Guo
Journal: Int J Biol Sci Date: 2018-05-22 Impact factor: 6.580

3. SPARK-MSNA: Efficient algorithm on Apache Spark for aligning multiple similar DNA/RNA sequences with supervised learning.

Authors: V Vineetha; C L Biji; Achuthsankar S Nair
Journal: Sci Rep Date: 2019-04-29 Impact factor: 4.379

4. SaAlign: Multiple DNA/RNA sequence alignment and phylogenetic tree construction tool for ultra-large datasets and ultra-long sequences based on suffix array.

Authors: Ziyuan Wang; Junjie Tan; Yanling Long; Yijia Liu; Wenyan Lei; Jing Cai; Yi Yang; Zhibin Liu
Journal: Comput Struct Biotechnol J Date: 2022-03-21 Impact factor: 7.271

5. Developments in Algorithms for Sequence Alignment: A Review.

Authors: Jiannan Chao; Furong Tang; Lei Xu
Journal: Biomolecules Date: 2022-04-06

6. HAlign 3: Fast Multiple Alignment of Ultra-Large Numbers of Similar DNA/RNA Sequences.

Authors: Furong Tang; Jiannan Chao; Yanming Wei; Fenglong Yang; Yixiao Zhai; Lei Xu; Quan Zou
Journal: Mol Biol Evol Date: 2022-08-03 Impact factor: 8.800
