PASTASpark: multiple sequence alignment meets Big Data.

José M Abuín, Tomás F Pena, Juan C Pichel.

Abstract

MOTIVATION: Multiple sequence alignment is a basic step in many bioinformatics analyses. One of the state-of-the-art tools for multiple sequence alignment is PASTA (Practical Alignments using SATé and TrAnsitivity). PASTA supports multithreading, but it is limited to processing datasets on shared-memory systems. In this work we introduce PASTASpark, a tool that uses the Big Data engine Apache Spark to boost the performance of the alignment phase of PASTA, which is the most time-consuming task.
RESULTS: Speedups of up to 10× with respect to single-threaded PASTA were observed, which makes it possible to process an ultra-large dataset of 200 000 sequences within the 24-h limit.
AVAILABILITY AND IMPLEMENTATION: PASTASpark is an Open Source tool available at https://github.com/citiususc/pastaspark.
CONTACT: josemanuel.abuin@usc.es.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

Year:  2017        PMID: 28582480     DOI: 10.1093/bioinformatics/btx354

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  5 in total

1.  SPARK-MSNA: Efficient algorithm on Apache Spark for aligning multiple similar DNA/RNA sequences with supervised learning.

Authors:  V Vineetha; C L Biji; Achuthsankar S Nair
Journal:  Sci Rep       Date:  2019-04-29       Impact factor: 4.379

2.  The multiple alignments of very short sequences.

Authors:  Kristóf Takács; Vince Grolmusz
Journal:  FASEB Bioadv       Date:  2021-04-29

3.  PASTA for proteins.

Authors:  Kodi Collins; Tandy Warnow
Journal:  Bioinformatics       Date:  2018-11-15       Impact factor: 6.937

4.  Bioinformatics applications on Apache Spark. (Review)

Authors:  Runxin Guo; Yi Zhao; Quan Zou; Xiaodong Fang; Shaoliang Peng
Journal:  Gigascience       Date:  2018-08-01       Impact factor: 6.524

5.  BigFiRSt: A Software Program Using Big Data Technique for Mining Simple Sequence Repeats From Large-Scale Sequencing Data.

Authors:  Jinxiang Chen; Fuyi Li; Miao Wang; Junlong Li; Tatiana T Marquez-Lago; André Leier; Jerico Revote; Shuqin Li; Quanzhong Liu; Jiangning Song
Journal:  Front Big Data       Date:  2022-01-18
