
FastSP: linear time calculation of alignment accuracy.

Siavash Mirarab, Tandy Warnow.

Abstract

MOTIVATION: Multiple sequence alignment is a basic part of much biological research, including phylogeny estimation and protein structure and function prediction. Different alignments of the same set of unaligned sequences are often compared, sometimes to assess the accuracy of alignment methods and sometimes to infer a consensus alignment from a set of estimated alignments. Three of the standard techniques for comparing alignments, the Developer, Modeler and Total Column (TC) scores, can be derived by calculating the set of homologies that the alignments share. However, the brute-force technique for calculating this set is quadratic in the input size. The remaining standard technique, the Cline Shift Score, inherently requires quadratic time.
RESULTS: In this article, we prove that each of these scores can be computed in linear time, and we present FastSP, a linear-time algorithm for calculating these scores. Even on the largest alignments we explored (one with 50 000 sequences), FastSP completed in <2 min and used at most 2 GB of main memory. The best alternative is qscore, a method whose empirical running time is approximately the same as that of FastSP when given sufficient memory (at least 8 GB), but whose asymptotic running time has never been theoretically established. In addition, for comparisons of large alignments under lower memory conditions (at most 4 GB of main memory), qscore used substantial memory (up to 10 GB for the datasets we studied), took more time and failed to analyze the largest datasets.
AVAILABILITY: The open-source software and executables are available online at http://www.cs.utexas.edu/~phylo/software/fastsp/.
CONTACT: tandy@cs.utexas.edu.
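
The abstract states that the Developer, Modeler and TC scores follow from the homologies two alignments share, and that these can be counted without enumerating every residue pair. The Python sketch below illustrates one way to do that in a single pass over the alignment cells; it is not the FastSP implementation, and it is not taken from the paper. It assumes the following definitions: the Developer score is the number of shared homologous pairs divided by the pairs in the reference, the Modeler score divides by the pairs in the estimated alignment, and TC is the fraction of reference columns with at least two residues that are reproduced exactly. The names `compare`, `residue_columns`, `pairs` and `GAP` are hypothetical, and both alignments are assumed to list the same sequences in the same order and to contain at least one aligned pair each.

```python
from collections import Counter

GAP = "-"  # assumed gap character


def pairs(k):
    """Number of unordered pairs among k residues."""
    return k * (k - 1) // 2


def residue_columns(rows):
    """For each row, the column index of each residue, in sequence order."""
    return [[c for c, ch in enumerate(row) if ch != GAP] for row in rows]


def compare(reference, estimated):
    """Return (developer, modeler, tc) under the definitions assumed above."""
    ref_cols = residue_columns(reference)
    est_cols = residue_columns(estimated)

    # Number of residues in each column of each alignment.
    ref_sizes = Counter(c for cols in ref_cols for c in cols)
    est_sizes = Counter(c for cols in est_cols for c in cols)

    # For each estimated column, count its residues by reference column.
    groups = {}
    for r_cols, e_cols in zip(ref_cols, est_cols):
        for r, e in zip(r_cols, e_cols):
            groups.setdefault(e, Counter())[r] += 1

    # Residues sharing both an estimated and a reference column form
    # the shared homologies; count them per group instead of per pair.
    shared = sum(pairs(k) for g in groups.values() for k in g.values())
    ref_pairs = sum(pairs(k) for k in ref_sizes.values())
    est_pairs = sum(pairs(k) for k in est_sizes.values())

    # A reference column is recovered when a single estimated column
    # holds exactly its residues and nothing else.
    recovered = 0
    for g in groups.values():
        if len(g) == 1:
            (r, k), = g.items()
            if k >= 2 and k == ref_sizes[r]:
                recovered += 1
    ref_aligned = sum(1 for k in ref_sizes.values() if k >= 2)

    return shared / ref_pairs, shared / est_pairs, recovered / ref_aligned


reference = ["ACGT", "AC-T", "A-GT"]
estimated = ["ACGT", "ACT-", "A-GT"]
print(compare(reference, estimated))  # (0.75, 0.75, 0.5)
```

Each alignment cell is visited a constant number of times, so the expected running time of this sketch is proportional to the alignment size, assuming constant-time hash operations. A brute-force comparison would instead enumerate every pair of residues in every column.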

Year:  2011        PMID: 21984754     DOI: 10.1093/bioinformatics/btr553

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


