
Multiple Sequence Alignment Computation Using the T-Coffee Regressive Algorithm Implementation.

Edgar Garriga, Paolo Di Tommaso, Cedrik Magis, Ionas Erb, Leila Mansouri, Athanasios Baltzis, Evan Floden, Cedric Notredame.

Abstract

Many fields of biology rely on the inference of accurate multiple sequence alignments (MSAs) of biological sequences. Unfortunately, the problem of assembling an MSA is NP-complete, which limits computation to approximate solutions obtained with heuristics. The progressive algorithm is one of the most popular frameworks for the computation of MSAs. It involves pre-clustering the sequences and aligning them starting with the most similar ones. The scalability of this framework is limited, especially with respect to accuracy. We present here an alternative approach named the regressive algorithm. In this framework, sequences are first clustered and then aligned starting with the most distantly related ones. This approach has been shown to greatly improve accuracy during scale-up, especially on datasets featuring 10,000 sequences or more. Another benefit is the possibility of integrating third-party clustering methods and third-party MSA aligners. The regressive algorithm has been tested on up to 1.5 million sequences; its implementation is available in the T-Coffee package.
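
The contrast between the two frameworks can be illustrated with a toy sketch. It only shows the difference in ordering (most similar first versus most distantly related first) and is not the T-Coffee implementation. The k-mer distance, the function names `kmer_distance` and `pair_order`, and the three example sequences are all assumptions made for illustration; a real pipeline would cluster sequences with a guide tree and delegate alignment to an MSA aligner.

```python
from __future__ import annotations

from itertools import combinations


def kmer_distance(a: str, b: str, k: int = 2) -> float:
    """Toy distance: 1 - Jaccard similarity of k-mer sets.

    This metric is an assumption for illustration only; it is not the
    distance used by T-Coffee or by any specific guide-tree method.
    """
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    union = kmers_a | kmers_b
    if not union:
        return 0.0
    return 1.0 - len(kmers_a & kmers_b) / len(union)


def pair_order(seqs: dict[str, str], most_distant_first: bool) -> list[tuple[str, str]]:
    """Rank sequence pairs by distance.

    most_distant_first=False mimics the progressive ordering
    (most similar pairs first); True mimics the regressive ordering
    (most distantly related pairs first).
    """
    pairs = combinations(sorted(seqs), 2)
    return sorted(
        pairs,
        key=lambda p: kmer_distance(seqs[p[0]], seqs[p[1]]),
        reverse=most_distant_first,
    )


# Hypothetical toy input: "a" and "b" are near-identical, "c" is unrelated.
toy = {"a": "ACGTACGT", "b": "ACGTACGA", "c": "TTTTGGGG"}

progressive_order = pair_order(toy, most_distant_first=False)
regressive_order = pair_order(toy, most_distant_first=True)

print("progressive:", progressive_order)
print("regressive: ", regressive_order)
```

In this toy input, the near-identical pair (`a`, `b`) is handled first under the progressive ordering and last under the regressive ordering.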

Keywords:  Guide tree; MSA; Progressive alignment; Sequence alignment


Year:  2021        PMID: 33289888     DOI: 10.1007/978-1-0716-1036-7_6

Source DB:  PubMed          Journal:  Methods Mol Biol        ISSN: 1064-3745


