
AncestralClust: Clustering of Divergent Nucleotide Sequences by Ancestral Sequence Reconstruction using Phylogenetic Trees.

Lenore Pipes, Rasmus Nielsen.

Abstract

MOTIVATION: Clustering is a fundamental task in the analysis of nucleotide sequences. Despite the exponential increase in the size of sequence databases of homologous genes, few methods exist to cluster divergent sequences. Traditional clustering methods have mostly focused on optimizing high-speed clustering of highly similar sequences. We develop a phylogenetic clustering method that infers ancestral sequences for a set of initial clusters and then uses a greedy algorithm to cluster sequences.
RESULTS: We describe a clustering program AncestralClust, which is developed for clustering divergent sequences. We compare this method with other state-of-the-art clustering methods using datasets of homologous sequences from different species. We show that, in divergent datasets, AncestralClust has higher accuracy and more even cluster sizes than current popular methods.
AVAILABILITY AND IMPLEMENTATION: AncestralClust is an Open Source program available at https://github.com/lpipes/ancestralclust.
SUPPLEMENTARY INFORMATION: Supplementary figures and table are available online.
© The Author(s) (2021). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
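
The greedy step mentioned in the abstract can be illustrated with a minimal sketch. This is not AncestralClust's implementation: the function names, the p-distance measure, the `max_dist` threshold, and the toy sequences are all assumptions made for illustration, and the representative strings simply stand in for ancestral sequences that would in practice be inferred from a phylogenetic tree.

```python
def p_distance(a, b):
    """Fraction of differing sites between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_assign(sequences, representatives, max_dist=0.3):
    """Assign each sequence to its nearest representative.

    A sequence farther than max_dist (a hypothetical cutoff) from every
    representative is labeled -1, meaning it is left unassigned.
    """
    assignment = {}
    for name, seq in sequences.items():
        dists = [p_distance(seq, rep) for rep in representatives]
        best = min(range(len(dists)), key=dists.__getitem__)
        assignment[name] = best if dists[best] <= max_dist else -1
    return assignment


# Toy data: two stand-in "ancestral" representatives and three query sequences.
reps = ["ACGTACGT", "TTTTCCCC"]
seqs = {"s1": "ACGTACGA", "s2": "TTTTCCCG", "s3": "GGGGGGGG"}
result = greedy_assign(seqs, reps)
print(result)
```

In this toy run, `s1` and `s2` each differ from one representative at a single site and are assigned to it, while `s3` is too distant from both and stays unassigned. A real pipeline would need an alignment step and a rule for handling unassigned sequences.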


Year:  2021        PMID: 34668516      PMCID: PMC8756197          DOI: 10.1093/bioinformatics/btab723

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.931


