MSClust: A Multi-Seeds based Clustering algorithm for microbiome profiling using 16S rRNA sequence.

Wei Chen, Yongmei Cheng, Clarence Zhang, Shaowu Zhang, Hongyu Zhao.

Abstract

Recent developments in next-generation sequencing technologies have led to the rapid accumulation of 16S rRNA sequences for microbiome profiling. One key step in data processing is to cluster short sequences into operational taxonomic units (OTUs). Although many methods have been proposed for OTU inference, a major challenge is balancing inference accuracy against computational efficiency: accuracy is often sacrificed to accommodate the need to analyze large numbers of sequences. Inspired by the hierarchical clustering method and a modified greedy network clustering algorithm, we propose a novel multi-seeds based heuristic clustering method, named MSClust, for OTU inference. MSClust first adaptively selects multiple seeds, instead of a single seed, for each candidate cluster; the reads are then processed using a greedy clustering strategy. Through many numerical examples, we demonstrate that MSClust uses less memory and achieves better biological accuracy than existing heuristic clustering methods while preserving efficiency and scalability.
© 2013 Elsevier B.V. All rights reserved.
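
The abstract describes the idea only at a high level: several seeds per candidate cluster, followed by greedy assignment of reads. The sketch below illustrates that general idea and is not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names `identity` and `multi_seed_greedy_cluster`, the position-wise similarity (a real tool would use an alignment-based or k-mer distance), the `threshold` and `max_seeds` parameters, and the rule that promotes a newly assigned read to a seed when it is not identical to an existing seed.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Toy similarity (assumption): matching positions over the longer length, no alignment."""
    if not a and not b:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def multi_seed_greedy_cluster(
    reads: List[str], threshold: float = 0.97, max_seeds: int = 3
) -> List[List[int]]:
    """Greedy clustering in which each cluster may keep up to `max_seeds` seeds.

    Returns clusters as lists of read indices. The seed-promotion rule is a
    hypothetical stand-in for the adaptive seed selection mentioned in the abstract.
    """
    # Visit longer reads first (a common greedy heuristic); sorted() is stable.
    order = sorted(range(len(reads)), key=lambda i: -len(reads[i]))
    seeds: List[List[int]] = []    # seed indices per cluster
    members: List[List[int]] = []  # member indices per cluster

    for i in order:
        best_cluster, best_score = -1, -1.0
        for c, cluster_seeds in enumerate(seeds):
            # A read's score against a cluster is its best similarity to any seed.
            score = max(identity(reads[i], reads[s]) for s in cluster_seeds)
            if score >= threshold and score > best_score:
                best_cluster, best_score = c, score

        if best_cluster < 0:
            # No cluster is close enough: the read founds a new cluster as its first seed.
            seeds.append([i])
            members.append([i])
        else:
            members[best_cluster].append(i)
            # Assumed promotion rule: keep a non-identical read as an extra seed.
            if best_score < 1.0 and len(seeds[best_cluster]) < max_seeds:
                seeds[best_cluster].append(i)

    return members


if __name__ == "__main__":
    toy_reads = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT", "CCCCCCCCCC"]
    print(multi_seed_greedy_cluster(toy_reads, threshold=0.85, max_seeds=2))
    print(multi_seed_greedy_cluster(toy_reads, threshold=0.85, max_seeds=1))
```

On the toy reads, the third read is 80% identical to the first seed but 90% identical to the second read. With `max_seeds=2` the second read is kept as an extra seed, so the third read joins the same cluster; with `max_seeds=1` (classic single-seed greedy clustering) it is split off into its own cluster. This only illustrates why extra seeds can change cluster boundaries; it says nothing about how MSClust itself selects seeds.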

Keywords:  16S rRNA reads; Clustering algorithms; Next-generation sequencing; Operational taxonomic unit (OTU); Seeds-selection

Year: 2013    PMID: 23899776    PMCID: PMC3895816    DOI: 10.1016/j.mimet.2013.07.004

Source DB: PubMed    Journal: J Microbiol Methods    ISSN: 0167-7012    Impact factor: 2.363


