Comparing clustering and pre-processing in taxonomy analysis.

Marc J Bonder, Sanne Abeln, Egija Zaura, Bernd W Brandt.

Abstract

MOTIVATION: Massively parallel sequencing allows for rapid sequencing of large numbers of sequences in just a single run. Thus, 16S ribosomal RNA (rRNA) amplicon sequencing of complex microbial communities has become possible. The sequenced 16S rRNA fragments (reads) are clustered into operational taxonomic units and taxonomic categories are assigned. Recent reports suggest that data pre-processing should be performed before clustering. We assessed how combinations of data pre-processing steps and clustering algorithms affect clustering accuracy for oral microbial sequence data.
RESULTS: The number of clusters varied up to two orders of magnitude depending on pre-processing. Pre-processing using both denoising and chimera checking resulted in a number of clusters that was closest to the number of species in the mock dataset (25 versus 15). Based on run time, purity and normalized mutual information, we could not identify a single best clustering algorithm. The differences in clustering accuracy among the algorithms after the same pre-processing were minor compared with the differences in accuracy among different pre-processing steps.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
CONTACT: bonder.m.j@gmail.com or b.brandt@acta.nl
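
The two accuracy measures named in the results, purity and normalized mutual information (NMI), can be illustrated with a minimal sketch. The abstract does not say which NMI normalization the authors used, so normalizing by the arithmetic mean of the two entropies is an assumption here, as are the function names and the toy labels; this is not the authors' code.

```python
from collections import Counter
from math import log


def purity(true_labels, cluster_ids):
    """Fraction of reads that share the majority species of their cluster."""
    by_cluster = {}
    for label, cid in zip(true_labels, cluster_ids):
        by_cluster.setdefault(cid, Counter())[label] += 1
    majority_total = sum(max(c.values()) for c in by_cluster.values())
    return majority_total / len(true_labels)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(true_labels, cluster_ids):
    """Mutual information divided by the mean of the two entropies.

    The choice of normalization is an assumption; other variants divide
    by the minimum, maximum, or geometric mean of the entropies.
    """
    n = len(true_labels)
    joint = Counter(zip(true_labels, cluster_ids))
    n_true = Counter(true_labels)
    n_clus = Counter(cluster_ids)
    mi = sum(
        (c / n) * log(c * n / (n_true[t] * n_clus[k]))
        for (t, k), c in joint.items()
    )
    denom = (entropy(true_labels) + entropy(cluster_ids)) / 2
    # Both partitions trivial (one group each): treat as perfect agreement.
    return mi / denom if denom > 0 else 1.0


# Hypothetical mock community: six reads from three known species,
# grouped into three clusters by some clustering algorithm.
species = ["A", "A", "A", "B", "B", "C"]
clusters = [0, 0, 1, 1, 2, 2]

print(f"purity = {purity(species, clusters):.3f}")
print(f"NMI    = {nmi(species, clusters):.3f}")
```

Purity alone rewards over-splitting, since putting every read in its own cluster gives a purity of 1. That is one reason to report it together with NMI and with the number of clusters.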

Year:  2012        PMID: 22962346     DOI: 10.1093/bioinformatics/bts552

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  27 in total

1.  Second Era of OMICS in Caries Research: Moving Past the Phase of Disillusionment. (Review)

Authors:  M M Nascimento; E Zaura; A Mira; N Takahashi; J M Ten Cate
Journal:  J Dent Res       Date:  2017-04-06       Impact factor: 6.116

2.  Applying convolutional neural networks to speed up environmental DNA annotation in a highly diverse ecosystem.

Authors:  Benjamin Flück; Laëtitia Mathon; Stéphanie Manel; Alice Valentini; Tony Dejean; Camille Albouy; David Mouillot; Wilfried Thuiller; Jérôme Murienne; Sébastien Brosse; Loïc Pellissier
Journal:  Sci Rep       Date:  2022-06-17       Impact factor: 4.996

3.  AncestralClust: Clustering of Divergent Nucleotide Sequences by Ancestral Sequence Reconstruction using Phylogenetic Trees.

Authors:  Lenore Pipes; Rasmus Nielsen
Journal:  Bioinformatics       Date:  2021-10-20       Impact factor: 6.931

4.  ESPRIT-Forest: Parallel clustering of massive amplicon sequence data in subquadratic time.

Authors:  Yunpeng Cai; Wei Zheng; Jin Yao; Yujie Yang; Volker Mai; Qi Mao; Yijun Sun
Journal:  PLoS Comput Biol       Date:  2017-04-24       Impact factor: 4.475

5.  NGS-eval: NGS Error analysis and novel sequence VAriant detection tooL.

Authors:  Ali May; Sanne Abeln; Mark J Buijs; Jaap Heringa; Wim Crielaard; Bernd W Brandt
Journal:  Nucleic Acids Res       Date:  2015-04-15       Impact factor: 16.971

6.  The Gut Microbiome Contributes to a Substantial Proportion of the Variation in Blood Lipids.

Authors:  Jingyuan Fu; Marc Jan Bonder; María Carmen Cenit; Ettje F Tigchelaar; Astrid Maatman; Jackie A M Dekens; Eelke Brandsma; Joanna Marczynska; Floris Imhann; Rinse K Weersma; Lude Franke; Tiffany W Poon; Ramnik J Xavier; Dirk Gevers; Marten H Hofker; Cisca Wijmenga; Alexandra Zhernakova
Journal:  Circ Res       Date:  2015-09-10       Impact factor: 17.367

7.  Divergence thresholds and divergent biodiversity estimates: can metabarcoding reliably describe zooplankton communities?

Authors:  Emily A Brown; Frédéric J J Chain; Teresa J Crease; Hugh J MacIsaac; Melania E Cristescu
Journal:  Ecol Evol       Date:  2015-05-13       Impact factor: 2.912

8.  Toward accurate molecular identification of species in complex environmental samples: testing the performance of sequence filtering and clustering methods.

Authors:  Jullien M Flynn; Emily A Brown; Frédéric J J Chain; Hugh J MacIsaac; Melania E Cristescu
Journal:  Ecol Evol       Date:  2015-05-13       Impact factor: 2.912

9.  Ecological consistency of SSU rRNA-based operational taxonomic units at a global scale.

Authors:  Thomas S B Schmidt; João F Matias Rodrigues; Christian von Mering
Journal:  PLoS Comput Biol       Date:  2014-04-24       Impact factor: 4.475

10.  Density-based hierarchical clustering of pyro-sequences on a large scale--the case of fungal ITS1.

Authors:  Marco Pagni; Hélène Niculita-Hirzel; Loïc Pellissier; Anne Dubuis; Ioannis Xenarios; Antoine Guisan; Ian R Sanders; Jérôme Goudet; Nicolas Guex
Journal:  Bioinformatics       Date:  2013-03-28       Impact factor: 6.937
