Ali May1, Sanne Abeln1, Wim Crielaard2, Jaap Heringa3, Bernd W Brandt2.
1, 2, 3. Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Amsterdam, The Netherlands; Centre for Integrative Bioinformatics VU and AIMMS Amsterdam Institute for Molecules Medicines and Systems, VU University Amsterdam, Amsterdam, The Netherlands; and NBIC Netherlands Bioinformatics Centre, Nijmegen, The Netherlands.
Abstract
MOTIVATION: 16S rDNA pyrosequencing is a powerful approach that relies heavily on computational methods to delineate microbial compositions. Previously, it was shown that the outcomes of studies relying on this approach depend strongly on the choice of pre-processing and clustering algorithms. However, obtaining insight into the effects and accuracy of these algorithms is challenging because it is difficult to generate samples of known composition with sufficiently high diversity. Here, we use in silico microbial datasets to better understand how computational methods transform the experimental data into taxonomic clusters.
RESULTS: We were able to qualitatively replicate the raw experimental pyrosequencing data after rigorously adjusting existing simulation software. This allowed us to simulate datasets of real-life complexity, which we used to assess the influence and performance of two widely used pre-processing methods along with 11 clustering algorithms. We show that the choice, order and mode of the pre-processing methods have a larger impact on the accuracy of the clustering pipeline than the clustering methods themselves. Without pre-processing, the clustering methods differ widely in performance. Depending on the clustering algorithm, the optimal analysis pipeline resulted in significant underestimations of the expected number of clusters (minimum: 3.4%; maximum: 13.6%), allowing us to make quantitative estimations of the bacterial complexity of real microbiome samples.
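The underestimation percentages quoted above can be read as the relative shortfall of the observed cluster count against the expected one. The abstract does not state the formula, so the following is only a minimal sketch under that assumed definition; the function name `underestimation_percent` and the counts in the usage line are hypothetical and are not taken from the study.

```python
def underestimation_percent(expected_clusters: int, observed_clusters: int) -> float:
    """Return how far the observed cluster count falls below the expected count, in percent.

    Assumed definition (not stated in the abstract):
        100 * (expected - observed) / expected
    A negative result would indicate an overestimation.
    """
    if expected_clusters <= 0:
        raise ValueError("expected_clusters must be positive")
    return 100.0 * (expected_clusters - observed_clusters) / expected_clusters


# Illustrative, made-up counts; they are not data from the study.
example = underestimation_percent(expected_clusters=500, observed_clusters=450)
print(f"{example:.1f}% underestimation")
```

Under this reading, a pipeline that recovers 450 clusters from a simulated dataset built to contain 500 would be reported as a 10% underestimation.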