Moira Marizzoni, Thomas Gurry, Stefania Provasi, Gilbert Greub, Nicola Lopizzo, Federica Ribaldi, Cristina Festari, Monica Mazzelli, Elisa Mombelli, Marco Salvatore, Peppino Mirabelli, Monica Franzese, Andrea Soricelli, Giovanni B Frisoni, Annamaria Cattaneo.
Abstract
Amplicon high-throughput sequencing of the 16S ribosomal RNA (rRNA) gene is currently the most widely used technique to investigate complex gut microbial communities. Microbial identification can be influenced by several factors, including the choice of bioinformatic pipeline, which makes comparisons across studies difficult. Here, we compared four commonly used pipelines (QIIME2, Bioconductor, UPARSE and mothur) run on two operating systems (OS) (Linux and Mac) to evaluate the impact of bioinformatic pipeline and OS on the taxonomic classification of 40 human stool samples. We applied the SILVA 132 reference database for all the pipelines. We compared phyla and genera identification and relative abundances across the four pipelines using the Friedman rank sum test. QIIME2 and Bioconductor provided identical outputs on Linux and Mac OS, while UPARSE and mothur reported only minimal differences between OS. Taxa assignments were consistent at both phylum and genus level across all the pipelines. However, differences in relative abundance were identified for all phyla (p < 0.013) and for the majority of the most abundant genera (p < 0.028), such as Bacteroides (QIIME2: 24.5%, Bioconductor: 24.6%, UPARSE-linux: 23.6%, UPARSE-mac: 20.6%, mothur-linux: 22.2%, mothur-mac: 21.6%, p < 0.001). The use of different bioinformatic pipelines affects the estimated relative abundance of the gut microbial community, indicating that studies using different pipelines cannot be directly compared. A harmonization procedure is needed to move the field forward.
Keywords: 16S rRNA amplicon sequencing; QIIME2; UPARSE; bioconductor; fecal human samples; microbiome; mothur
Year: 2020 PMID: 32636817 PMCID: PMC7318847 DOI: 10.3389/fmicb.2020.01262
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
FIGURE 1. Overview of the pipelines used by free and open-source workflows: QIIME2, Bioconductor, UPARSE, and mothur. Each gray box represents a command of the pipelines. For UPARSE, chimera filtering is part of the OTU clustering step, and OTU taxonomic assignment was performed using mothur.
Results of 16S sequencing analyses by using QIIME2, Bioconductor, UPARSE, or mothur.
| | QIIME2 | Bioconductor | UPARSE-linux | UPARSE-mac | mothur-linux | mothur-mac | p |
| Software version | 2018.08 | R version 3.5.1 | v11.0.667_i86 | v11.0.667_i86 | v.1.43.0 | v.1.43.0 | |
| Approximate analysis time | 4 h (Linux), 3 h (Mac) | 9 h (Linux), 8 h (Mac) | 45 min | 45 min | 9 h | 9 h | – |
| # input reads | 4,715,000 (all pipelines) | | | | | | |
| # reads after filtering/denoising | 3,391,670 | 3,736,927 | 3,173,733 | 3,173,733 | 3,244,489 | 3,244,489 | – |
| # reads assigned at phylum level^a (tot n, mean ± SD) | 2,941,772 (87%) | 3,143,413 (84%) | 3,123,028 (98%) | 3,131,211 (99%) | 2,812,333 (89%) | 2,812,470 (87%) | |
| # unclassified reads at phylum level (tot n, mean ± SD) | 3,567 (< 1%) | 163 (< 1%) | 77,395 (2%) | 79,031 (2%) | 47,883 (2%) | 48,770 (2%) | |
| # reads assigned at genus level^b (tot n, mean ± SD) | 2,770,029 (82%) | 2,798,953 (75%) | 3,123,028 (98%) | 3,131,211 (99%) | 2,812,333 (89%) | 2,812,470 (87%) | <0.001 |
| # unclassified reads at genus level (tot n, mean ± SD) | 171,743^b (5%) | Taxonomic filtering during processing | 801,052 (25%) | 923,329 (29%) | 763,315 (24%) | 751,717 (23%) | <0.001 |
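The percentages in the phylum-level row appear to be the assigned reads expressed as a share of the reads that survived filtering/denoising. The short sketch below recomputes them for four of the outputs under that assumption. The mapping of table cells to pipeline names and the helper name `percent_assigned` are assumptions made for illustration, not something the table states explicitly.

```python
# Read counts copied from the table; the column-to-pipeline mapping is an
# assumption (UPARSE reports a single filtered-read count for both OS).
filtered_reads = {
    "QIIME2": 3_391_670,
    "Bioconductor": 3_736_927,
    "UPARSE-linux": 3_173_733,
    "UPARSE-mac": 3_173_733,
}
assigned_at_phylum = {
    "QIIME2": 2_941_772,
    "Bioconductor": 3_143_413,
    "UPARSE-linux": 3_123_028,
    "UPARSE-mac": 3_131_211,
}


def percent_assigned(assigned, filtered):
    """Return assigned reads as a percentage of filtered reads, per output."""
    return {name: 100 * assigned[name] / filtered[name] for name in assigned}


phylum_pct = percent_assigned(assigned_at_phylum, filtered_reads)
for name, pct in phylum_pct.items():
    print(f"{name}: {pct:.1f}% of filtered reads assigned at phylum level")
```

Rounded to whole percentages, these values match the 87%, 84%, 98% and 99% reported for the same four outputs.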
FIGURE 2. Phyla distribution as identified by using QIIME2, Bioconductor, UPARSE, or mothur. The phyla present at less than 0.005% were Epsilonbacteraeota (QIIME2: 0.0046%, Bioconductor: 0.0043%, UPARSE-linux: 0.0049%, UPARSE-mac: 0.0049%, mothur: 0.0050%) and, for mothur only, Planctomycetes (0.0006%), Acidobacteria (0.0003%), Nitrospirae and Gemmatimonadetes (0.0001%, both), and Chloroflexi and Omnitrophicaeota (<0.0001%, both).
FIGURE 3. Comparison of the relative abundance of phyla obtained by using QIIME2, Bioconductor, UPARSE, or mothur. p-Values were calculated using the Friedman test followed by Dunn's multiple comparisons test. The Wilcoxon signed rank test was applied when only two pipelines were compared.
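As a rough illustration of the Friedman test named in the caption, the sketch below ranks the pipelines within each sample and compares the rank sums. It is a minimal standard-library version, not the authors' analysis code. The abundance values are invented, the function names `friedman_statistic` and `chi2_sf` are hypothetical, no tie correction is applied (values are assumed distinct within a sample), and Dunn's post hoc test is not shown.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{i<df/2} (x/2)^i / i!
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    # Closed form for odd df: erfc term plus a finite series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def friedman_statistic(rows):
    """Friedman chi-square for rows = samples, columns = pipelines.

    Assumes no ties within a row, so no tie correction is applied.
    """
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        order = sorted(range(k), key=lambda j: row[j])
        for position, column in enumerate(order):
            rank_sums[column] += position + 1  # rank 1 = smallest value
    stat = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    return stat, chi2_sf(stat, k - 1)


# Hypothetical relative abundances (%) of one phylum in five samples;
# columns: QIIME2, Bioconductor, UPARSE, mothur. Not data from the study.
abundance = [
    [24.1, 24.3, 22.9, 21.8],
    [30.2, 30.5, 28.7, 27.9],
    [18.4, 18.6, 17.5, 16.9],
    [25.0, 25.4, 23.8, 22.6],
    [21.7, 21.9, 20.3, 19.8],
]
stat, p_value = friedman_statistic(abundance)
print(f"Friedman chi-square = {stat:.2f}, p = {p_value:.4f}")
```

Because every invented sample orders the pipelines the same way, the rank sums are maximally separated and the test reports a small p-value even though the absolute differences are modest, which is the kind of systematic shift the comparison is designed to detect.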
FIGURE 4. Venn diagram showing the number of shared and specific bacterial genera among the pipelines. Histograms showing the number of genera identified by each pipeline and the number of genera shared among 6, 5, 4, 3, or 2 pipelines, or unique to one pipeline, are also reported.
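The counts behind a figure of this kind can be derived with plain set operations. The sketch below is illustrative only: the genus sets per output are invented, and the variable names are hypothetical.

```python
from collections import Counter

# Hypothetical genus sets per pipeline output (not the study's results).
genera_by_pipeline = {
    "QIIME2": {"Bacteroides", "Faecalibacterium", "Blautia", "Alistipes"},
    "Bioconductor": {"Bacteroides", "Faecalibacterium", "Blautia"},
    "UPARSE-linux": {"Bacteroides", "Faecalibacterium", "Roseburia"},
    "UPARSE-mac": {"Bacteroides", "Faecalibacterium", "Roseburia"},
    "mothur-linux": {"Bacteroides", "Faecalibacterium", "Blautia", "Roseburia"},
    "mothur-mac": {"Bacteroides", "Faecalibacterium", "Blautia", "Roseburia"},
}

# Number of genera identified by each output.
genera_per_pipeline = {name: len(g) for name, g in genera_by_pipeline.items()}

# For every genus, count how many outputs report it.
outputs_per_genus = Counter(
    genus for genera in genera_by_pipeline.values() for genus in genera
)

# Histogram: how many genera are reported by exactly 6, 5, ..., 1 outputs.
sharing_histogram = Counter(outputs_per_genus.values())

# Genera reported by every output (the centre of the Venn diagram).
core_genera = set.intersection(*genera_by_pipeline.values())

for n_outputs in range(len(genera_by_pipeline), 0, -1):
    print(f"genera found by exactly {n_outputs} outputs: {sharing_histogram[n_outputs]}")
print("shared by all outputs:", sorted(core_genera))
```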
FIGURE 5. Comparison of the relative abundance of the 10 most abundant genera obtained by applying QIIME2, Bioconductor, UPARSE, or mothur at the genus level. p-Values were calculated using the Friedman test followed by Dunn's multiple comparisons test. The Wilcoxon signed rank test was applied when only two pipelines were compared.
Usability of pipelines: pros and cons from an untrained user’s point of view.