Maria A Sierra, Qianhao Li, Smruti Pushalkar, Bidisha Paul, Tito A Sandoval, Angela R Kamer, Patricia Corby, Yuqi Guo, Ryan Richard Ruff, Alexander V Alekseyenko, Xin Li, Deepak Saxena.
Abstract
There are currently no established criteria for selecting appropriate bioinformatics tools and reference databases for the analysis of 16S rRNA amplicon data from the human oral microbiome. Our study aims to determine the influence of multiple tools and reference databases on α-diversity measurements and β-diversity comparisons in analyses of the human oral microbiome. We compared the results of taxonomic classification by Greengenes, the Human Oral Microbiome Database (HOMD), National Center for Biotechnology Information (NCBI) 16S, SILVA, and the Ribosomal Database Project (RDP) using Quantitative Insights Into Microbial Ecology (QIIME) and the Divisive Amplicon Denoising Algorithm (DADA2). Fifteen phyla were present in all of the analyses, four phyla were exclusive to certain databases, and the number of genera identified differed among databases. Common genera found in the oral microbiome, such as Veillonella, Rothia, and Prevotella, are annotated by all databases; however, less common genera, such as Bulleidia and Paludibacter, are only annotated by large databases, such as Greengenes. Our results indicate that using different reference databases in 16S rRNA amplicon data analysis could lead to different taxonomic compositions, especially at the genus level. A variety of databases are available, but there are no defined criteria for data curation and validation of annotations, which can affect the accuracy and reproducibility of results and makes it difficult to compare data across studies.
Keywords: 16S rRNA; DADA2; Greengenes; HOMD; NCBI; QIIME; RDP; SILVA; databases
Year: 2020 PMID: 32756341 PMCID: PMC7465726 DOI: 10.3390/genes11080878
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. (A) Dotplot of phylum abundances from Quantitative Insights Into Microbial Ecology (QIIME) and the Divisive Amplicon Denoising Algorithm (DADA2) pipelines, comparing the five reference databases. Total abundances are log10 transformed. (B) α-diversity measurements for the QIIME pipeline. p-values are assigned as ≤0.05 (*), <0.002 (**), <0.0002 (***), and <0.0001 (****).
Figure 2. Prevalence heatmap of presence/absence of the 50 most abundant genera in (A) QIIME and (B) DADA2 pipelines.
Figure 3. Comparisons of hierarchical clusters at genus level. (A) Cladogram of the 50 most abundant genera in each pipeline, with cophenetic correlation coefficients of r = 0.889 and r = 0.898 for QIIME and DADA2, respectively. (B) Non-metric multidimensional scaling (NMDS) at genus level, based on a Bray–Curtis dissimilarity matrix.
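As an illustration of the diversity measures named in the abstract and captions, the following minimal Python sketch computes an α-diversity index and the Bray–Curtis dissimilarity for one sample annotated against two reference databases. It is not the authors' pipeline. The record does not state which α-diversity index was used, so the Shannon index is an assumption chosen for illustration, and the genus counts and database labels (`db_a`, `db_b`) are made-up toy values.

```python
from math import log


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum(|u_i - v_i|) / sum(u_i + v_i)."""
    if len(u) != len(v):
        raise ValueError("abundance vectors must have the same length")
    denom = sum(u) + sum(v)
    if denom == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / denom


# Hypothetical genus-level read counts for ONE sample, annotated with two
# hypothetical databases. Order: Veillonella, Rothia, Prevotella, Bulleidia.
# In this toy case db_b has no Bulleidia annotation (count 0).
db_a = [40, 30, 20, 10]
db_b = [40, 30, 30, 0]

alpha_a = shannon_index(db_a)
alpha_b = shannon_index(db_b)
beta_ab = bray_curtis(db_a, db_b)

print(f"Shannon (db_a): {alpha_a:.3f}")
print(f"Shannon (db_b): {alpha_b:.3f}")
print(f"Bray-Curtis (db_a vs db_b): {beta_ab:.3f}")
```

In this toy case the same sample yields a different α-diversity value under each annotation and a nonzero Bray–Curtis dissimilarity between the two profiles, which is the kind of database-dependent difference the abstract describes at genus level.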