Nicole R Narayan, Thomas Weinmaier, Emilio J Laserna-Mendieta, Marcus J Claesson, Fergus Shanahan, Karim Dabbagh, Shoko Iwai, Todd Z DeSantis.
Abstract
BACKGROUND: Shotgun metagenomic sequencing reveals the functional potential in microbial communities. However, lower-cost 16S ribosomal RNA (rRNA) gene sequencing provides taxonomic, not functional, observations. To remedy this, we previously introduced Piphillin, a software package that predicts functional metagenomic content based on the frequency of detected 16S rRNA gene sequences corresponding to genomes in regularly updated, functionally annotated genome databases. Piphillin and similar tools have previously been evaluated on 16S rRNA data processed by clustering sequences into operational taxonomic units (OTUs). Newer techniques such as amplicon sequence variant (ASV) error correction are increasingly used, but it is unknown whether these techniques perform better in metagenomic content prediction pipelines, or whether they should be treated the same as OTU data with respect to optimal pipeline parameters.
Keywords: Genomic databases; Metagenomics; Phylogenetic analysis; Sequence alignment; Shotgun sequencing
Year: 2020 PMID: 31952477 PMCID: PMC6967091 DOI: 10.1186/s12864-019-6427-1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1. Piphillin results comparing 16S rRNA sequence analysis approaches using the KEGG database. a 16S rRNA gene amplicon sequences passing the identity threshold to the reference genomes. Percentage of amplicon sequences from two datasets using two different 16S rRNA sequence analysis approaches passing identity cutoffs from 75 to 100% against 16S rRNA gene sequences in the KEGG genome database. b Spearman’s correlation coefficient between Piphillin results and shotgun metagenomics at ten different identity cutoffs tested in Piphillin. Spearman’s correlation coefficient was calculated for each sample, and the mean and the 1st and 3rd quartiles are depicted by the boxes. Whiskers extend to the furthest points within 150% of the interquartile range. c Balanced accuracy in identifying differentially abundant KOs from Piphillin against corresponding metagenomics at each identity cutoff. * indicates p < 0.05, ** indicates p < 0.001, *** indicates p < 0.0001
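The per-sample Spearman correlation described in panel b can be illustrated with a minimal sketch. This is not the authors' code: the function names, the nested-dictionary input layout (sample, then feature, then abundance), and the choice to treat features missing from one profile as zero are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one profile is constant
    return cov / math.sqrt(vx * vy)


def per_sample_spearman(
    predicted: Dict[str, Dict[str, float]],
    observed: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Correlate predicted and observed feature abundances sample by sample.

    Assumption: a feature absent from one profile counts as abundance 0.
    """
    result = {}
    for sample in sorted(predicted.keys() & observed.keys()):
        features = sorted(set(predicted[sample]) | set(observed[sample]))
        p = [predicted[sample].get(f, 0.0) for f in features]
        o = [observed[sample].get(f, 0.0) for f in features]
        result[sample] = spearman(p, o)
    return result
```

Each sample yields one coefficient, so a box in the plot summarizes the distribution of these per-sample values at a given identity cutoff.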
Fig. 2. Piphillin results comparing 16S rRNA sequence analysis approaches using the BioCyc database. a 16S rRNA gene amplicon sequences passing the identity threshold to the reference genomes. Percentage of amplicon sequences from two datasets using two different 16S rRNA sequence analysis approaches passing identity cutoffs from 75 to 100% against 16S rRNA gene sequences in the BioCyc genome database. b Spearman’s correlation coefficient between Piphillin results and shotgun metagenomics at ten different identity cutoffs tested in Piphillin. Spearman’s correlation coefficient was calculated for each sample, and the mean and the 1st and 3rd quartiles are depicted by the boxes. Whiskers extend to the furthest points within 150% of the interquartile range. c Balanced accuracy in identifying differentially abundant features from Piphillin against corresponding metagenomics at each identity cutoff. * indicates p < 0.05, ** indicates p < 0.001, *** indicates p < 0.0001
Fig. 3. Comparison between Piphillin and PICRUSt2 using DADA2-corrected ASV data. a Spearman’s correlation coefficients against corresponding shotgun metagenomics results were compared for two datasets. Spearman’s correlation coefficient was calculated for each sample and ranges are depicted as box and whisker plots as described in Fig. 1. b Comparison of log2FC in differential abundance analysis of KOs between metagenomic and Piphillin-predicted data. Color is based on comparison to metagenomics results, in which the adjusted p-value cutoff was 0.2 for significance. c Comparison of log2FC in differential abundance analysis of KOs between metagenomic and PICRUSt2-predicted data. Color is based on metagenomics results, in which the adjusted p-value cutoff was 0.2 for significance. d False positive rate, true positive rate/recall, balanced accuracy, and precision of detecting significant differences between cancer and healthy human oral biopsy samples were compared. * indicates p < 0.05, ** indicates p < 0.001, *** indicates p < 0.0001
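The four metrics named in panel d follow from a confusion matrix in which features called significant from metagenomics serve as the truth set. The sketch below uses the standard definitions only; the function name, the set-based inputs, and the example KO labels are hypothetical, and the source does not describe how the authors computed these values.

```python
from typing import Dict, Set


def detection_metrics(
    truth_significant: Set[str],
    predicted_significant: Set[str],
    all_features: Set[str],
) -> Dict[str, float]:
    """Score predicted differential-abundance calls against the truth set."""
    # Restrict both call sets to the features that were actually tested.
    truth = truth_significant & all_features
    pred = predicted_significant & all_features

    tp = len(truth & pred)                 # significant in both
    fp = len(pred - truth)                 # significant only in the prediction
    fn = len(truth - pred)                 # significant only in the truth set
    tn = len(all_features - truth - pred)  # significant in neither

    nan = float("nan")
    tpr = tp / (tp + fn) if tp + fn else nan  # true positive rate (recall)
    tnr = tn / (tn + fp) if tn + fp else nan  # true negative rate
    fpr = fp / (fp + tn) if fp + tn else nan  # false positive rate
    precision = tp / (tp + fp) if tp + fp else nan

    return {
        "true_positive_rate": tpr,
        "false_positive_rate": fpr,
        "precision": precision,
        # Balanced accuracy is the mean of sensitivity and specificity.
        "balanced_accuracy": (tpr + tnr) / 2,
    }


if __name__ == "__main__":
    # Hypothetical KO identifiers, for illustration only.
    tested = {"K1", "K2", "K3", "K4", "K5", "K6"}
    print(detection_metrics({"K1", "K2"}, {"K1", "K2", "K3"}, tested))
```

Balanced accuracy is useful here because significant features are typically a small minority, so plain accuracy would be dominated by true negatives.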
Fig. 4. Updated references result in more sequences being considered in Piphillin analysis. Comparison of 16S rRNA gene amplicon sequences passing the identity threshold to the reference genomes based on database version for Piphillin results from (a) DADA2-corrected ASVs and (b) 97% de novo OTUs
Fig. 5. Piphillin executed with BioCyc vs KEGG reference on environmental samples. Spearman’s correlation coefficients against corresponding shotgun metagenomics results were compared for the hypersaline microbial mat dataset using either KEGG or BioCyc references. Spearman’s correlation coefficient was calculated for each sample and ranges are depicted as box and whisker plots as described in Fig. 1. * indicates p < 0.05, ** indicates p < 0.001, *** indicates p < 0.0001