Jun-Jie Zheng1, Po-Wen Wang1, Tzu-Wen Huang2, Yao-Jong Yang3, Hua-Sheng Chiu4, Pavel Sumazin4, Ting-Wen Chen1,5,6.
Abstract
MOTIVATION: Microbiota analyses have important implications for health and science. These analyses make use of 16S/18S rRNA gene sequencing to identify taxa and predict species diversity. However, most available tools for analyzing microbiota data require proficient programming skills and in-depth statistical knowledge for proper implementation. While long-read amplicon sequencing can lead to more accurate taxa predictions and is quickly becoming more common, practitioners have no easily accessible tools with which to perform their analyses.
Year: 2022 PMID: 35876544 PMCID: PMC9477538 DOI: 10.1093/bioinformatics/btac494
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.931
Fig. 1. The workflow of MOCHI. MOCHI comprises three analysis modules, which may be used either sequentially or independently. The first module, Sequence Preprocessing, accepts raw sequence data as input and conducts sequence quality checks, sequence denoising and taxonomy assignment. The output files from the first module are ASV tables, taxonomy tables and representative sequences. The second and third modules take ASV tables, taxonomy tables, representative sequences and sample metadata as input. The second module, Taxonomy Analysis, yields taxonomy tables, taxonomy plots, alpha diversity and beta diversity, and offers statistical tests. Users may identify samples with higher alpha diversity or determine taxa with significantly different abundance. The third module, Function Analysis, predicts potential functions for taxonomy classification results based on Functional Annotation of Prokaryotic Taxa (FAPROTAX), a function database. All the tables and figures generated by MOCHI on the webpage are interactive. For some analyses, MOCHI provides options for users to customize the resulting plots.
Fig. 2. User-interactive table and plots generated for taxonomy profiles with MOCHI. (a) A taxonomic table shows the taxonomic read counts and numbers of taxonomic levels. (b) A bar plot shows relative abundance for the union of the top five most abundant taxa identified in four body sites. (c) A heatmap shows log-transformed relative abundance. For the bar plot and heatmap, the user may regroup samples using the group information provided in the metadata. MOCHI also offers different taxonomy levels for exploring the taxonomy profiles; selecting the level of interest updates the plot on the fly. (d) A multilayered pie chart for exploring taxonomy composition in each sample. The pie chart is adapted from Krona.
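The quantities plotted in panels (b) and (c) can be illustrated with a minimal Python sketch. This is not MOCHI's code: the function names, the toy counts, the base-10 logarithm and the pseudocount are all assumptions made for illustration, since the caption does not specify how the log transformation is done.

```python
import math

def relative_abundance(counts):
    """Convert one sample's read counts {taxon: count} to proportions."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}

def top_taxa_union(samples, n=5):
    """Union of the n most abundant taxa taken from each sample or group."""
    selected = set()
    for counts in samples.values():
        ranked = sorted(counts, key=counts.get, reverse=True)
        selected.update(ranked[:n])
    return selected

def log_relative_abundance(counts, pseudocount=1e-6):
    """Log10 of relative abundance; the pseudocount (an assumption) avoids log(0)."""
    rel = relative_abundance(counts)
    return {taxon: math.log10(p + pseudocount) for taxon, p in rel.items()}

# Hypothetical toy data: two body sites with made-up read counts.
samples = {
    "gut": {"Bacteroides": 80, "Prevotella": 15, "Lactobacillus": 5},
    "tongue": {"Streptococcus": 60, "Prevotella": 30, "Bacteroides": 10},
}

if __name__ == "__main__":
    print(sorted(top_taxa_union(samples, n=2)))
    print(log_relative_abundance(samples["gut"]))
```

Taking the union across groups means a taxon that dominates only one body site still appears in every bar, which is why the plotted set can contain more than five taxa.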
Features of the datasets analyzed and computation time used in MOCHI
| Dataset | Sample size | Sequence type | Number of reads | Variable region | Taxonomy database | Computation time (SS) | Computation time (SD) | Computation time (TC) |
|---|---|---|---|---|---|---|---|---|
| Caporaso | 34 | Single-end | 263 878 | V4 | GREENGENES (16S rRNA) | 1.78 m | 1.9 m | 1.4 m |
| Suenami | 17 | Paired-end | 2 197 558 | V4 | SILVA (16S rRNA) | 2.33 m | 3.0 m | 2.3 h |
| Hernández | 65 | Paired-end | 14 474 241 | V3–V4 | SILVA (16S rRNA) | 16.2 m | 41.3 m | 3.3 h |
| Quijada | 10 | Long-read | 1 102 834 | V1–V9 | SILVA (16S rRNA, full-length) | 46.0 s | 35.9 m | 1.2 h |
Computation time for Sequence Summary, Sequence Denoising and Taxonomy Classification is tabulated in that order. The analyses were executed on a Linux server with eight CPUs (3.70 GHz) and 64 GB RAM.
Fig. 3. Boxplot and PCoA analysis for microbiota diversity in the Suenami dataset. Suenami compared the gut microbiota originating from two hornets, Vespa mandarinia and Vespa simillima, which are shortened to Vman and Vsim in the figures. (a) The boxplots show the alpha diversity for microbiota identified in the two groups. Four alpha diversity indexes (ACE, Shannon diversity, Faith’s PD and Shannon evenness) are shown as examples. MOCHI performed statistical tests on the alpha diversities between the two groups. The Kruskal–Wallis (KW) test results and P values are shown at the bottom. (b) The PCoA plot presents beta diversity based on Bray–Curtis distances for 17 samples. Samples from Vman and Vsim are shown in blue and red, respectively. MOCHI also revealed a significant difference in Bray–Curtis distance between these two groups using three statistical tests, PERMANOVA, ANOSIM and MRPP, for which the P values were 0.006, 0.002 and 0.003, respectively. (A color version of this figure appears in the online version of this article.)
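For readers unfamiliar with the indexes named in this caption, the following is a minimal sketch of the standard textbook definitions of Shannon diversity, Shannon (Pielou) evenness and Bray–Curtis dissimilarity. It is not MOCHI's implementation; the natural logarithm is an assumption (some tools report Shannon diversity in base 2), and the function names are hypothetical.

```python
import math

def shannon(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props)

def shannon_evenness(counts):
    """Pielou's evenness J = H / ln(S), where S is the number of observed taxa."""
    observed = sum(1 for c in counts if c > 0)
    if observed < 2:
        return 0.0  # evenness is undefined for fewer than two taxa; 0.0 by convention here
    return shannon(counts) / math.log(observed)

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    if len(u) != len(v):
        raise ValueError("both samples must list the same taxa in the same order")
    denom = sum(a + b for a, b in zip(u, v))
    if denom == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / denom

if __name__ == "__main__":
    sample_a = [10, 10, 0]   # hypothetical counts for three taxa
    sample_b = [0, 10, 10]
    print(shannon(sample_a), shannon_evenness(sample_a))
    print(bray_curtis(sample_a, sample_b))
```

Bray–Curtis ranges from 0 (identical composition) to 1 (no shared taxa); a matrix of such pairwise values is what an ordination such as PCoA takes as input.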
Fig. 4. Differential abundance analysis and function prediction results for the Quijada2020 dataset. Quijada2020 sampled microbiota at different time points during cheese ripening. (a) Using ANCOM, MOCHI identified Lactobacillus as the only taxon whose abundance differed significantly among days during cheese ripening. (b) Bar plot of one predicted function, fermentation, showing the relative abundance of taxa carrying genes involved in fermentation at Days 0, 14, 30, 90 and 160, with average relative abundances of 29%, 3%, 7%, 4% and 6%, respectively. Each error bar represents one standard deviation.
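The bar heights and error bars in panel (b) correspond to a per-group mean and one standard deviation. A minimal sketch of that summary follows; the values are invented toy numbers, not data from the Quijada2020 dataset, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the caption does not say which estimator was used.

```python
from statistics import mean, stdev

def summarize_by_group(values_by_group):
    """Return {group: (mean, standard deviation)} for relative abundances."""
    summary = {}
    for group, values in values_by_group.items():
        # stdev() needs at least two observations; use 0.0 for a single replicate.
        sd = stdev(values) if len(values) > 1 else 0.0
        summary[group] = (mean(values), sd)
    return summary

# Hypothetical relative abundances of fermentation-capable taxa per replicate.
toy = {
    "Day 0": [0.25, 0.35],
    "Day 14": [0.02, 0.04],
}

if __name__ == "__main__":
    for day, (avg, sd) in summarize_by_group(toy).items():
        print(f"{day}: mean={avg:.3f}, sd={sd:.3f}")
```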
Comparison of MOCHI with other GUI tools for microbiota analysis
| Tools | MOCHI | | | | | |
|---|---|---|---|---|---|---|
| Platform | Website, stand-alone | Website | Website | Website | Website | Website |
| Registration | No | Yes | No | No | Yes | Yes |
| Input data type | 16S rRNA, 18S rRNA | 16S rRNA, 18S rRNA | 16S rRNA | 16S rRNA | 16S rRNA, 18S rRNA | 16S rRNA, 18S rRNA |
| File format | Sequence/count table | Sequences | Count table | Count table | Sequences | Sequences |
| Full-length 16S rRNA | Supported | No | Not applicable | Not applicable | No | Supported (VAMPS2) |
| Taxonomy database | SILVA, GREENGENES, PR2 | SILVA, ITSoneDB, UNITE | No | No | SILVA, GREENGENES, RDP, ITS | SILVA |
| Rarefaction plot | Yes | No | Yes | Yes | Yes | No |
| Abundance heatmap | Yes | No | Yes | Yes | Yes | Yes |
| Alpha diversity | Multiple (7) | No | Multiple (6) | Multiple (8) | Shannon | Multiple (5) |
| Alpha diversity test | ANOVA/K-W test | No | ANOVA | ANOVA | No | No |
| Alpha diversity post hoc test | Tukey test/Dunn test | No | No | No | No | No |
| Beta diversity | Bray–Curtis, unweighted UniFrac, weighted UniFrac | No | Bray–Curtis, Jensen–Shannon divergence, Jaccard, unweighted UniFrac, weighted UniFrac | UniFrac, Bray–Curtis, Jaccard, Yue and Clayton, Chao, Binomial, Manhattan, Euclidean, Pearson’s cor, Spearman cor, Hamming | Bray–Curtis, Euclidean, Manhattan, maximum, Minkowski | Morisita-Horn |
| Distance heatmap | Yes | No | No | No | No | Yes |
| Dimension reduction (beta diversity) | PCA, PCoA, NMDS | No | PCoA, NMDS | PCA, PCoA, NMDS, CCA, RDA | PCoA | PCoA, NMDS |
| Beta diversity test | PERMANOVA, ANOSIM, MRPP | No | PERMANOVA, ANOSIM, PERMDISP | PERMANOVA, ANOSIM, PERMDISP | No | No |
| Post hoc test for beta diversity | Yes | No | No | No | No | No |
| Differentially abundant taxa identification | ANCOM | No | metagenomeSeq, edgeR, DESeq2 | ANCOM, DESeq2, ALDEx2 | No | No |
| Function prediction/annotation | FAPROTAX | KEGG, Pfam | PICRUSt, Tax4Fun | No | SEED, KEGG, COG, EggNOG | No |
MicrobiomeAnalyst and Calypso take a count table as input instead of raw sequences. VAMPS2 supports full-length 16S rRNA analysis.
The number in parentheses indicates how many alpha diversity indexes are provided.
The statistical tests among multiple groups for parametric and non-parametric data are ANOVA and the K-W test, respectively.
The post hoc tests for parametric and non-parametric data are the Tukey test and the Dunn test, respectively.