
MOCHI, a comprehensive cross-platform tool for amplicon-based microbiota analysis.

Jun-Jie Zheng¹, Po-Wen Wang¹, Tzu-Wen Huang², Yao-Jong Yang³, Hua-Sheng Chiu⁴, Pavel Sumazin⁴, Ting-Wen Chen¹,⁵,⁶.

Abstract

MOTIVATION: Microbiota analyses have important implications for health and science. These analyses make use of 16S/18S rRNA gene sequencing to identify taxa and predict species diversity. However, most available tools for analyzing microbiota data require adept programming skills and in-depth statistical knowledge for proper implementation. While long-read amplicon sequencing can lead to more accurate taxa predictions and is quickly becoming more common, practitioners have no easily accessible tools with which to perform their analyses.
RESULTS: We present MOCHI, a GUI tool for microbiota amplicon sequencing analysis. MOCHI preprocesses sequences, assigns taxonomy, identifies differentially abundant species and predicts species diversity and function. It takes as input either a taxonomic count table or FASTQ files of partial 16S/18S rRNA or full-length 16S rRNA gene sequences. It performs analyses in real time and visualizes data in both tabular and graphical formats.
AVAILABILITY: MOCHI can be installed to run locally or accessed as a web tool at https://mochi.life.nctu.edu.tw.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Year: 2022 PMID: 35876544 PMCID: PMC9477538 DOI: 10.1093/bioinformatics/btac494

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.931

1 Introduction

Over the past decades, researchers have revealed the critical role of microbiota in ecology, agriculture, fishery, medicine and health (Hacquard ; Yang ). In particular, numerous studies have shown associations between microbiota and human health, including obesity, infectious diseases and even mental health (Honda and Littman, 2012; Ley ; Valles-Colomer ). Dissecting the human microbiome hence provides a new perspective for investigating biological topics. Traditional approaches to bacterial species identification rely heavily on laboratory culturing, but the majority of bacteria are unculturable with present-day laboratory techniques (Rappé and Giovannoni, 2003; Stewart, 2012), and the bacterial profile is likely distorted by environmental stress in laboratory culture (Petti ). With the advent of next-generation sequencing (NGS), culture-independent sequencing-based microbiota analysis has become the foremost paradigm for microbiome analysis. Two sequencing methods, amplicon sequencing and metagenomic shotgun sequencing, are commonly employed for microbiome analysis. Metagenomic shotgun sequencing yields higher resolution of microbial taxonomy (Brumfield ), but it is relatively expensive and imposes a heavier computational workload for data processing. In contrast, amplicon sequencing, which is more cost-effective, provides higher coverage and demands a lower computational workload, is currently the most common method for microbiome analysis and the predominant method used in the Human Microbiome Project (HMP) (Huttenhower ; NIH Human Microbiome Portfolio Analysis Team, 2019). Traditionally, amplicon sequencing targets partial 16S/18S ribosomal RNA genes, which are sequenced on NGS platforms. Full-length microbial 16S rRNA gene sequences, by comparison, have the potential to classify taxa at the species and strain levels (Benítez-Páez ; Johnson ; Kumar ).
Recently, third-generation sequencing technology has been applied to generate full-length 16S rRNA gene sequences from microbiota, providing species-level resolution (Quijada ). In general, microbiota sequence analysis consists of several steps: sequence preprocessing, taxonomy classification, taxonomy diversity comparisons, differential abundance analysis and functional analysis. A pivotal step is taxonomy classification for representative sequences. Because of PCR-induced errors or misincorporation of nucleotides in sequencing, variants may be introduced randomly in the sequence data. Because it is difficult to distinguish between biological variants and technical errors, two strategies, operational taxonomic units (OTUs) and amplicon sequence variants (ASVs), have been developed to minimize the confusion. The principle of OTUs is to align and cluster the 16S rRNA gene amplicons with a defined threshold of sequence similarity, which is set to 97% in several proposed methods (Edgar, 2018). Among the published OTU clustering algorithms, QIIME (Caporaso ), MOTHUR (Schloss ) and UPARSE (Edgar, 2013) are most commonly used. In later algorithms, denoising is incorporated to eliminate artificial errors by constructing error rate models with statistical methods, and the noiseless variants after denoising are termed ASVs. Currently, there are three widely used denoising algorithms: DADA2 (Callahan ), Deblur (Amir ) and UNOISE3 (Edgar, 2016). Using ASVs to represent the taxonomic unit increases the possibility of detecting novel taxa and sharpens taxonomy classification to single-nucleotide resolution (Callahan , 2019). The representative sequences are then searched against a reference database such as SILVA (Quast ), Greengenes (DeSantis ) or PR2 (Guillou ) for taxonomy assignment. Even though no algorithm can perfectly picture the microbial structure in natural conditions, the denoising algorithms, i.e. ASVs, have become preferred to OTUs (Callahan ).
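The OTU principle described above (clustering at a fixed similarity threshold) can be illustrated with a minimal Python sketch. It is not the algorithm used by QIIME, MOTHUR or UPARSE: as a simplifying assumption it compares equal-length toy sequences position by position rather than aligning them, and the function names `identity` and `greedy_otu_clusters` are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_otu_clusters(seqs, threshold=0.97):
    """Assign each sequence to the first centroid it matches at >= threshold.

    Sequences are visited in input order; a sequence that matches no existing
    centroid becomes the centroid of a new cluster.
    """
    centroids = []  # representative sequence of each cluster
    clusters = []   # member lists, parallel to `centroids`
    for seq in seqs:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[i].append(seq)
                break
        else:
            centroids.append(seq)
            clusters.append([seq])
    return clusters


# Toy 100-nt reads (hypothetical data):
# `near` differs from `base` at 2 positions (98% identity),
# `far` differs from `base` at 10 positions (90% identity).
base = "ACGT" * 25
near = "TT" + base[2:]
far = "CATGCATGCA" + base[10:]

clusters = greedy_otu_clusters([base, near, far])
print(len(clusters))  # number of OTUs at the 97% threshold
```

With the 97% threshold, `near` joins the cluster of `base` while `far` founds its own cluster. Denoising methods such as DADA2 instead model sequencing errors, so two true variants that differ by a single nucleotide are not merged by construction.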
The next step following taxonomy classification is comparison of microbiota profiles under different conditions. Two biodiversity indexes, alpha diversity and beta diversity, are often used for comparison of microbiota biodiversities. Alpha diversity represents the diversity within a sample, and beta diversity represents the diversity between two samples (Whittaker, 1972). For example, Caporaso generated time series data from two individuals at four body sites: gut, mouth, left and right palms. From beta diversity analysis, they found dynamic changes in the microbiota community at the same body site over time but consistently distinct microbiota compositions between gut, oral and skin (left and right palm) sites. Alpha diversities of fecal samples also showed a rapid decrease in microbiota diversity after antibiotic therapy and a rapid return to similar diversity after the termination of antibiotic treatment. Characterizing the differences in microbial composition among different samples is also a major focus of microbiome analyses. A matrix of relative abundance is usually used for differential abundance analysis. However, several challenges arise when handling the microbiome relative abundance matrix. First, the matrix is usually sparse, meaning that almost 90% of features are zero (Paulson ), making it difficult to detect rare species. Second, the microbiome data are compositional (Chen and Li, 2016; Gloor and Reid, 2016; Gloor ; Xia ), which implies that fluctuations in one taxon change the relative abundance of other taxa even if their absolute quantities remain the same. Sparse and compositional data make standard statistical methods inapplicable to microbiota analyses (Weiss ). Several algorithms have been devised to deal with the problems that arise in microbiome analysis (Lin and Peddada, 2020; Morton ; Weiss ). Although these tools are powerful, they generally demand basic programming skills from the user.
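The compositional effect mentioned above can be made concrete with a small Python sketch. The taxon names and counts are hypothetical toy values: only `taxon_A` changes in absolute terms, yet the relative abundances of the other two taxa shift as well.

```python
def relative_abundance(counts):
    """Convert absolute counts to fractions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical absolute counts; only taxon_A changes between conditions.
before = {"taxon_A": 100, "taxon_B": 100, "taxon_C": 200}
after = {"taxon_A": 500, "taxon_B": 100, "taxon_C": 200}

rel_before = relative_abundance(before)
rel_after = relative_abundance(after)

for taxon in before:
    print(taxon, rel_before[taxon], rel_after[taxon])
```

Here `taxon_B` drops from 0.25 to 0.125 in relative terms although its absolute count is unchanged, which is why tests that treat each taxon's relative abundance as independent can mislead.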
Here we present a user-friendly tool, MOCHI (Microbiota amplicOn CHaracterization Implement), designed for microbiome analyses based on 16S and 18S rRNA gene sequencing. The framework of MOCHI is powered by the R package Shiny, and it is built on the Docker platform to achieve cross-platform compatibility. Users may decide whether to run the analysis on our web server or download a local stand-alone version to accelerate the process with multithreading on their own computational resources. Compared with other GUI microbiota analysis tools, MOCHI supports analysis from the raw data of partial and full-length ribosomal amplicon sequencing and provides the most comprehensive set of analysis functions. Overall, MOCHI encompasses a variety of popular microbiota-analysis-specific tools and enables the user to complete microbiome analysis all on one webpage.

2 Availability and implementation

2.1 Design

MOCHI is developed using the bioinformatic tool QIIME2 and the R language. Specifically, QIIME2 is used to process the sequence data, including denoising, taxonomy classification and calculation of phylogenetic diversity. R is used for the statistical analysis of biodiversity indexes. Most of the microbiota diversity indexes are calculated with the R package vegan (Dixon, 2003) except for Faith PD and UniFrac, which are calculated with QIIME2. The R package Shiny is used to construct the user interface and interactive plots. MOCHI was built on the Docker platform (Merkel, 2014). It is available as a web server (https://mochi.life.nctu.edu.tw/, with a sequence upload limit of up to 20 Mb per file). MOCHI is also downloadable as a stand-alone Docker image (https://hub.docker.com/repository/docker/dockerjjz/mochi_local) for use on a local computer running Linux, Windows or MacOS with at least 16 GB of RAM and 8 CPUs. The source code of MOCHI is available at https://github.com/v0369012/mochi_web_service.

2.2 Data analysis modules

MOCHI utilizes modules in QIIME2 (Bolyen ) and several microbiota R packages and presents the results with R Shiny. The analysis workflow of MOCHI is shown in Figure 1. MOCHI consists of three modules (Sequence Preprocessing, Taxonomy Analysis and Function Analysis), which may be used sequentially or independently. Sequence Preprocessing includes sequence quality check, sequence summary, sequence filtering/denoising and taxonomy assignment. In Taxonomy Analysis, microbiota statistical analyses are conducted, and the results are presented in different interactive plots, e.g. relative abundance bar plots, alpha diversity boxplots, beta diversity Principal Coordinates Analysis (PCoA) plots and ANCOM volcano plots. Function Analysis focuses on predicting metabolically or ecologically relevant functions. Each of the modules provides parameters that users may customize or adjust to generate interactive tables and charts that help them explore the dataset extensively (see below).

Fig. 1.

The workflow of MOCHI. MOCHI comprises three analysis modules, which may be used either sequentially or independently. The first module, Sequence Preprocessing, accepts raw sequence data as input and conducts sequence quality checks, sequence denoising and taxonomy assignment. The output files from the first module are ASVs tables, taxonomy tables and representative sequences. The second and third modules take ASVs tables, taxonomy tables, representative sequences and sample metadata as input. Taxonomy Analysis yields taxonomy tables, taxonomy plots and alpha and beta diversity indexes, and offers statistical tests. Users may identify samples having higher alpha diversity or determine taxa having significantly different abundance. The third module, Function Analysis, predicts potential functions for taxonomy classification results based on Functional Annotation of Prokaryotic Taxa (FAPROTAX), a function database. All the tables and figures generated by MOCHI on the webpage are interactive. For some analyses, MOCHI provides options for users to customize the resulting plots.

2.2.1 Sequence Preprocessing

Sequence Preprocessing includes three steps: Sequence Summary, Sequence Denoising and Taxonomy Classification. Sequence Summary summarizes sequencing quality and the number of reads for each sample from raw FASTQ files and presents the results in tables and figures (Supplementary Fig. S1a and b). Sequence Denoising takes the primer sequences as input and performs sequence quality filtering, merging of paired reads and chimera removal based on the DADA2 protocol (Callahan ). Users may customize the criteria for trimming and chimera removal. To ensure the sequencing depth is sufficiently high for a representative microbiota profile, a rarefaction curve is usually drawn to find a minimum library size (Gotelli and Colwell, 2001). Therefore, MOCHI also presents a rarefaction plot based on randomly sampled reads to show the relationship between the number of identified ASVs and the sequencing library size (Supplementary Fig. S1c). Taxonomy Classification assigns taxonomy to the denoised sequences based on the SILVA (Quast ), Greengenes (DeSantis ) or PR2 (Guillou ) reference sequence database. This final step generates an ASVs table, a taxonomic table and ASVs representative sequences, which serve as input files for the second module. In Sequence Preprocessing, all the parameters used, the computing time, the analysis date and other relevant information are placed under the ‘Log’ tab. MOCHI assigns a unique random number as the job ID for the sequences uploaded to the MOCHI website. Users may use the job ID to retrieve the results and parameters of their analyses.
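The rarefaction idea described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not MOCHI's implementation: the function name `rarefaction_curve`, the toy ASV counts and the chosen depths are all hypothetical, and reads are subsampled without replacement.

```python
import random


def rarefaction_curve(asv_counts, depths, seed=0):
    """Return (depth, observed ASVs) pairs from random subsampling.

    Each depth is drawn independently, without replacement, from the
    sample's reads.
    """
    rng = random.Random(seed)
    # Expand the count table into one ASV label per read.
    reads = [asv for asv, n in asv_counts.items() for _ in range(n)]
    curve = []
    for depth in depths:
        if depth > len(reads):
            break  # cannot draw more reads than the library holds
        subsample = rng.sample(reads, depth)
        curve.append((depth, len(set(subsample))))
    return curve


# Hypothetical sample with 1000 reads spread over five ASVs.
sample = {"ASV1": 500, "ASV2": 300, "ASV3": 150, "ASV4": 45, "ASV5": 5}
curve = rarefaction_curve(sample, depths=[10, 100, 500, 1000])
for depth, observed in curve:
    print(depth, observed)
```

When the number of observed ASVs stops growing as the depth increases, the library is usually considered deep enough; rare variants such as `ASV5` are the ones most likely to be missed at shallow depths.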

2.2.2 Taxonomy Analysis

Taxonomy Analysis takes the sample metadata, taxonomy table and ASVs table as inputs to produce visualizations and statistical analyses. MOCHI integrates the sample metadata and taxonomy table and presents taxonomy information in tables and figures. Samples may be grouped based on conditions specified in the uploaded metadata file. The single-end 16S rRNA gene sequencing dataset from Caporaso is used as an example here. The taxonomic table displays the read counts of each taxon in the sample (Fig. 2a). The taxonomic bar plot and heatmap show the relative abundance and the log-transformed relative abundance of taxa, respectively, in interactive form (Fig. 2b and c). For the heatmap, a small value of 0.01 is added before log-transformation to prevent taking logarithms of zero. Users may select the taxonomic level (kingdom, phylum, class, etc.) for display. Users may also show only the most abundant taxa by choosing a value N from the top N scroll bar; the union of the top N abundant taxa in each sample is then shown in the taxonomic bar plot. Additionally, MOCHI provides interactive multilayered pie charts for each sample generated with Krona (Fig. 2d) (Ondov ).
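The top-N union and the pseudocount log-transformation described above can be sketched as follows. The sample names, taxa and counts are hypothetical, and two details are assumptions because the text does not state them: the logarithm base (base 10 here) and whether 0.01 is added to a fraction or to a percentage (a fraction here).

```python
import math


def relative_abundance(counts):
    """Convert read counts to fractions of the sample total."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def top_n_union(samples, n):
    """Union of the n most abundant taxa taken from each sample."""
    selected = set()
    for counts in samples.values():
        ranked = sorted(counts, key=counts.get, reverse=True)
        selected.update(ranked[:n])
    return selected


def log_relative_abundance(counts, pseudocount=0.01):
    """log10 of (relative abundance + pseudocount); avoids log of zero."""
    rel = relative_abundance(counts)
    return {taxon: math.log10(v + pseudocount) for taxon, v in rel.items()}


# Hypothetical read counts for two samples.
samples = {
    "gut": {"Bacteroides": 80, "Prevotella": 15, "Streptococcus": 5, "Neisseria": 0},
    "mouth": {"Bacteroides": 2, "Prevotella": 10, "Streptococcus": 60, "Neisseria": 28},
}

print(sorted(top_n_union(samples, 1)))
print(log_relative_abundance(samples["gut"]))
```

Taking the union per sample means the plot can show more than N taxa in total: with N = 1 the two samples above contribute one taxon each, and both appear in every bar.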

Fig. 2.

User-interactive table and plots generated for taxonomy profiles with MOCHI. (a) A taxonomic table shows the taxonomic read counts and numbers of taxonomic levels. (b) A bar plot shows relative abundance for the union of the top five most abundant taxa identified in four body sites. (c) A heatmap shows log-transformed relative abundance. For the bar plot and heatmap, the user may regroup samples with the group information provided in the metadata. MOCHI also offers different taxonomy levels for users to explore the taxonomy profiles; selecting the level of interest updates the plot on the fly. (d) A multilayered pie chart for exploring taxonomy composition in each sample. The pie chart is adapted from Krona.

In Taxonomy Analysis, MOCHI also provides alpha/beta diversity indexes and comparisons between samples. MOCHI is equipped with seven alpha diversity indexes: Abundance-based Coverage Estimators (ACE) (Chao and Yang, 1993), Shannon diversity (Shannon and Weaver, 1964), Simpson diversity (Simpson, 1949), InvSimpson diversity (Hill, 1973), Shannon evenness (Keylock, 2005), Simpson evenness (Mulder ) and Faith’s phylogenetic diversity (Faith PD) (Faith, 1992). Once the desired diversity index is selected, MOCHI shows the distribution of the index across the samples in a grouped boxplot. Users may choose a sample grouping from the metadata file for MOCHI to determine whether the differences in alpha diversity between groups are statistically significant. Common parametric and nonparametric statistical methods, i.e. analysis of variance (ANOVA) and the Kruskal–Wallis test, and their corresponding post hoc tests, the Tukey and Dunn tests, are offered for group comparisons. Regarding beta diversity, MOCHI provides user-interactive heatmaps to show the Bray–Curtis dissimilarity matrix (Bray and Curtis, 1957) and UniFrac distances (Lozupone and Knight, 2005).
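Several of the indexes named above have short closed forms, sketched here in Python. The definitions are the common textbook ones (natural-log Shannon, Simpson as 1 minus the sum of squared proportions, evenness as Shannon divided by the log of observed richness); that MOCHI's output matches these exact conventions is an assumption, and the count vectors are hypothetical.

```python
import math


def proportions(counts):
    """Proportions of the nonzero entries of a count vector."""
    total = sum(counts)
    return [n / total for n in counts if n > 0]


def shannon(counts):
    """Shannon diversity H = -sum(p * ln p)."""
    return -sum(p * math.log(p) for p in proportions(counts))


def simpson(counts):
    """Simpson diversity 1 - sum(p^2)."""
    return 1.0 - sum(p * p for p in proportions(counts))


def inverse_simpson(counts):
    """Inverse Simpson diversity 1 / sum(p^2)."""
    return 1.0 / sum(p * p for p in proportions(counts))


def shannon_evenness(counts):
    """Shannon diversity divided by ln(observed richness)."""
    richness = len(proportions(counts))
    if richness < 2:
        return 0.0  # evenness is not informative for a single taxon
    return shannon(counts) / math.log(richness)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two equal-length count vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


# Hypothetical count vectors over the same four taxa.
even = [25, 25, 25, 25]
skewed = [97, 1, 1, 1]

print(shannon(even), shannon(skewed))
print(bray_curtis(even, skewed))
```

A perfectly even community maximizes Shannon diversity for its richness, whereas the skewed community scores much lower despite containing the same four taxa. Bray–Curtis ranges from 0 for identical count vectors to 1 for samples that share no taxa.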
Three widely used dimension-reduction methods, Principal Component Analysis (PCA), PCoA and nonmetric multidimensional scaling (NMDS), are implemented for visualization of the microbiota composition similarities. Notably, MOCHI provides both 2D and 3D plots for PCA and PCoA, with the top six PCs listed for users to select as axes. MOCHI also offers three methods for beta diversity comparison: PERMANOVA (Permutational Multivariate Analysis of Variance) (Anderson, 2005), ANOSIM (Analysis of similarities) (Clarke and Green, 1988) and MRPP (Multiple Response Permutation Procedure) (Mielke ). Each is followed by pairwise tests and Benjamini–Hochberg multiple test correction (Benjamini, 1995) to obtain corrected P values. In addition to diversity comparisons, users may also identify taxa with significantly different abundance with ANCOM, which detects differentially abundant taxa from microbial compositional data (Mandal ). Users may choose the taxonomic level of interest for this analysis.
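The Benjamini–Hochberg correction applied to the pairwise tests can be written compactly. The sketch below is a plain-Python version of the standard step-up procedure; the function name and the raw P values are hypothetical and are not taken from the MOCHI results.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values stay
    # monotone: each is the minimum of p * m / rank over equal or higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values from three pairwise comparisons.
raw = [0.01, 0.04, 0.03]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

For these inputs the adjusted values are approximately 0.03, 0.04 and 0.04: the smallest P value is multiplied by 3/1, and the middle one (0.03 × 3/2 = 0.045) is capped at the adjusted value of the largest.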

2.2.3 Function Analysis

Function Analysis predicts the metabolically or ecologically relevant functions of the microbiome based on the reference database FAPROTAX (Louca ). MOCHI presents the prediction results in a table and a user-interactive bar plot. The relative percentage of a function is calculated as the read count assigned to that function type divided by the total read count in the sample. As with the other analysis functions in MOCHI, the user may group the relative function abundances of the samples using the conditions defined in the uploaded metadata file.
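The percentage calculation described above can be sketched as follows. The taxon-to-function mapping is a hypothetical stand-in for a FAPROTAX lookup, and the taxa, function labels and counts are toy values.

```python
def function_percentages(taxon_counts, taxon_functions):
    """Percentage of a sample's reads assigned to each predicted function.

    A taxon's reads are credited to every function it maps to; taxa with
    no mapped function still count towards the sample total.
    """
    total = sum(taxon_counts.values())
    function_reads = {}
    for taxon, n in taxon_counts.items():
        for function in taxon_functions.get(taxon, ()):
            function_reads[function] = function_reads.get(function, 0) + n
    return {f: 100.0 * n / total for f, n in function_reads.items()}


# Hypothetical read counts and function annotations for one sample.
counts = {"taxon_A": 60, "taxon_B": 30, "taxon_C": 10}
annotations = {
    "taxon_A": ["fermentation"],
    "taxon_B": ["fermentation", "nitrate_reduction"],
    # taxon_C has no predicted function
}

percentages = function_percentages(counts, annotations)
print(percentages)
```

Because one taxon can carry several functions, the percentages need not sum to 100; in this toy sample fermentation accounts for 90% of reads and nitrate reduction for 30%.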

2.3 Input file formats

Users may start their analysis with the Sequence Preprocessing module by uploading sequence files and metadata for the samples. The demultiplexed sequence files (FASTQ files) must be provided in gzip-compressed format. The sample metadata must be provided in tab-separated values (TSV) format. Users who have already performed taxonomy classification may start with Taxonomy Analysis by uploading the metadata, the ASVs file and the taxonomy table. The latter two files are generated by sequence preprocessing and taxonomy assignment tools such as QIIME2. For Function Analysis, users may upload the metadata and taxonomy table files for functional prediction of the microbiota. For each analysis step, MOCHI provides demo files, which users may download to inspect the input file format.
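Before uploading, it can help to confirm that files match the two formats named above. The Python sketch below parses a TSV metadata table and counts records in gzip-compressed FASTQ data using only the standard library. The column names (`sample-id`, `body-site`) and the in-memory example data are assumptions for illustration; the demo files provided by MOCHI define the layout actually expected.

```python
import csv
import gzip
import io


def read_metadata_tsv(handle):
    """Parse tab-separated metadata into a list of row dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


def count_fastq_reads(gz_bytes):
    """Count records in gzip-compressed FASTQ data (4 lines per record)."""
    with gzip.open(io.BytesIO(gz_bytes), "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError("FASTQ line count is not a multiple of 4")
    return n_lines // 4


# Hypothetical metadata table with two samples.
metadata_text = "sample-id\tbody-site\nS1\tgut\nS2\tpalm\n"
rows = read_metadata_tsv(io.StringIO(metadata_text))

# Hypothetical FASTQ content with two reads, compressed in memory.
fastq_text = "@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n"
n_reads = count_fastq_reads(gzip.compress(fastq_text.encode()))

print(rows)
print(n_reads)
```

For files on disk, the same functions can be fed with `open(path, newline="")` for the metadata and the raw bytes of the `.fastq.gz` file for the read count.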

3 Results

3.1 Demonstration of MOCHI using public datasets

To demonstrate MOCHI, we conducted microbiome analysis of four public datasets (Caporaso ; Hernández ; Quijada ; Suenami ). Where available, the same denoising method, taxonomy database and parameters as those used in the original research articles were used for raw read processing and taxonomy assignment in MOCHI. The dataset information, taxonomy databases and computation times are summarized in Table 1. After identifying taxonomy from the four datasets, we compared the most abundant taxa in the results obtained from MOCHI with those of the original studies (Supplementary Tables S1–S4). The top abundant taxa in each dataset were consistent. Moreover, the numbers of identified ASVs differed by <3%.

Table 1.

Features of the datasets analyzed and computation time used in MOCHI

Dataset	Sample size	Sequence type	Number of reads	Variable region	Taxonomy database	SS time^a	SD time^a	TC time^a
Caporaso2011	34	Single-end	263 878	V4	GREENGENES (16S rRNA)	1.78 m	1.9 m	1.4 m
Suenami2019	17	Paired-end	2 197 558	V4	SILVA (16S rRNA)	2.33 m	3.0 m	2.3 h
Hernández2018	65	Paired-end	14 474 241	V3–V4	SILVA (16S rRNA)	16.2 m	41.3 m	3.3 h
Quijada2020	10	Long-read	1 102 834	V1–V9	SILVA (16S rRNA, full-length)	46.0 s	35.9 m	1.2 h

^a Computation time for Sequence Summary (SS), Sequence Denoising (SD) and Taxonomy Classification (TC) is tabulated in that order (s, seconds; m, minutes; h, hours). The analyses were executed on a Linux server with eight CPUs (3.70 GHz) and 64 GB RAM.

We further explored the alpha and beta diversity in the Suenami2019 dataset (Suenami ), which compares the gut microbiota from two hornets, Vespa mandarinia and Vespa simillima. The alpha diversity boxplots (Fig. 3a) showed that the Shannon diversities in the two species are significantly different. The PCoA plot of beta diversity evaluated by Bray–Curtis dissimilarity visualized the similarity and dissimilarity between the microbial compositions of the two species (Fig. 3b). Statistical tests by PERMANOVA, ANOSIM and MRPP show that the microbial composition in the two species differs significantly. These results are consistent with the findings of Suenami . They demonstrate that MOCHI can perform the same analyses and lead to the same conclusions without the need for advanced programming and statistical skills.

Fig. 3.

Boxplot and PCoA analysis for microbiota diversity in the Suenami2019 dataset. Suenami compared the gut microbiota originating from two hornets, Vespa mandarinia and Vespa simillima, which are shortened to Vman and Vsim in the figures. (a) The boxplots show the alpha diversity for microbiota identified in the two groups. Four different alpha diversity indexes (ACE, Shannon diversity, Faith’s PD and Shannon evenness) are shown as examples. MOCHI performed statistical tests on the alpha diversities between the two groups. The Kruskal–Wallis (KW) tests and P values are shown at the bottom. (b) The PCoA plot presents beta diversity based on Bray–Curtis distances for 17 samples. Samples from Vman and Vsim are labeled with blue and red, respectively. MOCHI also revealed a significant difference in Bray–Curtis distance between these two groups, using the three statistical tests PERMANOVA, ANOSIM and MRPP, for which the P values were 0.006, 0.002 and 0.003, respectively. (A color version of this figure appears in the online version of this article.)

In addition, MOCHI offers detection of differentially abundant taxa and prediction of functional profiles in microbiota. To demonstrate, we used the dataset Quijada2020, which includes full-length 16S rRNA gene amplicon sequences from microbiota sampled at different times during cheese ripening (Quijada ). ANCOM identified Lactobacillus as the only genus whose abundance differed significantly among samples collected at different times. This provides statistical evidence supporting the observation made by Quijada (Fig. 4a). Furthermore, MOCHI predicted at least one function for 72.5% of all the identified taxa. Figure 4b shows wide variations in relative abundance of fermentation-capable taxa on different days during cheese ripening. The high abundance at Day 0 (about 4 to 10 times as high as at Days 14, 30, 90 and 160) is presumably due to the starter cultures (Lactobacillus and Streptococcus) added at the beginning of the process.
Our results demonstrate that MOCHI provides not only basic amplicon microbiota exploration but also functional prediction and advanced statistical analysis.
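The fold difference quoted for Day 0 can be checked against the average relative abundances reported in the Fig. 4 caption (29%, 3%, 7%, 4% and 6% at Days 0, 14, 30, 90 and 160). The short Python sketch below only redoes that arithmetic; the variable names are arbitrary.

```python
# Average relative abundance (%) of fermentation-capable taxa per day,
# as reported in the Fig. 4 caption.
day_means = {0: 29, 14: 3, 30: 7, 90: 4, 160: 6}

# Ratio of the Day 0 average to the average on each later day.
fold_vs_day0 = {
    day: day_means[0] / value
    for day, value in day_means.items()
    if day != 0
}

for day, fold in fold_vs_day0.items():
    print(f"Day 0 vs Day {day}: {fold:.1f}x")
```

The ratios fall between roughly 4 and 10, in line with the range stated in the text.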

Fig. 4.

Differential abundance analysis and function prediction results for the Quijada2020 dataset. Quijada2020 sampled microbiota at different time points during cheese ripening. (a) Using ANCOM, MOCHI identified Lactobacillus as the only differentially abundant taxon among different days during cheese ripening. (b) Bar plot of one predicted function, fermentation, showing the relative abundance of fermentation-capable taxa on different days. The bar plot shows the abundance of taxa carrying genes involved in fermentation at Days 0, 14, 30, 90 and 160, with average relative abundances of 29%, 3%, 7%, 4% and 6%, respectively. Each error bar represents one standard deviation.

3.2 Comparison with existing tools

Most microbiome amplicon analysis tools are available in the form of a website (Chong ; Dhariwal ; Huse ; Keegan ; Mitchell ; Zakrzewski ). MOCHI, in addition to being available as a website, is also provided as a stand-alone GUI tool (Table 2). Stand-alone MOCHI allows users to process data locally, which avoids restrictions imposed by network communication and concerns about data breaches. Additionally, given the time and computational power needed for processing raw sequencing data, all existing tools capable of dealing with raw data, i.e. MGnify, MG-RAST and VAMPS, require registration. The web server version of MOCHI provides a platform for quick exploration of small datasets without registration. For large datasets, the stand-alone version allows users to investigate their data without waiting in a queue. Moreover, MOCHI is the only web-based tool known that can handle long-read, full-length 16S rRNA gene sequences produced by third-generation sequencing, a data type that has gained increasing popularity in microbiota studies (Benítez-Páez ; Kumar ).

Table 2.

Comparison of MOCHI with other GUI tools for microbiota analysis

Tools	MOCHI	MGnify (2020)	MicrobiomeAnalyst (2017, 2020)	Calypso (2017)	MG-RAST (2016)	VAMPS (2014)
Platform	Website, stand-alone	Website	Website	Website	Website	Website
Registration	No	Yes	No	No	Yes	Yes
Input data type	16S rRNA, 18S rRNA	16S rRNA, 18S rRNA	16S rRNA	16S rRNA	16S rRNA, 18S rRNA	16S rRNA, 18S rRNA
File format	Sequence/count table	Sequences	Count table	Count table	Sequences	Sequences
Full-length 16S rRNA^a	Supported	No	Not applicable	Not applicable	No	Supported (VAMPS2)
Taxonomy database	SILVA, GREENGENES, PR2	SILVA, ITSoneDB, UNITE	No	No	SILVA, GREENGENES, RDP, ITS	SILVA
Rarefaction plot	Yes	No	Yes	Yes	Yes	No
Abundance heatmap	Yes	No	Yes	Yes	Yes	Yes
Alpha diversity^b	Multiple (7)	No	Multiple (6)	Multiple (8)	Shannon	Multiple (5)
Alpha diversity test^c	ANOVA/K-W test	No	ANOVA	ANOVA	No	No
Post hoc test for alpha diversity^d	Tukey test/Dunn test	No	No	No	No	No
Beta diversity	Bray–Curtis, unweighted UniFrac, weighted UniFrac	No	Bray–Curtis, Jensen–Shannon divergence, Jaccard, unweighted UniFrac, weighted UniFrac	UniFrac, Bray–Curtis, Jaccard, Yue and Clayton, Chao, Binomial, Manhattan, Euclidean, Pearson’s cor, Spearman cor, Hamming	Bray–Curtis, Euclidean, Manhattan, maximum, Minkowski	Morisita-Horn
Distance heatmap	Yes	No	No	No	No	Yes
Dimension reduction (beta diversity)	PCA, PCoA, NMDS	No	PCoA, NMDS	PCA, PCoA, NMDS, CCA, RDA	PCoA	PCoA, NMDS
Beta diversity test	PERMANOVA, ANOSIM, MRPP	No	PERMANOVA, ANOSIM, PERMDISP	PERMANOVA, ANOSIM, PERMDISP	No	No
Post hoc test for beta diversity	Yes	No	No	No	No	No
Differential abundant taxa identification	ANCOM	No	metagenomeSeq, edgeR, DESeq2	ANCOM, DESeq2, ALDEx2	No	No
Function prediction/annotation	FAPROTAX	KEGG, Pfam	PICRUSt, Tax4Fun	No	SEED, KEGG, COG, EggNOG	No

^a MicrobiomeAnalyst and Calypso take count tables as input instead of raw sequences. VAMPS2 supports full-length 16S rRNA analysis.

^b The number within parentheses indicates how many alpha diversity indexes are provided.

^c The statistical tests between multiple groups for parametric and nonparametric data are ANOVA and the K-W test, respectively.

^d The post hoc tests for parametric and nonparametric data are the Tukey test and the Dunn test, respectively.

MOCHI simplifies the procedures of sequence preprocessing and estimates default parameters so that users can conduct the analysis without prerequisite knowledge. MOCHI also provides the most comprehensive biodiversity indexes and statistical methods, substantially more than MGnify, MG-RAST and VAMPS (Table 2). While MicrobiomeAnalyst and Calypso provide ANOVA for comparing alpha diversities between groups, they do not offer post hoc tests for pairwise comparisons. MG-RAST provides one alpha diversity index, Shannon, but no statistical comparisons. In contrast, MOCHI provides seven alpha diversity indexes and offers both the ANOVA/K-W test and post hoc tests to compare alpha diversity indexes between groups. Similarly, even though MicrobiomeAnalyst, Calypso and MOCHI provide PERMANOVA, ANOSIM and MRPP/PERMDISP to compare beta diversity, only MOCHI presents pairwise tests for beta diversity. For detection of differentially abundant taxa, MOCHI provides ANCOM, which was specifically developed for sparse and compositional microbiota data. ANCOM utilizes inter-taxa ratios to identify differentially abundant taxa from compositional tables (Mandal ). Among the existing tools, MicrobiomeAnalyst and Calypso identify differentially abundant taxa with metagenomeSeq (Paulson ), edgeR (Robinson ), DESeq2 (Love ), ANCOM and ALDEx2 (Fernandes ).
Among these, metagenomeSeq and ANCOM are designed for sparse high-throughput sequencing data while edgeR, DESeq2 and ALDEx2 were originally developed for RNA-seq data. Among the existing tools, MGnify, MicrobiomeAnalyst and MG-RAST also provide function prediction/annotation, like MOCHI, but they are based on different databases: KEGG (Kanehisa ), Pfam (Finn ), SEED (Overbeek ), COG (Galperin ), EggNOG (Huerta-Cepas ; Jensen ), PICRUSt (Langille ) and Tax4Fun (Aßhauer ). Among these, PICRUSt and Tax4Fun are specifically developed for function prediction of microbiota.
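The reason inter-taxa ratios suit compositional data, as noted for ANCOM above, is that a ratio between two taxa does not depend on the sample total. The Python sketch below illustrates only this invariance; it is not an implementation of ANCOM, it assumes strictly positive counts (zeros would need separate handling), and the taxon names and counts are hypothetical.

```python
import math


def pairwise_log_ratios(values):
    """Natural-log ratio for every unordered pair of taxa."""
    taxa = sorted(values)
    return {
        (a, b): math.log(values[a] / values[b])
        for i, a in enumerate(taxa)
        for b in taxa[i + 1:]
    }


# Hypothetical absolute counts and the relative abundances derived from them.
absolute = {"taxon_A": 400, "taxon_B": 100, "taxon_C": 50}
total = sum(absolute.values())
relative = {taxon: n / total for taxon, n in absolute.items()}

ratios_absolute = pairwise_log_ratios(absolute)
ratios_relative = pairwise_log_ratios(relative)

for pair in ratios_absolute:
    print(pair, round(ratios_absolute[pair], 4), round(ratios_relative[pair], 4))
```

The log-ratios computed from relative abundances match those from absolute counts, so a test built on ratios is unaffected by the unknown scaling between the two.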

4 Discussions and conclusions

MOCHI is a microbiota amplicon analysis platform equipped with comprehensive analytical and statistical tools for data processing and presentation. It may be used as a web service or run locally as a secure and efficient stand-alone tool. The three modules in MOCHI may be used independently, and either raw sequences or processed count tables may be used as input. In the future, additional approaches for differential abundance analysis and a demultiplexing module will be considered for incorporation into MOCHI. In summary, MOCHI offers a comprehensive analytical pipeline from raw sequences to statistical visualization.

59 in total

1. Growing unculturable bacteria.

Authors: Eric J Stewart
Journal: J Bacteriol Date: 2012-06-01 Impact factor: 3.490

2. A two-part mixed-effects model for analyzing longitudinal microbiome compositional data.

Authors: Eric Z Chen; Hongzhe Li
Journal: Bioinformatics Date: 2016-05-14 Impact factor: 6.937

Review 3. Microbial genome analysis: the COG approach.

Authors: Michael Y Galperin; David M Kristensen; Kira S Makarova; Yuri I Wolf; Eugene V Koonin
Journal: Brief Bioinform Date: 2019-07-19 Impact factor: 11.622

4. The neuroactive potential of the human gut microbiota in quality of life and depression.

Authors: Sara Vieira-Silva; Jeroen Raes; Mireia Valles-Colomer; Gwen Falony; Youssef Darzi; Ettje F Tigchelaar; Jun Wang; Raul Y Tito; Carmen Schiweck; Alexander Kurilshikov; Marie Joossens; Cisca Wijmenga; Stephan Claes; Lukas Van Oudenhove; Alexandra Zhernakova
Journal: Nat Microbiol Date: 2019-02-04 Impact factor: 17.745

Review 5. The microbiome in infectious disease and inflammation.

Authors: Kenya Honda; Dan R Littman
Journal: Annu Rev Immunol Date: 2012-01-06 Impact factor: 28.527

6. Moving pictures of the human microbiome.

Authors: J Gregory Caporaso; Christian L Lauber; Elizabeth K Costello; Donna Berg-Lyons; Antonio Gonzalez; Jesse Stombaugh; Dan Knights; Pawel Gajer; Jacques Ravel; Noah Fierer; Jeffrey I Gordon; Rob Knight
Journal: Genome Biol Date: 2011 Impact factor: 13.583