Yong-Xin Liu, Yuan Qin, Tong Chen, Meiping Lu, Xubo Qian, Xiaoxuan Guo, Yang Bai.
Abstract
Advances in high-throughput sequencing (HTS) have fostered rapid developments in the field of microbiome research, and massive microbiome datasets are now being generated. However, the diversity of software tools and the complexity of analysis pipelines make it difficult to access this field. Here, we systematically summarize the advantages and limitations of microbiome methods. Then, we recommend specific pipelines for amplicon and metagenomic analyses, and describe commonly-used software and databases, to help researchers select the appropriate tools. Furthermore, we introduce statistical and visualization methods suitable for microbiome analysis, including alpha- and beta-diversity, taxonomic composition, difference comparisons, correlation, networks, machine learning, evolution, source tracing, and common visualization styles to help researchers make informed choices. Finally, a step-by-step reproducible analysis guide is introduced. We hope this review will allow researchers to carry out data analysis more effectively and to quickly select the appropriate tools in order to efficiently mine the biological significance behind the data.
Keywords: high-throughput sequencing; marker genes; metagenome; pipeline; reproducible analysis; visualization
Year: 2020 PMID: 32394199 PMCID: PMC8106563 DOI: 10.1007/s13238-020-00724-8
Source DB: PubMed Journal: Protein Cell ISSN: 1674-800X Impact factor: 14.870
Figure 1. Advantages and limitations of HTS methods used in microbiome research. (A) Introduction to HTS methods for different levels of analysis. At the molecular level, microbiome studies are divided into three types: microbe, DNA, and mRNA. The corresponding research techniques include culturome, amplicon, metagenome, metavirome, and metatranscriptome analyses. (B) The advantages and limitations of various HTS methods for microbiome analysis
Figure 2. Workflow of commonly used methods for amplicon (A) and metagenomic (B) sequencing. Blue, orange, and green blocks represent input, intermediate, and output files, respectively. The text next to each arrow represents the method, with frequently used software shown in parentheses. Taxonomic and functional tables are collectively referred to as feature tables. Please see Table 1 for more information about the software listed in this figure
Table 1. Introduction to software for amplicon and metagenomic analysis
| Name | Link | Description and advantages | Reference |
|---|---|---|---|
| QIIME | | The most highly cited and comprehensive amplicon analysis pipeline, providing hundreds of scripts for analyzing and visualizing various data types | (Caporaso et al., |
| QIIME 2 | | This next-generation amplicon pipeline provides integrated command-line and GUI interfaces, and supports reproducible analysis and big data. Provides interactive visualization, plus Chinese tutorial documents and videos | (Bolyen et al., |
| USEARCH | | Alignment tool with more than 200 subcommands for amplicon analysis. Small (1 Mb), cross-platform, and fast, with a free 32-bit version. The 64-bit version is commercial ($1485) | (Edgar, |
| VSEARCH | | A free USEARCH-like software tool. We recommend using it alone or in addition to USEARCH. Available as a plugin in QIIME 2 | (Rognes et al., |
| Trimmomatic | | Java-based software for quality control of metagenomic raw reads | (Bolger et al., |
| Bowtie 2 | | Rapid alignment tool used to remove host contamination or for quantification | (Langmead and Salzberg, |
| MetaPhlAn2 | | Taxonomic profiling tool with a marker gene database from more than 10,000 species. The output is relative abundance of strains | (Truong et al., |
| Kraken 2 | | A taxonomic classification tool that uses exact | (Wood et al., |
| HUMAnN2 | | Based on the UniRef protein database, calculates gene family abundance, pathway coverage, and pathway abundance from metagenomic or metatranscriptomic data. Provides species' contributions to a specific function | (Franzosa et al., |
| MEGAN | | A cross-platform GUI software for taxonomic and functional analysis of metagenomic data. Supports many types of visualizations with metadata, including scatter plots, word clouds, Voronoi tree maps, clustering, and networks | (Huson et al., |
| MEGAHIT | | Ultra-fast and memory-efficient metagenomic assembler | (Li et al., |
| metaSPAdes | | High-quality metagenomic assembler, but time-consuming and memory-intensive | (Nurk et al., |
| MetaQUAST | | Evaluates the quality of metagenomic assemblies, including N50 and misassemblies, and outputs PDF and interactive HTML reports | (Mikheenko et al., |
| MetaGeneMark | | Gene prediction in bacteria, archaea, metagenomes, and metatranscriptomes. Supports Linux and macOS. Provides a webserver for online analysis | (Zhu et al., |
| Prokka | | Provides rapid prokaryotic genome annotation, calls metaProdigal (Hyatt et al., | (Seemann, |
| CD-HIT | | Used to construct non-redundant gene catalogs | (Fu et al., |
| Salmon | | Provides ultra-fast quantification of read counts of genes using a | (Patro et al., |
| metaWRAP | | Binning pipeline that includes 140 tools and supports conda installation; default binning by MetaBAT, MaxBin, and CONCOCT. Provides refinement, quantification, taxonomic classification, and visualization of bins | (Uritskiy et al., |
| DAS Tool | | Binning pipeline that integrates five binning software packages and performs refinement | (Sieber et al., |
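The table lists N50 among the assembly quality metrics reported by MetaQUAST. As a rough illustration of what that number means, the sketch below computes N50 from a list of contig lengths in plain Python. It is not MetaQUAST's implementation; the function name `n50` and the example lengths are assumptions made for illustration only.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L of the contig at which, walking from the longest
    contig downwards, the cumulative length first reaches at least half
    of the total assembled bases.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (bp), for illustration only.
example_lengths = [100, 200, 300, 400, 500]
example_n50 = n50(example_lengths)
```

For the hypothetical lengths above the total is 1,500 bp; the two longest contigs (500 + 400) are the first to cover at least 750 bp, so the N50 is 400.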
Figure 3. Overview of statistical and visualization methods for feature tables. Downstream analysis of microbiome feature tables, including alpha/beta-diversity (A/B), taxonomic composition (C), difference comparison (D), correlation analysis (E), network analysis (F), machine learning classification (G), and phylogenetic tree (H). Please see Table 2 for more details
Table 2. Introduction to various analysis and visualization methods
| Method | Scientific question | Visualization | Description and example reference |
|---|---|---|---|
| Alpha diversity | Within-sample diversity | Boxplot | Distribution (Edwards et al., |
| | | Rarefaction curve | Sample diversity changes with sequencing depth or evaluation of sequencing saturation (Beckers et al., |
| | | Venn diagram | Common or unique taxa (Ren et al., |
| Beta diversity | Distance among samples or groups | Unconstrained PCoA scatter plot | Major differences of samples showing group differences (Fig. |
| | | Constrained PCoA scatter plot | Major differences among groups (Zgadzaj et al., |
| | | Dendrogram | Hierarchical clustering of samples (Chen et al., |
| Taxonomic composition | Relative abundance of features | Stacked bar plot | Taxonomic composition of each sample (Beckers et al., |
| | | Flow or alluvial diagram | Relative abundance (RA) of taxonomic changes among seasons (Smits et al., |
| | | Sankey diagram | A variety of Venn diagrams showing changes in RA and common or unique features among groups (Smits et al., |
| Difference comparison | Significantly different biomarkers between groups | Volcano plot | A variety of scatter plots showing |
| | | Manhattan plot | A variety of scatter plots showing |
| | | Extended bar plot | Bar plot of RA combined with difference and confidence intervals (Parks et al., |
| Correlation analysis | Correlation between features and sample metadata | Scatter plot with linear fitting | Shows changes in features with time (Metcalf et al., |
| | | Corrplot | Correlation coefficient or distance triangular matrix visualized by color and/or shape (Zhang et al., |
| | | Heatmap | RA of features that change with time (Subramanian et al., |
| Network analysis | Global view of feature correlations | Colored based on taxonomy or modules | Finding correlation patterns of features based on taxonomy (Fig. |
| | | Colors highlight important features | Highlighting important features and showing their positions and connections (Wang et al., |
| Machine learning | Group classification, or regression to predict numeric metadata | Heatmap | Colored block showing classification results (Fig. |
| | | Bar plot | Feature importance, RA (Zhang et al., |
| Treemap | Phylogenetic tree or taxonomy hierarchy | Phylogenetic tree or cladogram | Phylogenetic tree (Fig. |
| | | Circular tree map | Shows features in a hierarchy color bubble (Carrión et al., |
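To make the first two rows of the table concrete, the following minimal sketch computes two widely used measures from the columns of a feature table: the Shannon index as an example of within-sample (alpha) diversity, and Bray-Curtis dissimilarity as an example of between-sample (beta) distance. The source does not specify which indices to use, so these choices, the function names, and the toy counts are all assumptions for illustration; dedicated packages would normally be used in practice.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over features with p_i > 0."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no counts")
    proportions = (c / total for c in counts if c > 0)
    return -sum(p * math.log(p) for p in proportions)


def observed_richness(counts):
    """Number of features detected (count > 0) in one sample."""
    return sum(1 for c in counts if c > 0)


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity: sum(|a_i - b_i|) / sum(a_i + b_i).

    Returns 0.0 for identical samples and 1.0 for samples sharing no features.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("samples must cover the same features")
    denominator = sum(counts_a) + sum(counts_b)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(counts_a, counts_b)) / denominator


# Hypothetical feature counts for two samples (same feature order in both).
sample_1 = [6, 4, 0]
sample_2 = [4, 6, 0]
alpha_1 = shannon_index(sample_1)
beta_12 = bray_curtis(sample_1, sample_2)
```

The per-sample alpha values are what a boxplot would summarize by group, while the pairwise beta distances form the matrix that a PCoA plot or dendrogram takes as input. In practice, samples are usually rarefied or otherwise normalized to comparable depth before these values are computed.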
Table 3. Useful websites or tools for reproducible analysis
| Resource | Links | Description |
|---|---|---|
| GSA | | HTS data deposition and sharing. Fast data transfer, interfaces in both Chinese and English, automated submission, technical support via email or QQ group, and widely recognized by international journals |
| Qiita | | Platform for amplicon data deposition, analysis, and cross-study comparisons |
| MGnify | | Webserver for amplicon and metagenomic data deposition, sharing, analysis, and cross-study comparisons |
| gcMeta | | Webserver for amplicon and metagenomic data analysis, deposition, and sharing |
| R Markdown | | Uses a productive notebook interface to weave together narrative text and code to produce an elegantly formatted report in HTML or PDF format. It is becoming increasingly popular in microbiome research |
| R Graph Gallery | | R code for 42 chart types |
| GitHub | | Online platform for storing and sharing code, with version control. Supports searching |