Bei Gao, Liang Chi, Yixin Zhu, Xiaochun Shi, Pengcheng Tu, Bing Li, Jun Yin, Nan Gao, Weishou Shen, Bernd Schnabl.
Abstract
The gut microbiome is a microbial ecosystem that expresses 100 times more genes than the human host and plays an essential role in human health and disease pathogenesis. Since most intestinal microbial species are difficult to culture, next-generation sequencing technologies have been widely applied to study the gut microbiome, including 16S rRNA, 18S rRNA, and internal transcribed spacer (ITS) sequencing, shotgun metagenomic sequencing, metatranscriptomic sequencing, and viromic sequencing. Various software tools have been developed to analyze the different types of sequencing data. In this review, we summarize commonly used computational tools for gut microbiome data analysis, which have extended our understanding of the gut microbiome in health and disease.
Keywords: fungi; gut microbiota; virus
Year: 2021 PMID: 33918473 PMCID: PMC8066849 DOI: 10.3390/biom11040530
Source DB: PubMed Journal: Biomolecules ISSN: 2218-273X
Figure 1. Commonly used sequencing techniques for gut microbiome studies.
Figure 2. 16S rRNA sequencing data analysis pipeline.
Figure 3. Shotgun metagenomic sequencing data analysis pipeline.
Examples of widely used tools for next-generation sequencing data analysis in gut microbiome studies.
| Software | Short Description | Ref. |
|---|---|---|
| 16S rRNA, 18S rRNA and ITS sequencing data analysis | ||
| UCLUST/ | UCLUST is an OTU-based clustering method that employs USEARCH. UPARSE is a subroutine of USEARCH that constructs OTUs de novo from next-generation sequencing reads. The UPARSE pipeline filters and trims reads, then performs clustering and chimera filtering simultaneously. | [ |
| CD-HIT | CD-HIT is one of the most widely used OTU-based clustering tools. It reduces sequence redundancy and improves the performance of other analyses. | [ |
| Hc-OTU | Hc-OTU is an OTU-based clustering method for 16S rRNA sequences that employs homopolymer compaction and k-mer profiling. | [ |
| ESPRIT | ESPRIT is an OTU-based hierarchical clustering method consisting of quality filtering, pairwise distance computation, hierarchical clustering, and estimation with statistical inference. There are two versions of ESPRIT, one for personal computers (small/medium-size data) and one for computer clusters (large-size data). | [ |
| ESPRIT-Tree | ESPRIT-Tree is an OTU-based, online-learning-based hierarchical clustering method. ESPRIT-Tree improves on the previous ESPRIT algorithm and uses a pseudometric-based partition tree. | [ |
| DADA2 | DADA2 is an ASV-based analysis pipeline for modeling and error-correcting Illumina sequence reads. | [ |
| UNOISE2 | UNOISE2 is an ASV-based tool for denoising (error-correcting) Illumina sequence reads. It is an improved version of UNOISE and clusters the unique reads. | [ |
| Deblur | Deblur is an ASV-based denoising tool, which uses error profiles to obtain putative error-free sequences. It operates independently on each sample. | [ |
| QIIME/ | QIIME and QIIME 2 are bioinformatics platforms for microbial community analysis and visualization. QIIME 2 was engineered based on QIIME and has replaced it. QIIME 2 uses existing bioinformatics tools, such as DADA2 and Deblur, as subroutines. | [ |
| Mothur | Mothur is a software package that analyzes raw sequences and generates visualizations describing α and β diversity. It combines multiple analytic tools for describing and comparing microbial communities. It provides examples for data acquired from different sequencing platforms. | [ |
| PICRUSt/ | PICRUSt is a software tool for predicting functional composition based solely on marker gene sequence profiles. PICRUSt2 is an improved version of PICRUSt with a larger reference database, enhanced prediction ability, and more accurate de novo amplicon tree-building. | [ |
| Tax4Fun/ | Tax4Fun is an R package for predicting functional profiles from 16S rRNA data on the basis of SILVA-labeled OTU abundances. Tax4Fun2 is an improved version of Tax4Fun with more accurate predictions. | [ |
| Piphillin | Piphillin is a web application that produces metagenome predictions based on nearest-neighbor mappings of 16S rRNA sequences to genomes. | [ |
| Vikodak/ | Vikodak is a web service that provides functional prediction on 16S rRNA data. It contains 3 modules: Global Mapper, Inter Sample Feature Analyzer, and Local Mapper. With these 3 modules, it is able to perform functional prediction both globally and in detail and perform pair-wise comparative statistical analysis. iVikodak is an improved version of Vikodak. | [ |
| SSU-ALIGN | SSU-ALIGN is designed primarily to align 16S and 18S small subunit ribosomal RNA, but can also be used for large subunit ribosomal RNA alignment. | [ |
| LotuS2 | LotuS2 is a software pipeline for 16S/18S/ITS rRNA analysis. It is able to calculate denoised, chimera-checked OTUs and construct an OTU phylogenetic tree. | [ |
| MICCA | MICCA is a command-line software tool for processing 16S rRNA gene and ITS amplicon sequencing data, from raw sequences to OTU tables, taxonomic classification, and phylogenetic tree inference. | [ |
| PEMA | PEMA is a software pipeline for metabarcoding analysis based on third-party tools. Its functions include read pre-processing, OTU clustering, ASV inference, taxonomy assignment, and COI marker gene analysis. | [ |
| ITScan | ITScan is an online pipeline for fungal diversity analysis and identification based on ITS sequences. | [ |
| ITSx | ITSx is a software tool for detecting and extracting the ITS1 and ITS2 subregions from ITS sequences of fungi and other eukaryotes. It relies on HMMER for profile hidden Markov model analysis. | [ |
| ITSxpress | ITSxpress is a software tool for trimming the ITS1, ITS2, or entire ITS region. It uses HMMER and BBMerge. It is designed to support the calling of exact sequence variants rather than OTUs. | [ |
| Mycofier | Mycofier is a machine-learning-based fungal ITS1 sequence classifier at the genus level. The final model was based on ITS1 sequences from 510 fungal genera using a Naïve Bayes algorithm. | [ |
| Shotgun metagenomic and metatranscriptomic sequencing data analysis | ||
| Trimmomatic | Trimmomatic is a sequence trimmer for Illumina sequence data. It has multiple processing steps, including detection and removal of adapters and other Illumina-specific sequences, and quality filtering. | [ |
| Ktrim | Ktrim provides both adapter- and quality-trimming of the sequencing data. | [ |
| Cutadapt | Cutadapt is a sequence trimmer which removes adapter sequences, primers and other types of unwanted sequence from high-throughput sequencing reads. | [ |
| MultiQC | MultiQC creates a summary report visualizing output from different tools across multiple samples, facilitating the identification of global trends and biases. | [ |
| Bowtie2 | Bowtie2 is a software tool for aligning sequencing reads to a reference genome. It supports gapped, local, and paired-end alignments. The software combines a full-text minute index (FM-index) with SIMD dynamic programming. | [ |
| DIAMOND | DIAMOND is a sequence aligner for protein and translated DNA searches. It aims to determine all significant alignments for a given input. DIAMOND uses double indexing and spaced seeds. | [ |
| BBMap | BBMap is a sequence aligner that can align DNA and RNA sequencing reads from multiple platforms, including Illumina, 454, Sanger, Ion Torrent, PacBio, and Nanopore. BBMap needs to index a reference before mapping to it. | [ |
| Meta-IDBA | Meta-IDBA is a de novo metagenomic assembler. It first constructs a de Bruijn graph and then divides the graph into connected components. | [ |
| IDBA-UD | IDBA-UD is a de novo single-cell and metagenomic assembler that can assemble sequences with highly uneven depth. It is based on the de Bruijn graph approach. | [ |
| MetaVelvet | MetaVelvet is a de novo short-read metagenome assembler. It extends the Velvet assembler (a single-genome, de Bruijn graph-based assembler) to overcome the limitations of single-genome assemblers. | [ |
| MegaHit | MegaHit is a de novo assembler for metagenomics data. It implements succinct de Bruijn graphs. | [ |
| MetaQUAST | MetaQUAST evaluates and compares the quality of metagenome assemblies. It builds on QUAST. Its metagenome-specific features include support for an unlimited number of reference genomes, species content detection, chimera detection, and visualizations. | [ |
| MEGAN | MEGAN is a BLAST-based automated pipeline for taxonomic and functional analysis of metagenomic and metatranscriptomic datasets. | [ |
| MetaPhlAn/ | MetaPhlAn is an automated pipeline that profiles microbial composition from shotgun metagenomic data at the species level. The microbial communities it can profile include bacteria, archaea, eukaryotes, and viruses. It accomplishes profiling with unique clade-specific marker genes. MetaPhlAn2 extends the first version with enhanced metagenomic taxonomic profiling ability. | [ |
| HUMAnN2 | HUMAnN2 is an automated pipeline designed for functional analysis of metagenomic and metatranscriptomic data at the species level. The general HUMAnN2 workflow is identification of known species, alignment of reads to pangenomes, translated search on unclassified reads, and quantification of gene families and pathways. HUMAnN2 uses other pipelines, such as MetaPhlAn2, to identify known species. | [ |
| MG-RAST | MG-RAST is a web-based, fully automated system for metagenomic analysis. It provides phylogenetic and functional analysis. | [ |
| IMG/M | IMG/M is a web-based pipeline that provides comparative analysis of metagenomes. It provides structural and functional annotation. Assembled contigs are the preferred input. | [ |
| METAREP | METAREP is a suite of web-based tools to view and compare metagenomic annotated data including both functional and taxonomical assignments. | [ |
| CuffDiff | CuffDiff is part of Cufflinks, a suite of programs that assembles transcriptomes, estimates abundances, and tests for differential gene expression. It implements a parsimony-based algorithm. | [ |
| Blast2GO | Blast2GO is a BLAST-based software tool that provides automatic functional annotation of DNA/protein sequences. It has multiple annotation styles that can be used for various conditions. | [ |
| Viromic sequencing data analysis | ||
| VICUNA | VICUNA is a de novo assembler targeting viral populations, which have high mutation rates. Its algorithm uses an overlap-layout-consensus based approach. The general process of VICUNA is trimming reads, constructing/clustering contigs, validating contigs, and then extending and merging contigs. | [ |
| Metavir/ | Metavir is a web-based pipeline specifically for viral metagenome analysis. Metavir 2 is developed based on Metavir with additional features such as new tools for assembled virome sequence analysis and new dataset comparison strategies. Pros: User-friendly interface. Able to perform analysis on both raw reads and assembled virome sequences. Cons: Focuses on the compositional analysis. Functional annotation is lacking. | [ |
| VMGAP | VMGAP is an automated pipeline for functional annotation of viral shotgun metagenomic data. It first performs database searches and then makes functional assignments. | [ |
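
The Hc-OTU entry in the table mentions homopolymer compaction and k-mer profiling. As a rough illustration of those two ideas only, and not of Hc-OTU's actual implementation (whose details are not given here), the Python sketch below collapses homopolymer runs and compares reads by their k-mer counts. The function names, the default k = 3, and the use of a Manhattan distance are all assumptions made for this example.

```python
from collections import Counter
from itertools import groupby


def compact_homopolymers(seq: str) -> str:
    """Collapse every run of identical bases into a single base (e.g. AAAC -> AC)."""
    return "".join(base for base, _run in groupby(seq))


def kmer_profile(seq: str, k: int = 3) -> Counter:
    """Count every overlapping k-mer in the sequence (k = 3 is an arbitrary choice)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def profile_distance(a: Counter, b: Counter) -> int:
    """Manhattan distance between two k-mer profiles (an assumed, illustrative metric)."""
    return sum(abs(a[kmer] - b[kmer]) for kmer in set(a) | set(b))


if __name__ == "__main__":
    # Two hypothetical reads that differ only in homopolymer run lengths.
    read_1 = "AAACCGT"
    read_2 = "ACGGGT"
    compact_1 = compact_homopolymers(read_1)
    compact_2 = compact_homopolymers(read_2)
    print(compact_1, compact_2)
    print(profile_distance(kmer_profile(compact_1), kmer_profile(compact_2)))
```

Compacting before profiling means that reads differing only in homopolymer length, a common error type on some sequencing platforms, end up with identical k-mer profiles, so they would fall into the same cluster under any distance threshold.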