Muneer Ahmad Malla1, Anamika Dubey2, Ashwani Kumar2, Shweta Yadav1, Abeer Hashem3,4, Elsayed Fathi Abd Allah5.
Abstract
The interaction between the human microbiome and the immune system affects several human metabolic functions and impacts our well-being. The interaction between humans and microbes can also play a key role in determining the wellness or disease status of the human body. Dysbiosis is related to a plethora of diseases, including skin, inflammatory, metabolic, and neurological disorders. A better understanding of the host-microbe interaction is essential for the diagnosis and appropriate treatment of these ailments. The significance of the microbiome for host health has led to the emergence of new therapeutic approaches focused on the prescribed manipulation of the host microbiome, either by removing harmful taxa or by reinstating missing beneficial taxa and the functional roles they perform. Culturing large numbers of microbial taxa in the laboratory is problematic at best, if not impossible. This makes it very difficult to comprehensively catalog the individual members of a specific microbiome and to understand how microbial communities function and influence host-pathogen interactions. Recent advances in sequencing technologies and computational tools have allowed an increasing number of metagenomic studies to be performed. These studies have provided key insights into the human microbiome and a host of microbial communities in other environments. In the present review, the role of the microbiome as a therapeutic agent and its significance in human health and disease are discussed. Advances in high-throughput sequencing technologies for surveying host-microbe interactions are also discussed. Additionally, the correlation between the composition of the microbiome and infectious diseases, as described in previously reported studies, is covered. Lastly, recent advances in state-of-the-art bioinformatics software, workflows, and applications for analysing metagenomic data are summarized.
Keywords: bioinformatics; diseases; dysbiosis; host-microbe interactions; human microbiome; metagenomics; microbes; next generation sequencing
Year: 2019 PMID: 30666248 PMCID: PMC6330296 DOI: 10.3389/fimmu.2018.02868
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 7.561
Figure 1. Timeline of the sequence-based metagenomic projects showing the variety of the environmental samples.
Figure 2. Tools and web servers related to gut microbiome studies.
Figure 3. Microbiome analysis workflow.
Advantages and limitations of available next-generation sequencing (NGS) platforms.
| Method | Limitations | Advantages | Platform | Read length (bp) | Throughput | Reads | Run time |
|---|---|---|---|---|---|---|---|
| Sequencing by ligation or SOLiD sequencing | Reported to have problems sequencing palindromic sequences in particular; relatively slower than other methods. | Relatively cheap | SOLiD 5500 Wildfire | 50 (SES) | 80 Gb | ~700 M | 6 days |
| | | | | 75 (SES) | 120 Gb | | |
| | | | | 50 (SES) | 160 Gb | | |
| | | | SOLiD 5500xl | 50 (SES) | 160 Gb | ~1.4 bn | 10 days |
| | | | | 75 (SES) | 240 Gb | | |
| | | | | 50 (SES) | 320 Gb | | |
| | | | BGISEQ-500 FCS155 | 50–100 (SES/PES) | 8–40 Gb | NA | 24 h |
| | | | BGISEQ-500 FCL155 | 50–100 (SES/PES) | 40–200 Gb | NA | 24 h |
| Sequencing by synthesis: CRT | Equipment is very expensive. Requires a high concentration of DNA. | Potential for high sequencing yield, depending on sequencer model and desired application | Illumina MiniSeq Mid Output | 150 (SES) | 2.1–2.4 Gb | 14–16 M | 17 h |
| | | | Illumina MiniSeq High Output | 75 (SES) | 1.6–1.8 Gb | 22–25 M (SES) | 7 h |
| | | | | 75 (PES) | 3.3–3.7 Gb | 44–50 M (PES) | 13 h |
| | | | | 150 (PES) | 6.6–7.5 Gb | | 24 h |
| | | | Illumina MiSeq v2 | 36 (SES) | 540–610 Mb | 12–15 M (SES) | 4 h |
| | | | | 25 (PES) | 750–850 Mb | 24–30 M (PES) | 5.5 h |
| | | | | 150 (PES) | 4.5–5.1 Gb | | 24 h |
| | | | | 250 (PES) | 7.5–8.5 Gb | | 39 h |
| | | | Illumina MiSeq v3 | 75 (PES) | 3.3–3.8 Gb | 44–50 M (PES) | 21–56 h |
| | | | | 300 (PES) | 13.2–15 Gb | | |
| | | | Illumina NextSeq 500/550 Mid Output | 75 (PES) | 16–20 Gb | Up to 260 M (PES) | 15 h |
| | | | | 150 (PES) | 32–40 Gb | | 26 h |
| | | | Illumina NextSeq 500/550 High Output | 75 (SES) | 25–30 Gb | 400 M (SES) | 11 h |
| | | | | 75 (PES) | 50–60 Gb | 800 M (PES) | 18 h |
| | | | | 150 (PES) | 100–120 Gb | | 29 h |
| | | | Illumina HiSeq 2500 v2 Rapid Run | 36 (SES) | 9–11 Gb | 300 M (SES) | 7 h |
| | | | | 50 (PES) | 25–30 Gb | 600 M (PES) | 16 h |
| | | | | 100 (PES) | 50–60 Gb | | 27 h |
| | | | | 150 (PES) | 75–90 Gb | | 40 h |
| | | | | 250 (PES) | 125–150 Gb | | 60 h |
| | | | Illumina HiSeq 2500 v3 | 36 (SES) | 47–52 Gb | 1.5 bn (SES) | 2 days |
| | | | | 50 (PES) | 135–150 Gb | 3 bn (PES) | 5.5 days |
| | | | | 100 (PES)+ | 270–300 Gb | | 11 days |
| | | | Illumina HiSeq 2500 v4 | 36 (SES) | 64–72 Gb | 2 bn (SES) | 29 h |
| | | | | 50 (PES) | 180–200 Gb | 4 bn (PES) | 2.5 days |
| | | | | 100 (PES) | 360–400 Gb | | 5 days |
| | | | | 125 (PES) | 450–500 Gb | | 6 days |
| | | | Illumina HiSeq 3000/4000 | 50 (SES) | 105–125 Gb | 2.5 bn (SES) | 1–3.5 days |
| | | | | 75 (PES) | 325–375 Gb | | |
| | | | | 150 (PES) | 650–750 Gb | | |
| | | | Illumina HiSeq X | 150 (PES) | 800–900 Gb per flow cell | 2.6–3 bn (PES) | < 3 days |
| | | | Qiagen Gene Reader | NA | 12 genes; 1,250 mutations | NA | Several days |
| Sequencing by synthesis: SBS | Homopolymer errors | Less expensive and relatively fast | 454 GS Junior | Up to 600; 400 average (SES, PES) | 35 Mb | ~0.1 M | 10 h |
| | | | 454 GS Junior+ | Up to 1,000; 700 average (SES, PES) | 70 Mb | ~0.1 M | 18 h |
| | | | 454 GS FLX Titanium XLR70 | Up to 600; 450 mode (SES, PES) | 450 Mb | ~1 M | 10 h |
| | | | 454 GS FLX Titanium XL | Up to 1,000; 700 mode (SES, PES) | 700 Mb | ~1 M | 23 h |
| | | | Ion PGM 314 | 200 (SES) | 30–50 Mb | 400,000–550,000 | 23 h |
| | | | | 400 (SES) | 60–100 Mb | | 3.7 h |
| | | | Ion PGM 316 | 200 (SES) | 300–500 Mb | 2–3 M | 3 h |
| | | | | 400 (SES) | 600 Mb–1 Gb | | 4.9 h |
| | | | Ion PGM 318 | 200 (SES) | 600 Mb–1 Gb | 4–5.5 M | 4 h |
| | | | | 400 (SES) | 1–2 Gb | | 7.3 h |
| | | | Ion Proton | Up to 200 (SES) | Up to 10 Gb | 60–80 M | 2–4 h |
| | | | Ion S5 520 | 200 (SES) | 600 Mb–1 Gb | 3–5 M | 2.5 h |
| | | | | 400 (SES) | 1–2 Gb | | 4 h |
| | | | Ion S5 530 | 200 (SES) | 3–4 Gb | 15–20 M | 2.5 h |
| | | | | 400 (SES) | 6–8 Gb | | 4 h |
| | | | Ion S5 540 | 200 (SES) | 10–15 Gb | 60–80 M | 2.5 h |
| Single-molecule real-time long reads (Pacific Biosciences) | Moderate throughput; equipment is very expensive | Fast detection | Pacific Biosciences RSII | ~20 Kb | 500 Mb–1 Gb | ~55,000 | 4 h |
| | | | Pacific Biosciences Sequel | 8–12 Kb | 3.5–7 Gb | ~350,000 | 0.5–6 h |
| | | | Oxford Nanopore MK1 MinION | Up to 200 Kb | Up to 1.5 Gb | >100,000 | Up to 48 h |
| | | | Oxford Nanopore PromethION | NA | Up to 4 Tb | NA | NA |
Manufacturer's data;
Rounded from Field Guide to next-generation DNA sequencers and 2014 update;
Information is not available, as this product has been developed recently.
CRT, Cyclic Reversible Termination; NA, Not Available; PES, Paired End Sequencing; SBS, Sequencing by Synthesis; SES, Single End Sequencing.
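The throughput and read-count figures in the table are linked by simple arithmetic: yield is roughly the number of reads multiplied by the read length. The Python sketch below illustrates that check for two Illumina MiniSeq High Output configurations. It assumes that the read counts quoted for paired-end runs already include both mates (the tabulated MiniSeq numbers are consistent with this reading), and the function name `expected_yield_gb` is hypothetical.

```python
def expected_yield_gb(reads_millions: float, read_length_bp: int) -> float:
    """Approximate sequencing yield in gigabases (Gb).

    Assumption: `reads_millions` counts every read, so for paired-end
    runs both mates are already included in the count.
    """
    if reads_millions < 0 or read_length_bp < 0:
        raise ValueError("read count and read length must be non-negative")
    total_bases = reads_millions * 1e6 * read_length_bp
    return total_bases / 1e9


# Illumina MiniSeq High Output, 75 bp single-end: 22-25 M reads
# (the table lists 1.6-1.8 Gb for this configuration).
single_end = (expected_yield_gb(22, 75), expected_yield_gb(25, 75))

# Same platform, 75 bp paired-end: 44-50 M reads
# (the table lists 3.3-3.7 Gb for this configuration).
paired_end = (expected_yield_gb(44, 75), expected_yield_gb(50, 75))

print(f"75 bp SES: {single_end[0]:.2f}-{single_end[1]:.2f} Gb")
print(f"75 bp PES: {paired_end[0]:.2f}-{paired_end[1]:.2f} Gb")
```

The computed ranges land close to the tabulated throughput; the small differences are presumably due to rounding in the manufacturer specifications.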
Figure 4. Timeline of the introduction of the next-generation DNA sequencing technologies and platforms.
List of software used in metagenomic analysis.
| Software | Description |
|---|---|
| FastQC | FastQC is a Java-based application that performs quality control through a series of analysis modules. |
| Fastx-Toolkit | Fastx is a command-line toolkit for the quality control of short reads; it supports processing, format conversion, collapsing, and cutting based on sequence identity and length. |
| PRINSEQ | A standalone tool that can be integrated into existing data processing pipelines. PRINSEQ offers a computational resource able to handle the large amounts of data generated by next-generation sequencers. It is used for sequence trimming based on dinucleotide occurrence and sequence duplication (mainly 5′/3′). |
| NGS QC Toolkit | NGS QC Toolkit comprises user-friendly standalone tools for the quality control of sequence data generated by next-generation sequencing platforms. The analysis is performed in a parallel environment. |
| Meta-QC-Chain | Meta-QC-Chain is a quality control tool that runs in a parallel environment. It maps reads against 18S rRNA databases to remove eukaryotic contaminant sequences. |
| Mothur | Mothur is an open-source, expandable software package covering quality analysis of reads, taxonomic classification, comparison of ribosomal gene meta-profiles, and calculation of diversity estimators. |
| QIIME | The QIIME pipeline is designed for analyzing microbial communities sampled via marker gene (16S or 18S rRNA) amplicon sequencing. At its core, QIIME performs quality pre-treatment of raw reads, calculation of diversity estimates, taxonomic annotation, and comparison of metagenomic data. |
| MEGAN | MEGAN is a graphical tool for both taxonomic and functional analysis of metagenomic reads. It works from the BLAST output of short reads and performs comparative metagenomics. |
| CARMA | CARMA provides a quantitative and statistical characterization of the phylogenetic classification of reads based on Pfam conserved domains. |
| PICRUSt | PICRUSt predicts metabolic potential from the taxonomic information obtained in a metagenomic analysis. |
| TETRA | TETRA is a program, available as a web service and as a standalone application, for taxonomic classification and comparison of tetranucleotide patterns within DNA sequences. |
| PhyloPythiaS | Composition-based sequence classifier based on reference genome signatures. |
| MOCAT | MOCAT is a highly configurable, modular pipeline that includes quality treatment of metagenomic reads, classification based on single-copy marker genes, and gene-coding prediction. The pipeline uses state-of-the-art programs to quality control, map, and assemble reads from metagenome samples sequenced at very high depth (several billion base pairs). |
| Parallel-meta | Parallel-meta is a comprehensive, automated software package that offers fast data mining and metabolic function analysis across large numbers of metagenomic datasets. The functional annotation is based on BLAST best-hit results. |
| MetaclusterTA | MetaclusterTA is a tool for taxonomic annotation based on the binning of reads and contigs. It depends on reference genomes. |
| MaxBin | MaxBin performs unsupervised binning of metagenomic sequences using an expectation-maximization algorithm. For the user's convenience, MaxBin reports genome-related statistics including GC content, genome size, and completeness. |
| Amphora and Amphora2 | Amphora and Amphora2 are used for metagenomic phylotyping via classification of single-copy phylogenetic marker genes. |
| BWA | BWA is an algorithm for mapping short, low-divergence sequences to large references. It is based on the Burrows–Wheeler transform. |
| Bowtie | Bowtie is a fast aligner of short reads to long reference sequences, based on the Burrows–Wheeler transform. |
| Genometa | Genometa is a graphical interface for taxonomic and functional annotation of short-read metagenomic data. |
| SOrt-ITEMS | SOrt-ITEMS is a tool for taxonomic annotation of metagenomic reads via alignment-based orthology. |
| DiScRIBinATE | Taxonomic assignment of reads by BLASTx best-hit classification. |
| IDBA-UD | IDBA-UD is a de novo assembler for metagenomic sequences. |
| MetaVelvet | MetaVelvet is a de novo assembler for metagenomic sequences. |
| RayMeta | RayMeta is a de novo assembler for metagenomic sequences. |
| MetaGeneMark | MetaGeneMark predicts gene-coding sequences in metagenomic sequences using a heuristic model. |
| GlimmerMG | GlimmerMG predicts gene-coding sequences in metagenomic sequences using unsupervised clustering. |
| FragGeneScan | FragGeneScan predicts gene-coding sequences in short reads. |
| CD-HIT | CD-HIT is a tool for clustering and comparing nucleotide or protein sequences. |
| HMMER3 | HMMER3 is a free, commonly used software package for sequence analysis. It uses hidden Markov models to perform sequence alignments and is used to identify homologous nucleotide and protein sequences. |
| BLASTX | Basic local alignment of translated sequences. |
| MetaORFA | MetaORFA is applied for the assembly of peptides obtained from predicted ORFs. Website not available. |
| MinPath | MinPath is a tool for reconstructing pathways from protein family predictions. |
| MetaPath | MetaPath identifies metabolic pathways that are differentially abundant among metagenomic samples. |
| GhostKOALA | GhostKOALA is KEGG's internal annotator of metagenomes; it assigns K numbers using GHOSTX searches against a non-redundant database of KEGG genes. |
| RAMMCAP | RAMMCAP is used for metagenomic functional annotation and data clustering. |
| ProViDE | ProViDE is a tool for the analysis of viral diversity in metagenomic samples. |
| Phyloseq | Phyloseq is an R/Bioconductor package for raw read pre-processing, diversity analysis, and graphics production. |
| MetagenomeSeq | MetagenomeSeq is designed to determine the differential abundance of 16S rRNA genes in meta-profiling data. It also addresses the effects of under-sampling and normalization of microbial communities on the detection of disease associations. |
| ShotgunFunctionalizeR | ShotgunFunctionalizeR is an R package for the functional assessment of metagenomic data. The package includes tools for importing, annotating, and visualizing metagenomic data generated via high-throughput sequencing. |
| Galaxy portal | Galaxy portal is a web repository of computational tools that can be run without informatics expertise. It is a free service with a graphical interface. |
| MG-RAST | MG-RAST is an open-source web application for the automatic phylogenetic and functional analysis of metagenomes, and one of the largest repositories of metagenomic data. It is a free service with a graphical web portal. |
| IMG/M | The IMG (Integrated Microbial Genomes) system serves as a community resource for the analysis, functional annotation, and phylogenetic distribution of genes, and for comparative metagenomics. It is a free service with a graphical web portal. |
| Phinch | Phinch is an open-source, interactive, exploratory data visualization tool intended to ease the analysis of meta-omic datasets. Its main features are a streamlined visualization workflow, a sleek user interface, and novel exploration of larger datasets. It is accessible via a web browser. |
| CAMERA | CAMERA aims to bridge gaps and develop methods for monitoring the microbial communities of the oceans. Its databases incorporate genomic and metagenomic datasets, metadata, results from precomputed analyses, and software that supports powerful cross-analysis of environmental metagenomes. |
| MetaComp | MetaComp is a graphical, comprehensive analysis tool that combines a series of statistical analysis approaches with visualized results for the comparative analysis of metagenomic and other meta-omics datasets. The software can read files generated by different upstream analysis programs and can automatically choose a two-group sample test. |
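BWA and Bowtie are described above as being based on the Burrows–Wheeler transform (BWT). As a minimal illustration of the transform itself, the Python sketch below builds the BWT of a short string by sorting all of its rotations and then inverts it. This is a naive teaching version, not how either aligner is implemented (both rely on compressed indexes built on top of the transform); the function names and the `$` sentinel are choices made for this example.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform via sorted rotations.

    The sentinel must not occur in `text` and must sort before every
    other character (true for "$" against letters in ASCII).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending and re-sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The original text is the row that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


read = "GATTACA"
transformed = bwt(read)
print(transformed)
print(inverse_bwt(transformed) == read)
```

The rotation-sorting approach needs quadratic memory, so it is only suitable for short strings; it is shown here because it makes the reversibility of the transform easy to see.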
Figure 5. Overview of the workflow used by metagenomic analysis tools (QIIME, Mothur, EBI and MG-RAST).
Comparative workflow of the four most commonly used bioinformatics pipelines for analyzing metagenomic datasets.
| Feature | EBI Metagenomics | MG-RAST | QIIME (1 and 2) | Mothur |
|---|---|---|---|---|
| License | Free open-source | Free open-source | Free open-source | Free open-source |
| Implementation (release candidate) | Python | Python | Python | C++ |
| Current Version available (March 2018) | 4.1 | 4.0.3 | 1.9.1 and 2017.6.0, respectively | 1.39.5 |
| Website | | | | |
| Primary Usage | GUI | GUI | CL and GUI, respectively | CL |
| Amplicon data Analysis | Yes | Yes | Yes | Yes |
| Whole genome shotgun analysis | Yes | Yes | Yes but only experimental | No |
| Sequencing technology compatibility | Sanger, PacBio, Ion Torrent, Illumina, Nanopore | Sanger, PacBio, Ion Torrent, Illumina, Nanopore | Sanger, PacBio, Ion Torrent, Illumina, Nanopore | Sanger, PacBio, Ion Torrent, Illumina, Nanopore |
| Quality control | Yes | Yes | Yes | Yes |
| 16S rRNA gene databases searched | Silva, Rfam, MAPSeq, Pfam, TIGRFAM, Prints, Prosite patterns, Gene3D | Silva, M5RNA, RDP and Greengenes | Greengenes, RDP, Silva and Unite | RDP, Greengenes, Silva and Unite |
| Alignment method | PyNAST, MUSCLE, INFERNAL | BLAT | PyNAST, MUSCLE, INFERNAL | Needleman-Wunsch, Blastn, Gotoh |
| Taxonomic assignment | UCLUST, BLAST, Mothur, RDP | BLAT | UCLUST, BLAST, Mothur, RDP | Wang/RDP approach |
| Clustering algorithm | UCLUST, BLAST, Mothur, CD-HIT | UCLUST | UCLUST, BLAST, Mothur, CD-HIT | Mothur, CD-HIT and adapts DOTUR |
| Diversity analysis | Alpha and beta | Alpha | Alpha and beta | Alpha and beta |
| Phylogenetic tree | Yes | Yes | FastTree | Clearcut algorithm |
| Visualization | T, BC, PC, HM, SC, PCA, Krona and Circos | T, BC, PC, HM, SC, PCA, Krona and Circos | T, BC, PC, HM, SC, PCA | T, BC, PC, HM, SC, PCA, Dendrograms, Venn diagrams |
| Submitted projects as of March 2018 | Total: 1,653 | Total: 324,846 | NA | NA |
BC, Bar-Charts; BLAT, BLAST-like Alignment Tool; CL, Command Line; EBI, European Bioinformatics Institute; GUI, Graphical User Interface; HM, Heat Map; MG-RAST, Metagenomic Rapid Annotations using Subsystems Technology; OTU, Operational Taxonomic Unit; PC, Pie-Charts; PCA, Principal Component Analysis; QIIME, Quantitative Insights into Microbial Ecology; RDP, Ribosomal Database Project; SC, Stacked Columns; T, Tabulation.
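The "Diversity analysis" row distinguishes alpha diversity (within a sample) from beta diversity (between samples). The table does not say which metrics each pipeline computes, so the Python sketch below picks two widely used ones as an assumption: the Shannon index for alpha diversity and Bray–Curtis dissimilarity for beta diversity. The function names and the OTU count vectors are hypothetical and for illustration only.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[float]) -> float:
    """Alpha diversity: Shannon index (natural log) of one sample."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample must contain at least one observation")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Beta diversity: Bray-Curtis dissimilarity between two samples.

    Both vectors must list counts for the same taxa in the same order.
    """
    if len(a) != len(b):
        raise ValueError("samples must cover the same taxa")
    denominator = sum(a) + sum(b)
    if denominator <= 0:
        raise ValueError("samples must not both be empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


# Hypothetical OTU counts for two samples over the same four taxa.
sample_a = [10, 10, 10, 10]
sample_b = [37, 1, 1, 1]

print(f"Shannon A: {shannon_index(sample_a):.3f}")
print(f"Shannon B: {shannon_index(sample_b):.3f}")
print(f"Bray-Curtis A vs B: {bray_curtis(sample_a, sample_b):.3f}")
```

An evenly distributed sample scores higher on the Shannon index than one dominated by a single taxon, while Bray–Curtis ranges from 0 (identical composition) to 1 (no shared taxa).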
Figure 6. Flow chart of basic metagenomics steps and tools currently in use.
Figure 7. Timeline showing the sequencing cost (A) per Mb until year 2009, (B) per Mb between year 2009 and 2017, (C) per genome until year 2009, (D) per genome between year 2009 and 2017.