Nicola Segata, Daniela Boernigen, Timothy L Tickle, Xochitl C Morgan, Wendy S Garrett, Curtis Huttenhower.
Abstract
Complex microbial communities are an integral part of the Earth's ecosystem and of our bodies in health and disease. In the last two decades, culture-independent approaches have provided new insights into their structure and function, with the exponentially decreasing cost of high-throughput sequencing resulting in broadly available tools for microbial surveys. However, the field remains far from reaching a technological plateau, as both computational techniques and nucleotide sequencing platforms for microbial genomic and transcriptional content continue to improve. Current microbiome analyses are thus starting to adopt multiple and complementary meta'omic approaches, leading to unprecedented opportunities to comprehensively and accurately characterize microbial communities and their interactions with their environments and hosts. This diversity of available assays, analysis methods, and public data is in turn beginning to enable microbiome-based predictive and modeling tools. We thus review here the technological and computational meta'omics approaches that are already available, those that are under active development, their success in biological discovery, and several outstanding challenges.
Year: 2013 PMID: 23670539 PMCID: PMC4039370 DOI: 10.1038/msb.2013.22
Source DB: PubMed Journal: Mol Syst Biol ISSN: 1744-4292 Impact factor: 11.429
Figure 1. Open biological questions in microbial community biology, and emerging technologies and models for their exploration. Microbial communities are complex biological entities interacting with the environment, host organisms, and transient microbes. Predictive models for most of the interactions within these ecosystems are currently rare, but several studies have begun to provide key insights.
Table I. Current computational methods for meta'omic analysis
| Method | Description | Reference |
|---|---|---|
| Genovo | Generative probabilistic model of reads | ( |
| khmer | Probabilistic de Bruijn graphs | ( |
| Meta-IDBA | De Bruijn graph multiple alignments | ( |
| metAMOS | A Modular Open-Source Assembler component for metagenomes | ( |
| MetaVelvet | De Bruijn graph coverage and connectivity | ( |
| MOCAT | Assembly and gene prediction toolkit | ( |
| SOAPdenovo | Single-genome assembler commonly tuned for metagenomes | ( |
| MetaORFA | Gene-targeted assembly approach | ( |
| Amphora, Amphora2 | Automated pipeline for Phylogenomic Analysis | ( |
| CARMA3 | Taxonomic classification of metagenomic shotgun sequences | ( |
| ClaMS | Classifier for Metagenomic Sequences | ( |
| DiScRIBinATE | Distance Score Ratio for Improved Binning and Taxonomic Estimation | ( |
| INDUS | Composition-based approach for rapid and accurate taxonomic classification of metagenomic sequences | ( |
| MARTA | Suite of Java-based tools for assigning taxonomic status to DNA sequences | ( |
| MetaCluster | Binning algorithm for high-throughput sequencing reads | ( |
| MetaPhlAn | Profiles the composition of microbial communities from metagenomic shotgun sequencing data | ( |
| MetaPhyler | Taxonomic classifier for metagenomic shotgun reads using phylogenetic marker reference genes | ( |
| MTR | Taxonomic annotation of short metagenomic reads using clustering at multiple taxonomic ranks | ( |
| NBC | Naive Bayes Classification tool for taxonomic assignment | ( |
| PaPaRa | Aligning short reads to reference alignments and trees | ( |
| PhyloPythia | Accurate phylogenetic classification of variable-length DNA fragments | ( |
| Phymm, PhymmBL | Classification system designed for metagenomics experiments that assigns taxonomic labels to short DNA reads | ( |
| RAIphy | Phylogenetic classification of metagenomics samples using iterative refinement of relative abundance index profiles | ( |
| RITA | Classifying short genomic fragments from novel lineages using composition and homology | ( |
| SOrt-ITEMS | Sequence orthology-based approach for improved taxonomic estimation of metagenomic sequences | ( |
| SPHINX | Algorithm for taxonomic binning of metagenomic sequences | ( |
| TACOA | Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach | ( |
| Treephyler | Fast taxonomic profiling of metagenomes | ( |
| HUMAnN | Determines the presence/absence and abundance of microbial pathways in meta'omic data | ( |
| metaSHARK | A web platform for interactive exploration of metabolic networks | ( |
| PRMT | Predicted Relative Metabolomic Turnover: determining metabolic turnover from a coastal marine metagenomic dataset | ( |
| RAMMCAP | Rapid analysis of Multiple Metagenomes with Clustering and Annotation Pipeline | ( |
| SparCC | Estimates correlation values from compositional data for network inference | ( |
| CCREPE | Predicts microbial relationships within and between microbial habitats for network inference | ( |
| IDBA-UD | Assembler for single-cell or metagenomic sequencing with uneven depths | ( |
| SmashCell | Software framework for the analysis of single-cell amplified genome sequences | ( |
| GemSIM | Error-model based simulator of next-generation sequencing data | ( |
| MetaSim | A sequencing simulator for genomics and metagenomics | ( |
| Metastats | Statistical analysis software for comparing metagenomic samples | ( |
| LEfSe | Nonparametric test for biomarker discovery in proportional microbial community data | ( |
| ShotgunFunctionalizeR | A statistical test based on a Poisson model for metagenomic functional comparisons | ( |
| SourceTracker | A Bayesian approach to identify and quantify contaminants in a given community | ( |
| CAMERA | Dashboard for environmental metagenomic and genomic data, metadata, and comparative analysis tools | ( |
| IMG/M | Integrated metagenome data management and comparative analysis system | ( |
| MEGAN | Software for metagenomic, metatranscriptomic, metaproteomic, and rRNA analysis | ( |
| METAREP | Online storage and analysis environment for meta'omic data | ( |
| MG-RAST | Storage, quality control, annotation and comparison of meta'omic samples | ( |
| SmashCommunity | Stand-alone annotation and analysis pipeline suitable for meta'omic data | ( |
| STAMP | Comparative meta'omics software package | ( |
| VAMPS | Visualization and analysis of microbial population structure | ( |
Common steps needed for metagenome and metatranscriptome interpretation include assembly, taxonomic profiling, functional profiling, ecological interaction network construction, single-cell sequencing, synthetic data simulators, and downstream statistical tests.
Figure 2. Community diversity and metagenome depth interact to influence assembly quality. Five hundred and twenty-two metagenomic assemblies from the Human Microbiome Project (HMP) are shown here to demonstrate the complex interaction of underlying microbial α-diversity (x axis, diversity within a sample measured as species richness) and assembly quality (y axis). The latter was measured as the size of the smallest contig such that the cumulative length of longer contigs exceeds 4 Mbp, normalized by the total sequenced microbial nucleotide count (The Human Microbiome Project Consortium, 2012a). Communities from each of the seven available body sites are highlighted in different colors, with each point's area proportional to the total input nucleotides for assembly. Microbial composition, metagenome depth, and assembly approach (not shown) all interact to greatly influence the resulting assembly quality.
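The assembly-quality measure described in this caption can be illustrated with a minimal sketch. The function names, the inclusive treatment of the contig at which the threshold is crossed, and the example lengths are assumptions made for illustration; they are not taken from the HMP analysis itself.

```python
from typing import Optional, Sequence


def contig_size_at_cumulative_length(
    contig_lengths: Sequence[int], threshold_bp: int = 4_000_000
) -> Optional[int]:
    """Return the length of the contig at which the running total of
    contig lengths, taken from longest to shortest, first exceeds
    threshold_bp. Returns None if the assembly never exceeds it.

    Assumption: the running total includes the contig being returned.
    """
    cumulative = 0
    for length in sorted(contig_lengths, reverse=True):
        cumulative += length
        if cumulative > threshold_bp:
            return length
    return None


def normalized_assembly_quality(
    contig_lengths: Sequence[int],
    total_input_nucleotides: int,
    threshold_bp: int = 4_000_000,
) -> Optional[float]:
    """Divide the contig size above by the total nucleotides sequenced."""
    if total_input_nucleotides <= 0:
        raise ValueError("total_input_nucleotides must be positive")
    size = contig_size_at_cumulative_length(contig_lengths, threshold_bp)
    if size is None:
        return None
    return size / total_input_nucleotides


# Hypothetical assembly: three contigs from a 1 Gbp metagenome.
example_contigs = [3_000_000, 2_000_000, 500_000]
quality = normalized_assembly_quality(example_contigs, 1_000_000_000)
```

Normalizing by sequencing depth is what lets samples of very different sizes share one y axis; without it, deeper metagenomes would tend to look better assembled simply because more reads were available.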
Figure 3. Intrinsic versus extrinsic metagenomic analysis can minimally, partially, or completely rely on prior knowledge from sequenced reference genomes. Methods that do not rely on any reference sequence information typically perform a sequence-based clustering of meta'omic reads, resulting in unlabeled clusters of sequences that can later be assigned to taxonomic or functional classes (analogous to Operational Taxonomic Unit clustering for 16S sequences). Available genomes can alternatively be used more extensively as references for short-read mapping, typically at the expense of high computational cost and possibly ambiguous assignments for reads from nonunique regions. Intermediate approaches typically rely on a combination of pre-processing extrinsic reference genome information (e.g., to train a composition-based classifier) and intrinsic information (e.g., reads' nucleotide composition) to improve the discrimination power and focus the subsequent mapping operation on the most discriminative sequence-based markers.
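As an illustration of the "intrinsic" information mentioned in this caption, the sketch below computes a read's k-mer (here tetranucleotide) composition, the kind of feature vector a composition-based classifier or a reference-free clustering step could take as input. The function name, the default k = 4, and the decision to skip k-mers containing ambiguous bases are assumptions for this example, not the behavior of any specific tool in Table I.

```python
from collections import Counter
from itertools import product
from typing import Dict


def kmer_composition(read: str, k: int = 4) -> Dict[str, float]:
    """Return the relative frequency of every possible DNA k-mer in read.

    K-mers containing characters other than A, C, G, T (e.g. N) are
    skipped. Frequencies sum to 1 when at least one valid k-mer exists;
    otherwise every frequency is 0.0.
    """
    read = read.upper()
    valid_bases = set("ACGT")
    counts: Counter = Counter()
    for start in range(len(read) - k + 1):
        kmer = read[start:start + k]
        if set(kmer) <= valid_bases:
            counts[kmer] += 1
    total = sum(counts.values())
    all_kmers = ("".join(bases) for bases in product("ACGT", repeat=k))
    return {
        kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers
    }


# Hypothetical short read; real reads are typically much longer.
profile = kmer_composition("ACGTACGTTTGACA")
```

The vector has a fixed length of 4^k entries regardless of read length, so reads can be compared or clustered directly without any reference genome.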
Figure 4. A typical current computational meta'omic pipeline to analyze and contrast microbial communities. After collecting microbiome samples, community DNA or RNA is extracted and sequenced, generating WMS samples (i.e., metagenomes) generally consisting of several million short reads each. This example uses 20 WMS samples from the oral cavity (10 from the buccal mucosa, and 10 from the tongue dorsum (The Human Microbiome Project Consortium, 2012b)). Complementary methods reconstruct the taxonomic characteristics (left) and metabolic potential (right) of the microbial communities. MetaPhlAn (Segata et al, 2012) is one of many alternatives to detect and quantify microbial clades with species-level resolution (see Section 3), whereas HUMAnN (Abubucker et al, 2012) quantitatively characterizes genes, pathways, and metabolic modules from each community (see Section 4). Differentially abundant clades or pathways can then be identified and assessed by tools such as LEfSe (Segata et al, 2011) and represented graphically (e.g., here by GraPhlAn, http://huttenhower.sph.harvard.edu/graphlan). The step-by-step computational pipeline used to produce the analyses reported here is included as a tutorial in Supplementary Information and can also be downloaded from https://bitbucket.org/nsegata/metaphlan/wiki/MetaPhlAn_Pipelines_Tutorial. See Table I for alternative computational approaches to each of these currently common steps in meta'omic analysis.
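To make the final comparison step of such a pipeline more concrete, the sketch below contrasts one clade's relative abundance between two groups of samples using a rank-based (Mann-Whitney) U statistic. This is only an illustration of a nonparametric two-group comparison. It is not the LEfSe implementation, it computes no p-value or effect size, and the function names and abundance values are hypothetical.

```python
from typing import List, Sequence


def relative_abundance(counts: Sequence[float]) -> List[float]:
    """Scale per-clade counts from one sample so that they sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no counts")
    return [count / total for count in counts]


def mann_whitney_u(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Return the U statistic for group_a versus group_b.

    U counts the (a, b) pairs in which a > b, with ties counted as 0.5.
    It ranges from 0 (every a below every b) to len(a) * len(b).
    """
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical relative abundances of one clade in two body sites.
buccal_mucosa = [0.30, 0.25, 0.40]
tongue_dorsum = [0.10, 0.05, 0.20]
u_stat = mann_whitney_u(buccal_mucosa, tongue_dorsum)
```

A U value at either extreme of its range indicates that the two groups are fully separated for that clade. In practice the statistic would be converted to a p-value and corrected for the number of clades or pathways tested.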