Alejandra Escobar-Zepeda, Arturo Vera-Ponce de León, Alejandro Sanchez-Flores.
Abstract
The study of microorganisms that pervade each and every part of this planet has encountered many challenges through time, such as the discovery of unknown organisms and the understanding of how they interact with their environment. The aim of this review is to take the reader along the timeline and major milestones that led us to modern metagenomics. This new and thriving area is likely to be an important contributor to solving different problems. The transition from classical microbiology to modern metagenomics studies has required the development of new branches of knowledge and specialization. Here, we review how the availability of high-throughput sequencing technologies has transformed microbiology and bioinformatics, and how to tackle the inherent computational challenges that arise from the DNA sequencing revolution. New computational methods are constantly being developed to collect, process, and extract useful biological information from a variety of samples and complex datasets, but metagenomics needs the integration of several of these computational methods. Despite the level of specialization needed in bioinformatics, it is important that life scientists have a good understanding of it so that they can design experiments correctly and reveal the information in a metagenome.
Keywords: bioinformatics; functional genomics; high-throughput sequencing; metagenomics; microbiology; taxonomy
Year: 2015 PMID: 26734060 PMCID: PMC4681832 DOI: 10.3389/fgene.2015.00348
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Metagenomics timeline and milestones. Timeline showing advances in microbial community studies from Leeuwenhoek to NGS (Ottman et al., 2012; Yarza et al., 2014).
Direct comparison among sequencing technologies suitable for metagenomics.
| Maximum read length (bp) | 1200 | 400 | 300 | 50,000 |
| Output per run (Gb) | 1 | 2 | 1000 | 1 |
| Amplification for library construction | Yes | Yes | Yes | No |
| Cost/Gb (US dollars) | $9538.46 | $460.00 | $29.30 | $600 |
| Error type | Indel | Indel | Substitution | Indel |
| Error rate (%) | 1 | ~1 | ~0.1 | ~13 |
| Run time | 20 h | 7.3 h | 6 days | 2 h |
Adapted from Glenn, T. 2014 NGS Field Guide—Table 2a—Run time, Reads, Yield|The Molecular Ecologist. Available online at:
.
P6-C4 chemistry.
MiSeq read length.
Illumina HiSeq 2500 Dual flowcell yield.
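As a rough illustration of how the figures in the table combine, the sketch below multiplies output per run by cost per Gb to estimate the cost of one full run, and estimates how many runs a target yield would need. The platform names are not given in the table as shown here, so the columns are identified by position only. The function names and the 50 Gb target are hypothetical, and real pricing rarely scales this linearly.

```python
import math

# Values copied from the comparison table above. Platform names are not
# present in this extract, so columns are labeled by position (assumption).
COLUMNS = ["column 1", "column 2", "column 3", "column 4"]
OUTPUT_GB = [1, 2, 1000, 1]                      # Output per run (Gb)
COST_PER_GB_USD = [9538.46, 460.00, 29.30, 600]  # Cost/Gb (US dollars)


def cost_per_run(output_gb, cost_per_gb):
    """Estimated cost of one full run, assuming a flat cost per Gb."""
    return output_gb * cost_per_gb


def runs_needed(target_gb, output_gb):
    """Number of whole runs needed to reach a target yield in Gb."""
    return math.ceil(target_gb / output_gb)


if __name__ == "__main__":
    target_gb = 50  # hypothetical project size, for illustration only
    for name, out_gb, cost_gb in zip(COLUMNS, OUTPUT_GB, COST_PER_GB_USD):
        print(
            f"{name}: ~${cost_per_run(out_gb, cost_gb):,.2f} per run, "
            f"{runs_needed(target_gb, out_gb)} run(s) for {target_gb} Gb"
        )
```

The comparison is only as good as its assumptions: it ignores library preparation, partial runs, and the effect of read length and error rate on how much of the yield is usable.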
Examples of software used in metagenomic and metaprofiling analysis.
| Software | Description | Reference | Web site |
| FastQC | Quality control tool for high-throughput sequence data, with modular options and graphic results for per-base sequence quality, GC content, N content, duplication, and overrepresented sequences | Andrews, | |
| Fastx-Toolkit | Command line tools for short-read quality control. These allow processing, cutting, format conversion, and collapsing by sequence length and identity | NP | |
| PRINSEQ | Quality control tool for sequence trimming based on dinucleotide occurrence and sequence duplication (mainly 5′/3′) | Schmieder and Edwards, | |
| NGS QC Toolkit | Tool for quality control analysis performed in a parallel environment | Patel and Jain, | |
| Meta-QC-Chain | Parallel environment tool for quality control. It maps reads against 18S rRNA databases to remove eukaryotic contaminant sequences | Zhou et al., | |
| Mothur | From read quality analysis to taxonomic classification, calculation of diversity estimators, and ribosomal gene metaprofiling comparison | Schloss et al., | |
| QIIME | Quality pre-treatment of raw reads, taxonomic annotation, calculation of diversity estimators, and comparison of metaprofiling or metagenomic data | Caporaso et al., | |
| MEGAN | Taxonomic and functional analysis of metagenomic reads. It is based on BLAST output of short reads and performs comparative metagenomics. Graphical interface | Huson and Weber, | |
| CARMA | Phylogenetic classification of reads based on Pfam conserved domains | Krause et al., | |
| PICRUSt | Predictor of metabolic potential from taxonomic information obtained from 16S rRNA metaprofiling projects | Langille et al., | |
| Parallel-meta | Taxonomic annotation of ribosomal gene marker sequences obtained from metaprofiling or metagenomic reads. Functional annotation based on BLAST best-hit results. Comparative metagenomics | Su et al., | |
| MOCAT | Pipeline that includes quality treatment of metagenomic reads, taxonomic annotation based on single copy marker genes classification, and gene-coding prediction | Kultima et al., | |
| TETRA | Taxonomic classification by comparison of tetranucleotide patterns. Web service available | Teeling et al., | |
| PhyloPythiaS | Composition-based classifier of sequences based on reference genome signatures | McHardy et al., | |
| MetaclusterTA | Taxonomic annotation based on binning of reads and contigs. Dependent on reference genomes | Wang et al., | |
| MaxBin | Unsupervised binning of metagenomic short reads and contigs | Wu et al., | |
| Amphora and Amphora2 | Metagenomic phylotyping by single copy phylogenetic marker genes classification | Wu and Eisen, | |
| BWA | Algorithm for mapping short, low-divergence sequences to large references. Based on the Burrows–Wheeler transform | Li and Durbin, | |
| Bowtie | Fast aligner of short reads to long reference sequences, based on the Burrows–Wheeler transform | Langmead and Salzberg, | |
| Genometa | Taxonomic and functional annotation of short-reads metagenomic data. Graphical interface | Davenport and Tümmler, | |
| SORT-Items | Taxonomic annotation by alignment-based orthology of metagenomic reads | Monzoorul Haque et al., | |
| DiScRIBinATE | Taxonomic assignment by BLASTx best hits classification of reads | Ghosh et al., | |
| IDBA-UD | Assembler | Peng et al., | |
| MetaVelvet | | Namiki et al., | |
| Ray Meta | Assembler of | Boisvert et al., | |
| MetaGeneMark | Gene coding sequences predictor from metagenomic sequences by heuristic model | Zhu et al., | |
| GlimmerMG | Gene coding sequences predictor from metagenomic sequences by unsupervised clustering | Kelley et al., | |
| FragGeneScan | Gene coding sequences predictor from short reads | Rho et al., | |
| CD-HIT | Clustering and comparing sequences of nucleotides or protein | Li and Godzik, | |
| HMMER3 | Hidden Markov models applied to sequence alignments | Eddy, | |
| BLASTX | Basic local alignment of translated sequences | Altschul et al., | |
| MetaORFA | Assembly of peptides obtained from predicted ORFs | Ye and Tang, | NA |
| MinPath | Reconstruction of pathways from protein family predictions | Ye and Doak, | |
| MetaPath | Identification of metabolic pathways differentially abundant among metagenomic samples | Liu and Pop, | |
| GhostKOALA | KEGG's internal metagenome annotator, which assigns K numbers through GHOSTX searches against a non-redundant database of KEGG genes | NP | |
| RAMMCAP | Metagenomic functional annotation and data clustering | Li, | |
| ProViDE | Analysis of viral diversity in metagenomic samples | Ghosh et al., | |
| Phyloseq | Toolkit for raw read pre-processing, diversity analysis, and graphics production. R, Bioconductor package | McMurdie and Holmes, | |
| MetagenomeSeq | Analysis of differential abundance of 16S rRNA genes in metaprofiling data. R, Bioconductor package | Paulson et al., | |
| ShotgunFunctionalizeR | Metagenomic functional comparison at the level of individual genes (COG and EC numbers) and complete pathways. R, Bioconductor package | Kristiansson et al., | |
| Galaxy portal | Web repository of computational tools that can be run without informatics expertise. Graphical interface and free service | Goecks et al., | https://usegalaxy.org/ |
| MG-RAST | Taxonomic and functional annotation, comparative metagenomics. Graphical interface, web portal, and free service | Meyer et al., | |
| IMG/M | Functional annotation, phylogenetic distribution of genes and comparative metagenomics. Graphical interface, web portal, and free service | Markowitz et al., | https://img.jgi.doe.gov/cgi-bin/m/main.cgi |
NP, not published in an indexed journal; NA, no web site available.
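
Several of the tools listed above classify sequences by composition (TETRA, for example, compares tetranucleotide patterns). A minimal sketch of that general idea, not the implementation of any listed tool, is to build a normalized tetranucleotide frequency profile for each sequence and correlate the profiles. The function names and the toy sequences are assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"
# All 256 possible tetranucleotides, in a fixed order shared by every profile.
TETRAMERS = ["".join(p) for p in product(ALPHABET, repeat=4)]


def tetra_profile(seq):
    """Relative frequency of each tetranucleotide in seq.

    Windows containing characters other than A, C, G, T (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[t] for t in TETRAMERS)
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[t] / total for t in TETRAMERS]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy sequences, far too short for a meaningful signal in practice.
    contig_a = "ACGTACGTTTGACCAGTACGATCGATCGGCTA"
    contig_b = "ACGTACGATTGACCAGTACGTTCGATCGGCAA"
    similarity = pearson(tetra_profile(contig_a), tetra_profile(contig_b))
    print(f"tetranucleotide profile correlation: {similarity:.3f}")
```

Composition signals of this kind become informative only on sequences much longer than these toy examples, and published tools add statistical normalization that this sketch omits.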