Marie Lisandra Zepeda Mendoza, Thomas Sicheritz-Pontén, M Thomas P Gilbert.
Abstract
DNA-based taxonomic and functional profiling is widely used for the characterization of organismal communities across a rapidly increasing array of research areas that include the role of microbiomes in health and disease, biomonitoring, and estimation of both microbial and metazoan species richness. Two principal approaches are currently used to assign taxonomy to DNA sequences: DNA metabarcoding and metagenomics. When initially developed, each of these approaches mandated its own particular methods for data analysis; however, with the development of high-throughput sequencing (HTS) techniques, they have begun to share many aspects of data set generation and processing. In this review we aim to define the current characteristics, goals and boundaries of each field, and describe the different software used for their analysis. We argue that an appreciation of the potential and limitations of each method can help underscore the improvements required by each field so as to better exploit the richness of current HTS-based data sets.
Keywords: DNA metabarcoding; environment; genome; metagenomics; software development
Year: 2015 PMID: 25673291 PMCID: PMC4570204 DOI: 10.1093/bib/bbv001
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Figure 1. Environmental sample analysis framework. (A) A sample can come from any environment that contains DNA; e.g. one of the most studied environments to date is the human gut microbiome. (B) DNA is extracted from the sample and sequenced according to the intended analyses. Shotgun sequencing produces genomic reads from the species present in the sample, while targeted sequencing produces amplicons with the aim of identifying a specific group of organisms. (C) Depending on the initial aim, whether functional and taxonomic characterization or only taxonomic characterization, the appropriate data set needs to be generated and analyzed with the appropriate software.
Methods comparison
| Type of study and aimed characterization | Metagenomics: taxonomic and functional | Metabarcoding: taxonomic | Metabarcoding: taxonomic | Metabarcoding: taxonomic | Metabarcoding: taxonomic |
|---|---|---|---|---|---|
| Laboratory method | Shotgun sequencing | Shotgun sequencing | Shotgun sequencing | PCR based | PCR based |
| Target region | Genome-wide | Multi-loci | Single locus | Customized barcodes | Conventional barcodes, including 16S, COI, etc. |
| DNA quantity | Care should be taken for samples coming from a body part of a macro organism so that the shotgun sequencing is not mostly host DNA | The percentage of marker genes in shotgun data sets is small | Only a small fraction of the reads come from a specific marker gene | Lots of customized targeted genes can be obtained | Lots of amplicons from universally targeted genes can be obtained |
| Reference database | Databases of the entire genomes can be customized | The source of the reads is largely unknown and difficult to characterize with the currently existing databases, thus many reads will not be assigned a taxonomy | Single marker genes can be extracted from the data set using a reference database | There are good databases for standard barcodes; however, if another region is targeted, few sequences have been reported and most of them are not curated | There are several large 16S and COI databases, some of which are well curated, such as Greengenes |
| Laboratory bias | May present library build biases due to e.g. genomic nucleotide composition | May present library build biases | May present library build biases | May present primer bias if primers target wide taxonomic distributions | May present primer bias if using ‘universal’ primers for marker gene |
| Taxonomic resolution | The identification of multiple loci (marker or not) can even recover almost entire genomes of species | The phylogenies of more than one gene can provide a better consensus of the species present in the sample | It can provide good taxonomic resolution up to the species level. The taxonomic accuracy increases [ | Sequences other than marker genes may not provide satisfactory taxonomic resolution because one sequence can be assigned to more than one species | The completeness of the well-characterized marker gene databases can provide good taxonomic resolution up to the species level |
| Cost | Deals with various challenges due to the complexity of the mixture of DNA in the sample | It may be unattainable due to the computational requirements | The ratio of used and discarded sequences that do not come from the single mined marker gene is cost inefficient | Low cost when generated on HTS platforms | Generally low cost—especially when generated on HTS platforms |
Note. Comparison of the advantages and disadvantages of various methods that are used to achieve the goals of the DNA metabarcoding and metagenomic fields.
Figure 2. Considerations and challenges for metagenomics and DNA metabarcoding. Both fields face a variety of challenges that are ideal candidates for future software development. While some of these problems are specific to one of the fields (right and left boxes), others are common to both (middle boxes).
Figure 3. Metabarcoding approaches. (A) Although PCR-free data sets are typically large, usually only a small percentage of the sequence reads map to a reference database. In such a database, each entry has an assigned taxonomy so that phylogenetic placement approaches can be used for the taxonomic assignment. (B) PCR-based data sets consist of amplicon sequences that can be analyzed with or without a reference database. If no database is used, the sequences are compared among themselves and clustered by a similarity threshold; a representative sequence can then be drawn from each cluster and compared with a reference database. If a database is used, the sequences are compared against the database and are assigned the taxonomy of the sequence they match under a given similarity threshold. A colour version of this figure is available online at BIB online: http://bib.oxfordjournals.org.
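The two PCR-based routes in panel (B) can be illustrated with a minimal Python sketch. It is not the algorithm of any particular tool. The identity measure (fraction of matching positions between equal-length, pre-aligned sequences), the greedy first-fit clustering, the toy reads, and the taxon labels are all assumptions made for illustration; real software typically relies on alignment or k-mer heuristics.

```python
from typing import Dict, List, Optional


def identity(a: str, b: str) -> float:
    """Fraction of matching positions (assumes equal-length, pre-aligned sequences)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: List[str], threshold: float) -> List[List[str]]:
    """Reference-free route: a sequence joins the first cluster whose
    representative (the cluster's first member) it matches at >= threshold;
    otherwise it founds a new cluster."""
    clusters: List[List[str]] = []
    for s in seqs:
        for cluster in clusters:
            if identity(s, cluster[0]) >= threshold:
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters


def assign_taxonomy(seq: str, reference: Dict[str, str],
                    threshold: float) -> Optional[str]:
    """Reference-based route: return the taxonomy of the best-matching
    reference sequence, or None if no match reaches the threshold."""
    best_tax, best_id = None, 0.0
    for ref_seq, tax in reference.items():
        score = identity(seq, ref_seq)
        if score > best_id:
            best_tax, best_id = tax, score
    return best_tax if best_id >= threshold else None


# Toy amplicons and a hypothetical two-entry reference database.
reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "TTTTGGGGCA", "GGGGGGGGGG"]
reference = {
    "ACGTACGTAC": "hypothetical taxon A",
    "TTTTGGGGCC": "hypothetical taxon B",
}

# Cluster first, then compare one representative per cluster to the database.
for cluster in greedy_cluster(reads, threshold=0.9):
    representative = cluster[0]
    print(representative, len(cluster),
          assign_taxonomy(representative, reference, threshold=0.9))
```

In this sketch the result of the reference-free route depends on input order, because the first sequence seen becomes the representative of its cluster. Clusters whose representative matches nothing in the database remain unassigned rather than being forced onto the nearest entry.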
Figure 4. Metagenomic approaches. (A) Metagenomic reference-based approaches start by mapping the reads to a genome database and then apply various algorithms to assign taxonomy, such as phylogenetic placement or the use of reads that map uniquely to the genome of a species in the database. (B) Alternatively, the reads can be de novo assembled and the scaffolds, or the open reading frames predicted on the scaffolds, can be searched against the database, thus reducing the search time. (C) Metagenomic reference-free methods usually start by de novo assembling the reads; the number of reads mapping back to the assembled sequences (the scaffolds or the open reading frames predicted from the scaffolds) can then be used to create a count matrix that can be further clustered, with each cluster representing a metagenomic species. A colour version of this figure is available online at BIB online: http://bib.oxfordjournals.org.
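The reference-free route in panel (C) can be sketched as follows, assuming the count matrix has already been built (one row per assembled scaffold, one column per sample). The greedy grouping by Pearson correlation, the `min_r` cutoff, the scaffold names, and the counts are hypothetical choices made for illustration, not the clustering procedure of any specific published method.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length abundance profiles.
    Returns 0.0 when either profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def cluster_by_coabundance(counts: Dict[str, List[float]],
                           min_r: float) -> List[List[str]]:
    """Group scaffolds whose read counts co-vary across samples.
    A scaffold joins the first cluster whose seed (first member) it
    correlates with at >= min_r; otherwise it seeds a new cluster."""
    clusters: List[List[str]] = []
    for name, profile in counts.items():
        for cluster in clusters:
            if pearson(profile, counts[cluster[0]]) >= min_r:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical count matrix: reads mapped back to each scaffold in 4 samples.
counts = {
    "scaffold_1": [10, 20, 30, 40],
    "scaffold_2": [11, 19, 31, 42],
    "scaffold_3": [40, 30, 20, 10],
    "scaffold_4": [39, 31, 18, 12],
}

# Each resulting group stands in for one candidate metagenomic species.
for group in cluster_by_coabundance(counts, min_r=0.95):
    print(group)
```

The idea behind the sketch is that sequences originating from the same genome should rise and fall together in abundance across samples, so correlated rows of the count matrix are grouped without consulting any reference database.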
Figure 5. Method classification placement map. As the placement of the methods shows, there is a lack of software in some areas and a wealth in others, especially at the borderlines, where methods might at first seem difficult to classify. (A) Metagenomic reference based. (B) Metagenomic reference free. (C) DNA metabarcoding reference based. (D) DNA metabarcoding reference free.