Jacobo de la Cuesta-Zuluaga, Juan S Escobar.
Abstract
Next-generation sequencing technologies have found widespread use in the study of host-microbe interactions owing to their increasing throughput and ever-decreasing costs. The analysis of human-associated microbial communities using a marker gene, particularly the 16S rRNA gene, has benefited greatly from these technologies. Human gut microbiome research is a remarkable example: it has greatly expanded our understanding of microbe-mediated human health and disease, metabolism, and food absorption. 16S studies go through a series of in vitro and in silico steps that can greatly influence their outcomes. However, the lack of a standardized workflow has led to uncertainties regarding the transparency and reproducibility of gut microbiome studies. Here, we discuss the most common challenges in the archetypical 16S rRNA workflow, including the extraction of total DNA, its use as a template in PCR with primers that amplify specific hypervariable regions of the gene, amplicon sequencing, the denoising and removal of low-quality reads, the detection and removal of chimeric sequences, the clustering of high-quality sequences into operational taxonomic units, and their taxonomic classification. We recommend the essential technical information that should be conveyed in publications for reproducibility of results and encourage non-experts to include procedures and available tools that mitigate most of the problems encountered in microbiome analysis.
Keywords: 16S rRNA; gut microbiome; next-generation sequencing; personalized medicine; personalized nutrition
Year: 2016 PMID: 27551678 PMCID: PMC4976105 DOI: 10.3389/fnut.2016.00026
Source DB: PubMed Journal: Front Nutr ISSN: 2296-861X
Figure 1. Schematic view of the archetypical workflow in 16S rRNA studies, and some of the problems associated with each step. Dotted lines link the workflow with steps beyond the scope of the review, and dashed lines represent non-standard steps.
Specifications of the most commonly used sequencing platforms in microbial community characterization studies.
| Platform | Raw error rate (ER) | ER after denoising | Read length (bp) | Throughput (Gb/run) | Cost/Gb (USD) | Known problems | Reference |
|---|---|---|---|---|---|---|---|
| 454 FLX Titanium | 1.0–2.0 | <0.02 | 450 | 0.4 | 15,500 | High error rate in homopolymer regions. Sequence quality decreases in a lengthwise fashion. Soon to be phased out | ( |
| Illumina MiSeq v2 | 0.8–1.0 | <0.02 | 2 × 250 | 7.5 | 142 | Sequence quality decreases in a lengthwise fashion. The second read has a higher error rate than the first read. Increased single-base errors in association with GGC motifs | ( |
| Ion Torrent PGM 316 chip | 1.5 | NA | 400 | 1 | 674 | Premature sequence truncation caused by organism- and orientation-dependent biases. Low accuracy in homopolymer regions | ( |
| PacBio RS II | 1.8 | 0.3 | 10,000 | 0.1 | 1,100 | Systematic and non-random errors; G and C are more likely to be deleted than A and T. Preferential loading of shorter sequences into zero-mode waveguides | ( |
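The throughput and cost-per-gigabase columns can be combined into a rough cost per run. The sketch below is a minimal illustration and not part of the original review: it assumes that a full-capacity run costs throughput multiplied by cost/Gb, which ignores library preparation, labor, and instrument costs. The names `PLATFORMS`, `cost_per_run`, and `RUN_COSTS` are hypothetical.

```python
# Throughput (Gb/run) and cost per Gb (USD), copied from the table above.
PLATFORMS = {
    "454 FLX Titanium": (0.4, 15500),
    "Illumina MiSeq v2": (7.5, 142),
    "Ion Torrent PGM 316 chip": (1.0, 674),
    "PacBio RS II": (0.1, 1100),
}


def cost_per_run(throughput_gb, cost_per_gb):
    """Approximate sequencing cost (USD) of one full-capacity run.

    Assumption: cost scales linearly with output, so the run cost is
    simply throughput x cost/Gb. Rounded to cents to avoid float noise.
    """
    return round(throughput_gb * cost_per_gb, 2)


# Hypothetical summary: platform name -> estimated cost of one run.
RUN_COSTS = {name: cost_per_run(*spec) for name, spec in PLATFORMS.items()}

if __name__ == "__main__":
    # List platforms from the cheapest to the most expensive run.
    for name, cost in sorted(RUN_COSTS.items(), key=lambda kv: kv[1]):
        print(f"{name}: ~{cost:,.0f} USD per run")
```

Under this assumption, a low cost per gigabase does not imply a cheap run, and a cheap run does not imply a low cost per gigabase, so the two figures answer different budgeting questions.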
Recommendations to reduce the impact of biases introduced in the different steps of the analysis of microbial communities using the 16S rRNA gene.
| Step | Main challenge | Possible solution | Importance |
|---|---|---|---|
| DNA extraction | Uneven representation of the microbial community under scrutiny. | The use of a DNA extraction method that includes a bead-beating step results in a more comprehensive representation of the microbial community. | Moderate |
| | Differential representation of microbial communities due to differences in DNA extraction kits. | Direct comparisons should be carried out only between studies using the same DNA extraction kit. | Moderate |
| | Contamination by microbial DNA from the DNA extraction and PCR reagents. | To reduce the risk of contamination, the starting biomass should be maximized. To control for it, the samples must be processed in random order, the kit lots must be included as metadata, and technical controls from the reagents must be sequenced. | Moderate |
| Multi-template PCR | Differences in the estimated phylogenetic diversity between hypervariable regions of the 16S rRNA gene. | The region that best approximates the phylogenetic diversity given by the whole gene should be selected. The V4 region has been shown to approximate the phylogenetic diversity given by the whole gene and to result in the best taxonomic labeling. | Moderate |
| | Uneven coverage of different microbial taxa by the PCR primers. | Bioinformatic tools, such as SILVA TestPrime, allow the evaluation of primers; those with the highest coverage rate for the taxa known to be present in the microbial community of interest should be selected. | Moderate |
| | | The microbial coverage is maximized by using degenerate primers. | High |
| | | Direct comparisons should be carried out only between studies using the same set of primers. | Moderate |
| Amplicon sequencing by NGS | Sequencing platform selection. | The selection of the sequencing platform should be made prioritizing error rate over sequencing depth and read length. | High |
| | Assessment of the quality of the sequencing run. | The sequencing of a mock community allows the quality assessment of each individual amplicon sequencing run. | High |
| Culling of dubious sequences | Overestimation of diversity caused by spurious sequences. | Apply stringent sequence denoising and curation procedures, and assess their effectiveness by determining the final error rate using a sequenced mock community. | High |
| Chimera removal | Overestimation of diversity caused by non-existent organisms (chimeric sequences). | The use of database-free approaches, especially when studying poorly characterized environments, is encouraged. | Moderate |
| OTU clustering and taxonomy assignment | Overestimation of diversity caused by clustering algorithms. | Database-free OTU-based methods should be preferred over taxonomic-dependent (phylotyping) approaches. | Moderate |
| | | If computationally feasible, hierarchical methods such as average or complete linkage should be used; otherwise, a heuristic method such as CD-HIT is suggested. | Moderate |
| | Erroneous taxonomic classification of OTUs. | The taxonomic assignment should be carried out by majority consensus of the sequences within the OTU. | Moderate |
| Copy number variation | Over- or underestimation of diversity caused by erroneous abundance assessment. | Algorithms that correct for copy number variation (CNV) exist, but they depend on whole-genome sequence data, which may not be available for poorly described microorganisms; their use is therefore not encouraged. | Low |
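Two of the recommendations above lend themselves to a short illustration: estimating the final error rate from a sequenced mock community, and assigning OTU taxonomy by majority consensus. The sketch below is a simplified stand-in for what dedicated pipelines do, not the method of any particular tool. It assumes that each mock-community read has already been aligned to its known reference at equal length (substitutions only, no indels) and that each sequence in an OTU carries a lineage tuple ordered from the highest rank downward. The function names and the `min_fraction` parameter are hypothetical.

```python
from collections import Counter


def error_rate(read_pairs):
    """Per-base mismatch rate over (read, reference) pairs.

    Assumption: each read is pre-aligned to its mock-community
    reference and has the same length; indels are not modeled.
    """
    mismatches = 0
    total = 0
    for read, ref in read_pairs:
        if len(read) != len(ref):
            raise ValueError("sequences must be pre-aligned to equal length")
        mismatches += sum(a != b for a, b in zip(read, ref))
        total += len(ref)
    return mismatches / total if total else 0.0


def consensus_taxonomy(lineages, min_fraction=0.5):
    """Majority-consensus lineage for the sequences of one OTU.

    Walks the ranks from the top and keeps a name only while more
    than `min_fraction` of all sequences in the OTU agree on it.
    """
    total = len(lineages)
    current = list(lineages)
    consensus = []
    depth = 0
    while current:
        # Count the names at this rank among sequences still in agreement.
        names = Counter(lin[depth] for lin in current if len(lin) > depth)
        if not names:
            break
        name, count = names.most_common(1)[0]
        if count / total <= min_fraction:
            break
        consensus.append(name)
        current = [lin for lin in current if len(lin) > depth and lin[depth] == name]
        depth += 1
    return consensus


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    pairs = [("ACGT", "ACGA"), ("GGCC", "GGCC")]
    otu = [
        ("Bacteria", "Firmicutes", "Clostridia"),
        ("Bacteria", "Firmicutes", "Clostridia"),
        ("Bacteria", "Firmicutes", "Bacilli"),
        ("Bacteria", "Bacteroidetes", "Bacteroidia"),
    ]
    print(error_rate(pairs))
    print(consensus_taxonomy(otu))
```

In the toy OTU, only half of the sequences agree at the class level, so the consensus stops at the phylum. Raising `min_fraction` makes the assignment more conservative.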