Margaret D Weinroth, Aeriel D Belk, Chris Dean, Noelle Noyes, Dana K Dittoe, Michael J Rothrock, Steven C Ricke, Phillip R Myer, Madison T Henniger, Gustavo A Ramírez, Brian B Oakley, Katie Lynn Summers, Asha M Miles, Taylor B Ault-Seay, Zhongtang Yu, Jessica L Metcalf, James E Wells.
Abstract
Microbiome studies in animal science using 16S rRNA gene sequencing have become increasingly common in recent years as sequencing costs continue to fall and bioinformatic tools become more powerful and user-friendly. The combination of molecular biology, microbiology, microbial ecology, computer science, and bioinformatics (in addition to the traditional considerations when conducting an animal science study) can make microbiome studies intimidating because of the intersection of so many different fields. The objective of this review is to serve as a jumping-off point for animal scientists less familiar with 16S rRNA gene sequencing and analysis, and to raise common issues and concerns that arise when planning an animal microbiome study, from design through analysis. This review includes an overview of 16S rRNA gene sequencing, its advantages, and its limitations; experimental design considerations such as study design, sample size, sample pooling, and sample locations; wet lab considerations such as field handling, microbial cell lysis, low biomass samples, library preparation, and sequencing controls; and computational considerations such as identification of contamination, accounting for uneven sequencing depth, constructing diversity metrics, assigning taxonomy, differential abundance testing, and, finally, data availability. In addition to general considerations, we highlight some special considerations by species and sample type.
Keywords: 16S rRNA gene; amplicon sequencing; bacteriome; bioinformatics; microbiome
Year: 2022 PMID: 35106579 PMCID: PMC8807179 DOI: 10.1093/jas/skab346
Source DB: PubMed Journal: J Anim Sci ISSN: 0021-8812 Impact factor: 3.159
Glossary of commonly used microbiome terms
| Term | Definition |
|---|---|
| 16S rRNA gene | Gene encoding the RNA component of the 30S subunit of the prokaryotic ribosome; present in all bacteria and archaea |
| Alpha diversity | The diversity within a single sample, used to evaluate the number of different species (usually represented by the number of ASVs) and, for some metrics, how evenly they are distributed |
| Amplicon | The fragment of DNA resulting from a primer set after amplification using PCR |
| ASV | Amplicon Sequence Variant: individual sequence variants differing by as little as one nucleotide with no fixed dissimilarity threshold |
| Barcoding | Unique DNA sequences attached to broad-range primers before amplification. These barcodes allow different samples to be pooled and sequenced together in the same run and later separated during analysis (see Demultiplexing) |
| Beta diversity | The diversity (dissimilarity) between samples, usually expressed as a distance matrix |
| Demultiplexing | Separation of sequencing reads from a sequenced pooled library by unique barcodes and assignment to the corresponding samples |
| Evenness | Balance of the features (ASVs, species, etc.) within a sample |
| Extraction Controls | Blank or non-DNA samples (such as an empty sponge) added to a study to assess background laboratory contamination (see also library controls and NTC) |
| Feature Table | Also known as a count table (or an OTU table when the features are OTUs). A matrix containing the number of sequences counted for each feature (most commonly an ASV or OTU) in each sample |
| GUI | Graphical User Interface: Computer program that allows users to “point-and-click” as opposed to the command line |
| HPC | High-performance computing cluster: a computing resource more powerful than a local system; many universities provide shared HPC clusters for computationally intensive jobs |
| Library Controls | Controls included with PCR libraries to assess primer performance and contamination (see NTC) |
| Library pooling | Combining barcoded DNA from multiple samples during library preparation into one pooled sample of DNA for sequencing. Individual identity is maintained through barcoding |
| Long-read | Sequencing reads 5 kb or longer, most commonly generated on a PacBio or Nanopore sequencer |
| Metadata | Data describing the samples and the biological data collected, providing context for analysis and interpretation |
| Metagenome | Refers to all the genomes represented in a biological mixture |
| Mock Community | A bacterial mixture (internally generated or commercially available) with known proportions of bacteria, used to assess sequencing quality and to act as a positive control |
| NTC | No-template controls: Controls included with PCR libraries to assess primer performance and contamination (see Library control) |
| Normalization | Transformation of raw read counts to account for uneven numbers of reads among samples; usually, the ASV counts are multiplied by a value or proportion |
| OTU | Operational Taxonomic Unit: clusters of sequencing reads that differ by less than a fixed dissimilarity threshold (usually 3%); see also ASV |
| Paired-end sequencing | A DNA fragment is sequenced from both ends (each read usually 100 to 300 bp long) |
| Phylogenetic trees | Trees representing the evolutionary relationships among the sequences in a sample; they can be constructed de novo from only the sequences in a dataset or by comparison with a reference tree |
| Pipeline | A collection of tools, programs, and other code run in succession to produce results (common pipelines include QIIME2, Mothur, and RCP) |
| Rarefying | Randomly subsampling the reads assigned to ASVs or OTUs within each sample, without replacement, to a preselected depth |
| Raw reads | Number of reads generated from each sample; because sequencing output is uneven, this number differs across samples, so normalization is needed |
| Relative abundance | Percentage of the total community attributed to one taxon (e.g., a phylum or species) relative to the other features in the community |
| Richness | Number of different species within a sample, regardless of how they are distributed |
| Sample pooling | Combination of raw sample material (such as equal amounts of rumen fluid) or DNA from multiple samples; not to be confused with library pooling, because here individual identity is not maintained |
| Short read | Sequencing reads 75 to 300 bp long, most commonly generated on an Illumina sequencer |
| Shotgun metagenomics | Sequencing of all DNA in a mixed microbial sample after random fragmentation. Differs from the 16S amplicon approach in that it does not amplify one target but sequences fragments from anywhere in the genomes |
| Single-end sequencing | A fragment is sequenced from one end only (reads usually ~75 to 100 bp long) |
| Taxonomy | The identification and classification of each microorganism (represented by an ASV) present in the community; distinct from phylogeny, which represents the evolutionary relatedness of the ASVs |
| V1 to V9 | The nine hypervariable regions of the 16S rRNA gene |
| V4 | A common hypervariable region for 16S studies, also the target for the Earth Microbiome Project |
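The Feature Table, Normalization, and Relative abundance entries describe a simple transformation from raw counts to proportions. As a minimal sketch, assuming a feature table held in memory as a Python dictionary (sample ID mapped to per-ASV read counts; the sample and ASV names are hypothetical), converting counts to relative abundance could look like this:

```python
from typing import Dict

# Hypothetical in-memory feature table: sample ID -> {ASV ID -> read count}.
FeatureTable = Dict[str, Dict[str, int]]


def relative_abundance(table: FeatureTable) -> Dict[str, Dict[str, float]]:
    """Convert raw read counts to percentages of each sample's total reads."""
    result: Dict[str, Dict[str, float]] = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        if total == 0:
            # A sample with no reads has no defined composition; leave it empty.
            result[sample] = {}
            continue
        result[sample] = {asv: 100.0 * n / total for asv, n in counts.items()}
    return result


# Two hypothetical samples; cow_2 was sequenced ten times more deeply.
table = {
    "cow_1": {"ASV_1": 600, "ASV_2": 300, "ASV_3": 100},
    "cow_2": {"ASV_1": 6000, "ASV_2": 3000, "ASV_3": 1000},
}
print(relative_abundance(table))
```

Both hypothetical samples end up with the same composition (60%, 30%, 10%) even though their raw read counts differ tenfold, which is why raw counts are not compared directly across samples. Proportions are only one way of normalizing; which method suits a given study is a separate design decision.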
Figure 1. Overview of considerations when conducting a 16S rRNA gene sequencing study. Created with Biorender.com.
Figure 2. Illustration of conserved and variable regions of the 16S rRNA gene. Hypervariable regions are labeled in blue with conserved regions indicated by low entropy. In red, four commonly used primer pairs are highlighted. The figure was made by using Shannon entropy data generated from Johnson et al. (2019) (https://github.com/TheJacksonLaboratory/weinstock_full_length_16s) and ggplot2 in R (v. 4.0.2).
Figure 3. The common types of mechanical and nonmechanical lysis procedures utilized in animal science research applications. Created with Biorender.com.
Figure 4. Illustration of considerations for diversity analysis. (A) Example of differences in sample composition based on sampling depth showing that different sampling depths between samples within an experiment can lead to false differences in diversity. This demonstrates the importance of using a normalization method before diversity analysis. (B) Illustration of communities that represent different features included in diversity metrics, specifically the relationship between richness and evenness in how diversity is calculated. (C) Demonstration of the differences in alpha and beta diversity. Alpha diversity represents the diversity within a sample and could be similar even in samples with different taxonomic compositions. Beta diversity describes the differences between samples and can only be calculated by comparing communities. This also demonstrates how samples can have similar alpha diversities but different beta dissimilarities.
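Panel A of Figure 4 can be reproduced numerically. The sketch below assumes a hypothetical community of 5 abundant and 50 rare ASVs, simulates shallow sequencing by randomly drawing 200 reads from it, and then rarefies (random subsampling without replacement, as in the glossary) the deep sample to the same depth before comparing observed features. The function names and the `seed` parameter are illustrative assumptions, not part of any pipeline's interface.

```python
import random
from collections import Counter
from typing import Dict

Counts = Dict[str, int]  # hypothetical: ASV ID -> read count for one sample


def rarefy(counts: Counts, depth: int, seed: int = 0) -> Counts:
    """Randomly subsample reads without replacement to a fixed depth."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds the sample's {total} reads")
    # Expand counts into one list entry per read, then draw `depth` of them.
    reads = [asv for asv, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return dict(Counter(rng.sample(reads, depth)))


def observed_features(counts: Counts) -> int:
    """Richness: number of ASVs with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


# Hypothetical community: 5 abundant ASVs and 50 rare ASVs (10,500 reads total).
community = {f"ASV_{i}": 2000 for i in range(5)}
community.update({f"ASV_rare_{i}": 10 for i in range(50)})

deep = community                          # sequenced deeply: every read kept
shallow = rarefy(community, 200, seed=1)  # same community, only 200 reads

print("before rarefying:", observed_features(deep), observed_features(shallow))

# Rarefy the deep sample to the shallower depth before comparing richness.
deep_rarefied = rarefy(deep, 200, seed=2)
print("after rarefying:", observed_features(deep_rarefied), observed_features(shallow))
```

The shallow sample reports fewer observed features than the deep one even though both come from the same community; once both are at 200 reads, the difference driven purely by depth is removed, although random subsampling still adds some noise. Expanding counts into a list of individual reads is memory-hungry and is done here only for clarity.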
Summary of the classifications and features of commonly used alpha and beta diversity metrics¹
| Metric | Alpha or Beta | Richness | Evenness | Phylogenetic |
|---|---|---|---|---|
| Observed features | Alpha | X | | |
| Pielou’s Evenness | Alpha | | X | |
| Shannon’s Index | Alpha | X | X | |
| Faith’s Phylogenetic Diversity | Alpha | X | | X |
| Jaccard’s Distance | Beta | X | | |
| Bray–Curtis Distance | Beta | | X | |
| Unweighted UniFrac | Beta | X | | X |
| Weighted UniFrac | Beta | X | X | X |
¹ X indicates the metric includes this feature.
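The non-phylogenetic metrics in the table can be sketched in a few lines each. This is a minimal illustration under stated assumptions, not any pipeline's implementation: samples are hypothetical dictionaries of ASV read counts, Shannon's index uses the natural logarithm (some software uses base 2, which rescales the value), and Faith's Phylogenetic Diversity and the UniFrac distances are omitted because they require a phylogenetic tree.

```python
import math
from typing import Dict

Counts = Dict[str, int]  # hypothetical: ASV ID -> read count for one sample


def observed_features(c: Counts) -> int:
    """Richness: number of ASVs with at least one read."""
    return sum(1 for n in c.values() if n > 0)


def shannon(c: Counts) -> float:
    """Shannon's index H = -sum(p_i * ln p_i), natural log assumed."""
    total = sum(c.values())
    return -sum((n / total) * math.log(n / total) for n in c.values() if n > 0)


def pielou(c: Counts) -> float:
    """Pielou's evenness J = H / ln(S); undefined when S <= 1."""
    s = observed_features(c)
    if s <= 1:
        raise ValueError("Pielou's evenness needs at least two observed ASVs")
    return shannon(c) / math.log(s)


def jaccard_distance(a: Counts, b: Counts) -> float:
    """1 - |shared ASVs| / |all ASVs|, using presence/absence only."""
    pa = {asv for asv, n in a.items() if n > 0}
    pb = {asv for asv, n in b.items() if n > 0}
    union = pa | pb
    if not union:
        return 0.0
    return 1.0 - len(pa & pb) / len(union)


def bray_curtis(a: Counts, b: Counts) -> float:
    """sum|a_i - b_i| / sum(a_i + b_i) over ASVs found in either sample."""
    asvs = set(a) | set(b)
    num = sum(abs(a.get(x, 0) - b.get(x, 0)) for x in asvs)
    den = sum(a.get(x, 0) + b.get(x, 0) for x in asvs)
    return num / den if den else 0.0


# Hypothetical samples: A and B share no ASVs; A and C share ASVs but differ in abundance.
sample_a = {"ASV_1": 50, "ASV_2": 50}
sample_b = {"ASV_3": 50, "ASV_4": 50}
sample_c = {"ASV_1": 90, "ASV_2": 10}

print(shannon(sample_a), shannon(sample_b))  # equal: ln 2 for both
print(jaccard_distance(sample_a, sample_b), bray_curtis(sample_a, sample_b))
print(jaccard_distance(sample_a, sample_c), bray_curtis(sample_a, sample_c))
print(pielou(sample_a), pielou(sample_c))
```

Samples A and B share no ASVs yet have identical Shannon's index values, echoing Figure 4C, while samples A and C contain the same ASVs (Jaccard's distance 0) but differ in abundance (Bray–Curtis distance 0.4). This is the practical difference between presence/absence and abundance-aware metrics in the table. For real studies, established implementations in the pipelines named in the glossary are preferable to hand-written functions.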