Dorothy Kim¹, Casey E Hofstaedter¹, Chunyu Zhao¹, Lisa Mattei¹, Ceylan Tanes¹, Erik Clarke², Abigail Lauder², Scott Sherrill-Mix², Christel Chehoud², Judith Kelsen¹, Máire Conrad¹, Ronald G Collman³, Robert Baldassano¹, Frederic D Bushman², Kyle Bittinger⁴.
Abstract
Research on the human microbiome has yielded numerous insights into health and disease, but it has also produced a wealth of experimental artifacts. Here, we present suggestions for optimizing experimental design and avoiding known pitfalls, organized in the order in which studies are typically carried out. We first review best practices in experimental design and introduce common confounders such as age, diet, antibiotic use, pet ownership, longitudinal instability, and microbial sharing during cohousing in animal studies. Because samples typically need to be stored, we provide data on best practices for several sample types. We then discuss the design and analysis of positive and negative controls, which should always be run alongside experimental samples. We introduce a convenient set of non-biological DNA sequences that can be useful as positive controls for high-volume analysis. Careful analysis of negative and positive controls is particularly important in studies of samples with low microbial biomass, where contamination can comprise most or all of a sample. Lastly, we summarize approaches to enhancing experimental robustness through careful control of multiple comparisons and through comparison of discovery and validation cohorts. We hope the experimental tactics summarized here will help researchers in this exciting field advance their studies efficiently while avoiding errors.
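The abstract mentions careful control of multiple comparisons. One widely used option (not necessarily the procedure the authors recommend) is the Benjamini-Hochberg false discovery rate correction, which is relevant when many taxa are tested at once. The minimal sketch below computes BH-adjusted p-values in plain Python; the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of adjusted values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = pvalues[i] * m / rank
        running_min = min(running_min, q)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-taxon p-values from a differential abundance test
pvals = [0.01, 0.04, 0.03, 0.005]
qvals = benjamini_hochberg(pvals)
significant = [q <= 0.05 for q in qvals]
```

In practice, a maintained implementation (for example, a statistics library's multiple-testing routine) is preferable to a hand-rolled one; the sketch only shows the rank-and-scale logic.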
Keywords: 16S rRNA gene; Best practices; Environmental contamination; Metagenomics; Methods; Shotgun metagenomics; Study design
Year: 2017 PMID: 28476139 PMCID: PMC5420141 DOI: 10.1186/s40168-017-0267-5
Source DB: PubMed Journal: Microbiome ISSN: 2049-2618 Impact factor: 14.650
Fig. 1. Example of cage effects dominating a mouse study of fungal communities. Fungal lineages in the murine gut were inferred from ITS rRNA gene sequencing of fecal pellets [87]. The heat maps summarize taxonomic assignments derived from the sequence data. The color scale at the right indicates the proportion of each lineage; white indicates not detected. In this study, cage effects outweighed treatment effects. The three conditions studied were continuous exposure to antibiotics (Condition 1), short-term exposure to antibiotics (Condition 2), and no exposure to antibiotics (Condition 3). For details, see [87].
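Because mice in the same cage share microbes, a condition housed in a single cage cannot separate a cage effect from a treatment effect. The sketch below is a hypothetical design-stage check, not the analysis used in [87]: it assumes each sample is described by a (mouse, cage, condition) tuple and lists conditions represented by fewer than `min_cages` cages.

```python
from collections import defaultdict


def cages_per_condition(samples):
    """Map each condition to the set of cages that house it.

    samples: iterable of (mouse_id, cage, condition) tuples (hypothetical layout).
    """
    by_condition = defaultdict(set)
    for _mouse, cage, condition in samples:
        by_condition[condition].add(cage)
    return dict(by_condition)


def confounded_conditions(samples, min_cages=2):
    """Conditions housed in fewer than min_cages cages; for these, cage and
    treatment effects cannot be distinguished."""
    return sorted(
        condition
        for condition, cages in cages_per_condition(samples).items()
        if len(cages) < min_cages
    )


# Hypothetical layout: Conditions 1 and 3 each occupy only one cage
samples = [
    ("m1", "cage_A", "Condition 1"),
    ("m2", "cage_A", "Condition 1"),
    ("m3", "cage_B", "Condition 2"),
    ("m4", "cage_C", "Condition 2"),
    ("m5", "cage_D", "Condition 3"),
]
flagged = confounded_conditions(samples)
```

Spreading each condition across several cages lets cage be modeled as a separate (for example, random) effect rather than being absorbed into the treatment comparison.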
Fig. 2. Effects of sample storage methods on community structure inferred for oral swabs. Oral swab samples were collected from three human individuals, and DNA was extracted. DNA was amplified using 16S rRNA gene primers targeting the V1-V2 region and then sequenced on the Illumina platform using our standard procedures [88]. Unweighted UniFrac [129] was used to compute distances between all pairs of samples, and the results were displayed using principal coordinate analysis (PCoA). a Samples from each of the three subjects are color coded (red, blue, and green). b Nine storage conditions were compared, indicated by different colors; the key to storage conditions is at the right.
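Computing unweighted UniFrac requires a phylogenetic tree, so the sketch below covers only the ordination step: classical PCoA applied to an already computed, symmetric distance matrix. It is a minimal illustration assuming NumPy is available; `pcoa` is a hypothetical helper, not the pipeline from [88].

```python
import numpy as np


def pcoa(dist, n_axes=2):
    """Classical principal coordinate analysis of a symmetric distance matrix.

    Returns (coordinates, fraction of positive-eigenvalue variance per axis).
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Gower double-centering: B = -1/2 * J D^2 J, with J = I - (1/n) 11^T
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # eigh is for symmetric matrices and returns eigenvalues in ascending order
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Keep only axes with positive eigenvalues; non-Euclidean distances can give negatives
    k = min(n_axes, int(np.sum(vals > 1e-12)))
    coords = vecs[:, :k] * np.sqrt(vals[:k])
    explained = vals[:k] / vals[vals > 0].sum()
    return coords, explained
```

For example, three samples whose pairwise distances come from points at 0, 1, and 3 on a line collapse onto a single axis with centered coordinates of magnitude 4/3, 1/3, and 5/3 (the sign of a PCoA axis is arbitrary).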
Fig. 3. Wrestling with kit contamination: similar bacterial composition in placental samples and negative controls. Relative abundances of bacterial lineages were inferred from 16S rRNA V1-V2 marker gene sequences [22]. Samples studied included negative controls, fetal side (FS) placental swabs, maternal side (MS) placental swabs, saliva, and vaginal swabs. Replicates of each sample were extracted using two different kits; the kit type is indicated above each panel. Operating room (OR) air swabs were waved in the air at the time of sample collection for use as negative controls. Saliva samples, which are high in microbial biomass, showed similar compositions with both extraction kits, whereas placental samples resembled the negative controls processed with the same kit.
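One way to quantify "placental samples resemble the kit-specific negative controls" is to convert read counts to relative abundances and compute a dissimilarity between each sample and the negative control from the same kit. The sketch below uses Bray-Curtis dissimilarity as an illustrative choice (the figure itself shows relative abundances, not this metric); the taxon names and counts are hypothetical toy values.

```python
def relative_abundance(counts):
    """Convert raw read counts {taxon: reads} to proportions summing to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles.

    0 means identical composition; 1 means no shared taxa.
    """
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    den = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0


# Hypothetical toy profiles: a kit blank, a low-biomass sample, a high-biomass sample
kit_blank = relative_abundance({"Taxon_A": 90, "Taxon_B": 10})
low_biomass = relative_abundance({"Taxon_A": 50, "Taxon_B": 50})
high_biomass = relative_abundance({"Taxon_C": 100})

d_low = bray_curtis(low_biomass, kit_blank)
d_high = bray_curtis(high_biomass, kit_blank)
```

A low-biomass sample that sits close to its kit-matched blank, as `low_biomass` does here relative to `high_biomass`, is a warning sign that much of its signal may come from reagents.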
Fig. 4. Analysis of three negative control sample types reveals contaminating taxa. Data for negative controls were acquired by 16S rRNA V1-V2 marker gene sequencing on the Illumina MiSeq platform. Data from 11 experiments were pooled. a Comparison of average read counts: experimental samples averaged 137,243 reads, whereas negative control samples averaged 6613 reads. b Heat map summary of bacterial lineages present in negative control samples. Different OTUs are present in DNA extraction controls (“blank extraction” and “blank swab”) and library preparation controls (“library blank”) collected over multiple sequencing runs.
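The read counts in panel a imply that negative controls yielded roughly 4.8% of the average experimental read depth (6613 / 137,243), which is small but not negligible for low-biomass samples. To list candidate contaminants such as those in panel b, one simple and purely hypothetical heuristic flags taxa that are at least as prevalent in negative controls as in experimental samples. It is loosely in the spirit of prevalence-based contaminant screening (as in the decontam R package) but is not that tool's statistical method; the sample data are toy values.

```python
def prevalence(samples, taxon):
    """Fraction of samples ({taxon: reads}) in which the taxon has at least one read."""
    return sum(1 for s in samples if s.get(taxon, 0) > 0) / len(samples)


def flag_contaminants(controls, experimental, ratio=1.0):
    """Hypothetical heuristic: flag taxa detected in negative controls whose
    control prevalence is >= ratio * their prevalence in experimental samples."""
    taxa = set().union(*controls, *experimental)
    return sorted(
        t
        for t in taxa
        if prevalence(controls, t) > 0
        and prevalence(controls, t) >= ratio * prevalence(experimental, t)
    )


# Mean read depths from panel a (negative controls vs. experimental samples)
blank_fraction = 6613 / 137243

# Hypothetical toy data: per-sample read counts keyed by taxon
controls = [{"A": 5, "B": 1}, {"A": 3}]
experimental = [{"A": 2, "C": 100}, {"C": 80}, {"C": 90, "B": 0}]
candidates = flag_contaminants(controls, experimental)
```

Flagged taxa are candidates for scrutiny rather than automatic removal, since genuine community members can also appear in controls through cross-contamination between wells.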
Fig. 5. Synthetic non-biological 16S DNA as a positive control for 16S rRNA marker gene sequencing. a Diagram of the gene block design. At the top is a typical 16S rRNA gene amplicon, with binding sites for the widely used 27F and 338R primers. To generate recognizable sequences that would not occur authentically in samples, DNAs were synthesized in which the forward (27F) and reverse (338R) primer binding sites were added to archaeal DNA sequences, creating molecules not found in nature but readily analyzed using conventional pipelines. b Control sequence mixtures using the gene blocks show consistent relative abundances. Note that the eight gene blocks are annotated as five archaeal taxa. c Heat map displaying the relative abundance of control gene blocks, where each square represents one well on a 96-well plate from a typical 16S rRNA marker gene sequencing project. Positive control wells, where gene blocks were added and amplified alongside experimental samples, are marked with “x”.
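The gene block design in panel a amounts to flanking a core sequence with the forward primer site on one end and the reverse complement of the reverse primer on the other, so that both primers anneal during PCR. The sketch below assumes commonly published primer sequences (27F as AGAGTTTGATCMTGGCTCAG with the degenerate M resolved to C, and 338R as TGCTGCCTCCCGTAGGAGT); verify these against the protocol in use. The core sequence is a hypothetical placeholder, not one of the authors' archaeal gene blocks.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Commonly published primer sequences (assumption; check your own protocol)
PRIMER_27F = "AGAGTTTGATCCTGGCTCAG"  # degenerate M in 27F written as C here
PRIMER_338R = "TGCTGCCTCCCGTAGGAGT"


def reverse_complement(seq):
    """Reverse complement of an unambiguous A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def build_gene_block(core):
    """Top strand: 5'-[27F site]-[core]-[reverse complement of 338R]-3'."""
    return PRIMER_27F + core.upper() + reverse_complement(PRIMER_338R)


def has_primer_sites(seq, fwd=PRIMER_27F, rev=PRIMER_338R):
    """True if seq starts with the forward primer and ends with the
    reverse complement of the reverse primer."""
    seq = seq.upper()
    return seq.startswith(fwd) and seq.endswith(reverse_complement(rev))


# Hypothetical placeholder core; a real design would use a longer non-bacterial sequence
block = build_gene_block("acgttgca")
```

Using a core sequence that cannot occur in the samples under study is what makes the control recognizable: any reads assigned to it must have come from the spiked-in block, which also makes well-to-well cross-contamination visible as in panel c.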
Fig. 6. Contamination in shotgun metagenomic data. a Lineages observed in shotgun metagenomic sequencing of negative control samples using standard (DNeasy PowerSoil) and low-contaminant (QIAamp UCP Pathogen) kits. b Detection of Bacillus phage phi29 polymerase reads in a blank sample. Twenty-one reads from a blank sample aligned to the DNA polymerase gene (positions 1145 to 2863) of Bacillus phage phi29. The polymerase was purchased as a reagent from a commercial supplier, suggesting that the protein preparation was contaminated with cloned DNA encoding the polymerase gene used for protein overexpression.
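Counting reads like those in panel b reduces to an interval-overlap test against the gene coordinates, here taken as 1145 to 2863 and assumed to be 1-based and inclusive. The sketch below works on hypothetical (read_id, start, end) alignment tuples rather than parsing a real alignment file, and also reports the breadth of coverage, that is, the fraction of gene positions hit by at least one read.

```python
# phi29 polymerase coordinates from panel b (assumed 1-based, inclusive)
GENE_START, GENE_END = 1145, 2863


def overlapping_reads(alignments, start=GENE_START, end=GENE_END):
    """Return alignments (read_id, aln_start, aln_end) that overlap [start, end]."""
    return [a for a in alignments if a[1] <= end and a[2] >= start]


def breadth_of_coverage(alignments, start=GENE_START, end=GENE_END):
    """Fraction of gene positions covered by at least one overlapping read."""
    covered = set()
    for _read_id, aln_start, aln_end in overlapping_reads(alignments, start, end):
        # Clip each alignment to the gene interval before recording positions
        covered.update(range(max(aln_start, start), min(aln_end, end) + 1))
    return len(covered) / (end - start + 1)


# Hypothetical alignments of reads from a blank sample to the phage genome
alignments = [
    ("r1", 1000, 1150),  # overlaps the gene start
    ("r2", 2900, 3000),  # entirely after the gene
    ("r3", 1500, 1650),  # inside the gene
    ("r4", 100, 200),    # entirely before the gene
    ("r5", 2863, 2963),  # touches the last gene position
]
hits = overlapping_reads(alignments)
breadth = breadth_of_coverage(alignments)
```

Reads that pile up on the polymerase gene but not on the rest of the phage genome would fit the interpretation in the caption, namely contamination by a cloned gene rather than by intact phage.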