Jeffrey L Curtis, Sara D Adar, John R Erb-Downward, Nicole R Falkowski, Jennifer C D'Souza, Lisa M McCloskey, Roderick A McDonald, Christopher A Brown, Kerby Shedden, Robert P Dickson, Christine M Freeman, Kathleen A Stringer, Betsy Foxman, Gary B Huffnagle.
Abstract
The bacterial microbiome of human body sites previously considered sterile remains highly controversial because it can be challenging to isolate signal from noise when low-biomass samples are being analyzed. We tested the hypothesis that stochastic sequencing noise, separable from reagent contamination, is generated during sequencing on the Illumina MiSeq platform when DNA input is below a critical threshold. We first purified DNA from serial dilutions of Pseudomonas aeruginosa and from negative controls using three DNA purification kits, quantified input using droplet digital PCR, and then sequenced the 16S rRNA gene in four technical replicates. This process identified a reproducible contaminant signal that was separable from an irreproducible stochastic noise, which emerged as the bacterial biomass of samples decreased. This approach was then applied to authentic respiratory samples from healthy individuals (n = 22) that ranged from high to ultralow bacterial biomass. Using oral rinse, bronchoalveolar lavage (BAL) fluid, and exhaled breath condensate (EBC) samples and matched controls, we were able to demonstrate (i) that stochastic noise dominates sequencing in real-world low-bacterial-biomass samples that contain fewer than 10^4 copies of the 16S rRNA gene per sample, (ii) that critical examination of the community composition of technical replicates can be used to separate signal from noise, and (iii) that EBC is an irreproducible modality for sampling the microbiome of the lower airways. We anticipate that these results, combined with suggested methods for identifying and dealing with noisy communities, will facilitate increased reproducibility while simultaneously permitting characterization of potentially important low-biomass communities.
IMPORTANCE
DNA contamination from external sources (reagents, environment, operator, etc.) has long been assumed to be the main cause of spurious signals that appear under low-bacterial-biomass conditions. Here, we demonstrate that contamination can be separated from another, random signal generated during low-biomass-sample sequencing. This stochastic noise is not reproduced between technical replicates; however, results for any one replicate taken alone could look like a microbial community different from the controls. Using this information, we investigated respiratory samples from healthy humans and determined the narrow range of bacterial biomass where samples transition from producing reproducible microbial sequences to ones dominated by noise. We present a rigorous approach to studies involving low-bacterial-biomass samples to detect this source of noise and provide a framework for deciding if a sample is likely to be dominated by noise. We anticipate that this work will facilitate increased reproducibility in the characterization of potentially important low-biomass communities.
Keywords: 16S rRNA gene; contamination; exhaled breath condensate; low biomass; lung microbiome; next-generation sequencing; sequencing noise
Year: 2020 PMID: 32518181 PMCID: PMC7373192 DOI: 10.1128/mBio.00258-20
Source DB: PubMed Journal: mBio Impact factor: 7.867
FIG 1 Serial dilutions of P. aeruginosa DNA purified using three separate DNA isolation kits and sequenced in quadruplicate: effects of low biomass on results. (A) An idealized model of contamination effects on a single sample purified using 3 separate DNA isolation kits and 3 technical replicates for each of those kits. Contamination within each kit is assumed to be 100% different from that of reagents in all other kits. The distribution of a hypothetical similarity score is plotted along the x axis for similarity between technical replicates (in red) and between kits (in blue). Where overlap occurs, the curves appear purple. The first column depicts the condition of high concentrations of DNA, the second column depicts low concentrations of DNA where kit or reagent contamination is dominant, and the third column depicts low concentrations of DNA where random noise dominates. (B) Heat map of the top 100 OTUs (horizontal axis) broken down by dilution (vertical axis) and grouped using complete linkage clustering. Note the emergence of increasing numbers and diversity of low-abundance reads with increasing dilution. (C) Kernel density estimates for the intrareplicate (within-kit) and interreplicate (between-kit) Bray-Curtis distance for each dilution series. (D) Heat map showing the individual results from reagent controls with technical replicates (n = 3/sample). Samples are grouped using complete linkage clustering. The DNA isolation kit is indicated by the color displayed to the left of the heat map. (E) Graph depicting the interreplicate Bray-Curtis distance from technical replicates of reagent controls; bars are colored by kit as in panel D.
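The within-kit versus between-kit comparison in panel C rests on the Bray-Curtis distance between abundance profiles. The following is a minimal sketch of that comparison, not the authors' pipeline: the function names and the OTU counts are hypothetical, and it uses the standard definition of Bray-Curtis dissimilarity (sum of absolute differences divided by the sum of all counts).

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two equal-length abundance vectors."""
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def within_group_distances(profiles):
    """All pairwise distances among replicates of one kit."""
    return [bray_curtis(p, q) for p, q in combinations(profiles, 2)]


def between_group_distances(group_a, group_b):
    """All distances between replicates of two different kits."""
    return [bray_curtis(p, q) for p in group_a for q in group_b]


# Hypothetical OTU counts (illustrative only, not data from the study).
# Each inner list is one technical replicate; columns are OTUs.
kit_a = [[90, 10, 0, 0], [88, 12, 0, 0], [91, 9, 0, 0]]
kit_b = [[85, 0, 15, 0], [87, 0, 13, 0], [84, 0, 16, 0]]

within_a = within_group_distances(kit_a)
between_ab = between_group_distances(kit_a, kit_b)

print("within kit A:", within_a)
print("between kits A and B:", between_ab)
```

In this toy input the replicates of one kit agree closely while the two kits differ, which corresponds to the contamination-dominated column of panel A. When random noise dominates, the within-kit distances would be expected to rise toward the between-kit distances.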
Demographics and clinical data of human research participants
| Sex | Age (yrs) | Race | Pack-yrs | Smoking status | FEV1, liters (% predicted) | FVC, liters (% predicted) | FEV1/FVC |
|---|---|---|---|---|---|---|---|
| Female | 25 | White | 0 | Never | 3.37 (96) | 3.84 (95) | 0.88 |
| Male | 68 | White | 0 | Never | 4.96 (109) | 6.20 (114) | 0.80 |
| Female | 61 | White | 0 | Never | 2.36 (89) | 2.55 (75) | 0.93 |
| Female | 73 | White | 0 | Never | 1.94 (93) | 2.21 (84) | 0.88 |
| Female | 59 | White | 0 | Never | 2.24 (94) | 2.54 (98) | 0.88 |
| Male | 22 | White | 0 | Never | 4.62 (100) | 5.92 (106) | 0.78 |
| Female | 24 | Black | 0 | Never | 2.72 (82) | 2.98 (82) | 0.91 |
| Male | 53 | White | 0 | Never | 3.95 (111) | 4.17 (94) | 0.95 |
| Female | 55 | Black | 20 | Former | 2.21 (82) | 2.82 (84) | 0.78 |
| Female | 55 | Black | 20 | Former | 2.30 (91) | 2.51 (91) | 0.92 |
| Female | 71 | White | 21 | Former | 1.72 (85) | 1.97 (79) | 0.88 |
| Male | 58 | White | 48 | Former | 3.30 (98) | 4.16 (98) | 0.79 |
| Male | 43 | White | 22.5 | Current | 4.41 (119) | 5.34 (116) | 0.83 |
| Female | 59 | White | 10.5 | Current | 2.59 (81) | 3.00 (86) | 0.86 |
| Male | 58 | White | 28.5 | Current | 3.50 (89) | 4.59 (95) | 0.76 |
| Male | 53 | White | 10.5 | Current | 3.02 (83) | 3.46 (77) | 0.87 |
| Male | 38 | White | 33 | Current | 4.74 (108) | 5.63 (106) | 0.84 |
| Male | 63 | White | 35 | Current | 3.12 (88) | 3.83 (87) | 0.81 |
| Male | 67 | Black | 25 | Current | 2.28 (63) | 3.56 (80) | 0.64 |
| Female | 59 | White | 20 | Current | 1.38 (53) | 2.00 (60) | 0.69 |
FEV1, forced expiratory volume in 1 s; FVC, forced vital capacity; Pack-yrs, number of packs smoked per day times number of years smoked.
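The two derived quantities in the table follow directly from the footnote definitions. A minimal sketch, with hypothetical function names; the smoking history in the example (1.5 packs per day for 15 years) is invented for illustration, since the table reports only the product:

```python
def pack_years(packs_per_day, years_smoked):
    """Pack-years: packs smoked per day times number of years smoked."""
    if packs_per_day < 0 or years_smoked < 0:
        raise ValueError("inputs must be non-negative")
    return packs_per_day * years_smoked


def fev1_fvc_ratio(fev1_liters, fvc_liters):
    """Ratio of FEV1 to FVC, both measured in liters."""
    if fvc_liters <= 0:
        raise ValueError("FVC must be positive")
    return fev1_liters / fvc_liters


# Hypothetical smoking history, chosen only to illustrate the formula.
print(pack_years(1.5, 15))

# Volumes from the first table row (FEV1 3.37 liters, FVC 3.84 liters).
print(round(fev1_fvc_ratio(3.37, 3.84), 2))
```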
FIG 2 Relationship between the number of 16S rRNA gene copies in a sample and the reproducibility of the result. (A) Intrareplicate Bray-Curtis distance by sample type. Relationship between the number of bacterial 16S rRNA gene copies in a sample and the intrareplicate Bray-Curtis distance between replicates of respiratory specimens and their individual controls. (B) Mean (± SEM) number of 16S rRNA gene copies per sample by sample type. (C) Concentration of the number of bacterial 16S rRNA gene copies in a sample plotted against the intrareplicate Bray-Curtis distance between replicates of respiratory specimens and their individual controls. Each value is the mean ± SEM.
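The summary statistic in panels B and C (mean ± SEM) and the biomass cutoff reported in the abstract can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names and example values are hypothetical, SEM is computed as the sample standard deviation divided by the square root of n, and the hard cutoff at 10^4 copies simplifies what the abstract describes as a narrow transition range.

```python
import math
from statistics import mean, stdev

# Cutoff taken from the abstract: fewer than 10^4 copies of the
# 16S rRNA gene per sample. Treating it as a hard threshold is a
# simplifying assumption made for this sketch.
NOISE_THRESHOLD_COPIES = 1e4


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of values."""
    if len(values) < 2:
        raise ValueError("need at least two values to estimate SEM")
    return mean(values), stdev(values) / math.sqrt(len(values))


def likely_noise_dominated(copies_per_sample, threshold=NOISE_THRESHOLD_COPIES):
    """Flag a sample whose 16S copy number falls below the threshold."""
    return copies_per_sample < threshold


# Hypothetical intrareplicate Bray-Curtis distances for one sample type.
distances = [1.0, 2.0, 3.0]
m, sem = mean_sem(distances)
print(f"mean = {m:.3f}, SEM = {sem:.3f}")

# Hypothetical copy numbers for two samples.
for copies in (5e3, 1e6):
    print(copies, likely_noise_dominated(copies))
```

A flag from the copy-number check is only a prompt to inspect the technical replicates; the abstract identifies replicate community composition as the means of separating signal from noise.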
FIG 3 Comparison of the representation of the lung microbiome in EBC versus CBAL. (A) Principal component analysis (PCA) graph depicting CBAL samples (red) and scope prewash controls (blue). (B) PCA graph depicting EBC (red) and EBC controls (blue). (C) 3D scatterplot of CBAL sample OTU abundances, where each replicate is plotted on a separate axis. Common signals between each replicate should appear along the diagonal of the 3D box. Drop lines anchor the points to a position in the x-y plane, whereas color reflects higher abundances along the z axis. (D) 3D scatterplot of EBC sample OTU abundances, where each replicate is plotted on a separate axis. Drop lines anchor the points to a position in the x-y plane, whereas color reflects higher abundances along the z axis. (E) Rank abundance plots of the means of replicate medians of EBC (top) compared to CBAL fluid (bottom). Plots are ordered according to the mean abundances of the CBAL samples. Insets are the sample controls (EBC control and scope prewash control, respectively) ordered by mean abundances of CBAL samples. Bars show means of replicate medians ± SEM and are colored by the phylum of the OTU.
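One plausible reading of "means of replicate medians" in panel E is: for each sample, take the per-OTU median across its technical replicates, then average those medians across samples. The sketch below implements that reading, which is an assumption, as are the function names and the toy abundances (given as integer percentages).

```python
from statistics import mean, median


def replicate_medians(replicates):
    """Per-OTU median across the technical replicates of one sample."""
    return [median(values) for values in zip(*replicates)]


def mean_of_replicate_medians(samples):
    """Per-OTU mean, across samples, of each sample's replicate medians."""
    per_sample = [replicate_medians(reps) for reps in samples]
    return [mean(values) for values in zip(*per_sample)]


def rank_order(abundances):
    """OTU indices sorted from most to least abundant."""
    return sorted(range(len(abundances)), key=lambda i: abundances[i], reverse=True)


# Hypothetical data: two samples, three replicates each, three OTUs.
samples = [
    [[50, 30, 20], [60, 30, 10], [40, 30, 30]],
    [[20, 70, 10], [20, 60, 20], [20, 80, 0]],
]

summary = mean_of_replicate_medians(samples)
order = rank_order(summary)
print("mean of replicate medians:", summary)
print("rank order of OTUs:", order)
```

Using the median within a sample damps an OTU that spikes in a single replicate, so irreproducible reads contribute little to the ranked profile.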