Jeffrey M Bender, Fan Li, Helty Adisetiyo, David Lee, Sara Zabih, Long Hung, Thomas A Wilkinson, Pia S Pannaraj, Rosemary C She, Jennifer Dien Bard, Nicole H Tobin, Grace M Aldrovandi.
Abstract
BACKGROUND: Recent advances in sequencing technologies and bioinformatics tools have allowed for large-scale microbiome studies that are rapidly advancing medical research. However, small changes in technique or analysis can significantly alter the results and lead to conflicting findings. Quantifying the technical versus biological variation expected in targeted 16S rRNA gene sequencing studies, and how this variation changes with input biomass, is critical to guide meaningful interpretation of the current literature and to plan future research.
Keywords: Accuracy; Biological variation; Biomass; Precision; Technical variation
Year: 2018 PMID: 30201048 PMCID: PMC6131952 DOI: 10.1186/s40168-018-0543-z
Source DB: PubMed Journal: Microbiome ISSN: 2049-2618 Impact factor: 14.650
Overview of 19 sequencing runs with bacterial mock positive controls, repeat stool samples, and negative controls over a 2.5-year period. All samples were processed in the same manner and run on the same sequencing machine. Run 18 includes a prospective dilution study of the mock bacterial controls. Numbers in parentheses give the sample counts initially included in the analysis, before negative-control filtering.
| Sample type | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Run 11 | Run 12 | Run 13 | Run 14 | Run 15 | Run 16 | Run 17 | Run 18 | Run 19 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bacterial mock | 5 | 7 | 2 | 19 | 4 | 3 | 3 | 2 | 2 | 3 | 12 | 1 | 13 | 4 | 14 | 105 (110) | 14 | 213 (218) | ||
| Stool | 8 | 15 | 6 | 29 | ||||||||||||||||
| Negatives | (5) | (7) | (12) | (1) | (3) | (7) | (5) | (6) | (7) | (11) | (38) | (10) | (33) | (10) | 1 (27) | (13) | 1 (27) | 2 (222) | ||
| Analysis | ||||||||||||||||||||
| Mocks over time | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ^ | ● | ● | ● | * | ● | 117 | ||
| Biological vs technical variation | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ^ | ● | ● | ● | * | ● | 146 |
| Biomass | | | | | | | | | | | | | | | | | | ● | | 105 |
● Included in the analysis
^ Excluded because this run contained only a single mock sample
* Includes only the 10 undiluted mock samples from this run
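The analysis totals in the table can be reconciled with the per-run counts. A minimal sketch of that arithmetic, assuming (from the footnotes and the Run 18 note) that the run marked ^ is the one with a single mock sample, that Run 18 contributed the 105 post-filtering mocks, and that only its 10 undiluted mocks enter the over-time analysis:

```python
# Post-filtering counts read from the table above (interpretation of footnotes is assumed).
mock_total = 213        # all bacterial mock samples, Total column
run18_mocks = 105       # Run 18 mocks, i.e. the dilution study
run18_undiluted = 10    # undiluted Run 18 mocks kept for the over-time analysis (*)
single_mock_run = 1     # the lone mock from the run excluded under footnote ^
stool_total = 29        # repeat stool samples

mocks_over_time = mock_total - run18_mocks + run18_undiluted - single_mock_run
bio_vs_technical = mocks_over_time + stool_total
biomass = run18_mocks

# Pre- vs post-filtering totals: 218 - 213, matching Run 18's 110 - 105.
removed_by_filtering = 218 - 213

print(mocks_over_time, bio_vs_technical, biomass)  # 117 146 105
```

Under these assumptions the totals match the Analysis rows: 117 mocks over time, 146 samples (117 mocks plus 29 stool samples) for biological versus technical variation, and 105 Run 18 mocks for the biomass analysis.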
Fig. 1 Bacterial mock community samples (n = 117) over time. a Taxonomic composition of bacterial mock community samples over the course of approximately 2 years, shown at the family level. Labeled boxes along the bottom denote individual sequencing runs. Only taxa with a mean relative abundance of at least 1% are shown. b Principal coordinates analysis (PCoA) of bacterial mock community samples using Bray-Curtis distances. Numbers in brackets denote the percentage of variation explained by each axis. c Bray-Curtis distances (boxplots) and intraclass correlation coefficients (ICC; line plots) stratified by sequencing run. ICC values are shown as means. d Heatmap of coefficient of variation (CV) values for individual bacterial genera across sequencing runs. Grayscale cells on the left indicate mean relative abundances for each genus (also given as percentages in parentheses).
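The three per-run metrics named in Fig. 1 can be sketched in a few lines of Python. This is a minimal illustration on hypothetical relative abundances, not the authors' pipeline: the caption does not state which ICC form or standard-deviation estimator was used, so the one-way random-effects ICC(1,1) (genera as targets, replicate samples as raters) and the sample standard deviation below are assumptions.

```python
from statistics import mean, stdev

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for profiles with no shared taxa."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (needs >= 2 values and a nonzero mean)."""
    return stdev(values) / mean(values)

def icc_oneway(rows):
    """One-way random-effects ICC(1,1) (assumed form).

    rows: one list per target (here, a genus) holding its value in every
    replicate sample; all rows must have the same length k >= 2.
    """
    n, k = len(rows), len(rows[0])
    row_means = [mean(r) for r in rows]
    grand_mean = mean(row_means)  # equals the overall mean because the design is balanced
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((v - m) ** 2 for r, m in zip(rows, row_means) for v in r) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical relative abundances of four genera in three replicate mock samples.
replicates = [
    [0.50, 0.30, 0.15, 0.05],
    [0.48, 0.32, 0.14, 0.06],
    [0.52, 0.28, 0.16, 0.04],
]
bc_1_2 = bray_curtis(replicates[0], replicates[1])
by_genus = [list(col) for col in zip(*replicates)]  # transpose: one row per genus
cvs = [coefficient_of_variation(g) for g in by_genus]
icc = icc_oneway(by_genus)
print(round(bc_1_2, 3), [round(c, 2) for c in cvs], round(icc, 3))
# prints: 0.03 [0.04, 0.07, 0.07, 0.2] 0.994
```

With these toy values the least abundant genus has the highest CV (0.2 versus 0.04 for the most abundant), the kind of per-genus contrast a CV heatmap such as panel d displays, while the ICC stays close to 1 because the replicates agree closely.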
Fig. 2 Overview of QC-filtered samples (n = 146). a Principal coordinates analysis (PCoA) on Bray-Curtis distances. Numbers in brackets denote the percentage of variation explained. b Boxplots of Bray-Curtis distances for bacterial mock and stool samples. Biological variation is measured among three samples from the same individual; technical variation is measured for the same sample across multiple sequencing runs.
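Fig. 2b contrasts two kinds of replicate sets. A minimal sketch, assuming each set is summarized by all pairwise Bray-Curtis distances among its members; the profiles are hypothetical and only illustrate the comparison, they are not the paper's data.

```python
from itertools import combinations
from statistics import median

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance profiles over the same taxa."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator

def within_group_distances(profiles):
    """All pairwise Bray-Curtis distances among the profiles in one replicate set."""
    return [bray_curtis(a, b) for a, b in combinations(profiles, 2)]

# Hypothetical genus-level relative abundances.
technical = [  # the same stool sample sequenced on three runs
    [0.40, 0.35, 0.20, 0.05],
    [0.42, 0.33, 0.20, 0.05],
    [0.39, 0.36, 0.19, 0.06],
]
biological = [  # three stool samples from the same individual
    [0.40, 0.35, 0.20, 0.05],
    [0.30, 0.45, 0.15, 0.10],
    [0.50, 0.25, 0.22, 0.03],
]
technical_d = within_group_distances(technical)
biological_d = within_group_distances(biological)
print("technical median:", round(median(technical_d), 3))
print("biological median:", round(median(biological_d), 3))
```

Each distance list is what one box in a plot like panel b would summarize; with three replicates there are three pairwise distances per set.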
Fig. 3 Variation as a function of input biomass. a Spearman correlation between expected and calculated 16S rRNA gene copies per microliter; values are log10-transformed. b Bray-Curtis distances (boxplots) and intraclass correlation coefficients (ICC; line plots) stratified by dilution constant (e.g., 1:1 is the undiluted stock and 1:1000 is a 1000-fold dilution). ICC values are shown as means. c Heatmap of coefficient of variation (CV) values for individual genera stratified by dilution constant. Grayscale cells on the left indicate mean relative abundances for each genus (also given as percentages in parentheses). d Shannon diversity as a function of dilution constant.
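Two quantities in Fig. 3 are simple to compute: the expected copies per microliter at each dilution constant, and Shannon diversity. A minimal sketch, assuming a hypothetical stock concentration (`STOCK`, not given in this record) and the natural-log form of Shannon's index; some pipelines use log2 instead.

```python
import math

def expected_copies(stock_copies_per_ul, dilution_factor):
    """Expected 16S rRNA gene copies/uL after a 1:dilution_factor dilution of the stock."""
    return stock_copies_per_ul / dilution_factor

def shannon(abundances):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero abundance."""
    total = sum(abundances)
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)

STOCK = 1e6  # hypothetical stock concentration (copies/uL)
for factor in (1, 10, 100, 1000):
    expected = expected_copies(STOCK, factor)
    print(f"1:{factor}  expected log10 copies/uL = {math.log10(expected):.1f}")

even = shannon([25, 25, 25, 25])  # four equally abundant taxa: H = ln 4
skewed = shannon([97, 1, 1, 1])   # same four taxa, one dominant: lower H
print(round(even, 3), round(skewed, 3))
```

Each 10-fold dilution lowers the expected log10 concentration by exactly 1, which is why panel a compares expected and calculated values on a log10 scale. Shannon diversity rises with both the number of taxa and the evenness of their proportions.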
Fig. 4 Modeling variation as a function of biomass and relative abundance. Spearman correlation between the standard deviation of relative abundance (y-axis) and (a) 16S rRNA gene copies per microliter or (b) mean relative abundance. All values are log10-transformed.
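Spearman's rho is the Pearson correlation of the ranks. Because log10 is strictly increasing, log-transforming positive values leaves their ranks, and therefore rho, unchanged; the transform matters for plotting and for the linear regression, not for the Spearman coefficient itself. A dependency-free sketch with average ranks for ties, on hypothetical data (in practice `scipy.stats.spearmanr` computes the same statistic):

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation (undefined if either input is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

copies = [10, 100, 1_000, 10_000, 100_000]  # hypothetical 16S copies/uL
sd = [2.1, 1.4, 0.9, 0.95, 0.3]             # hypothetical SD of one taxon's relative abundance
rho_raw = spearman(copies, sd)
rho_log = spearman([math.log10(c) for c in copies], [math.log10(s) for s in sd])
print(round(rho_raw, 3), round(rho_log, 3))  # -0.9 -0.9
```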
Linear regression modeling of variation versus input biomass and mean relative abundance. (a) Summary of the linear regression model; (b) predicted variation for selected values of 16S rRNA gene copies/microliter and mean relative abundance.
| a. Model term | Estimate | Standard error | t value | p value |
|---|---|---|---|---|
| (Intercept) | 1.43238 | 0.12556 | 11.408 | < 2.2E−16 |
| log10 copies/microliter | −0.2816 | 0.04905 | −5.741 | 6.93E−8 |
| log10 mean relative abundance | −0.54936 | 0.06927 | −7.931 | 1.13E−12 |
| Residual standard error | 0.5196 | | | |
| Multiple R² | 0.4496 | | | |
| Adjusted R² | 0.4407 | | | |
| F-statistic | 50.25 on 2 and 123 DF | | | |

| b. Prediction | Copies/microliter | Mean relative abundance 1% | 5% | 10% | 25% | 50% |
|---|---|---|---|---|---|---|
| Low biomass | 10 | 0.5723 | 1.7313 | 2.7888 | 5.2376 | 8.4370 |
| Medium biomass | 1,000 | 0.2865 | 0.8668 | 1.3963 | 2.6224 | 4.2242 |
| High biomass | 100,000 | 0.1435 | 0.4340 | 0.6991 | 1.3130 | 2.1150 |
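
The numbers in this table can be checked for internal consistency. In an ordinary least-squares summary each t value is the estimate divided by its standard error, and adjusted R² and the F-statistic follow from R²; the sketch below assumes two predictors and 123 residual degrees of freedom (so 126 observations). It also summarizes the pattern in part (b) using only the printed predictions; it does not re-derive them from the coefficients, since this record does not state the exact scale of the response.

```python
# Values copied from the regression table: (a) model summary and (b) predictions.
coefs = {  # term: (estimate, standard error)
    "(Intercept)": (1.43238, 0.12556),
    "log10 copies/microliter": (-0.2816, 0.04905),
    "log10 mean relative abundance": (-0.54936, 0.06927),
}
r2 = 0.4496
k = 2                 # number of predictors
df_resid = 123        # residual degrees of freedom (assumed reading of the DF entry)
n = df_resid + k + 1  # implied number of observations: 126

# (a) t value = estimate / standard error; R-squared links to adjusted R-squared and F.
t_values = {term: est / se for term, (est, se) in coefs.items()}
adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid
f_stat = (r2 / k) / ((1 - r2) / df_resid)

# (b) keys are copies/microliter; columns are 1, 5, 10, 25, 50 % mean relative abundance.
predictions = {
    10: [0.5723, 1.7313, 2.7888, 5.2376, 8.4370],
    1_000: [0.2865, 0.8668, 1.3963, 2.6224, 4.2242],
    100_000: [0.1435, 0.4340, 0.6991, 1.3130, 2.1150],
}
low, medium, high = predictions[10], predictions[1_000], predictions[100_000]
# Ratio of predicted values for each 100-fold step up in biomass.
biomass_steps = [m / l for l, m in zip(low, medium)] + [h / m for m, h in zip(medium, high)]
# Ratio between the 50% and 1% columns at each biomass level.
abundance_span = {copies: row[-1] / row[0] for copies, row in predictions.items()}

for term, t in t_values.items():
    print(f"{term}: t = {t:.3f}")
print(f"adjusted R2 = {adj_r2:.4f}, F = {f_stat:.2f}")
print("biomass step ratios:", [round(r, 3) for r in biomass_steps])
print("50% / 1% ratios:", {c: round(v, 1) for c, v in abundance_span.items()})
```

The check reproduces the reported t values and adjusted R², and gives F ≈ 50.24, close to the reported 50.25 given that R² is rounded to four decimals. In part (b), each 100-fold increase in copies/microliter roughly halves the predicted value, while moving from 1% to 50% mean relative abundance multiplies it by about 14.7 at every biomass level.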