Martin F Laursen, Marlene D Dalgaard, Martin I Bahl.
Abstract
Profiling of microbial community composition is frequently performed by partial 16S rRNA gene sequencing on benchtop platforms following PCR amplification of specific hypervariable regions within this gene. Accuracy and reproducibility are two key parameters of this strategy, and both may be influenced at every step, from sample collection and storage through DNA extraction and PCR-based library preparation to the final sequencing. To evaluate both the reproducibility and accuracy of 16S rRNA gene-based microbial profiling on the Ion Torrent PGM platform, we prepared libraries and sequenced a well-defined and validated 20-member bacterial DNA mock community on five separate occasions and compared the results with the expected even distribution. Overall, the applied method had a median coefficient of variation of 11.8% (range 5.5-73.7%) across the 20 strains in the mock community over five separate sequencing runs, with underrepresented strains generally showing the largest degree of variation. In terms of accuracy, mock community species belonging to Proteobacteria were underestimated, whereas those belonging to Firmicutes were mostly overestimated. This could be explained partly by premature read truncation, but to a larger degree by genomic GC content, which correlated negatively with the observed relative abundances, suggesting a PCR bias against GC-rich species during library preparation. Increasing the initial denaturation time during PCR amplification from 30 to 120 s increased the average relative abundance of the three mock community members with the highest genomic GC%, but did not significantly change the overall evenness of the community distribution. Therefore, efforts should be made to optimize the PCR conditions prior to sequencing in order to maximize accuracy.
Keywords: 16S rRNA gene sequencing; accuracy; genomic GC content; ion torrent PGM; mock community; reproducibility
Year: 2017 PMID: 29051756 PMCID: PMC5633598 DOI: 10.3389/fmicb.2017.01934
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Counts of raw sequencing reads, after primer/length trim and after quality filtering.
| Run | Raw reads | Reads after primer/length trim | % of raw | Reads after quality filtering | % of raw |
|---|---|---|---|---|---|
| Run 1 | 23,908 | 17,295 | 72.3 | 15,849 | 66.3 |
| Run 2 | 19,258 | 13,673 | 71.0 | 12,525 | 65.0 |
| Run 3 | 31,729 | 20,279 | 63.9 | 18,680 | 58.9 |
| Run 4 | 53,484 | 40,148 | 75.1 | 37,721 | 70.5 |
| Run 5 | 23,521 | 13,672 | 58.1 | 12,565 | 53.4 |
| Average | 30,380 | 21,013 | 69.2 | 19,468 | 62.8 |
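The percentage columns can be reproduced from the read counts. The following is a minimal sketch, assuming each percentage is the retained count divided by the raw read count of the same run; the function and variable names are illustrative and do not come from the original study.

```python
# Read counts per run, copied from the table above:
# (raw reads, reads after primer/length trim, reads after quality filtering)
counts = {
    "Run 1": (23908, 17295, 15849),
    "Run 2": (19258, 13673, 12525),
    "Run 3": (31729, 20279, 18680),
    "Run 4": (53484, 40148, 37721),
    "Run 5": (23521, 13672, 12565),
}


def retention(raw, kept):
    """Percentage of raw reads retained, rounded to one decimal place."""
    return round(100.0 * kept / raw, 1)


# Map each run to (% kept after trimming, % kept after quality filtering).
retained = {
    run: (retention(raw, trimmed), retention(raw, filtered))
    for run, (raw, trimmed, filtered) in counts.items()
}

for run, (pct_trim, pct_filt) in retained.items():
    print(f"{run}: {pct_trim}% after trim, {pct_filt}% after quality filtering")
```

Under this assumption the per-run values agree with the table, which supports reading both percentage columns as fractions of the raw reads.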
Collapsing of the 31 identified OTUs into the respective mock community species based on BLAST identity score against the 16S rRNA gene database at NCBI.
| Species no. | OTU | BLAST identity (%) |
|---|---|---|
| 1 | OTU_12 | 100 |
| 2 | OTU_8 | 99 |
| 3 | OTU_2 | 100 |
| 4 | OTU_14 | 100 |
| | OTU_24 | 99 |
| | OTU_31 | 97 |
| 5 | OTU_1 | 100 |
| | OTU_27 | 99 |
| 6 | OTU_19 | 98 |
| | OTU_29 | 99 |
| 7 | OTU_10 | 100 |
| | OTU_28 | 98 |
| 8 | OTU_18 | 100 |
| | OTU_23 | 99 |
| 9 | OTU_11 | 100 |
| | OTU_26 | 98 |
| 10 | OTU_9 | 100 |
| 11 | OTU_5 | 100 |
| 12 | OTU_16 | 100 |
| | OTU_30 | 98 |
| 13 | OTU_3 | 100 |
| 14 | OTU_17 | 100 |
| | OTU_25 | 99 |
| 15 | OTU_15 | 100 |
| 16 | OTU_6 | 100 |
| 17 | OTU_20 | 99 |
| | OTU_21 | 99 |
| | OTU_22 | 99 |
| 18 | OTU_7 | 100 |
| 19 | OTU_4 | 100 |
| 20 | OTU_13 | 100 |
BLAST identity score against S. epidermidis = 98%.
BLAST identity score against S. aureus = 97%.
BLAST identity score against S. mutans < 97% and S. pneumoniae < 97%.
BLAST identity score against S. agalactiae < 97% and S. pneumoniae < 97%.
BLAST identity score against S. mutans < 97% and S. agalactiae < 97%.
Figure 1. Relative abundance estimates of the 20-member mock community compared with the expected. Columns from left to right: expected relative abundances, average relative abundances using the raw reads, average relative abundances using the processed reads, and relative abundances in each of the five sequencing runs.
Relative abundance estimates of the 20 species in the mock community in 5 separate runs and on average with coefficient of variation.
| Species | Expected (%) | Run 1 (%) | Run 2 (%) | Run 3 (%) | Run 4 (%) | Run 5 (%) | Average (%) | CV (%) |
|---|---|---|---|---|---|---|---|---|
| | 5.00 | 0.18 | 0.11 | 0.32 | 0.44 | 0.53 | 0.32 | 55.3 |
| | 5.00 | 4.25 | 4.85 | 5.11 | 5.88 | 4.60 | 4.94 | 12.5 |
| | 5.00 | 7.53 | 9.33 | 6.30 | 6.80 | 6.88 | 7.37 | 16.0 |
| | 5.00 | 5.17 | 4.89 | 3.69 | 3.39 | 3.76 | 4.18 | 19.0 |
| | 5.00 | 8.68 | 6.70 | 8.47 | 8.89 | 8.59 | 8.27 | 10.8 |
| | 5.00 | 6.03 | 5.60 | 6.32 | 6.69 | 6.15 | 6.16 | 6.5 |
| | 5.00 | 7.02 | 6.05 | 6.53 | 6.30 | 6.43 | 6.47 | 5.5 |
| | 5.00 | 6.13 | 4.70 | 6.16 | 5.44 | 6.11 | 5.71 | 11.2 |
| | 5.00 | 4.90 | 3.98 | 5.23 | 5.21 | 4.46 | 4.76 | 11.2 |
| | 5.00 | 4.67 | 4.10 | 4.12 | 4.17 | 4.27 | 4.27 | 5.5 |
| | 5.00 | 8.98 | 8.83 | 8.01 | 7.41 | 8.97 | 8.44 | 8.3 |
| | 5.00 | 6.60 | 5.55 | 5.82 | 5.64 | 5.78 | 5.88 | 7.1 |
| | 5.00 | 3.29 | 2.83 | 2.73 | 2.76 | 2.56 | 2.83 | 9.7 |
| | 5.00 | 14.64 | 19.88 | 15.41 | 16.68 | 15.68 | 16.46 | 12.4 |
| | 5.00 | 1.34 | 2.10 | 2.32 | 2.02 | 1.86 | 1.93 | 19.1 |
| | 5.00 | 0.37 | 0.61 | 3.42 | 2.09 | 3.33 | 1.96 | 73.7 |
| | 5.00 | 4.71 | 5.92 | 3.15 | 3.28 | 3.15 | 4.04 | 30.7 |
| | 5.00 | 0.71 | 0.14 | 1.98 | 1.00 | 1.90 | 1.15 | 68.8 |
| | 5.00 | 4.30 | 3.44 | 3.87 | 3.80 | 3.91 | 3.86 | 7.9 |
| | 5.00 | 0.50 | 0.39 | 1.04 | 2.11 | 1.07 | 1.02 | 66.7 |
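The coefficient of variation in the last column can be checked against the per-run values. The sketch below assumes the CV is the sample standard deviation (n - 1 denominator) divided by the mean, which reproduces the tabulated figures for the rows tested; the row labels refer only to position in the table, and all names are illustrative.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


# Per-run relative abundances (%) for the first and sixteenth data rows.
row_1 = [0.18, 0.11, 0.32, 0.44, 0.53]
row_16 = [0.37, 0.61, 3.42, 2.09, 3.33]

cv_row_1 = coefficient_of_variation(row_1)
cv_row_16 = coefficient_of_variation(row_16)

# CV column (%) as reported in the table, in row order.
reported_cv = [55.3, 12.5, 16.0, 19.0, 10.8, 6.5, 5.5, 11.2, 11.2, 5.5,
               8.3, 7.1, 9.7, 12.4, 19.1, 73.7, 30.7, 68.8, 7.9, 66.7]
median_cv = median(reported_cv)

print(f"CV row 1: {cv_row_1:.1f}%, CV row 16: {cv_row_16:.1f}%")
print(f"Median CV: {median_cv:.1f}% (range {min(reported_cv)}-{max(reported_cv)}%)")
```

The median and range of the reported column match the summary given in the abstract (median 11.8%, range 5.5-73.7%).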
Figure 2. Accuracy of the abundance estimates across all five sequencing runs for each bacterial species, expressed as the log10 ratio of measured to expected relative abundance. Boxplots show the median, with the 25th and 75th percentiles as the box and whiskers showing the range. The dashed line indicates the expected relative abundance of 5%.
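The accuracy measure in this figure is a simple transformation of the relative abundances. As a minimal sketch, assuming an expected abundance of 5% for every member of the even 20-member community, the log10 ratio can be computed as follows; the two example rows are the first and fourteenth data rows of the relative abundance table, and the names are illustrative.

```python
import math

# Expected relative abundance (%) of each member of the even 20-member mock community.
EXPECTED = 5.0


def log10_ratio(measured, expected=EXPECTED):
    """Log10 of measured over expected abundance; 0 means a perfect estimate."""
    return math.log10(measured / expected)


# Per-run relative abundances (%) of a strongly underestimated member (row 1)
# and a strongly overestimated member (row 14).
row_1_runs = [0.18, 0.11, 0.32, 0.44, 0.53]
row_14_runs = [14.64, 19.88, 15.41, 16.68, 15.68]

row_1_ratios = [log10_ratio(v) for v in row_1_runs]
row_14_ratios = [log10_ratio(v) for v in row_14_runs]

print("row 1: ", [round(r, 2) for r in row_1_ratios])
print("row 14:", [round(r, 2) for r in row_14_ratios])
```

Negative values indicate underestimation and positive values overestimation, so the dashed line at the expected 5% corresponds to a log10 ratio of 0.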
Figure 3. Correlation between genomic GC content and average abundance estimates for the 20-member mock community. Dots are colored according to phylum (Blue: Bacteroidetes, Purple: Proteobacteria, Green: Actinobacteria, Red: Firmicutes, and Yellow: Deinococcus-Thermus). Spearman's rank correlation coefficient (rho) and the resulting p-value for the association are shown in the box.
Figure 4. Relative abundance estimates of the 20-member mock community following different initial denaturing times of 30 s (n = 13) or 120 s (n = 8) during the library preparation PCR. (A) Bar plots of mean relative abundances of all mock community members, (B) evenness of the mock community, with the dashed line indicating the expected evenness of 1, and (C–E) bar plots showing mean relative abundance + SD for the three mock community members with the highest GC content. Statistical significance is evaluated by t-test.
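The record does not state which evenness index was used. As a hedged illustration, the sketch below assumes Pielou's evenness (Shannon index divided by the log of the number of taxa), which equals 1 for a perfectly even community. The observed values are the five-run averages from the relative abundance table, used here only as example input; they are not the Figure 4 data.

```python
import math


def pielou_evenness(abundances):
    """Pielou's evenness J = H / ln(S).

    H is the Shannon index of the normalized abundances and S is the number
    of taxa supplied (zero-abundance taxa still count toward S here, since
    the mock community membership is known in advance).
    """
    if len(abundances) < 2:
        raise ValueError("evenness needs at least two taxa")
    total = sum(abundances)
    proportions = [a / total for a in abundances if a > 0]
    shannon = -sum(p * math.log(p) for p in proportions)
    return shannon / math.log(len(abundances))


# Expected distribution: 20 members at 5% each.
expected = [5.0] * 20

# Average relative abundances (%) across the five runs, in table row order.
observed = [0.32, 4.94, 7.37, 4.18, 8.27, 6.16, 6.47, 5.71, 4.76, 4.27,
            8.44, 5.88, 2.83, 16.46, 1.93, 1.96, 4.04, 1.15, 3.86, 1.02]

evenness_expected = pielou_evenness(expected)
evenness_observed = pielou_evenness(observed)

print(f"expected evenness: {evenness_expected:.3f}")
print(f"observed evenness: {evenness_observed:.3f}")
```

If the study used a different index, only the function body would need to change; the comparison against an expected value of 1 stays the same.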