Zhengyao Xue, Mary E Kable, Maria L Marco.
Abstract
DNA sequencing and analysis methods were compared for 16S rRNA V4 PCR amplicon and genomic DNA (gDNA) mock communities encompassing nine bacterial species commonly found in milk and dairy products. The two communities comprised strain-specific DNA that was pooled before (gDNA) or after (PCR amplicon) the PCR step. The communities were sequenced on the Illumina MiSeq and Ion Torrent PGM platforms and then analyzed using the QIIME 1 (UCLUST) and Divisive Amplicon Denoising Algorithm 2 (DADA2) analysis pipelines with taxonomic comparisons to the Greengenes and Ribosomal Database Project (RDP) databases. Examination of the PCR amplicon mock community with these methods yielded between 13 and 118 operational taxonomic units (OTUs) or amplicon sequence variants (ASVs), depending on the DNA sequencing method and read assembly steps. The 4 to 109 OTUs/ASVs in excess of the expected 9 included assignments to spurious taxa and sequence variants of the 9 species included in the mock community. Comparisons between the gDNA and PCR amplicon mock communities showed that combining gDNAs from the different strains prior to PCR resulted in up to 8.9-fold greater numbers of spurious OTUs/ASVs. However, the DNA sequencing method and paired-end read assembly steps conferred the largest effects on predictions of bacterial diversity, with effect sizes of 0.88 (Bray-Curtis) and 0.32 (weighted Unifrac), independent of the mock community type. Overall, DNA sequencing performed with the Ion Torrent PGM and analyzed with DADA2 and the Greengenes database resulted in the most accurate predictions of the mock community phylogeny, taxonomy, and diversity.
IMPORTANCE Validated methods are urgently needed to improve DNA sequence-based assessments of complex bacterial communities. In this study, we used 16S rRNA PCR amplicon and gDNA mock community standards, consisting of nine dairy-associated bacterial species, to evaluate the most commonly applied 16S rRNA marker gene DNA sequencing and analysis platforms used in evaluating dairy and other bacterial habitats. Our results show that bacterial metataxonomic assessments are largely dependent on the DNA sequencing platform and read curation method used. DADA2 improved sequence annotation compared with QIIME 1, and when combined with the Ion Torrent PGM DNA sequencing platform and the Greengenes database for taxonomic assignment, the most accurate representation of the dairy mock community standards was reached. This approach will be useful for validating sample collection and DNA extraction methods and ultimately investigating bacterial population dynamics in milk- and dairy-associated environments.
Keywords: 16S rRNA; DNA sequencing; dairy; microbiome; microbiota; milk
Year: 2018 PMID: 30333179 PMCID: PMC6193606 DOI: 10.1128/mSphere.00410-18
Source DB: PubMed Journal: mSphere ISSN: 2379-5042 Impact factor: 4.389
FIG 1 Schematic diagram of the experimental design. Genomic DNAs were individually prepared from nine bacterial broth cultures, purified, and combined for the gDNA mock community. Additionally, each gDNA was amplified separately and pooled for the PCR amplicon mock community.
Bacterial strains and expected relative abundances in the gDNA mock community
| Strain | No. of 16S rRNA gene copies per genome | % of total 16S rRNA gene | Genome |
|---|---|---|---|
| | 10 | 16.67 | NA |
| | 6 | 2.29 | |
| | 1 | 2.78 | |
| | 4 | 9.28 | |
| | 7 | 8.95 | NA |
| | 6 | 17.82 | |
| | 6 | 7.08 | |
| | 5 | 12.44 | NA |
| | 7 | 22.7 | NA |
Number of 16S rRNA gene copies per genome based on genome reference.
Percentage of total bacterial 16S rRNA gene in the mock community according to DNA concentration.
NA, not available. For strains that lack whole-genome sequences, the genome sizes and 16S rRNA gene copy numbers of the reference strain were used (69–72).
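The footnotes above tie the expected percentages to DNA concentration, genome size, and 16S rRNA gene copy number. A minimal sketch of one way such an expectation could be computed is given here, assuming each strain's share is proportional to (DNA mass / genome size) × copy number. This proportionality, the function name `expected_16s_percent`, and the example inputs are illustrative assumptions, not values or methods taken from the study.

```python
def expected_16s_percent(dna_ng, genome_bp, copies):
    """Return each strain's expected share (%) of total 16S rRNA gene copies.

    Assumption (not stated in the source): genome equivalents are proportional
    to DNA mass divided by genome size, and 16S gene copies equal genome
    equivalents multiplied by the per-genome copy number.
    """
    if not (len(dna_ng) == len(genome_bp) == len(copies)):
        raise ValueError("inputs must have the same length")
    gene_copies = [m / g * c for m, g, c in zip(dna_ng, genome_bp, copies)]
    total = sum(gene_copies)
    return [100.0 * x / total for x in gene_copies]


# Hypothetical inputs for three strains (equal DNA mass, different genomes).
shares = expected_16s_percent(
    dna_ng=[10.0, 10.0, 10.0],
    genome_bp=[2.0e6, 4.0e6, 2.0e6],
    copies=[4, 4, 8],
)
print([round(s, 2) for s in shares])
```

Under this assumption, a strain with a smaller genome or more 16S copies contributes a larger share of the 16S pool for the same DNA mass, which is why the expected percentages in the table are not uniform.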
OTU/ASV distribution of the 16S rRNA PCR amplicon mock community following Illumina MiSeq DNA sequencing and paired-end assembly
| Taxonomy | No. (%) of OTUs/ASVs by pipeline and database | | | |
|---|---|---|---|---|
| | QIIME 1, Greengenes | QIIME 1, RDP | DADA2, Greengenes | DADA2, RDP |
| 1 (100) | ||||
| 10 (83) | 3 (99) | 2 (90) | 2 (90) | |
| 3 (73) | 3 (51) | 1 (100) | ||
| 3 (87) | 5 (99) | 4 (86) | 5 (85) | |
| 14 (80) | 7 (80) | 2 (91) | 2 (91) | |
| 1 (100) | ||||
| 4 (99) | 2 (99) | 3 (85) | 3 (85) | |
| 4 (99) | 9 (71) | 2 (88) | ||
| 2 (88) | ||||
| 6 (99) | 1 (100) | 3 (89) | 3 (89) | |
| 10 (74) | 6 (75) | 3 (85) | 3 (85) | |
| 9 (99) | 4 (80) | 3 (89) | 3 (89) | |
| 2 (80) | 2 (99) | 2 (89) | 2 (89) | |
| Other | 20 (17) | 26 (71) | 13 (20) | 13 (20) |
| Sum | 85 | 70 | 38 | 38 |
Each value represents the average number of OTUs/ASVs (n = 3) and mean percentage of sequence reads assigned to the most abundant OTU/ASV within that taxon.
OTU/ASV distribution of the 16S rRNA PCR amplicon mock community following Illumina MiSeq DNA sequencing without paired-end assembly
| Taxonomy | No. (%) of OTUs/ASVs by pipeline and database | | | |
|---|---|---|---|---|
| | QIIME 1, Greengenes | QIIME 1, RDP | DADA2, Greengenes | DADA2, RDP |
| 9 (97) | 2 (99) | 3 (67) | 1 (100) | |
| 1 (100) | 1 (100) | 1 (100) | ||
| 3 (99) | 3 (99) | 3 (99) | 3 (99) | |
| 9 (96) | 7 (95) | 2 (75) | 2 (75) | |
| 1 (100) | ||||
| 2 (99) | 2 (99) | 2 (60) | 2 (60) | |
| 4 (99) | 8 (93) | |||
| 1 (100) | ||||
| 1 (100) | ||||
| 6 (99) | 2 (99) | 1 (100) | 1 (100) | |
| 11 (91) | 4 (99) | 2 (65) | 2 (65) | |
| 12 (98) | 3 (99) | 2 (76) | 2 (76) | |
| 4 (92) | 2 (99) | 1 (100) | 1 (100) | |
| Other | 6 (28) | 2 (91) | 22 (12) | 25 (89) |
| Sum | 68 | 36 | 40 | 40 |
Each value represents the average number of OTUs/ASVs (n = 3) and mean percentage of sequence reads assigned to the most abundant OTU/ASV within that taxon.
OTU/ASV distribution of the 16S rRNA PCR amplicon mock community following Ion Torrent PGM sequencing
| Taxonomy | No. (%) of OTUs/ASVs by pipeline and database | | | |
|---|---|---|---|---|
| | QIIME 1, Greengenes | QIIME 1, RDP | DADA2, Greengenes | DADA2, RDP |
| 3 (88) | ||||
| 21 (75) | 8 (95) | 2 (95) | 1 (100) | |
| 5 (70) | 7 (99) | 1 (100) | 1 (100) | |
| 15 (75) | 11 (88) | 1 (100) | 1 (100) | |
| 2 (56) | 2 (50) | |||
| 7 (98) | 2 (99) | 1 (100) | 1 (100) | |
| 13 (95) | 17 (60) | 1 (100) | ||
| 1 (100) | ||||
| 6 (99) | 1 (100) | 2 (99) | 2 (99) | |
| 13 (71) | 8 (66) | 2 (99) | 2 (99) | |
| 21 (97) | 7 (65) | 2 (99) | 2 (99) | |
| 6 (83) | 2 (99) | 1 (100) | 1 (100) | |
| Other | 6 (40) | 2 (71) | 0 | 1 (100) |
| Sum | 118 | 67 | 13 | 13 |
Each value represents the average number of OTUs/ASVs (n = 3) and mean percentage of sequence reads assigned to the most abundant OTU/ASV within that taxon.
ASV distribution of the gDNA mock community following different sequencing methods
| Taxonomy | No. (%) of OTUs/ASVs by sequencing method | | |
|---|---|---|---|
| | Illumina, paired end | Illumina, single end | Ion Torrent |
| | 4 (86) | 2 (96) | 2 (95) |
| | 3 (92) | 1 (100) | |
| | 4 (80) | 2 (94) | 1 (100) |
| | 2 (96) | 1 (100) | 1 (100) |
| | 3 (88) | 1 (100) | 1 (100) |
| | 2 (96) | 1 (100) | 1 (100) |
| | 3 (88) | 3 (67) | 2 (99) |
| | 3 (94) | 1 (100) | 2 (99) |
| | 2 (90) | 2 (99) | 2 (99) |
| | 3 (90) | 2 (99) | 1 (100) |
| Other | 116 (12) | 111 (10) | 0 |
| Sum | 145 | 127 | 13 |
Results are based on the DADA2 analysis pipeline with the Greengenes database.
Each value represents the average number of OTUs/ASVs (n = 3) and mean percentage of sequence reads assigned to the most abundant OTU/ASV within that taxon.
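As a rough worked check, the "Other" rows of the DADA2/Greengenes columns in the tables above can be compared between the two mock communities. The sketch below treats the "Other" count as the number of spurious ASVs, which is an assumption made here for illustration; the variable and function names are hypothetical. Under that reading, the paired-end ratio appears consistent with the "up to 8.9-fold" difference reported in the abstract.

```python
# "Other" ASV counts copied from the DADA2 + Greengenes columns of the tables.
amplicon_other = {
    "Illumina paired end": 13,
    "Illumina single end": 22,
    "Ion Torrent": 0,
}
gdna_other = {
    "Illumina paired end": 116,
    "Illumina single end": 111,
    "Ion Torrent": 0,
}


def fold_change(gdna_count, amplicon_count):
    """Return gDNA/amplicon ratio, or None when the denominator is zero."""
    if amplicon_count == 0:
        return None
    return gdna_count / amplicon_count


folds = {
    method: fold_change(gdna_other[method], amplicon_other[method])
    for method in amplicon_other
}
for method, value in folds.items():
    label = "undefined" if value is None else f"{value:.1f}-fold"
    print(f"{method}: {label}")
```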
FIG 2 α diversity measurements of mock community samples. Shown is the Shannon index of (A) the PCR amplicon mock community and (B) the gDNA mock community. The results shown were analyzed following the DADA2 pipeline and Greengenes database. Each bar represents the mean ± standard deviation (SD) from three replicates. α diversity measurements for each community were compared to expected values using ANOVA with Bonferroni’s multiple-comparison test. P values of <0.05 were considered to be significantly different from the expected values and are indicated by an asterisk above each bar plot.
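For reference, the Shannon index used in this figure can be sketched as follows. The example applies it to the expected 16S rRNA gene percentages listed in the strain table. The natural logarithm is an assumption, since the log base is not stated here, and the function name is illustrative rather than taken from the DADA2 or QIIME tooling.

```python
import math


def shannon_index(abundances):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero abundance.

    Assumption: natural log; the source does not state the log base.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    props = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in props)


# Expected percentages of total 16S rRNA gene from the strain table.
expected_percent = [16.67, 2.29, 2.78, 9.28, 8.95, 17.82, 7.08, 12.44, 22.7]
h_expected = shannon_index(expected_percent)
h_max = math.log(len(expected_percent))  # value for a perfectly even community
print(round(h_expected, 3), round(h_max, 3))
```

Because the expected community is uneven, its index falls below the maximum for nine equally abundant taxa; spurious OTUs/ASVs in a sequencing result add extra terms to the sum and tend to push the observed index upward.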
FIG 3 Relative proportions of taxa and UPGMA hierarchical clustering of the mock communities. UPGMA hierarchical clustering was based on the (A) Bray-Curtis dissimilarity matrix and (B) weighted Unifrac distance matrix. Expected taxa (9 bacterial species) are labeled with the corresponding taxonomic level from the DNA sequencing results. Each bar contains the results from each of the three mock community replicates tested using different DNA sequencing methods. The results shown were analyzed following the DADA2 pipeline with the Greengenes database.
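The Bray-Curtis dissimilarity underlying panel A can be illustrated with a short sketch. The two abundance profiles below are hypothetical and are not data from the study; the function name is likewise illustrative. UPGMA clustering and weighted Unifrac are not shown, since the latter additionally requires a phylogenetic tree.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i).

    Returns 0 for identical profiles and 1 for profiles sharing no taxa.
    """
    if len(u) != len(v):
        raise ValueError("profiles must have the same number of taxa")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    if denominator == 0:
        raise ValueError("at least one profile must be non-empty")
    return numerator / denominator


# Hypothetical relative abundances for three taxa in two samples.
expected_profile = [0.50, 0.30, 0.20]
observed_profile = [0.40, 0.30, 0.30]
print(round(bray_curtis(expected_profile, observed_profile), 3))
```

Computing this value for every pair of samples gives the dissimilarity matrix on which hierarchical clustering operates.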
FIG 4 Relative abundance of taxa in the 16S rRNA PCR amplicon and gDNA mock communities. Relative abundances of expected taxa are labeled with the corresponding taxonomic level from sequencing results. “Amplicon” represents the 16S rRNA PCR amplicon mock community, and “gDNA” represents the gDNA mock community. The results shown were analyzed following the DADA2 pipeline with the Greengenes database. Each bar represents the mean ± SD from three replicates. Proportions for each community were compared to expected proportions using ANOVA with Bonferroni’s multiple-comparison test. P values of <0.05 were considered to be significantly different from the expected values and are indicated by an asterisk above each bar plot.