Raf Winand, Bert Bogaerts, Stefan Hoffman, Loïc Lefevre, Maud Delvoye, Julien Van Braekel, Qiang Fu, Nancy HC Roosens, Sigrid CJ De Keersmaecker, Kevin Vanneste.
Abstract
Rapid, accurate bacterial identification in biological samples is an important task for microbiology laboratories, for which 16S rRNA gene Sanger sequencing of cultured isolates is frequently used. In contrast, next-generation sequencing does not require intermediate culturing steps and can be directly applied on communities, but its performance has not been extensively evaluated. We present a comparative evaluation of second (Illumina) and third (Oxford Nanopore Technologies (ONT)) generation sequencing technologies for 16S targeted genomics using a well-characterized reference sample. Different 16S gene regions were amplified and sequenced using the Illumina MiSeq, and analyzed with Mothur. Correct classification was variable, depending on the region amplified. Using a majority vote over all regions, most false positives could be eliminated at the genus level but not the species level. Alternatively, the entire 16S gene was amplified and sequenced using the ONT MinION, and analyzed with Mothur, EPI2ME, and GraphMap. Although >99% of reads were correctly classified at the genus level, up to ≈40% were misclassified at the species level. Both technologies, therefore, allow reliable identification of bacterial genera, but can potentially misguide identification of bacterial species, and constitute viable alternatives to Sanger sequencing for rapid analysis of mixed samples without requiring any culturing steps.
Keywords: 16S rRNA; Illumina; Nanopore; bacterial identification; orientation; public health; targeted genomics; targeted metagenomics
Year: 2019 PMID: 31906254 PMCID: PMC6982111 DOI: 10.3390/ijms21010298
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Bacterial composition of the ZymoBIOMICS™ Microbial Community DNA Standard expressed as 16S rRNA gene percentages [45].
Overview of samples generated for MiSeq sequencing, their corresponding amplified regions, primer pairs employed for amplification, fragment lengths, and references.
| Sample Name | Amplified Region | Primer Pair | Fragment Length (bp) | Reference |
|---|---|---|---|---|
| 16S1 | V4 | 515F/806R1 | 292 | [ |
| 16S2 | V3–V4 | 341F1/806R2 | 466 | [ |
| 16S3 | V3–V4 | 341F2/805R | 400–500 | [ |
| 16S4 | V4–V5 | 515F/907R | 393 | [ |
| 16S5 | V4–V6 | 515F/1061R | 546 | [ |
| 16S6 | V6 | 926F/1061R | 135 | [ |
| 16S7 | V1–V3 | 8F/516R | 488 (614) | [ |
| 16S8 | V3–V4 | 341B4F/806R2 | 400–500 | [ |
| 16S9 | V8 | 1243F/1459R | 217 | [ |
| 16S10 | V8–V9 | 1522F/1189R1 | 370 | [ |
| 16S11 | V8–V9 | 1522F/1189R2 | 370 | [ |
Figure 2. Correctly classified reads for all 16S rRNA gene regions at different taxonomic levels using the (A) SILVA and (B) NCBI 16S databases for Illumina MiSeq data. Percentages are expressed against the total number of reads classified at any taxonomic level (SILVA only allows classification down to the genus level).
Classification results for MiSeq data for the different gene regions and taxonomic levels using the SILVA and NCBI 16S databases. Read numbers are expressed as percentages.
| Sample | Number of Raw Contigs | SILVA Classified Contigs | SILVA Family CC | SILVA Family MC | SILVA Family UC * | SILVA Genus CC | SILVA Genus MC | SILVA Genus UC * | NCBI Classified Contigs | NCBI Family CC | NCBI Family MC | NCBI Family UC * | NCBI Genus CC | NCBI Genus MC | NCBI Genus UC * | NCBI Species CC | NCBI Species MC | NCBI Species UC * |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16S1 | 442,223 | 387,306 | 99.71% | 0.03% | 0.27% | 48.18% | 0.03% | 51.79% | 384,501 | 99.84% | 0.03% | 0.13% | 80.92% | 0.03% | 19.05% | 41.41% | 8.09% | 50.50% |
| 16S2 | 370,429 | 307,971 | 99.97% | 0.02% | 0.02% | 88.16% | 0.02% | 11.81% | 308,254 | 99.98% | 0.02% | 0.00% | 99.91% | 0.02% | 0.07% | 57.57% | 23.29% | 19.14% |
| 16S3 | 480,933 | 351,488 | 99.82% | 0.03% | 0.16% | 87.05% | 0.08% | 12.87% | 352,246 | 99.90% | 0.03% | 0.07% | 99.52% | 0.03% | 0.45% | 59.70% | 22.89% | 17.41% |
| 16S4 | 336,363 | 193,320 | 99.61% | 0.06% | 0.33% | 71.24% | 0.09% | 28.67% | 206,531 | 99.70% | 0.05% | 0.24% | 90.40% | 0.05% | 9.55% | 52.19% | 13.34% | 34.48% |
| 16S5 | 377,697 | 114,743 | 99.77% | 0.00% | 0.23% | 99.74% | 0.00% | 0.26% | 114,777 | 99.88% | 0.00% | 0.12% | 93.01% | 0.00% | 6.99% | 48.56% | 14.02% | 37.42% |
| 16S6 | 550,931 | 506,640 | 68.91% | 0.20% | 30.89% | 66.94% | 0.01% | 33.05% | 522,128 | 58.16% | 0.01% | 41.83% | 57.34% | 0.01% | 42.66% | 33.32% | 0.01% | 66.68% |
| 16S7 | 455,347 | 268,527 | 99.98% | 0.02% | 0.00% | 99.98% | 0.02% | 0.00% | 242,467 | 99.98% | 0.02% | 0.00% | 99.98% | 0.02% | 0.00% | 54.54% | 27.47% | 17.99% |
| 16S8 | 557,407 | 440,145 | 99.94% | 0.01% | 0.05% | 80.17% | 0.04% | 19.79% | 440,811 | 99.98% | 0.01% | 0.01% | 99.93% | 0.01% | 0.05% | 58.32% | 23.11% | 18.57% |
| 16S9 | 259 | 103 | 49.51% | 50.49% | 0.00% | 29.13% | 0.00% | 70.87% | 50 | 100.00% | 0.00% | 0.00% | 64.00% | 0.00% | 36.00% | 16.00% | 24.00% | 60.00% |
| 16S10 | 158 | 0 | - | - | - | - | - | - | 0 | - | - | - | - | - | - | - | - | - |
| 16S11 | 210 | 16 | 0.00% | 0.00% | 100.00% | 0.00% | 0.00% | 100.00% | 9 | 0.00% | 0.00% | 100.00% | 0.00% | 0.00% | 100.00% | 0.00% | 0.00% | 100.00% |
Abbreviations: CC: correctly classified; MC: misclassified; UC: unclassified; * Unclassified reads are reads that are not classified at the given level but at any higher level.
Overview of bacteria identified in each gene region for MiSeq data using the SILVA database at different taxonomic levels. The first column lists the identification at the taxonomic level considered. The second column lists the total number of gene regions where the bacterium was identified. The next columns list the different samples (0 = not detected/1 = detected). Taxonomic names are colored per taxonomic level by a green gradient for bacteria present in the mock community (darker = correctly identified in more samples), and an orange gradient for bacteria not present in the mock community (darker = incorrectly identified in more samples).
| Taxon | Total | 16S1 | 16S2 | 16S3 | 16S4 | 16S5 | 16S6 | 16S7 | 16S8 |
|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | |
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| | | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| | | | | | | | | | |
| | | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 |
| | | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
| | | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 |
| | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| | | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| | | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| | | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
* Bleed-through originating from sequencing another mixed sample on the same sequencing run containing almost exclusively Neisseria meningitidis. † Known as potential Illumina contaminants.
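The per-taxon total and the majority vote over regions mentioned in the abstract can both be derived from the 0/1 detection flags. The following is a minimal sketch, not the authors' implementation: the taxon labels are hypothetical placeholders, the three flag rows are copied from the table above, and the strict more-than-half threshold is an assumption, since the exact voting rule is not stated here.

```python
# Samples 16S1..16S8, in the column order of the detection table.
REGIONS = ["16S1", "16S2", "16S3", "16S4", "16S5", "16S6", "16S7", "16S8"]


def total_detections(flags):
    """Number of gene regions in which the taxon was detected."""
    return sum(flags)


def majority_vote(flags, threshold=0.5):
    """True if the taxon was detected in more than `threshold` of the regions.

    The strict 'more than half' rule is an assumption made for illustration.
    """
    return sum(flags) > threshold * len(flags)


# Placeholder labels; the flag vectors are rows taken from the table above.
detections = {
    "taxon_a": [1, 1, 1, 1, 1, 0, 1, 1],
    "taxon_b": [0, 0, 0, 0, 0, 1, 0, 0],
    "taxon_c": [0, 1, 1, 0, 1, 0, 1, 0],
}

summary = {
    name: (total_detections(flags), majority_vote(flags))
    for name, flags in detections.items()
}

for name, (total, kept) in summary.items():
    print(f"{name}: detected in {total}/{len(REGIONS)} regions, kept={kept}")
```

With a strict threshold, a taxon detected in exactly half of the regions is rejected; whether ties should count as detections is a design choice that would need to be checked against the original methods.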
Overview of bacteria identified in each gene region for MiSeq data using the NCBI 16S database at different taxonomic levels. The first column lists the taxonomic name at the taxonomic level considered. The second column lists the total number of gene regions where the bacterium was identified. The next columns list the different samples (0 = not detected/1 = detected). Taxonomic names are colored per taxonomic level by a green gradient for bacteria present in the mock community (darker = correctly identified in more samples), and an orange gradient for bacteria not present in the mock community (darker = incorrectly identified in more samples).
| Taxon | Total | 16S1 | 16S2 | 16S3 | 16S4 | 16S5 | 16S6 | 16S7 | 16S8 |
|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | |
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| | | | | | | | | | |
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| | | | | | | | | | |
| | | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
| | | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| | | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
| | | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| | | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 |
| | | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
| | | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
* Bleed-through originating from sequencing another mixed sample on the same sequencing run containing almost exclusively Neisseria meningitidis. † Known as potential Illumina contaminants.
Figure 3. Correctly classified reads for all bioinformatics workflows at different taxonomic levels using the (A) SILVA and (B) NCBI 16S databases for ONT MinION data. Percentages are expressed against the total number of reads classified at any taxonomic level (SILVA only allows classification down to the genus level, and EPI2ME only supports the NCBI 16S database).
Classification results for MinION data for the different bioinformatics workflows and taxonomic levels using the SILVA and NCBI 16S databases. The total number of analyzed reads was 10,000 for all three bioinformatics workflows while the number of reads each bioinformatics workflow could identify at any taxonomic level is indicated. The values of correctly classified, misclassified, and unclassified reads at a given level sum to the total number of classified reads at any level.
| Workflow | SILVA Family CC | SILVA Family MC | SILVA Family UC * | SILVA Genus CC | SILVA Genus MC | SILVA Genus UC * | NCBI 16S Family CC | NCBI 16S Family MC | NCBI 16S Family UC * | NCBI 16S Genus CC | NCBI 16S Genus MC | NCBI 16S Genus UC * | NCBI 16S Species CC | NCBI 16S Species MC | NCBI 16S Species UC * |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mothur | 4799 | 0 | 0 | 4799 | 0 | 0 | 4429 | 0 | 0 | 4428 | 1 | 0 | 2593 | 1836 | 0 |
| EPI2ME | - | - | - | - | - | - | 8673 | 30 | 1135 | 7265 | 68 | 2505 | 4436 | 2897 | 2505 |
| GraphMap | 6949 | 145 | 0 | 5783 | 1311 | 0 | 7777 | 35 | 0 | 7649 | 163 | 0 | 4739 | 3073 | 0 |
Abbreviations: CC: correctly classified; MC: misclassified; UC: unclassified. * Unclassified reads are reads that are not classified at the given level but at any higher level.
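The caption states that the CC, MC, and UC counts at each level sum to the number of reads a workflow classified at any level, so the counts can be turned into percentages per workflow. The sketch below illustrates this with the NCBI 16S counts copied from the table; the names `Counts`, `classified_total`, and `shares` are hypothetical helpers introduced for illustration and do not come from the original analysis.

```python
from typing import NamedTuple


class Counts(NamedTuple):
    cc: int  # correctly classified
    mc: int  # misclassified
    uc: int  # not classified at this level, but classified at a higher one


# NCBI 16S read counts per workflow and taxonomic level, copied from the table.
NCBI_COUNTS = {
    "Mothur": {
        "family": Counts(4429, 0, 0),
        "genus": Counts(4428, 1, 0),
        "species": Counts(2593, 1836, 0),
    },
    "EPI2ME": {
        "family": Counts(8673, 30, 1135),
        "genus": Counts(7265, 68, 2505),
        "species": Counts(4436, 2897, 2505),
    },
    "GraphMap": {
        "family": Counts(7777, 35, 0),
        "genus": Counts(7649, 163, 0),
        "species": Counts(4739, 3073, 0),
    },
}


def classified_total(levels):
    """Reads classified at any level; every level must give the same sum."""
    totals = {sum(counts) for counts in levels.values()}
    if len(totals) != 1:
        raise ValueError(f"inconsistent per-level totals: {sorted(totals)}")
    return totals.pop()


def shares(counts):
    """Percentages of CC, MC, and UC relative to the reads classified at any level."""
    total = sum(counts)
    return tuple(round(100 * value / total, 2) for value in counts)


for workflow, levels in NCBI_COUNTS.items():
    total = classified_total(levels)
    cc, mc, uc = shares(levels["species"])
    print(f"{workflow}: {total} classified reads; species CC={cc}% MC={mc}% UC={uc}%")
```

Percentages computed this way use the reads classified at any level as the denominator, not the 10,000 reads analyzed, which matches the convention stated in the Figure 3 caption.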