Nilay Peker, Sharron Garcia-Croes, Brigitte Dijkhuizen, Henry H Wiersma, Evert van Zanten, Guido Wisselink, Alex W Friedrich, Mirjam Kooistra-Smid, Bhanu Sinha, John W A Rossen, Natacha Couto.
Abstract
Rapid and reliable identification of bacterial pathogens directly from patient samples is required for optimizing antimicrobial therapy. Although Sanger sequencing of the 16S ribosomal RNA (rRNA) gene is used as a molecular method, species identification and discrimination are not always achievable because the 16S rRNA genes of different bacteria sometimes share high sequence homology. Recently, next-generation sequencing (NGS) of the 16S-23S rRNA encoding region has been proposed for reliable identification of pathogens directly from patient samples. However, data analysis is laborious and time-consuming, and a database for the complete 16S-23S rRNA encoding region is not available. Therefore, a better, faster, and stronger approach is needed for NGS data analysis of the 16S-23S rRNA encoding region. We compared the speed and diagnostic accuracy of three data analysis approaches: de novo assembly followed by Basic Local Alignment Search Tool (BLAST), operational taxonomic unit (OTU) clustering, or mapping using an in-house developed 16S-23S rRNA encoding region database for the identification of bacterial species. De novo assembly followed by BLAST using the in-house database was superior to the other methods: it had the shortest turnaround time (2 h and 5 min), approximately 2 h less than OTU clustering and 4.5 h less than mapping, and a sensitivity of 80%. Mapping was the slowest and most laborious data analysis approach, with a sensitivity of 60%, whereas OTU clustering was the least laborious approach, with a sensitivity of 70%. Although the in-house database requires more sequence entries to improve the sensitivity, the combination of de novo assembly and BLAST currently appears to be the optimal approach for data analysis.
Keywords: OTU clustering; clinical microbiology; de novo assembly; diagnostics; mapping; metagenomics; next-generation sequencing
Year: 2019 PMID: 31040829 PMCID: PMC6476902 DOI: 10.3389/fmicb.2019.00620
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
FIGURE 1. Proportion of reads corresponding to bacteria identified at the genus level and at the species level using three different sequencing protocols.
FIGURE 2. The workflow used for processing of the NGS data.
Comparison of BLAST analysis results for NGS positive samples using the local database and the NCBI database.
| Sample | % of reads against contig | Species (BLAST on local database) | % Identity (BLAST on local database) | Species (BLAST on NCBI database) | % Identity (BLAST on NCBI database) |
|---|---|---|---|---|---|
| 1# | 0.5% | | 100% | | 100% |
| 2# | 0.4% | | 100% | | 100% |
| 10# | 0.4% | | 100% | | 98% |
| | 0.7% | | 100% | | 100% |
| 11# | 0.4% | | 100% | | 99% |
| 15## | 0.4% | | 93% | | 93% |
| 17# | 0.7% | | 100% | | 99% |
| 18# | 95.8% | | 100% | | 99% |
| | 0.32% | | 96% | – | – |
| 20## | 74.3% | | 100% | | 100% |
| 21## | 61.1% | | 100% | | 99% |
| 25# | 88.9% | | 100% | | 100% |
| 26# | 80.2% | | 100% | | 99% |
| | 2.0% | – | – | | 100% |
| 27## | 10.0% | | 97% | | 97% |
| | 6.2% | | 100% | | 99% |
| | 1.4% | | 99% | | 99% |
| | 1.0% | | 99% | | 96% |
| Time∗ | CLC analysis | ∼1 h 20 min | | ∼1 h 20 min | |
| | Hands on | ∼45 min | | ∼4 h | |
| | Total | ∼2 h 5 min | | ∼5 h 20 min | |
Bacterial species identified by NGS of 16S–23S rRNA encoding region using three different data analysis approaches and 16S rRNA gene Sanger sequencing and culturing.
| NGS of 16S–23S rRNA encoding region | Conventional methods | |||||
|---|---|---|---|---|---|---|
| OTU clustering (cut-off: 0.2%) | Mapping (cut-off: 0.4%) | 16S rRNA gene Sanger sequencing | Culturing | |||
| 1# | Negative | Negative | ||||
| 3# | Negative | Negative | Negative | Negative | ||
| 6## | Negative | Negative | Negative | Negative | ||
| 10# | Negative | Negative | ||||
| 11# | Negative | Negative | Negative | Negative | ||
| 15## | Negative | Negative | Negative | Negative | ||
| 17# | Negative | Negative | Negative | |||
| 18# | Negative | |||||
| 20## | Negative | |||||
| 21## | Negative | |||||
| 23# | Negative | Negative | Negative | Negative | ||
| 24# | Negative | Negative | Negative | Negative | ||
| 25# | Negative | |||||
| 26# | Negative | |||||
| 27## | Negative | |||||
| Time∗ | CLC analysis | ∼1 h 20 min | ∼3 h | ∼2 h 30 min | ||
| Hands on | ∼45 min | ∼1 h | ∼4 h | |||
| Total | ∼2 h 5 min | ∼4 h | ∼6 h 30 min | |||
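
The totals in the Time rows above are the sum of CLC analysis time and hands-on time. The following minimal sketch reproduces that arithmetic and the differences quoted in the abstract. It assumes that the three time columns correspond, in order, to de novo assembly followed by BLAST, OTU clustering, and mapping. That ordering is inferred from the turnaround times given in the abstract, not stated in the table itself. The dictionary keys and function names are illustrative only.

```python
# Durations in minutes, transcribed from the "Time" rows of the table above.
# Assumption: columns are, in order, de novo assembly + BLAST, OTU clustering,
# and mapping (inferred from the abstract's turnaround times).
TIMES = {
    "de novo assembly + BLAST": {"clc_analysis": 80, "hands_on": 45},
    "OTU clustering": {"clc_analysis": 180, "hands_on": 60},
    "mapping": {"clc_analysis": 150, "hands_on": 240},
}


def total_minutes(approach):
    """Total turnaround time = CLC analysis time + hands-on time."""
    parts = TIMES[approach]
    return parts["clc_analysis"] + parts["hands_on"]


def fmt(minutes):
    """Format a duration in minutes as 'H h M min'."""
    hours, mins = divmod(minutes, 60)
    return f"{hours} h {mins} min"


if __name__ == "__main__":
    baseline = total_minutes("de novo assembly + BLAST")
    for name in TIMES:
        total = total_minutes(name)
        extra = total - baseline
        print(f"{name}: total {fmt(total)}, {fmt(extra)} more than de novo + BLAST")
```

Under these assumptions, OTU clustering takes 1 h 55 min longer and mapping 4 h 25 min longer than de novo assembly with BLAST, which matches the "approximately 2 h" and "4.5 h" figures in the abstract.
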
Sensitivity and specificity for all three data analysis approaches∗.
| Data analysis approach | Sensitivity in % (95% CI) | Specificity in % (95% CI) |
|---|---|---|
| De novo assembly followed by BLAST | 80 (44.4–97.5) | 88 (61.6–98.5) |
| OTU clustering | 70 (34.7–93.3) | 94 (69.8–99.8) |
| Mapping | 60 (26.2–87.8) | 94 (69.8–99.8) |
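
The source does not state how the 95% confidence intervals were computed or what the underlying counts were. As a hedged illustration, the sketch below computes exact (Clopper-Pearson) binomial intervals using only the Python standard library. The counts used in the example (such as 8 of 10 for a sensitivity of 80%, or 15 of 16 for a specificity of about 94%) are assumptions chosen to match the reported percentages, and the function names are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(fn, target, iters=100):
    """Bisection on [0, 1] for fn(p) == target, where fn is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if fn(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for x successes in n trials."""
    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2
    )
    # Upper bound: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2
    )
    return lower, upper


if __name__ == "__main__":
    # Hypothetical counts (not given in the source), chosen to match the table.
    for label, x, n in [("sensitivity 8/10", 8, 10), ("specificity 15/16", 15, 16)]:
        lo, hi = clopper_pearson(x, n)
        print(f"{label}: {100 * x / n:.1f}% ({100 * lo:.1f} to {100 * hi:.1f})")
```

With the assumed counts, the computed intervals can be compared against the table. The wide intervals reflect the small number of samples.
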