Stefan Moritz Neuenschwander, Miguel Angel Terrazos Miani, Heiko Amlang, Carmen Perroulaz, Pascal Bittel, Carlo Casanova, Sara Droz, Jean-Pierre Flandrois, Stephen L Leib, Franziska Suter-Riniker, Alban Ramette.
Abstract
Amplicon sequencing of the 16S rRNA gene is commonly used for the identification of bacterial isolates in diagnostic laboratories and mostly relies on the Sanger sequencing method. The latter, however, suffers from a number of limitations, with the most significant being the inability to resolve mixed amplicons when closely related species are coamplified from a mixed culture. This often leads to either increased turnaround time or absence of usable sequence data. Short-read next-generation sequencing (NGS) technologies could solve the mixed amplicon issue but would lack both cost efficiency at low throughput and fast turnaround times. Nanopore sequencing developed by Oxford Nanopore Technologies (ONT) could solve those issues by enabling a flexible number of samples per run and an adjustable sequencing time. Here, we report on the development of a standardized laboratory workflow combined with a fully automated analysis pipeline LORCAN (long read consensus analysis), which together provide a sample-to-report solution for amplicon sequencing and taxonomic identification of the resulting consensus sequences. Validation of the approach was conducted on a panel of reference strains and on clinical samples consisting of single or mixed rRNA amplicons associated with various bacterial genera by direct comparison to the corresponding Sanger sequences. Additionally, simulated read and amplicon mixtures were used to assess LORCAN's behavior when dealing with samples with known cross-contamination levels. We demonstrate that by combining ONT amplicon sequencing results with LORCAN, the accuracy of Sanger sequencing can be closely matched (>99.6% sequence identity) and that mixed samples can be resolved at the single-base resolution level. The presented approach has the potential to significantly improve the flexibility, reliability, and availability of amplicon sequencing in diagnostic settings.
Keywords: 16S RNA gene; bioinformatics; clinical methods; diagnostics; nanopore; sequencing; taxonomy
Year: 2020 PMID: 32229603 PMCID: PMC7269405 DOI: 10.1128/JCM.00060-20
Source DB: PubMed Journal: J Clin Microbiol ISSN: 0095-1137 Impact factor: 5.948
FIG 1 (A) Overview of the wet laboratory workflow. Steps of the LORCAN analysis (B) and corresponding sections of the generated report (C). (Step 1) Demultiplexing and adapter trimming. (Step 2) Read filtering by size. (Step 3) Mapping to a reference database. (Step 4) Read extraction, binning by species, and remapping. (Step 5) Consensus calling. (Step 6) Selection of the closest references by BLAST. (Step 7) Taxonomic tree building.
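Two of the steps listed in the caption, read filtering by size (Step 2) and consensus calling (Step 5), can be illustrated with a minimal sketch. This is not LORCAN's implementation: the function names, the length window, and the per-column majority vote over already-aligned reads are all assumptions made for illustration, and the actual pipeline may use different tools and criteria.

```python
from collections import Counter


def filter_by_length(reads, min_len, max_len):
    """Keep reads whose length lies in [min_len, max_len].

    Hypothetical stand-in for Step 2; the window is an assumed parameter.
    """
    return [r for r in reads if min_len <= len(r) <= max_len]


def majority_consensus(aligned_reads, gap="-"):
    """Per-column majority vote over equal-length aligned reads.

    Hypothetical stand-in for Step 5. Columns where the gap character
    wins the vote are dropped from the consensus.
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(r) != width for r in aligned_reads):
        raise ValueError("aligned reads must all have the same length")
    consensus = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != gap:
            consensus.append(base)
    return "".join(consensus)


if __name__ == "__main__":
    # Toy reads, not real data.
    reads = ["ACGT", "ACGA", "ACGT", "AC"]
    kept = filter_by_length(reads, min_len=3, max_len=5)
    print(majority_consensus(kept))
```

A majority vote is the simplest possible consensus rule; it is shown here only to make the idea of collapsing many noisy reads into one sequence concrete.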
Validation of taxonomic classification of ATCC reference strains
| ATCC strain reference no. | Taxonomy | Sanger consensus sequence leBIBI QBPP taxonomy | |
|---|---|---|---|
| 33560 | [ | [ | 99.77 |
| 43504 | [ | [ | 99.54 |
| 29212 | [ | [ | 100.00 |
| 25922 | [ | [ | 99.57 |
| 49247 | [ | [ | 98.94 |
| 49226 | [ | [ | 100.00 |
| 27853 | [ | [ | 99.78 |
| 25923 | [ | [ | 99.79 |
| 49619 | [ | [ | 99.79 |
| 29741 | [ | [ | 99.78 |
| 43055 | [ | [ | 99.32 |
| 51299 | [ | [ | 100.00 |
| 8176 | [ | [ | 100.00 |
| BAA-1705 | [ | [ | 98.93 |
| 13637 | [ | [ | 100.00 |
Samples were analyzed in parallel by Sanger sequencing and with the LORCAN approach. The resulting consensus sequences were submitted to the online taxonomic identification platform leBIBI QBPP.
Square brackets indicate proximal clusters. Asterisks indicate closest sequences based on patristic distances.
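The abstract reports agreement between Sanger and LORCAN consensus sequences as percent sequence identity. A minimal sketch of how such a value could be computed from two already-aligned sequences is given below. The function name and the convention of counting a gap opposite a base as a mismatch are assumptions; the source does not state which identity definition was used.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Assumed convention: columns where both sequences have a gap are
    ignored, and a gap opposite a base counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy sequences, not real data: 9 of 10 positions agree.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```

Different identity definitions (for example, excluding all gapped columns) give slightly different values, so the convention should be stated whenever such numbers are compared.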
FIG 3 Influence of reference database completeness on consensus sequence accuracy. Each consensus sequence was compared to a consensus sequence produced with a perfectly matching reference sequence. Additionally, each consensus sequence was identified by BLAST similarity search against the full reference database. The uneven spacing of the data points reflects the database composition after subsetting. Missing values are a result of insufficient numbers of reads mapping to the reference database. (A) Filled circles indicate correct taxonomic identification of the ATCC strains. The low identities and unsuccessful identification of Eggerthella lenta are a result of a low-level contamination in combination with unsuccessful mapping of the Eggerthella reads. (B) The diameter of the circles is proportional to the number of reads mapped and further used in the consensus generation step (obtained from the LORCAN output). Additional detail is provided in Table S3 and Fig. S10 in the supplemental material.
FIG 2 Taxonomic analysis of amplicon mixtures by LORCAN. Amplicons from Staphylococcus aureus, Enterococcus faecalis, and Pseudomonas aeruginosa mixed after PCR amplification (A) and mixed in silico from reads obtained from pure amplicons (B). Standard deviations indicate the variability across three independent replicate samples. None of the observed ratios was significantly different from the expected ratios (chi-square test for expected probabilities; P > 0.99). (C) In silico mixtures of Mycobacterium gordonae and Mycobacterium avium.
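The chi-square test for expected probabilities mentioned in the caption can be sketched as a goodness-of-fit test on read counts per species. The read counts and mixing ratios below are hypothetical and are not taken from the study. With three species and no parameters estimated from the data, the test has 2 degrees of freedom, for which the chi-square upper-tail probability has the closed form exp(-x/2); that shortcut is what the sketch uses instead of a statistics library.

```python
import math


def chi_square_statistic(observed, expected_probs):
    """Pearson chi-square statistic for observed counts vs. expected probabilities."""
    if len(observed) != len(expected_probs):
        raise ValueError("observed and expected_probs must have the same length")
    total = sum(observed)
    return sum(
        (o - total * p) ** 2 / (total * p)
        for o, p in zip(observed, expected_probs)
    )


def p_value_df2(statistic):
    """Upper-tail p-value of a chi-square distribution with 2 degrees of freedom.

    Valid only for df = 2 (three categories, no estimated parameters).
    """
    return math.exp(-statistic / 2.0)


if __name__ == "__main__":
    # Hypothetical read counts for three species mixed at equal ratios.
    observed = [334, 333, 333]
    expected = [1 / 3, 1 / 3, 1 / 3]
    stat = chi_square_statistic(observed, expected)
    print(stat, p_value_df2(stat))
```

A p-value close to 1, as reported in the caption, means the observed proportions are very close to the expected ones; it does not by itself indicate how many reads were available.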
FIG 4 Cost estimate based on current list prices in Switzerland (currency CHF, December 2019). Prices for Illumina and Nanopore sequencing include reagents and consumables; prices for Sanger sequencing correspond to the rates at a large local service provider. The lines for MiniSeq and MiSeq v3 overlap in the figure. Detail is provided in Table S4 in the supplemental material.