Andrew J Page, Jacqueline A Keane.
Abstract
Genome sequencing is rapidly being adopted in reference labs and hospitals for bacterial outbreak investigation and diagnostics, where time is critical. Seven-gene multi-locus sequence typing is a standard tool for broadly classifying samples into sequence types (STs). In many cases this allows a sample to be ruled out of an outbreak, or general characteristics of a bacterial strain to be inferred. Long-read sequencing technologies, such as those from Oxford Nanopore, can produce read data within minutes of an experiment starting, unlike short-read sequencing technologies, which require many hours or days. However, the error rates of raw, uncorrected long-read data are very high. We present Krocus, which can predict an ST directly from uncorrected long reads and which was designed to consume read data as it is produced, providing results in minutes. It is the only tool which can do this from uncorrected long reads. We tested Krocus on over 700 isolates sequenced using long-read sequencing technologies from Pacific Biosciences and Oxford Nanopore. It provides STs for isolates within 90 s on average, with a sensitivity of 94% and a specificity of 97% on real sample data, directly from uncorrected raw sequence reads. The software is written in Python and is available under the open source license GNU GPL version 3.
Keywords: Bioinformatics; Microbial; Multi-locus sequence typing; Nanopore; PacBio
Year: 2018 PMID: 30083440 PMCID: PMC6074768 DOI: 10.7717/peerj.5233
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1: Flowchart of the Krocus method.
The square boxes denote processes that the data undergoes, the diamond shapes denote a decision point, the box with a wavy lower line denotes a data file, and the tabs denote an output.
NCTC 3000 PacBio sequenced samples with results after analysis with Krocus.
| Species | No. of samples | No. of unique STs | No. in agreement | Mean wall time (s) | Mean reads |
|---|---|---|---|---|---|
| | 226 | 204 | 204 | 102 | 17,524 |
| | 11 | 10 | 10 | 42 | 19,900 |
| | 113 | 101 | 109 | 62 | 22,297 |
| | 22 | 21 | 18 | 127 | 41,444 |
| | 114 | 92 | 111 | 122 | 16,255 |
| | 16 | 16 | 16 | 32 | 10,412 |
| | 48 | 46 | 47 | 107 | 16,348 |
| | 47 | 47 | 47 | 37 | 7,714 |
| | 112 | 101 | 106 | 57 | 15,024 |
| Total | 709 | 638 | 668 | 86 | 17,439 |
Note:
An ST is said to be in agreement if it matches the ST called by TS-mlst from a de novo assembly.
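The totals row can be checked from the per-species rows. The sketch below is illustrative only: it assumes that the total "Mean wall time (s)" is a sample-weighted mean of the per-species means, which the table does not state, and the names `ROWS` and `summarise` are hypothetical. Rows are identified by position because species names are not shown in this copy.

```python
# Per-species rows from the NCTC 3000 PacBio table, in source order.
# Each tuple: (no. of samples, no. of unique STs, no. in agreement,
#              mean wall time in seconds)
ROWS = [
    (226, 204, 204, 102),
    (11, 10, 10, 42),
    (113, 101, 109, 62),
    (22, 21, 18, 127),
    (114, 92, 111, 122),
    (16, 16, 16, 32),
    (48, 46, 47, 107),
    (47, 47, 47, 37),
    (112, 101, 106, 57),
]


def summarise(rows):
    """Aggregate per-species rows into overall totals.

    Assumption: the overall mean wall time is weighted by the number
    of samples in each row.
    """
    total_samples = sum(r[0] for r in rows)
    total_unique_sts = sum(r[1] for r in rows)
    total_agree = sum(r[2] for r in rows)
    weighted_time = sum(r[0] * r[3] for r in rows) / total_samples
    return {
        "samples": total_samples,
        "unique_sts": total_unique_sts,
        "in_agreement": total_agree,
        "agreement_rate": total_agree / total_samples,
        "mean_wall_time_s": weighted_time,
    }


summary = summarise(ROWS)
print(f"samples:        {summary['samples']}")
print(f"unique STs:     {summary['unique_sts']}")
print(f"in agreement:   {summary['in_agreement']}")
print(f"agreement rate: {summary['agreement_rate']:.1%}")
print(f"mean wall time: {summary['mean_wall_time_s']:.0f} s")
```

Under that weighting assumption, the sums reproduce the 709, 638, 668 and 86 s entries of the totals row, and 668 of 709 samples corresponds to an agreement rate of roughly 94%.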
Figure 2: The reads and time to correctly predict an ST for each PacBio NCTC species analysed.
(A) Number of reads analysed before Krocus correctly predicted an ST for each PacBio NCTC species analysed. (B) Time in seconds before Krocus correctly predicted an ST for each PacBio NCTC species analysed.