Tonya L Taylor, Jeremy D Volkening, Eric DeJesus, Mustafa Simmons, Kiril M Dimitrov, Glenn E Tillman, David L Suarez, Claudio L Afonso.
Abstract
U.S. public health agencies have employed next-generation sequencing (NGS) as a tool to quickly identify foodborne pathogens during outbreaks. Although established short-read NGS technologies are known to provide highly accurate data, long-read sequencing is still needed to resolve highly repetitive genomic regions and genomic arrangement, and to close the sequences of bacterial chromosomes and plasmids. Here, we report the use of long-read nanopore sequencing to simultaneously sequence the entire chromosome and plasmid of Salmonella enterica subsp. enterica serovar Bareilly and Escherichia coli O157:H7. We developed a rapid and random sequencing approach coupled with de novo genome assembly within a customized data analysis workflow that uses publicly available tools. In sequencing runs as short as four hours on the MinION instrument, we obtained full-length genomes with an average identity of 99.87% for Salmonella Bareilly and 99.89% for E. coli in comparison to the respective MiSeq references. These nanopore-only assemblies provided readily available information on serotype, virulence factors, and antimicrobial resistance genes. We also demonstrate the potential of nanopore sequencing assemblies for rapid preliminary phylogenetic inference. Nanopore sequencing provides additional advantages, such as a very low capital investment and footprint, and a shorter turnaround time (10 hours for library preparation and sequencing) compared to other NGS technologies.
Year: 2019 PMID: 31704961 PMCID: PMC6841976 DOI: 10.1038/s41598-019-52424-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Comparison of the final raw data from MinION and Illumina.
| Sequencing Method | Average Read Length | Total Bases | Min Read Length | Max Read Length | Average Read Quality | Read Number | Mean Depth |
|---|---|---|---|---|---|---|---|
| MiSeq (Salmonella Bareilly)<sup>a</sup> | 149.51 | 288,633,579 | 35 | 151 | 36.66 | 1,930,511 | 57.72 |
| MinION (Salmonella Bareilly)<sup>b</sup> | 8638.36 | 2,879,148,408 | 113 | 120,119 | 19.36 | 333,298 | 599.06 |
| MiSeq (E. coli)<sup>a</sup> | 242.61 | 556,035,081 | 35 | 251 | 34.96 | 2,291,825 | 111.2 |
| MinION (E. coli)<sup>b</sup> | 8979.55 | 3,860,389,678 | 85 | 112,643 | 19.38 | 429,909 | 692.19 |

<sup>a</sup> MiSeq quality standard: Q ≥ 30.
<sup>b</sup> MinION quality standard: Q ≥ 10.
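
The read statistics in this table are related by simple arithmetic, which the following minimal Python sketch illustrates for the two MinION rows. It assumes that the average read length is total bases divided by read number and that mean depth is total bases divided by the reference length; the source does not state how depth was calculated, so the second relation is an assumption. The function names are illustrative only.

```python
# Values copied from the two MinION rows of the table above.
MINION_RUNS = {
    "Salmonella Bareilly": {"total_bases": 2_879_148_408, "reads": 333_298, "mean_depth": 599.06},
    "E. coli": {"total_bases": 3_860_389_678, "reads": 429_909, "mean_depth": 692.19},
}


def average_read_length(total_bases, reads):
    """Average read length in bases: total sequenced bases divided by read count."""
    return total_bases / reads


def implied_genome_size(total_bases, mean_depth):
    """Genome length implied by the reported depth.

    Assumption: mean depth = total bases / genome length.
    """
    return total_bases / mean_depth


for organism, run in MINION_RUNS.items():
    avg_len = average_read_length(run["total_bases"], run["reads"])
    genome = implied_genome_size(run["total_bases"], run["mean_depth"])
    print(f"{organism}: average read length {avg_len:.2f}, implied genome size {genome:,.0f}")
```

Under these assumptions, the implied genome sizes fall within about 0.1% of the final MinION assembly sizes reported for the 1500-minute runs (4,806,995 and 5,577,934).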
Assembly data for MinION sequencing.
| Duration (min) | Reads | Subsampled Reads | Assembly Size | Circular Contigs<sup>a</sup> | Linear Contigs | Longest Contig | Longest Circular Contig | NG50<sup>b</sup> | Average Identity (%) | Reference Coverage (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| **Salmonella Bareilly** | | | | | | | | | | |
| 15 | 7229 | 7229 | 1135723 | 0 | 19 | 163786 | 0 | 0 | 99.13 | 24.36 |
| 30 | 14888 | 14888 | 4577215 | 0 | 18 | 841969 | 0 | 471499 | 99.55 | 95.38 |
| 60 | 29132 | 29132 | 4722179 | 1 | 0 | 4722179 | 4722179 | 4722179 | 99.79 | 98.4 |
| 120 | 51226 | 51226 | 4805334 | 2 | 0 | 4723663 | 4723663 | 4723663 | 99.84 | 100 |
| 480 | 132137 | 20193 | 4806518 | 2 | 0 | 4724724 | 4724724 | 4724724 | 99.87 | 100 |
| 960 | 248910 | 16221 | 4806892 | 2 | 0 | 4725103 | 4725103 | 4725103 | 99.89 | 100 |
| 1500 | 333298 | 15249 | 4806995 | 2 | 0 | 4725191 | 4725191 | 4725191 | 99.89 | 100 |
| **E. coli O157:H7** | | | | | | | | | | |
| 15 | 8731 | 8731 | 1352560 | 0 | 19 | 154626 | 0 | 0 | 99.18 | |
| 30 | 18053 | 18053 | 5141583 | 0 | 14 | 1565772 | 0 | 518218 | 99.63 | |
| 60 | 35335 | 35335 | 5481126 | 1 | 0 | 5481126 | 5481126 | 5481126 | 99.82 | |
| 120 | 62415 | 60362 | 5570410 | 1 | 1 | 5481662 | 5481662 | 5481662 | 99.87 | |
| 480 | 164641 | 15265 | 5577346 | 2 | 0 | 5482831 | 5482831 | 5482831 | 99.90 | |
| 960 | 317698 | 12941 | 5577818 | 2 | 0 | 5483284 | 5483284 | 5483284 | 99.91 | |
| 1500 | 429909 | 12403 | 5577934 | 2 | 0 | 5483397 | 5483397 | 5483397 | 99.91 | |
<sup>a</sup> Two circular contigs correspond to the chromosome and the plasmid.
<sup>b</sup> NG50: contigs or scaffolds equal to or larger than this value contain 50% of the expected genome length; the value is 0 when the assembly covers less than half of the genome.
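
As an illustration of how an NG50 value is obtained from a list of contig lengths, here is a minimal Python sketch. It uses the conventional genome-size-based definition of NG50, which is consistent with the value of 0 reported for the 15-minute assemblies. The genome size passed in (the 1500-minute Salmonella assembly size) is a stand-in for the reference length, and the split of the 15-minute assembly into individual contigs is hypothetical, because the table gives only the total and the longest contig.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly.

    Contigs are taken from longest to shortest; NG50 is the length of the
    contig at which the running total first reaches half of genome_size.
    Returns 0 if all contigs together cover less than half of the genome.
    """
    half_genome = genome_size / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half_genome:
            return length
    return 0


# Assumption: the final Salmonella assembly size stands in for the reference length.
GENOME_SIZE = 4_806_995

# 120-minute Salmonella assembly: two contigs, 4,805,334 bases in total,
# so the second contig is 4,805,334 - 4,723,663 = 81,671 bases.
contigs_120_min = [4_723_663, 81_671]

# 15-minute Salmonella assembly: 19 contigs totalling 1,135,723 bases with a
# longest contig of 163,786. The remaining 18 lengths are a hypothetical split.
contigs_15_min = [163_786] + [54_000] * 17 + [53_937]

print(ng50(contigs_120_min, GENOME_SIZE))
print(ng50(contigs_15_min, GENOME_SIZE))
```

With these inputs the function reproduces the tabulated NG50 values for the 120-minute and 15-minute Salmonella assemblies (4,723,663 and 0).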
E. coli MinION sequencing data analyzed for completeness and accuracy before and after two rounds of polishing.
| Seq Duration (min) | Avg. ID (%) | SNPs/kb<sup>a</sup> | Indels/kb<sup>b</sup> | BUSCO complete<sup>c,d</sup> | BUSCO fragmented<sup>c,e</sup> | BUSCO missing<sup>c,f</sup> |
|---|---|---|---|---|---|---|
| **Before polishing** | | | | | | |
| 15 | 98.62 | 4.06 | 9.69 | 0.01 | 0.06 | 0.93 |
| 30 | 99.16 | 2.67 | 5.71 | 0.13 | 0.5 | 0.37 |
| 60 | 99.36 | 2.31 | 4.07 | 0.2 | 0.57 | 0.23 |
| 120 | 99.4 | 2.22 | 3.74 | 0.23 | 0.55 | 0.22 |
| 480 | 99.38 | 2.21 | 4 | 0.22 | 0.57 | 0.21 |
| 960 | 99.41 | 2.25 | 3.71 | 0.23 | 0.55 | 0.22 |
| 1500 | 99.4 | 2.24 | 3.72 | 0.22 | 0.58 | 0.2 |
| **After one round of polishing** | | | | | | |
| 15 | 99.13 | 2.11 | 6.61 | 0.04 | 0.11 | 0.86 |
| 30 | 99.6 | 1.02 | 2.96 | 0.35 | 0.46 | 0.19 |
| 60 | 99.79 | 0.55 | 1.51 | 0.51 | 0.41 | 0.08 |
| 120 | 99.85 | 0.39 | 1.14 | 0.58 | 0.35 | 0.07 |
| 480 | 99.87 | 0.35 | 0.94 | 0.66 | 0.3 | 0.04 |
| 960 | 99.88 | 0.35 | 0.87 | 0.66 | 0.3 | 0.04 |
| 1500 | 99.88 | 0.35 | 0.87 | 0.64 | 0.31 | 0.05 |
| **After two rounds of polishing** | | | | | | |
| 15 | 99.18 | 1.92 | 6.31 | 0.04 | 0.11 | 0.86 |
| 30 | 99.63 | 0.88 | 2.81 | 0.37 | 0.44 | 0.19 |
| 60 | 99.82 | 0.41 | 1.39 | 0.57 | 0.36 | 0.06 |
| 120 | 99.87 | 0.26 | 1.04 | 0.64 | 0.3 | 0.06 |
| 480 | 99.9 | 0.19 | 0.8 | 0.72 | 0.26 | 0.02 |
| 960 | 99.91 | 0.19 | 0.75 | 0.73 | 0.25 | 0.02 |
| 1500 | 99.91 | 0.18 | 0.74 | 0.73 | 0.24 | 0.03 |
<sup>a</sup> SNPs/kb: single nucleotide polymorphisms per kilobase.
<sup>b</sup> Indels/kb: insertions or deletions per kilobase.
<sup>c</sup> BUSCO: Benchmarking Universal Single-Copy Orthologs.
<sup>d</sup> Complete: fraction of the expected gene complement found with full-length reading frames.
<sup>e</sup> Fragmented: fraction of expected genes aligned over only part of their length.
<sup>f</sup> Missing: fraction of expected genes with no significant match.
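
The per-kilobase rates in this table can be related to raw counts and to the average identity with a small Python sketch. The first function is simply the definition of a per-kb rate. The second encodes an observation on the E. coli rows above, not a formula stated by the authors: the average identity is close to 100 minus one tenth of the combined SNP and indel rate. Both function names are illustrative.

```python
def per_kb(event_count, aligned_bases):
    """Rate of events (SNPs or indels) per 1,000 aligned bases."""
    return event_count / (aligned_bases / 1000)


def approx_identity(snps_per_kb, indels_per_kb):
    """Approximate percent identity implied by the two error rates.

    Assumption: every SNP or indel counts as one mismatched position,
    so errors per kb divided by 10 gives errors per 100 bases.
    """
    return 100 - (snps_per_kb + indels_per_kb) / 10


# A hypothetical count of 1,000 SNPs over 5,000,000 aligned bases is 0.2 SNPs/kb.
print(per_kb(1000, 5_000_000))

# Rates copied from the 15-minute unpolished row and the last 1500-minute row.
print(round(approx_identity(4.06, 9.69), 2))
print(round(approx_identity(0.18, 0.74), 2))
```

For these two rows the approximation lands within 0.01 of the tabulated identities (98.62 and 99.91); it should be treated as a rough consistency check rather than the way the identities were computed.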
Figure 1. Polishing results of the MinION-only assemblies using multiple rounds of Nanopolish. Because errors remained in the MinION-only assemblies, Nanopolish, a signal-level consensus tool, was used to increase assembly accuracy. The overall accuracy, the Benchmarking Universal Single-Copy Orthologs (BUSCO) complete, fragmented, and missing fractions, the number of indels per kb, and the number of SNPs per kb are shown after 0, 1, 2, 3, and 4 rounds of Nanopolish. After two rounds of polishing, the overall accuracy and the numbers of indels and SNPs per kb did not change considerably.
Salmonella Bareilly MinION sequencing data analyzed for completeness and accuracy before and after two rounds of polishing.
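| Seq Duration (min) | Reference Coverage (%) | Avg. ID (%) | Relocations<sup>a</sup> | Translocations<sup>b</sup> | Inversions<sup>c</sup> | Insertions<sup>d</sup> | Insertions sum | SNPs/kb<sup>e</sup> | Indels/kb<sup>f</sup> | BUSCO complete<sup>g,h</sup> | BUSCO fragmented<sup>g,i</sup> | BUSCO missing<sup>g,j</sup> |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **Before polishing** | | | | | | | | | | | | |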
| 15 | 24.36 | 98.6 | 0 | 0 | 1 | 12 | 506 | 4.04 | 9.43 | 0.02 | 0.07 | 0.91 |
| 30 | 95.38 | 99.05 | 0 | 0 | 1 | 10 | 292 | 3.01 | 6.43 | 0.14 | 0.51 | 0.35 |
| 60 | 98.4 | 99.3 | 0 | 0 | 1 | 0 | 0 | 2.54 | 4.46 | 0.19 | 0.59 | 0.22 |
| 120 | 100 | 99.32 | 0 | 0 | 1 | 1 | 3613 | 2.43 | 4.39 | 0.2 | 0.57 | 0.22 |
| 480 | 100 | 99.37 | 0 | 0 | 1 | 1 | 3618 | 2.42 | 3.89 | 0.21 | 0.55 | 0.24 |
| 960 | 100 | 99.36 | 0 | 0 | 1 | 1 | 3612 | 2.38 | 4.05 | 0.2 | 0.57 | 0.23 |
| 1500 | 100 | 99.37 | 0 | 0 | 1 | 2 | 3606 | 2.4 | 3.85 | 0.23 | 0.55 | 0.22 |
| **After one round of polishing** | | | | | | | | | | | | |
| 15 | 24.36 | 99.1 | 0 | 0 | 1 | 11 | 494 | 2.19 | 6.52 | 0.04 | 0.11 | 0.85 |
| 30 | 95.38 | 99.52 | 0 | 0 | 1 | 10 | 292 | 1.22 | 3.48 | 0.32 | 0.5 | 0.18 |
| 60 | 98.4 | 99.77 | 0 | 0 | 1 | 0 | 0 | 0.6 | 1.72 | 0.46 | 0.44 | 0.1 |
| 120 | 100 | 99.81 | 0 | 0 | 1 | 1 | 3610 | 0.49 | 1.38 | 0.54 | 0.38 | 0.08 |
| 480 | 100 | 99.86 | 0 | 0 | 1 | 1 | 3616 | 0.4 | 1.08 | 0.61 | 0.33 | 0.06 |
| 960 | 100 | 99.85 | 0 | 0 | 1 | 1 | 3612 | 0.44 | 1.02 | 0.62 | 0.33 | 0.06 |
| 1500 | 100 | 99.86 | 0 | 0 | 1 | 2 | 3610 | 0.41 | 1.01 | 0.62 | 0.31 | 0.06 |
| **After two rounds of polishing** | | | | | | | | | | | | |
| 15 | 24.36 | 99.13 | 0 | 0 | 1 | 11 | 492 | 2.06 | 6.35 | 0.05 | 0.12 | 0.84 |
| 30 | 95.38 | 99.55 | 0 | 0 | 1 | 10 | 292 | 1.1 | 3.33 | 0.34 | 0.48 | 0.18 |
| 60 | 98.4 | 99.79 | 0 | 0 | 1 | 0 | 0 | 0.48 | 1.61 | 0.5 | 0.42 | 0.08 |
| 120 | 100 | 99.84 | 0 | 0 | 1 | 1 | 3610 | 0.34 | 1.26 | 0.58 | 0.35 | 0.07 |
| 480 | 100 | 99.87 | 0 | 0 | 1 | 1 | 3616 | 0.24 | 0.99 | 0.66 | 0.29 | 0.05 |
| 960 | 100 | 99.89 | 0 | 0 | 1 | 1 | 3612 | 0.23 | 0.89 | 0.67 | 0.28 | 0.04 |
| 1500 | 100 | 99.89 | 0 | 0 | 1 | 2 | 3610 | 0.23 | 0.86 | 0.69 | 0.27 | 0.04 |
<sup>a</sup> Relocations: rearrangement of genetic material within a chromosome or between chromosomes.
<sup>b</sup> Translocations: rearrangement of parts between nonhomologous chromosomes.
<sup>c</sup> Inversions: rearrangement in which a segment of a chromosome is reversed end to end.
<sup>d</sup> Insertions: the addition of a larger nucleotide sequence into a chromosome.
<sup>e</sup> SNPs/kb: single nucleotide polymorphisms per kilobase.
<sup>f</sup> Indels/kb: insertions or deletions per kilobase.
<sup>g</sup> BUSCO: Benchmarking Universal Single-Copy Orthologs.
<sup>h</sup> Complete: fraction of the expected gene complement found with full-length reading frames.
<sup>i</sup> Fragmented: fraction of expected genes aligned over only part of their length.
<sup>j</sup> Missing: fraction of expected genes with no significant match.
Figure 2. Annotation of the MinION assembly of Escherichia coli. (a) The E. coli O157:H7 chromosome was sequenced and assembled into a final consensus of 5,482,542 nucleotides. The annotation of the genome provided the location of 5,748 coding sequences (CDS), 106 tRNAs, 29 rRNAs, 6 regulatory regions, and 1 repeat region. For imaging purposes, only the 6 regulatory regions (green), the repeat region (brown), and the CDS of two virulence factors (yellow) are shown magnified. For demonstration purposes, the LEE (locus of enterocyte effacement) is highlighted at position 4,603,699 to 4,636,299, and the Shiga toxin subunits are shown at position 3,181,004 to 3,180,992. (b) The E. coli pO157 plasmid was sequenced and assembled into a final consensus of 94,503 nucleotides. The annotation shows all 124 coding sequences (CDS) in yellow. For demonstration purposes, the CDS of three well-known virulence factors are highlighted: hemolysin (ehx) at position 16,584 to 19,578, catalase-peroxidase (katP) at position 76,704 to 78,356, and the type II secretion system (T2SS) at position 64,056 to 85,694.
Figure 3. SNP trees of Salmonella reference datasets and data obtained with MinION. (a) Tree constructed with SNPs from twenty-three Salmonella reference datasets that were used for phylogenetic pipeline validation for foodborne pathogen surveillance [35]; (b) the CFSAN000189 data are replaced with SNPs from the 240-minute and 1500-minute MinION-only assemblies obtained in this study; (c) the tree includes both the reference dataset and the MinION-only data for the CFSAN000189 strain, along with the SNPs of the remaining 22 Salmonella reference datasets.