Narjol González-Escalona, Marc A Allard, Eric W Brown, Shashi Sharma, Maria Hoffmann.
Abstract
Whole genome sequencing can provide essential public health information. However, it is now known that widely used short-read methods can miss some randomly distributed segments of genomes, which can prevent phages, plasmids, and virulence factors from being detected or properly identified. Here, we compared assemblies of three complete Shiga toxin-producing Escherichia coli (STEC) O26:H11/H- genomes from two different sequence types (ST21 and ST29), each acquired using Nextera XT MiSeq, MinION nanopore-based sequencing, and Pacific Biosciences (PacBio) sequencing. Each closed genome consisted of a single chromosome, approximately 5.7 Mb for CFSAN027343, 5.6 Mb for CFSAN027346, and 5.4 Mb for CFSAN027350. However, short-read whole genome sequencing (WGS) using Nextera XT MiSeq failed to identify some virulence genes in plasmids and on the chromosome, both of which were detected using the long-read platforms. Results from long-read MinION and PacBio sequencing allowed us to identify differences in plasmid content: a single 88 kb plasmid in CFSAN027343; a 157 kb plasmid in CFSAN027350; and two plasmids in CFSAN027346 (one 95 kb, one 72 kb). These data enabled rapid characterization of the virulome, detection of antimicrobial resistance genes, and determination of the composition and location of Stx phages. Taken together, positive correlations between the two long-read methods for determining plasmids, virulome, antimicrobial resistance genes, and phage composition support MinION sequencing as one accurate and economical option for closing STEC genomes and identifying specific virulence markers.
Year: 2019 PMID: 31361781 PMCID: PMC6667211 DOI: 10.1371/journal.pone.0220494
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of the characteristics of the 3 STEC O26:H11 strains sequenced in this study.
| strain | CFSAN No. | Serotype | ST | Date | location | source |
|---|---|---|---|---|---|---|
| 99.085 | CFSAN027343 | O26:H11 | 21 | 1999 | Argentina | clinical |
| 99.1773 | CFSAN027346 | O26:H11 | 21 | 1999 | USA | clinical |
| 12.1843 | CFSAN027350 | O26:H11 | 29 | 2012 | USA | environmental |
a These results were confirmed with in silico serotyping.
b Determined by in silico MLST.
MiSeq assembly statistics for the 3 O26:H11 STECs.
| CFSAN No. | Contigs | No. of reads | Q>30 | Total bases | N50 (bp) | Total assembly size (bp) | Average coverage (X) |
|---|---|---|---|---|---|---|---|
| CFSAN027346 | 274 | 2.14E+06 | 194 | 4.17E+08 | 100,594 | 5.42E+06 | 77 |
| CFSAN027343 | 321 | 2.56E+06 | 188 | 4.81E+08 | 93,611 | 5.41E+06 | 89 |
| CFSAN027350 | 234 | 2.45E+06 | 221 | 5.42E+08 | 99,163 | 5.28E+06 | 103 |
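The coverage figures in this table can be cross-checked with simple arithmetic. The sketch below assumes that the fifth column is the total number of sequenced bases and the seventh column is the total assembly length in bp (the extracted headers are truncated, so both readings are assumptions), and that average coverage is the ratio of the two. The dictionary keys and the function name are illustrative only.

```python
# Values copied from the MiSeq table; the column meanings are assumed, not stated.
miseq = {
    "CFSAN027346": {"total_bases": 4.17e8, "assembly_bp": 5.42e6, "reported": 77},
    "CFSAN027343": {"total_bases": 4.81e8, "assembly_bp": 5.41e6, "reported": 89},
    "CFSAN027350": {"total_bases": 5.42e8, "assembly_bp": 5.28e6, "reported": 103},
}


def mean_coverage(total_bases, assembly_bp):
    """Average depth of coverage: sequenced bases divided by assembly length."""
    return total_bases / assembly_bp


# Rounded coverage per strain, to compare with the last column of the table.
coverage = {
    strain: round(mean_coverage(v["total_bases"], v["assembly_bp"]))
    for strain, v in miseq.items()
}

for strain, value in coverage.items():
    print(strain, value, "reported:", miseq[strain]["reported"])
```

Under these assumptions the computed values agree with the reported average coverage for all three strains.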
STECs MinION sequencing output statistics by replicate and library kit.
| Run | Total reads | Total output (Gb) | Average coverage (X) |
|---|---|---|---|
| | 4,027,578 | 8.21 | 1127 |
| | 692,153 | 3.33 | 585 |
| | 611,279 | 6.82 | 1199 |
| | 267,065 | 1.08 | 189 |
| | 1,369,072 | 5.23 | 920 |
| | 173,937 | 0.93 | 163 |
| | 155,628 | 0.90 | 158 |
| | 131,153 | 0.31 | 54 |
1 Runs: a) first, and b) second run.
2 1D Genomic DNA by ligation (SQK-LSK108).
3 Rapid sequencing kit (SQK-RAD002).
Assembly statistics MinION.
| Sample | Chromosome contig(s) | Plasmid contig(s) | Chromosome size (bp) | Plasmid size(s) (bp) |
|---|---|---|---|---|
| CFSAN027343 | 1 | 1 | 5,688,712 | 88,561 |
| CFSAN027343 | 1 | 1 | 5,688,145 | 88,702 |
| CFSAN027346 | 1 | 2 | 5,588,947 | 1 (95,599); |
| CFSAN027346 | 1 | 2 | 5,592,589 | 1 (95,821); |
| CFSAN027346 | 1 | 2 | 5,592,692 | 1 (95,696); |
| CFSAN027350 | 1 | 1 | 5,422,984 | 157,276 |
| CFSAN027350 | 3 | 1 | 5,448,646 | 157,340 |
| CFSAN027350 | 15 | 1 | 5,451,905 | 157,300 |
1 Runs: a) first, and b) second run.
2 1D Genomic DNA by ligation (SQK-LSK108).
3 Rapid sequencing kit (SQK-RAD002).
PacBio sequencing output for each SMRT cell.
| Strains | SMRT Cell # | Total Reads | Total Output (Gb) | Average Read Length of Insert (bp) |
|---|---|---|---|---|
| CFSAN027343 | 1 | 79,102 | 0.79 | 9,965 |
| CFSAN027343 | 2 | 77,649 | 0.8 | 10,326 |
| CFSAN027343 | 3 | 75,856 | 0.74 | 9,743 |
| CFSAN027346 | 1 | 91,744 | 1.03 | 11,252 |
| CFSAN027346 | 2 | 82,672 | 0.88 | 10,677 |
| CFSAN027346 | 3 | 87,886 | 0.93 | 10,484 |
| CFSAN027350 | 1 | 81,504 | 0.95 | 11,715 |
| CFSAN027350 | 2 | 82,543 | 0.96 | 11,641 |
| CFSAN027350 | 3 | 87,866 | 1.01 | 11,541 |
Assembly statistics per SMRT cell# for PacBio data using HGAP3.0 and Quiver.
| Strains | SMRT Cell # | Chromosome contig(s) | Chromosome size (bp) | Plasmid contig(s) | Plasmid size(s) (bp) | Coverage (X) |
|---|---|---|---|---|---|---|
| CFSAN027343 | 1 | 3 | 5,351,371; 281,039; 81,920 | 1 | 88,847 | 134 |
| CFSAN027343 | 2 | 2 | 5,525,151; 164,081 | 1 | 88,848 | 134 |
| CFSAN027343 | 3 | 2 | 5,525,031; 164,076 | 1 | 88,847 | 129 |
| CFSAN027346 | 1 | 1 | 5,592,579 | 2 | 1 (96,016), 2 (73,152) | 147 |
| CFSAN027346 | 2 | 1 | 5,592,570 | 2 | 1 (96,016), 2 (73,152) | 156 |
| CFSAN027346 | 3 | 1 | 5,592,582 | 2 | 1 (96,017), 2 (73,152) | 153 |
| CFSAN027350 | 1 | 1 | 5,436,071 | 1 | 157,534 | 170 |
| CFSAN027350 | 2 | 1 | 5,436,072 | 1 | 157,534 | 166 |
| CFSAN027350 | 3 | 1 | 5,436,082 | 1 | 157,535 | 176 |
a The statistic is listed for each SMRT cell # per isolate.
b The # represents the assembled contigs that belong to the chromosome.
c The size (bp) for each assembled contig that belongs to the chromosome.
d The # represents the assembled contigs that belong to the plasmid(s).
e The size (bp) for each assembled contig that belongs to the plasmid(s).
f The number represents the mean coverage for the assembly for each SMRT cell per isolate.
Virulence genes present in the O26:H11 STEC MinION assemblies by in silico analysis.
Plasmid borne genes were: espP, toxB, katP, and ehxA.
| Assemblies | | | | | | |
|---|---|---|---|---|---|---|
| | - | - | + | - | + | + |
| | - | - | + | - | + | + |
| | - | - | + | + | + | + |
| | - | - | + | + | + | + |
| | - | - | + | + | + | + |
| | + | + | - | + | - | - |
| | + | + | - | + | - | - |
| | + | + | - | + | - | - |
All assemblies were positive for astA, cif, eae, ehxA (plasmid), espA, espB, espF, espJ, espP (plasmid), gad, iha, iss, lpfA, nleA, nleB, nleC, tir, and toxB (plasmid) genes.
1 Runs: a) first, and b) second run.
2 1D Genomic DNA by ligation (SQK-LSK108).
3 Rapid sequencing kit (SQK-RAD002).
4,5 Shiga toxin genes and their variants.
Identification of chromosomal insertion sites for stx phages in the 3 STEC O26:H11 MinION genomes, their stx gene type, regions and coordinates in the genome, and their stx phage sizes.
| | Insertion of Stx phage in | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Strains | | | | | | | PHASTER region of stx phage (genome coordinates) | stx phage size (kb) | stx type |
| CFSAN027343 | - | - | - | - | + | - | 14 (3518081–3575683) | 57.6 | 1a |
| CFSAN027346 | + | - | - | - | - | - | 8 (2329980–2401033) | 69.3 | 1a |
| CFSAN027350 | - | - | - | - | - | + | 14 (3749826–3852495) | 102.6 | 2a |
| 11368 (NC_013361.1) | + | - | - | - | - | - | 9 (2347644–2418764) | 69.3 | 1a |
a All chromosomes started at the dnaA gene.
Fig 1. Schematic representation of the analysis pipeline used in this study for assembly and polishing of the MinION sequencing output.
Fig 2. Phylogenetic analysis of the O26:H11/H- E. coli strains sequenced in this study by MiSeq, MinION, and PacBio and 195 genomes that are available at GenBank by cgMLST analysis.
The SNPs were extracted from the core loci (1303), and the SNP matrix (5089 SNPs) was used to determine the genetic relationships among the strains. The evolutionary history was inferred using a Neighbor-Joining (NJ) tree built from the genetic distances; the tree shows high diversity and indicates that the O26:H11 strains are polyphyletic. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The tree was rooted to E. coli O111:H- strain 11128 (NC_013364). A) The genomes generated by any of the 3 technologies still clustered together by the cgMLST analysis. Snapshots of the clusters formed by the genomes generated by the 3 technologies are shown for B) CFSAN027343, C) CFSAN027346, and D) CFSAN027350. The names of the strains can be discerned in S5 Fig.
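As an illustration of the step from a SNP matrix to genetic distances, here is a minimal sketch that computes pairwise p-distances (the proportion of differing SNP positions), which is one common kind of input to a Neighbor-Joining method. It is not the authors' pipeline: the distance measure, the function names, and the toy SNP profiles are all assumptions made for the example, and the tree-building step itself is omitted.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of compared SNP positions at which two profiles differ."""
    if len(a) != len(b):
        raise ValueError("SNP profiles must have equal length")
    # Skip positions where either profile has a missing call ("N").
    compared = [(x, y) for x, y in zip(a, b) if x != "N" and y != "N"]
    if not compared:
        return 0.0
    return sum(x != y for x, y in compared) / len(compared)


def distance_matrix(profiles):
    """Symmetric pairwise distance lookup keyed by (name, name)."""
    names = sorted(profiles)
    dist = {(n, n): 0.0 for n in names}
    for a, b in combinations(names, 2):
        d = p_distance(profiles[a], profiles[b])
        dist[(a, b)] = dist[(b, a)] = d
    return dist


# Hypothetical SNP profiles: one strain sequenced three ways, plus an outgroup.
toy = {
    "strain_MiSeq": "ACGTACGT",
    "strain_MinION": "ACGTACGT",
    "strain_PacBio": "ACGTACGA",
    "outgroup": "TCGAACCA",
}
dist = distance_matrix(toy)
print(dist[("strain_MiSeq", "strain_PacBio")])
```

In this toy data the three assemblies of the same strain are far closer to each other than to the outgroup, which is the pattern panel A describes for the real genomes.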
Fig 3. Comparison of the synteny mapping of chromosomes generated by either MinION or PacBio, using the Sanger-generated genome for STEC O26:H11 strain 11368 (AP010953) as reference with MAUVE.
Each chromosome sequence is laid out in a horizontal track. Matching colors indicate homologous segments and are connected across genomes. Respective scales show the sequence coordinates in base pairs. A colored similarity plot is shown for each genome, the height of which is proportional to the level of sequence identity in that region. Only the synteny for strain CFSAN027343 is shown for illustration purposes; the synteny for the other two strains (CFSAN027346 and CFSAN027350) can be found in S4 Fig.
Fig 4. Circular map of virulence plasmid pCFSAN027350 compared to the other two virulence plasmids (pCFSAN027343 and pCFSAN027346), generated using CGView [63].
Blue block arrows in the outer circle denote coding regions in the plasmid, indicating the ORF transcription direction. G+C content is shown in the middle circle and the deviation from average G+C content (47.71%) is displayed in the innermost circle. BLAST comparisons with the other two EHEC plasmids are shown in light red (pCFSAN027343) and green (pCFSAN027346).
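The two inner rings described above (G+C content and its deviation from the average) can be approximated with a sliding window. The sketch below is a generic illustration and not CGView's implementation: the window and step sizes, the function names, and the toy sequence are assumptions, and the sequence is treated as linear rather than circular for simplicity.

```python
def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def gc_windows(seq, window, step):
    """Return (start, gc, gc - mean_gc) for each window along the sequence."""
    mean_gc = gc_fraction(seq)
    rows = []
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_fraction(seq[start:start + window])
        rows.append((start, gc, gc - mean_gc))
    return rows


# Hypothetical 40 bp sequence: a GC-rich half followed by an AT-rich half.
toy_plasmid = "GGCC" * 5 + "ATAT" * 5
windows = gc_windows(toy_plasmid, window=20, step=20)
for start, gc, deviation in windows:
    print(start, gc, deviation)
```

Positive deviations mark regions richer in G+C than the sequence average and negative deviations mark poorer ones, which is the contrast the innermost circle is meant to show.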