Dustin A Therrien, Kranti Konganti, Jason J Gill, Brian W Davis, Andrew E Hillhouse, Jordyn Michalik, H Russell Cross, Gary C Smith, Thomas M Taylor, Penny K Riggs.
Abstract
In 2013, the U.S. Department of Agriculture Food Safety and Inspection Service (USDA-FSIS) began transitioning to whole genome sequencing (WGS) for foodborne disease outbreak- and recall-associated isolate identification of select bacterial species. While WGS offers greater precision, certain hurdles must be overcome before widespread application within the food industry is plausible. Challenges include diversity of sequencing platform outputs and lack of standardized bioinformatics workflows for data analyses. We sequenced DNA from USDA-FSIS approved, non-pathogenic E. coli surrogates and a derivative group of rifampicin-resistant mutants (rifR) via both Oxford Nanopore MinION and Illumina MiSeq platforms to generate and annotate complete genomes. Genome sequences from each clone were assembled separately so long-read, short-read, and combined sequence assemblies could be directly compared. The combined sequence data approach provides more accurate completed genomes. The genomes from these isolates were verified to lack functional key E. coli elements commonly associated with pathogenesis. Genetic alterations known to confer rifR were also identified. As the food industry adopts WGS within its food safety programs, these data provide completed genomes for commonly used surrogate strains, with a direct comparison of sequence platforms and assembly strategies relevant to research/testing workflows applicable for both processors and regulators.
Keywords: Escherichia coli; bacterial surrogate; closed genome; high throughput sequencing; long reads; short reads; whole genome sequence
Year: 2021 PMID: 33809423 PMCID: PMC8001026 DOI: 10.3390/microorganisms9030608
Source DB: PubMed Journal: Microorganisms ISSN: 2076-2607
Long-read Oxford Nanopore MinION assembly sequence statistics.
| Bacterial Strains | O Type | H Type | MLST 1 | Contigs | Assembled Length | Largest Contig | Average Coverage |
|---|---|---|---|---|---|---|---|
| BAA-1427 | - | 4 | n/a | 74 | 5,034,864 bps | 4,743,343 bps | 323.673× |
| BAA-1428 | 154 | 16 | n/a | 67 | 5,050,340 bps | 4,806,641 bps | 311.819× |
| BAA-1429 | 166 | 12 | n/a | 20 | 4,856,504 bps | 4,816,131 bps | 362.642× |
| BAA-1430 | 28ac/42 | 21 | n/a | 19 | 5,217,837 bps | 5,022,067 bps | 310.567× |
| BAA-1431 | - | 4 | n/a | 34 | 4,982,422 bps | 4,753,397 bps | 306.167× |
1 MLST (multi-locus sequence typing) types could not be determined due to imperfect matches.
Short-read Illumina MiSeq assembly sequence statistics.
| Bacterial Strains | O Type | H Type | MLST | Contigs | Assembled Length | Largest Contig | Average Coverage |
|---|---|---|---|---|---|---|---|
| BAA-1427 | - | 4 | 10 | 91 | 4,825,300 bps | 434,834 bps | 51.211× |
| BAA-1428 | 154 | 16 | 165 | 127 | 4,758,825 bps | 319,570 bps | 57.141× |
| BAA-1429 | 166 | 12 | 10 | 87 | 4,739,915 bps | 523,910 bps | 60.601× |
| BAA-1430 | 28ac/42 | 21 | 278 | 103 | 5,009,161 bps | 421,121 bps | 49.422× |
| BAA-1431 | - | 4 | 10 | 91 | 4,829,685 bps | 404,666 bps | 47.608× |
Hybrid assembly sequence statistics.
| Bacterial Strains | O Type | H Type | MLST | Pilon 1 | BUSCO 2 | Contigs | Assembled Length (bps) | Largest Contig (bps) | Average Coverage | GenBank |
|---|---|---|---|---|---|---|---|---|---|---|
| BAA-1427 | - | 4 | 10 | 6 | 99.9% | 1 | 4,886,306 | 4,886,306 | 152× | CP063979 |
| BAA-1428 | 154 | 16 | 165 | 5 | 99.8% | 2 | 4,876,786 | 4,870,024 | 151× | CP063956-CP063967 |
| BAA-1429 | 166 | 12 | 10 | 4 | 99.9% | 1 | 4,812,017 | 4,812,017 | 186× | CP063969 |
| BAA-1430 | 28ac/42 | 21 | 278 | 8 | 99.9% | 5 | 5,106,612 | 4,988,672 | 138× | CP063970-CP063974 |
| BAA-1431 | - | 4 | 10 | 6 | 99.9% | 1 | 4,889,455 | 4,889,455 | 135× | CP063958 |
1 Indicates the number of rounds of error correction each assembly underwent during Pilon processing.
2 Indicates the predicted completeness of each assembly generated by BUSCO (Benchmarking Universal Single-Copy Orthologs) after comparison to the lineage enterobacteriales.
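To make the comparison across the three assembly tables concrete, the following minimal Python sketch computes, for strain BAA-1427, the share of each assembly held by its largest contig. The `Assembly` class and the `largest_fraction` metric are illustrative choices made here, not part of the authors' workflow; the numbers are copied from the tables above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assembly:
    """One row of an assembly statistics table (hypothetical helper type)."""
    platform: str
    contigs: int
    assembled_bp: int
    largest_contig_bp: int

    @property
    def largest_fraction(self) -> float:
        """Share of the assembled length held by the largest contig."""
        return self.largest_contig_bp / self.assembled_bp


# BAA-1427 values copied from the long-read, short-read, and hybrid tables.
BAA_1427 = [
    Assembly("MinION (long-read)", 74, 5_034_864, 4_743_343),
    Assembly("MiSeq (short-read)", 91, 4_825_300, 434_834),
    Assembly("Hybrid", 1, 4_886_306, 4_886_306),
]

for a in BAA_1427:
    print(f"{a.platform:<20} contigs={a.contigs:>3}  "
          f"largest/total={a.largest_fraction:.3f}")
```

The largest-contig fraction is only a rough proxy for contiguity. A fuller comparison would typically use N50 or similar statistics, which cannot be derived from the summary values reported here.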
Virulence attributes observed in bacterial surrogates. Subunits of virulence factors that were detected in each strain are indicated. An e-value limit of <0.00001 was adopted as a cut-off for protein identity (Blastx analysis). GenBank accession numbers for the virulence factors and their corresponding subunits within this table are provided in Table S1.
| Virulence Factors | BAA-1427 | BAA-1428 | BAA-1429 | BAA-1430 | BAA-1431 |
|---|---|---|---|---|---|
| Bundle-forming pili subunits (BFP) | bfpB *(-), bfpE *(-), bfpH I(-) | - | - | bfpB IH(-), bfpE IH(-), bfpH IH(-) | bfpB *(-), bfpE *(-), bfpH I(-) |
| Plasmid-encoded regulator (Per) | - | - | perC/bfpW *(-) | perC/bfpW *(-) | - |
| Cytolethal distending toxin (CDT) | cdtA *(56.03%), cdtB *(68.87%), cdtC *(40.56%) | - | - | - | cdtA *(56.03%), cdtB *(68.87%), cdtC *(40.56%) |
| Adhesive fimbriae | csnA IH(-), cswA I(-) | csnA I(-), cswA IH(-) | csnA I(-), cswA IH(-) | cfaB *(-), cooA *(-), csbA IH(-), csnA *(-), cswA IH(-) | csnA IH(-), cswA I(-) |
| Cytotoxic necrotizing factor 1 (CNF1) | cnf1 *(53.48%) | - | - | - | cnf1 *(53.48%) |
| P fimbriae | - | - | papE *(-), papG *(-), papJ *(-) | - | - |
(%) Indicates the percent identity (%ID) that the protein subunit shares with its virulent counterpart.
(-) Indicates that a %ID was not calculated because the virulence factor was not present or lacked required subunits.
I Gene was detected in the assembly from Illumina MiSeq.
H Gene was detected in the hybrid assembly.
* Gene was detected in every dataset.
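The e-value cut-off in the table caption can be applied to BLAST tabular output along the following lines. This is a sketch under stated assumptions, not the authors' pipeline: it assumes hits were exported in the default 12-column tabular layout (`-outfmt 6`), the function name `filter_hits` is hypothetical, and the example rows are invented placeholders rather than results from the study.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

E_VALUE_CUTOFF = 1e-5  # "<0.00001", as stated in the table caption


def filter_hits(handle, cutoff=E_VALUE_CUTOFF):
    """Return hits from a tab-separated BLAST stream with e-value below cutoff."""
    reader = csv.DictReader(handle, fieldnames=BLAST_COLUMNS, delimiter="\t")
    kept = []
    for row in reader:
        evalue = float(row["evalue"])
        if evalue < cutoff:
            kept.append({
                "query": row["qseqid"],
                "subject": row["sseqid"],
                "pident": float(row["pident"]),
                "evalue": evalue,
            })
    return kept


# Hypothetical placeholder rows, NOT data from the study.
EXAMPLE = (
    "query_1\tsubject_A\t70.00\t260\t78\t0\t100\t879\t1\t260\t1e-120\t350\n"
    "query_1\tsubject_B\t31.00\t40\t27\t1\t10\t129\t5\t44\t0.5\t25.0\n"
)

hits = filter_hits(io.StringIO(EXAMPLE))
for hit in hits:
    print(hit["query"], hit["subject"], hit["pident"], hit["evalue"])
```

Passing an e-value filter only establishes that a hit is unlikely to be a chance match. The table reports percent identity separately, and a (-) entry means no %ID was calculated because the factor was absent or lacked required subunits.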