Abdolrahman Khezri, Ekaterina Avershina, Rafi Ahmad.
Abstract
Emerging sequencing technologies have provided researchers with a unique opportunity to study factors related to microbial pathogenicity, such as antimicrobial resistance (AMR) genes and virulence factors. However, the use of whole-genome sequence (WGS) data requires good knowledge of the bioinformatics involved, as well as the necessary techniques. In this study, a total of nine Escherichia coli and Klebsiella pneumoniae isolates from Norwegian clinical samples were sequenced using both MinION and Illumina platforms. Three out of nine samples were sequenced directly from blood culture, and one sample was sequenced from a mixed blood culture. For genome assembly, several long-read (Canu, Flye, Unicycler, and Miniasm), short-read (ABySS, Unicycler, and SPAdes), and hybrid assemblers (Unicycler, hybridSPAdes, and MaSuRCA) were tested. Assembled genomes from the best-performing assemblers (according to quality checks using QUAST and BUSCO) were subjected to downstream analyses. Flye and Unicycler performed best for the assembly of long and short reads, respectively. For hybrid assembly, Unicycler was the top-performing assembler and produced more circularized and complete genome assemblies. Hybrid assembled genomes performed substantially better than MinION and Illumina assemblies in downstream analyses to predict putative plasmids, AMR genes, and β-lactamase gene variants. Thus, hybrid assembly has the potential to reveal factors related to microbial pathogenicity in clinical and mixed samples.
Keywords: Illumina; Oxford Nanopore; antimicrobial resistance; blood culture; clinical isolates; hybrid assembly; long-read; plasmids; short-read; virulence factors
Year: 2021 PMID: 34946161 PMCID: PMC8704702 DOI: 10.3390/microorganisms9122560
Source DB: PubMed Journal: Microorganisms ISSN: 2076-2607
An overview of basic sequence statistics and read quality after trimming and filtering. The mixed culture was obtained by co-culturing the E. coli 4 and K. pneumoniae 5 isolates. Coverage of the E. coli isolates was calculated by dividing the total number of sequenced bp by the number of bp in the reference genome (E. coli NCTC 13441). Coverage of the K. pneumoniae isolates was calculated by dividing the total number of sequenced bp by the size of the K. pneumoniae reference genome (the median genome size of all K. pneumoniae isolates in the NCBI database). Coverage of the mixed culture sample was calculated by dividing the number of bp in the mixed culture sample by the sum of the bp of E. coli NCTC 13441 and the median genome size of all K. pneumoniae isolates in the NCBI database.
| Sample | MinION Read Length N50 (bp) | MinION Mean Read Quality (Q) | MinION Number of Reads | MinION Total (bp) | MinION Coverage (X) | Illumina Number of Reads | Illumina Total (bp) | Illumina Coverage (X) |
|---|---|---|---|---|---|---|---|---|
| E. coli | 2520 | 11.4 | 63,036 | 92,626,571 | 17.4 | 670,985 | 91,989,902 | 17.2 |
| E. coli | 1466 | 11.3 | 67,331 | 88,553,163 | 16.6 | 597,154 | 141,802,031 | 26.6 |
| E. coli | 1384 | 11.5 | 41,103 | 39,979,342 | 7.5 | 1,786,471 | 396,985,933 | 74.4 |
| E. coli | 5956 | 9.8 | 81,317 | 256,369,935 | 48.0 | 1,419,582 | 353,790,894 | 66.3 |
| E. coli, average ± SD | 2832 ± 2147 | 11 ± 0.8 | 63,197 ± 16,669 | 119,382,253 ± 94,404,708 | 22.4 ± 18 | 1,118,548 ± 579,915 | 246,142,190 ± 151,648,580 | 46.1 ± 28 |
| K. pneumoniae | 1428 | 11.3 | 13,694 | 25,125,702 | 4.5 | 889,410 | 222,836,627 | 39.8 |
| K. pneumoniae | 7302 | 11.5 | 199,822 | 859,067,656 | 153.5 | 559,060 | 131,573,009 | 23.5 |
| K. pneumoniae | 4250 | 11.5 | 51,624 | 136,843,964 | 24.5 | 744,422 | 111,911,073 | 20.0 |
| K. pneumoniae | 2044 | 9.9 | 329,042 | 375,495,020 | 67.1 | 1,302,920 | 313,973,441 | 56.1 |
| K. pneumoniae | 3941 | 9.3 | 48,463 | 64,316,017 | 11.5 | 712,218 | 178,050,866 | 31.8 |
| K. pneumoniae, average ± SD | 3793 ± 2302 | 11 ± 1 | 128,529 ± 133,041 | 292,169,672 ± 344,844,995 | 52.2 ± 62 | 841,606 ± 283,335 | 191,669,003 ± 80,759,064 | 34.2 ± 14 |
| Mixed culture sample | 4200 | 9.8 | 143,076 | 387,311,832 | 35.4 | 2,131,800 | 531,841,759 | 48.7 |
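The coverage calculation described in the caption of this table can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' script: the function name `coverage` and both genome sizes (about 5.32 Mb for E. coli NCTC 13441 and about 5.60 Mb for the K. pneumoniae median) are assumptions chosen for illustration and are not stated in the source.

```python
def coverage(total_bases: int, reference_sizes: list[int]) -> float:
    """Return sequencing depth: total sequenced bases / summed reference size."""
    if not reference_sizes or any(size <= 0 for size in reference_sizes):
        raise ValueError("reference sizes must be positive")
    return total_bases / sum(reference_sizes)


# Assumed genome sizes in bp (illustrative values, not taken from the source).
ECOLI_REF_BP = 5_320_000
KPNEUMONIAE_REF_BP = 5_600_000

# Single-species sample: one reference genome in the denominator.
ecoli_depth = coverage(92_626_571, [ECOLI_REF_BP])

# Mixed culture: the denominator is the sum of both reference sizes.
mixed_depth = coverage(387_311_832, [ECOLI_REF_BP, KPNEUMONIAE_REF_BP])

print(f"E. coli sample: {ecoli_depth:.1f}X")
print(f"Mixed culture:  {mixed_depth:.1f}X")
```

Because the reference sizes are assumed here, the computed depths may differ slightly from the values reported in the table.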
An overview of statistics for different E. coli and K. pneumoniae assemblies produced by the top-performing assemblers. IllumASM was produced using Unicycler, MinIONASM using Flye, and HybASM using Unicycler. The top values are highlighted in bold. The mixed culture was obtained by co-culturing the E. coli 4 and K. pneumoniae 5 isolates. Numbers show the average ± SD.
| Sample | Assembly | Number of Dead Ends | Number of Contigs | Total Length (bp) | N50 (bp) |
|---|---|---|---|---|---|
| E. coli | IllumASM | 4 ± 4 | 138 ± 90 | 5,232,982 ± 335,084 | 225,244 ± 82,435 |
| E. coli | MinIONASM | 77 ± 94 | | 3,870,499 ± 2,664,510 | 343,234 ± 504,598 |
| E. coli | HybASM | | 50 ± 28 | | |
| K. pneumoniae | IllumASM | 10 ± 7 | 78 ± 13 | 5,577,253 ± 181,931 | 247,095 ± 138,114 |
| K. pneumoniae | MinIONASM | 44 ± 48 | 35 ± 32 | 4,694,978 ± 2,235,357 | 1,996,101 ± 2,279,327 |
| K. pneumoniae | HybASM | | | | |
| Mixed culture sample | IllumASM | 2 | 371 | 11,193,506 | 147,235 |
| Mixed culture sample | MinIONASM | 65 | 120 | | 344,695 |
| Mixed culture sample | HybASM | | | 11,495,693 | |
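The N50 column in this table summarizes assembly contiguity. As a minimal sketch of the standard definition (the contig length at which the longest contigs, taken in descending order, first cover at least half of the total assembly length), the following Python function computes N50 from a list of contig lengths. The function name and the contig lengths are hypothetical and serve only as an illustration; assembly QC tools such as QUAST report this value directly.

```python
def n50(contig_lengths):
    """Return the N50: the length of the shortest contig in the smallest set of
    longest contigs that together cover at least half of the assembly."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(contig_lengths) / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (bp), for illustration only.
fragmented = [120_000, 90_000, 80_000, 60_000, 50_000, 40_000]
closed = [5_200_000, 110_000, 45_000]

print(n50(fragmented))
print(n50(closed))
```

A fragmented assembly yields a small N50, whereas an assembly dominated by one closed chromosome has an N50 equal to the chromosome length.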
Figure 1. Box plots for BUSCO results of the best-performing assemblers. IllumASM was produced using Unicycler, MinIONASM using Flye and HybASM was created using Unicycler. Each box extends from Min to Max values in each group and the middle black line in each box indicates the mean value. The BUSCO percentage for mixed samples is not included in the graph.
Figure 2. Representative assembly graphs for some of the isolates, including E. coli 2 and 4, K. pneumoniae 2 and 4, and a mixed sample from the co-culturing of E. coli 4 and K. pneumoniae 5 isolates. The GFA files produced by the top-performing assemblers (Unicycler for Illumina short reads, Flye for MinION long reads, and Unicycler for hybrid reads) were used to construct the assembly graphs using Bandage. Illumina assemblies were fragmented and contained few putative plasmids. MinION assemblies contained much larger contigs and more putative plasmids. However, proper circular chromosomes were not observed for the majority of isolates using either IllumASM or MinIONASM. Hybrid assemblies, by contrast, yielded clearly resolved, closed chromosomes and putative plasmids.
Average values for annotating the genomic features of different assemblies from monocultures and mixed cultures of E. coli and K. pneumoniae isolates. IllumASM was produced using Unicycler, MinIONASM using Flye and HybASM was created using Unicycler. The mixed culture was obtained from the co-culturing of E. coli 4 and K. pneumoniae 5 isolates. Numbers show the average ± SD.
| Sample | Assembly | CDS | rRNA | tRNA | tmRNA |
|---|---|---|---|---|---|
| E. coli | IllumASM | 4952 ± 392 | 5 ± 1 | 83 ± 5 | 1 ± 0 |
| E. coli | MinIONASM | 6715 ± 4615 | 12 ± 10 | 63 ± 44 | 1 ± 1 |
| E. coli | HybASM | 5042 ± 532 | 15 ± 9 | 88 ± 11 | 1 ± 0 |
| K. pneumoniae | IllumASM | 5201 ± 185 | 4 ± 1 | 79 ± 1 | 1 ± 0 |
| K. pneumoniae | MinIONASM | 8120 ± 3933 | 20 ± 10 | 67 ± 36 | 1 ± 1 |
| K. pneumoniae | HybASM | 5261 ± 217 | 21 ± 8 | 84 ± 4 | 1 ± 0 |
| Mixed culture sample | IllumASM | 10,660 | 8 | 164 | 2 |
| Mixed culture sample | MinIONASM | 20,158 | 47 | 181 | 2 |
| Mixed culture sample | HybASM | 10,995 | 44 | 184 | 2 |
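The "average ± SD" summaries used throughout these tables can be reproduced with Python's standard `statistics` module. The sketch below uses the four MinION read-length N50 values that precede the first "average ± SD" row of the read statistics table. The choice of the sample standard deviation (n − 1 denominator) is an assumption, since the source does not state which estimator was used.

```python
from statistics import mean, stdev

# MinION read-length N50 values (bp) of the four rows preceding the first
# "average ± SD" row in the read statistics table.
n50_values = [2520, 1466, 1384, 5956]

avg = mean(n50_values)
sd = stdev(n50_values)  # sample SD (n - 1); the estimator is an assumption

print(f"{avg:.0f} ± {sd:.0f}")
```

With these inputs the mean is 2831.5 and the sample SD is roughly 2146, which is close to the reported 2832 ± 2147.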
Figure 3. An overview of downstream analysis results for different assemblies created using the top-performing assemblers. Venn diagrams were prepared using the Venny online platform to plot differences in the number of annotations obtained; data for four E. coli and five K. pneumoniae isolates were merged. Numbers in the overlap areas indicate shared hits (hits with the same name identified in exactly the same isolates). (A) The number of annotated CDSs (putative and hypothetical proteins not plotted). (B) The number of plasmid contigs identified using PlasmidFinder and confirmed by Bandage visualization. (C) The number of AMR genes, including both acquired genes and point mutations. (D) The number of identified VFs.
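The overlap logic behind such Venn diagrams, where a hit counts as shared only if the same name is found in the same isolate, can be sketched with Python sets of (isolate, hit) pairs. All isolate and gene names below are made up for illustration and do not correspond to results in the source; the authors used the Venny platform rather than code like this.

```python
# Hypothetical (isolate, gene) hits per assembly type; all names are made up.
hits = {
    "IllumASM": {("isolate_1", "gene_A"), ("isolate_1", "gene_B"),
                 ("isolate_2", "gene_A")},
    "MinIONASM": {("isolate_1", "gene_A"), ("isolate_2", "gene_C")},
    "HybASM": {("isolate_1", "gene_A"), ("isolate_1", "gene_B"),
               ("isolate_2", "gene_A"), ("isolate_2", "gene_C")},
}

# Hits found by all three assembly types (the central Venn region).
shared_by_all = set.intersection(*hits.values())

# Hits found only in the hybrid assemblies.
hybrid_only = hits["HybASM"] - hits["IllumASM"] - hits["MinIONASM"]

# Hits shared by Illumina and hybrid assemblies but missed by MinION.
illumina_hybrid_only = (hits["IllumASM"] & hits["HybASM"]) - hits["MinIONASM"]

print(len(shared_by_all), len(hybrid_only), len(illumina_hybrid_only))
```

Keying each hit on the isolate as well as the gene name keeps the same gene found in two different isolates from being counted as one shared hit.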