Thomas V Sydenham, Søren Overballe-Petersen, Henrik Hasman, Hannah Wexler, Michael Kemp, Ulrik S Justesen.
Abstract
Bacteroides fragilis constitutes a significant part of the normal human gut microbiota and can also act as an opportunistic pathogen. Antimicrobial resistance (AMR) and the prevalence of AMR genes are increasing, and prediction of antimicrobial susceptibility based on sequence information could support targeted antimicrobial therapy in a clinical setting. Complete identification of insertion sequence (IS) elements carrying promoter sequences upstream of resistance genes is necessary for prediction of AMR. However, de novo assemblies from short reads alone are often fractured due to repeat regions and the presence of multiple copies of identical IS elements. Identification of plasmids in clinical isolates can aid in the surveillance of the dissemination of AMR, and comprehensive sequence databases support microbiome and metagenomic studies. We tested several short-read, hybrid and long-read assembly pipelines by assembling the type strain B. fragilis CCUG4856T (=ATCC25285=NCTC9343) with Illumina short reads and long reads generated by Oxford Nanopore Technologies (ONT) MinION sequencing. Hybrid assembly with Unicycler, using quality filtered Illumina reads and Filtlong filtered and Canu-corrected ONT reads, produced the assembly of highest quality. This approach was then applied to six clinical multidrug-resistant B. fragilis isolates and, with minimal manual finishing of chromosomal assemblies of three isolates, complete, circular assemblies of all isolates were produced. Eleven circular, putative plasmids were identified in the six assemblies, of which only three corresponded to a known cultured Bacteroides plasmid. Complete IS elements could be identified upstream of AMR genes; however, there was not complete correlation between the absence of IS elements and antimicrobial susceptibility.
As our knowledge on factors that increase expression of resistance genes in the absence of IS elements is limited, further research is needed prior to implementing AMR prediction for B. fragilis from whole-genome sequencing.
Keywords: Bacteroides fragilis; Oxford Nanopore; antimicrobial resistance; genome sequencing; hybrid assembly; insertion sequences; plasmid
Year: 2019 PMID: 31697231 PMCID: PMC6927303 DOI: 10.1099/mgen.0.000312
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
Genome assemblers and polishing tools tested
| Genome assembler and version | Link | Reference |
|---|---|---|
| Wtdbg2 v2.3 | | |
| Miniasm v0.3r179 | | |
| Flye v2.3.7 | | |
| Canu v1.8 | | |
| SPAdes (including HybridSPAdes) v3.13.0 | | |
| Skesa v2.3.0 | | |
| Unicycler v0.4.7 | | |
| | | |
| Nanopolish v0.10.2 | | |
| Racon v1.3.1 | | |
| Pilon v1.22 | | |
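As a rough illustration of how the best-performing hybrid approach chains the tools listed above (Filtlong filtering, Canu read correction, then Unicycler hybrid assembly), the sketch below only builds the command lines and does not run them. The file names, the Filtlong thresholds and the `genomeSize` value are placeholders chosen for illustration; the record does not state the settings the authors used.

```python
from typing import List


def filtlong_cmd(ont_reads: str, min_length: int = 1000,
                 keep_percent: int = 90) -> List[str]:
    """Filter ONT reads. Filtlong writes the kept reads to stdout,
    so the caller has to redirect the output to a file.
    The two thresholds are illustrative placeholders."""
    return ["filtlong", "--min_length", str(min_length),
            "--keep_percent", str(keep_percent), ont_reads]


def canu_correct_cmd(filtered_reads: str, prefix: str, out_dir: str,
                     genome_size: str = "5.3m") -> List[str]:
    """Run only Canu's read-correction stage on the filtered ONT reads.
    genome_size is an assumed rough estimate, not a value from the record."""
    return ["canu", "-correct", "-p", prefix, "-d", out_dir,
            f"genomeSize={genome_size}", "-nanopore-raw", filtered_reads]


def unicycler_cmd(illumina_r1: str, illumina_r2: str,
                  long_reads: str, out_dir: str) -> List[str]:
    """Hybrid assembly from paired Illumina reads plus long reads."""
    return ["unicycler", "-1", illumina_r1, "-2", illumina_r2,
            "-l", long_reads, "-o", out_dir]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    steps = [
        filtlong_cmd("ont_raw.fastq.gz"),
        canu_correct_cmd("ont_filtered.fastq.gz", "isolate", "canu_out"),
        unicycler_cmd("illumina_R1.fastq.gz", "illumina_R2.fastq.gz",
                      "canu_out/isolate.correctedReads.fasta.gz",
                      "unicycler_out"),
    ]
    for cmd in steps:
        print(" ".join(cmd))
```

Building the commands as lists keeps each step inspectable before anything is executed, for example by passing a list to `subprocess.run`.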
Selected quality indicators for the best genome assembly of CCUG4856T per assembly pipeline
RefSeq accession GCF_000025985.1 was used as a reference. CM, Canu-corrected with option corMinCoverage=0; CO, Canu-corrected with option corOutCoverage=999; CS, Canu-corrected standard settings; OF, ONT reads filtered with Filtlong; PI[n], Pilon polishing with Illumina reads, [n] rounds; RI, Racon polishing with Illumina reads; RO2, two rounds of Racon polishing with ONT reads. Full results are available in Table S2.
| Assembly | No. of contigs | Largest contig | Total length | Mis-assemblies | Genome fraction (%) | Mismatches per 100 kbp | Indels per 100 kbp | ANI | CheckM completeness | BUSCO score: complete and single-copy/complete and duplicated/fragmented (of 443) | Prokka genes | Prokka rRNA | Prokka tRNA | Total ALE score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GCF_000025985.1 | 2 | 5 205 140 | 5 241 700 | 0 | 100.000 | 0 | 0 | 100.000 | 99.26 | 442/0/1 | 4439 | 19 | 73 | −17071758.95 |
| Skesa | 46 | 553 341 | 5 201 945 | 3 | 99.237 | 0.23 | 0.15 | 99.998 | 99.26 | 440/2/1 | 4391 | 2 | 62 | −20926329.69 |
| SPAdes | 23 | 1 779 941 | 5 212 217 | 4 | 99.396 | 0.44 | 0.17 | 99.987 | 99.26 | 440/2/1 | 4407 | 3 | 56 | −19676529.39 |
| Canu.OF.CO.RO2.RI.PI3 | 2 | 5 247 938 | 5 350 432 | 8 | 99.972 | 4.94 | 15.9 | 99.975 | 99.26 | 442/0/1 | 4634 | 19 | 73 | −19283611.73 |
| Flye.OF.CS.PI5.RI | 5 | 2 282 650 | 5 269 269 | 4 | 99.917 | 1.07 | 6.24 | 99.978 | 99.26 | 441/1/1 | 4476 | 19 | 73 | −18222322.23 |
| Miniasm.OF.CM.RO2.PI5 | 3 | 5 204 445 | 5 277 434 | 2 | 99.972 | 5.21 | 17.75 | 99.969 | 98.88 | 442/0/1 | 4607 | 19 | 73 | −17789234.97 |
| Wtdbg2.OF.CO.RO2.PI6.RI | 3 | 5 192 352 | 5 234 448 | 7 | 99.723 | 3.23 | 3.04 | 99.981 | 99.26 | 442/0/1 | 4437 | 19 | 73 | −18750266.21 |
| SPAdesHybrid.CS | 5 | 3 093 122 | 5 242 724 | 7 | 99.987 | 1.89 | 0.53 | 99.986 | 99.26 | 440/2/1 | 4441 | 19 | 73 | −18535980.68 |
| Unicycler.OF.CS | 2 | 5 205 133 | 5 241 693 | 2 | 99.972 | 0.84 | 0.48 | 100.000 | 99.26 | 442/0/1 | 4435 | 19 | 73 | −17200232.52 |
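The assembly labels in the table above encode each pipeline as dot-separated step codes defined in the table legend. The following minimal sketch expands a label into readable steps; the function name `decode_label` is hypothetical, and the sketch simply lists the codes in label order without assuming anything further about how the steps were run.

```python
import re
from typing import List

# Step codes as defined in the table legend.
STEP_CODES = {
    "OF": "ONT reads filtered with Filtlong",
    "CS": "Canu-corrected, standard settings",
    "CM": "Canu-corrected with corMinCoverage=0",
    "CO": "Canu-corrected with corOutCoverage=999",
    "RI": "Racon polishing with Illumina reads",
    "RO2": "two rounds of Racon polishing with ONT reads",
}


def decode_label(label: str) -> List[str]:
    """Expand a label such as 'Canu.OF.CO.RO2.RI.PI3' into readable steps.

    The first dot-separated field is the assembler; every later field
    must be a legend code or PI<n> (n rounds of Pilon polishing).
    """
    assembler, *codes = label.split(".")
    steps = [f"assembled with {assembler}"]
    for code in codes:
        pilon = re.fullmatch(r"PI(\d+)", code)
        if pilon:
            steps.append(
                f"Pilon polishing with Illumina reads, {pilon.group(1)} round(s)"
            )
        elif code in STEP_CODES:
            steps.append(STEP_CODES[code])
        else:
            raise ValueError(f"unknown step code: {code!r}")
    return steps


if __name__ == "__main__":
    for step in decode_label("Wtdbg2.OF.CO.RO2.PI6.RI"):
        print(step)
```

Labels without step codes, such as `Skesa`, decode to the assembler alone.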
Hybrid Unicycler assemblies of CCUG4856T
RefSeq accession GCF_000025985.1 was used as a reference. CM, Canu-corrected with option corMinCoverage=0; CO, Canu-corrected with option corOutCoverage=999; CS, Canu-corrected standard settings; OF, ONT reads filtered with Filtlong; RI, Racon polishing with Illumina reads. Unicycler performs assembly polishing with Racon (ONT reads) and Pilon. Full results are available in Table S2.
| Assembly | Total length (bp) | Largest contig (bp) | Local mis-assemblies | Genome fraction (%) | Mismatches per 100 kbp | Indels per 100 kbp | K-mer-based completeness (%) | K-mer-based misjoins | ANI | Prokka CDSs | Prokka genes | Total ALE score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GCF_000025985.1 | 5 241 700 | 5 205 140 | 0 | 100.000 | 0 | 0 | 100.00 | 0 | 100.000 | 4346 | 4439 | −17071758.95 |
| OF | 5 241 602 | 5 205 042 | 3 | 99.970 | 1.11 | 0.65 | 99.96 | 0 | 99.999 | 4343 | 4436 | −17245134.52 |
| OF.RI | 5 241 606 | 5 205 046 | 3 | 99.970 | 1.09 | 0.67 | 99.96 | 3 | 99.999 | 4345 | 4438 | −17247815.86 |
| OF.CS | 5 241 693 | 5 205 133 | 2 | 99.972 | 0.84 | 0.48 | 99.97 | 1 | 100.000 | 4342 | 4435 | −17200232.52 |
| OF.CS.RI | 5 241 698 | 5 205 138 | 2 | 99.972 | 0.88 | 0.52 | 99.96 | 1 | 100.000 | 4346 | 4439 | −17206271.66 |
| OF.CM | 5 241 691 | 5 205 131 | 2 | 99.972 | 0.88 | 0.5 | 99.96 | 1 | 100.000 | 4343 | 4436 | −17201292.44 |
| OF.CM.RI | 5 241 696 | 5 205 136 | 2 | 99.972 | 0.95 | 0.55 | 99.97 | 1 | 100.000 | 4343 | 4436 | −17193184.79 |
| OF.CO | 5 241 693 | 5 205 133 | 2 | 99.972 | 0.84 | 0.48 | 99.97 | 1 | 100.000 | 4342 | 4435 | −17200232.52 |
| OF.CO.RI | 5 241 698 | 5 205 138 | 2 | 99.972 | 0.88 | 0.52 | 99.96 | 1 | 100.000 | 4346 | 4439 | −17206271.66 |
Fig. 1. Dot plot matrix of the alignment of the reference assembly and the hybrid Unicycler assembly using Gepard v1.40 [81]. The NCTC9343 (RefSeq accession number GCF_000025985.1) reference assembly derived from Sanger sequencing is on the x-axis and the hybrid Unicycler assembly on the y-axis. On this otherwise near-perfect alignment with high similarity, an 88 045 bp inversion with 100 % ID is observed at nucleotide positions 2 941 962…3 030 006 on the Unicycler assembly (2 005 742…2 093 786 on the reference sequence) (indicated by the blue arrow).
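The inversion length quoted in the Fig. 1 caption can be checked from the coordinates it gives. The short sketch below assumes the coordinates are 1-based and inclusive, which is the convention under which both ranges come out at the stated length; the helper name `span_length` is illustrative.

```python
from typing import Tuple


def span_length(interval: Tuple[int, int]) -> int:
    """Length of a 1-based, inclusive coordinate range (start, end)."""
    start, end = interval
    return end - start + 1


# Coordinates of the inversion as given in the Fig. 1 caption.
unicycler_inversion = (2_941_962, 3_030_006)
reference_inversion = (2_005_742, 2_093_786)

# Both ranges should describe a segment of the same size.
assert span_length(unicycler_inversion) == span_length(reference_inversion)

if __name__ == "__main__":
    print(span_length(unicycler_inversion))
```

Under this convention both ranges span 88 045 bp, matching the caption.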
Fig. 2. Evolution of genome assemblies with added data and manual finishing. The best SPAdes assembly graphs by Unicycler with short reads only are shown on the far left. Supplying ONT reads improved the assemblies overall, but only three were circularized with single chromosome contigs with data from the initial MinION sequencing runs. Adding additional ONT data and correcting reads with Canu did not improve assemblies for all isolates. Manual finishing was necessary to finish assemblies for three isolates. Assembly graph images generated with Bandage. Read information can be found in Table S1.
Putative plasmid sequences of the complete assemblies
Putative plasmid sequences from the hybrid assemblies of CCUG4856T and the six MDR isolates were screened using the PLSDB. The best hit to plasmids from cultured isolates is shown. Only three putative plasmids from the MDR isolate assemblies could be identified with confident % ID. For most sequences, plasmid replication family proteins were identified in the putative plasmids using ABRicate with a database of sequences downloaded from the Pfam database, strengthening the interpretation that the circularized putative plasmid sequences do in fact represent plasmids harboured by the isolates.
| Strain | Sequence | Length (bp) | Relative read depth | mol% G+C | PLSDB best hit accession no. | PLSDB plasmid hit name | PLSDB % ID | PLSDB length of the sequence of best hit (bp) | Plasmid replicon family (% COV, % ID) |
|---|---|---|---|---|---|---|---|---|---|
| CCUG4856T | Chr | 5 205 133 | 1.00× | 43.19 | – | – | – | – | – |
| | pBF9343 | 36 560 | 7.42× | 32.19 | NC_006873.1 | pBF9343 | 100 | 36 560 | Rep_3 (100/100) |
| BFO17 | Chr | 5 474 541 | 1.00× | 43.51 | – | – | – | – | – |
| | pBFO17_1 | 85 671 | 1.85× | 36.78 | NC_006873.1 | pBF9343 | 80.7 | 36 560 | None |
| | pBFO17_2 | 5594 | 23.03× | 39.65 | NC_011073.1 | pBFP35 | 99.9 | 5594 | Rep_1 (100/100) |
| BFO18 | Chr | 5 302 644 | 1.00× | 43.34 | – | – | – | – | – |
| | pBFO18_1 | 7221 | 25.98× | 42.32 | NC_015168.1 | pBACSA02 | 85.6 | 19 280 | Rep_3 (99.69/99.69) |
| | pBFO18_2 | 4137 | 50.80× | 45.40 | NC_019534.1 | pBFUK1 | 92.2 | 12 817 | Rep_3 (100.00/98.24) |
| | pBFO18_3 | 2782 | 59.91× | 41.45 | NC_005026.1 | pBI143 | 94.6 | 2747 | RepL (89.66/49.22)* |
| S01 | Chr | 5 325 251 | 1.00× | 43.57 | – | – | – | – | – |
| | pBFS01_1 | 78 085 | 2.29× | 36.04 | NC_006873.1 | pBF9343 | 80.7 | 36 560 | None |
| | pBFS01_2 | 8331 | 20.99× | 41.17 | NC_015166.1 | pBACSA03 | 95.6 | 6277 | Rep_3 (100.00/97.85) |
| | pBFS01_3 | 5595 | 22.67× | 39.62 | NC_011073.1 | pBFP35 | 99.9 | 5594 | Rep_1 (100.00/99.48) |
| BFO42 | Chr | 5 141 257 | 1.00× | 43.35 | – | – | – | – | – |
| | pBFO32_1 | 8306 | 40.06× | 43.34 | KJ830768.1 | pBF69566b | 96.0 | 11 019 | RHH_1 (92.94/64.63) Rep_3 (93.64/68.31) |
| | pBFO32_2 | 5594 | 40.15× | 39.63 | NC_011073.1 | pBFP35 | 99.9 | 5594 | Rep_1 (100.00/99.48) |
| BFO67 | Chr | 5 478 614 | 1.00× | 43.85 | – | – | – | – | – |
| | pBFO67_1 | 6129 | 94.15× | 41.67 | NC_011073.1 | pBFP35 | 76.9 | 5594 | Rep_3 (100.00/99.69) |
| BFO85 | Chr | 5 504 076 | 1.00× | 43.60 | – | – | – | – | – |
Chr, Chromosome.
*Annotated as RepA protein in the PGAP annotation.
Fig. 3. Linear representation of an alignment of putative circular plasmid sequences pBFO17_1 and pBFS01_1 (reverse complement for better visualization) using EasyFig [82]. EasyFig uses BLAST to identify regions of similarity. Sequence similarities of >98 % are indicated by full colouring; a darker colour indicates a higher % ID. Products of annotated CDSs are shown. CDSs annotated as hypothetical or domain of unknown function are coloured white. The two sequences show a very high degree of similarity. pBFO17_1 is 7586 bp longer than pBFS01_1. This is mainly due to the insertion of a reverse transcriptase (pBFO17_1, 11367…13034) (disrupting a DNA methylase), the insertion of a prophage (from position 56125 to 61162; identified as an incomplete prophage using PHASTER [83]) and an IS1380 family-like transposase (67933…69237). The regions pBFO17_1 50711…52501 and pBFS01_1 32248…30304 are not similar. Possibly, the insertion of two transposases in pBFO17_1 has excised most of the ParB-family DNA partitioning protein in the corresponding sequence range in pBFS01_1.
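The size figures in the Fig. 3 caption can be cross-checked against the plasmid lengths in the table (85 671 bp for pBFO17_1, 78 085 bp for pBFS01_1). This sketch assumes 1-based inclusive coordinates; the dictionary keys are descriptive labels chosen here, not annotation identifiers.

```python
# Plasmid lengths from the putative plasmid table.
PBFO17_1_LENGTH = 85_671
PBFS01_1_LENGTH = 78_085

# Inserted features on pBFO17_1, coordinates from the Fig. 3 caption.
INSERTIONS = {
    "reverse transcriptase": (11_367, 13_034),
    "incomplete prophage": (56_125, 61_162),
    "IS1380 family-like transposase": (67_933, 69_237),
}


def feature_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive coordinate range."""
    return end - start + 1


length_difference = PBFO17_1_LENGTH - PBFS01_1_LENGTH
insertion_lengths = {
    name: feature_length(*coords) for name, coords in INSERTIONS.items()
}
total_inserted = sum(insertion_lengths.values())

if __name__ == "__main__":
    print("net length difference:", length_difference)
    for name, length in insertion_lengths.items():
        print(f"{name}: {length} bp")
    print("sum of the three insertions:", total_inserted)
```

The net difference reproduces the 7586 bp stated in the caption. The three listed insertions sum to slightly more than that, so the net difference is not simply their total; other differences between the two sequences also contribute.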
Antimicrobial susceptibility and resistance genes and IS elements for the six MDR strains
Identified genes are displayed next to the relevant antimicrobials. Identified IS elements in correct orientation (opposite strand) directly upstream of the genes are included. The % ID and % COV refer to the gene hit. Hits with % ID or % COV <98 % were confirmed with BLASTX searches. The hits for ugd represent possible homologues for genes encoding PmrE, which is involved in polymyxin resistance in Gram-negative bacteria. Full ABRicate results with nucleotide positions and information on the IS elements are available in Table S4.
|
Antimicrobial susceptibility* |
AMR genes and IS elements | ||||||||
|---|---|---|---|---|---|---|---|---|---|
|
Strain |
Antimicrobial |
Etest MIC (mg l−1) |
Result |
Gene |
Upstream IS element |
Sequence† |
% ID |
% COV |
Associated resistance to drug class |
|
BFO17 |
MEM |
>32 |
R |
|
IS |
Chr |
100.00 |
99.20 |
Carbapenem |
|
IPM |
>32 |
R | |||||||
|
MTZ |
>32 |
R |
|
IS |
Chr |
99.40 |
100.00 |
Nitroimidazole | |
|
|
IS |
Chrc |
99.40 |
100.00 |
Nitroimidazole | ||||
|
CLI |
0.094 |
S | |||||||
|
TZP |
>256 |
R | |||||||
|
|
Chr |
99.34 |
99.34 |
Tetracycline | |||||
|
|
Chr |
85.71 |
100.00 |
Cephamycin | |||||
|
|
Chrc |
91.21 |
100.00 |
Fluoroquinolone | |||||
|
|
Chr |
73.77 |
99.02 |
Fluoroquinolone | |||||
|
BFO18 |
MEM |
>32 |
R |
|
IS |
Chr |
100.00 |
100.00 |
Carbapenem |
|
IPM |
16 |
R |
100.00 |
100.00 | |||||
|
MTZ |
16 |
R |
|
IS |
S |
99.19 |
100.00 |
Nitroimidazole | |
|
CLI |
6 |
R |
|
IS |
Chrc |
99.83 |
72.03 |
Clindamycin | |
|
IS |
Chrc |
70.97 |
97.19 | ||||||
|
|
Chrc |
99.58 |
29.71 | ||||||
|
|
Chrc |
100.00 |
100.00 |
Clindamycin | |||||
|
TZP |
>256 |
R | |||||||
|
|
Chr |
65.69 |
53.04 |
Polymyxin | |||||
|
|
Chrc |
73.60 |
99.02 |
Fluoroquinolone | |||||
|
|
Chr |
91.14 |
100.00 |
Fluoroquinolone | |||||
|
|
Chrc |
99.79 |
100.00 |
Tetracycline | |||||
|
|
Chrc |
99.83 |
100.00 |
Macrolides | |||||
|
S01 |
MEM |
>32 |
R |
|
IS |
Chr |
99.20 |
100.00 |
Carbapenem |
|
IPM |
16 |
R | |||||||
|
MTZ |
64 |
R |
|
IS |
pBFS01_2 |
100.00 |
100.00 |
Nitroimidazole | |
|
CLI |
>32 |
R |
|
IS |
Chr |
99.50 |
100.00 |
Clindamycin | |
|
TZP |
6 |
S | |||||||
|
|
Chr |
90.02 |
99.95 |
Tetracycline | |||||
|
|
Chrc |
99.84 |
100.00 |
Tetracycline | |||||
|
|
Chrc |
91.06 |
100.00 |
Fluoroquinolone | |||||
|
|
Chr |
74.03 |
98.80 |
Fluoroquinolone | |||||
|
BFO42 |
MEM |
0.094 |
S | ||||||
|
IPM |
0.25 |
S | |||||||
|
MTZ |
8 |
R |
|
IS |
pBFO32_1 |
98.64 |
96.61 |
Nitroimidazole | |
|
CLI |
>256 |
R |
|
IS |
Chr |
99.50 |
100.00 |
Clindamycin | |
|
|
Chr |
100.00 |
100.00 |
Clindamycin | |||||
|
TZP |
0.38 |
S | |||||||
|
|
Chr |
70.38 |
31.45 |
Polymyxin | |||||
|
|
Chrc |
100.00 |
100.00 |
Cephalosporin | |||||
|
|
Chr |
99.83 |
100.00 |
Macrolide | |||||
|
|
Chr |
71.15 |
31.11 |
Polymyxin | |||||
|
|
Chrc |
100.00 |
100.00 |
Tetracycline | |||||
|
|
Chr |
99.12 |
100.00 |
Fluoroquinolone | |||||
|
|
Chr |
96.66 |
100.00 |
Erythromycin | |||||
|
|
Chrc |
99.88 |
100.00 |
Aminoglycoside | |||||
|
|
IS |
Chrc |
100.00 |
100.00 |
Penicillin, cephalosporin | ||||
|
|
Chr |
75.09 |
99.62 |
Fluoroquinolone | |||||
|
BFO67 |
MEM |
8 |
R |
|
None |
Chr |
100.00 |
100.00 |
Carbapenem |
|
IPM |
0.5 |
S | |||||||
|
MTZ |
0.19 |
S | |||||||
|
CLI |
0.38 |
S | |||||||
|
TZP |
2 |
S | |||||||
|
|
IS |
Chr |
99.69 |
100.00 |
Cephamycin | ||||
|
|
Chr |
99.75 |
100.00 |
Macrolide | |||||
|
|
Chr |
100.00 |
100.00 |
Clindamycin | |||||
|
|
Chrc |
66.76 |
56.30 |
Polymyxin | |||||
|
|
Chr |
100.00 |
100.00 |
Tetracycline | |||||
|
|
Chrc |
90.92 |
100.00 |
Fluoroquinolone | |||||
|
|
Chr |
73.90 |
99.02 |
Fluoroquinolone | |||||
|
BFO85 |
MEM |
32 |
R |
|
None |
Chr |
100.00 |
100.00 |
Carbapenem |
|
IPM |
1 |
S | |||||||
|
MTZ |
0.25 |
S | |||||||
|
CLI |
>256 |
R |
|
Chrc |
99.19 |
98.66 |
Clindamycin | ||
|
TZP |
2 |
S | |||||||
|
|
Chr |
69.84 |
31.45 |
Polymyxin | |||||
|
|
Chrc |
90.02 |
99.95 |
Tetracycline | |||||
|
|
Chrc |
100.00 |
100.00 |
Aminoglycoside | |||||
|
|
Chrc |
100.00 |
100.00 |
Aminoglycoside | |||||
|
|
Chrc |
90.92 |
100.00 |
Fluoroquinolone | |||||
|
|
Chr |
73.53 |
99.02 |
Fluoroquinolone | |||||
|
|
IS |
Chrc |
100.00 |
100.00 |
Cephamycin | ||||
|
|
Chrc |
99.84 |
100.00 |
Tetracycline | |||||
Chr, Chromosome; MEM, meropenem; IPM, imipenem; MTZ, metronidazole; CLI, clindamycin; TZP, piperacillin/tazobactam.
*Results from previously published work following EUCAST (European Committee on Antimicrobial Susceptibility Testing) breakpoints [14].
†A subscript letter C denotes the complement strand.
‡A transposase has inserted itself, splitting the ermF gene in two.
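To illustrate why the abstract cautions against predicting resistance from IS elements alone, the sketch below applies a deliberately naive rule to a subset of the meropenem (MEM) rows from the table above: call an isolate resistant only if a carbapenem resistance gene hit has an IS element directly upstream. The rule and all names in the code are hypothetical and are not the authors' method; the MIC, phenotype and IS values are taken from the table.

```python
from typing import List, NamedTuple


class MeropenemRow(NamedTuple):
    strain: str
    mic: str            # Etest MIC (mg l-1) as printed in the table
    phenotype: str      # "R" or "S" according to the table
    gene_hit: bool      # carbapenem resistance gene hit listed
    upstream_is: bool   # IS element listed directly upstream of the gene


# Subset of MEM rows transcribed from the table above.
ROWS = [
    MeropenemRow("BFO17", ">32", "R", True, True),
    MeropenemRow("BFO18", ">32", "R", True, True),
    MeropenemRow("S01", ">32", "R", True, True),
    MeropenemRow("BFO42", "0.094", "S", False, False),
    MeropenemRow("BFO67", "8", "R", True, False),
]


def naive_prediction(row: MeropenemRow) -> str:
    """Hypothetical rule: resistant only if a gene hit has an upstream IS."""
    return "R" if row.gene_hit and row.upstream_is else "S"


def discordant(rows: List[MeropenemRow]) -> List[str]:
    """Strains whose naive prediction disagrees with the measured phenotype."""
    return [r.strain for r in rows if naive_prediction(r) != r.phenotype]


if __name__ == "__main__":
    print(discordant(ROWS))
```

In this subset the rule fails for BFO67, which is meropenem-resistant despite having no IS element upstream of the carbapenem resistance gene.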