G Margos, S Hepner, C Mang, D Marosevic, S E Reynolds, S Krebs, A Sing, M Derdakova, M A Reiter, V Fingerle.
Abstract
BACKGROUND: Borrelia (B.) burgdorferi sensu lato, which includes the tick-transmitted agents of human Lyme borreliosis, have particularly complex genomes, consisting of a linear main chromosome and numerous linear and circular plasmids. The number and structure of the plasmids vary even among strains of a single genospecies. Genes on these plasmids are known to play essential roles in virulence and pathogenicity as well as in host and vector associations. It is therefore essential to explore methods for the rapid and reliable characterisation of molecular-level changes on plasmids. In this study we used three strains: a low-passage isolate of B. burgdorferi sensu stricto strain B31 (B31-NRZ) and two closely related strains (PAli and PAbe) that were isolated from human patients. Sequences of these strains were compared to the previously sequenced reference strain B31 (available in GenBank) to obtain proof-of-principle information on the suitability of next generation sequencing (NGS) library construction and sequencing methods for the assembly of bacterial plasmids. We tested the effectiveness of different short-read assemblers on Illumina sequences, and of long-read methods on sequence data from Pacific Biosciences single-molecule real-time (SMRT) and nanopore (Oxford Nanopore Technologies) sequencing technology.
Keywords: Borrelia burgdorferi; De novo assembly; Genomics; Next generation sequencing; Plasmids
Year: 2017; PMID: 28558786; PMCID: PMC5450258; DOI: 10.1186/s12864-017-3804-5
Source DB: PubMed; Journal: BMC Genomics; ISSN: 1471-2164; Impact factor: 3.969
Fig. 1 Schematic drawing of the genome organization of B. burgdorferi s.s. strain B31-GB. Strain B31-GB has been reported to carry more than 20 linear and circular plasmids (Casjens et al. 2012). Similar colors indicate plasmids with high sequence similarity. Note that lp56 carries an inserted cp32 plasmid.
Sequencing statistics for all samples and sequencers
| Illumina (libraries of B31-NRZ, PAli and PAbe, left to right) | TS | MP | NX | MP | NX | TS | MP |
|---|---|---|---|---|---|---|---|
| Total bp (Mb) | 564.6 | 174.4 | 779 | 2,140 | 514 | 527 | 106.5 |
| Average read length after trimming (bp) | 249 | 239 | 221 | 243 | 218 | 228 | 242 |
| GC content (%) | 28.1 | 28.1 | 28.2 | 28.2 | 28.2 | 28.2 | 28.2 |
| Total sequences in pairs | 2,264,058 | 725,744 | 3,092,326 | 8,984,852 | 2,040,798 | 2,178,498 | 437,050 |

| Long-read sequencing | B31-NRZ | PAli | PAbe |
|---|---|---|---|
| PacBio library prep | BluePippin | BluePippin | BluePippin |
| PacBio total bp (Mb) | 794.6 | 429.3 | 2,188.6 |
| PacBio average read length (bp) | 13,987 | 13,632 | 21,727 |
| PacBio GC content (%) | 28.4 | 28.3 | 28.3 |
| PacBio average insert size (bp) | 11,332 | 7,997 | 4,177 |
| Nanopore library prep | Ligation 1D | | |
| Nanopore total bp (Mb) | 77.5 | | |
| Nanopore average read length (bp) | 4000 | | |
| Nanopore GC content (%) | 28.8 | | |
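Two quick figures can be derived from the long-read rows: an approximate read count (total bases divided by mean read length) and an average sequencing depth (total bases divided by genome size). The sketch below assumes that the PacBio "Total bp" values are in Mb, as for the Illumina rows, and uses a hypothetical genome size of about 1.5 Mb (main chromosome plus plasmids); neither the constant `GENOME_SIZE_BP` nor the resulting depths come from the article.

```python
# Approximate read counts and mean depth for the PacBio rows of the table.
# ASSUMPTIONS: "Total bp" values are in Mb, and GENOME_SIZE_BP is a hypothetical
# round figure (main chromosome plus plasmids), not a value from the article.
GENOME_SIZE_BP = 1_500_000

# strain -> (total yield in Mb, average read length in bp), copied from the table
PACBIO_ROWS = {
    "B31-NRZ": (794.6, 13_987),
    "PAli": (429.3, 13_632),
    "PAbe": (2_188.6, 21_727),
}


def approx_reads(total_mb: float, mean_len_bp: float) -> int:
    """Approximate read count: total bases divided by mean read length."""
    return round(total_mb * 1e6 / mean_len_bp)


def mean_depth(total_mb: float, genome_bp: int = GENOME_SIZE_BP) -> float:
    """Average depth: total bases divided by the (assumed) genome size."""
    return total_mb * 1e6 / genome_bp


for strain, (total_mb, mean_len) in PACBIO_ROWS.items():
    print(f"{strain}: ~{approx_reads(total_mb, mean_len):,} reads, "
          f"~{mean_depth(total_mb):.0f}x mean depth")
```

Because plasmid copy numbers and library-specific biases are ignored, the depth is only a rough genome-wide average.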
Fig. 2 Visualization of de novo assemblies of lp28-1 in B31-NRZ. Assembly was poor when only TS libraries were used (innermost ring, purple, labelled TS library in the legend). Combining TS and MP libraries considerably improved the assembly of lp28-1 (blue, labelled TS-MP library). Using enriched plasmid DNA for NX library construction also improved the assembly (red, plasmid enrichment). Pacific Biosciences SMRT sequences also yielded a good assembly (turquoise, SMRT Bell library). Notably, in all cases (TS-MP, plasmid enrichment, SMRT Bell) the B31-NRZ plasmid was shorter than that of B31-GB. Different shades of color within one panel indicate different identities between aligned sequences; lighter shades correspond to lower sequence similarity (see the region from approx. 7 to 7.8 kbp in the plasmid enrichment ring).
Fig. 3 Visualization of assemblies of lp17 in PAbe aligned to the reference B31-GB using BRIG. Total genomic DNA was used for TS library preparation (purple, labelled TS library); a spurious gap from 0 to 2.5 kb is visible. Using enriched plasmid DNA for NX library construction improved the assembly, leaving a smaller gap at 16–17 kb (red, labelled plasmid (de novo)). Not surprisingly, mapping the Illumina reads from the enriched-plasmid library onto B31-GB showed complete coverage, suggesting that reads for the complete plasmid existed but were not assembled de novo (blue, labelled plasmid readmapping). De novo assemblies of Pacific Biosciences SMRT sequences showed complete coverage of lp17, confirming that the complete plasmid was present in PAbe and that the gaps in de novo assemblies of short reads were artefacts of either library construction or short-read assembly (turquoise, labelled SMRT Bell library).
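The gaps described in these captions are reference regions that no contig or read alignment covers. The following is a minimal sketch of that bookkeeping, assuming alignments have already been reduced to 0-based, half-open (start, end) intervals on a linear reference such as lp17 (16,821 bp in B31-GB); circular plasmids would need wrap-around handling that is not shown. The function names and interval values are hypothetical, loosely modelled on the TS-library ring described above.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) on the reference


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or abutting aligned intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def uncovered_gaps(ref_len: int, intervals: List[Interval]) -> List[Interval]:
    """Reference regions not covered by any alignment. Assumes a linear
    reference and intervals lying within [0, ref_len)."""
    gaps: List[Interval] = []
    pos = 0
    for start, end in merge_intervals(intervals):
        if start > pos:
            gaps.append((pos, start))
        pos = max(pos, end)
    if pos < ref_len:
        gaps.append((pos, ref_len))
    return gaps


def covered_fraction(ref_len: int, intervals: List[Interval]) -> float:
    """Fraction of the reference covered by at least one alignment."""
    gap_bp = sum(end - start for start, end in uncovered_gaps(ref_len, intervals))
    return 1 - gap_bp / ref_len


# Hypothetical contig hits on lp17 (16,821 bp) that miss the first 2.5 kb.
LP17_LEN = 16_821
hits = [(2_500, 9_000), (8_800, 16_821)]
print(uncovered_gaps(LP17_LEN, hits))              # [(0, 2500)]
print(f"{covered_fraction(LP17_LEN, hits):.1%}")   # 85.1%
```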
Summary statistics of de novo assemblies of B31-NRZ TS paired-end Illumina reads using different assemblers
| Metric | CLC Genomics Workbench | SGA | VelvetOptimiser | SPAdes |
|---|---|---|---|---|
| Number of contigs | 71 | 3930 | 304 | 145 |
| Length of longest contig (bp) | 442,339 | 174,427 | 434,181 | 910,137 |
| Total length (bp) | 1,255,479 | 2,307,446 | 1,173,546 | 1,251,637 |
| N50 (bp) | 359,624 | 663 | 353,278 | 910,137 |
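N50 is the contig length at which contigs of that length or longer together contain at least half of the total assembly length. A minimal sketch of that definition follows; the contig lengths in the usage line are hypothetical, chosen only to sum to the SPAdes total of 1,251,637 bp, and illustrate why the SPAdes N50 equals its longest contig: a single 910,137 bp contig already holds more than half of the assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L together contain at least
    half of the total assembly length (0 for an empty assembly)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


# Hypothetical three-contig assembly summing to the SPAdes total (1,251,637 bp).
print(n50([910_137, 200_000, 141_500]))  # 910137
```

Conversely, the SGA assembly's many short contigs (3930 contigs, N50 of 663 bp) pull its N50 far below its longest contig.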
Fig. 4 Visualization of assemblies of cp32-1 of B31-NRZ aligned to the reference B31-GB. The de novo assembly of total genomic DNA sequenced from a TS library (purple, labelled TS library) shows several gaps and covers only about three quarters of the plasmid. Using enriched plasmid DNA for NX library construction followed by de novo assembly in CLC Genomics Workbench (red, plasmid (de novo)) reduced the size of the gaps. Mapping Illumina reads from the enriched-plasmid NX library onto B31-GB (blue, plasmid (read mapping)) produced a gapless alignment to the reference. De novo assemblies of Pacific Biosciences SMRT sequences (turquoise, SMRT Bell library) showed complete coverage of the cp32-1 plasmid. The image also shows the larger size of the SMRT assembly, with a gap around 17 kb that likely reflects the differences observed between B31-NRZ and B31-GB. Different shades of color within one panel indicate different identities between aligned sequences; lighter shades correspond to lower sequence similarity (see the region from approx. 3.5 to 4.0 kbp in the de novo assembly using enriched plasmids).
Plasmids present in B31-NRZ, PAli and PAbe
| GenBank: B31 or other Bbss^a strains | PCR verification: B31-NRZ | PacBio SMRT sequences: B31-NRZ | PacBio SMRT sequences: PAli | PacBio SMRT sequences: PAbe |
|---|---|---|---|---|
| cp26 | cp26 | cp26 (con9) | cp26 (con6) | cp26 (con8) |
| cp32-1 | | cp32-1_i5_i6 (con4)^b | cp32-1^c (con14) | cp32-1_i5_i6^d (con2) (56,633 bp) |
| cp32-3 | cp32-3 | cp32-3 (con8) | cp32-3 (con4) | cp32-3 (con5) |
| cp32-4 | | cp32-4 (con10) | cp32-4 (con9) | (fusion cp32-9) |
| cp32-2 | | cp32-2 (con6)^e | - | cp32-2 (con6)^e |
| cp32-5 | | (fusion cp32-1) | cp32-5 (con13) | (fusion cp32-1) |
| cp32-9 | | cp32-9 (con7) | cp32-9 (con7) | cp32-9 (con3)^f |
| lp17 | lp17 | lp17 (con5) | lp17 (con8) | lp17 (con7) |
| lp28-1 | lp28-1^g | lp28-1 (con12, 21,885 bp) | - | lp28-1 (4800 bp) |
| lp36 | lp36 | lp36 (con11) | lp36 (con5) | lp36 (part; |
| lp38 | lp38 | lp38 (con2) | lp38 (con10) | - |
| lp54 | lp54 | lp54 (con1) | lp54 (con2) | lp54 (con1) |
| lp56 | lp56 | lp56 (con3) | lp56 (con3) | lp56 (con42: 10,951 bp, con43: 48,674 bp; overlap 200 bp) |
^a Bbss = B. burgdorferi s.s.; the full complement of B31-GB plasmids can be found in Table S3.
^b In B31-NRZ, contig4 (con4) of the SMRT sequences was likely a hybrid plasmid: the first approx. 33,000 bp showed 1138 SNPs relative to cp32-1, while the sequence from 33,000 bp onwards had 773 SNPs relative to cp32-1. The regions differing from cp32-1 of B31 showed similarity to cp32-5 of strain 64b (21–26 kb) and to cp32-6 of strain 156a (47–48.5 kb) (see text for details). Contig4 had 6631 SNPs compared to cp32-1 + 5 of JD1. Its PFam32 sequences showed 100% similarity to cp32-5 and cp32-1.
^c In PAli, three contigs matched cp32-1 in BLAST searches. Contig14 showed 20 SNPs relative to cp32-1 but was short (20,004 bp) and lacked a PFam32 locus. Contig13 matched cp32-1 from 1 to 18,740 bp; between 18.7 and 23 kb it was more similar to cp32-5 of B. burgdorferi s.s. strain 64b, and the remaining sequence up to 32,285 bp again matched cp32-1. Its PFam32 sequences showed 100% similarity to cp32-5 of several B. burgdorferi s.s. strains. Between 16.5 and 18 kb, contig15 matched cp32-6 of strain 156a more closely, while the remaining sequence matched cp32-1. The PFam32 sequence of contig15 was identical to that of cp32-1 of B31-GB.
^d In PAbe, one contig (contig2) partially matched cp32-1, but parts of its sequence also matched cp32-5 (approx. 14.5–19 kb) and cp32-6 (approx. 40–42 kb). Contig2 carried a PFam32 sequence 100% identical to that of cp32-1.
^e A plasmid not present in the B31-GB reference was found in B31-NRZ. Its PFam32 sequence was identical to that of cp32-2 and highly similar (99%) to that of cp32-7 of several B. burgdorferi s.s. strains. An almost identical plasmid, with a PFam32 matching that of cp32-2, was found in PAbe.
^f In BLAST searches, contig3 of PAbe showed similarity to cp32-9. However, while the first part of the sequence was highly similar to cp32-9, the second part was more similar to cp32-4 of B31, suggesting that this plasmid is a hybrid. Its PFam32 sequences were identical to the PFam32 sequences of cp32-9 and cp32-4.
^g In B31-NRZ, plasmid lp28-1 is short (21,885 bp). In PAbe the presence of lp28-1 is questionable, as there were only two contigs of approximately 4800 bp with high similarity to lp28-1.
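The footnotes infer hybrid plasmids from where a contig switches from resembling one cp32 plasmid to resembling another. As a simplified illustration of that idea, and not the BLAST/MEGA workflow used in the study, the sketch below takes a query already aligned position-for-position to each reference, counts mismatches in fixed windows, reports the best-matching reference per window, and merges consecutive windows with the same best match into segments. The function names, window size and toy sequences are hypothetical.

```python
from typing import Dict, List, Tuple


def window_mismatches(query: str, ref: str, window: int) -> List[int]:
    """Mismatch counts per fixed-size window between two pre-aligned,
    equal-length sequences (gap characters simply count as mismatches)."""
    if len(query) != len(ref):
        raise ValueError("sequences must be pre-aligned to the same length")
    counts = []
    for start in range(0, len(query), window):
        q_win = query[start:start + window]
        r_win = ref[start:start + window]
        counts.append(sum(a != b for a, b in zip(q_win, r_win)))
    return counts


def best_reference_per_window(query: str, refs: Dict[str, str], window: int) -> List[str]:
    """Name of the reference with the fewest mismatches in each window
    (ties go to the reference listed first)."""
    if not refs:
        raise ValueError("at least one reference is required")
    per_ref = {name: window_mismatches(query, seq, window) for name, seq in refs.items()}
    n_windows = len(next(iter(per_ref.values())))
    return [min(per_ref, key=lambda name: per_ref[name][i]) for i in range(n_windows)]


def call_segments(calls: List[str], window: int, length: int) -> List[Tuple[int, int, str]]:
    """Merge consecutive windows with the same best reference into
    0-based, half-open (start, end, reference) segments."""
    segments: List[Tuple[int, int, str]] = []
    for i, name in enumerate(calls):
        start, end = i * window, min((i + 1) * window, length)
        if segments and segments[-1][2] == name:
            segments[-1] = (segments[-1][0], end, name)
        else:
            segments.append((start, end, name))
    return segments


# Toy example: the first half resembles "cp32-1", the second half "cp32-5".
query = "AAAAACCCCC"
refs = {"cp32-1": "AAAAAGGGGG", "cp32-5": "TTTTTCCCCC"}
calls = best_reference_per_window(query, refs, window=5)
print(call_segments(calls, window=5, length=len(query)))
# [(0, 5, 'cp32-1'), (5, 10, 'cp32-5')]
```

On real contigs the window size trades resolution against noise: small windows localize a switch such as the one near 33,000 bp in contig4 more precisely but are more easily swayed by scattered SNPs.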
Comparison of nanopore contigs and trimmed reads with GenBank and determination of single nucleotide differences (SNPs) in MEGA
| B31-GB replicon (length, bp) | Nanopore contig name | Nanopore contig length (bp) | Identities with B31-GB | Identity (%) | Gaps | Gaps (%) | Start: query (subject) | Stop: query (subject) | SNPs, nanopore (MEGA) | SNPs, nanopore (%) | SNPs, PacBio (MEGA) | SNPs, PacBio (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| main (910,724) | 241244 | 878,078 | 659,561/683,765 | 96 | 24,016/683,765 | 3 | 205,813 (213,067) | 865,577 (897,715) | nd | nd | | |
| cp26 (26,498) | 241245 | 46,144 | 25,597/26,496 | 97 | 8,951/26,496 | 3 | 13,432 (1) | 39,032 (26,496) | 15/26,498 | 0.05 | 4/26,498 | 0.01 |
| lp54 (53,657) | 241246 | 51,693 | 51,381/53,655 | 96 | 2,222/53,655 | 4 | 199 (53,651) | 51,635 (1) | 199/53,651 | 0.2 | 7/53,915 | 0.01 |
| lp56 (52,971) | 241248 | 51,105 | 50,156/52,909 | 95 | 2,152/52,909 | 4 | 1 (65) | 51,105 (52,578) | 837/53,318 | 1.6 | 7/53,318 | 0.01 |
| lp38 (38,829) | 241249 | 37,824 | 37,356/38,823 | 96 | 1,429/38,823 | 3 | 54 (38,821) | 37,454 (6) | 176/38,821 | 0.45 | 1/38,821 | 0.002 |
| cp32-1 (30,750) | 241247 | 67,311 | 20,520/21,493 | 95 | 663/21,493 | 3 | 36,733 (1) | 57,597 (21,458) | 2694/60,973^b | 4.4 | | |
| lp17 (16,821) | ch_216^a | 16,238 | 16,067/16,821 | 96 | 751/16,821 | 4 | 42 (1) | 16,111 (16,821) | 80/16,989 | 0.47 | 0/16,989 | 0.0 |
| lp28-1 (28,155) | ch_17^a | 15,787 | 15,638/16,873 | 93 | 1,117/16,873 | 6 | 1 (1866) | 15,785 (18,709) | 1911/19,082 | 10.0 | 171/21,885 | 0.78 |
| lp36 (36,849) | ch_474^a | 34,829 | 34,690/36,856 | 94 | 2,094/36,856 | 5 | 1 (8) | 34,777 (36,848) | 281/36,856 | 0.7 | 0/36,856 | 0.0 |
| cp32-3 (30,223) | ch_176^a | 30,596 | 23,314/24,600 | 95 | 1,195/24,600 | 4 | 7187 (5629) | 30,596 (30,223) | 286/31,974 | 0.9 | 2/30,223 | 0.006 |
| cp32-4 (30,299) | ch_152^a | 22,112 | 21,907/23,209 | 94 | 1,105/23,209 | 4 | 1 (4046) | 22,111 (27,247) | 339/27,247 | 1.2 | 3/27,247 | 0.01 |
| cp32-9 (30,651) | ch_212^a | 28,699 | 17,790/19,416 | 92 | 1,325/19,416 | 6 | 1 (11,245) | 18,100 (30,651) | 491/30,651 | 2.7 | 2/30,651 | 0.006 |
| cp32-2 | ch_293^a | 18,058 | nd | nd | nd | nd | nd | nd | 181/18,058^b | 1.0 | | |
^a Unassembled trimmed reads.
^b Aligned to Pacific Biosciences SMRT sequence unitig4 (cp32-1 fused plasmid) and unitig6 (cp32-2), respectively.
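The percentage columns of the nanopore comparison table are ratios of the "count/total" entries beside them, so they can be recomputed as a consistency check. A minimal sketch, assuming footnote markers have been stripped from the strings first; the helper names are hypothetical and nothing here reproduces the authors' BLAST or MEGA output.

```python
import math


def percent(fraction: str) -> float:
    """Percentage from a 'count/total' string such as '659,561/683,765'
    (thousands separators allowed; footnote markers must be stripped first)."""
    count, total = (int(part.replace(",", "")) for part in fraction.split("/"))
    return 100.0 * count / total


def truncated_percent(fraction: str) -> int:
    """Whole-number percentage rounded down rather than to the nearest integer."""
    return math.floor(percent(fraction))


print(round(percent("659,561/683,765")))   # Identity (%) for main: 96
print(truncated_percent("1,429/38,823"))    # Gaps (%) for lp38: 3
print(round(percent("491/30,651"), 1))     # recomputed SNP % for cp32-9: 1.6
```

Recomputing the ratios reproduces the Identity (%) column when rounded and the Gaps (%) column when truncated, except for the cp26 row, whose 8,951/26,496 ratio is about 34%. A few SNP percentages, such as 491/30,651 (about 1.6% rather than the listed 2.7), may be worth checking against the original article.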