Alexander Bolotin, Benoit Quinquis, Hugo Roume, Michel Gohar, Didier Lereclus, Alexei Sorokin.
Abstract
Bacillus thuringiensis serovar israelensis is the most widely used natural biopesticide against mosquito larvae worldwide. Its lineage has been actively studied and a plasmid-free strain, B. thuringiensis serovar israelensis BGSC 4Q7 (4Q7), has been produced. Previous sequencing of the genome of this strain has revealed the persistent presence of a 235 kb extrachromosomal element, pBtic235, which has been shown to be an inducible prophage, although three putative chromosomal prophages have been lost. Moreover, a 492 kb region, potentially including the standard replication terminus, has also been deleted in the 4Q7 strain, indicating an absence of essential genes in this area. We reanalysed the genome coverage distribution of reads for the previously sequenced variant strain, and sequenced two independently maintained samples of the 4Q7 strain. A 553 kb area, close to the 492 kb deletion, was found to be duplicated. This duplication presumably restored the equal sizes of the replichores, and a balanced functioning of replication termination. An analysis of genome assembly graphs revealed a transient association of the host chromosome with the pBtic235 element. This association may play a functional role in the replication of the bacterial chromosome, and the termination of this process in particular. The genome-restructuring events detected may modify the genetic status of cytotoxic or haemolytic toxins, potentially influencing strain virulence. Twelve of the single-nucleotide variants identified in 4Q7 were probably due to the procedure used for strain construction or were present in the precursor of this strain. No sequence variants were found in pBtic235, but the distribution of the corresponding 4Q7 reads indicates a significant difference from counterparts in natural B. thuringiensis serovar israelensis strains, suggesting a duplication or over-replication in 4Q7. 
Thus, the 4Q7 strain is not a pure plasmid-less offshoot, but a highly genetically modified derivative of its natural ancestor. In addition to potentially influencing virulence, genome-restructuring events can modify the replication termination machinery. These findings have potential implications for the conclusions of virulence studies on 4Q7 as a model, but they also raise interesting fundamental questions about the functioning of the Bacillus genome.
Keywords: BGSC strains; Bacillus thuringiensis; chromosome structure; long inverted repeats; plasmid-chromosome association; replication termination
Year: 2020 PMID: 33180015 PMCID: PMC8116677 DOI: 10.1099/mgen.0.000468
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
Fig. 1. Deletions and duplication on the serovar israelensis 4Q7 chromosome. (a) Distribution of Illumina sequencing reads, in coverage per nucleotide, for the 4Q7 (top) and AM65-52 (bottom) strains, over the AM65-52 genome (accession no. CP013275). The 492 kb deletion in 4Q7, corresponding to the 2244 to 2736 kb positions in AM65-52, is indicated by red arrows. Read coverage is higher for the duplicated 553 kb area from 2745 to 3298 kb. The distribution image was copied from the Tablet interface panel and modified slightly to improve its readability. The vertical and horizontal scales are linear, and the values are not shown. The distribution shown corresponds to assembly from the SRR1174235 sequencing reads [1]. A very similar distribution was also obtained with the reads for the 4Q7AS and 4Q7JM samples generated in this study (accession numbers SRR11567778 and SRR11565157; not shown). (b) Coding sequence (CDS) map of the 2.235 to 2.75 Mb region of the serovar israelensis HD1002 strain corresponding to the 492 kb deletion in 4Q7. Potentially essential genes are indicated by blue arrows. The red rectangle indicates the region close to 2622 kb in which GC-skew and the preferential orientations of CDS change sign, which is deleted in 4Q7. The 492 kb deletion includes the dif site (5′-CCTATAATATATATTATGTTAACT-3′) mapping to this area [32]. (c) Circular map of the serovar israelensis HD1002 chromosome. The circles from the centre represent: 1, GC-skew; 2, G+C content distribution; 3, positions of repeated elements; 4 and 5, CDS in the anticlockwise and clockwise directions, respectively; 6, position scale for the circular genome. Red arrows indicate the locations of prophages deleted in the 4Q7 strain. The red segment indicates the 492 kb deletion, and the blue solid arrow indicates the area duplicated in this strain; the potential second copy is indicated by a dashed arrow.
The source of information for the HD1002 strain and figure design for (b) and (c) are from the IMG database of the DOE Joint Genome Institute [28]. Bti, serovar israelensis.
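The coverage reasoning behind Fig. 1 can be illustrated with a minimal sketch. It first checks that the coordinates quoted in the caption give the stated 492 kb and 553 kb lengths, and then labels windows of read depth relative to the median. The function name `classify_windows`, the thresholds `low` and `high`, and the toy depths are assumptions made for illustration only; they are not the procedure or the data used in the study.

```python
from statistics import median

# Coordinates (kb, AM65-52 numbering) quoted in the Fig. 1 caption.
DELETION_KB = (2244, 2736)
DUPLICATION_KB = (2745, 3298)

deletion_len_kb = DELETION_KB[1] - DELETION_KB[0]
duplication_len_kb = DUPLICATION_KB[1] - DUPLICATION_KB[0]


def classify_windows(depths, low=0.1, high=1.5):
    """Label each window by its depth relative to the median depth.

    `low` and `high` are illustrative thresholds, not values from the study.
    """
    baseline = median(depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    labels = []
    for depth in depths:
        ratio = depth / baseline
        if ratio < low:
            labels.append("deleted")
        elif ratio > high:
            labels.append("duplicated")
        else:
            labels.append("normal")
    return labels


# Synthetic per-window depths: two nearly empty windows, then two at about
# twice the baseline. These numbers are invented for the example.
toy_depths = [100, 98, 102, 0, 1, 205, 198, 101, 99]
toy_labels = classify_windows(toy_depths)

print(deletion_len_kb, duplication_len_kb)
print(toy_labels)
```

Using the median as the baseline keeps the labels stable when a minority of windows is deleted or duplicated; real data would also need normalization for GC content and distance from the replication origin, which this sketch ignores.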
Fig. 2. Examples of sequencing reads from SRR1174235 data confirming the inverted repeat and pBtic235-to-chromosome joins in the serovar israelensis 4Q7 strain. (a) Reads confirming the inverted repeat join. Sequences of two reads extracted from the SRR1174235 dataset for the 4Q7 strain are shown at the bottom. The corresponding BLASTn analysis against the serovar israelensis HD1002 genome is indicated in the middle. Each splitting of a read (corresponding to the 2 236 245 and 3 289 716 bp positions in HD1002) indicates the covalent join between two non-neighbouring template genome areas, shown at the top. Homology spots and their links are indicated by red bars and arrows. The two reads shown were randomly selected from about 420 confirming this link. The two sequences in the reads, non-adjacent in HD1002, are shown in italics and plain text. About 700 reads confirming the usual assembly link, corresponding to 3 289 716 bp in HD1002, for multiple group genomes are also present in SRR1174235, but examples are not shown. (b) Reads confirming the joins between the chromosome and the pBtic235 element. As in (a), two reads extracted from SRR1174235 are shown at the bottom. The corresponding BLASTn analysis against the serovar israelensis HD1002 genome is indicated in the middle. Each splitting of a read indicates the covalent join between two non-neighbouring template genome areas, shown at the top, with the chromosomal part on the right and the pBtic235 part on the left. Homology spots and their links are indicated by red bars and arrows. The two reads shown were randomly selected from about 410 reads confirming the link. The sequences in the reads corresponding to pBtic235 are underlined, and those corresponding to the chromosome are not underlined. About 800 reads for pBtic235, and 900 reads for the chromosome, confirming the usual assembly for multiple serovar israelensis genome structures are also present in SRR1174235, but examples are not shown.
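The idea of a read that "splits" between two non-neighbouring genome areas can be sketched as follows. A read counts as junction-supporting when its two local alignments are contiguous within the read but far apart on the reference. The `Hit` structure, the function `is_junction_read`, the thresholds and the toy alignments are all hypothetical; only the two HD1002 positions (2 236 245 and 3 289 716 bp) come from the caption, and this is not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One local alignment of a read to the reference (1-based, inclusive)."""
    read_start: int
    read_end: int
    ref_start: int  # ref_start > ref_end means a reverse-strand alignment
    ref_end: int


def is_junction_read(hits, min_ref_gap=10_000, max_read_gap=10):
    """Return True if exactly two hits abut in the read but not on the reference.

    Both thresholds are illustrative assumptions.
    """
    if len(hits) != 2:
        return False
    first, second = sorted(hits, key=lambda h: h.read_start)
    adjacent_in_read = abs(second.read_start - first.read_end) <= max_read_gap
    # Smallest distance between any pair of alignment endpoints on the reference,
    # so that inverted joins are handled as well as direct ones.
    ref_distance = min(
        abs(a - b)
        for a in (first.ref_start, first.ref_end)
        for b in (second.ref_start, second.ref_end)
    )
    return adjacent_in_read and ref_distance >= min_ref_gap


# Toy 150-base read spanning the inverted repeat join; the interval ends at the
# caption's positions, the remaining coordinates are invented.
split_read = [
    Hit(read_start=1, read_end=80, ref_start=2_236_166, ref_end=2_236_245),
    Hit(read_start=81, read_end=150, ref_start=3_289_716, ref_end=3_289_647),
]
# Toy read whose two hits are simply neighbours on the reference.
colinear_read = [
    Hit(read_start=1, read_end=80, ref_start=1_000, ref_end=1_079),
    Hit(read_start=81, read_end=150, ref_start=1_080, ref_end=1_149),
]

print(is_junction_read(split_read))
print(is_junction_read(colinear_read))
```

Counting reads that pass such a test at a given pair of positions would give support numbers comparable in spirit to the "about 420" and "about 410" reads mentioned in the caption, although the exact criteria used there are not stated.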
Fig. 3. Coverage of the pBtic235 sequence with reads generated from DNA from different strains. The distribution images were copied from the Tablet interface panel and have been modified slightly to improve readability. The reads for serovar israelensis 4Q7 strain samples (4Q7KBC, 4Q7AS and 4Q7JM) and for the serovar israelensis ATCC 35646 strain are shown from top to bottom. The vertical scales are linear, and the values are not shown. The horizontal scale, in kb, is drawn below the figure. The distribution for 4Q7KBC corresponds to the sequencing reads from SRR1174235 [1]. Distributions for 4Q7AS, 4Q7JM and ATCC 35646 were generated from the reads obtained in this study (accession numbers SRR11567778, SRR11565157 and SRR8474067, respectively).
pBtic235 element copy number in different samples of serovar israelensis 4Q7 and in environmental strains (AM65-52, ATCC 35646 and BMP144)
| Strain sample | 4Q7KBC | 4Q7AS | 4Q7JM | AM65-52 | ATCC 35646 | BMP144 |
|---|---|---|---|---|---|---|
| Plasmid reads | 1 903 671 | 698 236 | 416 587 | 202 988 | 830 518 | 1 070 076 |
| Chromosomal reads | 29 299 563 | 10 122 752 | 5 640 342 | 7 914 340 | 26 877 218 | 32 319 992 |
| Plasmid copy number* | 1.52 | 1.61 | 1.73 | 0.60 | 0.72 | 0.77 |
*Plasmid copy number was calculated as (plasmid reads/plasmid size)/(chromosomal reads/chromosomal size), where plasmid size is 235 424 bp and chromosomal size is 5 499 731 bp. The data for the environmental strains were obtained from our previously published study [12].
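The copy-number formula in the footnote can be checked directly against the read counts in the table. The short sketch below applies it to each sample; the function and variable names are illustrative, while the read counts and sizes are those given above.

```python
# Sizes stated in the table footnote (bp).
PLASMID_SIZE = 235_424
CHROMOSOME_SIZE = 5_499_731

# (plasmid reads, chromosomal reads) per sample, copied from the table.
READ_COUNTS = {
    "4Q7KBC": (1_903_671, 29_299_563),
    "4Q7AS": (698_236, 10_122_752),
    "4Q7JM": (416_587, 5_640_342),
    "AM65-52": (202_988, 7_914_340),
    "ATCC 35646": (830_518, 26_877_218),
    "BMP144": (1_070_076, 32_319_992),
}


def copy_number(plasmid_reads, chromosomal_reads,
                plasmid_size=PLASMID_SIZE, chromosome_size=CHROMOSOME_SIZE):
    """(plasmid reads / plasmid size) / (chromosomal reads / chromosomal size)."""
    return (plasmid_reads / plasmid_size) / (chromosomal_reads / chromosome_size)


copy_numbers = {
    sample: round(copy_number(plasmid, chromosome), 2)
    for sample, (plasmid, chromosome) in READ_COUNTS.items()
}

for sample, value in copy_numbers.items():
    print(f"{sample}: {value:.2f}")
```

Rounded to two decimals, the results reproduce the last row of the table: roughly 1.5 to 1.7 copies per chromosome in the three 4Q7 samples, against 0.6 to 0.8 in the environmental strains.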
Summary of differences between the serovar israelensis BGSC 4Q7 strain and AM65-52
| Position in AM65-52 | Nucleotide change | Amino acid change | Annotation by RAST | Variation in samples of 4Q7* | Variation in other strains |
|---|---|---|---|---|---|
| 50 293 | 75 bp del | 25 aa del | DNA-binding protein SpoVG | | ATCC 35646, HD1002, HD789 – no |
| 115 294† | A→G | Glu→Arg | DNA-directed RNA polymerase beta subunit | | ATCC 35646, HD1002, HD789 – no |
| 564 011 | T→del | Frameshift | Glycine betaine transporter OpuD | KBC, AS, JM – yes | ATCC 35646, HD1002, HD789 – no |
| 636 259 | AT→del | No change | Intergenic | KBC, AS, JM – yes | ATCC 35646, HD1002, HD789 – yes |
| 875 553 | A→G | Arg→His | Hypothetical protein | KBC, AS, JM – yes | ATCC 35646, HD1002, HD789 – no |
| 1 222 626 | C→G | Ala→Pro | BclA protein | | ATCC 35646, HD1002, HD789 – no |
| 1 222 630 | C→G | No change | BclA protein | | ATCC 35646, HD1002, HD789 – no |
| 1 222 638 | 87 bp del | 29 aa del | BclA protein | | ATCC 35646, HD1002, HD789 – no |
| 1 355 830 | C→T | Ser→Pro | Bacitracin transporter BCRB | KBC, AS, JM – yes | ATCC 35646, HD1002, HD789 – no |
| 1 529 720 | G→A | No change | Inner spore coat protein CotD | KBC, AS, JM – yes | ATCC 35646, HD1002, HD789 – no |
| 1 529 729 | G→A | No change | Inner spore coat protein CotD | KBC, AS, JM – yes | ATCC 35646, HD1002, HD789 – no |
| 2 764 501 | G→A | Val→Ile | DedA family membrane protein | KBC, AS, JM – yes | ATCC 35646, HD1002, HD789 – yes |
| 2 860 235 | G→A | No change | Putative kinase | KBC, AS, JM – yes | ATCC 35646, HD1002, HD789 – yes |
| 2 958 066 | T→C | Ile→Met | Phosphohydrolase (MutT family protein) | KBC, AS, JM – yes | ATCC 35646, HD1002, HD789 – yes |
| 3 148 593 | A→C | Ile→Arg | Penicillin acylase II | KBC, AS, JM – yes | ATCC 35646, HD1002, HD789 – yes |
| 3 174 075 | A→del | Frameshift | Capsule biosynthesis protein CapA | KBC, AS, JM – yes | ATCC 35646, HD1002, HD789 – no |
| 3 175 718 | T→C | Asn→Ser | MFS-type transporter YfkF | KBC, AS, JM – yes | ATCC 35646, HD1002, HD789 – yes |
| 3 229 841 | A→G | No change | Intergenic | KBC, AS, JM – yes | ATCC 35646, HD1002, HD789 – no |
| 3 904 125 | C→T | Met→Ile | P-type Ca2+-transport ATPase | KBC, AS, JM – yes | ATCC 35646, HD1002, HD789 – no |
| 4 018 377 | C→del | No change | Intergenic | KBC, AS, JM – yes | ATCC 35646, HD1002, HD789 – no |
| 4 231 259 | G→C | Ala→Pro | Transcriptional regulator, AcrR family | KBC, AS, JM – yes | ATCC 35646, HD1002, HD789 – no |
| 4 247 985 | G→T | No change | Intergenic | KBC, AS, JM – yes | ATCC 35646, HD1002, HD789 – yes |
| 4 573 149 | A→del | No change | Intergenic | | ATCC 35646, HD1002, HD789 – no |
| 4 747 124 | 9 bp del | 3 aa del | VrrB protein | KBC, AS, JM – yes | ATCC 35646, HD1002, HD789 – no |
| 4 787 070 | A→T | Ile→Leu | Sporulation kinase | KBC, AS, JM – yes | ATCC 35646, HD1002, HD789 – no |
| 4 811 263 | A→C | Asn→His | Glycerate kinase | | ATCC 35646, HD1002, HD789 – no |
*Variations specific to a particular sample of the 4Q7 strain are underlined.
†RifR mutation in the B. thuringiensis serovar israelensis 4Q7JM sample.
Fig. 4. Bandage-assisted visualization of the assembly graphs and proposed resolution for the serovar israelensis 4Q7 genome. (a), (b) and (c) show the Bandage [26] visual presentation of the de Bruijn graphs generated by the SPAdes assembler [19] for the 4Q7KBC, 4Q7AS and 4Q7JM samples, respectively. Curved grey, yellow and green lines, with thicknesses proportional to read coverage, represent nodes (contigs). Only the part of the genome close to the replication termination area and the connection to the pBtic235 element node are shown. Thin black lines represent potential edges (links) that connect nodes, as proposed by the assembly software and corrected following scrutiny by an expert. The results of expert intervention are shown from the left to the right graphs. The red and black numbers on the cartoon on the left indicate the mean contig coverage and the number of reads supporting the edges proposed by the software, respectively. The graphs furthest to the right correspond to the best assembly based on expert scrutiny. Grey closed curved circular structures represent separate pBtic235 elements. The bacterial chromosome, thus, appears to be linear.
Fig. 5. pBtic235 element-to-chromosome linkage on the genetic map of pBtic235. (a–c) Bandage presentation of assembly graphs for: serovar israelensis 4Q7JM ONT reads assembled de novo with canu (a), 4Q7JM Illumina reads assembled de novo with SPAdes (b) and 4Q7AS Illumina reads assembled de novo with SPAdes (c). Blue indicates the chromosome nodes, apart from the long 553 kb repeats, which are shown in green. In (b) and (c), the complex repeats, mostly corresponding to rRNA operons, are left unresolved. The pBtic235 node is indicated in red. Dashed insets indicate enlargements of portions of assembly graphs close to the element-to-chromosome connection. (d) A simplified genetic map of pBtic235 is shown on the right. Circles from the centre represent: 1, scale (0 to 235.4 kb); 2, GC-skew; 3, G+C-content deviation from the mean; 4, CDS map; 5, coverage (black) with Illumina reads for the 4Q7KBC sample; 6, coverage (blue) with Illumina reads for 4Q7AS; 7, coverage (green) with Illumina reads for 4Q7JM; 8, coverage (red) with ONT reads for 4Q7JM; 9, scale and selected genetic markers for the plasmid-like module [13] of the element. For convenience, linear vertical value scales are indicated for each read distribution. The same read distributions are presented in linear form on the left. Dashed lines indicate the correspondence of regions with overcoverage in the linear and circular presentations of the distributions. The regions are identical for 4Q7AS and 4Q7KBC. Dashed arrows show the location of element-to-chromosome linkages confirmed with multiple sequencing reads. Note the gradual increase in coverage at around 150 and 100 kb, for the 4Q7JM and 4Q7AS, and 4Q7KBC samples, respectively, presumably due to the use of different active origins of replication. For 4Q7KBC, the distribution corresponds to the sequencing reads from SRR1174235 [1]. The distributions for 4Q7AS and 4Q7JM were generated from the reads obtained during this study (SRR11567778 and SRR11565157).
NGS, Next Generation Sequencing.