Ming Zhang1, Jun Liu1, Zhenzhen Yin1, Li Zhang2. 1. School of Yunkang Medicine and Health, Nanfang College, Guangzhou, Guangdong, China. 2. School of Life Science, Liaoning University, Shenyang, Liaoning, China.
Abstract
Bacillus cereus is a food contaminant with widely varying enterotoxic potential due to its virulence proteins. In this article, phylogenetic analysis of the amino acid sequences from the whole-genomes of 41 strains, evolutionary distance calculation of the amino acid sequences of the virulence genes, and functional and structural predictions of the virulence proteins were performed to reveal the taxonomically diverse distribution of virulence factors. The genome evolution of the strains showed a clustering trend based on the protein-coding virulence genes. The strains of B. cereus have evolved into non-toxic risk and toxic risk clusters with medium-high- and medium-low-risk subclusters. The evolutionary transfer distances of incomplete virulence genes relative to housekeeping genes were greater than those of complete virulence genes, and the distance values of HblACD were higher than those of nheABC and CytK among the complete virulence genes. Cytoplasmic localization was impossible for all the virulence proteins, and NheB, NheC, Hbl-B, and Hbl-L1 were predicted to be extracellular. Nhe and Hbl proteins except CytK had similar spatial structures. The predicted structures of Nhe and Hbl mainly showed 'head' and 'tail' domains. The 'head' of NheA and Hbl-B, including two α-helices separated by β-tongue strands, might play a special role in the formation of Nhe trimers and Hbl trimers, respectively. The 'cap' of CytK, which includes two 'latches' with many β-sheets, formed a β-barrel structure with pores, and a 'rim' balanced the structure. The evolution of B. cereus strains showed a clustering tendency based on the protein-coding virulence genes, and the complete virulence-gene operon combination had higher relative genetic stability. The beta-tongue or latch associated with β-sheet folding might play an important role in the binding of virulence structures and pore-forming toxins in B. cereus.
Bacillus cereus is a food contaminant with widely varying enterotoxic potential due to its virulence proteins. In this article, phylogenetic analysis of the amino acid sequences from the whole-genomes of 41 strains, evolutionary distance calculation of the amino acid sequences of the virulence genes, and functional and structural predictions of the virulence proteins were performed to reveal the taxonomically diverse distribution of virulence factors. The genome evolution of the strains showed a clustering trend based on the protein-coding virulence genes. The strains of B. cereus have evolved into non-toxic risk and toxic risk clusters with medium-high- and medium-low-risk subclusters. The evolutionary transfer distances of incomplete virulence genes relative to housekeeping genes were greater than those of complete virulence genes, and the distance values of HblACD were higher than those of nheABC and CytK among the complete virulence genes. Cytoplasmic localization was impossible for all the virulence proteins, and NheB, NheC, Hbl-B, and Hbl-L1 were predicted to be extracellular. Nhe and Hbl proteins except CytK had similar spatial structures. The predicted structures of Nhe and Hbl mainly showed 'head' and 'tail' domains. The 'head' of NheA and Hbl-B, including two α-helices separated by β-tongue strands, might play a special role in the formation of Nhe trimers and Hbl trimers, respectively. The 'cap' of CytK, which includes two 'latches' with many β-sheets, formed a β-barrel structure with pores, and a 'rim' balanced the structure. The evolution of B. cereus strains showed a clustering tendency based on the protein-coding virulence genes, and the complete virulence-gene operon combination had higher relative genetic stability. The beta-tongue or latch associated with β-sheet folding might play an important role in the binding of virulence structures and pore-forming toxins in B. cereus.
Bacillus cereus (B. cereus), which is one of twelve closely related species in the Bacillus cereus group [1], is a Gram-positive bacterium occurring ubiquitously in nature with widely varying pathogenic potential [2]. Cells are rod-shaped, with some in chains or occasionally long filaments, and are aerobic or facultative anaerobic. Most species will grow on common media such as nutrient agar and blood agar. They are characteristically large (2–7 mm in diameter) and vary in shape from circular to irregular, with matt or granular textures, while smooth and moist colonies are also common. B. cereus, the spores of which can survive at high temperatures and germinated vegetative cells of which can multiply and produce toxins under favorable conditions, is recognized as the most frequent cause of food-borne disease [3]. Its toxins cause two distinct forms of food poisoning, the emetic type (uncommon) and the diarrheal type (common). Diarrheal strains produce three enterotoxins, which belong to the family of pore-forming toxins: nonhemolytic enterotoxin (Nhe), hemolysin BL (Hbl), and cytotoxin K (CytK). Nhe comprises three proteins, NheA, NheB, and NheC, encoded by one operon containing one of three genes, namely, nheA, nheB, and nheC, respectively. Hbl consists of a single B-component (encoded by hblA) and two L-components, L1 (hblC) and L2 (hblD), all of which are essential for activity, with no individual or pairwise activity [4]. CytK (cytK) is a single-component toxin [5]. The genes that encode NheABC can be detected in nearly all enteropathogenic B. cereus strains, hblACD can be detected in approximately 45% to 65% of such strains, and cytK is less prevalent [6,7]. B. cereus species, which were compared on the basis of 16S rRNA (identity values >98%), were closely homologous to each other [8]. Nevertheless, the suitability of this marker for the classification of B. cereus might be limited, as it is unable to effectively distinguish between the closely related species [9]. Some papers have reported that the species affiliation of B. cereus group, which could lead to an exchange of virulence plasmids between species, often does not match patterns of phylogenetic relatedness [10,11]. While the enterotoxins of B. cereus are chromosome-coded, the unique characteristics are observed for plasmids and are thus present throughout the B. cereus group [12]. Lapidus et al. reported a large plasmid with an operon encoding all three Nhe components in a B. cereus strain [13]. There is evidence that extensive gene exchange occurs between plasmids and the chromosome during the evolution of the B. cereus group [14]. Therefore, some genes encoded on plasmids can spread via horizontal gene transfer among B. cereus and the transfer of a single plasmid from one species to another [15]. Didelot et al. detected three phylogenetic groups (clades) in a study on the evolution of pathogenicity in the B. cereus group [16]. B. cereus, as a genomospecies, could be mainly found in clade two based MLSA and multiple comparative analysis of ANI values [17]. Later, seven major phylogenetic groups with ecological differences were identified in the B. cereus group [10]. A recent study suggested that nine phylogenetic clades of isolates may be better for assessing the risk of diarrheal foodborne disease caused by B. cereus group isolates [18]. These studies of virulence factors of B. cereus concern the evolutionary classification of virulence genes, and there have been few comparative analyses of the relative evolutionary distance of virulence genes and the prediction of virulence protein function and structure. In this study, the genome, virulence gene sequences and predicted virulence proteins of 41 B. cereus strains were comparatively analyzed. This work aims to examine the species diversity of B. cereus strains and the phylogenetic relationships among virulence factors, to systematically evaluate the distribution of virulence genes, and to comparatively analyze the structures and functions of virulence proteins.
Materials and methods
Characterization of B. cereus strains
Forty-one strains of B. cereus with complete- and chromosome-level assemblies in the National Center for Biotechnology Information (NCBI) database were selected for comparative analysis. Datasets that passed the completeness test (acceptable level is >85%) and contamination test (acceptable level is <5%) were composed of sequences submitted on deadline March 5, 2021. Thirty-one pathogenic and ten nonpathogenic strains that were isolated from food, patients, the environment, and unknown sources, were eligible, along with a control strain, Sporolactobacillus terrae 70–3 (S. terrae 70–3), that belonged to a different genus. In terms of evolution, S. terrae which has a defined taxonomic and phylogenetic status, is closely related to B cereus in Bacillales. The sequences and annotation information for the stains were downloaded from the NCBI (details in Table 1).
Table 1
The forty-one B. cereus strains and one S. terrae control strain used in this study.
Strain
Length(Mb)
G+C(%)
Accessionnumber
Origin
Strain
Length(Mb)
G+C(%)
Accessionnumber
Origin
B4264
5.42
35.30
GCA_000021205.1
Pneumonia
172560W
5.70
34.80
GCA_000160935.1
Human tissue
ATCC14579
5.43
35.31
GCA_000007825.1
Soil
Rock3-29
5.88
34.90
GCA_000161215.1
Soil
FORC-013
5.68
35.20
GCA_001518875.1
Foodborne
Rock1-3
5.86
34.90
GCA_000161155.1
Soil
03BB102
5.45
35.29
GCA_000022505.1
Pneumonia
Rock4-18
5.92
35.00
GCA_000161295.1
Soil
ATCC10987
5.43
35.52
GCA_000008005.1
Soil
Rock1-15
5.77
34.90
GCA_000161175.1
Soil
G9842
5.74
35.05
GCA_000021305.1
Stool
BDRD-ST24
5.44
35.10
GCA_000161055.1
Nd
AH820
5.59
35.31
GCA_000021785.1
Periodontitis
ATCC10876
5.94
34.80
GCA_000160895.1
Nd
F837/76
5.29
35.40
GCA_000239195.1
Enteritis
Rock4-2
5.77
34.90
GCA_000161275.1
Soil
Q1
5.51
35.50
GCA_000013065.1
Nd
BGSC6E1
5.73
35.00
GCA_000160915.1
Nd
CI
5.49
35.27
GCA_000143605.1
Anthrax
F65185
6.13
34.70
GCA_000161315.1
Wound
E33L
5.84
35.17
GCA_000833045.1
Carcass
Rock3-28
6.04
35.75
GCA_000161195.1
Soil
AH187
5.60
35.52
GCA_000021225.1
Nd
Rock3-42
5.20
35.20
GCA_000161235.1
Soil
NC7401
5.55
35.54
GCA_000283675.1
Soil
m1293
5.27
35.35
GCA_000003645.1
Food
SGAir0263
6.47
34.91
GCA_010223795.1
Air
BDRD-ST196
5.58
35.20
GCA_000161095.1
Nd
SGAir0260
6.30
34.97
GCA_010232525.2
Air
AH676
5.59
35.00
GCA_000161395.1
Soil
BHU1
5.20
35.10
GCA_002504105.1
Soil
AH1271
5.66
35.30
GCA_000161375.1
Lamp
Co1-1
6.38
35.10
GCA_007923085.1
Wastewater
AH1272
5.79
35.20
GCA_000161395.1
Amniotic fluid
ATCC4342
5.23
35.20
GCA_000161015.1
Food
AH1273
5.79
35.25
GCA_000003955.1
Blood
BDRD-Cer4
5.40
35.10
GCA_000161115.1
Food
R309803
5.59
35.30
GCA_0000160995.1
Clinic
m1550
5.25
35.10
GCA_000161035.1
Food
BDRD-ST26
5.57
35.25
GCA_000161075.1
Nd
95/8201
5.58
35.10
GCA_000161135.1
Endocarditis
Control 70–3
3.31
45.30
GCA_009176625.1
Nd
Nd: Not determined.
Nd: Not determined.
Quality assessment of genomic sequences
The contamination and completeness of the metagenomic sequences were evaluated by CheckM software version v1.1.3 [19].
Phylogenetic and average nucleotide identity (ANI) analysis
The first phylogenetic tree, based on whole-genome amino acid sequences of each strain, was constructed by using the CVTree4 webserver (http://cvtree.online/v4/prok/index.html), which constructs whole-genome-based phylogenetic trees without sequence alignment by using a composition vector (CV) approach, and the K-tuple length was 6 [20]. Every genome sequence was represented by a composition vector, which was calculated as the difference between the frequencies of k-strings and the prediction frequencies by the Markov model [21]. The shape and text content of the phylogenetic tree were modified by Molecular Evolutionary Genetics Analysis (MEGA-X version 10.2.2) [22]. In this study, the same genome sequence data were subjected to ANI analysis to verify the significance of the first phylogenetic tree. ANI analysis was performed using JSpeciesWS Online Service (http://jspecies.ribohost.com/jspeciesws/) as described by Richter et al. [23]. The distance matrix, which was calculated by the distance value (DV) using the formula DV = 1-[ANIb value], was used to construct the second phylogenetic tree, which was generated from the resulting Newick format file using Njplot [24]. The formula was balanced using the mean value method and was subjected to calculation using DrawGram in the PHYLIP package version 3.695 [25]. To determine the associations between each protein-coding gene and the different clusters, statistical enrichment analyses were conducted with PhyloGLM V2.6 [17].
Multilocus sequence analysis (MLSA)
A total of forty-one strains containing gene sequences, which were downloaded from the NCBI, were found and further analyzed for the presence of seven housekeeping and three enterotoxin genes. The housekeeping genes adenylate kinase (adk), catabolite control protein A (ccpA), glycerol uptake facilitator protein (glpF), glycerol-3-phosphate transporter (glpT), pantoate-beta-alanine ligase (panC), phosphate acetyltransferase (pta), and pyruvate carboxylase (pyc) were chosen to calculate the basic evolutionary distances of the species. These housekeeping genes, scattered across the entire chromosome, are suitable for MLSA [26]. The types of enterotoxin genes (nhe, hbl, and cytK) were divided into different groups, and the base sequences were concatenated for further MLSA. Thus, rearrangement of genes was unnecessary because the order of the genes within the operons was conserved in all strains. The distances of concatenated genes were calculated in MEGA X using the maximum likelihood (ML) algorithms, which are based on the Tamura-Nei model with a discrete gamma distribution [22]. The model applied for MLSA of DVs was the ideal substitution model according to the ‘find best DNA/Protein models’ function [27]. The housekeeping genes were of the same length in all strains, as were the different virulence genes. The same settings for the calculation of all phylogenetic DVs were used to ensure comparability of the results. We calculated the relative changes in genetic DVs between the virulence genes, which were concatenated housekeeping genes minus the simple housekeeping gene, representing the change in virulence gene transfer.
Prediction of virulence protein function and structure
SMART software (http://smart.embl-heidelberg.de/), which is a simple modular architecture research tool, was used to predict the domain architecture of the virulence proteins in this study [28]. PSORT (http://www.psort.org/psortb2) and TMHMM (http://www.cbs.dtu.dk/services/TMHMM) software were employed to predict the subcellular location and transmembrane helices of virulence proteins, respectively [29,30]. The SIGNALP-5.0 (http://www.cbs.dtu.dk/services/SignalP/), SWISS-MODEL (http://swissmodel.expasy.org) and AlphaFold v2.1.1 (https://github.com/deepmind/alphafold) servers were used to predict the signal peptide cleavage and three-dimensional (3-D) structures of the enterotoxin proteins, respectively [31-34]. The amino acid sequences of the virulence proteins analyzed were submitted in FASTA format. To predict structure, we performed homology modeling to generate 3-D virulence protein structures.
Results
General genome characteristics and quality assessment of sequences
A summary of the features of the forty-one genomes of B. cereus and the control genome of the closely related species S. terrae is provided in Table 1. The genome sizes of B. cereus strains varied from 5.20 to 6.47 MB. The G+C contents of the forty-one genomes ranged from 34.70% to 35.75%. Compared with the control genome from S. terrae 70–3 (3.31 MB and 45.30%), the genomes of B. cereus were much larger and had lower G+C contents. The contamination and completeness of the sequences were 0–2.02% and 89.81%-98.99%, respectively (shown in Table 2). These results suggested that these sequences are of high quality, have low contamination (values <2.02%, acceptable level is <5%), and have high completeness (values >89.81%, acceptable level is >85%); thus, they were appropriate for analysis. In this study, the strains originated from food (5/41), the clinic (12/41), the environment (17/41), and undetermined sources (7/41). The enterotoxic risk potential based on the virulence genes of forty-one B. cereus strains is listed in Table 2. Enterotoxicity, which was reflected by virulence gene numbers, was categorized into levels of three types (10/41), two types (14/41), one type (7/41), and no types (10/41) levels. The genes detected as enterotoxic were nheABC (29/41), hblACD (19/41), cytk (17/41), nheAB (10/41), hblCD (6/41), hblAD (2/41), and hblD (2/41) (shown in Table 2).
Table 2
The results of the completeness, contamination, enterotoxic genes and risk potential of the strains.
Strain
Completeness(%)
Contamination(%)
Enterotoxicrisk potential
Analysis of enterotoxic genes
Strain
Completeness(%)
Contamination(%)
Enterotoxicrisk potential
Analysis of enterotoxic genes
nheABC
hblACD
cytK
nheABC
hblACD
cytK
B4264
98.99
0.00
H
+
+
+
172560W
98.99
0.00
M
+
CD
+
ATCC14579
98.99
1.35
H
+
+
+
Rock3-29
98.99
0.00
-
AB
CD
-
FORC_013
98.99
0.00
H
+
+
+
Rock1-3
98.99
0.00
-
AB
CD
-
03BB102
98.99
0.00
L
+
-
-
Rock4-18
98.99
0.00
-
AB
CD
-
ATCC10987
98.99
0.00
M
+
-
+
Rock1-15
98.99
1.01
H
+
+
+
G9842
98.99
2.02
M
+
+
-
BDRD-ST24
98.99
0.00
H
+
+
+
AH820
98.48
0.00
H
+
+
+
ATCC10876
98.99
0.00
H
+
+
+
F837/76
98.99
0.00
M
+
+
-
Rock4-2
89.81
0.34
H
+
+
+
Q1
98.99
0.34
-
AB
-
-
BGSC6E1
98.99
0.00
M
+
+
-
CI
98.99
1.01
L
+
-
-
F65185
97.98
0.34
M
+
AD
+
E33L
98.99
0.00
M
+
-
+
Rock3-28
96.46
0.34
-
AB
CD
-
AH187
98.99
0.00
-
AB
-
-
Rock3-42
98.99
0.00
M
+
-
+
NC7401
98.99
0.00
-
AB
-
-
m1293
98.99
0.34
-
AB
-
-
SGAir0263
98.99
0.34
M
+
+
-
BDRD-ST196
98.99
0.34
-
AB
CD
-
SGAir0260
98.99
0.34
M
+
+
-
AH676
98.99
1.01
M
+
AD
+
BHU1
98.99
0.00
L
-
+
-
AH1271
98.99
0.00
M
+
+
-
Co1-1
98.99
0.00
L
-
+
-
AH1272
97.47
1.35
L
+
D
-
ATCC4342
98.99
0.34
M
+
+
-
AH1273
96.93
1.35
L
+
D
-
BDRD-Cer4
97.98
0.34
H
+
+
+
R309803
98.99
0.00
L
+
-
-
m1550
98.99
0.00
H
+
+
+
BDRD-ST26
98.99
0.00
-
AB
-
-
95/8201
98.99
1.01
M
+
-
+
Control 70–3
98.99
0.00
-
-
-
-
Enterotoxic risk potential: Toxicity via the expression of the nheABC, hblACD, and cytK genes; + or -: All or none, respectively; H, M, or L: Three, two, or one type(s) of toxic genes, respectively.
Enterotoxic risk potential: Toxicity via the expression of the nheABC, hblACD, and cytK genes; + or -: All or none, respectively; H, M, or L: Three, two, or one type(s) of toxic genes, respectively.
Phylogenetic analysis based on whole amino acid sequences
Two whole-genome-based methods were used to construct phylogenetic trees. The first phylogenetic tree was constructed with the CV method using the whole amino acid sequences of forty-one B. cereus strains and the outgroup species S. terrae 70–3 [35]. To ensure the accuracy of the results, we added the inbuilt sequence AH1273 and the sequence of S. terrae 70–3 (control) from the webserver database (Fig 1). According to enterotoxic risk potential, the forty-one strains of B. cereus had evolved into five distinct clusters, which were likely risk regions I, IV and V and nonrisk regions II and III. However, there were individual nonconformities, such as nontoxicity of BDRD ST196 in region V. Region I was dominated by medium- and high-risk strains (15/17) but also included two low-risk strains (BHU1 and CO1-1), region II and III included only nonrisk strains (9/9), and regions IV and V were dominated by medium- and low-risk strains (13/15) but also included AH820 (high-risk strain) and BDRD ST196 (nonrisk strain), respectively.
Fig 1
Phylogenetic relationships of the amino acid sequences of forty-one B. cereus strains and one external S. terrae control strain used in this study.
Two additional inbuilt strains from the software were used as internal controls. H, M, L, and "-” symbols indicate high (three types of virulence genes), middle (two types of virulence genes), low (one type of virulence gene), and no enterotoxic risk potential.
Phylogenetic relationships of the amino acid sequences of forty-one B. cereus strains and one external S. terrae control strain used in this study.
Two additional inbuilt strains from the software were used as internal controls. H, M, L, and "-” symbols indicate high (three types of virulence genes), middle (two types of virulence genes), low (one type of virulence gene), and no enterotoxic risk potential.To verify the above results and obtain more accurate molecular evolutionary relationships, we established a second phylogenetic tree based on ANI analysis (Fig 2). According to enterotoxic risk potential, the forty-one strains of B. cereus had evolved into six distinct regions: likely risk regions A, C, E, and D2 and nonrisk regions B and D1. The two phylogenetic trees were similar in terms of the regions where enterotoxic risk was likely. The only difference between the two phylogenetic trees was a change in the evolutionary cluster of two strains. The ATCC 10987 and ATCC 4342 strains, which belonged region IV in the first tree, were assigned to region D in the second tree. Region A, which was dominated by medium-high-risk strains (15/17) but also included two low-risk strains (BHU1 and CO1-1), was the same as region I. Regions B and D1, which included only nonrisk strains (9/9), were the same as regions II and III. Regions C, D2 and E, which were also dominated by medium- and low-risk strains (13/15) but included AH820 (high-risk strain in C) and BDRD ST196 (nonrisk strain in E), were the same as regions IV and V. By taking advantage of the updated enterotoxic risk regions found in the current B. cereus strains, we decided to use a statistical approach to evaluate whether the occurrence of virulence factor-encoding genes (detailed in Table 3) correlate with a particular region. We observed that nheABC was significantly present in region C (p < 0.05), and hblA and nheC were significantly present in region A and C (p < 0.05), respectively. The results showed that nheABC and nheC were significantly enriched in the medium-low-risk region, and hblA was significantly enriched in the medium-high-risk region.
Fig 2
ANI analysis of the phylogenetic relationships of the forty-one B. cereus strains and one S.terrae control strain used in this study.
H, M, L, and "-” have the same meanings as in Table 2.
Table 3
Analysis of the presence of toxic genes enriched by the PhyloGLM tool.
Clade
nheA
nheB
nheC
hblA
hblC
hblD
cytK
nheABC
hblACD
E
pvalue
E
pvalue
E
pvalue
E
pvalue
E
pvalue
E
pvalue
E
pvalue
E
pvalue
E
pvalue
A
-0.0324
0.9259
-0.0324
0.9259
0.3325
0.5176
1.7076
0.0265
0.0696
0.7994
0.1587
0.7026
0.2329
0.5356
0.3326
0.5176
0.4710
0.2952
B
2.6099
0.4195
2.6327
0.4202
-0.1009
0.7644
-0.3859
0.4192
0.4412
0.3372
0.4722
0.3986
-0.0551
0.8126
-0.1009
0.7644
-0.5876
0.3012
C
0.0003
0.9992
-0.0002
0.9995
2.7819
0.0306
0.0142
0.9645
-0.0477
0.8810
-0.1765
0.7184
0.0474
0.8796
2.7819
0.0306
0.0157
0.9517
D
-0.0082
0.9832
-0.0082
0.9832
-0.9999
0.2557
-0.0182
0.9555
-0.0605
0.8506
-1.4942
0.1320
-0.0519
0.8722
-0.9999
0.2557
-0.0086
0.9728
E
2.2231
0.3705
2.2245
0.3704
-0.0364
0.8931
-0.3678
0.4250
-1.8756
0.1716
0.4070
0.4224
-0.0499
0.8161
-0.0364
0.8931
-0.0642
0.7707
E: Estimated value for the Generalized Linear Model.
ANI analysis of the phylogenetic relationships of the forty-one B. cereus strains and one S.terrae control strain used in this study.
H, M, L, and "-” have the same meanings as in Table 2.E: Estimated value for the Generalized Linear Model.
Phylogenetic distance analysis based on concatenated housekeeping and virulence genes
To analyze the evolution and phylogenetic relationships of virulence gene transfer in relation to DVs, the nheABC, hblACD, and cytK genes of the forty-one strains, which need to be compared to the housekeeping genes of the strains, were studied. To this end, we concatenated the sequences of virulence proteins from the strains and seven housekeeping proteins (Adk-CcpA-GlpF-GlpT-PanC-Pta-Pyc) from the B. cereus core genome. The genetic DV of virulence gene transfer was evaluated by calculating the average difference in the phylogenetic DV of the ATCC14579 strain compared with forty other strains. The hblD and hblAD virulence proteins, which were observed in only two strains, were excluded. As shown in Table 2, the relative genetic DVs were calculated for nheAB (10/41), hblCD (6/41), nheABC (29/41), hblACD (19/41), and CytK (17/41). As shown in Fig 3, the average evolutionary DVs of virulence gene transfer from high to low were 0.015 (nheAB), 0.012 (hblCD), 0.005 (hblACD), 0.003 (nheABC) and 0.001 (cytK). The DVs of incomplete virulence genes (nheAB and hblCD) were higher than those of complete virulence genes (nheABC, hblACD, and CytK). The average evolutionary DV of nheAB was higher than the DV of hblCD among the incomplete virulence genes; the DV of hblACD was the highest and that of cytK was the lowest among the complete virulence genes.
Fig 3
The average difference in the phylogenetic distance values of the ATCC14579 strain compared with forty other strains for virulence genes plus housekeeping genes and housekeeping genes examined by MLSA.
Comparative prediction analysis of the function and structure of virulence proteins
As shown in Table 4 and Fig 4, we obtained the scores of the seven virulence proteins for subcellular localization prediction. The scores of NheB, NheC, Hbl-B, and Hbl-L1 were all 9.73, and that of CytK was 9.98, all consistent with extracellular localization. The localization of NheA and Hbl-L2 was unknown because the scores were all lower in the cytoplasmic membrane (3.33/4.6), cell wall (3.33/2.48), and extracellular space (3.33/2.92), making it impossible for the virulence proteins to appear in the cytoplasm. NheB and Hbl-L1 had two helices, which were transmembrane region sequences 235-257/267-286 and 239-261/268-290, and NheC had only one helix, of which the transmembrane region was 228–250, but the others had none. The virulence protein cleavage sites of all strains were in the sequence 30–32 with 0.93–0.99 likelihood levels, except NheA, for which the site was in the 26–27 sequence (0.81 likelihood level). The amino acid sequences of Nhe, Hbl and CytK contained N-terminal signal peptides for secretion (< 31 amino acids). The signal peptide start-end was between 1 and 31 sequences but not found for Hbl-B and Hbl-L1 were not found, and the domain start-end was between 35 and 329 sequences.
Table 4
Subcellular localization, transmembrane helix and region, signal peptide, and domain prediction results of the virulence proteins of B. cereus strains.
Name
Final localization
Final score
Scores for different localizations
Helix number
Transmembraneregion
Cleavagesite
Signal peptidelikelihood
Signal peptide
Domain
Cytoplasmic membrane
Cell wall
Extracellular
Cytoplasmic
start-end
start-end
NheA
Unknown
3.33/4.60
3.33/4.60
3.33/2.48
3.33/2.92
0.00
0
-
26 and 27
0.81±0.07
1–26
41–216
NheB
Extracellular
9.73
0.09
0.18
9.73
0.00
2
235–257;267–286
30 and 31
0.99±0.01
1–30
54–232
NheC
Extracellular
9.73
0.09
0.18
9.73
0.00
1
228–250
30 and 31
0.98±0.02
1–23
46–224
Hbl-B
Extracellular
9.73
0.09
0.18
9.73/9.72
0.00
0
-
31 and 32
0.96±0.04
-
45–224
Hbl-L1
Extracellular
9.73
0.09
0.18
9.73
0.00
2
239–261;268–290
30 and 31
0.98±0.01
1–20
44–230
Hbl-L2
Unknown
3.33/4.60
3.33/4.60
3.33/2.48
3.33/2.92
0.00
0
-
32 and 33
0.93±0.05
-
35–207
CytK
Extracellular
9.98
0.00
0.02
9.98
0.00
0
-
31 and 32
0.99±0.01
1–31
64–329
Fig 4
The locations of transmembrane helices, cleavage sites, signal peptides, and domain start-ends were predicted by TMHMM, SignalP, and SMART software with the ATCC14579 strain.
Examination of the phylogenetic tree constructed using the Hbl, Nhe, and CytK sequences of ATCC 14579 (Fig 5) showed that NheA and Hbl-L2, as well as NheBC and Hbl-L1, were more closely related to one another than to the other components, and CytK was the least evolutionarily related. This result was also reflected in the evaluation parameters of the 3-D enterotoxin protein structures. As shown in Table 5, the closest template for NheB and NheC was Hbl-L1 (sequence identity of 40.82% and 36.83%, respectively), and that for Hbl-L2 was NheA (24.85%). The closest template for CytK was alpha-hemolysis (30.39%), as expected, with considerable amino acid sequence homology to S. aureus leukocidin [36]. The templates of NheA, and Hbl-L1 were included in the SWISS-MODEL server with high sequence identity (97.22% and 99.73%, respectively), and Hbl-B had acceptable sequence identity (71.99%). The sequence coverage and range of all structures were 0.71–0.93 and 33–439, respectively, with GMQE evaluation values (0.54–0.88) above 0.5, which indicated reliable model construction. Each residue is allotted a reliability score between 0 and 1, indicating the expected resemblance to the native structure. Higher numbers represent higher reliability of the residues [37]. In general, a sequence identity of >30% for each template was acceptable based on the SWISS-MODEL server. To verify the above results (especially the sequence identity of Hbl-L2, which was 24.85%) and obtain the predicted Nhe-trimer and Hbl-trimer structures, we used AlphaFold software for secondary structural prediction. The results were acceptable, the predicted local-distance difference test (plDDt) values of monomers were 81.90–94.14, and the scores of Nhe trimers and Hbl trimers were 0.68 and 0.36 (ipTM+pTM), respectively [33,34].
Fig 5
The phylogenetic tree of Hbl-B, Hbl-L1, Hbl-L2, NheA, NheB, NheC, and CytK component sequences with the ATCC14579 strain.
Table 5
The evaluation parameters of the 3-D enterotoxin protein structures predicted by the SWISS-MODEL server and AlphaFold software with the ATCC14589 strain.
Name
Templatenumber
Templatedescription
Sequenceidentity
Sequencecoverage
Sequencerange
GMQE
AlphaFoldplDDt
AlphaFoldipTM+pTM
NheA
4k1p.1.A
NheA
97.22%
0.93
42–386
0.88
93.86
-
NheB
7nmq.1.A
Hbl-L1
40.82%
0.85
52–397
0.69
83.42
-
NheC
7nmq.1.A
Hbl-L1
36.83%
0.88
44–358
0.68
90.25
-
Hbl-B
2nrj.1.A
Hbl-B
71.99%
0.71
33–364
0.61
91.73
-
Hbl-L1
7nmq.1.A
Hbl-L1
99.73%
0.90
41–405
0.88
81.90
-
Hbl-L2
4k1p.1.A
NheA
24.85%
0.74
38–439
0.54
94.14
-
CytK
3yhd.1.A
Alpha-hemolysin
30.39%
0.84
35–334
0.61
89.75
-
Nhe-trimer
-
-
-
-
-
-
-
0.68
Hbl-trimer
-
-
-
-
-
-
-
0.36
The global model quality estimate (GMQE) is a quality estimate that combines properties from the target-template alignment and the template structure [38]. plDDt: Predicted local-distance difference test [29]; pTM: Predicted TM score; ipTM: Interface pTM; ipTM+pTM: Evaluation result of multimer prediction [34].
The global model quality estimate (GMQE) is a quality estimate that combines properties from the target-template alignment and the template structure [38]. plDDt: Predicted local-distance difference test [29]; pTM: Predicted TM score; ipTM: Interface pTM; ipTM+pTM: Evaluation result of multimer prediction [34].Due to the sequence similarity of NheB and NheC with Hbl-B, homology models based on the Hbl-B structure were established. As shown in Fig 6, NheA and Hbl-B had highly similar structures (Fig 6A), and the NheA, NheB, NheC, HBl-B, HBl-L1, and HBl-L2 structures showed that there were two main domains, a ‘head’ and ‘tail’ (Figs 6B–6D, 6F–6H, and 7A–7F). The main body of the structure was formed by the ‘tail’ domain, which consisted of five major helices, and the ‘head’ domain of NheA included two long α-helices separated by β-tongue strands (Figs 6B and 7A). Multiple β-tongue strands were detected in Hbl-B (Fig 7D) but are not shown in Fig 6F because of the prediction method. Another difference was the ‘head’ of Hbl-L2, possibly related to the low sequence identity (Figs 6H and 7F). The ‘latch’ with many β-sheets of CytK folded the ‘cap’ domain, which was the toxic area (Figs 6E and 7G). The amino ‘latch’, which included a short helix in all known pore structures, was observed on the top of the conformation, which extended into the pore to form a β-barrel and was folded into a stranded antiparallel β-sheet in the monomer. Although the amino ‘latch’ protrudes and interacts with the adjacent protomer in the pore, it is located at the edge of the β-sheet of the ‘cap’ region [39]. The ‘rim’ domain, which was composed of three strands of short β-sheets, formed the main body of the balanced structure. The trimers of Nhe and Hbl were horizontally arranged. The Hbl trimer (arranged in the sequence B, -L1, -L2) was more similar than the Nhe trimer (arranged in the sequence A, B, C) based on the structural features. The β-tongue strands of Hbl-B and NheA might play an important structural and functional role in the formation of trimers.
Fig 6
Overview of the structure predicted by the SWISS-MODEL server.
a shows the superposition of the structures of Hbl-B (green) and NheA (burgundy) [40]. b, c, d, f, g, and h show the structures of NheA, NheB, NheC, Hbl-B, Hbl-L1 and Hbl-L2, which are annotated with the ‘head’ and ‘tail’, respectively. b shows a beta-tongue in the ‘head’ region. e shows the structure of CytK, which is annotated with ‘latch’, ‘cap’ and ‘rim’.
Fig 7
Overview of the structure predicted by AlphaFold software.
A, B, C, D, E, and F are the structures of NheA, NheB, NheC, Hbl-B, Hbl-L1, and Hbl-L2, which are annotated with the ‘head’ and ‘tail’, respectively. A and D show beta-tongues in the ‘head’ region. G is the structure of CytK, which is annotated with ‘latch’, ‘cap’ and ‘rim’. H and I show the trimers of the structures of NheA and Hbl-B (green), NheB and Hbl-L2 (wathet), and NheC and Hbl-L1 (pink).
Overview of the structure predicted by the SWISS-MODEL server.
a shows the superposition of the structures of Hbl-B (green) and NheA (burgundy) [40]. b, c, d, f, g, and h show the structures of NheA, NheB, NheC, Hbl-B, Hbl-L1 and Hbl-L2, which are annotated with the ‘head’ and ‘tail’, respectively. b shows a beta-tongue in the ‘head’ region. e shows the structure of CytK, which is annotated with ‘latch’, ‘cap’ and ‘rim’.
Overview of the structure predicted by AlphaFold software.
A, B, C, D, E, and F are the structures of NheA, NheB, NheC, Hbl-B, Hbl-L1, and Hbl-L2, which are annotated with the ‘head’ and ‘tail’, respectively. A and D show beta-tongues in the ‘head’ region. G is the structure of CytK, which is annotated with ‘latch’, ‘cap’ and ‘rim’. H and I show the trimers of the structures of NheA and Hbl-B (green), NheB and Hbl-L2 (wathet), and NheC and Hbl-L1 (pink).
Discussion
In this study, forty-one strains of B. cereus were subjected to phylogenetic analyses based on whole amino acid sequences. Enterotoxicity, which was evaluated on the basis of nheABC, hblACD, and cytK gene expression, was classified into levels of three types, two types, one type, and no types. In terms of evolutionary relationships, clusters of virulence and nonvirulence gene strains were evident, and the regional distribution of the number of types of virulence genes was also presented, further confirmed by ANI-based phylogenetic analyses. We found that the two phylogenetic trees were similar. All non-toxic-risk strains were concentrated in two clusters, and all but two of the medium-high- and medium-low-toxic-risk strains formed clusters. The results suggest the possibility of virulence gene transfer, which may be related to frequent exchange of pathogenicity factors during B. cereus virulence evolution, including so-called probiotic or nonpathogenic species [15]. Previous taxonomic results for the B. cereus group are largely based on inadequate criteria such as virulence characteristics, which residing on virulence plasmids [9]. Due to rampant horizontal gene transfer in bacterial ecosystems, increasing numbers of “core” genes should be found and defined based on refined species classification [41]. Recently, phylogeny-aware methods based on linear regression models were applied at the whole-genome scale to study the genomes of bacteria [17]. By using this statistical approach, we hereby observed that the virulence genes nheABC and nheC positively correlated with enrichment in the medium-low-risk cluster, and hblA was found in the medium-high-risk cluster. The inconsistent evolutionary distribution of individual virulence genes may be due to other factors, which needs further study.The Bacillus hemolytic and nonhemolytic enterotoxin family of proteins consists of several Bacillus enterotoxins, which can cause food poisoning in humans [42]. Hemolytic BL and cytotoxin K (encoded by hblACD and cytK) and nonhemolytic enterotoxin (encoded by nheABC) represent the significant enterotoxins produced by B. cereus. Cardazzo et al. detected horizontal gene transfer in the evolution of enterotoxins within B. cereus strains [43]. Our MLSA results showed that in the process of toxin molecular evolution, there were differences between the results for complete and incomplete virulence proteins, and two toxic-type genes had a more significant effect in relation to DVs than three toxic-type genes. The results suggested that the complete virulence-gene operon combination has higher relative genetic stability. The DV of hemolysin Bl was greater than that of nonhemolytic cytotoxin K. nheABC, which was responsible for most of the cytotoxic activity of B. cereus isolates, showed stable, strictly vertical inheritance [44]. In contrast to hbl, duplication or deletion of nhe, which was almost exclusively transmitted vertically, was rarely observed, and cytK, a one-type gene, had the highest relative genetic stability [15].Bazinet revealed significant associations of particular genes with phenotypic traits shared by groups of taxa [41]. Currently, it is commonly accepted that the toxicity potential of B. cereus is not driven by enterotoxin gene types because the expression of enterotoxin genes is highly complex and probably strain-specifically affected by transcription, posttranscriptional and posttranslational modification [45-47]. Carroll et al. suggested that further classification and descriptions of phenotypes should be added on the basis of genotype classification in B. cereus [2]. In this study, both Nhe and Hbl are three-component cytotoxins composed of binding components A and B and two lytic components B, C and -L1, -L2, with all three subunits acting synergically to cause illness. The amino acid sequences of all Nhe, Hbl and Cytk components containing N-terminal signal peptides indicated toxin secretion via the secretory translocation pathway. The final positions of Hbl-B, Hbl-L1, NheB, NheC, and CytK were all extracellular and did not appear in the cytoplasm, and the two transmembrane regions of NheB and Hbl-L1 might be responsible for transporting the assembled three-component cytotoxins across the membrane to complete the toxic effect. Dietrich et al. found that the factor triggering enterotoxin production under simulated intestinal conditions by various cell lines from different organisms and compartments was independent of cell differentiation [48]. Similarities were found when predicted transmembrane helices were compared. NheA and Hbl-B had no such helices, and NheB and Hbl-L1 had two that may play an important role in molecular docking and transmembrane activities. Furthermore, the Nhe components seem to be additionally processed in the extracellular space after separation from the signal peptide for secretion [48]. The difference is that NheC had one such component and Hbl B had none, which may strengthen the secretion of the Nhe protein.The Nhe and Hbl proteins share sequence similarities, both between the three components of each complex and between the two enterotoxin complexes [12]. The structural and functional properties were consistent with those of the superfamily of pore-forming cytotoxins of Hbl and Nhe [49-51]. The NheA, NheB, NheC, HBl-B, HBl-L1, and HBl-L2 structures showed two main domains, a ‘head’ and ‘tail’. The ‘heads’ of NheA and Hbl-B, including two α-helices separated by β-tongue strands, play a special role in Nhe trimers and Hbl trimers, respectively. Upon contact with lipids, cell membranes or detergents, the protein oligomerizes and forms ring-shaped structures acting as transmembrane pores [52,53], and the hydrophobic β-tongue is assumed to be inserted into the membrane first [54]. It is worth noting that NheB, NheC, Hbl-L1, and Hbl-L2 had few or no β-strands in the ‘head’, which were either responsible for conformational changes of NheA and Hbl-B or for the stabilization of the ‘head’ domain [50] or might lead to reduced toxicity or only ligand function [41]. The difference was reflected in the triplet prediction results, i.e., a significant difference in structural arrangement compactness between Nhe trimers (noncompact type) and Hbl trimers (compact type). Ganash et al. speculated that the Nhe trimer requires interaction with unknown proteins of an additional function [41]. A specific binding order of the three Nhe and Hbl components is also necessary for pore formation [55,56]. We found that NheB and Hbl-L1 were close to NheA and Hbl-B in the predicted trimer structure. Didier et al. found that NheA is important for attaching to cell-bound NheB and NheC and that NheB is the main interaction partner of NheA [57], and a further correlation was found for the amounts of Hbl B and Hbl L1 [46]. Cytotoxin K is a single protein with β-barrel pore-forming toxin in contrast to the tripartite toxin complexes Hbl and Nhe. The CytK structure, which exhibits two ‘latches’ with many β-sheets folded beside the ‘cap’ domain forming a β-barrel, was the pore structure on top of the conformation. The ‘rim’ region, which was folded into a three-stranded antiparallel β-sheet, balanced the structure in the monomer. The predicted structure revealed that CytK was likely to belong to the leukocyte toxin family. These monomers diffuse to target cells and are attached to them by specific receivers [58], which are lipids and proteins that cause lysis of red blood cells by destroying their cell membrane [59].
Conclusion
In this study, we describe the molecular evolution, function and structural diversity of virulence factors in B. cereus strains. The evolution of B. cereus strains showed a clustering trend based on the coding virulence genes. The complete virulence gene operon combination had higher relative genetic stability than the incomplete operon. The two α-helices in the ‘head’ of the NheA and Hbl-B structures, which are separated by β-tongue strands, and two ‘latches’ with many β-sheets folded beside the ‘cap’ of the CytK structure might play a special role in the binding of virulence structures and pore-forming toxins in B. cereus. Overall, the exact mechanism by which B. cereus causes diarrhea remains unknown, but our results provide helpful information for better understanding the taxonomically diverse distribution of virulence factors in B. cereus strains.
Authors: José Juan Almagro Armenteros; Konstantinos D Tsirigos; Casper Kaae Sønderby; Thomas Nordahl Petersen; Ole Winther; Søren Brunak; Gunnar von Heijne; Henrik Nielsen Journal: Nat Biotechnol Date: 2019-02-18 Impact factor: 54.908
Authors: Harley L Worthy; Lainey J Williamson; Husam Sabah Auhim; Stephen H Leppla; Inka Sastalla; D Dafydd Jones; Pierre J Rizkallah; Colin Berry Journal: Toxins (Basel) Date: 2021-03-31 Impact factor: 4.546