
Comparative genome analyses of five Vibrio penaeicida strains provide insights into their virulence-related factors.

Wafaa Ragab^1,2, Satoshi Kawato¹, Reiko Nozaki¹, Hidehiro Kondo¹, Ikuo Hirono¹.

Abstract

Vibrio penaeicida (family Vibrionaceae) is an important bacterial pathogen that affects Japanese shrimp aquaculture. Only two whole-genome sequences of V. penaeicida are publicly available, which has hampered our understanding of the pathogenesis of shrimp vibriosis caused by this bacterium. To gain insight into the genetic features, evolution and pathogenicity of V. penaeicida, we sequenced five V. penaeicida strains (IFO 15640T, IFO 15641, IFO 15642, TUMSAT-OK1 and TUMSAT-OK2) and performed comparative genomic analyses. Virulence factors and mobile genetic elements were detected. Furthermore, average nucleotide identities (ANIs), clusters of orthologous groups and phylogenetic relationships were evaluated. The V. penaeicida genome consists of two circular chromosomes. Chromosome I sizes ranged from 4.1 to 4.3 Mb, the GC content ranged from 43.9 to 44.1 %, and the number of predicted protein-coding sequences (CDSs) ranged from 3620 to 3782. Chromosome II sizes ranged from 2.2 to 2.4 Mb, the GC content ranged from 43.5 to 43.8 %, and the number of predicted CDSs ranged from 1992 to 2273. All strains except IFO 15641 harboured one plasmid, having sizes that ranged from 150 to 285 kb. All five genomes had typical virulence factors, including adherence, anti-phagocytosis, flagella-related proteins and toxins (repeats-in-toxin and thermolabile haemolysin). The genomes also contained factors responsible for iron uptake and the type II, IV and VI secretion systems. The genome of strain TUMSAT-OK2 tended to encode more prophage regions than the other strains, whereas the genome of strain IFO 15640T had the highest number of regions encoding genomic islands. For comparative genome analysis, we used V. penaeicida (strain CAIM 285T) as a reference strain. ANIs between strain CAIM 285T and the five V. penaeicida strains were >95 %, which indicated that these strains belong to the same species. 
Orthology cluster analysis showed that strains TUMSAT-OK1 and TUMSAT-OK2 had the greatest number of shared gene clusters, followed by strains CAIM 285T and IFO 15640T. These strains were also the most closely related to each other in a phylogenetic analysis. This study presents the first comparative genome analysis of V. penaeicida and these results will be useful for understanding the pathogenesis of this bacterium.


Keywords: Vibrio penaeicida; comparative genomics; genome sequencing; mobile genetic elements; virulence factors

Substances: Virulence Factors

Year: 2022 PMID: 35171089 PMCID: PMC8942037 DOI: 10.1099/mgen.0.000766

Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858

Data Summary

The translated CDSs of 105 (102 and three species) reference genomes were retrieved from the NCBI RefSeq database. The corresponding accession numbers and other details of the CDSs are available in Table S1 (available in the online version of this article). Genome assemblies of four strains and two plasmid sequences were retrieved from the NCBI database. The corresponding accession numbers and URLs are available in Table S2.

Vibrio species are ubiquitous in aquatic ecosystems, causing various infections in fish, crustaceans and shellfish. V. penaeicida is a member of the genus Vibrio that was first isolated from diseased shrimp in 1982 in Yamaguchi and Kumamoto Prefectures, Japan. In this study, we investigated the genomes of five V. penaeicida strains using whole-genome sequencing and comparative genome analysis. Our genomic analysis identified common genomic species features, multiple virulence factors and various mobile genetic elements. Furthermore, novel plasmids were identified and were found to contain tetracycline resistance genes. The results of this study provide valuable information to understand the genetic features, pathogenicity, evolution and phylogenetic distinctness of V. penaeicida.

Introduction

The global shrimp trade represents about 18 % of the total world fish trade in terms of value, and the global farmed shrimp market continues to grow faster than that of any other aquaculture species, with most shrimp being produced in Asia [1]. The kuruma shrimp (Marsupenaeus japonicus) is one of the most economically important reared shrimp species in Japan [2]. However, in shrimp aquaculture, microbial infections remain a major problem, with bacterial infections accounting for 20 % of total losses [1]. Vibriosis is a serious bacterial disease in shrimp aquaculture [3]. V. penaeicida was first described in 1995 by Ishimaru et al. as a Gram-negative, facultatively anaerobic, slightly curved rod-shaped bacterium that is motile by a single polar flagellum [4], and was first isolated by Takahashi et al. in 1982 as the causative agent of vibriosis affecting cultured kuruma shrimp in Japan [5]. V. penaeicida has also been responsible for mortality outbreaks of Penaeus stylirostris in New Caledonia [6]. Shrimp with vibriosis are characterized by cloudiness of the abdominal muscle, especially in the sixth segment, and brown spots in the lymphoid organs and gills [5]. The pathogenicity of vibrios is due to a large number of virulence factors, including proteases, haemolysins, siderophores, cytotoxins, quorum sensing (QS), phage, biofilm formation [7-9] and flagella, which are essential for motility [10]. Among the various virulence factors, haemolysin was reported as one of the major virulence factors among Vibrio species, such as the thermolabile haemolysin (TLH) that was detected in [11] and [12]. QS is a process through which bacteria communicate by extracellular signalling molecules called autoinducers [13], and it gives bacteria the ability to control the secretion of virulence factors. For example, QS is known as the most defined virulence regulatory mechanism in [14] and V. cholerae [15].
The type VI secretion system (T6SS), another important virulence factor, has been associated with the pathogenesis of [16], [17], and acute hepatopancreatic necrosis disease (AHPND)-causing [18]. Furthermore, type IV pili, a virulence factor related to adherence, play important roles in host–cell interactions, gliding motility, DNA uptake, twitching motility and signal transduction [19]. The genomes of Vibrio species typically possess two chromosomes that are shaped by recombination and horizontal gene transfer (HGT) [20]. HGT has an important role in bacterial pathogenesis by disseminating genes encoding virulence and antibiotic resistance through mobile genetic elements (MGEs) such as integrating conjugative elements (ICEs), genomic islands (GIs), plasmids and bacteriophages [21]. These MGEs have been identified in vibrios and have been reported to play a significant role in the evolution of Vibrio species [22]. Recently, whole-genome sequencing (WGS) of microbes has provided high-quality sequences to examine virulence-associated genes [23], metabolism, drug resistance, host–pathogen interactions and host–environment reactions [24]. Currently, only two whole-genome sequences of V. penaeicida, both from strains isolated from kuruma shrimp, are available in the GenBank database. In addition, only a few studies on the pathogenesis of V. penaeicida in penaeid shrimps are available. Previous studies identified and proposed different extracellular products as putative virulence factors to shrimp; for example, V. penaeicida strain AM101 secreted an extracellular thermolabile cytotoxin, which produced 100 % mortality when it was injected into juvenile Litopenaeus stylirostris [25], and a cysteine protease-like exotoxin, which produced high mortality when injected into juvenile Litopenaeus vannamei [26]. Therefore, the present study aimed to characterize the genomes of five V. penaeicida strains using a comparative genomics approach to better understand their pathogenesis.

Methods

Bacterial strains

Strains TUMSAT-OK1 and TUMSAT-OK2 were isolated from two mass mortality events involving M. japonicus in a commercial shrimp farm in Okinawa in 2019. The shrimp did not show any unambiguously diagnostic clinical signs at the macroscopic level (as is often the case for shrimp infectious diseases). We therefore sought to isolate possibly pathogenic bacteria from the moribund shrimp by inoculating shrimp stomach contents onto heart infusion agar plates. The colonies grown on the plates were uniform, indicating that the moribund shrimp were infected by a single type of bacterium, which was subsequently identified as V. penaeicida by 16S rRNA gene sequencing. We obtained nine phenotypically indistinguishable isolates, from which TUMSAT-OK1 and TUMSAT-OK2 were randomly selected to represent each mass mortality event and were used in WGS. For comparative genomic analyses, we purchased three historical strains, IFO 15640T, IFO 15641 and IFO 15642, from the NITE Biological Resource Center (NBRC) (Table 1). These three strains were the only strains available in public culture collections. Starting with glycerol stocks, all strains were cultured on heart infusion agar plates supplemented with 2.5 % (w/v) NaCl and were incubated at 25 °C overnight. For DNA extraction, a full loopful of grown colonies was selected, cultured in heart infusion broth (Gibco) supplemented with 2.5 % (w/v) NaCl and incubated at 25 °C overnight with shaking.

Table 1.

Genome assembly statistics and annotation information of V. penaeicida strains

Strain	GenBank accession no.	Genome size (bp)	GC (%)	No. of coding sequences	No. of rRNAs	No. of tRNAs	Origin or source of strain and year of isolation
IFO 15640^T
Chromosome I	AP025144	4,134,604	44.12	3620	31	102	Kuruma shrimp, Kagoshima Prefecture, purchased from NBRC, 1989
Chromosome II	AP025145	2,363,338	43.8	2076		7
Plasmid	AP025146	285,012	41.83	294
IFO 15641
Chromosome I	AP025147	4,300,323	44.01	3782	28	98	Kuruma shrimp, Yamaguchi Prefecture, purchased from NBRC, 1986
Chromosome II	AP025148	2,273,513	43.81	1992		7
IFO 15642
Chromosome I	AP025149	4,212,274	43.92	3708	25	94	Kuruma shrimp, Hiroshima Prefecture, purchased from NBRC
Chromosome II	AP025150	2,363,514	43.8	2071		7
Plasmid	AP025151	240,687	41.76	260
TUMSAT-OK1
Chromosome I	AP025152	4,213,929	43.94	3711	31	102	Kuruma shrimp, stomach, Okinawa Prefecture, 2019
Chromosome II	AP025153	2,489,285	43.59	2270		7
Plasmid	AP025154	150,136	43.33	185
TUMSAT-OK2
Chromosome I	AP025155	4,213,527	43.94	3711	31	102	Kuruma shrimp, stomach, Okinawa Prefecture, 2019
Chromosome II	AP025156	2,489,259	43.59	2273		7
Plasmid	AP025157	150,127	43.33	187
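As a quick cross-check of the replicon sizes in Table 1, the following minimal sketch sums the chromosomes and plasmid of each strain. The dictionary layout and the names `REPLICONS`, `total_genome_size`, `TOTALS` and `LARGEST` are illustrative assumptions, not part of the original analysis; only the base-pair values come from the table.

```python
# Replicon sizes (bp) transcribed from Table 1.
# IFO 15641 carries no plasmid, so that entry is simply omitted.
REPLICONS = {
    "IFO 15640T": {"ChrI": 4_134_604, "ChrII": 2_363_338, "plasmid": 285_012},
    "IFO 15641": {"ChrI": 4_300_323, "ChrII": 2_273_513},
    "IFO 15642": {"ChrI": 4_212_274, "ChrII": 2_363_514, "plasmid": 240_687},
    "TUMSAT-OK1": {"ChrI": 4_213_929, "ChrII": 2_489_285, "plasmid": 150_136},
    "TUMSAT-OK2": {"ChrI": 4_213_527, "ChrII": 2_489_259, "plasmid": 150_127},
}


def total_genome_size(replicons):
    """Sum the sizes of all replicons (chromosomes plus plasmid) of one strain."""
    return sum(replicons.values())


TOTALS = {strain: total_genome_size(r) for strain, r in REPLICONS.items()}
LARGEST = max(TOTALS, key=TOTALS.get)

if __name__ == "__main__":
    # Print strains from largest to smallest total genome size.
    for strain, size in sorted(TOTALS.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{strain}\t{size:,} bp")
```

Summing the replicons this way gives TUMSAT-OK1 the largest total genome, which agrees with the statement in the Results.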


Whole-genome sequencing using Illumina and Nanopore sequencing

For Illumina sequencing, we extracted the bacterial genomic DNA using the standard cetyltrimethylammonium bromide method. The paired-end DNA libraries were prepared using a Nextera XT library preparation kit (Illumina), following the manufacturer’s protocol, and were sequenced on the Illumina MiSeq platform with the MiSeq reagent kit v.2 (300 cycles; Illumina). For Nanopore sequencing, genomic DNA was extracted using NucleoBond AXG100 columns and the NucleoBond buffer set III (Macherey-Nagel). Long-read libraries were prepared with the ligation sequencing kit (SQK-LSK109; Oxford Nanopore Technologies) and sequenced using R9.4.1 flow cells on a GridION platform. The FAST5 files were base-called using Guppy v.4.0.1 with the settings --config dna_r9.4.1_450bps_hac and --qscore_filtering.

Genome assembly

We combined Illumina short reads with Nanopore long reads to produce complete hybrid assemblies of the V. penaeicida genomes. The raw Illumina sequencing data were quality assessed using fastp v.0.20.2 [27] with default settings. The Nanopore reads were de novo assembled using Flye v.2.7 [28] with the settings --nano-raw and --genome-size 7M. The Illumina reads and Nanopore reads were aligned to the assemblies using minimap2 v.2.17 [29] with the settings -ax sr for Illumina reads and -ax map-ont for Nanopore reads, and the resulting BAM files were used to improve the accuracy of the assemblies with HyPo v.1.0.2 [30] with the settings -s 6m and -c 50. Assembly quality was evaluated using Quast v.5.0.2 [31] with default options, and assembly completeness was assessed using Benchmarking Universal Single-Copy Orthologs (BUSCO) v.4.1.4 [32]. The circular topology of the chromosomes and plasmids was confirmed using Bandage v.0.8.1 [33].
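The assembly and read-mapping steps above can be sketched as command lines built in Python. This is an illustration under stated assumptions rather than the authors' actual pipeline: the file names, the output directory and the helper functions `flye_command` and `minimap2_command` are hypothetical, and only the settings named in the text (nano-raw, genome-size 7M, ax sr, ax map-ont) are taken from the source. Conversion of the minimap2 SAM output to sorted BAM and the HyPo polishing step are deliberately left out.

```python
import shlex


def flye_command(nanopore_reads, out_dir, genome_size="7M"):
    """Argument list for a Flye de novo assembly of raw Nanopore reads."""
    return ["flye", "--nano-raw", nanopore_reads,
            "--genome-size", genome_size, "--out-dir", out_dir]


def minimap2_command(assembly, reads, preset):
    """Argument list for mapping reads to the draft assembly (-a gives SAM output)."""
    if preset not in ("sr", "map-ont"):
        raise ValueError(f"unexpected preset: {preset}")
    return ["minimap2", "-ax", preset, assembly, *reads]


# Hypothetical file names, used only to show how the commands fit together.
DRAFT = "flye_out/assembly.fasta"
COMMANDS = [
    flye_command("nanopore.fastq", "flye_out"),
    minimap2_command(DRAFT, ["illumina_R1.fastq", "illumina_R2.fastq"], "sr"),
    minimap2_command(DRAFT, ["nanopore.fastq"], "map-ont"),
]

if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no side effects.
    for cmd in COMMANDS:
        print(shlex.join(cmd))
```

Building argument lists rather than shell strings keeps file names with spaces safe if the commands are later passed to `subprocess.run`.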

Genome annotation

The genome assemblies were annotated on the Rapid Annotations using Subsystems Technology (RAST) server v.2.0 (http://rast.nmpdr.org/) [34-36] and with Prokka v.1.13.3 [37, 38] with the settings --force, --rfam, --kingdom Bacteria, --gram neg, --genus Vibrio and --usegenus. Prokka predicted the protein-coding sequences (CDSs), tRNAs and rRNAs found in the genomes, and RAST was used for further annotation of all predicted prophages and GIs. We used the virulence factor database (VFDB) for prediction of virulence factors (http://www.mgc.ac.cn/VFs/main.htm) [39]. The antimicrobial resistance (AMR) gene families and resistance mechanisms were identified by the Resistance Gene Identifier (RGI) (https://card.mcmaster.ca/analyze/rgi), which is based on the Comprehensive Antibiotic Resistance Database (CARD) [40].

Mobile genetic elements

Putative GIs and prophages were predicted using IslandViewer4 (http://www.pathogenomics.sfu.ca) [41] and PHAge Search Tool Enhanced Release (PHASTER) (https://phaster.ca/) [42]. Furthermore, ICEs were identified by using the ICEfinder web-based tool in ICEberg v.2.0 (https://db-mml.sjtu.edu.cn/ICEfinder/ICEfinder.html) [43].

Comparative whole-genome analysis

Average nucleotide identity (ANI)

ANI between the genomes was measured using the JSpeciesWS server (http://jspecies.ribohost.com/jspeciesws/) [44] based on BLAST+ (ANIb). We adopted a cutoff value of >95 % to delineate species boundaries.
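To make the cutoff concrete, the sketch below averages the percent identity of fragment hits and applies the >95 % rule. It is a simplified illustration in the spirit of ANIb, not the JSpeciesWS implementation: the filter defaults (identity above 30 %, aligned fraction of at least 0.7), the function names and the example hits are assumptions introduced here.

```python
def mean_identity(hits, min_identity=30.0, min_aligned_fraction=0.7):
    """Average percent identity over fragment hits that pass both filters.

    hits: iterable of (percent_identity, aligned_length, fragment_length).
    The default filter values are assumptions for illustration only.
    """
    kept = [
        pid for pid, aligned, frag_len in hits
        if pid > min_identity and aligned / frag_len >= min_aligned_fraction
    ]
    if not kept:
        raise ValueError("no fragment hit passed the filters")
    return sum(kept) / len(kept)


def same_species(ani, cutoff=95.0):
    """Apply the >95 % ANI species cutoff used in this study."""
    return ani > cutoff


# Hypothetical hits for 1020 bp query fragments (values invented for illustration).
EXAMPLE_HITS = [
    (99.8, 1020, 1020),
    (99.5, 1000, 1020),
    (98.9, 900, 1020),
    (85.0, 300, 1020),  # dropped: only about 29 % of the fragment aligns
]
EXAMPLE_ANI = mean_identity(EXAMPLE_HITS)

if __name__ == "__main__":
    print(f"ANI = {EXAMPLE_ANI:.2f} %, same species: {same_species(EXAMPLE_ANI)}")
```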

Phylogenetic tree reconstruction

Two phylogenetic tree analysis methods were used for V. penaeicida. First, a phylogenetic orthology inference method was used to describe the genetic relatedness between V. penaeicida and other vibrios. We downloaded the translated CDSs of 105 (102 and three species) reference genomes from the NCBI RefSeq database (accessed 9 December 2021; Table S1). The protein sequences were clustered by OrthoFinder2 v.2.5.1 [45], yielding 687 single-copy orthologues conserved among all species. The protein sequences were aligned by MAFFT v.7.490 [46], and the multiple sequence alignments were used for maximum-likelihood phylogenetic analysis using IQ-TREE2 v.2.1.4-beta [47]. Second, a whole-genome proteome-based phylogenetic tree was reconstructed to describe the genetic relatedness among V. penaeicida strains using the Type Strain Genome Server (TYGS) (https://tygs.dsmz.de) [48, 49]. The genomes of four strains were retrieved from the NCBI database (Table S2) and all sequences were submitted to the TYGS server in FASTA format. The phylogenetic tree was built using FastME v.2.1.6.1 [50] from whole proteome-based Genome BLAST Distance Phylogeny (GBDP) distances, and the tree was rooted at the midpoint [51] and visualized by iTOL v.5 (https://itol.embl.de/) [52].
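The selection of single-copy orthologues conserved among all species can be illustrated with a small filter over an orthogroup table. This is a minimal sketch, assuming a dictionary that maps each orthogroup to its genes per species; the data structure, the function name `single_copy_orthogroups` and the toy values are hypothetical and do not reproduce the OrthoFinder2 output format.

```python
def single_copy_orthogroups(orthogroups, species):
    """Return IDs of orthogroups with exactly one gene in every species.

    orthogroups: {orthogroup_id: {species_name: [gene_id, ...]}}
    species: list of all species that must be represented.
    """
    return sorted(
        og for og, members in orthogroups.items()
        if all(len(members.get(sp, [])) == 1 for sp in species)
    )


# Toy data (invented): three species, four orthogroups.
SPECIES = ["sp_A", "sp_B", "sp_C"]
TOY_ORTHOGROUPS = {
    "OG0001": {"sp_A": ["a1"], "sp_B": ["b1"], "sp_C": ["c1"]},
    "OG0002": {"sp_A": ["a2", "a3"], "sp_B": ["b2"], "sp_C": ["c2"]},  # paralogue in sp_A
    "OG0003": {"sp_A": ["a4"], "sp_B": ["b3"]},                        # missing in sp_C
    "OG0004": {"sp_A": ["a5"], "sp_B": ["b4"], "sp_C": ["c3"]},
}
SINGLE_COPY = single_copy_orthogroups(TOY_ORTHOGROUPS, SPECIES)

if __name__ == "__main__":
    print(SINGLE_COPY)
```

Only orthogroups that survive this filter would be aligned and concatenated for the maximum-likelihood tree; groups with paralogues or missing species are excluded.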

Prediction of clusters of orthologous groups (COGs)

COGs are clusters of genes in different species that evolved from a common ancestral gene through speciation events [53]. These genes can be used to trace the evolutionary history of given genes from common ancestors. The online program OrthoVenn (https://orthovenn2.bioinfotoolkits.net/task/create) [54], with the default parameters (E-value 1e-2 and inflation value 1.5), was used to compare and annotate the COGs and to perform Gene Ontology (GO) enrichment analysis for the assembled genomes of the studied strains.
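Counting shared gene clusters between strains reduces to set intersections once each strain is represented by the set of cluster identifiers it contains. The sketch below shows that idea only; the strain and cluster names are invented, and the functions `shared_cluster_counts` and `core_clusters` are hypothetical helpers, not part of OrthoVenn.

```python
from itertools import combinations


def shared_cluster_counts(clusters_by_strain):
    """Number of orthologous clusters shared by each pair of strains."""
    return {
        (a, b): len(clusters_by_strain[a] & clusters_by_strain[b])
        for a, b in combinations(sorted(clusters_by_strain), 2)
    }


def core_clusters(clusters_by_strain):
    """Clusters present in every strain."""
    return set.intersection(*clusters_by_strain.values())


# Toy data (invented cluster IDs).
TOY_CLUSTERS = {
    "strain_1": {"c1", "c2", "c3", "c4"},
    "strain_2": {"c1", "c2", "c3", "c5"},
    "strain_3": {"c1", "c6"},
}
PAIRWISE = shared_cluster_counts(TOY_CLUSTERS)
CORE = core_clusters(TOY_CLUSTERS)

if __name__ == "__main__":
    for pair, count in PAIRWISE.items():
        print(pair, count)
    print("core:", sorted(CORE))
```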

Whole-genome comparative visualization

Genome comparisons were visualized using the BLAST Ring Image Generator (BRIG) [55] to determine the genotypic differences between V. penaeicida strains.

Results and discussion

General genomic characteristics of V. penaeicida strains

In this study, we investigated the genomic features of five V. penaeicida strains isolated from different shrimp farms in Japan to gain more insight into the pathogenesis and evolution of this important shrimp pathogen. We sequenced a total of five strains using Illumina and Nanopore platforms, yielding chromosome-level hybrid assemblies. The final assembly of each strain had two circular chromosomes, and four strains had one circular plasmid. BUSCO analysis yielded 99.6 % completeness, indicating that our genome assemblies were of high quality. Chromosome I (ChrI) sizes ranged from 4.1 to 4.3 Mb, the GC content ranged from 43.9 to 44.1 %, and the number of predicted CDSs ranged from 3620 to 3782. ChrII sizes ranged from 2.2 to 2.4 Mb, the GC content ranged from 43.5 to 43.8 %, and the number of predicted CDSs ranged from 1992 to 2273. This study confirmed that the V. penaeicida genome possesses two circular chromosomes, which is consistent with the results from other Vibrio species [56]; the sizes of both chromosomes were relatively constant among the genomes. Strain TUMSAT-OK1 had the largest genome, whereas strain IFO 15640T harboured the largest plasmid among the strains analysed in this study. The majority of recognizable genes for essential cell functions (e.g. DNA replication, RNA metabolism, biosynthetic pathways and membrane transport) and pathogenicity (e.g. iron acquisition, chemotaxis and motility, endotoxin, and immune evasion) are located on ChrI. In contrast, ChrII contains a larger percentage (45–49 %) of hypothetical genes compared with ChrI (30–32 %), as is the case in other vibrios [57, 58]. Plasmid sizes ranged from 150,127 to 285,012 bp, the GC content ranged from 41.7 to 43.3 %, and the number of predicted CDSs ranged from 185 to 294. Details of the general genomic features are summarized in Table 1. The plasmid sequences were subjected to NCBI BLAST searches against the NCBI non-redundant nucleotide database and showed high similarities to sp. 04Ya090 plasmid pAQU2 and AM7 plasmid pAM7. Comparison of whole genomes and plasmids of the five strains was visualized using BRIG (Figs 1 and 2).

ANI has been considered the best alternative [59, 60] to DNA–DNA hybridization (gold standard) for species delineation at the genomic level [59]. We used ANI to determine the species boundaries and measure the genetic distance between genomes. ANI values were calculated using the five strains and the reference strain (V. penaeicida CAIM 285T). All ANI values were higher than the threshold value of 95 % for bacterial species delineation [59], which indicated that these strains belong to the same species. Based on ANI values, strain IFO 15640T (99.99 %) was the most closely related to strain CAIM 285T, followed by strain IFO 15641 (99.91 %) and strain IFO 15642 (99.89 %). With an ANI value of 99.6 %, strains TUMSAT-OK1 and TUMSAT-OK2 were more distantly related to strain CAIM 285T. The absence of certain genomic regions in strain IFO 15641 (Fig. 1), despite its higher ANI than strain IFO 15642, can be explained by the large plasmid missing in strain IFO 15641 as well as the chromosomal GIs and/or prophages. Also, strain IFO 15641 had a larger ChrI than strain IFO 15640T, seemingly inflated by unique GIs.

Fig. 1.

Genome-wide comparison of five V. penaeicida strains. Chromosome I (a) and chromosome II (b) of strain IFO 15640T were used as central references. From inner to outer rings, the first ring indicates the reference genome, the second and third rings are the GC content and GC skew, and the fourth to seventh rings indicate the other genomes in this study. The clockwise arrows in the remaining rings indicate the presence, absence and location of virulence factors and genomic islands (GIs) of interest among the five genomes. The virulence factors include: RtxB, repeats-in-toxin B; TLH, thermolabile haemolysin; T4SS and T6SS, type IV and VI secretion systems; and AAI/SCI-II, T6SS-3 gene cluster in enteroaggregative Escherichia coli. The GIs include GIs encoding MDR (multidrug resistance). The figure was produced using the BLAST Ring Image Generator (BRIG).

Fig. 2.

Circular comparison of plasmids mapped against respective reference plasmids. (a) Alignments of plasmids from TUMSAT-OK1 and TUMSAT-OK2 against sp. 04Ya090 plasmid pAQU2, and (b) alignments of plasmids from IFO 15640T and IFO 15642 against AM7 plasmid pAM7. The different colours refer to the different plasmids, GC skew and GC content, and are listed in the key. The figure was generated using the BLAST Ring Image Generator (BRIG).
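The GC content and GC skew rings in the genome comparison figure are window-based statistics. As a minimal sketch of how such values can be computed, assuming the common definitions GC content = (G + C) / window length and GC skew = (G − C) / (G + C), the function below scans a sequence in fixed windows. The function name, window size and toy sequence are illustrative assumptions; BRIG's own implementation may differ in detail.

```python
def gc_windows(sequence, window, step=None):
    """Yield (start, gc_content, gc_skew) for each full window along the sequence.

    GC content = (G + C) / window length; GC skew = (G - C) / (G + C).
    A window without any G or C gets a skew of 0.0.
    """
    step = step or window
    seq = sequence.upper()
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        content = (g + c) / window
        skew = (g - c) / (g + c) if (g + c) else 0.0
        yield start, content, skew


# Toy sequence (invented): a G-rich window followed by a C-rich window.
TOY_SEQUENCE = "GGGCATAT" + "CCCGATAT"
WINDOWS = list(gc_windows(TOY_SEQUENCE, window=8))

if __name__ == "__main__":
    for start, content, skew in WINDOWS:
        print(f"{start}\tGC={content:.2f}\tskew={skew:+.2f}")
```

A sign change in GC skew along a bacterial chromosome is commonly used to locate the replication origin and terminus, which is one reason the ring is often plotted alongside GC content.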

Virulence factors and AMR profiles

Identification of virulence factors is essential for estimating the pathogenicity of a given bacterium because these factors are directly responsible for pathogenic bacteria infecting and damaging the host [61]. VFDB revealed putative virulence factors in the V. penaeicida genomes, including genes involved in: adherence; anti-phagocytosis; chemotaxis and motility; iron uptake and acquisition; QS; type II, IV and VI secretion system proteins; and endotoxins and toxins [repeats-in-toxin (RTX) and TLH]. These putative virulence factors are well known in the genus Vibrio [62], and many of them were closely related to those found in other Vibrio species such as V. cholerae O1 biovar El Tor str. N16961, V. cholerae O395, V. parahaemolyticus RIMD 2210633, and V. vulnificus CMCP6 and YJ016. They were also closely related to those found in other bacterial taxa (Table 2).

Table 2.

Potential virulence factor profiles of the five V. penaeicida strains predicted using the virulence factor database (VFDB)

Classification	Virulence factors	Related genes	Location
Adherence	Accessory colonization factor	acfB	Chromosome II
	Mannose-sensitive HA	mshA–N	Chromosome I
	Type IV pilus	pilA, pilB, pilC, pilD	Chromosome I
	Flp type IV pili (Aeromonas)†	flpC, flpF, flpH	Chromosome I
	Tad locus (Haemophilus)‡	tadA	Chromosome II
	Hsp60 (Legionella)§	htpB	Chromosome II
Antiphagocytosis	Capsular polysaccharide	wbfB, wbfU, wbfY, wbfV/wcvB	Chromosome I
	Capsular polysaccharide	cpsA, cpsC	Chromosome II
Chemotaxis and motility	Flagella	cheA, cheB, cheR, cheV, cheW, cheY, cheZ, fliM, flaA, flaC, flaD, flaE, flgA, flgB, flgC, flgD, flgE, flgF–N, flhA, flhB, flhF, flhG, fliA, fliD, fliE, fliF, fliG, fliH, fliL, fliJ, fliL, fliN, fliO, fliP, fliQ, fliR, fliS, flrA, flrB, flrC, motA, motB, motX, motY	Chromosome I
	Flagella	cheA	Chromosome II
Iron uptake	Enterobactin receptors	vctA	Chromosome II
	Periplasmic binding protein dependent	vctC, vctD, vctG, viuC, viuD, viuG, viuP	Chromosome II
	Acinetobactin (Acinetobacter)	basG	Chromosome I
	Pyochelin (Pseudomonas)	pchB	Chromosome I
	Iron/magnesium transport (Escherichia coli)	sitA, sitB, sitC, sitD	Chromosome II
	Vibriobactin biosynthesis	vibA, vibB, vibC, vibE, vibF	Chromosome II
	Vibriobactin utilization	viuA, viuB	Chromosome II
Quorum sensing	Cholerae autoinducer-1	cqsA	Chromosome II
	Cholerae autoinducer-2	luxS	Chromosome I
Secretion system	EPS type II secretion system¶	epsC, epsE, epsF, epsG, epsH, epsI, epsJ, epsK, epsL, epsM, epsN, gspD	Chromosome I
	VAS effector protein	hcp-2, vgrG-2	Chromosome II
	VAS T6SS**	vasA, vasB, vasC, vasD, vasE, vasF, vasG, vasH, vasJ, vasK	Chromosome II
	VAS T6SS**	vasK	Chromosome I
	Hcp secretion island1 encoded T6SS (Pseudomonas)††	clpV1	Chromosome I
	T4SS effectors (Coxiella)‡‡		Chromosome I
	AAI/SCI-II T6SS (Escherichia coli)§§	aaiL	Chromosome II*
	T6SS (Aeromonas)		Chromosome II
Toxin	Repeats-in-toxin (RTX)	rtxB	Chromosome II
	Thermolabile haemolysin (TLH)	tlh	Chromosome II
Cell surface component	Trehalose-recycling ABC transporter (Mycobacterium)	sugC	Chromosome I, Chromosome II
Endotoxin	LOS (Haemophilus)¶¶	kdkA, lgtF, lpxA, lpxD, lpxK, msbA, waaQ	Chromosome I
Immune evasion	LPS (glycosylation) (Shigella)***	gtrB	Chromosome I
Iron acquisition	Bacillibactin (Bacillus)	entE	Chromosome I
	O-Ag (Yersinia)	wcaG	Chromosome I
Regulation	Two-component system (Acinetobacter)	bfmR	Chromosome I

*Missing from TUMSAT-OK1.

†Flp, fimbrial low-molecular weight protein.

‡Tad, tight adherence.

§Hsp, heat shock protein.

¶Eps, extracellular protein secretion.

**T6SS, type VI secretion system.

††Hcp, haemolysin co-regulated protein.

§§AAI/SCI-II, T6SS-3 gene cluster in enteroaggregative Escherichia coli.

¶¶LOS, lipooligosaccharide.

***LPS, lipopolysaccharide.

The putative virulence factors included virulence-related genes shared by other Vibrio species, including the tlh gene, which encodes a thermolabile haemolysin with phospholipase activity [63]. The V. penaeicida tlh gene was present on ChrI, and this was also the case in [64] and 345 [65]. tlh is also considered a species-specific marker for [66, 67]. RTX toxins have cytolytic activity, are produced by a wide range of pathogenic Gram-negative bacteria and are transported from the cytoplasm to the cell surface by transport proteins encoded by the rtxB and rtxD genes [68]. In the current study, the rtxB gene was detected in ChrII of V. penaeicida. Similarly, previous studies have shown that the rtxB gene was detected in V. alginolyticus, V. metschnikovii, V. anguillarum and [69]. Moreover, the V. penaeicida genomes harboured the pilA, pilB, pilC and pilD genes encoding the components of the type IV pilus, which mediates bacterial motility on solid surfaces, host–cell adhesion, bacteriophage adsorption, microcolony formation and transformation [70]. This result is in accordance with that of a recent study, which identified a type IV pilus in the genome of causing disease in Penaeus vannamei [71].

QS gives bacteria the ability to control several critical processes, including virulence factor secretion, antibiotic production, biofilm formation, motility, bioluminescence, development of genetic competence and sporulation [72]. In this study, QS-related genes, such as cqsA and luxS, the genes of cholerae autoinducer-1 and -2, respectively, were distributed across both chromosomes of the V. penaeicida genome. The cqsA and luxS genes have also been recently reported in associated with disease outbreak in shrimp [73]. The extracellular protein secretion (EPS) type II secretion system is mediated by the genes epsC–epsN, which were reported to be essential for secretion of the cholera toxin (the main virulence factor of V. cholerae) [74], and which were located on ChrI of V. penaeicida. Furthermore, other secretion systems, including types IV and VI (T4SS and T6SS), were also predicted. The T4SS and T6SS are capable of delivering virulence factors into adjacent eukaryotic cells, but only T6SSs have been reported to inject lethal toxins into prokaryotic cells [75]. The T4SS and T6SS components were detected on ChrI and ChrII, respectively. The T6SS-3 gene cluster (also called aai or sci-2), which has been identified in the genome of enteroaggregative Escherichia coli (strain 17-2) [76, 77], was found in the genomes of four strains. In summary, our results demonstrate that V. penaeicida shares many of its virulence factors with other Vibrio species, although the possibility remains that as-yet-unknown virulence factors exist, which may be annotated as hypothetical proteins in our current genome annotation scheme.

Tetracycline resistance is a common phenotypic characteristic of Vibrio species that has been reported in environmental samples and Asian aquaculture sectors [78-80]. CARD analysis identified genes that mediate tetracycline resistance, including tetB and tetR, on the plasmids harboured by strains IFO 15640T and IFO 15642. Similarly, a previous study identified the tetB gene-encoded plasmid in strains associated with AHPND from shrimp [81].

Identification of MGEs

The PHASTER web server was used to identify and annotate putative prophage sequences in the V. penaeicida genomes. The predicted prophages are summarized in Table 3 and the detailed features are listed in Table S3. The putative prophages PHAGE_Bacill_vB_BtS_BMBtp14 and PHAGE_Salmon_118970_sal3 were only found in TUMSAT-OK2, whereas PHAGE_Vibrio_12B12, PHAGE_Entero_mEp235 and PHAGE_Escher_ArgO145 were found in all five strains. We detected a total of seven intact prophage sequences in the chromosomes; PHAGE_Vibrio_12B12 belongs to the family Myoviridae, whereas PHAGE_Escher_ArgO145 and PHAGE_Shigel_POCJ13 belong to the family Podoviridae as per the Virus-Host Database [82]. In other studies, prophages belonging to the family Podoviridae were detected in [83] and [84]. The majority of CDSs of the predicted prophages encoded hypothetical proteins of unknown function, and it is unclear whether the prophages contribute to the virulence of this bacterium.

Table 3.

Predicted prophages in the genomes of the V. penaeicida strains

+, one copy present; ++, two copies present; −, absent.

Predicted prophage	IFO 15640^T	IFO 15641	IFO 15642	TUMSAT-OK1	TUMSAT-OK2
PHAGE_Vibrio_12B12	+	++	+	++	++
PHAGE_Sulfit_pCB2047_A	+	−	−	+	−
PHAGE_Entero_mEp235	+	+	+	+	+
PHAGE_Escher_ArgO145	+	+	+	+	++
PHAGE_Sulfit_pCB2047_C	−	+	+	−	+
PHAGE_Pseudo_phi2	−	+	−	+	+
PHAGE_Escher_TL_2011b	−	−	+	+	−
PHAGE_Shigel_SfIV	−	−	−	+	+
PHAGE_Shigel_POCJ13	−	−	−	+	−
PHAGE_Bacill_vB_BtS_BMBtp14	−	−	−	−	+
PHAGE_Salmon_118970_sal3	−	−	−	−	+
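
The copy numbers in Table 3 can be tallied per strain to check the statements made in the text. The following is a minimal sketch, not part of the original analysis: the table is transcribed by hand (with "-" standing in for the minus sign and the two superscript plus signs in the PHAGE_Escher_TL_2011b row read as "+"), and the names `TABLE3`, `copies_per_strain` and `shared_by_all` are illustrative.

```python
# Table 3 transcribed as symbols per strain, in column order.
# Assumption: "+" = one copy, "++" = two copies, "-" = absent.
STRAINS = ["IFO 15640T", "IFO 15641", "IFO 15642", "TUMSAT-OK1", "TUMSAT-OK2"]

TABLE3 = {
    "PHAGE_Vibrio_12B12":          "+ ++ + ++ ++",
    "PHAGE_Sulfit_pCB2047_A":      "+ - - + -",
    "PHAGE_Entero_mEp235":         "+ + + + +",
    "PHAGE_Escher_ArgO145":        "+ + + + ++",
    "PHAGE_Sulfit_pCB2047_C":      "- + + - +",
    "PHAGE_Pseudo_phi2":           "- + - + +",
    "PHAGE_Escher_TL_2011b":       "- - + + -",
    "PHAGE_Shigel_SfIV":           "- - - + +",
    "PHAGE_Shigel_POCJ13":         "- - - + -",
    "PHAGE_Bacill_vB_BtS_BMBtp14": "- - - - +",
    "PHAGE_Salmon_118970_sal3":    "- - - - +",
}

SYMBOL_TO_COPIES = {"-": 0, "+": 1, "++": 2}


def copies_per_strain(table):
    """Sum predicted prophage copies for each strain (column totals)."""
    totals = dict.fromkeys(STRAINS, 0)
    for row in table.values():
        for strain, symbol in zip(STRAINS, row.split()):
            totals[strain] += SYMBOL_TO_COPIES[symbol]
    return totals


def shared_by_all(table):
    """Return prophages present (at least one copy) in every strain."""
    return [
        name
        for name, row in table.items()
        if all(SYMBOL_TO_COPIES[s] > 0 for s in row.split())
    ]


if __name__ == "__main__":
    for strain, total in copies_per_strain(TABLE3).items():
        print(f"{strain}: {total} predicted prophage copies")
    print("Present in all strains:", ", ".join(shared_by_all(TABLE3)))
```

Under this transcription, TUMSAT-OK2 carries the largest number of predicted prophage copies, and the three prophages returned by `shared_by_all` are the ones the text describes as present in all five strains.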

GIs are typically recognized as large segments of genomic DNA that range in size from 10 to 200 kb, and GIs smaller than 10 kb are known as genomic islets [85]. In total, we detected 86 GIs in the genomes using IslandViewer4. A total of 104 transposase genes were predicted, the majority of which were classified into the IS_3 family. The putative GIs and their features are listed in Table S4. Strain IFO 15640T had the highest number of regions encoding GIs, suggesting that this strain has experienced numerous HGT events mediated by GIs. GIs encoding multidrug resistance were identified in the genomes. V. penaeicida also had a large putative GI that did not contain any virulence factors but contained genes for essential cell functions. In contrast to V. penaeicida, large GIs encoding virulence factors have been found in other bacteria [86].

ICEs are MGEs that can integrate into the host chromosome and shape the behaviour of bacterial communities [87]. ICE-finder predicted one putative T4SS-type ICE of 86,243 bp in ChrI of strain IFO 15641, suggesting that this ICE may facilitate transfer of the T4SS, which is involved in the pathogenicity of most Gram-negative bacteria [88]. Moreover, three ICEs were detected in the plasmids of strains IFO 15642, TUMSAT-OK1 and TUMSAT-OK2; these harboured genes essential for conjugative plasmid transfer [e.g. transfer (tra), T4SS, type IV coupling protein and relaxase genes] [89]. This result indicated that these plasmids are conjugative.

Overall, various MGEs, such as plasmids, prophages, GIs and a T4SS-type ICE, were detected in the five genomes. This strongly suggests that V. penaeicida is capable of acquiring new traits, including virulence and antibiotic resistance genes, via HGT mediated by these MGEs, which have a strong influence on the genomic evolution of vibrios [22].
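
The size convention for GIs cited above [85] can be expressed as a simple rule. The sketch below is illustrative only: the function name `classify_region` and the label used for regions above 200 kb are assumptions, and the example lengths are arbitrary rather than taken from Table S4.

```python
def classify_region(length_bp: int) -> str:
    """Label a horizontally acquired region by size.

    Thresholds follow the convention cited in the text:
    < 10 kb is a genomic islet, 10 to 200 kb is a genomic island.
    The label for larger regions is an assumption of this sketch.
    """
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    if length_bp < 10_000:
        return "genomic islet"
    if length_bp <= 200_000:
        return "genomic island"
    return "above typical GI size range"


if __name__ == "__main__":
    # Arbitrary example lengths in base pairs.
    for length in (8_500, 45_000, 250_000):
        print(length, "->", classify_region(length))
```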

Phylogenetic analysis

Two phylogenetic trees were reconstructed to identify the species closely related to V. penaeicida and to describe the genetic relatedness among V. penaeicida strains. The first phylogenetic analysis (Fig. 3a) was based on sequences coding for 687 single-copy conserved genes found in 105 species (102 Vibrio species and three out-group species), and the tree was reconstructed using the maximum-likelihood method. The resulting phylogenetic tree showed that the 102 Vibrio species were divided into 16 clades and that V. penaeicida belongs to the Nigripulchritudo clade. This clade was located close to the clades Halioticoli and Mediterranei, but distant from the clades Rumoiensis, Splendidus, Vulnificus and Harveyi.

Fig. 3.

Phylogenetic tree analysis. (a) Phylogenetic tree of 102 Vibrio species, with three species used as the out-group. A total of 687 single-copy conserved genes (237 491 amino acids) were used to build a maximum-likelihood phylogenetic tree with IQ-TREE v2.1.4-beta (substitution model: LG+F+I+G4 [91], 1000 UFBoot [92] replicates). The UFBoot support value was 100 % unless indicated beside the corresponding node. Species clades within the genus were defined based on Sawabe et al. [93], with minor modifications to reconcile paraphyly. (b) Phylogenetic tree of V. penaeicida strains based on whole-genome proteome data, with two out-group strains. Numbers at each node indicate GBDP pseudo-bootstrap support values from 100 replications.

An additional phylogenetic analysis (Fig. 3b) was conducted based on whole-genome proteome data of the V. penaeicida genomes. The resulting tree split the strains into two clusters: cluster I consisted of strains TUMSAT-OK1 and TUMSAT-OK2, while cluster II consisted of strains IFO 15640T, IFO 15641, IFO 15642 and CAIM 285T. Consistent with the ANI results, CAIM 285T and IFO 15640T were the closest relatives, whereas strains TUMSAT-OK1 and TUMSAT-OK2 were most closely related to each other and most distant from the other strains. This agreement indicates that ANI is an effective tool for measuring the genetic distance between genomes. TUMSAT-NU1 was distantly related to all other strains with substantial sequence divergence, suggesting that there may be unexplored divergent strains circulating in aquaculture environments. Therefore, characterization of the genetic diversity of V. penaeicida will be important for tracking the emergence of novel pathogenic strains.

Orthologous gene analysis

Orthologous genes usually retain functions similar to those of their ancestral genes [90]. COG analysis identified a total of 5110 orthologous clusters that were shared among the six strains (Table S5). Strains TUMSAT-OK1 and TUMSAT-OK2 had the highest number of gene clusters shared exclusively between two strains (n=264) (Table S6), followed by strains CAIM 285T and IFO 15640T (n=48) (Table S7), reflecting their close relationship to each other in the phylogenetic analysis compared with the other strains. A total of 28 strain-specific clusters were found among these strains. Strain CAIM 285T had the highest number of unique gene clusters (n=8), of which one gene cluster was annotated as transketolase activity GO:0004802, whereas IFO 15640T had the lowest number of unique gene clusters (n=1), which was annotated as transposition GO:0032196 (Fig. 4). GO enrichment analysis revealed that the following GO terms were enriched among the shared gene clusters: DNA restriction-modification system GO:0009307, transposition GO:0032196, conjugation GO:0000746, nitrate assimilation GO:0042128 and transposition, DNA-mediated GO:0006313.
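
The shared, pairwise and strain-specific counts reported above correspond to regions of a Venn diagram over cluster membership. The following is a minimal sketch of that bookkeeping, using made-up strain labels and cluster identifiers; it does not reproduce the OrthoVenn2 clustering itself, and `venn_regions` is a hypothetical helper name.

```python
from collections import defaultdict


def venn_regions(clusters_by_strain):
    """Count clusters by the exact set of strains that carry them.

    clusters_by_strain maps a strain name to the set of orthologous
    cluster IDs found in that strain. The result maps each combination
    of strains (as a frozenset) to the number of clusters found in
    exactly those strains and no others.
    """
    carriers = defaultdict(set)
    for strain, clusters in clusters_by_strain.items():
        for cluster_id in clusters:
            carriers[cluster_id].add(strain)

    regions = defaultdict(int)
    for strains in carriers.values():
        regions[frozenset(strains)] += 1
    return dict(regions)


if __name__ == "__main__":
    # Toy input: three hypothetical strains and five hypothetical clusters.
    toy = {
        "strainA": {"c1", "c2", "c3", "c5"},
        "strainB": {"c1", "c2", "c4"},
        "strainC": {"c1", "c3"},
    }
    regions = venn_regions(toy)
    core = regions.get(frozenset(toy), 0)
    print("clusters shared by all strains:", core)
    for strain in toy:
        unique = regions.get(frozenset([strain]), 0)
        print(f"clusters unique to {strain}:", unique)
```

With real data, the entry keyed by all six strains would correspond to the shared clusters, the entries keyed by a single strain to the strain-specific clusters, and the entry keyed by exactly two strains to clusters such as those shared only by TUMSAT-OK1 and TUMSAT-OK2.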

Fig. 4.

Venn diagram showing the distribution of shared and unique orthologous gene clusters among six strains as visualized by OrthoVenn2. A total of 5110 shared clusters of orthologous groups were identified in these six strains.

In summary, this study presents a comparative genomic analysis of five V. penaeicida strains and identifies features common to these strains. The information regarding virulence factors (adherence, anti-phagocytosis, motility, iron uptake and acquisition, QS, secretion systems, and toxins), which was obtained via the VFDB, will enhance our understanding of V. penaeicida pathogenesis and will guide vaccine development. In addition, various MGEs, such as plasmids, prophages, GIs and a T4SS-type ICE, were identified, indicating that this bacterium is capable of acquiring new genetic information through HGT, which has a significant role in bacterial evolution and affects pathogenesis. Our study provides valuable information for understanding the genetic features, pathogenicity, phylogeny and evolution of V. penaeicida.

82 in total

1. Assembly of long, error-prone reads using repeat graphs.

Authors: Mikhail Kolmogorov; Jeffrey Yuan; Yu Lin; Pavel A Pevzner
Journal: Nat Biotechnol Date: 2019-04-01 Impact factor: 54.908

2. Prokka: rapid prokaryotic genome annotation.

Authors: Torsten Seemann
Journal: Bioinformatics Date: 2014-03-18 Impact factor: 6.937

Review 3. Pathogenicity islands and the evolution of microbes.

Authors: J Hacker; J B Kaper
Journal: Annu Rev Microbiol Date: 2000 Impact factor: 15.500

4. Purification and characterization of a lecithin-dependent haemolysin from Escherichia coli transformed by a Vibrio parahaemolyticus gene.

Authors: S Shinoda; H Matsuoka; T Tsuchie; S Miyoshi; S Yamamoto; H Taniguchi; Y Mizuguchi
Journal: J Gen Microbiol Date: 1991-12