
Comparative genomics of Nocardia seriolae reveals recent importation and subsequent widespread dissemination in mariculture farms in the South Central Coast region, Vietnam.

Cuong T Le^1,2, Erin P Price^1,3, Derek S Sarovich^1,3, Thu T A Nguyen^4, Daniel Powell^1, Hung Vu-Khac^5, D İpek Kurtböke^1, Wayne Knibb^1, Shih-Chu Chen^6, Mohammad Katouli^1,7.

Abstract

Between 2010 and 2015, nocardiosis outbreaks caused by Nocardia seriolae affected many permit farms throughout Vietnam, causing mass fish mortalities. To understand the biology, origin and epidemiology of these outbreaks, 20 N. seriolae strains collected from farms in four provinces in the South Central Coast region of Vietnam, along with two Taiwanese strains, were analysed using genetic and genomic approaches. PFGE identified a single cluster among all Vietnamese strains that was distinct from the Taiwanese strains. Consistent with the PFGE findings, phylogenomic and SNP genotyping analyses revealed that all Vietnamese N. seriolae strains belonged to a single, unique clade. Strains fell into two subclades that differed by 103 SNPs, with almost no diversity within subclades (0–5 SNPs). There was no association between geographical origin and subclade placement, suggesting frequent N. seriolae transmission between Vietnamese mariculture facilities during the outbreaks. The Vietnamese strains shared a common ancestor with strains from Japan and China, with the closest strain, UTF1 from Japan, differing by just 220 SNPs from the Vietnamese ancestral node. Draft Vietnamese genomes ranged from 7.55 to 7.96 Mbp in size, had an average G+C content of 68.2 % and encoded 7602–7958 predicted genes. Several putative virulence factors were identified, including genes associated with host cell adhesion, invasion, intracellular survival, antibiotic and toxic compound resistance, and haemolysin biosynthesis. Our findings provide important new insights into the epidemiology and pathogenicity of N. seriolae and will aid future vaccine development and disease management strategies, with the ultimate goal of nocardiosis-free aquaculture.


Keywords: Nocardia seriolae; aquaculture; fish infection; fish mortality; genomics; infectious disease; nocardiosis; permit fish; trachinotus


Year: 2022 PMID: 35786440 PMCID: PMC9455698 DOI: 10.1099/mgen.0.000845

Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858

Data Summary

Sequence read files (SRX10462095, SRX10462096, SRX10462097, SRX10462093, SRX10462094, SRX10462092, SRX10462098) and the draft genome assemblies of all seven Vietnamese strains are available from the National Center for Biotechnology Information (NCBI) under BioProject PRJNA551736 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA551736 and https://www.ncbi.nlm.nih.gov/genome/browse#!/prokaryotes/14550/).

Nocardia seriolae, the aetiological agent of a lethal granulomatous disease known as fish nocardiosis, has caused high fish mortalities in global aquaculture in recent decades, particularly in Asia and the Americas. This pathogen possesses a highly conserved genome with minimal genetic diversity, which limits the discriminatory power of existing genotyping techniques such as PFGE and leaves genetically related strains poorly resolved. To overcome these resolution issues, we employed whole-genome sequencing (WGS) to create highly resolved, time-calibrated phylogenies from all available N. seriolae genomes (n=20), including seven newly sequenced strains retrieved from Vietnamese fish farms, where nocardiosis outbreaks are imposing an increasingly significant commercial burden. This comprehensive comparative genomic analysis provides the first global phylogenetic analysis of N. seriolae strains, allowing the elucidation of the temporal and spatial dynamics of this pathogen, particularly in Vietnam. Using the comparative genomic data, we developed two SNP-based genotyping assays, one differentiating Vietnamese from non-Vietnamese strains and one distinguishing between the two Vietnamese clades, offering an inexpensive tool for rapidly discriminating new nocardiosis outbreaks and tracing their origin. Our WGS and SNP assays identified the rapid and undetected spread of N. seriolae throughout South Central Coast aquaculture facilities, reflecting the need for better surveillance of this emerging pathogen.
Finally, our genomic analysis identified multiple virulence factors and antimicrobial resistance genes, which provide valuable information for better understanding the pathogenicity and persistence of this important aquaculture pathogen.

Introduction

The genus Trachinotus, of the family Carangidae, comprises a group of marine, medium-sized, migratory, pelagic finfish that are widely distributed in subtropical and tropical waters worldwide [1, 2]. Many members of the genus, such as T. carolinus, T. blochii, T. ovatus, and T. falcatus, are of great economic importance to fisheries and aquaculture in the Americas and Asia owing to their high-quality meat, fast growth, high market price, and strong adaptability to a variety of captive environments [3-7]. In Asia, permit fish, particularly the snub nose permit T. falcatus, are farmed commercially in ponds, raceways, and floating sea cages in both brackish water and sea water. Since 2010, Asian mariculture farms have produced over 2 million tonnes of fish, contributing significantly to food security, poverty alleviation, and economic growth in the region [8]. However, a shortage of quality seed stock and the risk of disease outbreaks remain key obstacles to the sector's sustainable development in several countries. T. falcatus fingerlings were first imported into Vietnam from Taiwan and China in the 2000s and quickly gained popularity; permit fish are now the third largest group of commercially cultured marine fish, after seabass and grouper. However, high mortality rates of T. falcatus weighing between 5 and 350 g (6–45 cm in length) emerged in 2010 during an epizootic that affected sea cage farms in Khánh Hòa province, in the South Central Coast region of Vietnam. Since this initial outbreak, large-scale outbreaks have occurred at several other farming sites in southern and central parts of the country [9, 10]. Infected fish showed clinical signs of nocardiosis such as lethargy, skin blisters, ulcers, and multiple yellowish to whitish nodules affecting both internal and external organs.
Based on analyses of 16S rRNA gene sequences and biochemical characteristics, the bacterial pathogen N. seriolae was confirmed as the causative agent [10]; however, the origin of the N. seriolae strains affecting Vietnamese permit fish farms has not yet been identified. N. seriolae is a Gram-positive, branching, filamentous intracellular bacterium of the family Nocardiaceae that was initially described as N. kampachi in farmed yellowtail, Seriola quinqueradiata [11], following large outbreaks in Mie Prefecture, Japan. Losses of approximately 260 tonnes of cultured yellowtail due to the disease were recorded in 1989 [12]. Nocardiosis has also affected several other important species in Japanese aquaculture, such as amberjack (Seriola dumerili), Japanese flounder (Paralichthys olivaceus), and chub mackerel (Scomber japonicus). N. seriolae has subsequently been documented in Taiwan, China, Korea, the USA, and Mexico, where high mortalities and associated economic losses due to nocardiosis have been reported in freshwater and marine fish species in both cultured and wild populations [13-23]. Despite the significant economic losses it causes in fish aquaculture worldwide, there are currently no effective measures against nocardiosis. Five complete and eight draft N. seriolae genome sequences were publicly available prior to our study, representing isolates from Japan, South Korea, and China [24-28]. These genomes have provided important insights into epidemiology, transmission, pathogenesis, and infection control strategies; however, isolates from other nocardiosis-prevalent regions such as Taiwan, the USA, Mexico, and Vietnam have not yet been examined, leaving major gaps in our understanding of this devastating infectious disease.
In the current study, we sequenced the genomes of seven N. seriolae isolates collected from permit fish farms at different locations across Vietnam and compared them with the 13 previously sequenced N. seriolae genomes, allowing a comparison of isolates spanning a decade and a variety of sources and geographical locations. Using this information, we developed two novel SNP-based PCR assays to rapidly differentiate Vietnamese from non-Vietnamese strains, and to distinguish strains representing the two Vietnamese clades. We also characterized potential virulence factors and antimicrobial/toxin resistance determinants to gain insights into pathogenicity and survival mechanisms. Finally, we functionally annotated our genomes to determine whether differences in gene content might contribute to physiological variability among isolates.

Methods

Bacterial strains

Due to a ban on culture importation into Australia, all live culture work was carried out in laboratories at the Institute for Aquaculture, Nha Trang University, Vietnam (for Vietnamese strains) and the Department of Veterinary Medicine, College of Veterinary Medicine, National Pingtung University of Science and Technology, Pingtung, Taiwan (for Taiwanese strains). Twenty-two N. seriolae strains isolated from fish were examined in this study: 20 from Vietnam and two from Taiwan. Vietnamese strains were isolated from cultured permit fish (T. falcatus; 31.0–85.8 g) during nocardiosis outbreaks between 2014 and 2015 in four provinces (Phú Yên, Khánh Hòa, Ninh Thuận, and Vũng Tàu) in the South Central Coast region, and the Taiwanese strains were isolated from largemouth bass (Micropterus salmoides) and mullet (Mugil cephalus) in 2007 (Fig. 1 and Table 1). Isolates were confirmed as N. seriolae based on morphological observations, Ziehl–Neelsen staining (Fig. 2), 16S rRNA gene sequencing, and biochemical characteristics. The 20 Vietnamese strains were subjected to PFGE analysis, and seven of these were selected for whole-genome sequencing (WGS) to enable more detailed genetic analyses. All 22 isolates were tested using our SNP genotyping assays.

Fig. 1.

Four Vietnamese provinces where isolates were collected from infected permit fish (Trachinotus falcatus).

Table 1.

N. seriolae strains collected in this study, their AseI and XbaI PFGE profiles, and their SNP genotypes

Country	Strain	Fish species	Host tissue	Origin	Collection date	AseI	XbaI	SNP genotype*
Taiwan	96127	Micropterus salmoides	Unknown	Taiwan	2007	A1	X1	S1
Taiwan	96994	Mugil cephalus	Unknown	Taiwan	2007	A4	X5	S1
Vietnam	KH_11	Trachinotus falcatus	Muscle	Khánh Hòa, Vietnam	March 2014	NsA2	NsX3	S2C1
Vietnam	KH_14	Trachinotus falcatus	Spleen	Khánh Hòa, Vietnam	April 2014	NsA1	NsX1	S2C2
Vietnam	KH_15	Trachinotus falcatus	Kidney	Khánh Hòa, Vietnam	May 2014	NsA1	NsX5	S2C1
Vietnam	KH_17	Trachinotus falcatus	Spleen	Khánh Hòa, Vietnam	March 2014	NsA1	NsX3	S2C1
Vietnam	KH_21	Trachinotus falcatus	Kidney	Khánh Hòa, Vietnam	April 2014	NsA2	NsX3	S2C2
Vietnam	NT_01	Trachinotus falcatus	Muscle	Ninh Thuận, Vietnam	April 2014	NsA3	NsX5	S2C2
Vietnam	NT_02	Trachinotus falcatus	Spleen	Ninh Thuận, Vietnam	April 2014	NsA3	NsX2	S2C1
Vietnam	NT_03	Trachinotus falcatus	Liver	Ninh Thuận, Vietnam	April 2014	NsA5	NsX1	S2C2
Vietnam	NT_50	Trachinotus falcatus	Spleen	Ninh Thuận, Vietnam	April 2014	NsA2	NsX3	S2C2
Vietnam	PY_22	Trachinotus falcatus	Spleen	Phú Yên, Vietnam	April 2014	NsA4	NsX1	S2C1
Vietnam	PY_23	Trachinotus falcatus	Muscle	Phú Yên, Vietnam	April 2014	NsA9	NsX1	S2C1
Vietnam	PY_30	Trachinotus falcatus	Liver	Phú Yên, Vietnam	April 2014	NsA8	NsX1	S2C2
Vietnam	PY_31	Trachinotus falcatus	Bone	Phú Yên, Vietnam	April 2014	NsA10	NsX4	S2C1
Vietnam	PY_35	Trachinotus falcatus	Spleen	Phú Yên, Vietnam	April 2014	NsA7	NsX1	S2C2
Vietnam	PY_37	Trachinotus falcatus	Spleen	Phú Yên, Vietnam	April 2014	NsA6	NsX1	S2C2
Vietnam	PY_39	Trachinotus falcatus	Spleen	Phú Yên, Vietnam	April 2014	NsA7	NsX1	S2C2
Vietnam	PY_40	Trachinotus falcatus	Kidney	Phú Yên, Vietnam	April 2014	NsA6	NsX1	S2C1
Vietnam	VT_45	Trachinotus falcatus	Spleen	Vũng Tàu, Vietnam	June 2015	NsA10	NsX3	S2C1
Vietnam	VT_61	Trachinotus falcatus	Spleen	Vũng Tàu, Vietnam	June 2015	NsA11	NsX1	S2C1
Vietnam	VT_62	Trachinotus falcatus	Liver	Vũng Tàu, Vietnam	June 2015	NsA12	NsX1	S2C2

*S1, non-Vietnamese SNP genotype; S2, Vietnamese SNP genotype; C1, Vietnam Clade 1; C2, Vietnam Clade 2.

Fig. 2.

Morphology of N. seriolae isolated from Vietnamese mariculture farms. (a) Chalky white, non-haemolytic colonies of N. seriolae on sheep blood agar (3-week-old culture); (b) Ziehl–Neelsen-stained N. seriolae, showing purple-red, filamentous branching bacteria.

Isolates were preserved in Brain Heart Infusion (BHI; Difco) broth mixed with 25 % (v/v) glycerol and stored at −80 °C. For culturing, strains were grown in BHI broth at 28 °C for 5 days with orbital shaking at 150 r.p.m. For DNA extraction, cells from 0.3 ml of culture were pelleted at 6000 g at 4 °C for 5 min and washed twice with 1× sterile PBS. To test for haemolysis, cultures grown in BHI broth were streaked onto 5 % (v/v) sheep blood agar and incubated at 28 °C for 3 weeks (Fig. 2a).

PFGE typing

PFGE was performed using 50 U XbaI or AseI (New England BioLabs) as previously described [20]. The N. seriolae type strain, BCRC 13745 (JCM 3360; isolated from the spleen of farmed yellowtail in Nagasaki Prefecture, Japan, ca. 1974), was included for comparison. Gels were analysed using GelCompar II software version 6.5 (Applied Maths). Bands were assigned automatically by the software and then checked and corrected manually; only clearly resolved bands were considered for further analysis. A dendrogram was constructed by the unweighted pair group method with arithmetic mean (UPGMA) using the Dice similarity coefficient, with band optimization and band position tolerances of 1.0 %. Isolates whose banding profiles showed ≥80 % similarity (fewer than six bands of difference) were defined as indistinguishable or clonally related, whereas patterns with <80 % similarity (six or more bands of difference) were considered to represent different clusters of unrelated strains [29, 30].
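As a rough illustration of the band-based clustering described above, the sketch below scores two PFGE profiles with the Dice coefficient and groups isolates by UPGMA (average linkage), stopping at the 80 % similarity cut-off. It is not GelCompar II's algorithm: band positions are assumed to be normalized gel positions on a 0–100 scale, bands are matched greedily within a 1.0-unit tolerance, and the isolate names and band positions in the example are hypothetical.

```python
from itertools import combinations


def dice_similarity(bands_a, bands_b, tolerance=1.0):
    """Dice similarity (%) between two PFGE band profiles.

    Bands are normalized gel positions (assumed 0-100 scale). Two bands match
    when they lie within `tolerance` units; each band is matched at most once,
    using a greedy pass over both sorted lists.
    """
    a, b = sorted(bands_a), sorted(bands_b)
    i = j = matched = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            matched += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return 200.0 * matched / (len(a) + len(b))


def upgma_clusters(profiles, cutoff=80.0, tolerance=1.0):
    """Group isolates by UPGMA (average linkage) on Dice similarity.

    Merging stops once no two clusters reach `cutoff` % mean similarity, which
    is equivalent to cutting the UPGMA dendrogram at that similarity level.
    """
    sim = {
        frozenset((x, y)): dice_similarity(profiles[x], profiles[y], tolerance)
        for x, y in combinations(profiles, 2)
    }
    clusters = {name: [name] for name in profiles}

    def cluster_sim(c1, c2):
        # Average linkage: mean similarity over all cross-cluster isolate pairs.
        pairs = [sim[frozenset((m, n))] for m in clusters[c1] for n in clusters[c2]]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda pair: cluster_sim(*pair))
        if cluster_sim(c1, c2) < cutoff:
            break
        clusters[c1] = clusters[c1] + clusters.pop(c2)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical band positions for three isolates.
profiles = {
    "iso_A": [10.0, 25.0, 40.0, 55.0, 70.0],
    "iso_B": [10.5, 25.2, 40.1, 55.3, 70.4],
    "iso_C": [12.0, 30.0, 48.0, 62.0, 85.0],
}
print(upgma_clusters(profiles))  # [['iso_A', 'iso_B'], ['iso_C']]
```

Because average-linkage merge levels never increase in similarity, stopping at the cut-off gives the same groups as drawing the full dendrogram and cutting it at 80 %.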

DNA extraction

Total genomic DNA was extracted from bacterial isolates using the Wizard Genomic DNA Purification Kit (Promega) according to the manufacturer's instructions. DNA was checked for sterility and shipped to the University of the Sunshine Coast, Queensland, Australia. DNA quantity and purity were assessed using a NanoDrop 2000 (Thermo Scientific) and 1 % agarose gel electrophoresis. DNA for Illumina WGS was shipped on dry ice to the Australian Genome Research Facility (AGRF; North Melbourne, VIC, Australia).

WGS and comparative genomic analyses

Nextera DNA Flex Illumina libraries for the seven Vietnamese isolates were sequenced across four lanes of a single flowcell on the NextSeq 500 platform (Illumina), producing 150 bp paired-end reads at an average depth of ~390× (range: 326–433×). Raw read quality was assessed with FastQC v0.11.5 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). These seven genomes are available from the Sequence Read Archive under BioProject PRJNA551736. Thirteen publicly available N. seriolae genome assemblies (strains EM150506, CK-14008, HSY-NS01, HSY-NS02, MH196537, N-2927, NBRC 15557, NK201610020, SY-24, U-1, UTF1, ZJ0503 and TL20, corresponding to GenBank assembly references ASM186585v1, ASM188553v1, ASM301359v1, ASM366707v1, ASM1411730v1, ASM58371v2, ASM799071v1, ASM1520982v1, ASM209393v1, ASM119293v1, ASM235603v1, ASM76316v1 and ASM1822368v1, respectively) were converted to simulated Illumina reads using ART v2016.06.05 [31] prior to analysis. EM150506, the largest complete genome (GenBank accession number CP017839.1) [28], was used as the reference sequence for read mapping and gene annotation. Biallelic, orthologous SNPs across the 20 N. seriolae genomes were identified using SPANDx v4.0.1 [32] with default settings; SPANDx integrates the Burrows–Wheeler Aligner [33], Sequence Alignment/Map (SAM) tools [34], BEDTools [35], VCFtools [36], Picard Tools (http://broadinstitute.github.io/picard) and the Genome Analysis Toolkit [37] into a single pipeline. We used a hierarchical approach to identify the appropriate root for our N. seriolae-only phylogeny (Fig. 3). First, we identified the nearest genetic neighbour of N. seriolae via a SPANDx phylogenomic comparison of 134 Nocardia genomes belonging to 78 assigned species and 10 unassigned species (Fig. S1, available in the online version of this article). Next, we reconstructed a rooted phylogeny using the closest relative, N. concava NBRC 100430 (RefSeq accession: GCF_000308815.1) (Fig. S2), to determine the most ancestral N. seriolae strain for phylogenetic rooting.
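The core, biallelic filtering step can be pictured with a small SNP matrix. The sketch below is not SPANDx; it assumes a hypothetical in-memory matrix mapping each reference position to one base call per genome (with N, - or ? marking a missing call), keeps only sites called in every genome that carry exactly two alleles, and concatenates the retained sites into per-genome SNP strings of the kind a parsimony program takes as input.

```python
MISSING = {"N", "-", "?"}  # assumed codes for an absent or ambiguous call


def core_biallelic_sites(matrix):
    """Positions called in every genome that carry exactly two alleles."""
    kept = []
    for pos, calls in sorted(matrix.items()):
        bases = [c.upper() for c in calls]
        if any(b in MISSING for b in bases):
            continue  # not core: at least one genome lacks a call here
        if len(set(bases)) == 2:
            kept.append(pos)
    return kept


def to_alignment(matrix, names, positions):
    """Concatenate retained sites into one SNP string per genome."""
    return {
        name: "".join(matrix[pos][i].upper() for pos in positions)
        for i, name in enumerate(names)
    }


# Hypothetical calls for four genomes at five reference positions.
names = ["g1", "g2", "g3", "g4"]
matrix = {
    101: ["C", "C", "T", "T"],  # core, biallelic: kept
    202: ["A", "A", "A", "A"],  # invariant: dropped
    303: ["A", "G", "T", "A"],  # three alleles: dropped
    404: ["G", "N", "A", "G"],  # missing call in g2: dropped
    505: ["G", "A", "A", "A"],  # core, biallelic: kept
}
sites = core_biallelic_sites(matrix)
print(sites)                               # [101, 505]
print(to_alignment(matrix, names, sites))  # {'g1': 'CG', 'g2': 'CA', 'g3': 'TA', 'g4': 'TA'}
```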

Fig. 3.

Rooted maximum-parsimony phylogenomic analysis of seven Vietnamese (KH_11, KH_21, NT_50, PY_31, PY_37, VT_62 and VT_45; grey box) and 13 non-Vietnamese N. seriolae genomes, using EM150506 (Fig. S2) as an outgroup. A total of 7343 high-confidence, biallelic, orthologous, core-genome SNPs were used to reconstruct the phylogeny. Branch lengths within the Vietnamese clade are labelled and refer to the number of SNPs along each branch. Consistency index=0.998.

Using the SPANDx SNP matrices, maximum-parsimony phylogenomic trees were reconstructed with Phylogenetic Analysis Using Parsimony (PAUP*) v4.0a168 [38] and visualized using FigTree v1.4.0 (http://tree.bio.ed.ac.uk/software/figtree/). For the N. seriolae-only phylogeny, variants were also annotated using SnpEff [39] (Data S1). To determine similarity among genomes, and to check for potential rearrangements, contigs in all genome assemblies were oriented and ordered against the reference genome using MAUVE v2.3.1 [40]. BLAST Ring Image Generator (BRIG) [41] was subsequently used to visualize genome relatedness and structural variation. Finally, temporal analysis was performed with BEAST v1.10.4 [42] using the approach outlined by Holt and colleagues [43, 44].
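The consistency index quoted for Fig. 3 compares the minimum number of changes each SNP could require (number of observed alleles minus one) with the number it actually requires on the tree; values near 1 mean almost no homoplasy. A minimal sketch of that calculation using Fitch parsimony on a bifurcating tree follows. The nested-tuple tree format, the four-taxon example and the function names are illustrative assumptions, not PAUP* internals.

```python
def fitch(tree, states):
    """Fitch parsimony on a bifurcating tree.

    `tree` is a leaf name or a (left, right) tuple; `states` maps each leaf to
    its base at one site. Returns (candidate state set, number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_changes + right_changes
    # No shared state: one extra change is needed below this node.
    return left_set | right_set, left_changes + right_changes + 1


def consistency_index(tree, alignment):
    """Ensemble CI = sum of minimum changes / sum of Fitch changes over sites."""
    names = list(alignment)
    minimum = observed = 0
    for i in range(len(alignment[names[0]])):
        column = {name: alignment[name][i] for name in names}
        minimum += len(set(column.values())) - 1
        observed += fitch(tree, column)[1]
    return minimum / observed if observed else 1.0


# Hypothetical four-taxon tree and two SNP sites; the second site is homoplastic.
tree = ((("A", "B"), "C"), "D")
alignment = {"A": "TT", "B": "TC", "C": "CC", "D": "CT"}
print(round(consistency_index(tree, alignment), 3))  # 0.667
```

Here the first site needs one change (its minimum), while the second needs two changes for a minimum of one, so CI = 2/3; a CI of 0.997–0.998 across thousands of SNPs therefore implies very few such homoplastic sites.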

SNP genotyping

The SPANDx SNP matrix was used to identify SNPs that (i) distinguished Vietnamese from non-Vietnamese strains (220 SNPs; SNP1 assay) and (ii) differentiated the two Vietnamese clades (103 SNPs; SNP2 assay). We selected SNPs at positions 60409 and 587171 in EM150506 for SNP1 and SNP2 assay design, respectively (Data S1). SYBR Green-based mismatch amplification mutation assay (SYBR-MAMA) real-time PCRs were developed to permit rapid genotyping of all strains in this study at these two SNPs. SYBR-MAMA, also known as allele-specific PCR or the amplification-refractory mutation system, uses allele-specific primers whose ultimate 3′ base targets each SNP allele, and exploits the lower efficiency with which Taq polymerase extends a mismatched 3′ end, monitored in real time [45]. SYBR-MAMA has been used for SNP genotyping in many bacteria [46, 47] owing to its low cost and simplicity. Each SNP assay consisted of one common primer and two allele-specific primers, matching either the non-Viet or the Viet allele for the SNP1 assay, and either the Viet Clade 1 or the Viet Clade 2 allele for the SNP2 assay (Table 2). The same destabilizing mismatch (A for SNP1 and G for SNP2) was incorporated at the penultimate (−2) 3′ base of both allele-specific primers to increase allele specificity [48]. Cycles-to-threshold (CT) values for each allele-specific reaction were used to determine the SNP genotype of each strain from the difference in CT values (ΔCT).
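Assay targets such as the SNP1 and SNP2 positions need to be fixed differences between the groups being separated: every genome in one group carries one allele and every genome in the other carries the alternative. The sketch below screens a SNP matrix for such sites. The genome names and matrix are hypothetical, and any further criteria used to choose positions 60409 and 587171 from among the candidate sites are not described here.

```python
def diagnostic_snps(matrix, names, group_a, group_b):
    """Sites where group_a is fixed for one allele and group_b for another.

    Returns (position, group_a allele, group_b allele) tuples; such fixed
    differences are candidate targets for a two-allele genotyping assay.
    """
    column = {name: i for i, name in enumerate(names)}
    hits = []
    for pos, calls in sorted(matrix.items()):
        alleles_a = {calls[column[n]] for n in group_a}
        alleles_b = {calls[column[n]] for n in group_b}
        if len(alleles_a) == 1 and len(alleles_b) == 1 and alleles_a != alleles_b:
            hits.append((pos, alleles_a.pop(), alleles_b.pop()))
    return hits


# Hypothetical matrix: two "Vietnamese" and two "non-Vietnamese" genomes.
names = ["viet_1", "viet_2", "other_1", "other_2"]
matrix = {
    10: ["T", "T", "C", "C"],  # fixed difference: candidate
    20: ["A", "A", "A", "G"],  # varies within the second group
    30: ["G", "C", "G", "G"],  # varies within the first group
}
print(diagnostic_snps(matrix, names, ["viet_1", "viet_2"], ["other_1", "other_2"]))
# [(10, 'T', 'C')]
```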

Table 2.

Primer sequences of SYBR-MAMAs designed in this study for the differentiation of Vietnamese strains

SNP assay and target	SNP position*	Variation (allele base)	Primer name	Primer sequence†
SNP1 (Vietnam vs. non-Vietnam strains)	60409	C/T	CtS1_nonViet_For	CAAACCGGCTGGATATCGaC
			CtS1_Viet_For	CAAACCGGCTGGATATCGaT
			SNP1_Rev	CACGCCGACGCTAGTACCTG
SNP2 (Vietnam subclades 1 vs. 2)	587171	A/C	CtS2_Clade1_Rev	CATACCGACTTCCAGGTGTGgT
			CtS2_Clade2_Rev	ACCGACTTCCAGGTGTGgG
			SNP2_For	AGCCCATTAGCAGTCGTGTGA

*SNP position as per N. seriolae EM150506 [28] (GenBank reference CP017839.1).

†Single 3′ penultimate mismatch bases are shown in lowercase; SNP-specific nucleotides are indicated in bold.

SYBR-MAMA, SYBR Green-based mismatch amplification mutation assay.
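The primer design rules stated above can be checked mechanically against the sequences in Table 2: the two allele-specific primers of each pair should share the same 3′-aligned sequence apart from the terminal base, carry the same lowercase penultimate mismatch, and end in the two SNP alleles (complemented for the SNP2 reverse primers). The helper below is a hypothetical sanity check written for illustration, not part of the authors' workflow; it aligns primers at their 3′ ends because the two SNP2 primers differ in length.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def check_mama_pair(primer_1, primer_2, alleles, reverse=False):
    """Sanity-check two allele-specific (MAMA) primers against Table 2's rules.

    Primers are compared from the 3' end so that pairs of different length
    (as for SNP2) are aligned on the bases that matter for allele specificity.
    """
    body_len = min(len(primer_1), len(primer_2)) - 1
    # Same sequence upstream of the terminal base, over the shared 3' region.
    same_body = primer_1[-body_len - 1:-1] == primer_2[-body_len - 1:-1]
    # Shared, lowercase destabilizing mismatch at the penultimate (-2) base.
    mismatch_ok = primer_1[-2].islower() and primer_1[-2] == primer_2[-2]
    # Terminal bases read the SNP alleles; reverse primers carry the complement.
    expected = {a.translate(COMPLEMENT) if reverse else a for a in alleles}
    ends_ok = {primer_1[-1], primer_2[-1]} == expected
    return same_body and mismatch_ok and ends_ok


# Primer sequences and alleles as listed in Table 2.
print(check_mama_pair("CAAACCGGCTGGATATCGaC", "CAAACCGGCTGGATATCGaT", ("C", "T")))  # True
print(check_mama_pair("CATACCGACTTCCAGGTGTGgT", "ACCGACTTCCAGGTGTGgG",
                      ("A", "C"), reverse=True))  # True
```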

To validate the SNP genotypes called by our newly developed assays, we first established reference ΔCT values for each assay by running it against the two Taiwanese and seven genome-sequenced Vietnamese strains. The assays were then run on the 13 remaining Vietnamese isolates to determine their genotypes. Each PCR run included control DNA samples representing the matching and non-matching allele genotypes as positive controls, and at least two no-template controls. SYBR-MAMAs contained 1 µl of target DNA template at ~1 ng µl⁻¹, 0.2 µM allele-specific primer, 0.2 µM common primer (Macrogen), 1× Platinum SYBR Green qPCR SuperMix-UDG (cat. no. 11733038; Thermo Fisher Scientific) and RNase/DNase-free PCR-grade water (cat. no. 10977015; Thermo Fisher Scientific) in a total reaction volume of 5 µl. Thermocycling conditions comprised an initial 2 min denaturation at 95 °C, followed by 45 cycles of 95 °C for 15 s and 60 °C for 15 s. All samples were run in duplicate.
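Genotype calling from duplicate reactions can be expressed as a small function: average each allele-specific reaction, call the allele whose primer amplified earlier (lower CT), and report ΔCT as the margin of the call. The sketch below assumes a hypothetical minimum ΔCT of 2 cycles for a confident call and treats a reaction that never amplified as crossing at cycle 45, the final cycle of the protocol; neither rule is stated in the text, and the CT values in the example are invented.

```python
from statistics import mean

MAX_CYCLES = 45      # final cycle of the thermocycling protocol
MIN_DELTA_CT = 2.0   # hypothetical minimum ΔCT for a confident call


def mean_ct(replicates):
    """Mean CT of duplicate wells, ignoring wells with no amplification (None)."""
    values = [ct for ct in replicates if ct is not None]
    return mean(values) if values else None


def call_genotype(ct_1, ct_2, labels, min_delta=MIN_DELTA_CT):
    """Return (allele label, ΔCT) from two allele-specific reactions.

    The allele whose primer amplified earlier (lower CT) is called. A reaction
    that never amplified is treated as crossing at MAX_CYCLES (an assumption).
    The label is None when neither reaction amplified or ΔCT < min_delta.
    """
    if ct_1 is None and ct_2 is None:
        return None, None  # e.g. a no-template control
    ct_1 = MAX_CYCLES if ct_1 is None else ct_1
    ct_2 = MAX_CYCLES if ct_2 is None else ct_2
    delta = abs(ct_1 - ct_2)
    if delta < min_delta:
        return None, delta  # too close to call confidently
    return (labels[0] if ct_1 < ct_2 else labels[1]), delta


# Hypothetical duplicate CT values for one strain in the SNP1 assay.
viet_ct = mean_ct([22.1, 22.3])
non_viet_ct = mean_ct([29.0, 29.4])
print(call_genotype(viet_ct, non_viet_ct, ("Viet", "non-Viet")))
```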

Genome assembly and annotation

Assemblies of the seven Vietnamese genomes were constructed from the quality-filtered sequence data using the Microbial Genome Assembly Pipeline (MGAP) v1.1 (https://github.com/dsarov/MGAP---Microbial-Genome-Assembler-Pipeline), with EM150506 (GenBank reference CP017839.1) as the scaffolding reference. MGAP wraps Trimmomatic [49], Velvet [50], VelvetOptimiser (https://github.com/tseemann/VelvetOptimiser), ABACAS [51], IMAGE [52], SSPACE [53, 54], GapFiller [55, 56] and Pilon [57] into a single tool. Assemblies were primarily annotated using the Rapid Annotations using Subsystems Technology (RAST) server v2.0 with SEED data and default features (RAST annotation scheme: RASTtk, automatically fix errors, fix frameshifts, build metabolic model, backfill gaps, turn on debug, verbose level: 0, and disable replication: yes). RAST was also used to group genes into functional subsystems (akin to Clusters of Orthologous Groups). These annotations were then compared with those produced by Prokka v1.8 [58]. Where the two tools disagreed, the RAST functional prediction was checked and manually corrected by using BLASTP to search for similar proteins in the UniProtKB database (http://www.uniprot.org/blast/). The clustered regularly interspaced short palindromic repeat (CRISPR)-Cas region finder program (https://crisprcas.i2bc.paris-saclay.fr) was used to identify CRISPR repeats and the intervening spacer sequences [59]. The assembled genomes for all Vietnamese strains are available from NCBI under BioProject PRJNA551736 (Table 3).

Table 3.

Genetic and genomic features of the Vietnamese strains compared with the South Korean EM150506 strain according to RAST

Strains/feature	Country	Genome size (Mbp)	Level of completion	Sequencing platform	Sequencing depth	GC%	L50 (no. of contigs)	Total no. of proteins	No. of RNAs	No. of hypothetical proteins	No. of proteins with function prediction	No. of proteins assigned to subsystem	NCBI accession no.
KH_11	Vietnam	7.66	Draft	NextSeq 500	340×	68.3	90	7655	58	3560	4465	2055	WMKE00000000.1
KH_21	Vietnam	7.72	Draft	NextSeq 500	424×	68.2	58	7657	66	3597	4428	2033	WMKF00000000.1
NT_50	Vietnam	7.96	Draft	NextSeq 500	395×	68.2	86	7640	66	3571	4437	2063	WMKG00000000.1
PY_31	Vietnam	7.68	Draft	NextSeq 500	408×	68.3	62	7602	62	3212	4818	2220	WMKC00000000.1
PY_37	Vietnam	7.55	Draft	NextSeq 500	326×	68.3	126	7707	51	3549	4525	2087	WMKD00000000.1
VT_45	Vietnam	7.94	Draft	NextSeq 500	404×	68.2	70	7958	67	3609	4718	2054	WMKB00000000.1
VT_62	Vietnam	7.7	Draft	NextSeq 500	433×	68.3	62	7643	63	3580	4428	2052	WMKH00000000.1
UTF1	Japan	8.12	Complete	PacBio	133×	68.1	1	7890	75	3572	4683	2219	AP017900.1
U-1	Japan	7.77	Draft	Roche 454; MiSeq	179×	68.3	56	7757	69	3645	4497	2291	BBYQ00000000.1
N-2927	Japan	7.76	Draft	Roche 454	160×	68.3	54	7627	66	3225	4841	2245	BAWD00000000.2
NBRC15557	Japan	7.61	Draft	Roche 454; HiSeq 1000	112×	68.3	51	7527	64	3190	4768	2211	NZ_BJWY01000001.1
SY-24	China	7.89	Draft	MiSeq	100×	68.2	52	7632	66	3227	4845	2230	MVAC00000000.1
NK201610020	China	8.31	Complete	HiSeq; PacBio	100×	68.1	1	8133	78	3398	5185	2306	NZ_CP063662.1
HSY-NS01	China	7.91	Draft	HiSeq	126×	68.2	50	7947	70	3727	4605	2133	PXZE00000000.1
HSY-NS02	China	7.76	Draft	HiSeq	110×	68.2	51	7801	69	3301	4932	2225	RCNK00000000.1
ZJ0503	China	7.71	Draft	MiSeq	100×	68.3	50	7579	66	3212	4798	2204	JNCT00000000.1
TL20	China	8.3	Complete	PacBio	200×	68.1	1	7710	66	3212	4798	2204	GCA_018223685.1
CK-14008	Korea	8.37	Draft	PacBio	139×	68.1	1	8212	78	3422	5244	2347	MOYO00000000.1
MH196537	Korea	8.26	Complete	PacBio	118×	68.1	1	8074	78	3368	5155	2296	CP059737.1
EM150506	Korea	8.3	Complete	PacBio	156×	68.1	1	8068	77	3338	5175	2277	CP017839.1
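For reference, the assembly metrics in Table 3 can be recomputed from contig sequences. In the table, L50 behaves as a contig count (1 for each complete genome): the number of largest contigs needed to cover half the assembly, whereas N50 is the length of the contig at which that point is reached. The sketch below computes both, together with total length and GC content (counting only unambiguous A/C/G/T bases); the FASTA reader, the file name in the comment and the toy contigs are illustrative.

```python
def read_fasta(path):
    """Return the sequences in a FASTA file as a list of strings."""
    sequences, current = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current:
                    sequences.append("".join(current))
                current = []
            elif line:
                current.append(line)
    if current:
        sequences.append("".join(current))
    return sequences


def assembly_stats(contigs):
    """Total length, GC % (A/C/G/T only), N50 and L50 for a set of contigs."""
    lengths = sorted((len(c) for c in contigs), reverse=True)
    total = sum(lengths)
    if total == 0:
        raise ValueError("empty assembly")
    seq = "".join(contigs).upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if 2 * running >= total:  # half of the assembly length reached
            return {"total_bp": total, "gc_percent": 100.0 * gc / acgt,
                    "n50": length, "l50": count}


# Toy contigs; a real call might be assembly_stats(read_fasta("draft.fasta")).
contigs = ["GGCC" * 3, "ATGC" * 3, "AT" * 4, "GC" * 4]
print(assembly_stats(contigs))  # {'total_bp': 40, 'gc_percent': 65.0, 'n50': 12, 'l50': 2}
```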

Virulence and antimicrobial resistance profile determination

Antimicrobial resistance- and virulence-related genes in the Vietnamese genomes were identified using RAST and the Virulence Factor Database (VFDB), Victors and PATRIC Virulence Factor (VF) databases available through the Pathosystems Resource Integration Center (PATRIC) [60, 61]. In addition, the genomes were searched for homologues of experimentally verified pathogenicity determinants from other members of the genus Nocardia.
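Homology searches of this kind are commonly run with BLASTP and then filtered on identity and E-value. The sketch below keeps the best-scoring hit per query from BLAST's standard 12-column tabular output (-outfmt 6). The identity and E-value cut-offs are hypothetical placeholders, since the thresholds used for the virulence and resistance screens are not given here, and the example hit lines are invented.

```python
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines, min_identity=40.0, max_evalue=1e-5):
    """Best-scoring hit per query from BLAST tabular output (-outfmt 6).

    Hits below `min_identity` % identity or above `max_evalue` are discarded;
    both thresholds are hypothetical placeholders.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        identity = float(row["pident"])
        evalue = float(row["evalue"])
        bitscore = float(row["bitscore"])
        if identity < min_identity or evalue > max_evalue:
            continue
        query = row["qseqid"]
        if query not in best or bitscore > best[query][2]:
            best[query] = (row["sseqid"], identity, bitscore)
    return best


# Invented example lines: virulence-factor queries against a draft proteome.
hits = [
    "vf_adhesin\tprot_0001\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-80\t500",
    "vf_adhesin\tprot_0420\t60.0\t280\t112\t2\t5\t284\t10\t289\t1e-20\t200",
    "vf_toxin\tprot_0777\t30.0\t150\t105\t4\t1\t150\t3\t152\t1e-3\t40",
]
print(best_hits(hits))  # {'vf_adhesin': ('prot_0001', 85.0, 500.0)}
```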

Results

PFGE genotypes

Twenty isolates from four Vietnamese coastal provinces (Fig. 1) were subjected to XbaI and AseI digestion to determine isolate relatedness across provinces. Restriction fragment sizes ranged from 40 kb to 1.1 Mbp. PFGE with XbaI alone resulted in between 19 and 21 restriction fragments among the Vietnamese strains; similarly, between 16 and 20 fragments were identified using AseI. Seven distinct patterns (labelled as pulsotypes NsX1–NsX7) were present using XbaI-digested DNA fragments, and ten patterns (labelled as pulsotypes NsA1–NsA10) for AseI. Using the ≥80 % similarity cut-off and ‘fewer than six bands of difference’ Tenover criteria, only one cluster was identified for each enzyme [29, 30]. Even when combining data from both enzymes, the 20 Vietnamese isolates were still closely related, irrespective of their geographical origin, as shown by their categorization into a single cluster that was distinct from the Japanese type strain (Fig. 4).

Fig. 4.

PFGE dendrogram of AseI- and XbaI-digested genomic DNA from 20 representative N. seriolae strains collected in four Vietnamese provinces. The N. seriolae type strain, BCRC 13745 (Japan), was included for comparison. Cluster analysis of genetic distances was performed using the Dice coefficient and the UPGMA method (tolerance and optimization 1 %). Two pulsotypes were identified based on an 80 % similarity cut-off. Numbers at tree nodes indicate the percentage of replicate trees in which the same clusters were found after 1000 bootstrap replicates.

Phylogenomic analysis

Based on the PFGE results, seven geographically diverse Vietnamese isolates were sequenced on the Illumina platform, yielding high-coverage draft genomes (Table 3). These genomic data were generated to address two questions: (i) whether comparative genomics, like PFGE, would reveal minimal genetic diversity among the Vietnamese strains, and (ii) whether phylogenomic analysis could identify a potential origin for nocardiosis in Vietnamese aquaculture facilities. The seven Vietnamese genomes generated in this study, plus the sequences of 13 publicly available N. seriolae strains (all from other Asian countries), were compared to identify phylogenetically informative SNPs. A total of 8206 SNPs were identified; 7517 (91.6 %) were located in coding regions and comprised 126 nonsense, 5163 missense and 1531 silent variants. Of the 8206 SNPs, 7275 high-confidence, orthologous, core-genome, biallelic SNPs were identified among the 20 N. seriolae strains and used for phylogenomic reconstruction. The phylogenomic tree revealed five distinct strain clusters (Fig. 3). As with PFGE, the seven Vietnamese isolates were highly clonal, with all strains falling into a single, unique 'Vietnamese' clade. Within this clade were two subclades that differed by 103 SNPs. These subclade-defining SNPs were well distributed across the genome, with no evidence of SNP clusters indicative of recombination. The phylogenomic analysis also suggested that N. seriolae undergoes very little, if any, recombination, as shown by a very high consistency index of 0.997; in other words, homoplastic SNP characters, which are more common following recombination events [62], were essentially absent. Within each Vietnamese subclade, isolates were virtually identical (0–5 SNPs), indicating limited genomic change within these lineages (Fig. 3).
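One simple way to look for the SNP clustering that recombination tends to leave behind is to count SNPs in fixed windows and ask whether any window holds far more SNPs than a uniform (Poisson) spread would predict. The sketch below does this with a Bonferroni-corrected threshold. The 1 kb window, the alpha level and the example positions are hypothetical, and this is an illustrative check rather than necessarily how SNP distribution was assessed in this study.

```python
import math


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X < k)."""
    return 1.0 - sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))


def dense_windows(positions, genome_length, window=1000, alpha=0.001):
    """Windows holding more SNPs than a uniform spread would plausibly give.

    SNPs are binned into fixed, non-overlapping windows (1-based starts). A
    window is flagged when its Poisson tail probability falls below alpha
    divided by the number of windows (Bonferroni correction).
    """
    lam = len(positions) * window / genome_length  # expected SNPs per window
    threshold = alpha / math.ceil(genome_length / window)
    counts = {}
    for pos in positions:
        start = (pos - 1) // window * window + 1
        counts[start] = counts.get(start, 0) + 1
    return sorted((s, c) for s, c in counts.items() if poisson_tail(c, lam) < threshold)


# Hypothetical 100 kb genome: 20 scattered SNPs plus a tight cluster of six.
scattered = [1000 * i + 500 for i in range(20)]
cluster = [50100, 50200, 50300, 50400, 50500, 50600]
print(dense_windows(scattered + cluster, genome_length=100_000))  # [(50001, 6)]
```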
Notably, there was no link between geographical region and subclade placement: strains from Phú Yên, Khánh Hòa and Vũng Tàu fell into both Vietnamese subclades, indicating frequent transmission between regions. The most recent common ancestor (MRCA) of the Vietnamese strains differed by 220 SNPs from the next closest known strain, UTF1, which was isolated from cultured yellowtail that succumbed to nocardiosis in 2008 in Miyazaki Prefecture, Japan [27]. BEAST analysis (Fig. S3) placed the MRCA of the Vietnamese and Japanese strains at ~1998 [95 % highest posterior density (HPD): 1997–1999], and the MRCA of all Vietnamese strains at ~2001 (95 % HPD: 1999–2003).
SYBR-MAMAs clearly distinguished SNP genotypes. For the SNP1 assay, the two Taiwanese strains amplified the non-Viet allele earlier than the Viet allele (ΔCT range: 2.8–5.5), whereas all Vietnamese strains amplified the Viet allele earlier than the non-Viet allele (ΔCT range: 6.0–9.3). For the SNP2 assay, the 10 Vietnamese strains belonging to Clade 1 amplified the Clade 1 allele earlier than the Clade 2 allele (ΔCT range: 9.9–13.4), whereas the 10 Clade 2 strains amplified the Clade 2 allele earlier (ΔCT range: 4.5–8.1) (Table 1). No amplification was observed in the no-template controls.
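The genotype-calling logic described above can be sketched in Python: each SNP is tested with two allele-specific reactions, and the allele whose primer amplifies earlier (lower Ct) is called. The 2-cycle minimum separation, the function name and the example Ct values are hypothetical. The source reports only the observed ΔCT ranges, not a calling threshold.

```python
# Sketch: calling a SYBR-MAMA genotype from the Ct values of the two
# allele-specific reactions. The allele whose primer amplifies earlier
# (lower Ct) is called; the 2-cycle minimum separation is an assumption.
from typing import Optional


def call_mama_genotype(ct_allele1: Optional[float], ct_allele2: Optional[float],
                       allele1: str, allele2: str, min_delta: float = 2.0) -> str:
    """Return allele1, allele2, 'ambiguous' or 'no amplification'.

    A Ct of None means the reaction did not amplify.
    """
    if ct_allele1 is None and ct_allele2 is None:
        return "no amplification"  # e.g. a no-template control
    if ct_allele2 is None:
        return allele1
    if ct_allele1 is None:
        return allele2
    delta_ct = ct_allele2 - ct_allele1  # > 0 means the allele1 primer amplified first
    if abs(delta_ct) < min_delta:
        return "ambiguous"
    return allele1 if delta_ct > 0 else allele2


# Hypothetical Ct values for the SNP1 assay (Viet vs non-Viet allele).
print(call_mama_genotype(18.0, 26.0, "Viet", "non-Viet"))  # Viet (delta Ct = 8.0)
print(call_mama_genotype(24.0, 20.0, "Viet", "non-Viet"))  # non-Viet (delta Ct = -4.0)
print(call_mama_genotype(None, None, "Viet", "non-Viet"))  # no amplification
```

With the assumed 2-cycle threshold, every ΔCT range reported above (smallest 2.8) would yield an unambiguous call.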

Genome assembly and functional annotation

To gain deeper insight into the seven Vietnamese genomes, we compared their assembly metrics and gene functions. The Vietnamese genomes share 6937 core genes and encode 1–6 rRNA genes and 49–63 transfer RNA genes. Total assembly length ranged from 7.55 to 7.96 Mbp. This is smaller than the closed genomes EM150506 (8.30 Mbp), MH196537 (8.26 Mbp) and UTF1 (8.12 Mbp) and the genomes reported for CK-14008 (8.37 Mbp) and NK201610020 (8.31 Mbp), but similar to other draft genomes of this species (range: 7.61–7.91 Mbp). GC content (68.2–68.3 %) was comparable to that of previously sequenced genomes (Table 3). Multiple genome alignment of all strains against EM150506 using BRIG showed a high degree of sequence conservation among genomes (Fig. 5). Four main non-homologous regions (positions 2 700 000–3 100 000, 3 900 000–4 100 000, 7 500 000–7 600 000 and 8 000 000–8 200 000 bp) were present in the reference genome but absent from all other genomes. These regions may be genuinely absent or may simply reflect differences in assembly quality [5]. Most genes at these loci were classified as hypothetical proteins, mobile element proteins and repeat regions; the remaining genes are mainly involved in membrane transport, biosynthesis, metabolism and transcription (Data S2).
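BRIG draws its rings from BLAST hits against the reference. Regions such as the four non-homologous loci above could therefore be listed programmatically by taking the union of every genome's hit intervals on the reference and reporting the gaps. The Python sketch below does this. The input format (hit coordinates already parsed from BLAST output), the 50 kb minimum length and the function names are assumptions, and the code does not describe BRIG's internals.

```python
# Sketch: reference regions not covered by BLAST hits from any query genome.


def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based closed intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def uncovered_regions(ref_length, hits_by_genome, min_length=50_000):
    """Return reference intervals absent from every query genome.

    hits_by_genome: dict mapping genome -> list of (start, end) hit coordinates
    on the reference (hypothetical input). Only gaps of at least `min_length`
    bp (an assumed threshold) are reported.
    """
    all_hits = [iv for hits in hits_by_genome.values() for iv in hits]
    gaps, pos = [], 1
    for start, end in merge_intervals(all_hits):
        if start > pos:  # nothing covers pos..start-1 in any genome
            gaps.append((pos, start - 1))
        pos = max(pos, end + 1)
    if pos <= ref_length:
        gaps.append((pos, ref_length))
    return [(s, e) for s, e in gaps if e - s + 1 >= min_length]


hits = {
    "genome_1": [(1, 400_000), (500_001, 1_000_000)],
    "genome_2": [(1, 390_000), (520_000, 1_000_000)],
}
print(uncovered_regions(1_000_000, hits))  # [(400001, 500000)]
```

Because the gaps are taken from the union of all genomes' hits, a reported region is absent from every query genome. As the text notes, such gaps in draft assemblies may reflect assembly quality rather than true absence.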

Fig. 5.

Whole-genome comparison of strains from Vietnam and other Asian countries against the EM150506 (South Korean) reference genome, using circular BLASTn alignment in BLAST Ring Image Generator (BRIG) [39]. The innermost circle shows genome scale (bp), the black irregular ring represents %GC content, and the irregular purple/green ring represents %GC skew. Outer colour rings (innermost first) represent Vietnamese strains (KH_11, KH_21, NT_50, PY_31, PY_37, VT_45, VT_62) and 13 strains from Japan, China and South Korea. The outermost circle (dark green) represents the EM150506 reference genome.

Table 3.

Genetic and genomic features of the Vietnamese strains compared with the South Korean EM150506 strain according to RAST

Strain	Country	Genome size (Mbp)	Level of completion	Sequencing platform	Sequencing depth	GC%	L50 (no. of contigs)	Total no. of proteins	No. of RNAs	No. of hypothetical proteins	No. of proteins with function prediction	No. of proteins assigned to subsystem	NCBI accession no.
KH_11	Vietnam	7.66	Draft	NextSeq 500	340×	68.3	90	7655	58	3560	4465	2055	WMKE00000000.1
KH_21	Vietnam	7.72	Draft	NextSeq 500	424×	68.2	58	7657	66	3597	4428	2033	WMKF00000000.1
NT_50	Vietnam	7.96	Draft	NextSeq 500	395×	68.2	86	7640	66	3571	4437	2063	WMKG00000000.1
PY_31	Vietnam	7.68	Draft	NextSeq 500	408×	68.3	62	7602	62	3212	4818	2220	WMKC00000000.1
PY_37	Vietnam	7.55	Draft	NextSeq 500	326×	68.3	126	7707	51	3549	4525	2087	WMKD00000000.1
VT_45	Vietnam	7.94	Draft	NextSeq 500	404×	68.2	70	7958	67	3609	4718	2054	WMKB00000000.1
VT_62	Vietnam	7.7	Draft	NextSeq 500	433×	68.3	62	7643	63	3580	4428	2052	WMKH00000000.1
UTF1	Japan	8.12	Complete	PacBio	133×	68.1	1	7890	75	3572	4683	2219	AP017900.1
U-1	Japan	7.77	Draft	Roche 454; MiSeq	179×	68.3	56	7757	69	3645	4497	2291	BBYQ00000000.1
N-2927	Japan	7.76	Draft	Roche 454	160×	68.3	54	7627	66	3225	4841	2245	BAWD00000000.2
NBRC15557	Japan	7.61	Draft	Roche 454; HiSeq 1000	112×	68.3	51	7527	64	3190	4768	2211	NZ_BJWY01000001.1
SY-24	China	7.89	Draft	MiSeq	100×	68.2	52	7632	66	3227	4845	2230	MVAC00000000.1
NK201610020	China	8.31	Complete	HiSeq; PacBio	100×	68.1	1	8133	78	3398	5185	2306	NZ_CP063662.1
HSY-NS01	China	7.91	Draft	HiSeq	126×	68.2	50	7947	70	3727	4605	2133	PXZE00000000.1
HSY-NS02	China	7.76	Draft	HiSeq	110×	68.2	51	7801	69	3301	4932	2225	RCNK00000000.1
ZJ0503	China	7.71	Draft	MiSeq	100×	68.3	50	7579	66	3212	4798	2204	JNCT00000000.1
TL20	China	8.3	Complete	PacBio	200×	68.1	1	7710	66	3212	4798	2204	GCA_018223685.1
CK-14008	Korea	8.37	Draft	PacBio	139×	68.1	1	8212	78	3422	5244	2347	MOYO00000000.1
MH196537	Korea	8.26	Complete	PacBio	118×	68.1	1	8074	78	3368	5155	2296	CP059737.1
EM150506	Korea	8.3	Complete	PacBio	156×	68.1	1	8068	77	3338	5175	2277	CP017839.1

RAST predicted between 7602 and 7958 coding DNA sequences in the Vietnamese genomes, of which 45.8 % (range: 42.2–47.0 %) are of unknown function (‘hypothetical proteins’). Of the 59.1 % (range: 57.8–63.4 %) of coding DNA sequences with RAST function predictions, 45.8 % (range: 43.5–50.9 %) were grouped into 308–330 functional subsystems belonging to 24 protein family categories. These predictions are similar to those for previously reported genomes (Table 4). The number of genes in each category differed little between Vietnamese and non-Vietnamese strains (Table 4). No plasmids were identified in any of the Vietnamese genomes, consistent with most other N. seriolae genomes; the only exception is CK-14008 from South Korea, which potentially harbours two plasmids [28].
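As a quick consistency check, the RAST summary percentages quoted above can be recomputed from the Table 3 counts. The Python sketch below assumes that each percentage is taken relative to the total number of proteins per genome. The dictionary and function names are illustrative.

```python
# Sketch: recomputing the RAST summary percentages for the seven Vietnamese
# genomes from the Table 3 counts.
from statistics import mean

# strain: (total no. of proteins, no. of hypothetical proteins,
#          no. of proteins with function prediction), from Table 3
TABLE3_VIETNAM = {
    "KH_11": (7655, 3560, 4465),
    "KH_21": (7657, 3597, 4428),
    "NT_50": (7640, 3571, 4437),
    "PY_31": (7602, 3212, 4818),
    "PY_37": (7707, 3549, 4525),
    "VT_45": (7958, 3609, 4718),
    "VT_62": (7643, 3580, 4428),
}


def percent_summary(rows, column):
    """Mean, minimum and maximum of column / total proteins, as percentages (1 d.p.)."""
    pct = [100 * counts[column] / counts[0] for counts in rows.values()]
    return round(mean(pct), 1), round(min(pct), 1), round(max(pct), 1)


print(percent_summary(TABLE3_VIETNAM, 1))  # hypothetical proteins: (45.8, 42.3, 47.0)
print(percent_summary(TABLE3_VIETNAM, 2))  # function prediction: (59.1, 57.8, 63.4)
```

Under this assumption, the sketch reproduces the reported means (45.8 % and 59.1 %) and the 57.8–63.4 % range. The lowest hypothetical-protein share (PY_31, 42.25 %) rounds to 42.3 % here.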

Table 4.

Number of genes for each strain associated with the 24 general Clusters of Orthologous Groups functional categories predicted by RAST

Functional category	KH_11	KH_21	NT_50	PY_31	PY_37	VT_45	VT_62	UTF1	U-1	N-2927	NBRC 15557	SY-24	NK 201610020	HSY-NS01	HSY-NS02	ZJ0503	TL20	CK-14008	MH196537	EM150506
Cofactors, Vitamins, Prosthetic Groups, Pigments	198	195	196	207	206	195	194	204	211	208	209	204	210	199	205	202	198	212	209	208
Cell Wall and Capsule	32	31	31	36	31	31	31	36	36	36	36	34	36	31	36	36	31	38	36	36
Virulence, Disease and Defence	50	47	48	56	50	53	47	55	58	59	55	57	58	49	55	55	49	60	59	62
Potassium metabolism	10	10	10	11	10	11	10	11	10	11	10	11	10	10	10	10	10	11	12	10
Miscellaneous	30	30	30	33	33	30	30	33	32	32	32	32	32	29	33	33	29	32	32	31
Phages, Prophages, Transposable elements, Plasmids	7	5	5	13	6	5	7	10	16	12	8	15	16	11	12	11	9	17	16	10
Membrane Transport	31	31	31	35	31	31	31	35	37	37	37	37	37	32	35	35	32	37	37	36
Iron acquisition and metabolism	14	14	14	15	14	14	14	15	14	15	15	15	15	14	15	15	14	15	15	15
RNA metabolism	56	58	58	59	56	60	58	61	58	59	57	58	62	58	59	56	60	63	62	62
Nucleosides and Nucleotides	96	96	96	107	98	95	97	101	100	100	106	99	101	95	106	101	95	103	101	100
Protein Metabolism	219	224	225	228	212	229	221	242	238	234	233	233	246	229	236	230	237	248	246	248
Regulation and Cell signalling	23	23	23	26	23	23	23	26	26	26	26	26	26	23	27	26	24	26	26	26
Secondary metabolism	4	4	4	4	4	4	4	4	4	4	4	4	4	4	4	4	4	4	4	4
DNA metabolism	100	99	100	100	105	101	99	102	101	101	100	102	101	99	101	102	101	105	101	100
Fatty Acids, Lipids and Isoprenoids	226	219	243	274	229	223	239	272	310	275	273	273	311	280	273	270	281	319	308	304
Nitrogen Metabolism	32	32	32	35	32	32	32	35	36	36	28	36	35	33	35	35	33	35	36	36
Dormancy and Sporulation	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1
Respiration	101	100	100	104	107	103	99	103	104	103	77	102	103	99	104	104	98	104	104	104
Stress Response	56	54	55	59	55	56	54	58	58	61	58	61	58	54	60	60	52	59	57	57
Metabolism of Aromatic Compounds	26	26	26	32	27	27	27	32	33	32	32	33	33	26	33	33	27	32	33	34
Amino Acids and Derivatives	365	369	369	391	371	365	367	394	411	406	414	404	415	387	392	392	385	417	412	399
Sulphur Metabolism	14	13	14	13	16	13	14	12	12	14	12	14	13	14	13	14	13	13	13	13
Phosphorus Metabolism	27	27	26	27	27	27	27	27	27	27	27	27	27	27	27	27	27	27	27	27
Carbohydrates	337	325	326	354	343	325	326	350	358	356	361	352	356	329	353	352	329	369	349	354

A typical CRISPR-Cas system contains both a CRISPR array of repeat and spacer units and associated cas genes; however, many systems lack one of these components. These atypical configurations are known as ‘orphan’ or ‘isolated’ CRISPR arrays or cas loci, depending on which component is missing. Between three and six CRISPR arrays were found in the Vietnamese strains, with lengths ranging from 73 to 114 bp. Each array consists of two direct repeats and one spacer, with no nearby Cas (CRISPR-associated) genes (Data S3). Notably, the same CRISPR array structure was found in all 20 N. seriolae genomes.
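The orphan classification described above can be sketched in Python. An array is flagged as orphan when no cas gene lies within a chosen distance on the same contig. The 10 kb window, the Feature record and the function name are assumptions for illustration. The source does not state which neighbourhood its prediction tools used.

```python
# Sketch: flagging orphan CRISPR arrays, i.e. arrays with no cas gene nearby.
# The 10 kb neighbourhood is an assumed window, not a value from this study.
from dataclasses import dataclass


@dataclass
class Feature:
    contig: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def is_orphan(array: Feature, cas_genes: list, window: int = 10_000) -> bool:
    """True if no cas gene lies within `window` bp of the array on the same contig."""
    for gene in cas_genes:
        if gene.contig != array.contig:
            continue
        # Gap between the two intervals; 0 if they overlap.
        gap = max(gene.start - array.end, array.start - gene.end, 0)
        if gap <= window:
            return False
    return True


# Hypothetical 101 bp array (two direct repeats plus one spacer).
array = Feature("contig_1", 1_000, 1_100)
print(is_orphan(array, []))                                   # True
print(is_orphan(array, [Feature("contig_1", 5_000, 6_000)]))  # False
print(is_orphan(array, [Feature("contig_2", 1_000, 2_000)]))  # True
```

In fragmented draft assemblies, a cas locus on a different contig is invisible to a check like this. This is one way an array can be called orphan artefactually.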

Virulence and antimicrobial/toxin resistance profiles

To explore the pathogenic potential of the Vietnamese strains, we assessed their virulence and antimicrobial/toxin resistance gene content in comparison with non-Vietnamese genomes. Searches of the RAST, VFDB, Victors and VF databases identified between 182 and 202 genes encoding virulence and resistance factors. These included gene products associated with Adherence (n=50–54), Cellular metabolism and nutrient uptake (n=10), Damage (n=6–7), Invasion and intracellular survival (n=33–36), Resistance to antibiotics and toxic compounds (n=65–81), and Other (n=16–18) (Data S3). In general, the numbers of virulence and antimicrobial/toxin resistance factors were almost identical among the Vietnamese strains and comparable to those of non-Vietnamese strains. However, some genes were absent from most Vietnamese strains but present in most non-Vietnamese strains, such as ‘MCE-family protein Mce1D’, ‘MCE-family protein Mce1F’, ‘Chromate transport protein ChrA’, ‘NAD(P)H oxidoreductase YRKL (EC 1.6.99.-) Putative NADPH-quinone reductase (modulator of drug activity B) Flavodoxin 2’ and ‘Tellurite resistance protein TerB’. Conversely, ‘Hemolysins and related proteins containing cystathionine-β-synthase domains’ was found only in EM150506. Several experimentally verified virulence factors identified in N. seriolae and other Nocardia species, including catalase, superoxide dismutase, phospholipase C and protease [63], were present in all Vietnamese and non-Vietnamese strains, indicating that these genes are highly conserved within the genus.
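The comparison above (genes absent from most Vietnamese strains but present in most non-Vietnamese strains) amounts to a majority-rule filter on a presence/absence table. The Python sketch below reads "most" as a strict majority, which is an assumption. The strain labels and presence calls are invented, although the annotation names are borrowed from the text.

```python
# Sketch: genes absent from most strains of one group but present in most of
# another. "Most" is read as a strict majority, an assumption for illustration.


def differential_genes(presence, group_a, group_b):
    """Genes present in < 50 % of group_a strains and > 50 % of group_b strains.

    presence: dict mapping strain -> set of gene annotations.
    group_a, group_b: lists of strain names.
    """
    def fraction_present(gene, group):
        return sum(gene in presence[strain] for strain in group) / len(group)

    candidates = set().union(*(presence[s] for s in group_a + group_b))
    return sorted(
        gene for gene in candidates
        if fraction_present(gene, group_a) < 0.5 and fraction_present(gene, group_b) > 0.5
    )


# Hypothetical presence/absence calls using annotation names from the text.
presence = {
    "VN_1": {"catalase"},
    "VN_2": {"catalase", "TerB"},
    "VN_3": {"catalase"},
    "other_1": {"catalase", "TerB", "ChrA"},
    "other_2": {"catalase", "TerB"},
}
print(differential_genes(presence, ["VN_1", "VN_2", "VN_3"], ["other_1", "other_2"]))
# ['TerB']  (ChrA is in only half of the second group, so it is not reported)
```

The threshold choice matters with small groups such as the two Taiwanese or three Japanese genomes. A different reading of "most" would change which genes are reported.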

Discussion

PFGE has conventionally been considered the ‘gold standard’ for studying the genetic diversity of many pathogenic bacterial species, including N. seriolae [19, 20, 30, 64]. PFGE has previously identified multiple pulsotypes among N. seriolae isolates retrieved from fish in Japan and Taiwan [19, 20]. Notably, one study identified identical pulsotypes between certain Taiwanese 1997–2007 outbreak strains and Japanese N. seriolae isolated from yellowtail in 2002 (pulsotypes X1 and A1) and 2005 (pulsotype X11) [20], suggesting at least two transmission events between Taiwan and Japan. Unlike N. seriolae from Japan and Taiwan, all 20 Vietnamese isolates fell into a single cluster, even when a combination of XbaI and AseI was used. However, PFGE lacked the resolution to differentiate the Vietnamese isolates into the two clades identified by phylogenomic analysis. This limited resolution has also been documented for other bacteria [65, 67], including Listeria monocytogenes [66]. Unfortunately, it was not practical to compare the Vietnamese pulsotypes with published studies because of known challenges with interlaboratory standardization of PFGE [68]; therefore, it is not known whether the Vietnamese PFGE cluster has been reported previously. Next-generation sequencing provides excellent resolution, accuracy and data portability and, as such, has begun replacing PFGE as the new gold standard for nocardiosis outbreak analyses [69]. To illustrate the value of WGS for nocardiosis epidemiological investigations, we sequenced seven representative Vietnamese strains and compared them with all publicly available N. seriolae genomes (n=13). Consistent with PFGE, the limited genomic variation (0–5 SNPs; Fig. 3) observed among Vietnamese strains supports a recent, single introduction into Vietnam, with subsequent dissemination across multiple mariculture facilities within the South Central Coast region.
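Counts such as the 0–5 SNPs separating Vietnamese isolates are usually pairwise differences across a core SNP alignment. The Python sketch below computes them. It assumes that missing calls are coded as 'N' or '-' and skipped, which is a common convention but not one stated in the source. The strain names and sequences are hypothetical.

```python
# Sketch: pairwise SNP distances from a core-genome SNP alignment.
from itertools import combinations


def snp_distances(alignment):
    """Count differing sites for every pair of equal-length allele strings.

    alignment: dict mapping strain -> string of core SNP alleles (one per site).
    Sites with missing data ('N' or '-') in either strain are skipped.
    """
    missing = set("N-")
    dist = {}
    for a, b in combinations(sorted(alignment), 2):
        dist[(a, b)] = sum(
            x != y
            for x, y in zip(alignment[a], alignment[b])
            if x not in missing and y not in missing
        )
    return dist


alignment = {"S1": "ACGTA", "S2": "ACGTG", "S3": "GTNTG"}
print(snp_distances(alignment))  # {('S1', 'S2'): 1, ('S1', 'S3'): 3, ('S2', 'S3'): 2}
```

Tip-to-tip counts like these differ from branch-length statements such as "220 SNPs from the Vietnamese ancestral node", which are measured from an inferred ancestor on the tree.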
Phylogenomic analysis showed that the Vietnamese strains were most closely related to UTF1, which was isolated from farmed yellowtail in Japan in 2008 [27]; this strain differed from the Vietnamese common ancestor by just 220 SNPs (MRCA: ~1998). Shimahara and colleagues [20] previously postulated that transboundary translocation of live fish stocks asymptomatically infected with N. seriolae from China and Hong Kong may have introduced new strains into Japan. Wild-caught amberjack juveniles, one of the host species most susceptible to N. seriolae infection, were also reportedly imported into Japan from Vietnam in 2000 [70]. However, nocardiosis has not yet been reported in Vietnam in any aquatic species other than Trachinotus species, and the first of these cases was only recorded in 2012 [9]. It is therefore unlikely that Japanese N. seriolae was introduced via amberjack imported from Vietnam. Based on our genomic analysis, it is plausible that N. seriolae from Japan was introduced into other countries such as Vietnam, given that international export of valuable aquaculture fish species is relatively common. However, information about the import and export of live fish stocks from Japan or Vietnam is scarce, so this hypothesis cannot be confirmed. Our BEAST results (Fig. S3) further support the hypothesis of a recent introduction of N. seriolae into Vietnam from infected Trachinotus species. Introduction into Vietnam was estimated to have occurred in ~2001 (95 % HPD: 1999–2003), which fits with the Taiwanese/Japanese outbreaks of the late 1990s and early 2000s. Unfortunately, we lack isolate data from Taiwan that could indicate the directionality of transfer or provide more accurate source attribution; nevertheless, this dated phylogeny provides useful insights into the evolutionary history of N. seriolae in Vietnam.
Whilst our results suggest a probable Asian origin for the Vietnamese outbreaks, few N. seriolae genomes are publicly available (only 20 as of 11 February 2022, including seven from our study). None are from other Asian regions such as Taiwan [20], Singapore, Malaysia or Indonesia [71], or from non-Asian regions such as Mexico [23] and the USA [21], where outbreaks have been documented. The precise origin of the Vietnamese outbreaks and their mode of introduction therefore remain unresolved. Concerningly, our results, and those of others, demonstrate that unchecked N. seriolae transmission may represent a substantial risk to fish aquaculture. It is thus imperative to establish domestic and international monitoring of N. seriolae in both farmed and wild species, including molecular methods to characterize new outbreaks. Such monitoring would help prevent the spread of this devastating pathogen into new environments, along with the associated heavy economic losses and food security concerns. To facilitate rapid identification of N. seriolae genotypes among our Vietnamese strains, we designed inexpensive SYBR-MAMAs targeting two phylogenetically informative SNPs. The first SNP assay robustly differentiates Vietnamese from non-Vietnamese strains, thereby permitting prospective identification of strains newly introduced into Vietnam, an essential facet of future biocontrol efforts for fish importation. This assay can also be used to monitor the emergence of Vietnamese strains in new locations, such as new aquaculture facilities in Vietnam, or before fingerlings are exported to other countries. The second SNP assay rapidly differentiates strains belonging to the two Vietnamese clades. Applying this second assay to the 20 Vietnamese strains showed that both clades were well disseminated across all four provinces: Khánh Hòa, Ninh Thuận, Phú Yên and Vũng Tàu.
Phylogenomic analysis of the seven representative Vietnamese strains also showed dispersal of these two clades among three of the four provinces. Although unconfirmed, it is probable that widespread trade in eggs, fingerlings and live permit fish for aquaculture in Vietnam since the industry's inception in the early 2000s, including local unmonitored trade among fish farmers, has driven the dissemination of N. seriolae among Vietnamese permit farms. Taken together, our findings highlight the considerable risk of undetected N. seriolae dispersal among mariculture facilities and the need to establish strict monitoring practices to prevent further pathogen transmission. WGS is currently laborious, expensive and inaccessible to most laboratories in Vietnam and many other Asian countries. We therefore used comparative genomics to establish a catalogue of SNPs specific to each clade and subclade. This SNP database may be useful both for targeted resequencing efforts and for designing phylogenetically robust genotyping methods that permit source tracing of future outbreaks without further WGS or bioinformatic analyses. The SYBR-MAMAs developed in this study successfully genotyped two phylogenetically informative SNPs, with results fully concordant with WGS. This confirms that SYBR-MAMA is a valuable and inexpensive diagnostic method for SNP characterization. Very little is known about the pathogenesis of Nocardia species, which are capable of invading host macrophages and preventing phagosome–lysosome fusion, leading to long-term survival and proliferation in host cells [72]. Given the paucity of genomic data for this pathogen, a final aim of this study was to better understand the virulence and antimicrobial resistance factors encoded by the N. seriolae genome.
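A catalogue of clade-specific SNPs of the kind described above can be sketched in Python. A site qualifies when every member of the clade shares one allele and every non-member shares a different single allele. Such a site is a candidate target for an allele-specific assay like SYBR-MAMA. This is an illustrative definition that ignores missing data, not the study's exact pipeline. The strain labels and alignment are hypothetical.

```python
# Sketch: clade-specific (canonical) SNPs from a core SNP alignment.


def canonical_snps(alignment, clade):
    """Return (site index, clade allele, other allele) for sites that split `clade` from the rest.

    alignment: dict mapping strain -> string of SNP alleles (one per site).
    clade: set of strain names forming the clade of interest.
    """
    members = [s for s in alignment if s in clade]
    others = [s for s in alignment if s not in clade]
    n_sites = len(next(iter(alignment.values())))
    hits = []
    for i in range(n_sites):
        inside = {alignment[s][i] for s in members}
        outside = {alignment[s][i] for s in others}
        # Fixed in the clade, fixed outside it, and the two alleles differ.
        if len(inside) == 1 and len(outside) == 1 and inside != outside:
            hits.append((i, inside.pop(), outside.pop()))
    return hits


alignment = {"VN_1": "AGTC", "VN_2": "AGTA", "TW_1": "GGCC", "TW_2": "GGCA"}
print(canonical_snps(alignment, {"VN_1", "VN_2"}))  # [(0, 'A', 'G'), (2, 'T', 'C')]
```

Site 3 is excluded because it varies within the clade. In practice, the more strains outside the clade that are included, the more confidence one can have that a selected SNP is truly specific.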
Our analysis of 20 N. seriolae genomes is the largest genomic assessment of this pathogen to date. It largely corroborates the conclusions of a previous analysis of seven genomes, which showed that N. seriolae strains share Orthologous Average Nucleotide Identity values of >99.9 % [28]. Analysis of the genome content of the seven Vietnamese strains revealed that, like non-Vietnamese strains, they encode a high proportion of ‘hypothetical protein’ genes (45.8 %), highlighting the need for further studies of the functions of these genes. More than 180 core genes (present in all strains) were found to encode antimicrobial resistance and virulence factors in the Vietnamese strains. These included genes associated with Adherence (n=49), Cellular metabolism and nutrient uptake (n=10), Damage (n=6), Invasion and intracellular survival (n=33), Resistance to antibiotics and toxic compounds (n=26), and Others (n=11), which may account for the main virulence traits of this fish pathogen. All genomes carry conserved genes encoding β-lactamase class C-like and penicillin-binding proteins (n=11), multidrug resistance protein ErmB (n=1), probable multidrug resistance protein NorM (n=1) and a small multidrug resistance family protein (n=1). These genes may explain the observed resistance to penicillin and cephalexin, two β-lactam antibiotics commonly used to treat nocardiosis in Vietnamese permit fish farms (data not shown). CRISPRs, which are encoded by many bacterial and archaeal species, defend against invasive mobile genetic elements such as viral or plasmid DNA [73]. They also play roles in bacterial pathogenesis, biofilm formation, adherence, programmed cell death and quorum sensing [74]. The acquisition and maintenance of CRISPR-Cas systems are strongly influenced by environmental conditions and microbial communities [75].
Recent research has shown that 40 % of CRISPR loci lie away from, or are not associated with, any cas genes; these are known as orphan CRISPR arrays [76]. As in many other bacterial species [77–81], orphan CRISPR arrays were found in the N. seriolae genomes. These incomplete CRISPR-Cas systems may be remnants of decaying loci that are recruited and/or selectively maintained to perform important, but as yet unknown, biological functions [73]. Alternatively, our results may be an artefact of current CRISPR-Cas prediction tools, which predict CRISPRs primarily on the basis of the typical CRISPR structure [77]. As the role of these CRISPR loci in N. seriolae is not yet known, further work is needed to uncover it.
In conclusion, our study provides novel insights into the epidemiology of N. seriolae outbreaks in farmed permit fish in Vietnam. Our detailed molecular and genomic analyses revealed minimal genomic diversity among Vietnamese isolates. Unlike PFGE, WGS detected strain variation at single-base resolution and identified two distinct Vietnamese clades that share recent ancestry. Our results indicate recent importation of a single clone into Vietnam, which then led to a nationwide outbreak of nocardiosis in permit fish farms. The analysis of additional genomes, particularly from other geographical regions, will be important for better understanding N. seriolae evolution and will enable more precise investigations into the origin and transmission of this devastating pathogen. Finally, our SNP assays provide a rapid and inexpensive method for genotyping ongoing and future nocardiosis outbreaks in Vietnam.

References (58 in total)

1. Mauve: multiple alignment of conserved genomic sequence with rearrangements.

Authors: Aaron C E Darling; Bob Mau; Frederick R Blattner; Nicole T Perna
Journal: Genome Res Date: 2004-07

2. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3.

Authors: Pablo Cingolani; Adrian Platts; Le Lily Wang; Melissa Coon; Tung Nguyen; Luan Wang; Susan J Land; Xiangyi Lu; Douglas M Ruden
Journal: Fly (Austin) Date: 2012 Apr-Jun

3. Velvet: algorithms for de novo short read assembly using de Bruijn graphs.

Authors: Daniel R Zerbino; Ewan Birney
Journal: Genome Res Date: 2008-03-18

4. Cost-effective interrogation of single nucleotide polymorphisms using the mismatch amplification mutation assay and capillary electrophoresis.

Authors: Erin P Price; Molly A Matthews; Jodi A Beaudry; Jonathan L Allred; James M Schupp; Dawn N Birdsell; Talima Pearson; Paul Keim
Journal: Electrophoresis Date: 2010-12

5. Not all predicted CRISPR-Cas systems are equal: isolated cas genes and classes of CRISPR like elements.

Authors: Quan Zhang; Yuzhen Ye
Journal: BMC Bioinformatics Date: 2017-02-06

6. Genomic characterization of Nocardia seriolae strains isolated from diseased fish.

Authors: Hyun-Ja Han; Min-Jung Kwak; Sung-Min Ha; Seung-Jo Yang; Jin Do Kim; Kyoung-Hee Cho; Tae-Wook Kim; Mi Young Cho; Byung-Yong Kim; Sung-Hee Jung; Jongsik Chun
Journal: Microbiologyopen Date: 2018-08-16

7. Typing methods based on whole genome sequencing data.

Authors: Laura Uelze; Josephine Grützke; Maria Borowiak; Jens Andre Hammerl; Katharina Juraschek; Carlus Deneke; Simon H Tausch; Burkhard Malorny
Journal: One Health Outlook Date: 2020-02-18

8. High viral abundance and low diversity are associated with increased CRISPR-Cas prevalence across microbial ecosystems.

Authors: Sean Meaden; Ambarish Biswas; Ksenia Arkhipova; Sergio E Morales; Bas E Dutilh; Edze R Westra; Peter C Fineran
Journal: Curr Biol Date: 2021-11-09

9. Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors: Heng Li; Richard Durbin
Journal: Bioinformatics Date: 2009-05-18

10. Complete genome sequence analysis of Nocardia brasiliensis HUJEG-1 reveals a saprobic lifestyle and the genes needed for human pathogenesis.

Authors: Lucio Vera-Cabrera; Rocio Ortiz-Lopez; Ramiro Elizondo-Gonzalez; Jorge Ocampo-Candiani
Journal: PLoS One Date: 2013-06-03