
Comparative analysis of draft genome assemblies developed from whole genome sequences of two Hyaloperonospora brassicae isolate samples differing in field virulence on Brassica napus.

Ming Pei You¹, Javed Akhatar², Meenakshi Mittal², Martin J Barbetti¹, Solomon Maina^1,3, Surinder S Banga².

Abstract

Hyaloperonospora brassicae causes downy mildew, a major disease of Brassicaceae species. We sequenced the genomes of two H. brassicae isolate samples of high (Sample B) and low (Sample C) virulence. Sequencing reads were first assembled de novo with the software SOAPdenovo2, ABySS V2.1 and Velvet V1.1, and the resulting assemblies were later combined to create meta-assemblies with genome sizes of 72.162 and 76.950 Mb and predicted gene densities of 1,628 and 1,644 /Mb, respectively. We could annotate 12,255 and 13,030 genes, with high proportions (91-92%) of complete BUSCOs, for Samples B and C, respectively. Comparative analysis revealed conserved and varied molecular machinery underlying the physiological specialisation and infection capabilities. BLAST analysis against the PHI gene database suggested a relatively higher loss of genes for virulence and pathogenicity in Sample C compared to Sample B, reflecting pathogen evolution through differential rates of mutation and selection. These studies will enable identification and monitoring of H. brassicae virulence factors prevailing in the field.

Keywords: Brassicaceae; Crucifers; Downy mildew; Genome sequence; Hyaloperonospora brassicae; Indian mustard; Rapeseed

Year: 2021 PMID: 34258242 PMCID: PMC8254085 DOI: 10.1016/j.btre.2021.e00653

Source DB: PubMed Journal: Biotechnol Rep (Amst) ISSN: 2215-017X

Introduction

Downy mildew (Hyaloperonospora brassicae) is a major foliar disease of oilseed and other Brassicaceae species [1,2]. This disease is endemic across Brassica-growing regions worldwide [3]. Severe outbreaks of downy mildew occur in India on mustard (B. juncea) [4] and across southern Australia in oilseed rape (B. napus), with the worst-affected crops showing up to 55% of leaves diseased, 15% of leaf area lost to lesions and 13% of leaf area collapsed [5,6]. Estimated annual losses of up to AUS$13 million were reported in Western Australia alone [5,6]. Across southern Australia, downy mildew has become increasingly severe on seedlings and young plants over the past 15 years, likely fostered by increasing temperatures resulting from climate change [5,7]. The definitive taxonomy of downy mildews (Peronosporaceae) remains challenging. Initially, taxonomic classification for Peronospora was based on conidial measurements and specialisation on host species [8,9]. While Constantinescu and Fatehi [10] used morphological and molecular studies to divide Peronospora into Peronospora s. str., Hyaloperonospora and Perofascia, multi-molecular studies by Choi et al. and Göker et al. [11], [12], [13] and Voglmayr [14] highlighted genetic differences within H. parasitica and suggested this genus may be further divided. Subsequently, Göker et al. [15] reclassified the downy mildew pathogen of brassicas as H. brassicae (syn. Hyaloperonospora parasitica). H. brassicae shows a high degree of physiological specialisation, including distinct physiologic races or pathotypes, as reported, for example, by Natti et al. [16] and Coelho et al. [2] for isolates from B. oleracea; Nashaat and Rawlinson [17] for isolates on B. napus ssp. oleifera; and Nashaat et al. [4,18] for isolates on B. napus. More recently, Mohammed et al. [19] identified appropriate host differentials, characterised eight distinct pathotypes of H. brassicae and defined the phylogenetic relationships for H. brassicae in Australia.
Pathotypes and phylogenetic variation not only determine the severity of downy mildew epidemics in Brassica spp. [19], but are also crucial to the identification of appropriate host resistances and their effective deployment [19]. Importantly, however, Mohammed et al. [6,20] highlighted high levels of pathotype-independent resistance to H. brassicae across diverse Brassicaceae, particularly within R. sativus, B. carinata, B. juncea, B. oleracea and Crambe abyssinica. Increasing levels of virulence (i.e., disease-producing power) of H. brassicae isolates may challenge such resistance sources [19]. Mohammed et al. [19] were the first to highlight the variation in levels of virulence in H. brassicae isolates in Australia. They also showed that while isolates were most virulent on their Brassica species of origin, they were still virulent on other related Brassica species, as shown earlier by Chang et al. [21], McMeekin [22], Dickinson and Greenhalgh [23], Sherriff and Lucas [24], Silue et al. [1] and Vicente et al. [25]. Mohammed et al. [19] also showed that this increased virulence was associated with a corresponding increase in severity of downy mildew disease of rapeseed across southern Australia. In addition, there is historical molecular evidence confirming the existence of genetic differences within H. brassicae (e.g., Choi et al. [11]; Göker et al. [13]; Voglmayr [14]). H. brassicae isolates can be either homothallic or heterothallic [26], and outcrossing of homothallic isolates involving recombination of virulence attributes is probably important for increasing variation for virulence [27]. While there have been investigations of other downy mildews involving linkage mapping, genome-wide association mapping, genome sequencing or enrichment sequencing [28], this is not the case in relation to virulence of H. brassicae. This paper reports de novo whole-genome assemblies and analysis of two contrasting Australian H. brassicae samples, of high versus low virulence. We also show differences in biological processes (biosynthetic and catabolic process, cell communication, cellular component organization and localization) and cellular components (catalytic complex, coated membrane, endomembrane system, endoplasmic reticulum membrane, and envelope) between less and more virulent isolates. To the best of our knowledge, this is the first report of a genome assembly for this organism.

Materials and methods

Pathogen isolates

Two separate collections of Hyaloperonospora brassicae isolates, named Samples B and C, were used for the present studies. Sample B comprised 5 isolates of H. brassicae from Victoria, Australia, from commercial rapeseed fields that showed high levels of leaf infection (ranging from 10 to 45%, with a mean disease incidence of 29.5%), i.e., situations showing high virulence of isolates. Sample C comprised 6 isolates of H. brassicae, of which 5 were from New South Wales, Australia, and one from Western Australia. These isolates were collected from commercial rapeseed fields with lower levels of leaf infection (0 to 20%, with a mean disease incidence of 8.2%) compared to Sample B, i.e., situations showing a much lower virulence of isolates (Table 1). These isolates were obtained as part of an Australia-wide survey of foliar diseases of rapeseed in 2018, in which 80 isolates of H. brassicae were obtained. Samples B and C were selected to best represent a range of high and low virulence. As virulence level was the sole criterion for selection of the two H. brassicae samples, the isolates used were not pathotyped.

Table 1

Details of Hyaloperonospora brassicae isolates in Samples B [with high virulence, (B)] and C [with low virulence, (C)], including % leaf infection range, as used in these studies.

Field survey site number	Australian state	Isolate Sample (B or C) and % leaf infection range	GPS latitude (S)	GPS longitude (E)
167	Victoria	B (35-40%)	-36.59325	144.7731333
167	Victoria	B (35-40%)	-36.59325	144.7731333
168	Victoria	B (10-45%)	-36.59056667	144.70565
172	Victoria	B (20-40%)	-36.10483333	145.20465
177	Victoria	B (10-20%)	-35.92498333	145.3684
186	New South Wales	C (5-10%)	-35.16766667	146.9054333
193	New South Wales	C (10-20%)	-35.5147	146.7008667
201	New South Wales	C (0-20%)	-36.06776667	146.1856333
218	New South Wales	C (0-15%)	-36.1562	145.9855833
230	New South Wales	C (4-5%)	-34.96745	147.5596833
239	Western Australia	C (0-10%)	-31.166	116.461


DNA extraction and PCR conditions

DNA was extracted from 80 isolates of H. brassicae (collected in 2018 from 39 locations across southern Australia), using the procedure of Cenis [29]. This same procedure was used previously by Mohammed et al. [19] for H. brassicae. Briefly, mycelia were harvested directly from infested leaves using tweezers under a dissecting microscope. Mycelia were then macerated using a Precellys Evolution Homogeniser (Bertin Technologies) in 300 μL of extraction buffer [200 mM Tris-HCl (pH 8.5), 250 mM NaCl, 25 mM EDTA, 0.5% (wt/vol) sodium dodecyl sulphate (SDS)]. Then, 150 μL of 3 M sodium acetate (pH 5.2) was added, mixed well by gentle pipetting and maintained at 20°C for 10 min, followed by centrifugation at 13,000 rpm for 10 min. Subsequently, the supernatant was transferred to a fresh tube before adding an equal volume of cold isopropanol (450 μL); the mixture was held at room temperature for at least 15 min and then centrifuged at 13,000 rpm for 10 min, and the supernatant was discarded. The precipitated DNA pellet was washed with 70% ethanol (vol:vol, ethanol:water), air dried, then resuspended in 30–50 μL of TE buffer. The quantity and quality of the extracted DNA were determined using a NanoDrop 1000 Spectrophotometer (Thermo Scientific). DNA was then stored at 4°C. DNA was subjected to PCR in a master mix with a total volume of 50 μL that contained 0.2 μM of each primer: ITS1-O (5’-CGG AAG GAT CAT TAC CAC-3’) [30] and ITS4-H (5’-TCC TCC GCT TAT TAA TAT GC-3’) [12], a modification of ITS4. ITS1-O was chosen due to its specificity, which avoids the need for additional amplification of host ITS rDNA [15, 30]. PCR assays were undertaken as follows: initial denaturation at 94°C for 2 min, followed by 35 cycles of 94°C for 1 min, annealing for 1 min (an annealing temperature of 53°C was selected in this study from a gradient of 50°C to 60°C) and extension at 72°C for 2 min; followed by a final extension step at 72°C for 7 min and then a hold at 4°C.
PCR products were subjected to agarose gel electrophoresis at 60 mV for 120 min on a 1% (w/v) agarose gel containing 0.1% GelRed™ (Biotium Inc., United States) and then visualized under UV light. The amplified PCR products of the 80 isolates were submitted to Macrogen Inc. (Korea) for library preparation and Sanger sequencing, and the sequences were analysed using Geneious. The final sequences were subjected to NCBI BLAST to confirm the species and percentage nucleotide relationships. DNA from isolates with highly similar sequences was selected as representative of geographical regions: Victoria (5 isolates), hereafter named Sample B, and New South Wales (5 isolates) plus Western Australia (one isolate), hereafter named Sample C. The DNA of the isolates within each sample was pooled into a single tube per sample for library preparation. Shotgun genome sequencing and library preparation were performed by Macrogen Inc. (Korea).

De novo genome assemblies

Our bioinformatics approach involved constructing multiple separate de novo genome assemblies, three for each of the two samples (Samples B and C), by using different algorithms. The two sets of de novo assemblies were then merged separately, using the Metassembler reconciliation algorithm, to produce one best assembly for each sample. We used 100 bp paired-end reads with about 100x genome coverage for developing the genome assemblies. The raw reads of each sample were first cleaned with the software Trimmomatic and then quality checked with the FASTQC program. The software KmerGenie V1.7 [31] was next used to select the optimal k-mer length for constructing the genome assembly. Subsequently, we used three software packages, SOAPdenovo2 [32, 33], ABySS V2.1 [34] and Velvet V1.1 [35], to develop three de novo assemblies for each of the two samples. Software settings used for these algorithms were SOAPdenovo2 (all -s -K 87 -R -F YES), ABySS (k=87) and Velvet (k-mer 87 -ins_length 300 -exp_cov 50). The software GapFiller V1.1 (-m 20 -o 2 -r 0.7 -n 10 -d 50 -t 10 -T 1 -I 1) was used to improve the assemblies, and scaffolding was performed using SSPACE V3.0 (-x 0 -m 20 -o 10 -k 3 -a 0.70) [36]. We next used the software tool Metassembler [37] with default settings to merge the three de novo assemblies of each sample into one meta-assembly per sample, giving two meta-assemblies. The development of meta-assemblies was facilitated through pairwise progressive merger, using cross-species mate-pair libraries for each Sample (B or C) of H. brassicae isolates. Mate-pair libraries were generated in silico by using the software Cross-mates as implemented in Cross-species Scaffolding [38]. The genome assembly of Hyaloperonospora arabidopsidis (protists.ensembl.org/Hyaloperonospora_arabidopsidis/Info/Annotation/#assembly;GCA_000173235.2), hereafter Sample A, was used as a reference [39]. The completeness of the meta-assemblies, one each for Samples B and C, was assessed with the software BUSCO [40] using the Alveolata dataset of BUSCOs.

Gene prediction

We predicted protein-coding genes of H. brassicae isolates using GeneMarkS v2 with default settings and the revised gene models ("Hyaar1_GeneModels_FilteredModels3_nt.fasta") available at the JGI website (https://phycocosm.jgi.doe.gov/Hyaar1/Hyaar1.home.html). Pseudogenes were detected from genome sequences (meta-assembly scaffolds), at both exon and gene levels. Predicted genes compared favourably with those from gene finders that employed supervised annotation in the reference set of H. arabidopsidis. We also conducted a whole-genome BLAST analysis against the pathogen-host interaction (PHI) gene database [41] to identify potential virulence-associated genes. For this, the set of all PHI-base accessions with associated phenotypes and protein sequences was downloaded (http://www.phi-base.org), and gene sets from Samples B and C were used to run a BLASTX search against the PHI-base sequence set with an e-value threshold of < 1e-3. The distributions of BLAST2GO annotations of putative genes from H. arabidopsidis and H. brassicae were also computed [42].
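The e-value cut-off described above can be illustrated with a minimal Python sketch. It assumes the BLASTX results were saved in the standard 12-column tabular format (BLAST+ `-outfmt 6`), which the paper does not state; the function name `filter_hits` and the gene and PHI identifiers in the example are hypothetical and are not taken from the study.

```python
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    query: str      # predicted gene used as the BLASTX query
    subject: str    # matching PHI-base protein
    evalue: float   # expectation value reported by BLAST


def filter_hits(lines: Iterable[str], max_evalue: float = 1e-3) -> list[Hit]:
    """Keep, for each query, the lowest e-value hit below max_evalue.

    Assumes tab-separated rows with the 12 default outfmt 6 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore.
    """
    best: dict[str, Hit] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            raise ValueError(f"expected 12 columns, got {len(fields)}")
        hit = Hit(fields[0], fields[1], float(fields[10]))
        if hit.evalue >= max_evalue:
            continue  # fails the e-value threshold
        if hit.query not in best or hit.evalue < best[hit.query].evalue:
            best[hit.query] = hit
    return list(best.values())


# Hypothetical rows for illustration only.
example_rows = [
    "geneB_0001\tPHI:1\t45.2\t210\t100\t3\t1\t630\t5\t214\t2e-40\t160",
    "geneB_0001\tPHI:2\t30.1\t120\t80\t2\t10\t370\t1\t120\t5e-04\t45.1",
    "geneB_0002\tPHI:3\t25.0\t60\t45\t1\t1\t180\t1\t60\t0.5\t28.0",
]
kept = filter_hits(example_rows)
for hit in kept:
    print(hit.query, hit.subject, hit.evalue)
```

Keeping only the best hit per query is a design choice made here for brevity; the paper does not say whether one or several hits per gene were retained.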

Functional genome analysis

The assembled genome sequences were functionally annotated using the OmicsBox pipeline. This involved querying the predicted protein-coding sequences against the Oomycete Non-redundant protein database from NCBI using BLASTX with default settings (e-value threshold of 1e-3). This analysis assigned a protein ID/accession number to each positive hit. The resulting sequences were run through a mapping process that used the assigned accessions to retrieve gene names, after which we sought their specific functions in the GO database. The final step in this pipeline was the annotation analysis, which assigned the most specific GO terms to each sequence with respect to cellular location, role in biological processes, and molecular function. The sequences with assigned GO terms were subsequently used for further analysis. Our focus was on the genomic variation between the two groups of isolates in terms of their virulence (i.e., Samples B and C). Therefore, sequences showing similarity to various virulence-related genes were further analysed.

Results

Illumina sequencing

Sequencing of Sample B produced 78,607,680 reads with 7.9 Gbp of total read bases; sequencing of Sample C produced 91,282,136 reads with 9.2 Gbp of total read bases. GC content and Q30 score were 47.98% and 92.4% for Sample B, and 47.75% and 92.7% for Sample C, respectively. The raw reads were first curated by discarding low-quality reads and removing adaptor sequences using the software Trimmomatic. Total clean reads for Samples B and C were 76,082,958 and 88,547,096, respectively. These were used for further analysis (Table 2).

Table 2

Genomic read statistics for Hyaloperonospora brassicae Sample B [with high virulence, (B)] and H. brassicae Sample C [with low virulence, (C)].

Sample ID	Total read bases (bp)	Total raw reads	GC (%)	AT (%)	Q20 (%)	Q30 (%)	Clean reads
Sample B	7,939,375,680	78,607,680	47.98	52.02	95.44	92.4	76,082,958
Sample C	9,219,495,736	91,282,136	47.75	52.25	95.63	92.7	88,547,096
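As a rough cross-check of the "about 100x" coverage stated in the methods, sequencing depth can be estimated as total read bases divided by genome size. The sketch below uses the raw read bases from Table 2 and, as an assumption, the meta-assembly sizes reported in Table 3 as a stand-in for the true genome size; the function name `depth` is hypothetical.

```python
def depth(total_read_bases: int, genome_size_bp: int) -> float:
    """Average sequencing depth: total bases sequenced per genome base."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_read_bases / genome_size_bp


# Raw read bases (Table 2) and meta-assembly sizes (Table 3), in bp.
read_bases = {"Sample B": 7_939_375_680, "Sample C": 9_219_495_736}
assembly_size = {"Sample B": 72_162_632, "Sample C": 76_950_397}

depths = {s: depth(read_bases[s], assembly_size[s]) for s in read_bases}
for sample, value in depths.items():
    print(f"{sample}: about {value:.0f}x")
```

Because the calculation uses raw rather than trimmed bases and an assembly size rather than a measured genome size, the result is only an approximation of the effective coverage.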


De novo and meta-assemblies of H. brassicae

We used three state-of-the-art de Bruijn graph assemblers, SOAPdenovo2, ABySS, and Velvet, to develop three sets of genome assemblies that produced varied numbers and sizes of scaffolds. Scaffolds were constructed from contigs produced by Velvet for both samples, using the software SSPACE and GapFiller. These tools filled 1,505 (14.11%) of 10,665 gaps in Sample B and 1,310 (11.60%) of 11,395 gaps in Sample C. The software KmerGenie V1.7 was used to select 87 as the best k-mer length for de novo genome assembly (Supplementary Fig. 1). SOAPdenovo2 produced 447,236 and 425,174 scaffolds for Samples B and C, respectively, at k-mer 87. The longest scaffolds were 47,638 bp for Sample B and 58,419 bp for Sample C. ABySS produced 198,569 and 194,799 scaffolds for Samples B and C, respectively. The N50 values of these assemblies were 8,505 bp for Sample B and 8,796 bp for Sample C. ABySS constructed the longest scaffolds of the three assemblers: 136,672 bp for Sample B and 197,135 bp for Sample C. Velvet assembled 296,308 scaffolds for Sample B and 218,453 for Sample C. The longest scaffold in the Velvet assembly of Sample B was 58,632 bp and that of Sample C was 59,541 bp (Table 3). We meta-assembled the three de novo assemblies separately for each sample, as described earlier. This step dramatically improved the assemblies through detection and integration of poorly sequenced regions. Sample B had 6,438 scaffolds with an N50 of 23,533 bp, the longest scaffold being 154,210 bp (Table 3). Sample C had 6,470 scaffolds with an N50 of 24,471 bp, the longest scaffold being 197,135 bp. The total length of Sample B was 72,162,632 bp compared to 76,950,397 bp for Sample C. The average scaffold length was 11,209 bp for Sample B and 11,893 bp for Sample C. The criteria used to select the best assemblies were total size relative to the expected genome size, lower number of scaffolds, larger N50 value and fewer non-ATGC characters.
The N50 value of both samples was approximately 24 kb, indicating a reasonably good quality assembly for downstream analysis. Genome assemblies are available on GenBank as BioProject PRJNA540582 with accession numbers SAMN11547018 and SAMN11547017. Assembly parameters were also assessed with Quast [43] to determine contig sizes, N50, and assembly size (Supplementary Table 1). The reliability of the assemblies and genome annotation was also evident from the high proportion of Benchmarking Universal Single-Copy Orthologs (BUSCOs) found in the predicted gene models, including 91-92% as complete BUSCO genes (Fig. 1).
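Since N50 is one of the selection criteria used above, a short Python sketch of its usual definition may help: N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length. The function name `n50` and the toy lengths are illustrative and are not data from this study; assembly tools such as Quast report this statistic directly.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of scaffold (or contig) lengths in bp."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold downwards until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for positive lengths")


# Toy example: total length 54, so the half-way point is 27.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

In the toy example the three longest scaffolds (10, 9 and 8) sum to 27, which reaches half of the total, so the N50 is 8.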

Fig. 1

Assessment of genome assembly and annotation completeness with software BUSCO for Hyaloperonospora brassicae Samples B [with high virulence] and H. brassicae Sample C [with low virulence].

Table 3

Final whole genome assembly features of Hyaloperonospora brassicae Samples B [with high virulence, (B)] and H. brassicae Sample C [with low virulence, (C)], by Metassembler.

Assembly parameters	Sample B	Sample C
Total genome size (bp)	72,162,632	76,950,397
Number of scaffolds	6,438	6,470
Minimum scaffold length (bp)	500	500
Maximum scaffold length (bp)	154,210	197,135
Average scaffold length (bp)	11,209	11,893
Total number of non-ATGC characters	779,011	664,485
N50 value (bp)	23,533	24,471
GC (%)	47.33	47.42


Annotation of protein-coding genes

We annotated the meta-assemblies of the two Australian H. brassicae samples for protein-coding genes. We used ab initio gene prediction, following protein homology analysis, with optimal trained parameters in GeneMarkS. This allowed prediction of 36,819 genes in Sample B and 40,346 genes in Sample C (Table 4), on the basis of the reference genome of H. arabidopsidis. Gene densities were 1,628/Mb and 1,644/Mb for Samples B and C, respectively (Table 4). For Sample B, 24,055 of the 36,819 predicted genes returned BLAST hits against the Oomycete Non-redundant Protein Database (NCBI); of these, 16,996 were mapped and 12,255 were annotated. For Sample C, 26,362 of the 40,346 predicted genes returned BLAST hits against the same database. Of these, 18,570 genes were mapped, of which 13,030 and 5,540 genes revealed significant and non-significant annotations, respectively (Table 4). We also plotted a Venn diagram comparing H. arabidopsidis genes found in the genome assemblies of H. brassicae Samples B (with high virulence) and C (with low virulence) (Fig. 2). Almost 33.0% of the genes were common across all three assemblies, and 16.7% of H. arabidopsidis genes were not found in the H. brassicae assemblies.

Table 4

Details of protein-coding genes in Hyaloperonospora brassicae Samples B [with high virulence, (B)] and H. brassicae Sample C [with low virulence, (C)].

Protein-coding genes details:	Sample B	Sample C
Total size (bp)	22,614,046	24,541,500
Total protein-coding genes (≥250 bp)	36,819	40,346
Gene density (number of genes per Mb)	1,628	1,644
Annotation details:
Blasted coding genes	24,055	26,362
Mapped of blasted coding genes	16,996	18,570
Annotated of mapped coding genes	12,255	13,030
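The gene densities in Table 4 can be reproduced arithmetically. The sketch below rests on an inference, not on a statement in the paper: the tabulated values match the number of predicted genes divided by the "Total size (bp)" row of Table 4 (the summed size of the protein-coding genes), rather than by the whole assembly size from Table 3. The function name `genes_per_mb` is hypothetical.

```python
def genes_per_mb(n_genes: int, size_bp: int) -> float:
    """Number of genes per megabase (1 Mb = 1,000,000 bp) of sequence."""
    if size_bp <= 0:
        raise ValueError("size must be positive")
    return n_genes / (size_bp / 1_000_000)


# Values from Table 4 (genes, total size of protein-coding genes in bp).
table4 = {
    "Sample B": (36_819, 22_614_046),
    "Sample C": (40_346, 24_541_500),
}
# Whole meta-assembly sizes from Table 3, for comparison.
assembly_bp = {"Sample B": 72_162_632, "Sample C": 76_950_397}

density_coding = {s: genes_per_mb(g, bp) for s, (g, bp) in table4.items()}
density_assembly = {s: genes_per_mb(table4[s][0], assembly_bp[s]) for s in table4}

for sample in table4:
    print(sample, round(density_coding[sample]), round(density_assembly[sample]))
```

Under this reading, the densities relative to the full assemblies would be considerably lower (roughly 510 and 524 genes per Mb), which is worth keeping in mind when comparing with other oomycete genomes.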

Table 5

Classes of genes identified in Hyaloperonospora brassicae Sample B [with high virulence, (B)] and H. brassicae Sample C [with low virulence, (C)] as compared with Hyaloperonospora arabidopsidis.

Gene Category/Sample ID	H. arabidopsidis	Sample B	Sample C
RXLR effectors	94	382	490
Elicitors	12	22	19
Necrosis-inducing proteins	4	6	6
Pectate lyase	9	19	17
Pectinesterases	7	9	8
Glucanases/Cellulases	30	61	53
CRN proteins	ND	2	2
Cutinases	2	ND	ND
Serine-protease inhibitors	2	5	7
Short cysteine-rich proteins (SCR)	5	6	8
Peptidases/Metallopeptidases	105	171	185
Nudix domain containing protein	8	7	6
Other Pathogenic genes	38	27	29
Defence responsive genes	8	7	6
Avirulence (Avh) protein	6	5	4

Fig. 2

Venn diagram comparing Hyaloperonospora arabidopsidis genes found in the genome assemblies of H. brassicae Sample B [with high virulence, (B)] and H. brassicae Sample C [with low virulence, (C)]. Almost 33% of the genes were common to all three assemblies; 16.7% of H. arabidopsidis genes were not found in the H. brassicae assemblies.

Whole-genome BLAST analysis conducted against the pathogen-host interaction (PHI) gene database facilitated identification of 6,969 (Sample B) and 7,116 (Sample C) genes putatively homologous to genes in the database. Best fits with five categories were detected. More than half of the genes, in both isolate samples, were represented by the “virulence” (i.e., disease-producing power) category, followed by “pathogenicity” (i.e., ability to be pathogenic) (Fig. 3). Although most identified PHI tags were common to the two samples, 190 and 271 tags were unique to Samples B and C, respectively. Sample C showed a marginally higher number of genes in the reduced virulence category compared to Sample B. The number of effectors or genes for avirulence was also higher in Sample C. Virulence-associated genes were further classified based on their specific function in the host-pathogen interaction and virulence/pathogenicity (Fig. 4). As expected, Samples B and C shared more genes with each other (95) than with H. arabidopsidis (83).
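The shared and sample-specific PHI tags described above amount to simple set comparisons. The following Python sketch shows one way such counts could be derived, assuming each sample is represented by the set of PHI-base identifiers hit by its genes; the function name `compare_tags` and the identifiers are hypothetical placeholders, not data from the study.

```python
def compare_tags(tags_b: set[str], tags_c: set[str]) -> dict[str, set[str]]:
    """Split two sets of PHI tags into shared and sample-specific groups."""
    return {
        "shared": tags_b & tags_c,       # tags found in both samples
        "unique_to_B": tags_b - tags_c,  # tags found only in Sample B
        "unique_to_C": tags_c - tags_b,  # tags found only in Sample C
    }


# Hypothetical tag sets for illustration only.
sample_b_tags = {"PHI:A", "PHI:B", "PHI:C", "PHI:D"}
sample_c_tags = {"PHI:B", "PHI:C", "PHI:E"}

groups = compare_tags(sample_b_tags, sample_c_tags)
for name, members in groups.items():
    print(name, len(members), sorted(members))
```

The same three-way logic, extended with a third set for H. arabidopsidis, underlies the Venn diagrams in Figs. 2 and 4.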

Fig. 3

Distribution of the Pathogen-Host Interaction (PHI) genes of Hyaloperonospora arabidopsidis in the H. brassicae Sample B [with high virulence, (B)] and H. brassicae Sample C [with low virulence, (C)].

Fig. 4

Venn diagram showing overlap of virulence-associated gene sets of Hyaloperonospora arabidopsidis with those of H. brassicae Sample B [with high virulence, (B)] and H. brassicae Sample C [with low virulence, (C)].

Gene Ontology (GO) annotations of Australian H. brassicae sequences

BLAST alignment of the predicted gene sequences against the oomycete database in NCBI yielded positive BLAST hits for almost 65% of sequences in both samples. About 70% of the sequences with hits were mapped, but only about 50% of the sequences with positive BLAST hits could be linked to any Gene Ontology term after annotation. The remaining sequences could not be assigned to any GO term (Table 4). The sequences belonging to “virulence/pathogenicity” GO entries were extracted and classified on the basis of their specific function in the host-pathogen interaction and virulence/pathogenicity. The predicted gene models were analysed to identify gene functions. From query sets of approximately 37,000 to 40,000 genes, 12,000 to 13,000 genes were successfully annotated. GO terms for the annotated genes were placed into three basic categories: biological processes (BP), molecular function (MF), and cellular components (CC). Both H. brassicae samples (B and C) showed almost identical molecular functions (MF). However, our samples were distinctly different from the reference, H. arabidopsidis (classified as Sample A), in terms of carbohydrate derivative binding, catalytic activity on a protein, drug binding, heterocyclic compound binding, hydrolase activity, ion binding, organic cyclic compound binding, small molecule binding and transferase activity (Fig. 5). The two samples also differed distinctly for biological processes and cellular components. In BP, biosynthetic process, cellular metabolic process, nitrogen compound metabolic process, and primary metabolic process were distinctly higher in the less virulent Sample C. Differences between Samples B and C were greater than those between Samples C and A (Fig. 6). In CC, most of the GO categories were represented equally in Samples B and C, but a few GO classes, such as host cellular components and intracellular anatomical structure, were distinctly higher in Sample C.
Although the two isolate samples did not differ significantly for most of their cellular components (Fig. 7) or molecular functions (Fig. 5), a key finding was that they did differ significantly for important biological processes and for some cellular components. This finding opens the way to identifying H. brassicae pathotypes prevailing in the field and to early detection of any virulence changes within H. brassicae populations.
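The approximate percentages quoted above (about 65% with BLAST hits, about 70% mapped, about 50% annotated) can be recomputed from the counts in Table 4. The sketch below is a simple arithmetic check; the dictionary layout and the function name `funnel` are assumptions made for illustration.

```python
def funnel(predicted: int, with_hits: int, mapped: int, annotated: int) -> dict[str, float]:
    """Percentages for each step of the BLAST, mapping and annotation funnel."""
    return {
        "hits_of_predicted": 100 * with_hits / predicted,
        "mapped_of_hits": 100 * mapped / with_hits,
        "annotated_of_hits": 100 * annotated / with_hits,
    }


# Counts from Table 4: predicted genes, genes with BLAST hits, mapped, annotated.
counts = {
    "Sample B": (36_819, 24_055, 16_996, 12_255),
    "Sample C": (40_346, 26_362, 18_570, 13_030),
}

rates = {sample: funnel(*values) for sample, values in counts.items()}
for sample, steps in rates.items():
    summary = ", ".join(f"{name} = {value:.1f}%" for name, value in steps.items())
    print(f"{sample}: {summary}")
```

Both samples give very similar proportions at every step, which is consistent with the statement that the figures apply to both samples.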

Fig. 5

Distribution of BLAST2GO annotations of putative genes from Hyaloperonospora arabidopsidis and H. brassicae. The graph shows level 3 annotations for top 20 molecular functions for H. arabidopsidis (A) and H. brassicae Samples B [with high virulence, (B)] and H. brassicae Sample C [with low virulence, (C)].

Fig. 6

Distribution of BLAST2GO annotations of putative genes from Hyaloperonospora arabidopsidis and H. brassicae. The graph shows level 3 annotations for top 20 biological processes for H. arabidopsidis (A) and H. brassicae Samples B [with high virulence, (B)] and H. brassicae Sample C [with low virulence, (C)].

Fig. 7

Distribution of BLAST2GO annotations of putative genes from Hyaloperonospora arabidopsidis and H. brassicae. The graph shows level 3 annotations for top 20 cellular components for H. arabidopsidis (A) and H. brassicae Samples B [with high virulence, (B)] and H. brassicae Sample C [with low virulence, (C)].

Discussion

We believe that this study presents not only the first genome assemblies of H. brassicae but, more importantly, the first comparison of a highly virulent versus a less virulent sample. Together, the assemblies provide new understanding about the H. brassicae genome, its metabolic pathways and genes encoding proteins of relevance for virulence. While the two pathogen samples were similar for molecular functions, both biological processes (biosynthetic and catabolic process, cell communication, and cellular component organization and localization) and cellular components (catalytic complex, coated membrane, endomembrane system, endoplasmic reticulum membrane, and envelope) were distinctly different between the more virulent Sample B and the less virulent Sample C. Importantly, the genome sequences presented in the current study constitute an important resource for future functional genomic analyses, fostering DNA barcoding of H. brassicae isolates and the development of genetic markers (e.g. SNPs and short repeats) that could be used to assess population diversity and to tag genes for virulence in the pathogen population by comparing highly virulent and non-virulent isolates. Oomycetes such as Phytophthora spp. and Hyaloperonospora spp. are causal organisms of many serious plant diseases, often with catastrophic consequences for crop productivity. Genome sequences of many Oomycetes are now available (e.g., Tyler et al. [44]; Hane et al. [45]; David et al. [46]). Tools for comparative analyses of Oomycete genomes are also now available to identify conserved and biologically important proteins (e.g., Rujirawat et al. [47]). These have provided new insights into the biology and evolution of virulence/pathogenicity. In this study, we present genome assemblies for two samples of H. brassicae isolated from commercial rapeseed fields in Australia. We used SOAPdenovo2, ABySS V2.1, and Velvet V1.1 to perform de novo assembly and the software Metassembler to create a meta-assembly for each sample.
We generated 6,438 scaffolds for Sample B with an N50 of 23,533 bp, the longest scaffold being 154,210 bp. In contrast, 6,470 scaffolds were produced for Sample C with an N50 of 24,471 bp and a longest scaffold of 197,135 bp. Metassembler has been widely used to develop meta-genome assemblies [48], [49], [50], [51], [52], [53], [54], [55]. The genome size of Sample C was 76.95 Mb compared to 72.16 Mb for Sample B. The N50 value of both samples approximated 24 kb, indicating good quality for further downstream analysis. Using reference sequences for H. arabidopsidis, we predicted 36,819 and 40,346 protein-coding genes for Samples B and C, respectively, including those associated with plant-pathogen interactions. Of the predicted genes, only 12,255 and 13,030 could be mapped and annotated for Samples B and C, respectively. We achieved good assembly continuity using short-read data and recovered 91-92% of the BUSCO Alveolata set of single-copy orthologs as complete, with assemblies relatively contiguous and with a negligible number of duplicated BUSCO genes. Comparative analysis of the genomic data obtained from the two H. brassicae samples revealed conserved and varied molecular machinery underlying their physiological specialisation and infection capabilities. Several gene classes were identified. These included necrosis-inducing proteins (NLPs) produced by fungal pathogens that cause cell death while stimulating the plant's immune reaction (e.g., Feng et al. [56]). Oome et al. [57] showed that the “downy mildew pathogen H. arabidopsidis encodes 10 noncytotoxic NLPs (HaNLPs) that do not cause necrosis” but “act as potent activators of the plant immune system in Arabidopsis thaliana.” Many gene classes we identified differed significantly between the model pathogen H. arabidopsidis and H. brassicae, both in the classes identified and in the numbers within each class.
Of particular interest were the RXLR effectors, which are primarily associated with the suppression of host immunity and play a major role in virulence for downy mildew species [58]. The predicted RXLR effectors were more numerous in H. brassicae Samples B and C than in H. arabidopsidis. Zhang et al. [59] also reported RXLR effectors in P. sojae isolates, of which 42 core RXLR effectors were considered important for infection. In our study, the number of elicitors in both H. brassicae Samples B and C was also higher than in H. arabidopsidis. Elicitors are pathogen signal metabolites recognized by plant cells, and most constitute Pathogen-Associated Molecular Patterns (PAMPs) (e.g., Fawke et al. [60]). Plant pathogens must overcome host resistance to successfully infect plants [61] and, toward this outcome, fungal pathogens produce pectin lyases (PL) to catalyse the depolymerisation of esterified pectin by a β-elimination mechanism [62]. As for other oomycete pathogens, the contribution of PL to infection by H. brassicae remains largely unknown, despite PL having been extensively studied in major fungal pathogens (e.g., Fu et al. [63]). Pectinesterases are involved in plant tissue maceration and soft-rotting, and a few cellulases (glucanases/cellulases) of plant-pathogenic fungi have been shown to be involved in pathogenicity (e.g., Eshel et al. [64]). For example, the fungal pathogen Alternaria alternata produces an endocellulase that is an important factor in disease development in persimmon fruit [64]. H. arabidopsidis secretes proteins and fibrillar β-1,3-glucans that bind its germ tubes to the substratum [65], likely helping it resist detachment by wind or rain and protecting it against desiccation [66]. Plant pathogens rely on virulence factors (effectors) to modulate host immunity and ensure progressive infection. The CRN (CRinkling and Necrosis) proteins are one of the highly conserved families of secreted proteins.
CRN proteins were first identified in plant-pathogenic Oomycetes, where they are secreted and translocated into host cells by means of a conserved N-terminal domain. Study of CRN functions may allow a better understanding of CRN effector biology and the processes contributing to host susceptibility and immunity [67]. Further, CRN effectors exhibit various pathogenic functions, including induction of Programmed Cell Death (PCD) and suppression of PCD via PAMP-triggered immunity and/or effector-triggered immunity [68]. Additionally, peptidases/metallopeptidases, enzymes that cleave peptide bonds to yield smaller proteins and peptides [69], also regulate, activate or inactivate target substrates via proteolysis [70]. Peptidases play an important role in plant-pathogen interactions, including in nutrient acquisition and catabolic activities, and can influence the outcome of these interactions. The first examples of avirulence proteins were from the flax rust fungus Melampsora lini and from oomycete species such as Phytophthora sojae, P. infestans, and H. arabidopsidis [71]. Avirulence genes in pathogens directly or indirectly encode molecules that are recognized by plant receptors encoded by major resistance genes (R genes), triggering plant immunity. Avirulence proteins (Avh) are effectors involved in pathogenicity [72].
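To illustrate what is meant by an RXLR effector candidate in the discussion above, the following sketch scans a protein sequence for the Arg-x-Leu-Arg motif near the N-terminus. It is a simplified heuristic, not the prediction method used in this study: the function name `find_rxlr`, the default window of residues 30 to 60, and the toy sequences are all illustrative assumptions, and real predictions would normally also require a signal peptide.

```python
import re

# Lookahead pattern so that overlapping R-x-L-R motifs are all reported.
RXLR_PATTERN = re.compile(r"(?=R.LR)")


def find_rxlr(protein, start=30, end=60):
    """Return 1-based start positions of RxLR motifs within [start, end].

    `start` and `end` bound the residue window searched; the defaults
    are an assumed, commonly used heuristic window, not values taken
    from this paper.
    """
    hits = []
    for match in RXLR_PATTERN.finditer(protein.upper()):
        position = match.start() + 1  # convert to 1-based residue number
        if start <= position <= end:
            hits.append(position)
    return hits


# Hypothetical toy sequences (not real H. brassicae proteins).
candidate = "M" + "A" * 34 + "RSLR" + "A" * 20   # motif starts at residue 36
too_early = "RALR" + "A" * 50                    # motif at residue 1, outside window

print(find_rxlr(candidate))  # prints [36]
print(find_rxlr(too_early))  # prints []
```

Restricting the search to a window downstream of the N-terminus reflects the idea that the motif acts in host translocation after secretion; widening or narrowing the window trades sensitivity against false positives.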

5. Conclusions

The genome assemblies presented in this study constitute critical genetic resources that can spur further research into the mechanisms of virulence in H. brassicae, a destructive but less understood pathogen of many crop species. Our genome sequences, while differing significantly from those of H. arabidopsidis, showed a large degree of consistency with them, at least to the extent that they can be used in population genomic studies for the identification of virulence haplotypes and alleles. Comparative analysis of annotated protein-coding genes obtained from the two H. brassicae isolates revealed conserved and varied molecular machinery underlying the physiological specialisation and infection capabilities of this important pathogen. In particular, H. brassicae Samples B and C differed in biological processes and cellular components. Our studies will pave the way for early detection of any virulence changes within the pathogen populations, and perhaps even for molecular identification of pathotypes of this pathogen. Such information is crucial for H. brassicae because not only is there virulence/pathotype diversity among its populations, but such diversity is also rapidly evolving [19]. Further, genome sequencing or enrichment analysis may also be used to identify alleles associated with particular virulence/avirulence effects. However, while linkage mapping or genome-wide association mapping can enable more rapid generation of cultivars resistant to H. brassicae, more virulent, newly evolving downy mildew genotypes will keep emerging [28]. Hence, developing markers for virulence/pathotype classification would allow rapid molecular identification and monitoring of the prevalent virulence potentials existing within pathogen populations in-field [28].

Credit authorship contribution statement

Ming Pei You: Conceptualization, Methodology, Investigation, Writing - original draft, Writing - review and editing (e.g., designed studies, carried out the DNA isolation and purification and jointly wrote the paper). Javed Akhatar: Methodology, Investigation, Data curation, Formal analysis (e.g., undertook genome assemblies and annotation and helped with their interpretation). Meenakshi Mittal: Methodology, Investigation, Data curation, Formal analysis (e.g., undertook genome assemblies and annotation and helped with their interpretation). Martin J. Barbetti: Conceptualization, Methodology, Investigation, Writing - original draft, Writing - review and editing (e.g., designed studies and jointly wrote the paper). Solomon Maina: Investigation, Conceptualization and Methodology as relates to initial DNA isolation and purification. Surinder S. Banga: Conceptualization, Methodology, Investigation, Writing - original draft, Writing - review and editing (e.g., undertook genome assemblies and annotation and their interpretation and jointly wrote the paper).

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability statement

Genome assemblies have been deposited in GenBank via BioProject SUB5539449.

45 in total

1. Scaffolding pre-assembled contigs using SSPACE.

Authors: Marten Boetzer; Christiaan V Henkel; Hans J Jansen; Derek Butler; Walter Pirovano
Journal: Bioinformatics Date: 2010-12-12 Impact factor: 6.937

2. Rapid extraction of fungal DNA for PCR amplification.

Authors: J L Cenis
Journal: Nucleic Acids Res Date: 1992-05-11 Impact factor: 16.971

3. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs.

Authors: Felipe A Simão; Robert M Waterhouse; Panagiotis Ioannidis; Evgenia V Kriventseva; Evgeny M Zdobnov
Journal: Bioinformatics Date: 2015-06-09 Impact factor: 6.937

4. Diversity of the Hyaloperonospora parasitica complex from core brassicaceous hosts based on ITS rDNA sequences.

Authors: Young-Joon Choi; Seung-Beom Hong; Hyeon-Dong Shin
Journal: Mycol Res Date: 2003-11

5. ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter.

Authors: Shaun D Jackman; Benjamin P Vandervalk; Hamid Mohamadi; Justin Chu; Sarah Yeo; S Austin Hammond; Golnaz Jahesh; Hamza Khan; Lauren Coombe; Rene L Warren; Inanc Birol
Journal: Genome Res Date: 2017-02-23 Impact factor: 9.043

6. Evolution and functional characterization of pectate lyase PEL12, a member of a highly expanded Clonostachys rosea polysaccharide lyase 1 family.

Authors: Lea Atanasova; Mukesh Dubey; Marica Grujić; Mikael Gudmundsson; Cindy Lorenz; Mats Sandgren; Christian P Kubicek; Dan Funck Jensen; Magnus Karlsson
Journal: BMC Microbiol Date: 2018-11-07 Impact factor: 3.605

7. Genome sequencing of oomycete isolates from Chile supports the New Zealand origin of Phytophthora kernoviae and makes available the first Nothophytophthora sp. genome.

Authors: David J Studholme; Preeti Panda; Eugenio Sanfuentes Von Stowasser; Mariela González; Rowena Hill; Christine Sambles; Murray Grant; Nari M Williams; Rebecca L McDougal
Journal: Mol Plant Pathol Date: 2018-12-05 Impact factor: 5.663

8. Construction and integration of three de novo Japanese human genome assemblies toward a population-specific reference.

Authors: Jun Takayama; Shu Tadaka; Kenji Yano; Fumiki Katsuoka; Chinatsu Gocho; Takamitsu Funayama; Satoshi Makino; Yasunobu Okamura; Atsuo Kikuchi; Sachiyo Sugimoto; Junko Kawashima; Akihito Otsuki; Mika Sakurai-Yageta; Jun Yasuda; Shigeo Kure; Kengo Kinoshita; Masayuki Yamamoto; Gen Tamiya
Journal: Nat Commun Date: 2021-01-11 Impact factor: 14.919

9. Characterization of necrosis-inducing NLP proteins in Phytophthora capsici.

Authors: Bao-Zhen Feng; Xiao-Ping Zhu; Li Fu; Rong-Fei Lv; Dylan Storey; Paul Tooley; Xiu-Guo Zhang
Journal: BMC Plant Biol Date: 2014-05-08 Impact factor: 4.215

Review 10. Specificity of peptidases secreted by filamentous fungi.

Authors: Youssef Ali Abou Hamin Neto; Nathália Gonsales da Rosa Garzon; Rafael Pedezzi; Hamilton Cabral
Journal: Bioengineered Date: 2017-09-21 Impact factor: 3.269