Literature DB >> 31253647

Population Genome Sequencing of the Scab Fungal Species Venturia inaequalis, Venturia pirina, Venturia aucupariae and Venturia asperata.

Bruno Le Cam¹, Dan Sargent², Jérôme Gouzy³, Joëlle Amselem⁴, Marie-Noëlle Bellanger⁵, Olivier Bouchez⁶, Spencer Brown⁷, Valérie Caffier⁵, Marie De Gracia⁵, Robert Debuchy⁸, Ludovic Duvaux⁵, Thibaut Payen⁵, Mélanie Sannier⁵, Jason Shiller⁵, Jérôme Collemare^5,9, Christophe Lemaire⁵.

Abstract

The Venturia genus comprises fungal species that are pathogens on Rosaceae host plants, including V. inaequalis and V. asperata on apple, V. aucupariae on sorbus and V. pirina on pear. Although the genetic structure of V. inaequalis populations has been investigated in detail, genomic features underlying these subdivisions remain poorly understood. Here, we report whole genome sequencing of 87 Venturia strains that represent each species and each population within V. inaequalis We present a PacBio genome assembly for the V. inaequalis EU-B04 reference isolate. The size of selected genomes was determined by flow cytometry, and varied from 45 to 93 Mb. Genome assemblies of V. inaequalis and V. aucupariae contain a high content of transposable elements (TEs), most of which belong to the Gypsy or Copia LTR superfamilies and have been inactivated by Repeat-Induced Point mutations. The reference assembly of V. inaequalis presents a mosaic structure of GC-equilibrated regions that mainly contain predicted genes and AT-rich regions, mainly composed of TEs. Six pairs of strains were identified as clones. Single-Nucleotide Polymorphism (SNP) analysis between these clones revealed a high number of SNPs that are mostly located in AT-rich regions due to misalignments and allowed determining a false discovery rate. The availability of these genome sequences is expected to stimulate genetics and population genomics research of Venturia pathogens. Especially, it will help understanding the evolutionary history of Venturia species that are pathogenic on different hosts, a history that has probably been substantially influenced by TEs.

Entities: Chemical Disease Gene Species

Keywords: Fusicladium; Venturia; apple; apple scab; effectors; formae specialis; pear; transposable elements

Mesh：

Year: 2019 PMID： 31253647 PMCID： PMC6686934 DOI： 10.1534/g3.119.400047

Source DB: PubMed Journal: G3 (Bethesda) ISSN： 2160-1836 Impact factor: 3.154

Fungal species from the Venturia Sacc. genus (anamorph is Fusicladium Bonord.; Beck ) are pathogenic on important Rosaceae host crops including apple and pear (Sivanesan, 1977). They are responsible for scab diseases which dramatically reduce the economic value of fruits by causing black lesions on the skin (Bowen ). Taxonomically, scab fungi belong to the Dothideomycetes class and to the order Venturiales within the Venturiaceae family (Zhang ). Scab caused by Venturia inaequalis Cooke (Wint.) is the main fungal disease on apple, control of which may require over 20 chemical treatments each year. V. inaequalis has been extensively studied as a model for pathogens on woody hosts, particularly because it is amenable to in vitro crossing (Bénaouf and Parisi, 2000). V. inaequalis has been described on several genera of Rosaceous including Malus, Pyracantha, Eriobotrya and Sorbus (Sivanesan 1977; Jones and Aldwinckle 1990). In some cases, host specificity occurs; for instance strains from Pyracantha are not able to infect Malus and vice-versa, which is referred to as formae specialis f. sp pyracantha and f.sp pomi, respectively (Le Cam ). Worldwide sampling of V. inaequalis strains has identified divergent populations, mostly reflecting their hosts: wild apple (Malus sieversii, M. orientalis, M sylvestris, M. floribunda) and domesticated apple (Malus x domestica) trees (Gladieux ; Gladieux ; Gladieux ; Leroy ; Ebrahimi ; Leroy ). Certain populations have experienced a secondary contact with subsequent gene flow, resulting in the introduction from a wild population into the agricultural compartment of strains that are virulent on resistant cultivars carrying the Rvi6 resistance gene (Guérin ; Lemaire ). On apple, scab can also be caused by another Venturia sp., V. asperata initially described as a saprophyte (Samuels and Sivanesan, 1975), and recently reported as a pathogen on apple trees (Caffier ). Other Venturia species cause scab damages on their respective Rosaceae hosts, V. pirina (Aderh.) on European pear (Pyrus communis), V. nashicola on Asian pear (Pyrus pyrifolia), and V. carpophila on apricot (Crous ; Sivanesan 1977). Other scab fungi are pathogenic on various tree hosts, including Fusicladium effusum on pecan, F. oleaginum on olive, V. populina on poplar, V. saliciperda on willow and V. fraxini on ash, among many others (Sivanesan 1977; Zhang ; Young ). Despite being economically important pathogens, the genetic basis of the interactions between scab fungi and their respective hosts remains poorly understood (Singh ; Steiner and Oerke 2007; Kucheryava ; Bowen ; Mesarich ; Cooke ; Shiller ; Deng ). Genome sequences are already available for V. inaequalis f. sp. pomi, V. inaequalis f. sp. eriobotrya, V. pirina and F. effusum, and they revealed that V. inaequalis strains tend to have a higher repeat content compared to other species, with up to 47% of their genome sequence covered by repetitive elements (Shiller ; Bock ; Deng ; Passey ). These studies highlighted in Venturia genomes the expansion of putative effector gene families, genes that are known to determine plant-fungus interactions (Shiller ; Deng ). The high repeat content and the employed Illumina sequencing strategy resulted in assemblies with between 1,012 and 1,680 scaffolds (Shiller ; Deng ) and up to 231 contigs using the PacBio technology (Passey ). Genetic studies need higher quality genome assemblies and genomic population studies require re-sequencing of many strains from diverse populations. Here, we report the PacBio genome assembly of the V. inaequalis EU-B04 reference strain and re-sequencing of 79 strains from two formae specialis and from diverse populations within V. inaequalis f. sp. pomi. We also present re-sequencing of V. pirina strains and whole genome sequencing of V. aucupariae and V. asperata. Availability of these genome sequences will greatly facilitate genetic and genomic population studies to better understand the evolution of Venturia pathogens and their interactions with different hosts.

Materials and Methods

Origin of Venturia strains

All 87 Venturia strains used in this study are described in Table S1. Forty strains of V. inaequalis sampled in Kazakhstan on wild apple M. sieversii were used, of which half belong to the wild CAM (Central Asian Mountains) population and the other half to the domestic CAP (Central Asian Plains) population as described in Gladieux . Other V. inaequalis strains were isolated from M. x domestica cultivars that do not carry the Rvi6 resistance gene (6 domEU and 2 domASIA), from M. x domestica cultivars carrying the Rvi6 resistance gene (5 domRvi6), from ornamental M. floribunda (4 FLO), from wild M. sylvestris (6 SYL) and M. orientalis (6 ORI), from Pyracantha (5 PYR), from E. japonica (5 LOQ) and from Sorbus aucuparia (1). In addition, strains of V. pirina (3), V. aucupariae (1) and V. asperata (3) were isolated from Pyrus communis (pear), S. aucuparia and M. x domestica, respectively (Table S1). All strains were grown at 18° on cellophane membranes displayed on PDA-YE (Potato-Dextrose Agar (BD Difco) supplemented with 0.3% Yeast Extract (Sigma-Aldrich)) plates with a 12h/12h light/night regime. They are stored at -20° on dried cellophane membranes in the Venturia collection of the Institut de Recherche en Horticulture et Semences (IRHS, Beaucouzé, France).

Genomic DNA extraction, library construction and DNA sequencing

High molecular weight genomic DNA from V. inaequalis EU-B04 strain was extracted from mycelium grown on PDA-YE covered with cellophane membrane using the method described in Cheeseman (extraction with CTAB followed by purification on an isopycnic gradient), except that ethidium bromide was used instead of DAPI for the purification of genomic DNA on a cesium chloride gradient. Ethidum bromide was preferred because it resulted in a single band of genomic DNA, while DAPI resulted in the dispersion of genomic DNA in several bands of different AT contents. Genomic DNA quantity and quality were validated at the GeT-PlaGe core facility (Toulouse, France) using Qubit (Thermofisher), Nanodrop (Thermofischer), and Fragment Analyzer (Advanced Analytical) before purification on 0.5X AMPure PB beads and shearing at 40 kb using the Megaruptor system (Diagenode). Libraries were prepared according to the “20-kb Template Preparation Using BluePippin (Sage Science) Size-Selection System (15-kb size cutoff)” PACBIO protocol using 10 µg of sheared genomic DNA and the SMRTBell template Prep Kit. DNA fragments from 10 kb to 90 kb were selected. The sequencing run was performed on a PacBio RSII sequencer with the OneCellPerWell Protocol (OCPW) on C4 P6 chemistry and 360 min movies. Another PacBio sequencing was performed at the Cold Spring Harbor core genomics facility (CSHL, NY, USA); genomic DNA was sheared and purified, and a long-insert library was prepared following the protocol described in Quail . The library was sequenced on four SMRT cells using the C3 P6 chemistry using 2x 60 min movies. Genomic DNA of all other Venturia strains was extracted from mycelium grown on PDA-YE plates using Möller’s protocol (Möller ). Libraries for CAM and CAP strains were prepared using SPRIworks- Fragment Library System I (Beckman Coulter) and sequenced at GATC Biotech (Konstanz, Germany) on HiSeq 2000 using TruSeq Multiplex Sequencing Primer and TruSeq PE (2x50bp) (Illumina, San Diego, CA). Libraries for the remaining strains were prepared using the TruSeq Nano kit (Illumina, San Diego, CA) and sequenced at the GeT-PlaGe core facility (Toulouse, France) using SBS v3 (2x100 pb) or v4 (2x125 pb) reagent kits and using HiSeq 2000 or HiSeq 2500. For library construction, genomic DNAs were fragmented by sonication and sequencing adaptors were ligated. Libraries were amplified with eight PCR cycles; their quality was assessed using a Bioanalyzer (Agilent) and quantity measured by quantitative PCR using the Kapa Library Quantification Kit (KapaBiosystems).

Genome assembly

For the PacBio genome, the PBcR wgs8.3rc2 assembly pipeline (Berlin ) was used to perform read correction and assemble the corrected reads. The high sequencing depth and quality of raw data allowed modifying several parameters to increase stringency: i) only raw reads longer than 5 kb were considered; ii) 50x of corrected reads were collected (assembleCoverage = 50); iii) the runCA part of the pipeline was run with a minimum overlap length of 1 kb (ovlMinLen = 1000); and iv) the unitigger bogart was tuned to consider overlaps with a very low error rate only (utgErrorRate = 0.0075 and utgMergeErrorRate = 0.04). Spurious contigs that were fully included in larger contigs were removed using a script based on ncbi-megablast to perform the all-vs-all analysis, combined with additional filters (minimum identity of 93%, minimum overlap of 500 nt and maximum overhang length of 500 nt). Bacterial contaminant contigs were identified using ncbi-blastx search against the NR database (downloaded in May 2016) and manual inspection. A total of 46 contigs were removed (maximum length of 113 kb) based on best hits with bacterial proteins confirmed with a GC percent analysis. Finally, the genome was polished with quiver (Chin ) using stringent alignment cutoffs for pbalign (–minLength 3000–maxHits 1), ensuring a mapping as specific as possible. Assemblies of other genomes were performed by two algorithms: SOAPdeNovo2 (Zerbino and Birney 2008) and Velvet (Luo ) adapted for de novo genome assembly using next-generation sequencing (NGS) short reads. Genome statistics were determined using Quast 2.3 (Gurevich ). The GC content and distribution in each genome was analyzed using OcculterCut (Testa ). Completeness of genome assemblies was assessed using BUSCO v1.1b1 (Simão ) and the fungal BUSCO profile (busco.ezlab.org/v1/files/fungi_buscos.tar.gz). Fungal BUSCO genes are expected to be core genes, i.e., they are present in most fungal species. Thus, the absence of BUSCO genes in a genome assembly is indicative of an assembly that is likely not complete. The program was run in “genome assembly” mode using the metaparameters of Fusarium graminearum for AUGUSTUS gene prediction.

SNP discovery, false discovery analysis and population structure of Venturia spp.

For all strains, genome sequencing reads were cleaned using Trimmomatic 0.36 (Bolger ) before mapping on the EU-B04 PacBio reference genome using BWA-MEM (Li and Durbin 2009). For species other than V. inaequalis, read mapping was refined with Stampy 1.0.29 (Lunter and Goodson 2011) using the parameter “–bamkeepgoodreads” and the parameter “–substitutionrate” was set to 0.15, 0.075, and 0.07 for V. asperata, V. pirina and V. aucupariae, respectively. Multisample SNP discovery was performed using the Unified Genotyper module of GATK 3.6 (McKenna ), applying indel realignment and duplicate removal according to GATK Best Practices recommendations (DePristo ; Van der Auwera ). Instead of applying any posterior quality score recalibration and standard hard filtering, we performed a SNP false discovery rate (FDR) analysis making use of clones present in our dataset (Table S1). The FDR was estimated by simply counting the number of SNPs for each pair of clones. All steps but the Trimmomatic, Stampy and FDR analysis were integrated in a pipeline set using the TOGGLE toolbox (Monat ). In order to assess the discrimination power of our resulting true positive SNP dataset, we computed a PCA of all the non-clonal Venturia inaequalis strains using PLINK v1.90b3.44 (Purcell ).

RNA extraction and sequencing

V. inaequalis EU-B04, V. pirina VP36 and V. asperata 2349 strains were grown on seven different agar media: PDA-YE, apple medium (10% organic apple sauce in water), and Vogel media (Vogel, 1956) without carbon or nitrogen source, or supplemented with 1% glucose or 1% saccharose (with and without 80 µM zebularin). Total RNAs were extracted using the kit Nucleospin RNA Plant (Macherey-Nagel, Düren, Germany), according to the manufacturer’s recommendations. Total RNAs from complex media and from defined media were combined in two respective pools. Paired-end 2x150 bp libraries were prepared using the Illumina TruSeq Stranded mRNA kit (Illumina, San Diego, CA) and sequenced at the GeT-PlaGe core facility (Toulouse, France) using a paired-end read length of 2x150 pb with the Illumina HiSeq3000 sequencing kits and using HiSeq 3000. Libraries were prepared according to Illumina’s protocols on a Tecan EVO200 liquid handler. Messenger RNAs were selected using poly-T beads, fragmented to generate double stranded cDNA; and then sequencing adaptors were ligated. Libraries were amplified with ten PCR cycles. Their quality was assessed using a Fragment Analyzer (Advanced Analytical) and quantity measured by quantitative PCR using the Kapa Library Quantification Kit (KapaBiosystems).

Flow cytometry

Analyses were performed according to Marie and Brown (1993). About 1 cm2 of mycelium grown on PDA-YE plate was scraped and resuspended in 500 µL of cold nuclear buffer (45 mM MgCl2, 30 mM sodium citrate, 20 mM MOPS pH 7, 0.02% Triton X-100, 1% (w/v) cold polyvinyl pyrrolidone, 5 mM sodium metabisulfite, 0.05 mg/mL RNase (Roche, Basel, Switzerland)). A ligule of Arabidopsis thaliana cv. Columbia-0 was added as an internal standard. Samples were cut with a razor blade, filtered with 30 µm CellTrics paper (Sysmex Partec, Kobe, Japan) and stained with 50 µg/mL propidium iodide (Fluka). After 5 min incubation at 4°, cytometric analysis was performed on 20,000 nuclei using a CyFlo3 cytometer (Sysmex Partec, Kobe, Japan). Intensity ratio between 1C fungal nuclei and 2C (= 0.33 pg, corresponding to 326 Mbp) A. thaliana nuclei was determined to measure the size of fungal genomes (Dolezel ).

Gene and transposable element prediction and annotation

Genes of the EU-B04 reference genome were predicted using the Braker1 pipeline (Hoff ). RNA-alignments to the EU-B04 genome from all the in vitro conditions described above were used to provide extrinsic hints for gene prediction in this pipeline. Genes in all other genomes were predicted using AUGUSTUS version 3.2.3 (Stanke ) utilizing the species training parameter files created in the Braker1 pipeline above. Secreted proteins in the predicted proteome were annotated with SignalP version 4.1 (Petersen ) using default parameters. Small-secreted proteins (SSPs) were identified as secreted proteins with a maximum of 300 amino acids, containing no Glycosylphosphatidylinositol (GPI) anchor motif according to Big-PI (default parameters; Eisenhaber ) and PredGPI (False-Positive rate lower than 1e-3; Pierleoni ), and no transmembrane domain in the mature form according to TMHMM (default parameters; Krogh ). The REPET pipeline (urgi.versailles.inra.fr/Tools/REPET) was used for the detection, classification (TEdenovo; Flutre ; Hoede ) and annotation (TEannot; Quesneville ) of transposable elements (TEs). The TEdenovo pipeline was used to search for repeats in V. inaequalis EU-B04 genome sequence. The first step used Blaster with the following parameters: identity > 90%, HSP (high-scoring segment pairs) length >100 bp and <20 kb, e-value ≤ 1e-300. HSPs were clustered using three different methods: Piler (Edgar and Myers, 2005), Grouper (Quesneville ) and Recon (Bao and Eddy, 2002). The MAP software (Huang, 1994) was used to make multiple alignments of the 20 longest members of each cluster containing at least 3 members and build a consensus. Consensus sequences were then classified on the basis of their structure and similarities relative to Repbase Update (v17.11) (Jurka ) and PFAM domain library v2 (Finn ), before redundancy removal with Blaster and Matcher as performed in the TEdenovo pipeline. Consensus sequences with no known structure or similarity were classified as “unknown”.

Phylogenetic species tree

All strains of V. pirina, V. asperata and V. aucupariae, and one isolate from each population of V. inaequalis were used to build a phylogenetic tree. The Dothideomycetes Zymoseptoria tritici IPO323 and Alternaria alternata SRC1lrK2f (genome.jgi.doe.gov) were used as outgroups. The DNA sequence of β-tubulin TUB1 (KOG1375), eukaryotic initiation factor 3, subunit b eIF3b (KOG2314), phosphoadenylphosphosulfate reductase PAP (KOG0189) and ribosomal protein S9 RPS9 (KOG1753) genes were retrieved in all genomes and each were aligned using MUSCLE (Edgar, 2004). Regions with low quality alignment were removed using Gblocks (default parameters; Castresana, 2000) prior to concatenation of the four alignments. A maximum-likelihood tree was built with MEGA6 (Tamura ) using the best-fit Tamura-Nei substitution model with 5 gamma distribution categories and 500 bootstrap replicates.

Data availability

All genome sequencing data generated in this study are accessible (assembly fasta files) in GenBank under the BioProject number PRJNA407103. Supplemental material available at FigShare: https://doi.org/10.25387/g3.7346714.

Results and Discussion

Phylogeny of Venturia species

Phylogenetic relationships between the different Venturia species were determined using the Dothideomycete species Zymoseptoria tritici and Alternaria alternata as outgroup to the Venturiales order (Zhang ; Bock ). The maximum likelihood tree shows that V. asperata is basal to the other Venturia species (Figure 1A). V. aucupariae has diverged more recently from V. inaequalis than V. pirina. Phylogenetic relationships among V. inaequalis strains from different populations cannot be accurately resolved, probably because of gene flow between populations (Gladieux ; Lemaire ; Leroy ).

Figure 1

Phylogeny of Venturia species and genome characteristics. A) A four-gene maximum-likelihood phylogenetic tree was built using the Dothideomycete species Alternaria alternata and Zymoseptoria tritici as outgroup. Numbers on the branches indicate bootstrap values. The name of the isolates indicates the species (Vina: V. inaequalis; Vauc: V. aucupariae; Vpir: V. pirina; Vasp: V. asperata), name in the collection and, for V. inaequalis isolates, the population of origin (see Table S1). B) Assembly size is shown for isolates used in the phylogenetic analysis. The size of the assembly covered by transposable elements was determined using the REPET pipeline (urgi.versailles.inra.fr/Tools/REPET). Red lines indicate actual genome sizes as determined by flow cytometry. Stars indicate that the genome size value is the average of other isolates from a given population. C) The GC composition in each genome was determined using OcculterCut (Testa ). D) Gene models were predicted using Braker1 (Hoff ). Small-secreted proteins (SSPs) contain a predicted leader signal peptide, do not contain a transmembrane domain or GPI-anchor signal, and are shorter than 300 amino acids. Numbers on the right indicate the fraction of the predicted proteome that is represented by SSPs.

Genome assemblies

The assembly size of the different V. inaequalis strains varies greatly from 30 to 73 Mb, with a median of 7,435 scaffolds (from 66 to 31,301), an N50 median of 55.7 kb and an estimated sequencing coverage ranging from 8.8x to 150x (Figure 1B and Table S1), consistent with previous sequencing of V. inaequalis strains (Shiller ; Deng ; Passey ). As expected, the best assembly result was obtained for the EU-B04 reference strain with PacBio sequencing (66 scaffolds, N50 = 2,978 kb). Genomic DNA library construction likely explains the variability in assembly size as certain protocols might have excluded long repetitive regions. It is especially the case of CAM and CAP strains that both exhibit a genome size half the size of genomes of strains found on apple trees. In this case, library construction kits included PCR amplification that might have filtered out the major part of AT-rich regions (Aird ), a bias already reported in other fungi harboring an AT-rich genome (Oyola ). The actual genome size of V. inaequalis strains is over 80 Mb as measured using flow cytometry (average size of 85.1 ± 5.0 Mb; Figure 1B and Table S1). The measured genome sizes suggest that scaffolds in the PacBio assembly are separated by extremely long stretches of repetitive elements. The average assembly size for V. aucupariae, V. pirina and V. asperata strains is 52, 38 and 34 Mb, respectively (Figure 1B and Table S1). The genome of these species appeared to really contain fewer repeats as library construction and sequencing strategy for these genomes was similar to those of V. inaequalis genomes for which repetitive elements were sequenced. However, flow cytometry determined genome sizes of 63.6 ± 1.6 Mb, 70.6 ± 2.7 Mb and 46.1 ± 1.2 Mb for V. aucupariae, V. pirina and V. asperata, respectively (Figure 1B and Table S1). The former two species appear to contain a significant amount of repetitive elements that could not be deduced from their genome assemblies likely because Illumina sequencing tends to collapse repeat regions (Shiller ; Deng ). The genome structure of V. inaequalis is also a mosaic of GC-equilibrated and AT-rich regions as shown for the EU-B04 reference isolate (Figure 2). Indeed, the GC-content distribution in all genomes exhibits a bimodal pattern typical of genomes with large proportions of AT-rich regions (Figure 1C; Testa ). These regions cover on average 45.3% of V. inaequalis genome assemblies with a size over 60 Mb, while they cover on average 32.5%, 6.6% and 9.9% of V. aucupariae, V. pirina and V. asperata assembled genomes, respectively (Table S1). Considering the measured genome sizes, these values are likely underestimated and AT-rich regions may represent from 30% in V. asperata to 50% in V. pirina. The GC content of AT-rich regions in Venturia species ranges from 33.5 to 41.3% (Table S1), which is similar to the 33.8% reported for Leptosphaeria maculans (Testa ), another species with a mosaic genome structure of GC-equilibrated and AT-rich regions (Rouxel ).

Figure 2

Genome features of the Venturia inaequalis EU-B04 reference strain. Twenty-nine out of 66 scaffolds are shown, representing 95% of the total assembly. Scaffold number is indicated on the outer circle. Tracks show (A) Single-Nucleotide Polymorphism (SNP) density between isolates (black) and clones (yellow); (B) GC content (%), the orange line indicate 50%; (C) Gene density according to Braker1 gene prediction, magenta circles indicate the localization of small secreted proteins (SSPs). (D) transposable element density according to REPET analysis. (E) RIP (Repeat Induced Point mutation) index). (F) close-up on scaffold 1 showing SNP densities and GC content as described in (A) and (B) tracks, respectively, the dotted blue line indicating the GC content threshold used by OcculterCut (Testa ) to separate GC-quilibrated and AT-rich regions; the latter are highlighted with gray shading. (G) Difference in di-nucleotide frequency between GC-equilibrated and AT-rich regions in genomes of all isolates as determined by OcculterCut. Negative and positive values indicate over-representation in GC-equilibrated and AT-rich regions, respectively. Values for the EU-B04 strain reference assembly are indicated in yellow. Despite the difference in assembly size, the BUSCO assessment of gene content indicates a similar and very high level of completeness for most strains (median of 98% complete single copy genes). This result also indicates that most of the predicted genes are located in GC-equilibrated regions, which represent the most well sequenced and assembled in our dataset. Six strains show a retrieval rate of BUSCO genes lower than 90% (Table S1), meaning that genome sequencing and/or assembly quality is very low and these genomes should be used with caution in further analyses.

SNP discovery and population structure of Venturia inaequalis

Using multiple sequence alignments of the identified BUSCO genes, we computed genetic distances between isolates and identified six pairs of strains and one triplet with very low divergence, suggesting they were clones (2225/2446, 2226/2450, 2447/2458, 2474/2475, 2478/2480, 186/2269 and 2503/2504/2263). Consistently, fewer SNPs were discovered between these pairs compared to other strains from the same populations (Table 1). We could use this information to perform a SNP false discovery analysis because we expect de novo mutations to be rare between clones of V. inaequalis as it reproduces sexually each year (Bowen ). Our data suggests that 25–50% of SNPs identified without masking genomes are false positives (Table 1). About 90% of SNPs identified between clones are located throughout the genome within AT-rich regions (Table 1, Figure 2A, 2D and 2F). Reads that originate from repetitive elements sharing high sequence similarity are likely to be inaccurately mapped on the reference genome, which could explain the vast majority of SNPs observed between clones. In contrast, GC-equilibrated regions appear to contain a higher proportion of SNPs discovered between non-clonal strains, which are enriched on certain scaffolds (Table 1, Figure 2A and 2F). Masking AT-rich regions as determined by OcculterCut eliminated between 88.7% and 96.6% of false positive SNPs for a saving of 16.5–48.4% of the initial SNPs between non-clonal strains (Table 1). The proportion of retained SNPs reached 52% in the global dataset of 12.7 million SNPs identified between all strains. The fact that GC-equilibrated regions still contain 3.4–11.3% of false positive SNPs is likely explained by the expansion of gene families through gene duplication in the genome of Venturia species (Table 1; Shiller ; Deng ). This observation suggests that masking AT-rich regions or transposable elements alone often may be not enough to reach an acceptable FDR. As reported here, sequencing two clones or twice the same individuals easily allows detecting regions with high level of FDRs and is of great help to define hard filtering and/or SNP recalibration settings, especially for highly repetitive or polyploid genomes of sexual organisms.

Table 1

Single Nucleotide Polymorphism (SNP) false discovery analysis in Venturia inaequalis populations. Isolates originate from populations isolated on loquat (LOQ), firethorn (PYR), Malus sylvestris (SYL) and on Malus sieversii in Central Asian Mountains (CAM) and Central Asian Plains (CAP)

	LOQ population			CAM population			CAP population			SYL population			PYR population
	2503	2503	FDR (a)	2225	2225	FDR (a)	2475	2229	FDR (a)	2478	2478	FDR (a)	186	2266	FDR (a)
	2504(c)	2505		2446(c)	2447		2474(c)	2474		2480(c)	2479		2269(c)	2269
	Total number of SNPs	130,507		333,955	39.1%		94,845	341,581		27.8%	125,654		522,134	24.1%		120,622	393,845	30.6%	125,196	257,874	48.5%
SNPs in AT-rich region (b)	120,670	201,303	59.9%	88,345	205,100	43.1%	116,931	269,189	43.4%	113,284	328,956	34.4%	118,700	196,239	60.5%
SNPs in GC-equilibrated regions (b)	9,837	132,652	7.4%	6,500	136,481	4.8%	8,723	252,945	3.4%	7,338	64,889	11.3%	6,496	61,635	10.5%

FDR: False Discovery Rate.

GC-equilibrated and AT-rich regions (51.4% and 48.6% of the EU-B04 reference genome, respectively) were determined using OcculterCut (Testa ).

indicate clonal isolates.

FDR: False Discovery Rate. GC-equilibrated and AT-rich regions (51.4% and 48.6% of the EU-B04 reference genome, respectively) were determined using OcculterCut (Testa ). indicate clonal isolates. Once cleared of the false positive SNPs, our dataset allowed a clear discrimination of the V. inaequalis strains according to their host of origin as shown by a principle component analysis (Figure 3). This result is highly congruent with our phylogenetic analysis (Figure 1A) and with previous studies (Gladieux ).

Figure 3

Population structure of Venturia inaequalis. Principle components analysis of 71 V. inaequalis strains sampled on Malus sp. (green), Eriobotrya japonica (in red), Pyracantha sp. (blue), and Sorbus aucuparia (purple). The first axis differentiates strains sampled on apple trees. The second axis differentiates strains sampled on Pyracantha sp. from those sampled on E. japonica and S. aucuparia.

Gene predictions

Most of V. inaequalis, V. pirina and V. aucupariae strains contain around 11,600 predicted genes, while V. asperata strains contain fewer predicted gene models, around 10,400 (Figure 1D and Table S1). Virulence of plant pathogens relies on the secretion of small-secreted proteins (SSPs) that are often rich in cysteine residues (Lo Presti ). The number of predicted SSPs of less than 300 amino acids ranges between 527 and 756 in V. inaequalis and V. pirina strains, representing between 4.8% and 6.1% of the predicted proteome (Figure 1D and Table S1). The PacBio assembly of the V. inaequalis EU-B04 isolate certainly explains the much higher number of predicted SSPs and percentage of the proteome. Despite a similar total number of predicted genes, the V. aucupariae isolate contains fewer SSPs (461; Figure 1D). In addition to a slightly reduced gene complement, V. asperata strains contain fewer predicted SSPs that represent less than 4% of the predicted proteome (between 395 and 427; Figure 1D and Table S1). Detailed analysis of the V. inaequalis and V. pirina predicted secretome identified specific SSPs putatively involved in host specificity and recognition by apple resistant genes (Deng ). Similarly, in our report, differences in gene complement might explain the host range of the different species, especially the host specialization of V. inaequalis formae specialis and the low virulence of V. asperata on apple (Caffier ). In the PacBio assembly, genes tend to be located within GC-equilibrated regions (Figure 2B and 2C). Although SSPs of V. inaequalis were reported closer to repetitive elements compared to other genes (Deng ), SSPs do not tend to be located within long AT-rich regions as observed in L. maculans (Figure 2C and 2D; Rouxel ). Further detailed analyses of specific gene families, especially SSPs, carbohydrate degrading enzymes and secondary metabolite gene clusters will provide new insights about the pathogenicity and host specificity of Venturia species.

Invasion of transposable elements

The repetitive element content across all genomes was annotated using the REPET pipeline (Figure 1B; Table S1). The transposable element (TE) content covers 55.7% of the reference EU-B04 genome assembly, of which 66.4% are Class I LTR element, mainly assigned to Gypsy or Copia superfamilies (37.5% and 26.3%, respectively); and 9.5% are Class II elements, mostly belonging to the TIR superfamily (8.2%) (Figure S1). Consistent with the exclusion of repeat content from CAM and CAP populations DNA libraries, only a median of 8.5% of the assemblies are covered with TEs (Figure 1B and Table S1). In other V. inaequalis genomes, the median coverage of TEs is 41.7% (Figure 1B and Table S1). TE coverage in the V. aucupariae assembly is similar, with 17.4% (Figure 1B and Table S1). In contrast, TE coverage is very limited in V. pirina (4.1% on average) and V. asperata (3.4% on average), consistent with the exclusion of repeat content from genomic assemblies of the former and the limited number of TEs in the latter (Figures 1B and Table S1). The repetitive content of V. asperata strains differs from others as it does not contain Helitron and LINE elements, and is almost devoid of Copia elements (Figure S1 and Table S2). Gypsy and Copia TEs are predominant in all other Venturia strains. Notably Copia elements and class II TEs, especially TIR elements, appear to have greatly expanded after the divergence with V. asperata (Table S2 and Figure S1). Hierarchical clustering of the coverage of TE models annotated y REPET suggests recent expansion of 32 Copia and Gypsy models in all Venturia species (Table S3 and Figure S2); particularly two models (RLG-comp_VeinUTG-B-G309-Map3 and noCat_VeinUTG-B-R23-Map20) exhibits high coverage and are likely still actively transposing (Table S3 and Figure S2). A second cluster of 31 models might represent ancestral expansion of currently inactive TEs and a third cluster contains models with limited coverage (Figure S2). A few elements appear to be actively multiplying in certain genomes. For example, one Gypsy element (RLG-comp_VeinUTG-B-G1005-Map14) exhibits higher coverage in V. asperata, V. aucupariae and, to a lesser extent, in V. pirina compared to V. inaequalis strains (Table S3 and Figure S2), while another Gypsy element (RLG-comp_VeinUTG-B-R34-Map20) and a LARD element (RXX-LARD_VeinUTG-B-G646-Map11) show highest coverage in V. inaequalis 2199 and V. inaequalis B04 strains, respectively (Table S3 and Figure S2). In the V. inaequalis B04 reference genome, TEs are mostly located within AT-rich regions (Figure 2B and 2D), but many appear to be located in close vicinity to genes as reported for SSPs (Deng ). In fungi, TE activity can be suppressed by Repeat-Induced Point mutation (RIP), a process that induces in repetitive DNA cytosine to thymine transition mutations during heterokaryon formation prior to meiosis (Watters ). As such, RIP is a mechanism that produces AT-rich regions. Evaluation of RIP in Venturia strains showed that most TEs have been mutated through RIP (Figure 2E and Table S1). AT-rich regions in all strains are enriched in TpA di-nucleotides (Figure 2G), a feature typical of RIP in non-coding regions (Testa ). Detailed analysis of the dynamics of TEs will determine their impact on the different interactions between Venturia species and their diverse hosts.

Conclusions

Strains of the Venturia species complex are responsible for important diseases on Rosaceae fruit trees, as such they constitute an environmental and public health concern due to the massive use of fungicides necessary for their control. Our study provides valuable genomic resources for those interested in identification of molecular basis of Venturia species adaptation to their host and in mapping traits of interest (virulence, aggressiveness, resistance to fungicides etc…). In addition, these genomes will be used for comparative genomic studies to develop diagnostic molecular tools and to understand the evolution of genomes in relation to their hosts and in particular during the domestication of the apple tree.

67 in total

1. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes.

Authors: A Krogh; B Larsson; G von Heijne; E L Sonnhammer
Journal: J Mol Biol Date: 2001-01-19 Impact factor: 5.469

2. A simple and efficient protocol for isolation of high molecular weight DNA from filamentous fungi, fruit bodies, and infected plant tissues.

Authors: E M Möller; G Bahnweg; H Sandermann; H H Geiger
Journal: Nucleic Acids Res Date: 1992-11-25 Impact factor: 16.971

3. MUSCLE: multiple sequence alignment with high accuracy and high throughput.

Authors: Robert C Edgar
Journal: Nucleic Acids Res Date: 2004-03-19 Impact factor: 16.971

4. Structure, dynamics and domain organization of the repeat protein Cin1 from the apple scab fungus.

Authors: Carl H Mesarich; Michael Schmitz; Pierre Tremouilhac; Duncan J McGillivray; Matthew D Templeton; Andrew J Dingley
Journal: Biochim Biophys Acta Date: 2012-07-04

5. BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS.

Authors: Katharina J Hoff; Simone Lange; Alexandre Lomsadze; Mark Borodovsky; Mario Stanke
Journal: Bioinformatics Date: 2015-11-11 Impact factor: 6.937

6. Using native and syntenically mapped cDNA alignments to improve de novo gene finding.

Authors: Mario Stanke; Mark Diekhans; Robert Baertsch; David Haussler
Journal: Bioinformatics Date: 2008-01-24 Impact factor: 6.937

7. QUAST: quality assessment tool for genome assemblies.

Authors: Alexey Gurevich; Vladislav Saveliev; Nikolay Vyahhi; Glenn Tesler
Journal: Bioinformatics Date: 2013-02-19 Impact factor: 6.937

Review 8. A cytometric exercise in plant DNA histograms, with 2C values for 70 species.

Authors: D Marie; S C Brown
Journal: Biol Cell Date: 1993 Impact factor: 4.458

9. A framework for variation discovery and genotyping using next-generation DNA sequencing data.

Authors: Mark A DePristo; Eric Banks; Ryan Poplin; Kiran V Garimella; Jared R Maguire; Christopher Hartl; Anthony A Philippakis; Guillermo del Angel; Manuel A Rivas; Matt Hanna; Aaron McKenna; Tim J Fennell; Andrew M Kernytsky; Andrey Y Sivachenko; Kristian Cibulskis; Stacey B Gabriel; David Altshuler; Mark J Daly
Journal: Nat Genet Date: 2011-04-10 Impact factor: 38.330

10. Multiple recent horizontal transfers of a large genomic region in cheese making fungi.

Authors: Kevin Cheeseman; Jeanne Ropars; Pierre Renault; Joëlle Dupont; Jérôme Gouzy; Antoine Branca; Anne-Laure Abraham; Maurizio Ceppi; Emmanuel Conseiller; Robert Debuchy; Fabienne Malagnac; Anne Goarin; Philippe Silar; Sandrine Lacoste; Erika Sallet; Aaron Bensimon; Tatiana Giraud; Yves Brygoo
Journal: Nat Commun Date: 2014 Impact factor: 14.919