
IMA Genome-F 9: Draft genome sequence of Annulohypoxylon stygium, Aspergillus mulundensis, Berkeleyomyces basicola (syn. Thielaviopsis basicola), Ceratocystis smalleyi, two Cercospora beticola strains, Coleophoma cylindrospora, Fusarium fracticaudum, Phialophora cf. hyalina, and Morchella septimelata.

Brenda D Wingfield¹, Gerald F Bills², Yang Dong^3,4,5, Wenli Huang⁶, Wilma J Nel¹, Benedicta S Swalarsk-Parry¹, Niloofar Vaghefi⁷, P Markus Wilken¹, Zhiqiang An², Z Wilhelm de Beer¹, Lieschen De Vos¹, Li Chen², Tuan A Duong¹, Yun Gao⁸, Almuth Hammerbacher⁹, Julie R Kikkert¹⁰, Yan Li^2,11, Huiying Li¹², Kuan Li¹³, Qiang Li⁶, Xingzhong Liu¹³, Xiao Ma¹⁴, Kershney Naidoo¹, Sarah J Pethybridge⁷, Jingzu Sun^11,13, Emma T Steenkamp¹, Magriet A van der Nest¹, Stephanie van Wyk¹, Michael J Wingfield¹, Chuan Xiong⁶, Qun Yue^2,15, Xiaoling Zhang¹³.

Abstract

Draft genomes of the species Annulohypoxylon stygium, Aspergillus mulundensis, Berkeleyomyces basicola (syn. Thielaviopsis basicola), Ceratocystis smalleyi, two Cercospora beticola strains, Coleophoma cylindrospora, Fusarium fracticaudum, Phialophora cf. hyalina and Morchella septimelata are presented. Both mating types (MAT1-1 and MAT1-2) of Cercospora beticola are included. Two strains of Coleophoma cylindrospora that produce sulfated homotyrosine echinocandin variants, FR209602, FR220897 and FR220899 are presented. The sequencing of Aspergillus mulundensis, Coleophoma cylindrospora and Phialophora cf. hyalina has enabled mapping of the gene clusters encoding the chemical diversity from the echinocandin pathways, providing data that reveals the complexity of secondary metabolism in these different species. Overall these genomes provide a valuable resource for understanding the molecular processes underlying pathogenicity (in some cases), biology and toxin production of these economically important fungi.


Keywords: Beta vulgaris; Carya cordiformis; Pitch canker; echinocandin gene clusters; mulundocandins; pneumocandins

Year: 2018 PMID: 30018880 PMCID: PMC6048567 DOI: 10.5598/imafungus.2018.09.01.13

Source DB: PubMed Journal: IMA Fungus ISSN: 2210-6340 Impact factor: 3.515

INTRODUCTION

Annulohypoxylon stygium (Xylariales, Ascomycota) is a white-rot fungus commonly found on dead wood (Hsieh ). It is highly efficient at degrading lignin and carbohydrates. Some species of Annulohypoxylon may be used in the cultivation of Tremella fuciformis, one of the foremost medicinal and culinary fungi of China (Stamets 2000). Tremella fuciformis, the white jelly mushroom, is a symbiotic fungus that does not form an edible basidiome without the presence of a specific host fungus (Li ). Its preferred host has traditionally been referred to as “Xianghui” in China. Recently, A. stygium was identified as the main Xianghui species, and this has been confirmed experimentally (Deng ). Cultivators usually pair cultures of T. fuciformis with this species, because the formation of T. fuciformis basidiomes is highly dependent on the presence of the specific host fungus, both in nature and in industrial production. To date, the symbiotic mechanism between A. stygium and T. fuciformis is not understood. The genome sequence of A. stygium generated in this study may help to reveal this mechanism.

SEQUENCED STRAIN

China: Sichuan: Tongjiang, N 31°42’, E 120°17’, alt. 1523 m, isolated from dead wood, 8 Aug. 2015, Qiang Li & Chuan Xiong (MG137 – dried culture).

NUCLEOTIDE SEQUENCE ACCESSION NUMBER

The Whole Genome Shotgun project for this isolate (culture collection number SAAS137) has been deposited at DDBJ/EMBL/GenBank under accession number PYLT00000000. The version described in this paper is version PYLT01000000.

MATERIALS AND METHODS

Annulohypoxylon stygium MG137 was isolated from dead wood in Tongjiang, Sichuan province, China, and was preserved in the Fungal Culture Collection, Center in Biotechnology and Nuclear Technology Research Institute, Chengdu, Sichuan, China. Genomic DNA was isolated from this isolate and sequenced on the Genome Analyzer IIx next-generation sequencing platform (Illumina) at the Beijing Genomics Institute (Shenzhen, China). Paired-end libraries with respective insert sizes of 425 bp and 725 bp were used to generate reads of 150 bases. CLC Genomics Workbench v. 6.0.1 (CLCBio, Aarhus, Denmark) was subsequently used to trim reads of poor quality (limit of 0.05) as well as terminal nucleotides. The remaining reads were assembled using SPAdes v. 3.0.0 with an optimized k-mer value of 103 (Bankevich ). Thereafter, scaffolding was completed using SSPACE v. 2.0 (Boetzer ) and gaps were reduced using GapFiller v. 2.2.1 (Boetzer & Pirovano 2012). The completeness of the assembly was evaluated using BUSCO v. 3 (Simão ). Homology-based and ab initio gene prediction were both used to identify A. stygium gene models. Homologous proteins from Laccaria bicolor were aligned to the repeat-masked A. stygium genome using Exonerate v. 2.2.0 (Slater & Birney 2005). The filtered alignments (above 300 bp and 90 % coverage) were used to build training models for ab initio gene prediction. The ab initio prediction was conducted using Augustus v. 3.2.3 (Stanke ) and GeneMark-ES (Ter-Hovhannisyan ), guided by the training models from the homology-based alignments. All gene prediction results were integrated into the final gene models by EVidenceModeler (Haas ). Carbohydrate-active enzymes (CAZymes), including the repertoire of auxiliary enzymes, were predicted using dbCAN (Yin ).

RESULTS AND DISCUSSION

The genome of A. stygium had an estimated size of 47.5 Mb with an average coverage of 31.26× (Table 1). The N50 size was 598 310 bases, and the assembly had a mean GC content of 46 %. A total of 1854 scaffolds were generated. MAKER predicted a total of 12 498 genes with an average length of 1662 bp. The average gene density of A. stygium was 263 genes/Mb. A phylogenetic analysis of the genus Annulohypoxylon and the closely related genus Hypoxylon is presented to show the position of this genome (Fig. 1).
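As a reminder of what the scaffold N50 statistic measures, the following minimal sketch computes it from a list of scaffold lengths: the lengths are sorted from longest to shortest and accumulated until at least half of the total assembly length is covered. The function name and the toy lengths are illustrative assumptions and are not taken from the A. stygium assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the assembly.
    """
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (hypothetical, for illustration only).
toy_scaffolds = [80, 70, 50, 40, 30, 20]
print(n50(toy_scaffolds))  # 80 + 70 = 150 covers half of 290, so N50 is 70
```

Applied to the full list of scaffold lengths of an assembly, the same routine would return a value comparable to the scaffold N50 reported in Table 1.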

Table 1.

Whole-genome sequence assembly statistics for Annulohypoxylon stygium MG137. The genome of A. stygium MG137 was generated using next-generation sequencing technology.

Genome	Annulohypoxylon stygium MG137
Coverage	31.26
BUSCO	96.6%
Total sequence length (Mb)	47.5
Scaffolds	1854
Scaffold N50	598 310
GC (%)	46
Predicted gene models	12 498

Predicted CAZymes
Total CAZymes	757
Auxiliary activities	153
Pectate lyases	13
Glycosyltransferases	106
Glycoside hydrolases	297
Carbohydrate esterases	125
Carbohydrate binding motifs	63

Predicted secondary metabolite (SM) clusters
Total SM clusters	90
Type I polyketide synthases (PKSs)	36
Type III PKSs	1
Nonribosomal peptide synthetases (NRPSs)	21
Terpene clusters	10
Hglks	0
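The gene density quoted in the text follows directly from two rows of Table 1 (predicted gene models divided by total sequence length). The short sketch below reproduces that arithmetic; the function name is an illustrative assumption.

```python
def gene_density(gene_count, assembly_size_mb):
    """Return genes per megabase for an assembly."""
    if assembly_size_mb <= 0:
        raise ValueError("assembly size must be positive")
    return gene_count / assembly_size_mb


# Values from Table 1: 12 498 predicted gene models, 47.5 Mb total length.
density = gene_density(12498, 47.5)
print(round(density))  # 263 genes/Mb, as reported in the text
```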

Fig. 1.

Maximum Likelihood (ML) phylogenetic analysis of the genus Annulohypoxylon and the closely related genus Hypoxylon using MEGA 6.06, based on partial gene sequences of β-tubulin. Bootstrap values were calculated using 1000 replicates to assess node support. The Annulohypoxylon stygium sequences used for verification were extracted from the assembled genome. Reference sequences were obtained from the NCBI database and are listed with their accession numbers.

The draft genome of A. stygium is larger than those of the allied species Xylaria hypoxylon OSC100004 and Hypoxylon sp. CI-4A (Wu ), which are 42.9 Mb and 37.7 Mb, respectively. The genome is closer in size to those of Hypoxylon sp. CO27-5 and Hypoxylon sp. EC38 (Wu ), which have genome sizes of 46.6 Mb and 47.7 Mb, respectively. Annulohypoxylon stygium also has a similar number of putative genes when compared to Hypoxylon sp. EC38 (12 261 predicted gene models) and Hypoxylon sp. CO27-5 (12 256 predicted gene models). A total of 757 CAZymes were identified in the genome of A. stygium, more than in the closely related Hypoxylon sp. CO27-5 (599 CAZymes) and Hypoxylon sp. CI-4A (526 CAZymes). The number of CAZymes in A. stygium was much higher than that in Tremella encephala (265 CAZymes; Magnuson et al. 2017) and T. mesenterica (206 CAZymes; Floudas ), suggesting that A. stygium may assist Tremella species in the degradation of lignin and carbohydrates in nature or in industrial production. The genome sequence data of A. stygium from this study will provide useful information for understanding the mechanism of the symbiotic interaction between A. stygium and T. fuciformis.

A strain of Aspergillus (Y-30462 = DSMZ 5745) was isolated at Hoechst India, then located in the Mulund district of Mumbai, India, from a soil sample collected in Bangladesh (Mukhopadhyay , Roy ). In the original publication, the fungus was described as an unusual variant of A. sydowii because of the presence of abundant Hülle cells and was published without a Latin description or type specimen as “A. sydowii var. mulundensis”. This strain was subsequently re-examined using multi-gene phylogenetic analysis, chemotaxonomic markers, and morphological data and was determined to represent a novel species within Aspergillus sect. Nidulantes (Bills , Chen ). The primary objective for sequencing the genome of A. mulundensis was the identification of the gene cluster encoding the biosynthesis of the mulundocandins (Yue ). Mulundocandin and deoxymulundocandin (Fig. 2) are lipohexapeptides and potent antifungal antibiotics of the echinocandin class (Mukhopadhyay , Roy , Mukhopadhyay ). Biosynthetically, they are closely related to echinocandin B, but they differ in the substitution of serine for threonine in the fifth position of the hexapeptide core and in having a 12-methyl myristoyl side chain instead of a linoleoyl side chain. Mulundocandin and deoxymulundocandin have been investigated extensively as potential lead structures for the development of echinocandin-type antifungal drugs (Mukhopadhyay , Hawser , Lal , 2004). This draft genome will expand genomic data sets for comparative genomics of species in Aspergillus sect. Nidulantes.

Fig. 2.

Some naturally occurring echinocandins described in the patent literature.

Bangladesh: unknown location, isolated from soil at Hoechst (Mumbai, India) (Hoechst Y-30462 = DSMZ 5745 = CBS 140610 = IBT 33104).

The Aspergillus mulundensis isolate DSMZ 5745 Whole Genome Shotgun project has been deposited in GenBank under the accession number PVWQ00000000.

For methods for DNA extraction, sequencing, and genome assembly and annotation, see Bills .

The genome of DSMZ 5745 was sequenced to 100-fold coverage, yielding 160 scaffolds with an N50 of 2.8 Mb (Table 2). The assembled genome size was 45 Mb, and a total of 11603 genes were predicted. The GC content of this genome is 43.2 %. The genome contains 53 core catalytic genes associated with putative secondary metabolite biosynthetic gene clusters. These include 25 PKSs, 19 NRPSs, one PKS-NRPS hybrid, four dimethylallyl tryptophan synthases, and four terpene synthases. These genes are distributed among 45 putative gene clusters that also include genes encoding tailoring enzymes, regulators, transporters, and other auxiliary genes. In addition to these gene clusters, 14 secondary metabolite gene clusters containing PKS-like or NRPS-like enzyme genes, or other secondary metabolism-related genes, were identified by antiSMASH. In addition, a gene cluster containing close orthologues of the pneumocandin gene cluster from Glarea lozoyensis (Yue ) was recognized and predicted to be responsible for the biosynthesis of the mulundocandins. The nuclear-encoded secondary metabolomes of A. mulundensis and A. nidulans FGSC A4 were compared previously (Bills ). A phylogenetic tree reflecting the position of this genome in relation to other Aspergillus species is presented (Fig. 3).
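The secondary metabolite counts given for A. mulundensis DSMZ 5745 can be cross-checked against each other and against Table 2. The sketch below simply tallies the figures stated in the text; the dictionary keys and variable names are illustrative assumptions.

```python
# Core catalytic gene counts as stated in the text for DSMZ 5745.
core_genes = {
    "PKS": 25,
    "NRPS": 19,
    "PKS-NRPS hybrid": 1,
    "DMATS": 4,
    "terpene synthase": 4,
}
total_core_genes = sum(core_genes.values())

# Clusters holding core genes plus the additional clusters with
# PKS-like, NRPS-like or other secondary metabolism-related genes.
clusters_with_core_genes = 45
additional_clusters = 14
total_clusters = clusters_with_core_genes + additional_clusters

print(total_core_genes)  # 53 core catalytic genes
print(total_clusters)    # 59 clusters, matching the DSMZ 5745 column of Table 2
```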

Table 2.

General features of the genomes of Coleophoma cylindrospora BP6252 and BP5796, Phialophora cf. hyalina BP5553, and Aspergillus mulundensis DSMZ 5745.

Features	BP6252	BP5796	BP5553	DSMZ 5745
Assembly size (Mb)	42.4	40.4	33.6	45
Scaffolds	77	45	32	160
Scaffold N50 (Mb)	2.3	2	3.8	2.8
Coverage (fold)	100	100	102	100
G+C content (%)	48.7	48.5	48.2	43.2
Protein-coding genes	14177	13257	10707	11603
Gene density (per Mb)	337.55	331.42	324.45	257.84
Exons per gene	3.15	3.13	3.12	3.02
PKSs	15	15	19	25
NRPSs	8	6	13	19
PKS-NRPS hybrids	0	0	6	1
DMATSs	2	2	2	4
Terpene synthases	1	1	4	4
Chalcone or stilbene synthase gene	0	0	1	0
Secondary metabolite gene clusters	30	28	48	59

Fig. 3.

Maximum Likelihood tree of ex-type and authentic strains of Aspergillus sect. Nidulantes (25 strains) inferred from an alignment of the concatenated sequences of the ITS-28S rDNA, ribosomal polymerase II, β-tubulin, and calmodulin genes. Data were resampled from Chen . DSMZ 5745 is labelled in red, and A. unguis was positioned as the outgroup. The Maximum Likelihood tree was based on the Tamura-Nei model. The tree with the highest log likelihood (–13 959.85) is shown. Branches are labelled with the percentage of trees in which the associated taxa clustered together. A discrete gamma distribution was used to model evolutionary rate differences among sites (5 categories (+G, parameter = 0.2371)). Branch lengths are measured in the number of substitutions/site. The dataset included 3329 positions. Data were analyzed in MEGA7 (Kumar ). "Type" = ex-type cultures.

Berkeleyomyces basicola (Ascomycota: Microascales), previously known as Thielaviopsis basicola (Nel ), is an important plant pathogen responsible for root rot of many important agricultural and ornamental plants (Johnson 1916, Stover 1950, Nehl , Pereg 2013). Since its description in the mid-1800s (Berkeley & Broome 1850), there has been considerable debate surrounding its appropriate taxonomic placement, resulting in numerous name changes. The phylogenetic re-evaluation of Ceratocystidaceae by De Beer raised new questions regarding the appropriate taxonomic placement of the species. Their results suggested that T. basicola did not group in Thielaviopsis or any other genus described in the family. Because the authors included only the sequence data of a single isolate in their analyses, they concluded that no taxonomic changes could be made without further study. In a recent investigation, Nel confirmed that T. basicola represented a distinct generic lineage in Ceratocystidaceae and introduced the new generic name Berkeleyomyces. In addition, they showed that isolates of Berkeleyomyces represented two cryptic sister species, for which they provided the names B. basicola and B. rouxiae. The aim of this study was to generate a high-quality genome sequence for B. basicola. This would allow for comparisons to be made with the available genomes of other species in Ceratocystidaceae, including those in the genera Ceratocystis, Huntiella, Davidsoniella, Thielaviopsis, Chalaropsis, Endoconidiophora and the recently described Bretziella (De Beer ). Here we report the complete genome sequence of isolate CMW 49352, the designated reference specimen for B. basicola logged in CBS (Westerdijk Fungal Biodiversity Institute, Utrecht, The Netherlands), and the culture collection of the Forestry and Agricultural Biotechnology Institute (CMW), University of Pretoria, South Africa.

The Netherlands: South Holland, Boskoop, isol. ex Betula sp., June 1974. S.G. De Hoog (CMW 49352 = CBS 142796; PREM 62125 = dried culture).

The draft genome sequence of Berkeleyomyces basicola (CMW 49352 = CBS 142796) has been deposited at DDBJ/ENA/GenBank under the accession number PJAC00000000. The version presented here is PJAC00000000.

Genomic DNA was extracted from lyophilized mycelium of Berkeleyomyces basicola isolate CMW 49352 grown in malt yeast broth (2 % malt extract, 0.5 % yeast extract; Biolab, Midrand, South Africa) using the method described by Duong . A paired-end library was prepared (350 bp average insert size) and sequenced using the Illumina HiSeqX platform. A mate-pair library was prepared (10 kb average insert size) and sequenced using the Illumina HiSeq2500 platform. Long reads were also generated using one cell of the single-molecule real-time (SMRT or PacBio) sequencing platform (Pacific Biosciences). All sequencing was conducted at Macrogen (Seoul, Korea). Quality and adapter trimming of paired-end and mate-pair reads was carried out using Trimmomatic v. 0.36 (Bolger ). De novo assembly of the genome was carried out using SPAdes v. 3.9 (Bankevich ) with all paired-end, mate-pair and PacBio data. Contigs smaller than 500 bp were removed from the dataset. Initial scaffolding was done using SSPACE-standard v. 3.0 (Boetzer ) with the paired-end and mate-pair reads. A second round of scaffolding was done using SSPACE-Longread with the PacBio reads. Assembly gaps were filled using GapFiller v. 1.10 (Boetzer & Pirovano 2012) with the paired-end and mate-pair reads, and using PBJelly (English ) with the PacBio reads. Final genome polishing was done using Pilon (Walker ). Genome completeness was assessed with the Benchmarking Universal Single-Copy Orthologs (BUSCO v. 1.1b1) tool using the Ascomycota dataset (Simão ). The number of protein-coding genes was determined using Augustus v. 3.3.2 (Stanke ) with pre-optimised species models for Fusarium graminearum.

The paired-end, mate-pair, and PacBio sequencing yielded 431 141 384, 60 673 400 and 42 422 reads, respectively. The final assembly consisted of 81 contigs, with the largest around 3.8 Mb and an N50 of 1.2 Mb. The estimated size of the genome is around 25.1 Mb with a GC content of 52 %. This estimated size is similar to that of other species in Ceratocystidaceae, which range between 25.4 Mb for Huntiella moniliformis and 33.6 Mb for Davidsoniella virescens (Wilken , Van der Nest , b, Wingfield , b, 2016a, b). The phylogenetic position of Berkeleyomyces basicola is presented in Fig. 4.
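The length filter applied during the B. basicola assembly (contigs smaller than 500 bp were removed) can be illustrated with a short, dependency-free sketch. This is not the authors' script: the FASTA parser, the function names and the toy records are assumptions made for illustration only.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_length=500):
    """Keep only records whose sequence is at least min_length bases long."""
    return [(name, seq) for name, seq in records if len(seq) >= min_length]


# Hypothetical two-contig assembly: one contig of 600 bp, one of 120 bp.
toy_fasta = io.StringIO(
    ">contig_a\n" + "A" * 600 + "\n>contig_b\n" + "C" * 120 + "\n"
)
kept = filter_by_length(read_fasta(toy_fasta), min_length=500)
print([name for name, _ in kept])  # only contig_a passes the 500 bp cut-off
```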

Fig. 4.

Maximum Likelihood (ML) phylogram derived from the analyses of the partial MCM7 gene sequences for species in Ceratocystidaceae. CLCbio Genomics Workbench v. 9.5 (CLCbio, QIAGEN, Aarhus, Denmark) was used to screen the genome of B. basicola isolate CMW 49352 to identify and extract the MCM7 gene using an available reference sequence for the gene from B. basicola (Accession: MF967102). A dataset was prepared based on the phylogenies of Nel and sequences were downloaded from NCBI GenBank. DNA sequence alignments of the dataset were done using the online version of MAFFT v. 7 (Katoh & Standley 2013). The ML analyses were performed in MEGA v. 6.06 (Tamura ) using the GTR model. Values shown at nodes are confidence values >75 %. The sequence from the B. basicola genome is indicated in bold.

BUSCO analysis predicted an assembly completeness of 97.4 %. The assembly contained 1280 complete single-copy BUSCOs, one complete and duplicated BUSCO, 10 fragmented BUSCOs and 24 missing BUSCOs out of a total of 1315 BUSCO groups searched. AUGUSTUS annotation predicted 10 074 putative coding regions, corresponding to around 401 ORFs/Mb. The availability of the genome for B. basicola will make possible genome comparisons with other species in Ceratocystidaceae and facilitate investigations into factors involved in pathogenicity, ecology, mating, and evolution of this important plant pathogen.

The genus Ceratocystis as defined by De Beer is a diverse assemblage of species that are best known as pathogens of angiosperm trees and commercially grown root crops (De Beer , Li , Seifert , Van Wyk ). Among these, C. fimbriata s. lat. is arguably the best-known pathogen, and has been associated with diseases of sweet potato (Halsted & Fairchild 1891), taro (Huang ), pomegranate (Somasekhara 1999) and kiwifruit (Piveta ). The related C. manginecans causes disease on mango and Acacia mangium trees in Oman and Pakistan (Al-Subhi , Al Adawi ), while C. eucalypticola is responsible for mortality of commercially planted eucalypt trees in South Africa (Van Wyk ). These fungi all share a very similar morphology, making their species boundaries difficult to determine (Fourie , Harrington ). In contrast, several other species in the genus are clearly defined, with universally accepted species status (Engelbrecht & Harrington 2005). These include C. albifundus (a pathogen of commercially propagated Acacia mearnsii and Protea cynaroides in South Africa; Lee ), C. cacaofunesta (causing cacao wilt in the Caribbean and Central and South America; Engelbrecht ), and C. smalleyi (agent of hickory decline in the USA; Johnson ). Ceratocystis smalleyi was first isolated from a hickory tree (Carya sp.) that had been infested by the hickory bark beetle Scolytus quadrispinosus (Johnson ). In 2005, C. smalleyi was formally named and described after additional isolates were collected from Carya trees that had been attacked by the hickory bark beetle across parts of the eastern US (Johnson ). The authors subsequently linked C. smalleyi with the decline of hickory through a possible association with the bark beetle S. quadrispinosus (Johnson ). Later studies have confirmed C. smalleyi as a pathogen of Carya species (Park , 2013), and established the close association between the fungus and the bark beetle (Juzwik ). This makes C. smalleyi the only known Ceratocystis species to be associated with a bark beetle. In other Ceratocystis species, the production of volatiles is linked to attracting insects for dispersal (Van Wyk , 2012). The specific association between C. smalleyi and the vector S. quadrispinosus would eliminate the need for producing volatile attractants, and could explain the inability of this species to produce the fruity odours characteristic of other Ceratocystis species (Harrington 2009; Johnson ). In this study, we aimed to produce a draft genome assembly for C. smalleyi. This would make C. smalleyi the seventh Ceratocystis species for which a genome sequence is published, and adds to the valuable genomic resource available for members of Ceratocystidaceae (Molano , Van der Nest , b, 2015, Vanderpool , Wilken , 2018, Wingfield , b, 2016a, b). Furthermore, the availability of a genome assembly will afford the opportunity in future to investigate aspects of the unique biology of C. smalleyi.

USA: Wisconsin: Hickory Ridge, isol. Carya cordiformis. Oct. 1993, G. Smalley (CMW 14800, CBS 114724, BPI 843722 – dried culture).

This Whole Genome Shotgun project for Ceratocystis smalleyi isolate CMW 14800 has been deposited at DDBJ/ENA/GenBank under accession NETT00000000. The version described in this paper is version NETT01000000.
Ceratocystis smalleyi isolate CMW 14800 was obtained from the culture collection of the Forestry and Agricultural Biotechnology Institute (FABI) and grown on 2 % malt extract agar (MEA: 2 % w/v, Biolab, South Africa) at 25 °C. A 14-d-old culture was used to isolate genomic DNA using a previously described phenol-chloroform protocol (Roux ). The isolated DNA was submitted for sequencing on an Illumina Genome Analyzer IIx at the UC Davis Genome Centre (University of California, Davis). For sequencing, paired-end libraries of 350 bp and 600 bp insert sizes were prepared and sequenced following the protocol provided by Illumina (www.illumina.com). The raw sequencing reads were imported into CLC Genomics Workbench v. 7.5.1 (CLCBio, Aarhus), and default settings were used both to trim the reads for quality and to produce a de novo genome assembly from the trimmed reads. Scaffolds were generated from the assembly using SSPACE v. 2.0 (Boetzer ), while GapFiller v. 2.2.1 (Boetzer & Pirovano 2012) was used to fill any gaps created during scaffolding. Sequencing coverage was estimated by mapping the trimmed sequencing reads to the contigs, while an estimate of the number of putative open reading frames (ORFs) was obtained through de novo gene prediction using the web-based version of AUGUSTUS and gene models from Fusarium graminearum (Keller ). The Benchmarking Universal Single-Copy Orthologs (BUSCO v. 1.22) tool was used in combination with the fungal data set to provide a quantitative measure of the level of genome completeness (Simão ). The 60S, LSU and MCM7 gene regions were extracted from the genome and, together with these regions from the recently sequenced species C. cacaofunesta (Molano ), T. punctulata isolate CMW1032 (Wilken ), H. savannae (Van der Nest ) and A. xylebori (Vanderpool ), were added to the Ceratocystidaceae dataset used for phylogenetic analysis by Wingfield . The resulting datasets were aligned using MUSCLE (Edgar 2004), concatenated, and used to construct a Maximum Likelihood phylogeny using PhyML 3.1 (Guindon ) based on model parameters estimated with jModelTest 2.1.10 (Darriba ).

The 27 311 342 bp Ceratocystis smalleyi genome was present in 2261 contigs, of which 1242 contigs were larger than 500 bp. The draft assembly yielded a genome with a G/C content of 50.6 %, an average coverage of 84× and 6682 predicted open reading frames at an average gene density of 245 ORFs/Mb. BUSCO analysis indicated a genome completeness of 97 %, with 1394 of the 1438 searched orthologs present as complete copies. Of these, 1330 occurred as single copies while 64 were duplicated. Of the remaining searched orthologs, 37 were fragmented and seven were missing from the genome assembly. The genome of C. smalleyi was comparable in size and gene content to that of other Ceratocystis species (Wingfield , 2016a, b). At 27.3 Mb, the C. smalleyi genome is slightly larger than that of the related species C. harringtonii (genome size of 26 Mb; Wingfield ), but smaller than the genome of C. manginecans (31.7 Mb; Van der Nest ). Gene densities for published Ceratocystis genomes range from 204–257 ORFs/Mb (Wingfield , 2016a, b), and the C. smalleyi gene density falls within this range. In contrast, the 50.6 % G/C content of the C. smalleyi genome is unusually high, with all other Ceratocystis species showing G/C contents below 49 % (Wingfield , 2016a, b). The availability of multiple Ceratocystis genomes (Fig. 5) provides the opportunity to study the genetic aspects that underlie ecological and life-style differences between members of this genus. Understanding these differences will also be crucial in explaining at least some of the variations in gene content, genome size, and G/C content evident among these genomes.
Currently, the published Ceratocystis genomes make up the bulk of the Ceratocystidaceae genome resource, with published genomes available for seven species (Fig. 5; Molano , Van der Nest , b, Wilken , Wingfield , 2016b). In addition, published genome sequences are available for five Huntiella species (Van der Nest , b, 2015, Wingfield , 2017), two Endoconidiophora species (Wingfield ), three isolates representing two Thielaviopsis species (Wilken , Wingfield ; Wingfield ), one Davidsoniella species (Wingfield ), one Bretziella species (previously Ceratocystis fagacearum; De Beer , Wingfield ), one Ambrosiella species (Vanderpool ) as well as for C. adiposa (Wingfield ). This brings the number of published Ceratocystidaceae genomes to 21, with the genome assemblies of several others publicly available (www.ncbi.nlm.nih.gov/assembly/?term=ceratocystidaceae). Such a vast genomic resource will prove valuable to future studies on Ceratocystidaceae, a family that includes fungal species with diverse life-styles and hosts.
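The BUSCO completeness percentages reported above for B. basicola and C. smalleyi can be recovered from the category counts given in the text, taking completeness as complete single-copy plus complete duplicated BUSCOs over all groups searched. The sketch below shows that arithmetic; the function and field names are illustrative assumptions rather than BUSCO output fields.

```python
def busco_summary(single_copy, duplicated, fragmented, missing):
    """Summarise BUSCO category counts as totals and percent complete."""
    total = single_copy + duplicated + fragmented + missing
    complete = single_copy + duplicated
    return {
        "total": total,
        "complete": complete,
        "percent_complete": 100 * complete / total,
    }


# Counts as reported in the text for each assembly.
b_basicola = busco_summary(single_copy=1280, duplicated=1, fragmented=10, missing=24)
c_smalleyi = busco_summary(single_copy=1330, duplicated=64, fragmented=37, missing=7)

print(b_basicola["total"], round(b_basicola["percent_complete"], 1))  # 1315 97.4
print(c_smalleyi["total"], round(c_smalleyi["percent_complete"]))     # 1438 97
```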

Fig. 5.

A Maximum Likelihood phylogeny showing Ceratocystidaceae isolates for which published whole-genome sequences are available, including that of C. smalleyi discussed here. The 60S, LSU, and MCM7 gene regions were used, and were either extracted from the assembled genomes or obtained from the study of Wingfield . The phylogeny was constructed using the TrN+I+G model with confidence values based on 1000 bootstrap replicates. Only bootstrap values ≥ 75 are shown.
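The multi-locus dataset behind this phylogeny was aligned per gene region and then concatenated. The sketch below shows one common way to concatenate per-locus alignments into a single supermatrix. It is not the authors' procedure: the function name, the toy sequences and the choice to pad taxa that lack a locus with gap characters are all assumptions.

```python
def concatenate_alignments(alignments, gap="-"):
    """Join per-locus alignments (dicts of taxon -> aligned sequence).

    Taxa absent from a locus are padded with gap characters so that every
    row of the concatenated matrix has the same length (an assumption of
    this sketch, not a statement about the original analysis).
    """
    taxa = sorted(set().union(*(aln.keys() for aln in alignments)))
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = widths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Hypothetical toy alignments for two loci.
locus_one = {"taxon_A": "ACGT", "taxon_B": "AC-T"}
locus_two = {"taxon_A": "GG", "taxon_B": "GA", "taxon_C": "GG"}
supermatrix = concatenate_alignments([locus_one, locus_two])
print(supermatrix["taxon_C"])  # "----GG": padded for the missing first locus
```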

The genus Cercospora (Mycosphaerellaceae) includes several economically important plant pathogens causing leaf and fruit spots on a range of agricultural crops worldwide (Groenewald ). Cercospora species are known to produce cercosporin, a photo-activated toxin that contributes to pathogenicity on a broad range of crops (Daub ). Cercospora beticola is the cause of Cercospora leaf spot (CLS) on sugar and table beet (Beta vulgaris ssp. vulgaris), and Swiss chard (Beta vulgaris ssp. cicla) worldwide (Franc 2010). In New York, CLS is the most important disease affecting foliar health of table beet. Symptoms include leaf spots and necrotic lesions with red to purple margins, which coalesce as the disease progresses and can result in complete defoliation (Pethybridge ). In broadacre production systems, maintenance of foliar health is important to enable mechanized harvest. For fresh market sales, the presence of CLS lesions on the leaves may result in rejection (Pethybridge ). The control of CLS in table beet is dependent on fungicides (Pethybridge ). However, resistance to single-site mode of action fungicides threatens the durability of CLS control. Recent studies reported a high frequency of isolates with resistance to quinone outside inhibitor fungicides in New York (Vaghefi ). Moreover, succinate dehydrogenase inhibitor fungicides, which are known to be effective in controlling CLS on sugar beet, failed to provide efficacious control on table beet (Pethybridge ), and a few isolates with reduced sensitivity to demethylation inhibitors have been detected (Pethybridge, unpubl.). Identifying genomic regions associated with sensitivity to fungicides will enable rapid screening of C. beticola populations. Enhanced genomic information for this pathogen will also facilitate studies into the mechanisms of pathogenicity. De novo genome assemblies of two C. beticola strains from table beet are presented here and made publicly available to facilitate genetic studies of this globally important plant pathogen.

SEQUENCED STRAINS

USA: New York: western New York, Batavia, from Beta vulgaris ssp. vulgaris (table beet), 2014, F.S. Hay (Tb14-085 = ICMP 21692); ibid. (Tb14-047 = ICMP 21690). The Whole Genome Shotgun projects have been deposited at DDBJ/EMBL/GenBank under the accessions PDUH00000000 and PDUI00000000.

METHODS

Two C. beticola isolates belonging to opposite mating types (ICMP 21690 [MAT1-2] and ICMP 21692 [MAT1-1]), collected from table beet in New York, were selected for whole genome sequencing. The identity of the strains as C. beticola was confirmed through multi-locus sequence typing of five loci: ITS, actin, calmodulin, histone H3 and translation elongation factor 1-α (Fig. 6). Fungal strains were cultured in clarified V8 broth (10 % (v/v) clarified V8 juice (Campbell’s Soup, USA), 0.5 % (w/v) CaCO3). Seven-day-old mycelia were harvested, and genomic DNA was extracted as described in Vaghefi . The extracted DNA samples were quantified using a Qubit fluorometer (Invitrogen, NY).

Fig. 6.

Identity verification of Cercospora beticola isolates sequenced in this study. The phylogeny was constructed by Bayesian inference (MrBayes v. 3.1.2; Ronquist & Huelsenbeck 2003) based on the sequences of five loci: ITS, act, cmd, his and tef1-α. Sequence alignments were produced using MAFFT v. 7 (Katoh & Standley 2013). Branches with posterior probability of 1.00 are thickened. The tree was rooted to C. zeae-maydis (CBS 117757).

A total of 5.6 and 5.0 μg genomic DNA of ICMP 21692 and ICMP 21690, respectively, was used to prepare PCR-free libraries with an average insert size of ~550 bp, which were sequenced on the Illumina MiSeq platform (paired-end, 2×300 bp) at the Cornell University Institute of Biotechnology Genomics Facility (Ithaca, NY). PCR-free libraries were constructed using Illumina’s TruSeq Nano DNA LT Sample Preparation kits, according to the manufacturer’s protocol. This yielded 4 607 564 and 4 798 846 paired-end reads, totalling 2.7 and 2.9 Gb of data for ICMP 21692 and ICMP 21690, respectively. Quality control of the sequences was conducted using FastQC v.0.11.2 (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc) in the GALAXY portal (Afgan ). The k-mer counting software Jellyfish v.2.2.3 (Marçais & Kingsford 2011) was used to estimate the genome size. De novo genome assembly was conducted using DISCOVAR de novo v.52488, an assembler designed for de novo assembly of long Illumina paired-end reads from single PCR-free libraries (Weisenfeld ). The completeness of the final assemblies was assessed using Benchmarking Universal Single-Copy Orthologs (BUSCO) v.1.2 (Simão ). Gene prediction was conducted for ICMP 21692 only, in the genome annotation pipeline Maker v.2.31.9 (Cantarel ), using contigs at least 500 bp in length. A preliminary annotation used the ab initio gene prediction program SNAP (Korf 2004). The resulting annotation was used to produce a hidden Markov model (HMM) profile for C. beticola, which was further refined with a second stage of SNAP training. The refined HMM file was used for the final annotation (Cantarel ).

The Illumina paired-end (2×300 bp) sequencing of C. beticola isolates ICMP 21692 and ICMP 21690 resulted in 4 607 564 and 4 798 846 reads, respectively, with mean base qualities of 28.5 and 28.7. The estimated genome size of C. beticola was ~37 Mb, based on an approximate genome coverage of at least 74× for both strains.
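The relationship between read counts, data yield and coverage stated above can be checked with simple arithmetic. The sketch below assumes that the reported counts are read pairs and that every read is the full 300 bases long (no trimming). These assumptions are ours, chosen because they reproduce the reported 2.7 and 2.9 Gb yields. The function names are illustrative.

```python
def sequencing_yield_bp(read_pairs, read_length_bp):
    """Total bases from paired-end sequencing (two reads per pair)."""
    return read_pairs * 2 * read_length_bp


def fold_coverage(total_bases, genome_size_bp):
    """Average depth of coverage for a genome of the given size."""
    return total_bases / genome_size_bp


GENOME_SIZE_BP = 37_000_000  # estimated C. beticola genome size (~37 Mb)
READ_LENGTH_BP = 300         # 2 x 300 bp MiSeq run

for isolate, pairs in (("ICMP 21692", 4_607_564), ("ICMP 21690", 4_798_846)):
    bases = sequencing_yield_bp(pairs, READ_LENGTH_BP)
    depth = fold_coverage(bases, GENOME_SIZE_BP)
    # Report yield in gigabases and coverage as fold depth.
    print(isolate, f"{bases / 1e9:.2f} Gb", f"{depth:.1f}x")
```

Under these assumptions both isolates come out at roughly 75× to 78×, consistent with the stated minimum of 74×.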
The draft genome of ICMP 21692 had a total assembly size of ~35.03 Mbp (for 1 kb+ scaffolds), a scaffold N50 value of 1 023 488 bp, and a maximum contig size of 3 283 856 bp. The draft genome of ICMP 21690 had a total assembly size of ~34.5 Mbp (for 1 kb+ scaffolds), a scaffold N50 value of 654 439 bp, and a maximum contig size of 2 437 838 bp. Both assemblies were 97 % complete based on BUSCO content. All contigs with a length of ≥ 200 bp were submitted to the NCBI genome database. Ab initio gene prediction for ICMP 21692 using trained SNAP identified 12 834 open reading frames (ORFs). The estimated genome size of C. beticola (37 Mb) is comparable to that of other Cercospora species, including C. canescens (~34 Mb; Chand ), C. cf. sigesbeckiae (~35 Mb; Albu ), and C. zeina (~37 Mb; Wingfield ). The draft genome of ICMP 21692 has already provided the foundation for global population genetics studies of C. beticola using microsatellite markers and Genotyping-By-Sequencing (Vaghefi , b). Current studies are focused on identification and characterisation of the genes responsible for sensitivity to fungicides. Availability of genomic data will provide a powerful tool for characterising the genes involved in pathogenicity.

Researchers at Fujisawa Pharmaceutical (now Astellas Pharma) isolated strain BP6252 (No. 14573) from an unidentified decaying leaf from Tsushima Island, Japan, and identified it as Coleophoma empetri. They fermented the strain to produce two water-soluble echinocandin analogues, FR220897 and FR220899 (WF14573A and WF14573B) (Hori et al. 2004, Kanasaki ) (Fig. 2). FR220897 and FR220899 are isomers of FR901379, which is used for the semisynthesis of micafungin. FR901379 is produced by a different strain, C. empetri F-11899 (Iwamoto , b). The differential antifungal activity of these isomers was critical to understanding the effects of the position of the homotyrosine sulfate residue on antifungal activity (Hino , Kanasaki ). 
Like other echinocandins, the metabolites strongly inhibited β-1,3-glucan synthase and exhibited potent in vitro activity against Candida albicans and Aspergillus fumigatus, and FR220897 was effective in mouse candidiasis models. The discovery of these echinocandin variants was significant because sulfation of the homotyrosine residue overcomes the inherently poor water solubility that had previously impeded development of echinocandin-type antibiotics, including echinocandin B, the aculeacins, and the pneumocandins.

Coleophoma cylindrospora is a widespread endophyte and leaf saprobe and can be a weak pathogen of leaves and fruits of many woody plants (Sutton 1980, Wu , Polashock , Crous & Groenewald 2016). The phylogenetic affinity of the strain producing FR220897 and FR220899 was established with multiple phylogenetic marker sequences, and the strain was found to be conspecific with other strains of C. empetri (Yue ) (Fig. 7). Subsequently, during a revision of the polyphyletic genus Coleophoma, C. empetri was found to be phylogenetically indistinct from the similar C. cylindrospora and was considered to be a synonym of the latter (Crous & Groenewald 2016).

Fig. 7.

Maximum Likelihood tree of genome-sequenced strains producing echinocandins (red) and selected strains of the Leotiomycetes (55 strains total) based on an alignment of the ITS and 28S rDNA. Botryotinia fuckeliana was positioned as the outgroup. The tree was inferred by using the maximum likelihood method based on the Kimura 2-parameter model. The tree with the highest log likelihood (-4229.10) is shown. The percentage of trees in which the associated taxa clustered together is labelled on branch nodes. A discrete gamma distribution was used to model evolutionary rate differences among sites (5 categories (+G, parameter = 0.4881)). Branch lengths were measured in the number of substitutions/site. All positions containing gaps and missing data were eliminated. The dataset included 955 positions. Data were analyzed in MEGA7 (

The primary objective of sequencing the genome of C. cylindrospora was to identify the gene cluster encoding the biosynthesis of FR220897 and FR220899 (Yue ). The genome sequence will be essential for identifying the mechanism of the regiospecific sulfation reaction. The draft genome has also revealed that the strain harbours an auxiliary copy of β-1,3-glucan synthase that may function as an echinocandin resistance gene (Yue ). This draft genome will expand genomic data sets for comparative genomics of species in Leotiomycetes, Dermataceae, and endophytic fungi in general.

Japan: Tsushima Island: Nagasaki Prefecture, isolated from decaying leaf, [no further information] (NBRC-NITE BP6252, Fujisawa No. 14573).

The C. cylindrospora isolate BP6252 Whole Genome Shotgun project has been deposited in GenBank under the accession number PDLM00000000.

Lyophilized mycelia harvested from liquid cultures were ground in liquid nitrogen and genomic DNA was isolated by using the CTAB protocol (http://1000.fungalgenomes.org/home/wp-content/uploads/2013/02/genomicDNAProtocol-AK0511.pdf). A 180 bp insert library and a 5 kb mate-pair library were constructed for Illumina sequencing and were sequenced on an Illumina HiSeq 2000 V4 sequencing platform (Yue ). The Illumina sequencing reads were assembled using Velvet 1.2 (Zerbino & Birney 2008). Ab initio gene predictions from the genome assembly were made with Augustus (Stanke ). Predicted genes were annotated by BLAST searches against UniProt databases (http://www.uniprot.org/). Polyketide synthases (PKSs), non-ribosomal peptide synthetases (NRPSs), dimethylallyl tryptophan synthases and related biosynthetic gene clusters were predicted by antiSMASH ver. 3.0 and manual annotation (Weber ).

The genome of BP6252 was sequenced to 100-fold coverage, yielding 77 scaffolds with an N50 of 2.3 megabases (Mb). The assembled genome size was 42.4 Mb, and a total of 14 177 genes were predicted. 
The GC content of this genome is 48.7 %. The genome contains 26 core catalytic genes associated with putative secondary metabolite biosynthetic gene clusters. These clusters include 15 PKSs, eight NRPSs, two dimethylallyl tryptophan synthases, and one terpene synthase. These genes are distributed among 21 putative gene clusters that also include genes encoding tailoring enzymes, regulators, transporters, and other auxiliary genes. In addition to these gene clusters, nine secondary metabolite gene clusters containing PKS-like or NRPS-like enzyme genes, or other secondary metabolism-related genes, were identified by antiSMASH. In addition, a gene cluster containing close orthologues of the pneumocandin gene cluster from Glarea lozoyensis (Yue ) was recognized and predicted to be responsible for the biosynthesis of FR220897 and FR220899.

Researchers at Fujisawa Pharmaceuticals (now Astellas Pharma) isolated strain BP5796 (Fujisawa No. 738) from an unidentified leaf sample collected in Japan and identified it as C. crateriformis based on comparisons of its conidial dimensions with those of the Coleophoma species known at the time (Wu ). The strain was fermented to produce three water-soluble echinocandin analogues, designated FR209602, FR209603 and FR209604 (Fig. 2). These analogues differ from FR901379 (WF11899A) and its analogues by a substitution of threonine for serine at the peptide’s third amino acid and deoxygenation of the homotyrosine residue at C-4. Like other echinocandins, these metabolites strongly inhibited the activity of β-1,3-glucan synthase and exhibited potent in vivo activity against C. albicans and A. fumigatus in murine systemic infection models. The phylogenetic affinity of the strain producing FR209602 and its analogues was established with multiple phylogenetic marker sequences (Yue ) (Fig. 7). Although we had retained the original identification of C. crateriformis in previous work on the evolution of the echinocandin pathways, a multi-gene phylogeny indicated that the strain was conspecific with other strains named as C. empetri. Subsequently, during a revision of the polyphyletic genus Coleophoma, it was noted that an authentic strain of C. crateriformis, the type species of the genus Coleophoma, was lacking, and thus its phylogenetic affinities within the genus remained to be determined (Crous & Groenewald 2016). Because strain BP5796 appears to be phylogenetically indistinct from the similar C. cylindrospora, we consider it to be conspecific with the latter (Crous & Groenewald 2016). The primary motivation for sequencing the genome of C. cylindrospora BP5796 was to identify the gene cluster encoding the biosynthesis of FR209602. The genome sequence will be essential for identification of the mechanism of the regiospecific sulfation reaction. The draft genome has also revealed that, like BP6252, the strain harbours an auxiliary copy of β-1,3-glucan synthase that may function as an echinocandin resistance gene (Yue ). This draft genome will expand resources for comparative genomics of species in Dermataceae and endophytic fungi.

Japan: Toyama Prefecture: Mount Tateyama, isolated from leaf sample, [no further information] (NBRC-NITE BP5796, Fujisawa No. 738).

The C. cylindrospora isolate BP5796 Whole Genome Shotgun project has been deposited in GenBank under the accession number PDLN00000000.

The methods for DNA extraction, sequencing, and genome assembly and annotation were essentially the same as for strain BP6252 above. The genome of BP5796 was sequenced to 100-fold coverage, yielding 45 scaffolds with an N50 of 2.0 Mb. The assembled genome size was 40.4 Mb, and a total of 13 257 genes were predicted. The GC content of this genome is 48.5 %. The genome contains 24 core catalytic genes associated with putative secondary metabolite biosynthetic gene clusters. 
These clusters include 15 PKSs, six NRPSs, two dimethylallyl tryptophan synthases, and one terpene synthase. These genes are distributed among 21 putative gene clusters that also include genes encoding tailoring enzymes, regulators, transporters, and other auxiliary genes. In addition to these gene clusters, seven secondary metabolite gene clusters containing PKS-like or NRPS-like enzyme genes, or other secondary metabolism-related genes, were identified by antiSMASH. In addition, a gene cluster containing close orthologues of the pneumocandin gene cluster from Glarea lozoyensis (Yue ) was recognized and predicted to be responsible for the biosynthesis of FR209602. This gene cluster deviated from other echinocandin gene clusters by the loss of a cytochrome P450 gene orthologous to htyF in A. pachycristatus and GLP450-1 in Glarea lozoyensis, which would account for the absence of hydroxylation at the homotyrosine C-4.

The genus Fusarium contains numerous well-known socio-economically important fungi (Nelson ). Many of these fungi form part of the Fusarium fujikuroi Species Complex (FFSC; Geiser ), for which various whole genome sequences have been published, e.g. Fusarium fujikuroi (Jeong , Wiemann , Chiara ), Fusarium temperatum (Wingfield ) and Fusarium circinatum (Wingfield , van der Nest ). The latter causes pitch canker, which is a devastating disease of pine (Wingfield ). Of the five other species found to be associated with F. circinatum-like symptoms on pine in Colombia (Herron ), the genome of F. pininemorale has been sequenced (Wingfield ). In this study, we determined the whole genome sequence for F. fracticaudum, which was also described by Herron . Like F. pininemorale, this species does not seem to be a pathogen of pine, as it could not incite lesions on the stems of pine seedlings in standard pathogenicity assays (Herron ). These differences between F. circinatum and these non-pathogenic Fusarium species on Pinus will provide an opportunity for genome comparisons. The association of F. fracticaudum with diseased pines and the genetic basis of biological traits in F. fracticaudum are not yet understood. Availability of various sequenced genomes of species within the FFSC is enabling studies into the biology and evolution of these fungi (Ma , De Vos , Niehaus ). Here we determine the genome sequence of F. fracticaudum, which will provide an additional resource for comparative genomic studies aimed at understanding the evolution of these fungi and unravelling the molecular basis of their plant interactions.

Colombia: Angela Maria, Santa Rosa Risalda, 75°36’21” W 4°49’18” N, isolated from diseased Pinus maximinoi trees (CMWF25245; FCC5385; CBS137234; Herron ).

NUCLEOTIDE ACCESSION NUMBER:

This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession PDNT00000000. The version described in this paper is version PDNT01000000.

Genome sequence

The F. fracticaudum isolate was grown on ½ Potato Dextrose agar (PDA; BD Difco™) at 25 °C for 7 d. Genomic DNA was extracted from fungal mycelium following the protocol of Möller et al. (1992). Genome sequencing was done with one paired-end (350 bp median insert size) and one mate-pair (5 kb median insert size) library using the Illumina HiSeq X Ten and HiSeq 2000 platforms, respectively, at Macrogen (Seoul, Korea). CLC Genomics Workbench v. 8.0.1 (CLCBio, Aarhus, Denmark) was used to trim sequences less than 18 bp. The quality-filtered reads were subjected to de novo assembly in ABySS v. 1.3.7 (Simpson ), followed by scaffolding with SSPACE v 2.0 (Boetzer ). The gaps within the sequences were closed using GapFiller v. 1.11 (Boetzer & Pirovano 2012). To determine the completeness of the genome assembly, BUSCO v2.0.1 (Benchmarking Universal Single Copy Orthologs; Simão ) was employed using the sordariomycete dataset. Scaffolds were compared to the chromosomes of F. fujikuroi (Wiemann ) and F. temperatum (Wingfield ) using the LASTZ plugin (Harris 2007) of Geneious v 7.0.4 (Kearse ). WebAUGUSTUS (Hoff & Stanke 2013) was used to predict genes using the Fusarium graminearum model (http://bioinf.uni-greifswald.de/augustus) and the cDNA data from the F. circinatum genome (Wingfield ) as gene evidence.

Phylogenetic analysis

Phylogenetic analysis was conducted using partial sequences of the elongation factor 1-α (ef1-α) and beta-tubulin genes from other species in the Fusarium fujikuroi species complex (Herron ), together with those extracted from the genome of F. fracticaudum determined here. All gene sequences were aligned using MAFFT (Katoh ). A maximum likelihood phylogenetic analysis was carried out in PhyML v 3.1 (Guindon ) with 1000 bootstrap replicates, using the GTR+I+G substitution model as determined using jModelTest v 2.1.7 (Darriba ).

The assembled genome of F. fracticaudum was 46.29 Mb long with a GC content of 47.6 %. The assembly consisted of 50 scaffolds with an N50 value of 4 491 441 bp. WebAUGUSTUS predicted a total of 14 729 open reading frames (ORFs) in the assembly. Based on the BUSCO results, the assembly was 98.8 % complete (i.e., complete and single-copy BUSCOs = 97.6 %; complete and duplicated BUSCOs = 1.2 %; fragmented BUSCOs = 0.9 %; missing BUSCOs = 0.3 %; number of BUSCOs searched = 3725). The phylogeny inferred using two protein-coding genes also shows the previously reported relationships among the FFSC species included (Fig. 8) (O’Donnell , Geiser , Herron ). The sequences extracted from F. fracticaudum also grouped with those of another isolate (CBS 137233) and the original GenBank accessions for the isolate sequenced here.
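The BUSCO completeness figure reported above is the sum of the complete single-copy and complete duplicated categories, and the four categories together should account for all BUSCOs searched. A minimal sketch of that arithmetic, using only the percentages given in the text (the dictionary keys are illustrative labels, not BUSCO output field names):

```python
# BUSCO category percentages as reported for the F. fracticaudum assembly
busco_pct = {
    "complete_single_copy": 97.6,
    "complete_duplicated": 1.2,
    "fragmented": 0.9,
    "missing": 0.3,
}

# Completeness = complete single-copy + complete duplicated
complete = round(
    busco_pct["complete_single_copy"] + busco_pct["complete_duplicated"], 1
)

# All four categories should add up to 100 % of the BUSCOs searched
total = round(sum(busco_pct.values()), 1)

print(f"complete: {complete} %, all categories: {total} %")
```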

Fig. 8.

A Maximum Likelihood phylogeny showing the placement of the F. fracticaudum isolate (indicated in bold) that was sequenced in this study. The tree was inferred from combined β-tubulin and translation elongation factor 1-α gene sequences (Herron ). Bootstrap support values ≥ 85 % are shown at the branch nodes. The scale bar indicates substitutions per site.

In terms of overall genome statistics, the whole genome sequence of F. fracticaudum is similar to those reported for F. pininemorale, F. circinatum, and F. temperatum (Table 3). Also, F. fracticaudum contained the reciprocal translocation between chromosomes 8 and 11 known in these fungi (De Vos ). However, sequence comparisons showed that chromosome 12, which is dispensable in other members of the FFSC (Xu ), is 1 094 708 bp in size in F. fracticaudum. This is considerably larger than the 692 922 bp reported for F. fujikuroi (Wiemann ), 986 231 bp for F. temperatum (Wingfield ), 791 442 bp for F. nygamai (Wingfield ) and 968 722 bp for F. pininemorale (Wingfield ). The differences observed in these genomes highlight the importance of sequencing the genomes of additional species in the FFSC. The F. fracticaudum genome sequenced here, together with those of other FFSC species, will provide a platform to answer numerous questions pertaining to the evolutionary history of these fungi and their species-specific traits.

Table 3.

Genome statistics for F. fracticaudum and its close relatives.

	F. fracticaudum	F. pininemorale¹	F. circinatum²	F. temperatum³
Genome size (Mb)	46.29	47.83	43.43	45.46
GC content (%)	47.6	46.0	47.4	47.0
Predicted ORFs⁴	14 729	14 640	15 056	14 284
Average gene length (bp)	1531	1472	1312	1575
Gene density (ORFs/Mb)	318	306	347	314

1Wingfield ;

2Wingfield ;

3Wingfield ;

4Open reading frames.
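The gene density row of Table 3 follows directly from the two rows above it: predicted ORFs divided by genome size in Mb, rounded to the nearest whole number. The sketch below recomputes it from the tabulated values; the variable names are illustrative.

```python
# (genome size in Mb, predicted ORFs) as listed in Table 3
genomes = {
    "F. fracticaudum": (46.29, 14_729),
    "F. pininemorale": (47.83, 14_640),
    "F. circinatum": (43.43, 15_056),
    "F. temperatum": (45.46, 14_284),
}

# Gene density in ORFs per Mb, rounded to the nearest integer
density = {
    name: round(orfs / size_mb) for name, (size_mb, orfs) in genomes.items()
}

for name, value in density.items():
    print(f"{name}: {value} ORFs/Mb")
```

The recomputed values match the tabulated densities of 318, 306, 347 and 314 ORFs/Mb.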

Researchers at Fujisawa Pharmaceuticals (now Astellas Pharma) isolated strain BP5553 (Fujisawa No. 16616) from soil collected in Japan and identified it as Tolypocladium parasiticum. They fermented the strain to produce the water-soluble echinocandin analogue FR190293 (Fig. 2). Like other echinocandins, FR190293 strongly inhibited β-1,3-glucan synthase and exhibited potent in vitro activity against Candida albicans and Aspergillus fumigatus. The discovery of this new echinocandin variant was significant because it is the first of the echinocandins to have a dimethyl myristic acid acyl side chain, as in the pneumocandins, in combination with a sulfated homotyrosine residue. As previously reported, in-depth phylogenetic and morphological analysis of BP5553 demonstrated that the identification as the rotifer parasite T. parasiticum (syn. Pochonia parasitica) was erroneous. Rather than belonging to Clavicipitaceae, BP5553 was found to belong in Helotiales (Yue ). Based on rDNA and protein-encoding sequences, BP5553 falls within a monophyletic lineage along with ex-type strains of Phialophora hyalina, Pleuroascus nicholsonii, Scopulariopsis parva, and Scopulariopsis parvula (Fig. 7). These strains, along with other species with Phialophora-like conidial morphs in Helotiales and BP5553, will eventually comprise a new genus in a new family of Helotiales (W. Untereiner et al., unpubl.). The primary objective of sequencing the genome of Phialophora cf. hyalina was to identify the gene cluster encoding the biosynthesis of FR190293 (Yue ). This draft genome will expand genomic data sets for comparative genomics of species in Leotiomycetes and Helotiales.

Japan: Fukushima Prefecture: Iwaki, isolated from soil, [no further information] (NBRC-NITE BP5553, Fujisawa No. 16616).

The Phialophora cf. hyalina isolate BP5553 Whole Genome Shotgun project has been deposited in GenBank under the accession number NPIC00000000. 
The methods for DNA extraction, sequencing, and genome assembly were essentially the same as for strain BP6252 above. The genome of BP5553 was sequenced to 102-fold coverage, yielding 32 scaffolds with an N50 of 3.8 Mb. The assembled genome size was 33.6 Mb, and a total of 10 707 genes were predicted. The GC content of this genome is 48.2 %. The genome contains 45 core catalytic genes associated with putative secondary metabolite biosynthetic gene clusters. These clusters include 19 PKSs, 13 NRPSs, six PKS-NRPS hybrids, two dimethylallyl tryptophan synthases, four terpene synthases, and one chalcone synthase. These genes are distributed among 40 putative biosynthetic gene clusters that also include genes encoding tailoring enzymes, regulators, transporters, and other auxiliary genes. In addition to these gene clusters, eight secondary metabolite gene clusters containing PKS-like or NRPS-like enzyme genes, or other secondary metabolism-related genes, were identified by antiSMASH. In addition, a gene cluster containing close orthologues of the pneumocandin gene cluster from Glarea lozoyensis (Yue ) was recognized and predicted to be responsible for the biosynthesis of FR190293.

Morels (Morchella spp., Ascomycota) are a highly desired group of edible fungi with a worldwide distribution (Du , Kanwal , Richard ). They have been collected by mycophiles and gourmets for hundreds of years for their delicate taste and unique appearance (Tietel & Masaphy 2017, 2018, Rotzoll ). Morels also contain a variety of secondary metabolites with medicinal properties (Tietel & Masaphy 2018, Shameem , Pfab , Vieira ). Morchella septimelata is a black morel belonging to the Morchella elata clade (Kuo ). It is often found in lightly to moderately burned conifer forests, near creek beds, springs and seeps, at an altitude of 1000–2000 m (Pildain ). The ascomata of M. septimelata are found primarily in the years immediately following forest fires and then often appear in dwindling numbers for several seasons thereafter (Hobbie ). In recent years, artificial cultivation of true morels has made great progress (Masaphy 2010); several Morchella species, such as M. sextelata, M. septimelata, and M. importuna, have been successfully cultivated in China, America, and other parts of the world. However, the mechanisms of growth and development in Morchella remain unclear, which leads to frequent failures and unstable yields in Morchella cultivation (Liu ). The genome sequence of M. septimelata from this study may reveal the mechanism of secondary metabolite synthesis in M. septimelata and provide some insights into its growth, development, and carbohydrate degradation.

China: Sichuan: Liangshan Yi, N 27°49’ E 100°48’, alt. 1468 m, isolated from forest soil, 19 Sep. 2015, Chuan Xiong & Qiang Li (MG91-dried culture).

The Whole Genome Shotgun project for the M. septimelata isolate (culture collection number SAAS91) has been deposited at DDBJ/EMBL/GenBank under the accession number PYSJ00000000. The version described in this paper is version PYSJ01000000.

Morchella septimelata MG91 was isolated from forest soil in Liangshan Yi Autonomous Prefecture, Sichuan, China, and was preserved in the Fungal Culture Collection Center of Biotechnology and Nuclear Technology Research Institute (Chengdu, Sichuan). Genomic DNA was extracted from MG91 and subjected to sequencing on the Genome Analyzer IIx next-generation sequencing platform (Illumina) at the BGI (Shenzhen, China). Paired-end libraries with respective insert sizes of 425 bp and 725 bp were used to generate read lengths of 150 bases. The CLC Genomics Workbench v. 6.0.1 (CLCBio, Aarhus, Denmark) was subsequently used to trim reads of poor quality (limit of 0.05) as well as terminal nucleotides. 
The remaining reads were assembled using SPAdes 3.0.0 with an optimized k-mer value of 21 (Bankevich ). Thereafter, scaffolding was completed using SSPACE v. 2.0 (Boetzer ) and gaps were reduced with GapFiller v. 2.2.1 (Boetzer & Pirovano 2012). The completeness of the assembly was evaluated using BUSCO v3 (Simão ). Homology-based gene prediction and ab initio prediction were performed to identify M. septimelata gene models. Homologous proteins from Tuber melanosporum were aligned to the repeat-masked M. septimelata genome using Exonerate v 2.2.0 (Slater & Birney 2005). The filtered alignment results (above 300 bp and 90 % coverage) were used to build training models for ab initio gene prediction. The ab initio prediction was conducted using Augustus v. 3.2.3 (Stanke ) and GeneMark-ES (Ter-Hovhannisyan ), guided by training models from the homology-based alignments. All gene prediction results were integrated into the final gene models by EVidenceModeler (Haas ). Carbohydrate-active enzymes (CAZymes), including the repertoire of auxiliary enzymes, were predicted using dbCAN (Yin ). To verify the species identity of the sequenced strain, Translation Elongation Factor 1-alpha gene sequences for selected Morchella species (Fig. 9) were aligned with MAFFT (Katoh & Standley 2013). The Bayesian inference (BI) method (Erixon ) was used to construct the phylogenetic tree of different Morchella species. jModelTest 2.0.2 was used to ascertain the best-fit model for nucleotide substitutions (Darriba ). BI analysis was performed with MrBayes v3.2.6 (Ronquist ). Two independent runs with four chains (three heated and one cold) each were conducted simultaneously for 2 × 10⁶ generations. Each run was sampled every 100 generations. We assumed that stationarity had been reached when the estimated sample size (ESS) was greater than 100 and the potential scale reduction factor (PSRF) approached 1.0. 
The first 25 % of samples were discarded as burn-in, and the remaining trees were used to calculate Bayesian posterior probabilities (BPP) in a 50 % majority-rule consensus tree.
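The sampling scheme described above fixes how many trees contribute to the consensus. The sketch below works through that arithmetic for two runs of 2 × 10⁶ generations sampled every 100 generations with a 25 % burn-in. It is a back-of-the-envelope calculation, not MrBayes output, and it assumes the sample of the starting state (generation 0) is not counted.

```python
generations = 2 * 10**6   # generations per run
sample_every = 100        # sampling frequency (generations)
burnin_fraction = 0.25    # proportion of samples discarded
n_runs = 2                # independent runs

# Samples recorded per run (assumption: starting state not counted)
samples_per_run = generations // sample_every

# Samples discarded as burn-in and retained per run
burnin = int(samples_per_run * burnin_fraction)
kept_per_run = samples_per_run - burnin

# Trees pooled across runs for the majority-rule consensus
kept_total = kept_per_run * n_runs

print(samples_per_run, burnin, kept_per_run, kept_total)
```

Under these assumptions each run records 20 000 samples, of which 5 000 are discarded, leaving 30 000 trees across both runs for the posterior probability estimates.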

Fig. 9.

A Bayesian inference (BI) phylogenetic analysis of the genus Morchella using MrBayes v3.2.6, based on partial sequences of the elongation factor 1-alpha (EF1-α) gene. Posterior probabilities are shown on the nodes of the tree. The Morchella septimelata sequence used for verification was extracted from the assembled genome. Reference sequences were obtained from the NCBI database and are shown with their accession numbers.

The genome of M. septimelata had an estimated size of 49.81 Mb with an average coverage of 151.17× (Table 4). The scaffold N50 size was 37 734 bases, and the assembly had a mean GC content of 47.40 %. The total number of scaffolds generated was 6 525. A total of 11 427 genes were predicted, with an average length of 1 571 bp. A phylogenetic analysis of the genus Morchella is provided to show the position of M. septimelata (Fig. 9).
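The scaffold N50 quoted here and for the other assemblies is the scaffold length at which the cumulative length of scaffolds, taken from longest to shortest, first reaches half of the total assembly size. A minimal sketch of that definition follows; the function `n50` and the toy lengths are illustrative and are not data from any of the assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    Scaffolds are sorted from longest to shortest; the N50 is the length of
    the scaffold at which the running total reaches half the assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("empty list of lengths")


# Hypothetical scaffold lengths (total 300, so the halfway point is 150)
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_scaffolds)
print(toy_n50)
```

In the toy example the two longest scaffolds (80 + 70) already cover half of the assembly, so the N50 is 70. A highly fragmented assembly, with many short scaffolds, gives a correspondingly small N50.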

Table 4.

Genome statistics, CAZyme richness and secondary metabolite clusters for the Morchella septimelata MG91 genome sequence.

Genome	M. septimelata MG91
Coverage	151.17x
BUSCO	97.7%
Total sequence length (Mb)	49.81
Scaffolds	6 525
Scaffold N50 (bp)	37 734
GC (%)	47.40
Predicted gene models	11 427
Average gene length (bp)	1 571
Average gene density (genes/Mb)	229

Predicted CAZymes
Total CAZymes	512
Auxiliary activities	75
Pectate lyases	23
Glycosyl transferases	75
Glycoside hydrolases	201
Carbohydrate esterases	72
Carbohydrate binding motifs	66

Predicted secondary metabolite (SM) clusters
Total SM clusters	9
Terpene clusters	3
Type I polyketide synthetases (PKSs)	1
Nonribosomal peptide synthetases (NRPSs)	1
Others	4
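The totals in Table 4 can be checked against their component rows. The sketch below sums the CAZyme classes and the secondary metabolite cluster types exactly as listed in the table; the abbreviated keys for the cluster types are illustrative.

```python
# CAZyme classes as listed in Table 4
cazyme_classes = {
    "Auxiliary activities": 75,
    "Pectate lyases": 23,
    "Glycosyl transferases": 75,
    "Glycoside hydrolases": 201,
    "Carbohydrate esterases": 72,
    "Carbohydrate binding motifs": 66,
}

# Secondary metabolite cluster types as listed in Table 4
sm_clusters = {"Terpene": 3, "Type I PKS": 1, "NRPS": 1, "Others": 4}

cazyme_total = sum(cazyme_classes.values())
sm_total = sum(sm_clusters.values())

print(f"CAZymes: {cazyme_total}, SM clusters: {sm_total}")
```

Both sums agree with the totals given in the table (512 CAZymes and 9 secondary metabolite clusters).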

The draft genome of M. septimelata is larger than those of the closely related species M. conica CCBAS932 (JGI: 1023999) and M. importuna SCYDJ1-A1 (JGI: 1047732), which are 48.21 Mb and 48.80 Mb, respectively. Fewer gene models were found in M. septimelata than in M. conica CCBAS932 (11 600 gene models) and M. importuna SCYDJ1-A1 (11 971 gene models). The average gene length of M. septimelata is also shorter than that of M. conica CCBAS932 (1668 bp) and M. importuna SCYDJ1-A1 (1625 bp). The average gene density of M. septimelata was 229 genes/Mb, which is lower than that of M. conica CCBAS932 (240 genes/Mb) and M. importuna SCYDJ1-A1 (245 genes/Mb). A total of 512 CAZymes were identified in the genome of M. septimelata, which is more than in M. conica CCBAS932 (401 CAZymes) and M. importuna SCYDJ1-A1 (403 CAZymes), suggesting that the carbohydrate degradation ability of M. septimelata may be stronger than that of the other two species. A total of 9 secondary metabolite (SM) clusters were found in the M. septimelata genome, of which 3 were terpene clusters. The genome sequence data of M. septimelata presented in this study will provide useful information for understanding the synthesis of secondary metabolites in M. septimelata and lay a foundation for its artificial cultivation.

105 in total

1. Taxonomic revision of true morels (Morchella) in Canada and the United States.

Authors: Michael Kuo; Damon R Dewsbury; Kerry O'Donnell; M Carol Carter; Stephen A Rehner; John David Moore; Jean-Marc Moncalvo; Stephen A Canfield; Steven L Stephenson; Andrew S Methven; Thomas J Volk
Journal: Mycologia Date: 2012-04-11 Impact factor: 2.696

2. The genetic landscape of Ceratocystis albifundus populations in South Africa reveals a recent fungal introduction event.

Authors: Dong-Hyeon Lee; Jolanda Roux; Brenda D Wingfield; Irene Barnes; Lizel Mostert; Michael J Wingfield
Journal: Fungal Biol Date: 2016-03-11

3. Aspergillus mulundensis sp. nov., a new species for the fungus producing the antifungal echinocandin lipopeptides, mulundocandins.

Authors: Gerald F Bills; Qun Yue; Li Chen; Yan Li; Zhiqiang An; Jens C Frisvad
Journal: J Antibiot (Tokyo) Date: 2015-10-14 Impact factor: 2.649

4. Aroma-volatile profile of black morel (Morchella importuna) grown in Israel.

Authors: Zipora Tietel; Segula Masaphy
Journal: J Sci Food Agric Date: 2017-07-28 Impact factor: 3.638

5. Intraspecific and intragenomic variability of ITS rDNA sequences reveals taxonomic problems in Ceratocystis fimbriata sensu stricto.

Authors: T C Harrington; M R Kazmi; A M Al-Sadi; S I Ismail
Journal: Mycologia Date: 2014 Mar-Apr Impact factor: 2.696

Review 6. Artificial cultivation of true morels: current state, issues and perspectives.

Authors: Qizheng Liu; Husheng Ma; Ya Zhang; Caihong Dong
Journal: Crit Rev Biotechnol Date: 2017-06-06 Impact factor: 8.429

7. Intersterility, morphology and taxonomy of Ceratocystis fimbriata on sweet potato, cacao and sycamore.

Authors: Christine J Baker Engelbrecht; Thomas C Harrington
Journal: Mycologia Date: 2005 Jan-Feb Impact factor: 2.696

8. IMA Genome-F 1: Ceratocystis fimbriata: Draft nuclear genome sequence for the plant pathogen, Ceratocystis fimbriata.

Authors: P Markus Wilken; Emma T Steenkamp; Michael J Wingfield; Z Wilhelm de Beer; Brenda D Wingfield
Journal: IMA Fungus Date: 2013-12-06 Impact factor: 3.515

9. Species concepts in Cercospora: spotting the weeds among the roses.

Authors: J Z Groenewald; C Nakashima; J Nishikawa; H-D Shin; J-H Park; A N Jama; M Groenewald; U Braun; P W Crous
Journal: Stud Mycol Date: 2013-06-30 Impact factor: 16.097

10. Redefining Ceratocystis and allied genera.

Authors: Z W de Beer; T A Duong; I Barnes; B D Wingfield; M J Wingfield
Journal: Stud Mycol Date: 2014-09 Impact factor: 16.097

13 in total

1. Apc.LaeA and Apc.VeA of the velvet complex govern secondary metabolism and morphological development in the echinocandin-producing fungus Aspergillus pachycristatus.

Authors: Nan Lan; Qun Yue; Zhiqiang An; Gerald F Bills
Journal: J Ind Microbiol Biotechnol Date: 2019-11-23 Impact factor: 3.346

2. Nine draft genome sequences of Claviceps purpurea s.lat., including C. arundinis, C. humidiphila, and C. cf. spartinae, pseudomolecules for the pitch canker pathogen Fusarium circinatum, draft genome of Davidsoniella eucalypti, Grosmannia galeiformis, Quambalaria eucalypti, and Teratosphaeria destructans.

Authors: Brenda D Wingfield; Miao Liu; Hai D T Nguyen; Frances A Lane; Seamus W Morgan; Lieschen De Vos; P Markus Wilken; Tuan A Duong; Janneke Aylward; Martin P A Coetzee; Kasia Dadej; Z Wilhelm De Beer; Wendy Findlay; Minette Havenga; Miroslav Kolařík; Jim G Menzies; Kershney Naidoo; Olivia Pochopski; Parivash Shoukouhi; Quentin C Santana; Keith A Seifert; Nicole Soal; Emma T Steenkamp; Catherine T Tatham; Margriet A van der Nest; Michael J Wingfield
Journal: IMA Fungus Date: 2018-12-14 Impact factor: 3.515

3. Phylogenomic incongruence in Ceratocystis: a clue to speciation?

Authors: Aquillah M Kanzi; Conrad Trollip; Michael J Wingfield; Irene Barnes; Magriet A Van der Nest; Brenda D Wingfield
Journal: BMC Genomics Date: 2020-05-14 Impact factor: 3.969

Review 4. Echinocandins: structural diversity, biosynthesis, and development of antimycotics.

Authors: Wolfgang Hüttel
Journal: Appl Microbiol Biotechnol Date: 2020-12-03 Impact factor: 4.813

5. IMA Genome - F14: Draft genome sequences of Penicillium roqueforti, Fusarium sororula, Chrysoporthe puriensis, and Chalaropsis populi.

Authors: Magriet A van der Nest; Renato Chávez; Lieschen De Vos; Tuan A Duong; Carlos Gil-Durán; Maria Alves Ferreira; Frances A Lane; Gloria Levicán; Quentin C Santana; Emma T Steenkamp; Hiroyuki Suzuki; Mario Tello; Jostina R Rakoma; Inmaculada Vaca; Natalia Valdés; P Markus Wilken; Michael J Wingfield; Brenda D Wingfield
Journal: IMA Fungus Date: 2021-03-05 Impact factor: 3.515

6. Searching for universal model of amyloid signaling motifs using probabilistic context-free grammars.

Authors: Marlena Gąsior-Głogowska; Monika Szefczyk; Witold Dyrka; Natalia Szulc
Journal: BMC Bioinformatics Date: 2021-04-29 Impact factor: 3.169

7. Dual RNA-Seq Analysis of the Interaction Between Edible Fungus Morchella sextelata and Its Pathogenic Fungus Paecilomyces penicillatus Uncovers the Candidate Defense and Pathogenic Factors.

Authors: Yang Yu; Hao Tan; Tianhai Liu; Lixu Liu; Jie Tang; Weihong Peng
Journal: Front Microbiol Date: 2021-12-02 Impact factor: 5.640

8. Subchromosome-Scale Nuclear and Complete Mitochondrial Genome Characteristics of Morchella crassipes.

Authors: Wei Liu; Yingli Cai; Qianqian Zhang; Fang Shu; Lianfu Chen; Xiaolong Ma; Yinbing Bian
Journal: Int J Mol Sci Date: 2020-01-12 Impact factor: 5.923

9. Characterization of the Ergosterol Biosynthesis Pathway in Ceratocystidaceae.

Authors: Mohammad Sayari; Magrieta A van der Nest; Emma T Steenkamp; Saleh Rahimlou; Almuth Hammerbacher; Brenda D Wingfield
Journal: J Fungi (Basel) Date: 2021-03-22

10. Increased abundance of secreted hydrolytic enzymes and secondary metabolite gene clusters define the genomes of latent plant pathogens in the Botryosphaeriaceae.

Authors: Jan H Nagel; Michael J Wingfield; Bernard Slippers
Journal: BMC Genomics Date: 2021-08-04 Impact factor: 3.969