Javier F Tabima1, Ian A Trautman2, Ying Chang2, Yan Wang3,4, Stephen Mondo5, Alan Kuo5, Asaf Salamov5, Igor V Grigoriev6,5, Jason E Stajich3,4, Joseph W Spatafora2. 1. Department of Botany and Plant Pathology, College of Agricultural Sciences, Oregon State University, Corvallis tabimaj@oregonstate.edu. 2. Department of Botany and Plant Pathology, College of Agricultural Sciences, Oregon State University, Corvallis. 3. Department of Microbiology and Plant Pathology, University of California-Riverside, California. 4. Institute for Integrative Genome Biology, University of California-Riverside, California. 5. US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, California. 6. Department of Plant and Microbial Biology, University of California-Berkeley, California.
Abstract
Research into secondary metabolism (SM) production by fungi has resulted in the discovery of diverse, biologically active compounds with significant medicinal applications. The fungi rich in SM production are taxonomically concentrated in the subkingdom Dikarya, which comprises the phyla Ascomycota and Basidiomycota. Here, we explore the potential for SM production in Mucoromycota and Zoopagomycota, two phyla of nonflagellated fungi that are not members of Dikarya, by predicting and identifying core genes and gene clusters involved in SM. The majority of non-Dikarya have few genes and gene clusters involved in SM production except for the amphibian gut symbionts in the genus Basidiobolus Basidiobolus genomes exhibit an enrichment of SM genes involved in siderophore, surfactin-like, and terpene cyclase production, all these with evidence of constitutive gene expression. Gene expression and chemical assays also confirm that Basidiobolus has significant siderophore activity. The expansion of SMs in Basidiobolus are partially due to horizontal gene transfer from bacteria, likely as a consequence of its ecology as an amphibian gut endosymbiont.
Research into secondary metabolism (SM) production by fungi has resulted in the discovery of diverse, biologically active compounds with significant medicinal applications. The fungi rich in SM production are taxonomically concentrated in the subkingdom Dikarya, which comprises the phyla Ascomycota and Basidiomycota. Here, we explore the potential for SM production in Mucoromycota and Zoopagomycota, two phyla of nonflagellated fungi that are not members of Dikarya, by predicting and identifying core genes and gene clusters involved in SM. The majority of non-Dikarya have few genes and gene clusters involved in SM production except for the amphibian gut symbionts in the genus Basidiobolus Basidiobolus genomes exhibit an enrichment of SM genes involved in siderophore, surfactin-like, and terpene cyclase production, all these with evidence of constitutive gene expression. Gene expression and chemical assays also confirm that Basidiobolus has significant siderophore activity. The expansion of SMs in Basidiobolus are partially due to horizontal gene transfer from bacteria, likely as a consequence of its ecology as an amphibian gut endosymbiont.
Fungi produce a wealth of biologically active small molecules – secondary or specialized metabolites – that function in interactions with other organisms, environmental sensing, growth and development, and numerous other processes (Rokas ). Several of these compounds have led to the successful development of pharmaceuticals (e.g., antibiotics, immunosuppressants, statins, etc.) that have had dramatic and positive impacts on human health. Understanding the evolution of fungal secondary metabolites and linking them with their ecological and physiological functions in nature can inform searches for compounds with applications in human society.Secondary metabolism (SM) is imprecisely defined but can be characterized generally as the production of bioactive compounds that are not part of primary metabolism and that are not required for growth and survival in the laboratory (Keller ; Brakhage 2013; Rokas ). In fungi, the genes responsible for the synthesis of secondary metabolites are frequently co-located in biosynthetic gene clusters (Smith ; Brakhage 2013), which contain the genes that control regulation of expression, biosynthesis, tailoring, and transport of these compounds out of the cell (Smith ; Keller ; Osbourn 2010). In the kingdom Fungi, the diversity of products synthesized via SM is substantial and primarily includes alkaloids, peptides, polyketides, and terpenes (Collemare ; Helaly ). Each of these groups of compounds are synthesized by core genes that are characteristic of the pathways and include, but are not limited to, dimethylallyl tryptophan synthases (DMAT), non-ribosomal peptide synthetases (NRPS), polyketide synthetases (PKS), and terpene cyclases (TC). These bioactive compounds fulfill various roles that are hypothesized to increase the fitness of the fungus by promoting better recognition and adaptation to environmental cues.Biosynthesis of secondary metabolites is heterogeneous across the fungal tree of life, but the vast majority of discovered and predicted secondary metabolites are reported within the fungal phyla Ascomycota and Basidiomycota of the subkingdom Dikarya (Wisecaver ; Rokas ). Filamentous ascomycetes are the major producers of polyketide and peptidic secondary metabolites (Rokas ) (e.g., penicillin, cyclosporin, etc.), with the majority of genes, gene clusters, and enzymes involved with fungal SM found in the subphylum Pezizomycotina (Collemare ; Wisecaver ; Helaly ; Rokas ). Although less than Ascomycota, Basidiomycota is also a prominent producer of SM, including diverse terpene genes (e.g., sesquiterpenes, Quin ), some of the better-known hallucinogens (e.g., psilocybin of Psilocybe; Reynolds ), and compounds toxic to humans (e.g., amanitin of Amanita; Luo et al. 2010).For reasons that are unclear, the remainder of kingdom Fungi is characterized by a paucity of secondary metabolites (Voigt ). This includes the zoosporic fungi and relatives classified in Blastocladiomycota, Chytridiomycota and Rozellomycota, and the nonflagellated, zygomycete fungi of Mucoromycota and Zoopagomycota. This pattern of SM diversity supports the hypothesis that diversification of secondary metabolism is a characteristic of Ascomycota and Basidiomycota (subkingdom Dikarya), which share a more recent common ancestor relative to the other phyla. Recent genome sampling efforts have focused on increased sequencing of non-Dikarya species (Nagy ; Kohler ; Spatafora ; Quandt ; Ahrendt ). These efforts have provided a better understanding of the relationships of the phyla of kingdom Fungi (e.g., Spatafora ), and processes and patterns that shaped the evolution of morphologies (e.g., Nagy ) and ecologies (e.g., Chang ; Quandt ) within the kingdom. The availability of a diversity of these genomes provides an opportunity to characterize and focus on the secondary metabolism composition of non-Dikarya taxa, which have remained relatively unexplored.While the majority of non-Dikarya taxa have low SM diversity, genomic sequencing of the genus Basidiobolus (Phylum Zoopagomycota) revealed that it possesses an unusually large composition of SM gene clusters. Species of Basidiobolus have complex life cycles, that involve the production of multiple spore types that occur in various environmental niches. Species have been found in the digestive tracts and feces of fish, reptiles, and amphibians (Nickerson and Hutchison 1971), where their function and impact on the host remains unknown. They have also been isolated from cadavers of mites and collembola species (Manning ; Werner , 2018), centipedes (Fonseca ), spiders (Henriksen ), and associated with the gut microbiota of rove beetles (Stefani ). Basidiobolus is known from leaf litter (Smith and Callaghan 1987; Manning and Callaghan 2008) and it has been identified as part of the microbiome of aquatic carnivorous plants (Sirová ). Finally, species of the genus have also been implicated in opportunistic infections of humans and other mammals (Vikram ; Geramizadeh ; Okada ). Across these environments Basidiobolus is dispersed by both forcibly discharged asexual spores (blastoconidia) and passively dispersed asexual spores (capilloconidia) that adhere to exoskeletons of small insects. Basidiobolus also reproduces sexually through the production of zygospores (meiospores) either by selfing (homothallic) or outcrossing (heterothallic) according to species. Based on these diverse ecologies, Basidiobolus must have adapted for survival in numerous environmental niches including the digestive systems of diverse animal species, feces, insect phoresis, and on decaying plant matter or leaf litter.In this study we demonstrate that the genomes of Basidiobolus contain a larger number of genes related to SM than predicted by phylogeny and that in several cases the evolution of many of these SM genes is inconsistent with vertical evolution. Our objectives were to: i) characterize the diversity of SM in Basidiobolus, ii) identify the phylogenetic sources of this diversity, and iii) determine which classes of SM gene clusters are functional and may predict the secondary metabolites produced by species of Basidiobolus. Finally, we propose a model in which the amphibian gastrointestinal system is an environment that promotes noncanonical evolution of its fungal inhabitants.
Materials And Methods
Data collection
Annotated genome and amino-acid translation of predicted gene model sequences for three isolates of two species within the genus Basidiobolus were used in this study: Basidiobolus meristosporus CBS 931.73 (Mondo ), isolated from gecko dung in the locality of Lamco, Ivory Coast; B. meristosporus B9252 (Chibucos ) isolated from human eye in Saudi Arabia; and B. heterosporus B8920 (Chibucos ) isolated from plant debris in India. The genomic sequence of B. meristosporus CBS 931.73 was sequenced with PacBio and annotated by Mondo and obtained from the US Department of Energy Joint Genome Institute MycoCosm genome portal (https://mycocosm.jgi.doe.gov; Grigoriev ). Genomic sequences and annotation of B. meristosporus B9252 and B. heterosporus B8920 sequenced by Chibucos were obtained directly from the authors. The raw reads for these two species are available in GenBank (Accession numbers GCA_000697375.1 and GCA_000697455.1). Additional genomes of 66 Mucoromycota and Zoopagomycota species were used in this study (Ma , Tisserant , Wang , Schwartze , Chang , Chibucos , Corrochano , Lastovetsky , Wang , Mondo et at. 2017, Uehling , Ahrendt , Chen ). Data sources for the genomes included in these analyses are available in Table S1
Secondary metabolite gene cluster prediction
Predicted SM proteins were retrieved for genomes of 66 Mucoromycota and Zoopagomycota species (not including Basidiobolus) based on the Secondary Metabolite Unique Regions Finder (SMURF) (Khaldi ) predictions available at MycoCosm for NRPS and PKS, and using local prediction with AntiSMASH V4.2.0 (Blin ) for Terpene Cyclases. Secondary Metabolite gene clusters for the three genomes of Basidiobolus were predicted with AntiSMASH v4.2.0 and SMURF from the annotated genomes of the three isolates used in this study. The AntiSMASH prediction was performed on local HPC, while SMURF predictions were obtained by submission of the genomes to SMURF web server (). Predictions were contrasted manually to include all clusters of secondary metabolites from both sources. Orthologous sets of core SM genes across the three genomes of Basidiobolus were identified using OrthoFinder (Emms and Kelly 2015).
Secondary metabolite expression analysis
To assess the expression of predicted SM proteins, we calculated summarized counts of RNA transcript per million (TPM) for genes in B. meristosporus CBS 931.73 isolate by aligning RNA-Seq reads to the assembled B. meristosporus CBS 931.73 genome with HiSat v2.1.0 (Kim et al. 2019). The aligned sequence reads were processed with HTS-seq (Anders ) to generate the counts of overlapping reads found for each gene and the normalized TPM for the genes was calculated using the cpm function in edgeR (Robinson ). A distribution of RNA-Seq read counts per gene was plotted using the ggplot2 package in the R statistical framework (R Core Team 2018).
Phylogenomic analyses
Phylogenetic analyses were used to assess the evolutionary relationships of the NRPS, PKS, and terpene cyclase/synthase predicted proteins. For the NRPS genes, the adenylation domains (A-domains) were identified by hmmsearch from HMMer 3.0 suite (Eddy 2004), using the A-domain profile reported by Bushley and Turgeon (2010) as a reference profile HMM. The predicted A-domains were extracted from the resulting HMMER table into a FASTA file using the esl-reformat program included in the HMMER suite. The predicted A-domains for all Basidiobolus, Mucoromycota and Zoopagomycota species were added to an A-domain amino-acid alignment reported by Bushley and Turgeon (2010) with MAFFT v7 (Katoh ) (File S1). This reference alignment contains A-domains from NRPS proteins from nine major subfamilies of fungal and bacterial NRPS proteins. The phylogenetic domain tree was constructed using a maximum likelihood approach implemented in RAxML v. 8.2.11 (Stamatakis 2014) with the JTT amino acid substitution matrix, after model selection using the PROTGAMMAAUTO option, and 1000 bootstrap replicates (raxmlHPC-PTHREADS -T 12 -n NRPS -s infile.fasta -f a -x 12345 -p 12345 -m PROTCATJTT –N 1000). A graphical representation of the A-domains from the NRPS/NRPS-like core gene models was constructed by coloring the A-domain position in the core gene according to its phylogenetic origin using the ggplot2 R package (Wickham 2016).For the PKS genes, the KS domains of all predicted PKS proteins from the Basidiobolus, Mucoromycota and Zoopagomycota species were identified using hmmsearch, using the KS domain profile (PF001009) available in PFAM v31 (Finn ). The predicted KS domains for all Basidiobolus, Mucoromycota and Zoopagomycota species were added to an existing KS domain amino-acid alignment reported by Kroken using MAFFT v7 (Katoh ) (File S2). This existing alignment contained KS domains from PKS proteins from reduced and unreduced PKS from bacterial and fungal species. The Kroken database was expanded by adding predicted KS domains from PKS proteins of additional published fungal genomes in order to include more fungal diversity in the dataset: Eight Ascomycota isolates (Aspergillus nidulans FGSC A4, Beauveria bassiana ARSEF 2860 (Xiao ), Capronia coronata CBS 617.96 (Teixeira ), Capronia semiimmersa CBS 27337 (Teixeira ), Cladophialophora bantiana CBS 173.52 (Teixeira ), Cochliobolus victoriae FI3 v1.0 (Condon ), Microsporum canis CBS 113480 (Martinez ), Trichoderma atroviride v2.0), nine Basidiomycota isolates (Acaromyces ingoldii MCA 4198 v1.0 (Kijpornyongpan ), Fibroporia radiculosa TFFH 294 (Tang ), Fomitiporia mediterranea v1.0 (Floudas ), Gloeophyllum trabeum v1.0 (Floudas ), Gymnopus luxurians v1.0 (Kohler ), Laccaria bicolor v2.0 (Martin ), Microbotryum lychnidis−dioicae p1A1 Lamole (Perlin ), Piloderma croceum F 1598 v1.0 (Kohler ), Pisolithus tinctorius Marx 270 v1.0 (Kohler )), four Neocallimastigomycota isolates (Anaeromyces robustus v1.0 (Haitjema ), Orpinomyces sp. (Youssef ), Piromyces finnis v3.0 (Haitjema ), Piromyces sp. E2 v1.0 (Haitjema )) and one Chytridiomycota species (Spizellomyces punctatus DAOM BR117 (Russ )). The predicted PKS proteins from additional species were all obtained from the DOE-JGI MycoCosm genome portal by searching for all genes with “PKS” on the SM annotation from MycoCosm. The KS domains were identified for the subset of PKS proteins using a HMMER KS profile as mentioned above. A phylogenetic tree was reconstructed using maximum likelihood using all KS domains using similar parameters described in the NRPS step. To determine if the novel PKS candidates had the common domains of PKS proteins (AT, KR, KS, DH and PP), an identification of these profiles were performed based on HMMER models from PFAM (Finn ) and SMART (Schultz ) databases (KS (PF001009), AT (SM00827), KR (SM00822), DH (SM00826), and PP (SM00823))The terpene cyclase (TC) proteins were predicted in AntiSMASH for all 69 assembled genome sequences from Basidiobolus, Mucoromycota and Zoopagomycota. To include additional fungal TC proteins in the phylogenetic reconstruction, we identified TC proteins from 58 published genomes of Dikarya isolates (21 Basidiomycetes and 37 Ascomycetes, Table S3 (Armaleo , Burmester , DiGuistini , Yang , Zuccaro , Floudas , Morin , Ohm , Padamsee , Xiao , Gostincar et al. 2014, Pendleton , Riley , Terfehr , Wang , Kohler , Okagaki , Nagy , Nguyen , Teixeira , Martino , Vesth , Crouch et al. 2020, Haridas )), available in DOE-JGI MycoCosm, each from a different family. One isolate per family was randomly selected for the analysis. To screen for TC proteins in Dikarya, OrthoFinder was used to build orthologous clusters of genes between Basidiobolus, Mucoromycota and Zoopagomycota TC and the Dikarya proteome dataset. Dikarya proteins clustered within orthologous groups that contain TC of Basidiobolus, Mucoromycota and Zoopagomycota were considered valid TC orthologs and used in downstream analyses. Finally, we identified bacterial TC to evaluate the potential for HGT into Basidiobolus, Mucoromycota or Zoopagomycota species. Bacterial TC’s were identified by screening each protein from the RefSeq, release 87 (May 2018, O’Leary ) FASTA dataset against a BLAST database (Altschul ), one from each of the zygomycete orthologous groups identified by OrthoFinder in the previous step, for a total of four protein sequences in the database. The BLASTP program was used to perform the searches, using an e-value of 1e-10 and the BLOSUM62 substitution matrix. Positive matches from the BLASTP assay were used as bacterial TC for subsequent analyses. A multi-sequence alignment containing all predicted TC from Basidiobolus, Mucoromycota and Zoopagomycota species, the orthologous TC from additional fungal species, and the TC proteins from reference bacterial predicted gene models result of the BLASTP search was performed in MAFFT v7 using the G-INS-1 algorithm for progressive global alignment (Katoh ) (File S5). A phylogenetic tree was reconstructed using maximum likelihood in RAxML using similar parameters as mentioned before.
Horizontal gene transfer in Basidiobolus species
Identification of genic regions with evidence for horizontal gene transfer (HGT) from bacteria was performed by searching all translated proteins from the predicted gene models of the Basidiobolus isolates against a custom BLAST protein database. This database included all amino-acid sequences available in the NCBI RefSeq proteomics database and all available amino-acid sequences for Mucoromycota and Zoopagomycota species at the DOE-JGI MycoCosm genome portal. The proteins were searched with BLASTP against the combined RefSeq/Mucoromycota/Zoopagomycota custom database, using an e-value cutoff of 1e-10 and the BLOSUM62 substitution matrix. A summary of the taxonomy identifier for all best hits was obtained by identifying whether the best hit was a Mucoromycota/Zoopagomycota protein, or a RefSeq protein. We used the rentrez package (Winter 2017) in the R statistical framework to extract the top-ten taxonomic identifiers from the NCBI database when hits did not correspond to Mucoromycota/Zoopagomycota. Proteins that had no hits to a fungal protein and only hits to a bacterial protein were considered candidate HGT genes.To increase the accuracy of the prediction of genes product of HGT, we tested whether HGT candidate genes showed signs of errant assembly or in silica incorporation into the Basidiobolus genomes. The mean genomic read coverage of each candidate gene was calculated and compared to the mean coverage of the scaffold harboring the HGT candidate gene for all three Basidiobolus isolates. A z-score was calculated in R to determine the number of standard deviations of the HGT candidate from the mean coverage of the harboring scaffold. All HGT candidates with standard deviations greater or less than two were removed from the analysis. The genomic reads were mapped to each reference genome using BWA (Li and Durbin 2009). Coverage per HGT and harboring scaffold was estimated using samtools depth (Li ). A summary plot of the proportion of genes with bacterial best hits was constructed using the ggplot2 package in R. Lastly, the best taxonomic hit of genes that co-occur on common scaffolds with HGT candidate genes was assayed to determine if the HGT genes are located in areas with fungal genes. HGT candidates that had no fungal genes either upstream or downstream from the HGT candidates were removed from subsequent HGT analyses.To identify differences in the GC content between HGT candidates and genes of fungal origin, ANOVA was performed between the mean GC content for HGT candidates, fungal genes on common scaffolds with HGT genes, and all fungal genes. ANOVA was performed on Basidiobolus meristosporus CBS 931.73, as this genome possessed the longest contigs and highest quality assembly. A Tukey’s HSD test was performed to test significant differences between each set of genes. In addition, to identify if the HGT candidate genes had a reduced number of introns than the fungal genes, a summary of the number of introns and normalized intron length (intron length divided by gene model length) was performed in the GenomicRanges R package (Lawrence ). A Kruskal-Wallis test was performed in R to identify significant differences in the number of introns and the normalized intron length for the HGT candidates and the fungal genes. A nucleotide composition analysis based on 5-mers and codon usage was performed to observe differences between candidate HGT genes and fungal genes for Basidiobolus meristosporus CBS 931.73. The 5-mer analysis was conducted in all predicted coding sequences from the annotated gene models using the oligonucleotideFrequency function of the Biostring R package (Pagès et al. 2019). Codon usage was estimated using the uco function of the seqinr R package (Charif and Lobry 2007). Both these indices were divided by gene length to normalize the nucleotide composition by the effect of gene length. Principal component analyses were performed in R for both 5-mer and codon usage analysis to compare the HGT candidates to the fungal candidate genes. Lastly, the putative functions of the genes with evidence for HGT were summarized using the functional annotations available in the gene format file (GFF) of B. meristosporus CBS 931.73.
Measuring siderophore activity in Basidiobolus
To measure the siderophore activity (chelation of ferric ions) of Basidiobolus, an assay of detection of siderophore activity based on the universal chrome azurol S (CAS) assay (Andrews ) was performed for the strain Basidiobolus meristosporus CBS 931.73. This colorimetric assay uses a complex of Fe(III) – CAS – DDAPS (Surfactant). When this complex is combined with acetateyeastagar (AY agar), it results in a greenish-blue color, where the color changes to yellow upon the removal of the iron. A total of 450 ml of AY agar (Andrews ) was mixed with 50 ml of autoclaved 10X CAS assay (20 ml of 10 mM Fe(NO3)3, 40 ml of 10 mM chrome azurol S, and 100 ml of 10 mM DDAPS). A 10 cm layer of the AY-CAS media was poured in small petri dishes and cooled. An upper layer of 10 cm of AY agar was poured after cooling, and the media was left to diffuse overnight and stored at 4° for 24 hr. B. meristosporus, Conidiobolus thromboides FSU 785 (negative control), and Cladosporium sp. from the herbarum species complex PE-07 (positive control) were transferred into the AY-CAS plates by transferring a small amount of mycelium via a sterile toothpick and piercing the media in the center. The assay was performed in triplicate for each isolate used. Cultures were grown at room temperature for 12 days. Siderophore activity was measured as the area of the plate that has changed to yellow color, when compared to a negative control and an empty petri dish with AY-CAS agar. Pictures of the plates were taken. These images were imported into Adobe Photoshop CC 2019 and size-corrected to 5.5 cm (diameter of the small petri dish). All images were concatenated in the same file. The Color Range tool of Adobe Photoshop CC tool was used to measure the yellow area for all plates. The area measurements were exported into R, where an analysis of variance was performed to determine significant differences across the siderophore activity of the strains.
Data availability
File S1 includes the ortholog groups of SM for Basidiobolus species. File S2 includes the alignment of all A-domains from NRPS sequences used in the study. File S3 includes the alignment of A-domains from Zoopagomycota and Mucoromycota NRPS sequences. File S4 includes the KS alignment for all PKS sequences used in this study. File S5 includes all terpene cyclase core genes used in this study. Figure S1 contains maximum likelihood reconstruction of NRPS-A domains. Figure S2 contains graphical representation of cluster 5 from B. meristosporus CBS 931.73. Figure S3 contains phylogenetic sources and abundance of KS domains. Figure S4 contains maximum likelihood reconstruction of PKS KS domains. Figure S5 contains presence and absence of domains characteristic of PKS core gene models in non-zygomycete fungal species. Figure S6 contains presence and absence of domains characteristic of PKS core gene models in zygomycete fungal species. Figure S7 contains phylogenetic sources and abundance of terpene cyclases. Figure S8 includes the proof of concept for HGT assays. Figure S9 includes intron number comparisons for HGT genes. Figure S10 includes a PCA 5-mer analysis for HGT genes. Figure S11 includes a PCA for codon usage analysis for HGT genes. Figure S12 includes siderophore activity assays. Table S1 includes all Mucoromycota and Zoopagomycota isolates used in the manuscript. Table S2 is a summary of secondary metabolite genes predicted for Basidiobolus species. Table S3 includes the isolates of Dikarya used for the terpene cyclase assay. Table S4 includes the summary of hits of HGT genes against NBCI taxonomy. Table S5 is a summary of gene annotations from HGT candidates. Supplementary files, tables and figures have been deposited to figshare. All additional files included as supplementary materials in this manuscript and custom scripts used for this analysis can be found in the ZyGoLife GitHub repository (https://github.com/zygolife/Basidiobolus_SM_repo). Supplemental material available at figshare: https://doi.org/10.25387/g3.12691859.
Results
A total of 834 SM gene clusters were predicted for the 69 Mucoromycota and Zoopagomycota genomes including 117 non-ribosomal peptide synthetases (NRPS), 204 NRPS-like, 103 polyketide synthases (PKS), 97 PKS-like, one PKS-NRPS hybrid, and 312 terpene cyclase (TC) gene models across both phyla (Figure 1, Table S1). For Zoopagomycota, 300 SM gene models were predicted, including one NRPS-PKS hybrid, 72 NRPS, 61 NRPS-Like, 60 PKS, 47 PKS-Like, and 59 TC gene models. In Mucoromycota, 534 total SM gene models were predicted, including 45 NRPS, 143 NRPS-Like, 43 PKS, 50 PKS-Like, and 253 TC gene models.
Figure 1
Genome size, number of genes, proportion of secondary metabolism (SM) gene clusters and number of predicted SM gene clusters for 69 Zoopagomycota and Mucoromycota sequenced genomes. Color code represents the category of SM predicted per isolate. NRPS: Non-ribosomal peptide synthetases. PKS: Polyketide synthases. Glom.: Glomeromycotina. Mort.: Morteriellomycotina. Ent.: Entomophtoromycotina. Kickx.: Kickxellomycotina. Zoo.: Zoopagomycotina
Genome size, number of genes, proportion of secondary metabolism (SM) gene clusters and number of predicted SM gene clusters for 69 Zoopagomycota and Mucoromycota sequenced genomes. Color code represents the category of SM predicted per isolate. NRPS: Non-ribosomal peptide synthetases. PKS: Polyketide synthases. Glom.: Glomeromycotina. Mort.: Morteriellomycotina. Ent.: Entomophtoromycotina. Kickx.: Kickxellomycotina. Zoo.: ZoopagomycotinaThe highest number of SM gene clusters were predicted for Basidiobolus. A total of 38 gene clusters and 44 SM core genes were predicted in the B. meristosporus CBS 931.73 genome, 40 SM gene clusters and 44 SM core genes for B. meristosporus B9252, and 23 SM gene clusters and 23 SM core genes for B. heterosporus B8920 (Table 1; Table S2). Seventy-eight percent of the SM gene models predicted were found to be shared across the Basidiobolus isolates. Ten SM core genes were found to be unique to B. meristosporus CBS 931.73, 10 SM genes unique to B. meristosporus B9252, and 5 SM genes unique to B. heterosporus B8920 (File S1).
Table 1
Predicted secondary metabolite (SM) core genes for Basidiobolus isolates used in this study. NRPS: Non-ribosomal peptide synthetases, PKS: Polyketide synthases, TC: Terpene cyclases
Isolate
Total SM
NRPS/NRPS-like
PKS/PKS-Like
NRPS-PKS hybrids
TC
B. meristosporus CBS 931.73
44
30
4
0
10
B. meristosporus B9252
44
30
7
1
6
B. heterosporus B8920
23
18
1
0
4
The next three isolates with the most numerous predicted SM gene clusters, not including Basidiobolus genomes, were Dimargaris cristalligena RSA 468 with 33 predicted SM proteins (21 NRPS, 7 NRPS-Like, 2 PKS-Like, 3 TC), Linderina pennispora ATCC 12442 V 1.0 with 23 SM predicted (1 NRPS, 15 PKS, 3 PKS-like, 4 TC), and Martensiomyces pterosporus CBS 209.56 v1.0 with 23 SM proteins predicted (1 NRPS, 5 PKS, 14 PKS-Like, 3 TC). No DMAT gene models were predicted for any member of Mucoromycota or Zoopagomycota.
Expression of core SM genes
A total of 83.45% of the RNA sequenced reads were mapped uniquely to the reference genome of B. meristosporus CBS 931.73, while 12.08% of the reads were mapped in more than one location. Only 4.47% of the RNA sequenced reads did not map to the reference genome. The majority of predicted SM for B. meristosporus CBS 931.73 were expressed at the same or higher levels than constitutive housekeeping genes, such as Beta-tubulin, Elongation Factor 1, Actin, and Ubiquitin (Figure 2). The highest expressed SM core genes per SM group were: NRPS – gene model 387529 (Cluster 5) with 74.03 transcripts per million (TPM) mapped; NRPS-like – gene model 221915 (Cluster 20) with 45.58 TPM mapped; PKS – gene model 290138 (Cluster 37) with 687.55 TPM mapped; PKS-like – gene model 207695 (Cluster 38) with 26.15 TPM mapped, and Terpene cyclase – gene model 301341 (Cluster 13) with 78.26 TPM mapped.
Figure 2
Distribution of number of RNAseq counts in transcripts per million (TPM) per genic feature from B. meristosporus CBS 931.73. Colors represent SM and genes of interest. X-axis represents the TPM count. Y-axis represents distribution of TPM. The scale is in log(TPM), and the values are in TPM absolute values. Histogram represents the distribution of mapped reads in TPM for all predicted gene models with non-zero TPM values across the B. meristosporus CBS 931.73 genome.
Distribution of number of RNAseq counts in transcripts per million (TPM) per genic feature from B. meristosporus CBS 931.73. Colors represent SM and genes of interest. X-axis represents the TPM count. Y-axis represents distribution of TPM. The scale is in log(TPM), and the values are in TPM absolute values. Histogram represents the distribution of mapped reads in TPM for all predicted gene models with non-zero TPM values across the B. meristosporus CBS 931.73 genome.
Phylogenetic analysis of NRPS/NRPS-Like A-domains
The phylogenetic reconstruction of A-domains was performed with 951 A-domains from a combined dataset including the A-domain dataset from Bushley and Turgeon (2010) and the predictions of NRPS/NRPS-like A-domains from Mucoromycota and Zoopagomycota genome sequences (File S2). A total of 395 NRPS A-domains were predicted for Basidiobolus, Mucoromycota and Zoopagomycota genome sequences (File S3). The phylogenetic analyses recovered the nine major families reported by Bushley and Turgeon (2010) with the addition of the two new clades, surfactin-like and ChNSP 12-11-like, reported here. The total number of Mucoromycota/Zoopagomycota A-domains and their distribution across these clades are as follows: 74 to the AAR clade; 115 to the major bacterial clade (MBC), including 104 A-domains with the surfactin-like clade with and an additional 11 A-domains scattered elsewhere in the MBC; the CYCLO clade with eight A-domains; 65 A-domains to the ChNSP 12-11-like clade; 25 A-domains to the ChNSP 12-11 clade; 11 A-domains to the PKS/NRPS clade; 16 A-domains to the SID clade; 76 A-domains to the SIDE clade; and 5 A-domains to the EAS clade (Figure 3). No A-domains were clustered into the ACV clade or the ChNSP 10 clade, and the remaining A-domains grouped within the outgroup clade (Figures 3 and 4, Figure S1). Similar phylogenetic origins for multiple A-domains were found in a single NRPS core gene (Figure 4).
Figure 3
Phylogenetic sources and abundance of A-domains from NRPS predicted gene models for Zoopagomycota and Mucoromycota species. The maximum likelihood phylogenetic tree (left) represent a simplification of the reconstructed tree which includes the clades that include more than one A-domain from Mucoromycota/Zoopagomycota genomes. The heatmap (right) represents the abundances A-domains predicted for each domain clustered within each clade, including the dikarya species included in the analysis. Basidiobolus species are enriched in NRPS from the major bacterial clade (MBC), cyclosporin (CYCLO), surfactin-like, and siderophore (SIDE, SID) clades.
Figure 4
Graphical representation of the A-domains of each NRPS core gene predicted for Basidiobolus genomes. Horizontal gray lines represent the length of the predicted NRPS core gene. Arrows represent A-domains and are located in the position within the gene model. Colors represent the NRPS cluster assigned in Fig. 3. Numbers represent the name of each gene model and predicted SM cluster for each genome.
Phylogenetic sources and abundance of A-domains from NRPS predicted gene models for Zoopagomycota and Mucoromycota species. The maximum likelihood phylogenetic tree (left) represent a simplification of the reconstructed tree which includes the clades that include more than one A-domain from Mucoromycota/Zoopagomycota genomes. The heatmap (right) represents the abundances A-domains predicted for each domain clustered within each clade, including the dikarya species included in the analysis. Basidiobolus species are enriched in NRPS from the major bacterial clade (MBC), cyclosporin (CYCLO), surfactin-like, and siderophore (SIDE, SID) clades.Graphical representation of the A-domains of each NRPS core gene predicted for Basidiobolus genomes. Horizontal gray lines represent the length of the predicted NRPS core gene. Arrows represent A-domains and are located in the position within the gene model. Colors represent the NRPS cluster assigned in Fig. 3. Numbers represent the name of each gene model and predicted SM cluster for each genome.Of fungi classified in Mucoromycota or Zoopagomycota, Basidiobolus genomes contained the most A-domains clustered in the EAS, SIDE, and CYCLO clades. The A-domains clustered within EAS represent two domains from B. meristosporus CBS 931.73, and one A-domain of Rhizophagus irregularis DAOM 197198 v2.0. The A-domains clustered within SIDE represent 76 domains, 44 from B. meristosporus CBS 931.73, 19 from B. meristosporus B9252, and 13 from B. heterosporus. The A-domains clustered within the CYCLO clade represent six domains, four from B. meristosporus CBS 931.73 and two from B. meristosporus B9252. The NRPS A-domains grouped in the CYCLO cluster correspond to cluster 5 NRPS from B. meristosporus CBS 931.73 (Protein ID 387529) and cluster 9 of B. meristosporus B9252 (Protein ID N161_4477). Gene model 387529 was predicted as a tetra-modular protein, with two N-methyltransferase domains and a thioesterase domain. In addition, AntiSMASH analyses predict a six gene cluster (cluster 5) that includes gene model 387529, a zinc finger transcription factor (312194), carrier protein (277549), transporter (277552), peptidase (277554), and a SNARE associated protein (334759) (Figure S2). All of these gene models have evidence of gene expression (Figure 2).The Basidiobolus NRPS A-domains clustered in the surfactin-like clade included 32 A-domains from eight gene models from B. meristosporus CBS 931.73. These included gene model 375475 (Cluster 19) with eight A-domains, gene models 307892 (Cluster 33) and 343011 (Cluster 35) with seven A-domains each; gene model 368581 (Cluster 8) with three A-domains; gene models 337511 (Cluster 16), 372991 (Cluster 17), and 298977 (Cluster 7) each with two A-domains; and gene model 322666 (Cluster 2) with one A-domain. B. meristosporus B9252 contained two surfactin-like A-domains from one gene model (N161_8304; Cluster 18). B. heterosporus possessed 13 A-domains from seven gene models, including N168_07733 (Cluster 6), N168_06479 (Cluster 13), N168_02885 (Cluster 1) and N168_00034 (Cluster 14) with two A-domains each, and gene models N168_05934 (Cluster 8), N168_04239 (Cluster 12) and N168_07140 (Cluster 9) with one A-domain each.
Evolutionary relationships of predicted PKSs
A total of 388 KS domains were included in the phylogenetic reconstruction (File S4). A total of 253 domains were predicted here including: two KS domains from Chytridiomycota genomes, 21 from Neocallimastigomycota, 46 from Basidiomycota, 78 from Ascomycota, 44 from Mucoromycota, and 61 from Zoopagomycota (File S4). The 136 additional KS domains were obtained from Kroken . For Mucoromycota and Zygomycota, the genomes with the highest number of KS domains were Linderina pennispora ATCC 12442 v1.0 (15 KS domains), Coemansia spiralis RSA 1278 v1.0 (11 KS domains), Coemansia reversa NRRL 1564 v1.0 (6 KS domains), Coemansia mojavensis RSA 71 v1.0 (6 KS domains), Martensiomyces pterosporus CBS 209.56 v1.0 (6 KS domains). The KS domain numbers for the Basidiobolus genomes included three for B. meristosporus CBS 931.73, two for B. meristosporus B9252, and one for B. heterosporus. All of the predicted PKS for non-Basidiobolus species were obtained from the DOE-JGI MycoCosm genome portal predictions, as AntiSMASH only predicted PKS for the Basidiobolus species.The major clades of PKS proteins reported by Kroken were recovered by this study, including the fungal reducing (R) PKS, animal fatty acid synthases (FAS), fungal non-reducing (NR) PKS, and bacterial PKS, as well as fungal FAS I and II clades (Figure S3 and S4). The Fungal R PKS1 clade included domains from B. meristosporus CBS 931.73 (2 KS domains) and B. meristosporus B9252 (2 KS domains). No KS domains of Mucoromycota or Zoopagomycota genomes were clustered within the animal FAS clade, the bacterial PKS clade, nor the fungal FAS II clade. The Fungal NR PKS clade contained a new clade, “non-reducing PKS proteins clade IV”, comprising KS domains from Basidiomycota, Neocallimastigomycota, and one KS domain of B. meristosporus B9252. The fungal FAS clade I comprised all the remaining KS domains of Mucoromycota and Zoopagomycota genomes, while the fungal FAS II clade comprised mostly KS domains from Neocallimastigomycota, Basidiomycota and one Chytridiomycota representative (Figure S3 and S4).To better understand the predicted PKSs of Mucoromycota and Zoopagomycota that clustered in the fungal FAS clade, the patterns of domains that comprise fungal PKS protein sequences were analyzed. Predicted PKS of Ascomycota contained AT, KS and PP domains in all sequences (Figure S5). Predicted PKS from Basidiomycota contained AT and KS domains in all sequences, while KR and PP domains were present on 36% of the sequences. All PKS that contain a KS domain in the FAS clade are missing the PP domain. For Chytridiomycota, the predicted PKS with the KS domain clustered in the Fungal (R) clade contained AT, KS and DH domains, but no PP domain. Chytridiomycota PKS with KS domains associated to FAS clade only possessed AT and KS domains. For Neocallimastigomycota, 100% of PKS clustered in fungal R and NR clades contained AT and KS domains, however, only 5 PKS contained the PP domain. All Neocallimastigomycota PKS that clustered within the FAS clade contained only AT and KS domains. For Mucoromycota and Zoopagomycota, 100% of PKS contained either AT and/or KS domain, but no other domains (Figure S6). Some Basidiobolus PKS were differentiated from the other Mucoromycota and Zoopagomycota PKS as they possessed one or more of the DH or PP domains (Figure S6).
Evolutionary relationships of predicted terpene cyclase gene models
A total of 1,108 terpene cyclase or terpene cyclase-like gene models were predicted and used for phylogenetic analyses (Additional file 5). These include 253 identified in the Mucoromycota species genomes, 59 in the Zoopagomycota genomes, and 401 ortholog proteins from 58 fungal genomes of Basidiomycota and Ascomycota. An additional 395 TC candidates were identified in bacterial genomes from RefSeq via BLASTP. The phylogenetic reconstruction of TC resulted in two main clades: an outgroup clade that comprised predicted TC annotated as tRNA threonylcarbamoyladenosine dehydratase and phytoene synthases, and a main ingroup TC core clade (Figure 5, Figure S7). The tRNA threonylcarbamoyladenosine dehydratase subclade contained mostly bacterial sequences plus one TC gene model of B. meristosporus B9252 and B. meristosporus CBS 931.73 each. The phytoene synthases clade comprised two subclades, one clade that grouped most bacterial gene models, and a second clade clustering bacterial and Mucoromycota predicted TC gene models, but no BasidiobolusTC core gene models were found in this clade.
Figure 5
Phylogenetic sources and abundance of terpene cyclase (TC) core genes predicted for Zoopagomycota and Mucoromycota species. The maximum likelihood phylogenetic tree is a simplification of the maximum likelihood tree reconstructed using the TC core genes. The heatmap (right) represents the abundances of predicted TC genes within each reconstructed clade, including the dikarya species included in the analysis. Glom.: Glomeromycotina. Mort.: Morteriellomycotina. Ent.: Entomophtoromycotina. Kickx.: Kickxellomycotina. Zoo.: Zoopagomycotina
Phylogenetic sources and abundance of terpene cyclase (TC) core genes predicted for Zoopagomycota and Mucoromycota species. The maximum likelihood phylogenetic tree is a simplification of the maximum likelihood tree reconstructed using the TC core genes. The heatmap (right) represents the abundances of predicted TC genes within each reconstructed clade, including the dikarya species included in the analysis. Glom.: Glomeromycotina. Mort.: Morteriellomycotina. Ent.: Entomophtoromycotina. Kickx.: Kickxellomycotina. Zoo.: ZoopagomycotinaThe TC core clade (Figure 5 and Figure S7) comprised a bacterial exclusive clade (Bacteria I), followed by four highly supported sub-clades containing TC core genes from Mucoromycota, Zoopagomycota, and Dikarya, and are referred to here as Fungi TC clades I – IV. The majority of TC genes from Mucoromycota and Zoopagomycota clustered within clade Fungi TC II, which included no other fungal or bacterial TC. These genes were annotated as Squalene synthetases by the Mycocosm genome portal.The Fungi TC clades I, III and IV included mostly Mucoromycota TCs. The Fungi TC I clade contained TC genes from Mucoromycota isolates, as well as TC genes from the ascomycete isolates Fusarium verticillioides 7600 and Hypoxylon sp. EC38 v1.0, and the basidiomycete Suillus luteus UH-Slu-Lm8-n1 v2.0. These gene models were annotated as associated with the Ubiquitin C-terminal hydrolase UCHL1 according to the MycoCosm portal. Fungi TC III clade also comprised mostly Mucoromycota isolates, with additional TCs from Zoopagomycota isolates Piptocephalis cylindrospora RSA 2659 single−cell v3.0 and Syncephalis pseudoplumigaleata Benny S71−1 single−cell v1.0. Annotations of genes in Fungi TC III clade in Mycocosm indicated that these proteins are part of the isoprenoid/propenyl synthetases, responsible for synthesis of isoprenoids. Isoprenoids play a role on synthesis of various compounds such as cholesterol, ergosterol, dolichol, ubiquinone or coenzyme Q (Finn ). Finally, Fungi TC IV clade followed a similar Mucoromycota-enriched pattern with the exceptions of Dimargaris cristalligena RSA 468 single−cell v1.0, Linderina pennispora ATCC 12442 v1.0, M. pterosporus CBS 209.56 v1.0, and Syncephalis fuscata S228 v1.0. Fungi TC IV clade also included TC genes from Ascomycota sequenced genomes. The majority of these genes were annotated as containing a terpene synthase family, metal binding domain, and a polyprenyl synthetase domain in the MycoCosm genome portal.Within the fungal clades, the NADH dehydrogrenase (ubiquinone) complex clade represented a highly supported clade clustering bacterial, Mucoromycota and Zoopagomycota TCs. Annotations associated with gene models found in this clade indicated that this clade comprised TC associated to NADH dehydrogrenase (ubiquinone) complex.The terminal clades of TC clusters include the Bacteria II + Basidiobolus clade that included bacterial TC and six TC SM core genes predicted from Basidiobolus (two from each genome), and the clade comprising TC core genes solely from Dikarya and one gene from Mortierella verticillata NRRL 6337 and Rhizopus microsporus ATCC11559 v1.0 (Figures 5 and Figure S7).
Signatures for HGT in Basidiobolus genomes
The identity search for genes with evidence for HGT identified 934 genes in B. meristosporus CBS 931.73, 620 genes of B. meristosporus B9252, and 382 genes of and B. heterosporus B8920 with zero BLASTP hits to fungal proteins. These genes were used for the coverage assay to identify gene model coverage deviation from the harboring scaffold median coverage (Figure S8). This assay resulted in 810 genes of B. meristosporus CBS 931.73, 532 genes of B. meristosporus B9252, and 301 genes of and B. heterosporus B8920 with z-scores under 2 standard deviations from the harboring scaffold median coverage. Finally, 4 HGT genes from B. meristosporus CBS 931.73, 106 genes from B. meristosporus B9252, and 89 genes from B. heterosporus B8920 were not located in scaffolds surrounded by genes with best fungal hits. All other HGT genes were located in scaffolds surrounded by fungal genes: 806 from from B. meristosporus CBS 931.73, 426 genes from, B. meristosporus B9252, and 212 genes of of and B. heterosporus B8920. These candidate genes with evidence for HGT represented 5%, 3% and 2% of the gene content of B. meristosporus CBS 931.73, B. meristosporus B9252, and B. heterosporus B8920, respectively. The HGT candidates showed significant differences in GC content (F-value = 6.395, df = 2, p-value = 0.00167) when contrasted to all genes considered of fungal origin (Mean fungal GC content = 0.4865, Mean HGT GC content = 0.4911, Tukey HSD p-value = 0.001), but no differences were detected in GC content between HGT candidates and fungal genes that co-occur on common scaffolds (Mean surrounding gene GC content = 0.488, Tukey HSD p-value = 0.30). Similarly, no differences in codon usage, or in 5-mer nucleotide composition (Figures S10 and S11, respectively), were found between HGT candidates and genes of fungal origin. Differences were found, however, in intron number and normalized intron length between genes HGT candidate genes and fungal genes, where the distribution of intron number showed a median of 0 introns for HGT candidates and 2 for fungal genes (Kruskal-Wallis test; χ2 = 272.88, df = 1, p-value < 2.2e-16; Figure S9). The majority of HGT candidate genes (61% of the genes) have no introns, compared to 36% of genes from fungal origin with no introns (Figure S9). The candidate HGT genes have a significantly smaller normalized intron length than the genes with a fungal origin (Kruskal-wallis test; χ2 = 272.88, df = 1, p-value < 2.2e-16).A large percentage of SM core genes appeared to be the product of HGT from bacterial species into Basidiobolus. The SM core genes identified as candidate HGT genes is 61%, 27% and 30% for B. meristosporus CBS 931.73, B. meristosporus B9252, and B. heterosporus B8920, respectively (Table 2). NRPS/NRPS-like SM core genes represent the largest percentage of HGT evidence, while TC have the lowest percentage of HGT evidence (Table 2, Table 3). The identification of taxonomic sources for HGT into Basidiobolus indicated that most HGT comes from bacteria in the phylum Proteobacteria with 232, 135 and 69 gene models in B. meristosporus CBS 931.73, B. meristosporus B9252, and B. heterosporus B8920, respectively. Firmicutes (112, 68, and 35 gene models, respectively) and Actinobacteria/High Gram C+ (127, 53, and 25 gene models, respectively) (Figure 6, Table S4) were the second and third most abundant source of HGT from bacteria into Basidiobolus.
Table 2
Number of predicted secondary metabolite core genes with evidence for HGT in Basidiobolus genomes. NRPS: Non-ribosomal peptide synthetases, PKS: Polyketide synthases, TC: Terpene cyclases
Isolate
Total SM
NRPS/NRPS-like
PKS/PKS-Like
NRPS-PKS hybrids
TC
B. meristosporus CBS 931.73
26/44 (61%)
22/30 (60%)
2/4 (50%)
0/0 (0%)
2/10 (2%)
B. meristosporus B9252
12/44 (27%)
9/30 (30%)
1/7 (14%)
0/1 (0%)
2/6 (1.4%)
B. heterosporus B8920
7/23 (30%)
7/18 (40%)
0/1 (0%)
0/0 (0%)
0/4 (0%)
Table 3
Summary of secondary metabolite core genes with HGT evidence from Basidiobolus isolates
Isolate
Gene
Z-score of gene coverage
SM class
Taxonomy for best NCBI hit
B. heterosporus
N168_02885
−0.104
NRPS
b-proteobacteria
N168_05934
−0.183
NRPS
b-proteobacteria
N168_07140
0.260
NRPS
b-proteobacteria
N168_06479
0.328
NRPS
b-proteobacteria
N168_00176
0.013
NRPS
cyanobacteria
N168_05966
0.672
NRPS
d-proteobacteria
N168_08580
0.047
NRPS-Like
d-proteobacteria
B. meristosporus B9252
N161.mRNA.1431.1
−0.350
NRPS
cyanobacteria
N161.mRNA.1485.1
−0.258
NRPS
d-proteobacteria
N161.mRNA.4115.1
0.158
NRPS
enterobacteria
N161.mRNA.4324.1
0.213
NRPS
CFB group bacteria
N161.mRNA.8304.1
0.358
NRPS
b-proteobacteria
N161.mRNA.11289.1
0.596
NRPS
cyanobacteria
N161.mRNA.1486.1
0.662
NRPS-Like
firmicutes
N161.mRNA.6846.1
0.760
NRPS-Like
firmicutes
N161.mRNA.8699.1
−0.201
NRPS-Like
firmicutes
N161.mRNA.6146.1
0.999
PKS
high GC Gram+
N161.mRNA.1413.1
0.481
Terpene
CFB group bacteria
N161.mRNA.13969.1
1.099
Terpene
CFB group bacteria
B. meristosporus CBS 931.73
366903
0.043
NRPS
a-proteobacteria
368581
−0.232
NRPS
b-proteobacteria
349800
0.208
NRPS
high GC Gram+
351909
−0.120
NRPS
cyanobacteria
372749
−0.088
NRPS
firmicutes
372991
−0.260
NRPS
b-proteobacteria
373247
0.533
NRPS
firmicutes
373940
−0.142
NRPS
firmicutes
375475
0.522
NRPS
enterobacteria
338397
0.033
NRPS
d-proteobacteria
375580
0.033
NRPS
cyanobacteria
375613
−0.369
NRPS
firmicutes
377413
−0.245
NRPS
g-proteobacteria
387529
−0.157
NRPS
b-proteobacteria
307892
0.404
NRPS
a-proteobacteria
298977
−0.201
NRPS
b-proteobacteria
343011
0.147
NRPS
firmicutes
363930
−0.358
NRPS
firmicutes
300898
−0.165
NRPS-Like
firmicutes
322666
−0.174
NRPS-Like
CFB group bacteria
146993
−0.200
NRPS-Like
cyanobacteria
382467
−0.264
NRPS-Like
a-proteobacteria
340613
−0.136
NRPS-Like
GNS bacteria
237744
−0.001
PKS
bacteria
292783
−0.429
PKS
high GC Gram+
301341
−0.210
Terpene
CFB group bacteria
304520
0.389
Terpene
cyanobacteria
Figure 6
Plausible taxonomic sources of HGT genes. Bar-plot represents the proportion of diversity of HGT candidates for each Basidiobolus genome. Colors represent the taxonomy term for the RefSeq best hit from BLAST. Overall, between 2–5% of the gene models predicted for Basidiobolus species appear to be product of HGT from taxonomic groups of bacteria associated to reptilian and amphibian gut tracts (Proteobacteria, firmicutes, and CFB/bacteroidetes).
Plausible taxonomic sources of HGT genes. Bar-plot represents the proportion of diversity of HGT candidates for each Basidiobolus genome. Colors represent the taxonomy term for the RefSeq best hit from BLAST. Overall, between 2–5% of the gene models predicted for Basidiobolus species appear to be product of HGT from taxonomic groups of bacteria associated to reptilian and amphibian gut tracts (Proteobacteria, firmicutes, and CFB/bacteroidetes).Most HGT candidate genes resulted in no gene ontology (GO) annotation (655 gene models) and/or no InterPro domain annotation (140 gene models, Table S5). The ten top GO terms were oxidation-reduction process (59 gene models), protein binding (34 gene models), phosphorelay sensor kinase activity (32 gene models), N-acetyltransferase activity (28 gene models), catalytic activity (26 gene models), extracellular space (25 gene models), hydrolase activity (24 gene models), oxidoreductase activity (23 gene models), hydrolase activity (24 gene models), hydrolyzing O-glycosyl compounds(23 gene models), and NRPS (18 gene models) (Table S5).
Siderophore activity in Basidiobolus meristosporus
The siderophore activity assay resulted in visible activity for the three replicates for B. meristosporus and Cladosporium sp., and a small halo for one of the replicates of C. thromboides. No visible siderophore activity was detected for the other replicates of C. thromboides or for the empty AY-CAS plates (Figure 7, Figure S12). The analysis of variance for the area of siderophore activity measured showed significant differences across isolates/empty plates (ANOVA, F value = 19.62, P = 0.0004). Post-hoc tests showed no differences between B. meristosporus and Cladosporium sp. siderophore activity (Tukey HSD, P = 0.9801489) nor between C. thromboides and the empty AY-CAS plates (Tukey HSD, P = 0.9396603), but significant differences were found for all other comparisons (Tukey HSD, P < 0.001 for all remaining comparisons).
Figure 7
Siderophore activity of Basidiobolus meristosporus CBS 931.73 in a universal CAS assay using layered AY-CAS plates after 12 days. Bars represent mean siderophore activity measured per strain as the yellow area in AY-CAS plates for three replicates. Error bars represent the standard deviation for each replicate. Cladosporium sp. PE-07 represents the positive control. Conidiobolus thromboides FSU 785 represents a zygomycete with no evidence for siderophore NRPS expansion.
Siderophore activity of Basidiobolus meristosporus CBS 931.73 in a universal CAS assay using layered AY-CAS plates after 12 days. Bars represent mean siderophore activity measured per strain as the yellow area in AY-CAS plates for three replicates. Error bars represent the standard deviation for each replicate. Cladosporium sp. PE-07 represents the positive control. Conidiobolus thromboides FSU 785 represents a zygomycete with no evidence for siderophore NRPS expansion.
Discussion
Secondary, or specialized, metabolite (SM) production is an important element of fungal metabolism. It has resulted in numerous natural products with human health implications, such as mycotoxins in our food supply, and medical applications, including antibiotics, immunosuppressants, and antitumor agents. The majority of the known genetic and chemical diversity of fungal SM has been described from the phyla Ascomycota and Basidiomycota with few SM gene clusters, and limited SM production, reported for fungi in Mucoromycota and Zoopagomycota. This observation has led to the dogma that ‘zygomycete’ species are depauperate of these chemical pathways (Voigt ). Here we report gene clusters involved in SM production in the largest survey to date of Mucoromycota and Zoopagomycota species using genomics approaches and estimate the SM potential of these fungi.Genome sequencing of a total of 69 isolates across diverse lineages of Mucoromycota and Zoopagomycota enabled detailed identification of SM clusters. These results support the hypothesis that zygomycete fungi have a low abundance of secondary metabolism (Figure 1, Table S1) and agree with previous reports (Voigt ). Outliers to this pattern exist, however, and are particularly true of the genus Basidiobolus (Zoopagomycota), which possesses a large number of SM gene clusters predicted for the NRPS, PKS and TC families. This discovery of abundant candidate genes for production of secondary metabolite in Basidiobolus is novel and its presence is most consistent with a signal of horizontal gene transfer from bacteria to fungi, a phenomenon we propose is facilitated by living in the amphibian gut environment.
Distribution and evolution of NRPS genes across Mucoromycota and Zoopagomycota
A deeper examination of BasidiobolusSM gene clusters indicates that this genus surpasses the number of expected NRPS genes for zygomycete species (Figure 1). Most of these SM genes also show evidence for transcription, indicating that the majority of these genes are expressed constitutively under laboratory conditions (Figure 2). Several of these core genes, such as the NRPS gene model 387529, appear to be expressed at a higher rate than several housekeeping genes.A more detailed census of core genes reveals unique patterns of evolution of NRPS genes across Mucoromycota and Zoopagomycota when compared to Dikarya fungi (Figure 3, Table 1). The only NRPS A-domains found throughout the two phyla are members of the AAR clade (with the exception of Rhizopus delemar 99-80 and Piptocephalis cylindrospora RSA 2659). These A-domains are from the genes that encode α-aminoadipate reductases (AAR), an enzyme responsible for the reduction of alpha-aminoadipic acid, which is essential for the lysine biosynthesis pathway and is present in all fungal phyla (Bushley and Turgeon 2010). In contrast, the remaining NRPS genes, and their respective A-domains, show discontinuous and patchy distributions across Mucoromycota and Zoopagomycota (Figure 3).The most pronounced NRPS diversification in Basidiobolus are for genes that are predicted to encode for siderophores, iron chelating metabolites. A-domains for predicted siderophores are distributed throughout four clades including the three clades of major bacterial genes and the fungal SID and SIDE clades (Figure 3, Figure S1). Major bacterial clades (MBC) exclusively comprise bacterial siderophore synthases, such as pyoverdine, yersiniabactin and pyochelin (Bushley and Turgeon 2010), with the exception of the surfactin-like clade. Our results show that genomes of Mortierella and Basidiobolus contain A-domains that are members of the MBC, and that they are the only fungal representatives of these clades. SID clade contains all NRPS associated with siderophore production from Ascomycota and Basidiomycota species. All Basidiobolus isolates contain one NRPS A-domain in this clade, as well as three A-domains from three NRPS gene models of Conidiobolus coronatus NRRL28638. Finally, SIDE, a clade comprising NRPS genes responsible for the production of siderophores in filamentous ascomycetes is expanded in Basidiobolus (Figure 3), which is also the only zygomycete with A-domains clustered in this clade. These findings are consistent with enrichment of both bacterial and fungal siderophores and are suggestive of the importance of iron metabolism in Basidiobolus.The CYCLO clade contains the A-domains for core genes associated with biosynthesis of cyclic peptides, such as beauvericin and cyclosporin (Bushley and Turgeon 2010; Bushley ). Sister to all other CYCLO clade A-domains are the A-domains that comprise the NRPS core gene model 387529 from B. meristosporus (Figure S2). 387529 is expressed at the highest rate of any SM gene under laboratory conditions (Figure 2). It is annotated as a tetra-modular gene model, that includes two N-methylation domains, four adenylation domains, four condensation domains, and a TE domain. When compared to simA, the NRPS responsible for biosynthesis of cyclosporin, structural similarities can be found, such as the presence of the N-methylation domains and the TE terminator domain. Its phylogenetic and structural similarities to simA suggest that 387529 results in the synthesis of a cyclic peptide with methylated amino acid residues. Cyclopentapeptides were recently described from B. meristosporus and were hypothesized to be produced by gene model 387529 (listed as gene model ORX93211.1 in Luo ), but their linkage to an NRPS gene cluster requires genetic confirmation.The surfactin-like clade contains A-domains for bacterial core genes with similarities, but not identical, to the Bacillus subtilissurfactin termination module (srfA-C gene; Peypoux ) including the third, fourth and fifth A-domains of the five A-domain gene model NP_930489.1 of Photorhabdus luminescens gene, two domains of the gene PvdD (AAX16295.1; pyoverdine synthetase) of Pseudomonas aeruginosa, and a single A-domain of the bimodular NRPS dbhF protein of Bacillus subtilis (AAD56240.1) which is involved in the biosynthesis of the siderophore bacillibactin. This surfactin-like clade contains eleven A-domains predicted from Basidiobolus genomes including the gene models 298977, 368581 and 372991 from B. meristosporus CBS 931.73. Gene model 298977 showed no evidence for gene expression, while gene models 368581 and 372991 show high rates of expression (Figure 2). This is the first report of the prediction of a surfactin-like gene in fungi, but surfactant production was recently reported in Mortierella alpina (Baldeweg ). These include malpinins, amphiphilic acetylated hexapeptides that function as natural emulsifiers during lipid secretion, and malpibaldins, hydrophobic cyclopentapeptides. This finding is consistent with the genomic data and reveals that Mortierella, in addition to Basidiobolus, possesses homologs in the surfactin-like clade that are phylogenetically different from B. subtilissurfactins.Surfactins, encoded by the srfA gene cluster in Bacillus subtilis, are functionally active as surfactants, as well as toxins and antibiotics. However, the surfactin genes from B. subtilis (SrfA-AA, SrfA-AB and SrfA-AC) were included in our analysis and clustered in a different clade within the MBC. A-domains from single module NRPS-like protein from B. meristosporus CBS 931.73 (Gene model 146993) and from a NRPS-PKS hybrid A-domain from of B. meristosporus B9252 (Gene model N161_14278) clustered with the B. subtilis SrfA genes. The placement of this NRPS-PKS A-domain is interesting because surfactins are lipopeptides, which contain a hydrophobic fatty acid chain, whose biosynthesis is consistent with an NRPS-PKS hybrid. The A-domains from the remaining Mucoromycota and Zoopagomycota NRPS-PKS hybrids clustered as sister to the original clade of Dikarya NRPS-PKS hybrids, supporting the hypothesis of a single origin of fungal NRPS-PKS hybrids (Bushley and Turgeon 2010).
Mucoromycota and Zoopagomycota lack PKS diversity
Polyketide synthases (PKS) are abundant SM of Ascomycota and Basidiomycota and are involved in antibiotic production, carotenoid biosynthesis and other functional roles. In contrast, literature on PKS diversity for Mucoromycota and Zoopagomycota is limited. Our analyses update the phylogenetic reconstruction reported by Kroken et al. (2013) by adding genomic information from Chytridiomycota, Mucoromycota and Zoopagomycota (Figures S3 and S4). Overall, we report the discovery of two new clades: A clade of non-reducing PKS proteins (clade IV) comprising Neocastimastigomycota and Basidiomycota, and a novel FAS II clade consisting of Neocastimastigomycota, Chytridiomycota and Basidiomycota KS domains. The results of our PKS prediction revealed a number of potential PKS core genes in zygomycete species (Figures S3 and S4), but the results of our phylogenetic reconstruction show that the KS domain of the predicted PKS gene models are fungal fatty acid synthases (FAS). Only Basidiobolus meristosporus genomes possessed KS domains associated with fungal PKS, either reducing or non-reducing, and only B. meristosporus CBS 931.73 possessed a PKS in the Fungal FAS clade.A domain-by-domain presence/absence analysis of PKS genes models (Figures S5 and Figure S6) shows that in addition to AT and KT domains, a third domain (either KR, DH or PP) is found in the majority of fungal PKSs (Figure S5). Conversely, the zygomycete genes in the FAS clade only possess the AT and/or KT domains, which are domains common between FAS and PKS. However, BasidiobolusKS domains are found in fungal reducing PKS clades. Gene models 292783 and 237744 from both isolates of B. meristosporous cluster with fungal reducing PKS II and possess the DH domain (Figure S4), and the expression levels of 292783 is consistent with an actively transcribed gene (Figure 2).
Terpene cyclase-like genes are expanded in zygomycetes
Terpene cyclases (TC) are the most common SM predicted for zygomycete species (Figure 1, Table S1). Phylogenetic reconstruction shows that zygomycete TC core genes are clearly distinct from Ascomycota and Basidiomycota TC, where at least 4 new clades of predominantly zygomycete TC are found (Figure 5, Figure S7). Fungi II TC clade comprises TC from all zygomycete genomes analyzed with the exception of Piptocephalis cylindrospora RSA 2659 single−cell v3.0, Phycomyces blakesleeanus NRRL1555 v2.0, and five genomes of Rhizophagus irregularis, which show no prediction of TC in their genomes. No Dikarya TC genes cluster within this clade. Fungi I, III and IV group TC genes are found almost exclusively in Mucoromycota species, as well as some Dikarya species. TC genes from Fungi I are present only in Mucoromycotina genomes. Functional annotations of TC genes from this clade indicate that these TC genes code for proteins associated with squalene and phytoene synthases and are part of the synthesis of carotenoids. Carotenoids are important compounds for the synthesis pathway of trisporic acid, the main molecule responsible for initiating sexual reproduction in zygomycetes (Burmester ). Finally, Basidiobolus is the only non-Dikarya genus with TC genes clustered within the Bacteria II clade of TC. Both these SM core genes show evidence for expression comparable to housekeeping genes or other SM core genes (Figure 2). The presence of bacterial-like TC genes in Basidiobolus present more evidence on the plasticity of the genome of Basidiobolus and its ability to integrate and possibly express foreign SM associated DNA.
Horizontal gene transfer of SM genes to Basidiobolus
Basidiobolus is a genus with a complex biology, alternating ecologies, and multiple spore types. This complex biology is also reflected at the genomic scale. The sequenced genomes show a larger genome size than other zygomycetes, as well as a higher number of genes than any other Zoopagomycota genomes sequenced to date. Our SM prediction assay is concordant with these patterns in which Basidiobolus has an excess of SM gene clusters when compared to other zygomycetes. Evidence points to HGT as a main driver of SM diversity in Basidiobolus as supported by the phylogenetic reconstructions of NRPS and TC gene clusters with bacterial homologs. Basidiobolus and Mortierella are the only fungi with genes associated with the bacterial clades in each of these SM phylogenetic reconstructions. Moreover, these HGT candidates are integrated into the Basidiobolus genome assembly and do not show evidence of artifactual assembly as evidenced by discontinuous coverage (Figure S8). The most abundant functional group of SM core genes overall are siderophores and their overall functionality is supported by both the RNA expression analyses (Figure 2) and siderophore plate assays (Figure 7).In one stage of the Basidiobolus life cycle, the fungus lives as a gut endosymbiont where it co-occurs with bacteria and other organisms that comprise the gut microbiome. Animal gut environments can facilitate HGT between bacteria and fungi, as previously reported for the zoosporic species of Neocallimastigomycetes (Chytridiomycota), which live in the ruminant gut environment and whose genomes exhibit a 2–3.5% frequency of genes with HGT evidence (Wang ; Murphy ). Phylogenomic analyses of the Basidiobolus NRPS A-domains support a phylogenetic affinity with A-domains from bacterial taxa and more rarely other fungi, a pattern most consistent with HGT. Our HGT survey comprised an extensive search of reference genomes across the tree of life available in NCBI RefSeq, as well as all of the gene models predicted for the Mucoromyocta and Zoopagomycota genomes. We find that 2–5% of all predicted gene models present in Basidiobolous genomes are consistent with signatures of HGT from bacteria. However, the percentages change from 27 to 61% of predicted SM core gene models with bacterial HGT evidence. It should be noted that the two genomes with the lower percentage of predicted HGT SM genes, B. meristosporus B9252 (27%) and B. heterosporus B8920 (30%), are more fragmented, Illumina-only genomes, which complicates assembly of biosynthetic gene clusters. The SM gene models with bacterial signatures are highly abundant, with NRPS and NRPS-like genes comprising the top 25 ontology categories of HGT genes in B. meristosporus (Table S5). These results are consistent with the life history of Basidiobolus, where the fungus lives in close proximity with other microorganisms associated with the amphibian gut environment.The additional analyses to discover distinct genetic features for HGT candidates showed that the largest differences found are in intron number and intron length, but not in nucleotide composition or by codon usage. A significantly smaller number of introns and smaller normalized intron length in HGT candidates provide more support to the HGT hypothesis, where we expected that bacterially transferred genes would maintain a smaller number and length of introns. Intronic expansion of transferred genes into fungal species after an HGT event appears to be rapid in order to reflect the genetic makeup across the genome (Da Lage ) and can explain the introns in some of the HGT candidates. However, up to 60% of the HGT candidates still maintain absence of introns as expected for genes of bacterial origin. Finally, the nucleotide composition of HGT candidates and fungal genes were indistinguishable. Reports show that foreign genes with similar codon usage are more likely to become fixed on the receiving genome (Medrano-Soto ; Amorós-Moya ; Tuller 2011). We interpreted these results to indicate that horizontally transferred genes are evolving toward a similar nucleotide composition of the fungal genome based on the 5-mer/codon usage assay, but still maintain high protein similarity to and group with donor lineage copy in phylogenetic reconstructions.The taxonomic survey of our HGT analysis shows that a diverse array of bacteria may have consistently contributed genetic information into the Basidiobolus genomes (Figure 6 and Table S4). The most abundant bacterial taxonomic groups associated with HGT are the Proteobacteria, Firmicutes, Actinobacteria/high GC gram positive bacteria, and Bacteroidetes (Figure 6, Table S4). The proportion of HGT for each taxonomic group appears to be consistent among the three Basidiobolus genomes, and there are consistencies between the most common taxonomic groups responsible for HGT in Basidiobolus and the reported composition of bacteria associated with the gut microbiome in amphibians (Bletz ; Kohl ) and reptiles (Colston and Jackson 2016; Costello ).
Conclusions
Our results confirm that the majority of zygomycete fungi classified in Mucoromycota and Zoopagomycota do not possess a large genomic potential for secondary metabolism. Significant departures from this pattern exist, however, as exemplified by Basidiobolus, a genus with a complex genomic evolution and potential for considerable and diverse secondary metabolite production. First, it possesses larger than average genome with less than 8% content of repetitive regions, but a genetic plasticity to integrate and express extrinsic DNA. Second, the incorporation of extrinsic DNA is consistent with selection for increased SM production, especially gene models that are related to the capture of resources available in anaerobic conditions (iron chelation by siderophores) and metabolites that may play roles in antibiosis (surfactin-like genes) or host interaction. Third, the amphibian gut environment predisposes Basidiobolus to the acquisition of these SM core genes via HGT from co-inhabiting bacterial species. More information is needed to further test these hypotheses, including sequencing of additional Basidiobolus species with long read technologies; more accurate characterization of amphibian microbiomes that test positive for Basidiobolus; and Liquid Chromatography – Tandem Mass Spectrometry (LC – MSMS) characterization of the Basidiobolus metabolome.
Authors: Scott Kroken; N Louise Glass; John W Taylor; O C Yoder; B Gillian Turgeon Journal: Proc Natl Acad Sci U S A Date: 2003-12-15 Impact factor: 11.205
Authors: Elena Martino; Emmanuelle Morin; Gwen-Aëlle Grelet; Alan Kuo; Annegret Kohler; Stefania Daghino; Kerrie W Barry; Nicolas Cichocki; Alicia Clum; Rhyan B Dockter; Matthieu Hainaut; Rita C Kuo; Kurt LaButti; Björn D Lindahl; Erika A Lindquist; Anna Lipzen; Hassine-Radhouane Khouja; Jon Magnuson; Claude Murat; Robin A Ohm; Steven W Singer; Joseph W Spatafora; Mei Wang; Claire Veneault-Fourrey; Bernard Henrissat; Igor V Grigoriev; Francis M Martin; Silvia Perotto Journal: New Phytol Date: 2018-01-07 Impact factor: 10.151
Authors: Olga A Lastovetsky; Maria L Gaspar; Stephen J Mondo; Kurt M LaButti; Laura Sandor; Igor V Grigoriev; Susan A Henry; Teresa E Pawlowska Journal: Proc Natl Acad Sci U S A Date: 2016-12-12 Impact factor: 11.205
Authors: Antonis Rokas; Matthew E Mead; Jacob L Steenwyk; Huzefa A Raja; Nicholas H Oberlies Journal: Nat Prod Rep Date: 2020-01-03 Impact factor: 13.423
Authors: Molly C Bletz; Daniel J Goedbloed; Eugenia Sanchez; Timm Reinhardt; Christoph C Tebbe; Sabin Bhuju; Robert Geffers; Michael Jarek; Miguel Vences; Sebastian Steinfartz Journal: Nat Commun Date: 2016-12-15 Impact factor: 14.919
Authors: Diego A Martinez; Brian G Oliver; Yvonne Gräser; Jonathan M Goldberg; Wenjun Li; Nilce M Martinez-Rossi; Michel Monod; Ekaterina Shelest; Richard C Barton; Elizabeth Birch; Axel A Brakhage; Zehua Chen; Sarah J Gurr; David Heiman; Joseph Heitman; Idit Kosti; Antonio Rossi; Sakina Saif; Marketa Samalova; Charles W Saunders; Terrance Shea; Richard C Summerbell; Jun Xu; Sarah Young; Qiandong Zeng; Bruce W Birren; Christina A Cuomo; Theodore C White Journal: MBio Date: 2012-09-04 Impact factor: 7.867
Authors: Laura H Okagaki; Cristiano C Nunes; Joshua Sailsbery; Brent Clay; Doug Brown; Titus John; Yeonyee Oh; Nelson Young; Michael Fitzgerald; Brian J Haas; Qiandong Zeng; Sarah Young; Xian Adiconis; Lin Fan; Joshua Z Levin; Thomas K Mitchell; Patricia A Okubara; Mark L Farman; Linda M Kohn; Bruce Birren; Li-Jun Ma; Ralph A Dean Journal: G3 (Bethesda) Date: 2015-09-28 Impact factor: 3.154
Authors: Jacob M Wurlitzer; Aleksa Stanišić; Ina Wasmuth; Sandra Jungmann; Dagmar Fischer; Hajo Kries; Markus Gressler Journal: Appl Environ Microbiol Date: 2021-01-15 Impact factor: 4.792
Authors: Yuanning Li; Jacob L Steenwyk; Ying Chang; Yan Wang; Timothy Y James; Jason E Stajich; Joseph W Spatafora; Marizeth Groenewald; Casey W Dunn; Chris Todd Hittinger; Xing-Xing Shen; Antonis Rokas Journal: Curr Biol Date: 2021-02-18 Impact factor: 10.834
Authors: Jacob M Wurlitzer; Aleksa Stanišić; Sebastian Ziethe; Paul M Jordan; Kerstin Günther; Oliver Werz; Hajo Kries; Markus Gressler Journal: Chem Sci Date: 2022-07-15 Impact factor: 9.969