Literature DB >> 34822583

Mining Indole Alkaloid Synthesis Gene Clusters from Genomes of 53 Claviceps Strains Revealed Redundant Gene Copies and an Approximate Evolutionary Hourglass Model.

Miao Liu1, Wendy Findlay1, Jeremy Dettman1, Stephen A Wyka2, Kirk Broders3, Parivash Shoukouhi1, Kasia Dadej1, Miroslav Kolařík4, Arpeace Basnyat1, Jim G Menzies5.   

Abstract

Ergot fungi (Claviceps spp.) are infamous for producing sclerotia containing a wide spectrum of ergot alkaloids (EA) toxic to humans and animals, making them nefarious villains in the agricultural and food industries, but also treasures for pharmaceuticals. In addition to three classes of EAs, several species also produce paspaline-derived indole diterpenes (IDT) that cause ataxia and staggers in livestock. Furthermore, two other types of alkaloids, i.e., loline (LOL) and peramine (PER), found in Epichloë spp., close relatives of Claviceps, have shown beneficial effects on host plants without evidence of toxicity to mammals. The gene clusters associated with the production of these alkaloids are known. We examined genomes of 53 strains of 19 Claviceps spp. to screen for these genes, aiming to understand the evolutionary patterns of these genes across the genus through phylogenetic and DNA polymorphism analyses. Our results showed (1) varied numbers of eas genes in C. sect. Claviceps and sect. Pusillae, none in sect. Citrinae, six idt/ltm genes in sect. Claviceps (except four in C. cyperi), zero to one partial (idtG) in sect. Pusillae, and four in sect. Citrinae, (2) two to three copies of dmaW, easE, easF, idt/ltmB, itd/ltmQ in sect. Claviceps, (3) frequent gene gains and losses, and (4) an evolutionary hourglass pattern in the intra-specific eas gene diversity and divergence in C. purpurea.

Entities:  

Keywords:  ergot alkaloids; ergot fungi; gene divergence; gene diversity; indole diterpenes; phylogeny; secondary metabolites

Mesh:

Substances:

Year:  2021        PMID: 34822583      PMCID: PMC8625505          DOI: 10.3390/toxins13110799

Source DB:  PubMed          Journal:  Toxins (Basel)        ISSN: 2072-6651            Impact factor:   4.546


1. Introduction

Fungi in the genus Claviceps (Clavicipitaceae, Hypocreales, Sordariomycetes) infect the florets of cereal crops, nonagricultural grasses (Poaceae), sedges (Cyperaceae), and rushes (Juncaceae) [1], followed by occupying the unfertilized ovaries and eventually replacing the seeds with fungal resting bodies, i.e., sclerotia, known as ergots [2]. In light of molecular phylogenetics, 63 named species [3,4] are classified into four sections, i.e., Claviceps sect. Claviceps, C. sect. Citrinae, C. sect. Paspalorum, and C. sect. Pusillae, on the basis of morphological, ecological, and alkaloid-producing features [3]. Ergot bodies or sclerotia contain a wide spectrum of alkaloids toxic to humans and animals, making them unwelcome pathogens in agricultural and food production [5,6], but also important resources for pharmaceuticals [7,8]. Among the alkaloids produced by Claviceps, ergot alkaloids (EAs) are the major culprit for the mass food/feed poisoning in human and livestock, as well as a number of tragedies in human history [9,10]. EAs are indole compounds characterized by a tricyclic or tetracyclic ring system. Over 80 different EAs found in nature fall into three structural groups: clavines, lysergic acid amides, and ergopeptides [8,11], corresponding to their structural complexity. Clavines are the intermediates or derivatives of the intermediates in the lysergic acid amide pathway, whereas ergopeptines are the most complex group [11]. Intensive investigations on biochemistry and molecular genetics have elucidated the EA biosynthetic pathways in EA producers especially Claviceps spp. [12,13]. A cluster of 12 functioning EA synthesis (eas) genes (cloA, dmaW, easA, easC–H, lpsA–C) in C. purpurea strain 20.1 were considered to encode all the enzymes needed for the end-product ergotamine and ergocryptine [14]. The four early steps, requiring dmaW, easF, easC, and easE, are responsible for the closure of the third ring resulting in chanoclavine, followed by middle steps, requiring easD, easA, easG, and cloA, for forming tetracyclic clavines, and later steps for producing the lysergic acid amides, dihydroergot alkaloids, and complex ergot peptines [13] (Figure 1). Among the 12 genes, the homologs of nine were found in C. fusiformis in a cluster. In C. paspali, two additional genes (easP and easO) were found; however, easE was defective. The presence or absence of eas genes has proven to be correlated with EA profiles in several Claviceps spp. and strains [13,14]. However, the investigation of eas gene clusters in a wide range of Claviceps spp. is lacking, and less is reported about the evolution of the individual gene in these clusters among and within the species.
Figure 1

The ergot alkaloid biosynthetic pathway in Claviceps spp. Modified from Young et al. [13] and Robinson and Panaccione [15].

Indole diterpenes (IDTs) are another large group of bioactive compounds with diverse structural variations, triggering toxicity in animals and insects through interfering with ion channels [16,17]. In the literature, there are copious reports that certain species in Claviceps (i.e., C. paspali and C. cynodontis) and close relatives in Epichloë (Neotyphodium as the asexual name before implementation of the International Code of Nomenclature for algae, fungi, and plants (Shenzhen Code) [18]) produce the paspaline-derived IDTs, such as paspalitrem, lolitrem, and paxilline, causing ataxia and staggers in livestock that feed on the grasses infected by those fungal species [19,20,21]. Biosynthetic pathways and associated gene clusters of these paspaline-derived IDTs have been investigated [22,23,24], resulting in the discovery of at least 10 genes involved in IDT production in Epichloë spp. and the prediction that ltmG, M, C, and B were responsible for the synthesis of paspaline, the basic structural backbone of IDTs, whereas ltmP and Q were essential for the production of lolitrem and ltmF, J, K, and E, which are required for more complex structures [25,26]. The proposed scheme for the biosynthesis of paspalitrem in C. paspali involved seven genes including the initial formation of paspaline through ltmG, M, C, and B, followed by the sequential functioning steps of ltmP, Q, and F [22]. Recently, the pre-paspaline steps were further resolved as three sequential steps: starting from ltmG converting farnesyl diphosphate (FPP) to Geranylgeranyl diphosphate (GGPP), followed by ltmC transferring GGPP to 3-geranylgeranylindole, and finally through ltmM and B yielding paspaline [27]. In addition to C. paspali and C. cynodontis, other Claviceps spp., i.e., C. arundinis, C. humidiphila, and C. purpurea, could also produce indole diterpenes or paspaline-like compounds [28,29,30]. The genome investigation of C. purpurea 20.1 revealed the presence of ltmM, C, B, P, Q, and an extra gene ltmS [14]. It is not known whether these genes are consistently present in various strains of C. purpurea and other Claviceps species. In addition, two other classes of alkaloids, i.e., lolines (LOL) and peramines (PER), produced by Epichloë spp., are known to function as insecticides, but are not associated with any toxicity symptoms in grazing mammals [31,32]. Given the close relationship between Epichloë and Claviceps, it is reasonable to raise the question of whether any of the loline or peramine gene homologs are present in any of the Claviceps spp. even though those two classes of alkaloids have not been reported in Claviceps. The ‘hourglass model’ borrowed from ontogeny refers to the pattern that the morphological divergence of mid-development stages of an embryo are more conserved compared with earlier and later stages, resembling an hourglass with a narrow waist, but broad ends [33,34]. Before the hourglass model (HGM) was proposed in the 1990s, the early conservation model (ECM) was widely accepted, which echoed von Baer’s third law [35], i.e., embryos progressively diverge in morphology during ontogeny. The debates about these two, along with other models, i.e., adaptive penetrance model [36] and unconstrained model (random) [37], are still ongoing, although recent evidence at molecular and genomic levels has provided support for the presence of the phylotypic stage (the waist stage of development) in fungi, insects, plants, and vertebrates [38,39,40]. According to Haeckel’s biogenetic hypothesis, ontogeny recapitulates phylogeny [41]. The evident similarities between the development of an individual and the evolution of the whole biological system have been addressed by many generations [42] to verify that these models in ontogeny are recapitulated in other evolutionary processes. For example, studies on gene evolution in Drosophila spp. recaptured the hourglass model in that the early maternal genes showed a higher level of diversity than zygotic genes [43]. Here, we propose the biosynthesis of complex biological compounds as an analogy of the development of an organism, and ask whether any of the models fit to the evolution of the genes involved in the biosynthesis. The objectives of the present study were to shed light on the presence of four classes of alkaloid genes (clusters) in 53 strains of 19 Claviceps species, and to understand the evolutionary patterns of these genes at inter- and intraspecific levels. This information helps build the foundation for future studies on chemo- and genotype associations and for developing gene-based chemotyping and toxin detection.

2. Results

2.1. Genome Assemblies

The 37 genome sequences assembled in this study resulted in 1362 to 2581 contigs, N50 values ranged from 19,946 to 55,909 bp, and the completeness measured by Benchmarking Universal Single-Copy Orthologs (BUSCO) over the fungal database (fungi odb10) ranged from 97% to 99.1% (Table 1, available in GenBank https://www.ncbi.nlm.nih.gov/ (accessed on 9 November 2021) as accessions JAIURI000000000–JAIUSS000000000 available upon publication of the article). The quality of the assemblies was equivalent to the assemblies of 17 genomes from previous studies (Table 1) [44,45]. Overall, the 54 assemblies of 53 strains (two versions of assemblies for CCC1102 were included because certain genes were obtained from one or the other assemblies) of 19 Claviceps species included in this study belong to three sections: C. sect. Citrinae, C. sect. Claviceps, and C. sect. Pusillae, from six continents (Africa, Asia, Australia, Europe, North America, and South America) and on host plants in 26 genera (Table S1).
Table 1

Statistics of genome assemblies screened.

SpeciesStrainBioSampleWGS #ContigsTotal Length (bp)Largest Contig (bp)N50 (bp)L50GC (%)Coverage (x)Complete BUSCO’s (%)
C. arundinis CCC1102SAMN11159893JAIUSP000000000140629,878,863375,53355,90915651.4261x98.1
C. capensis CCC1504SAMN11159898JAIUSL000000000149727,462,555202,63739,75819851.6966x98.8
C. cyperi CCC1219SAMN11159895JAIUSO000000000246726,149,012130,03219,94638651.7256x98.3
C. monticola CCC1483SAMN11159896JAIUSN000000000178727,131,110129,90529,63927951.658x98.8
C. occidentalis LM77SAMN11159879JAIURJ000000000228528,557,246118,74621,55641051.3758x97.8
C. occidentalis LM84SAMN11159876JAIURI000000000211928,639,296133,79723,64138951.39164x97.0
C. pazoutovae CCC1485SAMN11159897JAIUSM000000000161927,544,752151,19635,47722951.761x98.3
C. purpurea s.s.Clav04SAMN11159846JAIUSJ000000000258130,594,081349,53329,37829651.6946x98.8
C. purpurea s.s.Clav26SAMN11159847JAIUSI000000000182130,253,558299,36836,36924251.4859x98.8
C. purpurea s.s.Clav46SAMN11159848JAIUSG000000000188730,292,940231,31436,58224651.0858x99.1
C. purpurea s.s.Clav52SAMN11159849JAIUSE000000000171429,291,845175,16535,95625051.4260x98.9
C. purpurea s.s.Clav55SAMN11159850JAIUSD000000000202330,195,775203,52333,46126151.5559x98.4
C. purpurea s.s.LM14SAMN11159853JAIUSC000000000188830,259,282163,53232,81226851.7449x97.9
C. purpurea s.s.LM207SAMN11159861JAIUSB000000000191030,165,540260,84731,42827351.7453x98.7
C. purpurea s.s.LM223SAMN11159862JAIURY000000000189430,223,423195,66131,69329151.7374x98.4
C. purpurea s.s.LM232SAMN11159863JAIURX000000000191130,304,653216,99633,37626551.7353x98.8
C. purpurea s.s.LM233SAMN11159864JAIURW000000000192830,249,987183,37833,02327351.7449x98.3
C. purpurea s.s.LM30SAMN11159855JAIURV000000000181630,203,936160,35335,00526551.7564x98.4
C. purpurea s.s.LM33SAMN11159856JAIURU000000000201130,162,301157,17628,95430651.7545x97.9
C. purpurea s.s.LM39SAMN11159857JAIURT000000000179730,183,718168,04734,90225851.7581x98.3
C. purpurea s.s.LM4SAMN11159851JAIURS000000000186630,197,808200,83131,05428151.7464x98.3
C. purpurea s.s.LM46SAMN11159858JAIURR000000000184230,109,785205,39932,50327051.7679x98.6
C. purpurea s.s.LM461SAMN11159865JAIURQ000000000204130,157,824190,24728,92830751.7437x97.7
C. purpurea s.s.LM469SAMN11159866JAIURP000000000183630,218,091199,88034,40826951.7475x98.3
C. purpurea s.s.LM470SAMN11159867JAIURO000000000248230,086,038123,01423,23138451.7526x97.9
C. purpurea s.s.LM474SAMN11159868JAIURN000000000191730,149,711232,50430,85528351.7564x98
C. purpurea s.s.LM5SAMN11159852JAIURM000000000181730,171,863188,14434,17427151.7467x98.4
C. purpurea s.s.LM60SAMN11159859JAIURL000000000187130,274,458180,24231,97727551.7381x98.6
C. purpurea s.s.LM63SAMN20436330JAIUSS000000000167430,276,205210,63040,95421851.7968x98.4
C. purpurea s.s.LM65SAMN20436331JAIUSR000000000182230,277,382206,60937,97624151.7871x98.4
C. purpurea s.s.LM71SAMN11159860JAIURK000000000191930,241,564172,99732,32428251.76168x98
C. purpurea s.s.LM72SAMN20436332JAIUSQ000000000198630,160,156282,50636,80524951.8163x98.4
C. quebecensis Clav32SAMN11159882JAIUSH000000000136228,435,427248,88842,25219251.6164x99.0
C. quebecensis Clav50SAMN11159881JAIUSF000000000140428,499,699294,42547,79717851.659x98.8
C. ripicola LM219SAMN11159874JAIUSA000000000184730,428,256154,69034,89825451.3955x97.1
C. ripicola c.f.LM220SAMN11159873JAIURZ000000000166230,409,961205,88143,97121151.4391x97.7
C. spartinae CCC535SAMN11159888JAIUSK000000000201728,974,645142,72328,33230051.3860x98.1
Assemblies from previous studies
C. arundinis CCC1102SAMN11159893SRPS01140629,878,863375,53355,90915651.4261x97.7 *
C. africana CCC489SAMN11159887SRPY01532931,933,80198,04912,22575244.6856x95 *
C. arundinis LM583SAMN08798359QEQZ01161330,055,381164,90439,30622351.4269x96.9
C. citrina CCC265SAMN11159885SRQA01483025,056,89681,802874787147.5764x92.2*
C. digitariae CCC659SAMN11159892SRPT01382131,170,596116,85916,07757245.557x95.9 *
C. humidiphila LM576SAMN08798355QERB01183130,488,243190,08534,78726151.5177x97.9
C. lovelessii CCC647SAMN11159891SRPU01820134,575,81365,4395747178143.6153x91.6 *
C. maximensis CCC398SAMN11159886SRPZ01231729,114,417192,85137,10123046.6658x98.3 *
C. occidentalis LM78SAMN08800200QEQY01232128,571,683125,45921,41642251.3764x97.3
C. perihumidiphila LM81SAMN08800226QEQX01142330,694,913232,02946,52619251.5140x96.9
C. purpurea s.s.LM28SAMN08797627QERD01193030,251,797260,84231,81527451.7449x97.9
C. purpurea s.s.LM582SAMN08798357QERA01220730,199,509132,07227,19933451.7489x98.6
C. pusilla CCC602SAMN11159889SRPW01917137,319,48483,5555659191741.8452x90.9 *
C. quebecensis LM458SAMN08851611QEQW01170035,882,5931,850,35141,78419151.8778x98.0
C. ripicola LM454SAMN08798353QERC01210830,692,668189,16228,58731451.37156x97.9
C. ripicola LM218SAMN08798202QERE01163030,598,250206,72339,76323251.4146x97.6
C. sorghi CCC632SAMN11159890SRPV01720631,897,900112,2966643138945.2460x89.9 *

* BUSCO completeness for these strains was based on the Dikaryon fungal database; see Wyka et al. [44] for details.

2.2. Presence of Four Classes of Alkaloid Genes in 53 Genomes

One thousand sequences of 19 loci were extracted from the 53 genome assemblies as detailed below. The DNA sequences of each genes were submitted to Genbank associated with accession numbers: cloA (49 sequences) MZ882098–MZ882146, dmaW (118 sequences) MZ871640–MZ871757, easA (51 sequences) MZ851397–MZ851447, easC (50 sequences) MZ851807–MZ851856, easD (51 sequences) MZ871767–MZ871817, easE (66 sequences) MZ877968–MZ878033, easF (88 sequences) MZ881959–MZ882046, easG (50 sequences) MZ882047–MZ882096, easH1 (50 sequences) MZ934760–MZ934809, easH2 (32 sequences) MZ934810–MZ934841, lpsB (48 sequences) MZ934842–MZ934889, lpsC (44 sequences) MZ934890–MZ934933, idt/ltmB (55 sequences) MZ935033–MZ935087, idt/ltmC (47 sequences) MZ935088–MZ935134, idt/ltmG (three sequences) MZ935227–MZ935229, idt/ltmM (47 sequences) MZ935135–MZ935181, idt/ltmP (46 sequences) MZ934987–MZ935032, idt/ltmQ (60 sequences) MZ934934–MZ934986, and idt/ltmS (45 sequences) MZ935182–MZ935226.

2.2.1. Ergot Alkaloid Genes (eas)

More consistency in terms of presence/absence of eas genes was observed in C. sect. Claviceps than C. sect. Pusillae. The results from BLASTn searches using in-house script (see Section 4.2 for details) and Geneious mapping (https://www.geneious.com, accessed on 9 November 2021) with reference genes showed the genomes of all isolates from C. sect. Claviceps contained at least 10 eas genes matching the C. purpurea 20.1 reference sequences (Table 2; lpsA1 and lpsA2 were excluded from analyses as they were heavily fragmented due to the significantly long length. A study on long-read sequencing of several selected strains by Hicks et al. was focused on these two genes (in this Special Issue). The 10–12 genes were assembled on two to three contigs. For most strains, nine genes (lpsC, easA, lpsB, cloA, and easC-G) were on the same contig. Genes after dmaW, i.e., easH1, easH2, and fragments of lpsA, were on different contigs. The easH2 gene was either not detected or on a separate contig possibly due to the long length of lpsA1 because it was located between lpsA1 and A2 in the reference genome C. purpurea 20.1. The exceptions were C. humidiphila LM576, C. spartinae CCC535, C. purpurea LM461, and C. ripicola LM220 and LM454, in which lpsC was on a different contig, or lpsC along with the next three to four genes were on the same contig separated from other genes (Table 2).
Table 2

The eas gene copies and their locations in 18 species in Claviceps sect. Claviceps and sect. Pusillae.

SectionOrganismAsbl *Sample lpsC easA lpsB cloA easC easD easE easF easG dmaW easH
E1E2F1F2F3 W1W2W3H1H2
Claviceps C. arundinis WFCCC1102418/411411411411411411411 411273 411411/583273707583
SWCCC1102 157157157157157157 15773 15715773305157458
BWLM583 455455455455455455 455187 455/805805187 805822
C. capensis WFCCC1504173173173173173173173 173 1731354 347
C. cyperi WFCCC1219277277277277277277277 277 277277/20372094/696 525
C. humidiphila BWLM576599259259259259259259 259390 259259390 259701
C. monticola WFCCC1483367367367367367367367986367986 367367/1745986/966 494
C. occidentalis WFLM77202202202202202202202 202 2021262702 18871693
BWLM78192192192192192192192 192 1921273722 18711675
WFLM84290290290290290290290 290/1715 17151340721 17791618
C. pazoutovae WFCCC1485307307307307307307307 307767 307307/1479767/622 430
C. perihumidiphila BWLM81 114114114114114114 114 114/604604359 604710
C. purpruea WFClav52131131131131131131131739131/1395739 13951395/923879 923835
WFClav04717171717171711407711514/2293 7171/9932459/594 993874
WFClav26105105105105105105105 105816/1596 105105/14531596 759836
WFClav462012012012012012012011255201/13581255/1668 13581358/9281481/1294 9281043
WFClav554194194194194194194191226419/14161226/1797 14161416/13161797/1399 13161168
WFLM1485858585858585 85 8585/6991391/539 699716
WFLM207374374374374374374374 374 374374/11691625/1506 11691183
WFLM223444444444444444444444 444/15561027/1718 15561556/7831460 783762
WFLM232112112112112112112112 112 1121121843 1148908
WFLM23389898989898989 89 89/658658533 658692
BWLM28126126126126126126126 126 1261261874/563 1261208
WFLM3088888888888888 88 88/10221022610 10221180
WFLM33106106106106106106106 106 106106/13431781/649 13431210
WFLM39100100100100100100100855100/1391855/1766134013911391/7531340/546 753703
WFLM4152152152152152152152 152 152152/15701797 857726
WFLM46209209209209209209209 209 2092091386 2091025
WFLM461121121121121121121/14431443100614431006164310061006/15051846/1490 15051402
WFLM4699191919191919198091/1419980/1609172714191419/10181727/419 1018680
WFLM4702942942942942942942942080294/14282080/2165 142814282358 949736
WFLM474158158158158158158158829158/1493829/1365 14931493/7881782/41 788719
WFLM5121121121121121121121 121 121121/12171546/679 12171198
BWLM58298989898989898 98 98981036 800
WFLM60333333333333333333333 333 333/17911790/12661480/676 12661242
WFLM71159159159159159159159 159 655655931 655727
WFLM6316016016016016016016011781601178985621/160621985 621918
WFLM6521212121212121 2110182542121254 21
WFLM7219019019019019019019011561901156 55912 5
C. quebecensis WFClav32308308308308308308308753308/201753/730 201201730 201
WFClav50227227227227227227227731227/231731/679 231231679 231
BWLM45816516516516516516516514461651446/1061 4984981061 498
C. ripicola BWLM21881818181818181 81 8181527 81820
WFLM21978787878787878 78 7878533 78
WFLM220777777588588588588 588 58858895 588285
BWLM454 120120120623623623 623 623623107 623
C. spartinae WFCCC5351156/13758538538536806806802768027 68068027 680
Pusillae C. africana SWCCC489 424 424424424 424 424424 424/1076
C. digitariae SWCCC659 403 403
C. lovelessii SWCCC647 1632163228852885/4933556/4933556 556 556556 143
C. maximensis WFCCC398
C. pusilla SWCCC602 3688143536883968/389139683968/3180 31801918 1809
C. sorghi SWCCC632 2692 24752475186186 186 186186 3370

* The assembly versions: BW was from Wingfield et al. [45], SW was from Wyka et al. [44], and WF was generated in the present study; values in the cells denote contig numbers; the 2nd contig number was led by a/when the fragment is on two contigs; green color represents full-length genes, light orange represents partial or gapped sequences, and no fill represents no gene matches; hatches denote fragments containing frameshifts or internal stop codons. None of the genes were detected in C. citrina (C. sect. Citrinae, not listed).

Both inter- and intraspecific variation was observed, regardless of the general consistency of presence of eas genes. Species-specific features included all three strains of C. occidentalis have two partial copies of dmaW (~658 bp, ~641 bp composed of a partial exon 1 and full-length exon 2 and 3) and a single copy of all other eas genes except easH. Of a relevant note, all partial genes detected in the present study were located at the end of contigs. Moreover, all three strains of C. quebecensis had a second partial nonfunctioning copy of easE (275, 275, and 1208 bp) and two partial copies of easF with good open reading frames (ORFs), and they were lacking easH2 (Table 2). Intraspecific variation among the 27 strains of C. purpurea was evident as most strains contained one copy of lpsC, easA, lpsB, cloA, easC, D, G, H1, and H2, and two copies of dmaW. However, three strains (LM65, LM72, and LM582) lacked easH2. Eleven strains had a second copy of easE (easE2), six full- or near-full-length and five partial, but these gene fragments contained indels of various sizes and internal stop codons (Table 2). This would indicate that they may not be functional genes unless those variations were caused by sequencing or assembly errors. In contrast, the second copy of full-length easF (easF2) from LM72 (MZ881984) and LM461 (MZ881981) had good ORFs. The easF2 gene of the other six strains was split on two contigs with gaps in the middle. Most of these fragments, except the second exon at the 3′ end of Clav26, Clav55, and LM470, were free of internal stop codons. Four strains had a full length (or close to full length), and one strain (LM469 652 bp) had a partial third copy, easF3, yet these gene fragments had a number of indels and internal stop codons (Table 2). The intraspecific variations were also found in C. arundinis and C. ripicola (Table 2). The six genomes from C. sect. Pusillae had more variable numbers of the eas genes observed, but all six genomes lacked lpsC and easH2 (Table 2). The strain C. lovelessii CCC647 had the highest number of matches, i.e., 10 full- or near-full-length matches (cloA 1788 bp, easD had an 8 bp short gap at split region), while all but easH1 and lpsB had good ORFs. In contrast, C. digitariae CCC659 had only two gene matches: dmaW and easA, but both were full-length with good ORFs. C. maximensis CCC398 and C. citrina CCC265 (C. sect. Citrinae) had no matches for any eas genes (Table 2). Examining each eas genes, easA was present the most consistently in 51 of 53 genomes as a single copy and had good ORFs, except for the one in C. pusilla CCC602 which had an internal stop codon. Similarly, lpsB, cloA, easC, easD, and easG were present as a single copy in all species of sect. Claviceps and two to four species in sect. Pusillae (Table 2). For easE, all species in sect. Claviceps contained at least one copy, six strains of C. purpurea (LM39, LM63, LM72, LM461, LM469, and LM474)) had a full length second copy (easE2), and the other five strains C. purpurea (Clav04, Clav46, Clav52, Clav55, and LM470), all three C. quebecensis, one C. spartina, and one C. monticola had a second partial copy. Compared with the C. purpurea 20.1 easE1 reference sequence, all the easE2 sequences contained a large number of deletions (gaps) of various sizes in exon and intron regions, internal stop codons, and no start codon, indicating that they are likely not functional. For species in sect. Pusillae, one copy of easE was found in four species with good ORFs (C. africana CCC489, C. lovelessii CCC647, C. pusilla CCC602, and C. sorghi CCC632). For easF, all species in sect. Claviceps contained at least one copy; however, two strains of C. purpurea (Clav55 and LM470) had internal stop codons near the 3′ end. Twenty-three strains of seven species (C. arundinis, C. humidiphila, C. monticola, C. pazoutovae, C. purpurea, C. quebecensis, and C. spartinae) had a second full-length or partial copy, among which 19 strains had good ORFs. In addition, a third copy was found in some C. purpurea strains in full length (LM39, LM63, and LM65) or partial (LM461 and LM469). Even though with 77–93% similarity to C. purpurea 20.1 easF1, none of the third copies had a correct open reading frame (not functional) (Table 2). Three species in sect. Pusillae (C. africana, C. lovelessii, and C. sorgji) had one functioning copy. For dmaW, most species (strains) in sect. Claviceps contained two full-length copies or copies split on two contigs with gaps. Six strains of C. purpurea (Clav26, Clav52, LM223, LM232, LM4, and LM470) had a partial second copy, but all three strains of C. occidentalis had partial sequences (~650 bp) for both copies. One strain of C. arundinis (CCC1102) had a third copy in full length, with 81% and 83% similarities with dmaW1 and dmaW2, but frameshifts and internal stop codons were present. Five species in sect. Pusillae, except C. maximensis, had one copy. Interestingly, the additional copies of easE, easF, and dmaW were more or less clustered together, such that the second copies of all three genes were present on the same contig in C. monticola CCC1483 and C. spartinae CCC535 (Figure 2A). Alternatively, the easF2 sequence was split on two contigs, which were located with easE2 on one contig and dmaW2 on the other, i.e., C. purpurea Clav55, and C. quebecensis Clav32, Clav50, and LM458 (Figure 2B). More commonly, easE2 was on the same contig as easF2, whereas dmaW was on another contig, such as in seven strains of C. purpurea (Clav46, Clav52, LM470, LM474, and LM72; Table 2), or easF2 co-located with dmaW when easE was a single copy (LM583; Figure 2C). In cases when the third copy of easF was present, they were often on the same contig with dmaW2, i.e., C. purpurea LM39, LM63, LM65, and LM469 (Figure 2D). The arrangement in LM461 was more peculiar in that the second copies of easE and easF were on the same contig with dmaW1 and easG (a single-copy gene), which indicates that they may all be on the primary ergot alkaloid gene cluster (Figure 2E). The third dmaW from CCC1102 (from SW assembly) was not connected to other eas genes (Table 2).
Figure 2

The schematic arrangements of multiple copies of easE, easF, and dmaW in relation to the primary cluster of other eas genes. The dark solid bars denote the contigs, while gray boxes represent genes labeled accordingly, with the ranges underneath. The lengths of genes and spaces are in approximate scale. The dashed bars and genes on them indicate that those genes are on the same contig; however, the details are not displayed. (A–E) represent different patterns of locations (see the text Section 2.2.1 for details).

For easH, easH1 was present in 50 genomes, except C. citrina, C. digitariae, and C.maximensis; however, the genes of the four species (CCC489, CCC602, CCC632, and CCC647) in C. sect. Pusillae had numerous indels of various sizes throughout the sequence, causing frameshifts and internal stop codons. Further validation of the sequences is needed to confirm whether these are functioning. The easH2 gene was present in 32 strains of six species (C. arundinis, C. humidiphila, C. occidentalis, C. perihumidiphila, C. purpurea, and C. ripicola). The reference sequence of easH2 from C. purpurea 20.1 was 840 bp, which is about 100 bp shorter than easH1 (945 bp), and it was considered a pseudogene. Our results showed that the 32 easH2 sequences had variable lengths and high levels of nucleotide variation (see more notes in later sections: phylogenies and gene diversity). Most of these sequences appeared not functional; however, the lengths of the sequences from two strains of C. ripicola (LM218 and LM220) were 954 bp and contained full-length ORFs, indicating that they are likely functioning genes. For lpsC, at least one strain per species in sect. Claviceps (except C. perihumidiphila) showed one copy of lpsC, i.e., in total, 43 out of 46 strains contained a single copy of lpsC, among which three strains of C. purpurea, i.e., Clav26, LM4, and LM232, had a single internal stop codon; otherwise, the full range of sequences aligned very well with the reference. It is possible that the single internal stop codon could be a sequencing error. Another five strains/species, including C. capensis CCC1504, C. cyperi CCC1219, C. humidiphila LM576, C. monticola CCC1483, C. purpurea LM223, and C. spartinae CCC535 had partial sequences 1000–5000 bp long. These sequence fragments contained several indels and internal stop codons, and they are apparently not functional genes. Only one strain of C. perihumidiphila lacked lpsC.

2.2.2. Indole-Diterpene/Lolitrem (idt/ltm) Genes

Compared with eas genes, the presence/absence and copy numbers of idt/ltm genes were less variable. Through mapping genome assemblies to the reference genes, all members in sect. Claviceps had one copy of ltmC, M, P, and S and one or two copies of idt/ltmB and Q, except C. cyperi CCC1219 that lacked ltmQ and S. All members in C. sect. Pusillae had no matches to any ltm genes, whereas members of sect. Citrinae (C. citrina CCC265) had full-length matches with ltmB, C, G, and M. Notable species-specific features were that all three strains of C. occidentalis (LM77, LM78, and LM84) had two partial copies ltmQ (1517–1518 bp); C. arundinis (CCC1102, LM583), C. perihumidiphila (LM81), C. ripicola (LM218, LM219, LM220, and LM454), and C. spartinae (CCC535) had two functional copies of ltmB (Table 3). The translated sequences of ltmS from three strains of C. occidentalis (LM77, LM78, and LM84) and three strains of C. quebecensis (Clav32, Clav50, and LM458) were 14 amino-acid residues longer than other species, and those 14 amino acids were identical among the six strains.
Table 3

The idt/ltm gene copies and their locations in C. sect. Claviceps and sect. Citrinae.

SectionOrganismAssembly * ltmQ1 ltmQ2 ltmP ltmB1 ltmB2 ltmC ltmS ltmM ltmG
Citrinae C. citrina WFCCC265 1947 1947 5822211
Claviceps C. arundinis WFCCC110250 5050332505050
BWLM583158 158158124158158158
C. capensis WFCCC150429 2929 292929
C. cyperi WFCCC1219 2525 25 25
C. humidiphila BWLM576945 945478 745745745
C. monticola WFCCC1483568 568591 591591591
C. occidentalis WFLM771456189814561538 15381538657
BWLM789851877985985 985985691
WFLM843761789376376 376376376
C. pazoutovae WFCCC1485225 225185 185185185
C. perihumidiphila BWLM8127 27277272727
C. purpruea WFClav52174 174174 1741741230
WFClav04130637130130 130130130
WFClav26116 116116 116116116
WFClav46432294343 434343
WFClav551358/1838/128614441286557 557557557
WFLM14243 243243 243243243
WFLM207255 255255 255255255
WFLM223327 327327 327327327
WFLM232315 315315 315315315
WFLM2337 77 777
BWLM28258 258258 258258258
WFLM3087 8787 878787
WFLM3351 5151 515151
WFLM39192 192192 192192192
WFLM4361 361361 36112201220
WFLM4629 2929 292929
WFLM461529/651592965965 9659651500
WFLM46937 3737 373737
WFLM470646 787787 787787787
WFLM474243 243243 243243243
WFLM517 1717 171717
BWLM582112 112112 112112112
WFLM60765 765765 765765440
WFLM71393 1283977 977977977
WFLM63433 433433 433433433
WFLM65406 406406 406406406
WFLM72549/1151 1151361 361361361
C. quebecensis WFClav3256 5656 565656
WFClav5091 9191 919191
BWLM458536 536/1563475 475475475
C. ripicola BWLM218191 191191136191191191
WFLM219395 395638589638638638
WFLM220368 368368591368368368
BWLM454138 138764949764764764
C. spartinae WFCCC535225 2252254722512121212

* The assembly versions: BW was from Wingfield et al. [45], SW was from Wyka et al. [44], and WF was generated in the present study; values in the cells denote contig numbers, two values connected by/indicate the fragment was on two contigs; green color represents full-length genes, light orange represents partial or gapped sequences, and no fill represents no gene matches; hatches denote fragments containing frameshifts or internal stop codons. None of the idt/ltm genes were detected in C. sect. Pusillae except for two short fragments of ltmG from C. maximensis CCC398 and C. digitariae CCC659 by low stringency search, which are not listed (see also Section 2.2.2).

Intraspecific variations were observed in C. purpurea; four out of 27 strains showed a second copy of ltmQ (Table 3). In the strain Clav04, the fragment on the primary cluster (contig130) ltmQ1 was a partial copy, whereas another copy on contig 637 was a full-length copy (ltmQ2) with a good ORF. Clav46 had two partial copies; ironically, the copy on contig 43 (where all other ltm genes co-located) had a number of short deletions causing frameshifts and internal stop codons, whereas the copy on contig 229 had good ORFs, except that the first 243 bp (including 53 residuals in exon 1 and partial exon 2) were missing. On the other hand, some of the single-copy ltmQ sequences, such as in C arundinis CCC1102, C. pazoutovae CCC1485, C. perhumidiphila LM81, C. purpurea LM72, C. quebecensis Clav32 and LM458, C. ripicoloa LM218 and LM219, and C. spartinae CCC535, had varied number of indels causing frameshifts and internal stop codons; however, phylogenetically, they still belonged to copy 1 (more details in in Section 2.3). All six genes were clustered on the same contig in 29 strains of the 12 species in sect. Claviceps; otherwise, at least three genes were on the same contig. The clustered six ltm genes were arranged in the same order as in C. purpurea 20.1 [14] (Table 3; gene coordinates are not shown). In C. citrina, ltmB and C were on the same contig (1947), whereas ltmM and G were on separate contigs. It is not assessable whether they were in one cluster. In general, the inter-gene sequences ranged from 500–1200 bp; however, several strains had very long spaces between ltmP and B, such as 4 kb in C. ripicola LM220 and over 2 kb in LM218 and C. arundinis CCC1102 and LM583 (results not shown). Through the additional BLAST searches with lower stringency (E-value < E−50), fragments of 483 and 501 bp of ltmG from C. maximensis CCC398 and C. digitariae CCC659, respectively, were pulled out by using ltmG from C. paspali RRC-1481. They were 76% and 78% similar, respectively, to the reference sequence in the coverage (comparable to the 74% similarity between C. citrina CCC265 and C. paspali RRC-1481). Running BLAST searches of these two fragments to the NCBI database indicated that 60 bp of the 483 bp from C. maximensis matched with Beauveria bassiana ARSEF 2860 geranylgeranyl pyrophosphate synthetase; 279 of 501 bp from C. digitariae matched with idtG (geranylgeranyl diphosphate synthase) from Periglandula ipomoeae strain IasaF13.

2.2.3. Loline Alkaloid (lol) and Peramine (per) Genes

All the searches with lol and per reference genes resulted in no hits, except for the low-stringency BLAST with lolC that resulted in small fractions of sequences (~150–180 bp) matched with the start of the fifth exon for seven species (strains): C. africana (CCC485), C. citrina (CCC265), C. digitariae (CCC659), C. lovelessii (CCC647), C. maximensis (CCC398), C. pusilla (CCC602), and C. sorghi (CCC632). These fragments matched with 80% to 92% identity to O-acetylhomoserine from Purpureocillium lilacinum (XM 018324292), Drechmeria coniospora (XM 040800194), and Verticillium dahliae (XM 009654023) in the NCBI database https://blast.ncbi.nlm.nih.gov/Blast.cgi accessed in August 2021. These sequences were not submitted to GenBank because of their short length.

2.3. Phylogenies of eas and idt/ltm Genes

The individual phylogenetic trees of 11 eas genes all agreed on the long-branched separation between C. sect. Pusillae and sect. Claviceps, which was congruent with the pattern inferred by the previous multigene analyses combined with morphological, ecological, and metabolic features [3] and supported by the phylogenomic analyses [44] (Figure 3a). In C. sect. Pusillae, all genes agreed on the close proximity of C. fusiformis, C. lovelessii, and C. pusilla, as well as of C. africana and C. sorghi. The main incongruence among the gene trees appeared in the uncertain placements of C. digitariae and C. paspali, as well as the variant relationships among C. fusiformis, C. lovelessii, and C. pusillae, which could be a result of insufficient sampling (see further explanation in Section 3; Figure 3b–d and S1).
Figure 3

(a) The hypothetical species relationships of Claviceps spp. inferred by orthologous genes from Wyka et al. [44]. (b–d) Variant species relationships in Sect. Pusillae summarized from phylogenies inferred by each eas gene trees (Supplementary Figures S1 and S2). The thickened branches denote bootstrapping values >80%. The letters next to thick branches denote the genes supporting the grouping, abbreviated as A, C—H1 = easA, easC—H1; cl = cloA; W1 = dmaW1. Dashed branches indicate that taxon was present on the gene trees listed after the species name. lpsC and lpsB are not listed here because only one or three sequences were available on the trees. DNA sequences of C. fusiformis and C. paspali were from GenBank EU006773 and JN613321.

In terms of the species relationships in the sect. Claviceps, considering single-copy genes, a majority of gene trees agreed on the grouping of the four major clades inferred by the previous phylogenomic study [44]. For communication convenience, we named them as four Batches to avoid confusion with species level and general use of clades: Batch humidiphila including C. arundinis, C. humidiphila, and C. perihumidiphila, Batch purpurea including C. capensis, C. monticola, C. pazoutovae, and C. purpurea (previously designated as Clade purpurea by Píchová et al. [3]), Batch occidentalis including C. occidentalis and C. quebecensis, and Batch spartinae including C. ripicola and C. spartinae (Figure 3a and Figure S1). The exceptions were C. perihumidiphila and C. cyperi that had uncertain placement on different gene trees (Figure S1b,d,f,g). The more notable disparities among the gene trees appeared in the order of divergence of the four Batches from C. sect. Pusillae or sect. Paspalorum (Figure 4 and Figure S1 and Figure S2). Previous phylogenomic analyses resulted in the topology of a twice bifurcate pattern, ((Batch humidiphila)(Batch spartinae); (Batch occidentalis)(Batch purpurea)) [44], and this pattern was only supported by easG (Figure 4a). A slight variation of the easA tree appeared in that Batch humidiphila was an earlier diverged lineage than Batch spartinae, and these two formed a paraphyletic group instead of a monophyletic group (Figure 4b). All other genes supported the derived position of Batch humidiphila and Batch spartinae (Figure 4c–e). Furthermore, eight genes (cloA, dmaW1, easC, easE1, easH1, lpsC, and ltmB1) placed Batch purpurea at a more ancestral position than Batch occidentalis, whereas six genes (easF1, lpsB, ltmM, ltmP, ltmS, and ltmQ1) reversed the divergence order of these two Batches (Figure 4c,d). The other three genes (easD, lpsC, and ltmC) showed an unresolved order of divergence (Figure 4e).
Figure 4

(a–e) Varied species relationships in sect. Claviceps summarized from phylogenetic trees of eas and ltm genes by PhyML analyses (the full trees are provided in Supplementary Figures S1 and S2). The thick branches denote bootstrapping values >80%. The letters beside the thick branches indicate that those genes had strong support for those branches; otherwise, all genes listed below the figure had strong support.

As for genes with multiple copies, the most complex was dmaW. The dmaW2 sequences were separated into two groups. Group I included 16 strains of eight species (all non-C. purpurea dmaW2 except C. monticola CCC1483 and C. pazoutovae CCC1485), forming a parallel lineage with their dmaW1 counterpart and representing one gene duplication at node ① (Figure 5a and Figure S2a). Group II included C. purpurea, C. monticola CCC1483, and C. pazoutovae CCC1485, as well as one strain of each dmaW1 (LM60) and dmaW3 (CCC1102). This group diverged from C. purpurea dmaW1, representing the second duplication at node ②. Within group II, the otherwise consistent close relationship between C. monticola and C. pazoutovae was broken by seven strains of C. purpurea. This can be explained by a third duplication at node ③. The presence of dmaW3 of C. arundinis CCC1102 and dmaW1 C. purpurea LM60 in group II indicated extra duplication events at nodes ④ and ⑤ (Figure 5a).
Figure 5

The simplified phylogenies of individual multicopy genes showing potential duplication events. The unedited trees generated by PhyML are presented in the Supplementary Figure S2. (a) dmaW, (b) easF, (c) easE, (d) easH, (e) ltmB, and (f) ltmQ. The thickened branches indicate bootstrapping values ≥80%; dashed and hatches branches are shorter than their real length. The lineages that are not shaded gray are the first copies of each gene.

The second and third copies of easF (easF2, easF3) grouped in one clade diverged from C. cyperi easF1. Within this clade, C. purpurea easF2 (14 strains) appeared as a paraphyletic group, from which diverged a clade composed of C. purpurea easF3 (five strains) and a subclade easF2 of C. quebecensis, C. humidiphila, C. arundinis, C. spartinae, C. pazoutovae, and C. moticola. From this tree topology, at least two gene duplication events were inferred (Figure 5b and Figure S2b). The second copy of easE (easE2) from 16 samples grouped into one clade, which diverged from easE1 of C. occidentalis. However, within the easE2 clade, C. purpurea samples were separated into two subclades. The sample Clav 04 appeared as an orphan clade located close to C. quebecensis easE2, and another 10 samples grouped together and had affinity with C. monticola easE2, indicating that the historical gene duplications possibly occurred twice at nodes ① and ② (Figure 5c and Figure S2c). The second copies of easH (easH2) were grouped into three groups that diverged three times independently. Group I includes two strains of C. ripicola (LM218 and LM220) that diverged from easH1 of the clade composed of C. capensis, C. moticola, and C. pazoutovae. As noted earlier, the sequence lengths of easH2 from these two strains are similar to easH1 and contained good ORFs, indicating that they were likely from a very recent gene duplication. Group II, including three strains of C. occidentalis, one strain each of C. arundinis, C. humidiphila, and C. perihumidiphila, and 15 strains of C. purpurea, diverged from the easH1 clade composed of eight species in sect. Claviceps (C. occidentalis, C. cyperi, C. quebecensis, C. perihumidiphila, C. ripicola, C. spartinae, C. arundinis, C. humidiphila, and C. purpurea). Group III, including nine strains of C. purpurea and the reference sequence of C. purpurea 20.1 easH2, diverged within the clade of C. purpurea easH1 (Figure 5d and Figure S2d). For idt/ltm genes, the second copies of ltmB can be considered as one group arising from one gene duplication, except that ltmB1 of C. humidiphila LM576 was placed in this group. This sequence was the only copy detected in LM576 and, therefore, labeled as copy one. However, it was on a separate contig (contig 478), clustered with neither ltmP and ltmQ (contig 945, Table 3) nor ltmC, ltmS, and ltmM (contig 745). It is very likely that this represents the second copy of this gene, and copy one was either lost or not detected (Figure 5e and Figure S2e). The three partial ltmQ2 genes from three strains of C. occidentalis grouped closely with a clade composed of four strains C. purpurea ltmQ1 (Clav04, Clav46, LM71, and LM72) and two ltmQ2 (Clav55, and LM461) (Figure 5f and Figure S2f). As noted earlier in Section 2.2.2, ltmQ1 of Clav04 and Clav46 was either a partial gene or a nonfunctional gene, respectively, whereas the second copies were functioning genes. Here, ltmQ2 of Clav04 and Clav46 grouped in C. purpurea ltmQ1 clade 1. This situation can be explained by a scenario in which these two copies might have switched locations due to errors in assembling. For another two sequences, ltmQ1 of LM71 was on a different contig with other ltm genes, and in LM72, the gene was split into two contigs, where one half was connected with ltmP, while the other half was independent. Overall, these four sequences appeared as the same copy in C. purpurea ltmQ2 (Clav55 and LM461). If that is the case, one gene duplication event possibly happened at node ①. Alternatively, the ltmQ2 of Clav04 and Clav26, as well as the two ltmQ2 groups, could have resulted from independent gene duplications (Figure 5f and Figure S2f). Long-read sequencing, i.e., Nanopore or PacBio, could bring more insight by ruling out the possible assembly errors.

2.4. Intraspecific Genetic Variation within C. purpurea

Overall, the haplotype diversities (Hd) of eas genes ranged from 0.936 to 1 (close to saturation), except for easH2 that had a lower value, 0.858. Nucleotide diversity (Pi) of eas genes ranged from 0.08 (easD) to 0.168 (easH2), the average number of nucleotide difference (K) ranged from 7.1510 (easD) to 212.238 (easE2), tree-based divergence from COT ranged from 0.06 (easA and easD) to 0.150 (easH2), and tree-based diversity ranged from 0.01 (easD) to 0.219 (easE2). In general, easD and easA had lower values for divergence and/or diversity. The second copies of dmaW, easE, easF, and easH had much higher values of the four parameters. Some of those genes may not function and, therefore, had fewer functional constraints. If only the first copy of the genes was considered, the genes with the highest diversity and divergence values were Pi 0.03 (dmaW1), K 92.379 (lpsC), tree-based divergence from COT 0.0025 (dmaW1), and tree-based diversity 0.038 (dmaW1). The two genes functioning in the middle of the pathway, i.e., easA and easD, were observed to be the most conserved genes compared with the other genes in the earlier or later steps (Table 4, Figure 6a).
Table 4

Nucleotide polymorphism, tree-based divergence, and diversity of ergot alkaloid (eas) and indole-diterpene/lolitrm (idt/ltm) synthesis genes in C. purpurea.

Biosynthesis Genes# of Sequences 1Total # of Sites# of Sites (Excluding Indel)Segregating SitesRatio# of HaplotypesHaplotype (Gene) DiversityNucleotide DiversityAverage Number of Nucleotide DifferencesTree-Based Divergence from COT 2Tree-Based Diversity
Nnss/nhHdPiStd devKMeanStd ErrorMeanStd. Error
Ergot Alkaloid (eas) Genes
dmaW 3515169211960.213320.9950.0550.00450.5230.0620.0040.0860.002
dmaW1 2114809381540.164190.990.0300.00627.8860.0250.0030.0380.002
dmaW2 1415169231020.111130.9890.0420.00439.110.0420.0080.0650.004
easF 3212326301210.192240.9720.0580.01536.5480.0490.0110.0810.003
easF1 251232642400.062170.9530.0160.00110.540.0170.0020.0290.001
easF2 812326341010.159810.0480.02030.4640.0310.0150.0550.010
easE 34228316075770.359280.9790.0850.021135.9380.0850.0280.1470.008
easE1 28192118911120.059220.9680.0130.00124.5260.0130.0010.0200.000
easE2 7228316145360.332710.1320.034212.2380.1150.0420.2180.032
easC 2815081503890.059230.9790.0130.00110.0340.0100.0010.0170.000
easD 28851846370.044220.9760.0080.0017.15100.0060.0010.0100.000
easA 2811431143510.045210.9660.0090.00110.2860.0060.0000.0110.000
easG 281151923570.062200.9740.0100.0019.67200.0140.0010.0230.001
cloA 28275421101730.082230.9790.0170.00135.7960.0160.0010.0300.000
easH 5111955692470.434230.9360.1320.01475.1510.1020.0130.1540.004
easH1 28947943640.068150.9440.0120.00411.0530.0100.0030.0140.001
easH2 2311955712320.406150.8580.1680.01795.850.1500.0330.1880.010
lpsB 28400639612550.064230.9790.0120.00148.4840.0100.0010.0170.000
lpsC 27543154164210.078250.9940.0170.00192.3790.0170.0010.0290.001
Indole-Diterpene/Lolitrem (idt/ltm) Genes
ltmC 2812491246600.048250.9920.0080.00010.2990.0070.0010.0120.000
ltmB1 28871868380.044210.9710.0140.00112.5160.0660.0100.0240.001
ltmM 2817661731740.043250.9920.0070.00112.680.0050.0000.0090.000
ltmQ 25218021192930.138240.9970.0190.006410.0260.0100.0400.004
ltmQ1 24218021192920.138230.9960.0200.00641.4860.0250.0070.0340.003
ltmP 28194918951110.059240.9810.0100.00119.4790.0080.0000.0140.000
ltmS 28955924340.037240.9890.0070.0016.8390.0080.0010.0150.000

1 Sequences with large gaps causing a significant reduction in the number of sites were excluded from the analyses. 2 Tree-based divergence from the center of tree (COT) and diversity were estimated by DIVIEN; other parameters were estimated by DnaSP.

Figure 6

Nucleotide diversity and tree-based diversity and divergence for individual eas genes (a) and idt/ltm genes (b). Error bars denote the standard deviation for Pi and standard error for the other two parameters. The genes are arranged from top to bottom according to their order in the biosynthetic pathway. ltmS is not included in the chart as its function is unknown.

1 Sequences with large gaps causing a significant reduction in the number of sites were excluded from the analyses. 2 Tree-based divergence from the center of tree (COT) and diversity were estimated by DIVIEN; other parameters were estimated by DnaSP. Compared with the first copy of eas genes, idt/ltm genes had a similar level of the highest diversity and divergence. Pi ranged from 0.007 (ltmM and ltmS) to 0.02 (ltmQ1), average number of nucleotide difference (K) ranged from 6.839 (ltmS) to 41.486 (ltmQ1), tree-based divergence from COT ranged from 0.005 (ltmM) to 0.066 (ltmB1), and tree-based diversity ranged from 0.009 (ltmM) to 0.04 (ltmQ) (Table 4, Figure 6b).

3. Discussion

3.1. Correlations between the Presence/Absence of Alkaloid Genes and Alkaloid Production

It has been shown while attempting to induce EA production for pharmaceutical purposes (see review by Flieger [46]) that different ergot species produce varied types of ergot alkaloids. Simultaneously, mycologists explored the use of alkaloid chemistry for characterizing Claviceps species [47,48]. Pažoutová and colleagues [49] differentiated chemoraces using the qualitative and quantitative features of EA production. A systematic study on EA production in 43 Claviceps species confirmed that ergopeptides were produced only by the members in C. sect. Claviceps, whereas dihydroergot alkaloids (DH-ergot alkaloids) were produced only by certain members of C. sect. Pusillae, i.e., C. africana, C. gigantea, and C. eriochloe. Sixteen out of 28 species in C. sect. Pusillae were shown not to produce any EAs, including C. maximesis, C. pusillae, and C. sorghi. Species only producing clavines included C. fusiformis, C. lovelessii, and three other species [3]. More recent studies demonstrated that the indole alkaloid profiles supported the recognition of new species based on molecular and ecological data [29,30]. The EA genes detected in the present study were consistent with the known EA production of the included species, for the most part. For example, C. africana CCC489 had eight genes detected (lacking cloA, easH2, lpsB, and lpsC), and all appeared to be functional, consistent with its production of DH-ergot alkaloids. Similarly, in C. lovelessii CCC647, ten EA genes were detected (lacking lpsC and easH2); however, easH1 and lpsB had mutations resulting in a number of internal stop codons, which is consistent with the production of clavines, a product of the early pathway [3]. A lack of EA production corresponded to no matches for any EA genes in C. maximensis CCC398 and C. citrina CCC265 (C. sect. Citrinae). However, for C. pusillae and C. sorghi, several functional genes were detected even though no EA production was reported [3]. In C. pusillae CCC602, eight genes had full-length matches (dmaW1, easA, C, D, E, G, and H1, and lpsB) and one partial match (cloA 332 bp), but only dmaW1, easC, and easE had ORFs. The lack of easF, the second step in the pathway encoding dimethylallyltryptophan N-methyltransferase, might explain the lack of production of EAs. C. sorghi CCC632 had seven full-length matches (dmaW1 and easA, C, E, F, G, and H1) and two partial (cloA 435 bp and easD 653 bp). Except for cloA and easH1, all other genes had good ORFs. Theoretically, at least chanoclavine should be produced unless those genes were not expressed possibly due to a lack of triggers from physical or environmental conditions [50]. Only the members in C. sect Claviceps had lpsC and easH2, although C. perihumidiphila LM81, one strain of C. ripicola (LM454), and C. arundinis (LM583) lacked lpsC, and C. capensis, C. cyperi, C. humidiphila, and C. monticola had a partial lpsC. Moreover, three C. purpurea strains (LM65, LM72, and LM582) and three C. quebecensis strains (Clav32, Clav50, and LM458) lacked easH2. Whether the absence of these genes causes variations in their EA profiles requires a systematic investigation on the associations between eas genes and products in those species. It is worth noting, however, that the possibility of false negatives in genome screening cannot be ruled out. For instance, for C. arundinis CCC1102, lpsC was detected in the WF version of the genome assembly (created in the present study), but not in the previous version (SW [44], Table 2). The opposite also occurred in that a full length of dmW3 was detected in SW assemblies, but only partially (360 bp) in WF assemblies (this study). The production of indole diterpenoid compounds in ergot fungi was reported in a small number of species, i.e., C. arundinis, C. cynodontis, C. humidiphila, C. paspali, and C. purpurea [21,28,29,30]. Our genome mining showed that ltmQ, P, B, C, M, and S were present in all species in C. sect. Claviceps except C. cyperi. Furthermore, ltmB, C, and M and a nonfunctioning ltmG were detected in C. citrina, while a partial ltmG was detected in C. maximensis CCC389 and C. digitariae CCC659. According to the proposed pathway, to produce paspaline, the first step requires ltmG, followed by ltmC, ltmM, and ltmB [27]. The absence of ltmG could stop production unless GGPP is present through other resources. This might be the case in the producers of indole diterpenoid compounds listed above. In the same way, it is very likely that most of the species in C. sect. Claviceps and the three species in sect. Citrinae and sect. Pusillae could also produce some forms of indole diterpenoid compounds.

3.2. Macro-Evolution of the Gene Clusters—Frequent Gene Duplications and Losses

Ergot alkaloid diversity among diverse producers, i.e., species in Hypocreales, Eurotiales, and Xylariales, was formed by three major processes: gene gains, gene losses, and gene sequence changes [13,14]. This is true within the genus Claviceps. A recent genus-level genome comparison hypothesized that unconstrained tandem gene duplications were caused by putative loss of repeat-induced point mutations in C. sect. Claviceps [44]. This pattern of duplication was confirmed here by the presence of a cluster of second or third copies of easE, easF, and dmaW, as well as second copies of ltmQ and B (Table 2 and Table 3). Moreover, easE2 and F2 of C. purpurea LM461 were on the same contig as easG and partial dmaW1, suggesting that the second copies of easE and F were arranged on the primary cluster possibly as a result of tandem gene duplication. None of the extra gene copies were found in C. sect. Pusillae or sect. Citrinae, consistent with a previous observation that the genomes of sect. Pusillae and sect. Citrinae had much fewer gene duplication events predicted [44]. According to the phylogenies of multicopy genes, one to five gene duplications can be inferred for individual genes. The dmaW gene, encoding the enzyme for the first and determinant step of EA production, had the highest number of potential gene duplications. Even though the presence of dmaW was conserved across various EA producers and proven to be a monophyletic group [51], its evolutionary rate was faster than genes in the middle steps of the EA pathway. Gene losses can be inferred through the discrepant placement of certain gene copies on the phylogenies. For instance, one copy of ltmB in C. humidiphila LM576 was detected; however, this copy grouped with ltmB2. It is very likely that this was the second copy of ltmB gene, and the first copy was either lost or not detected (Figure 5e, see also Section 2.2.2. and Section 2.2.3). The ltmQ1 from four strains of C. purpurea (LM71, LM72, Clav04, and Clav46) was placed in the ltmQ2 clade. For LM71 and LM72, there was only one copy detected (ltmQ1); the scenario is likely similar to ltmB of LM576, where this single copy was the second copy, and the original gene was either lost or not detected (Figure 5f). On a related note, ltmQ2 of Clav04 and Clav46 was located in the ltmQ1 clade. An intuitive explanation would be that the identities of the two copies switched due to assembly artefacts (Figure 5f). Lastly, the incongruent order of divergence of the four Batches of species in C. sect. Claviceps inferred by single-copy genes could be explained as lineages sorted during the frequent gains and losses of the ancestral genotypes (Figure 4). Unlike C. sect. Claviceps, the phylogeny incongruence in C. sect Pusillae was mainly caused by the uncertain placement of C. digitariae and C. paspali. In light of the genome structure, this was likely caused by insufficient sampling instead of gene lineage sorting.

3.3. Micro-Evolution of eas Genes within C. purpurea—An Approximate Hourglass Model

The inter- and intraspecific variations of the second metabolite gene clusters in fungi are typically reported as variations in structures, gene contents, copy numbers, null alleles, and nonhomologous clusters (see review by Rokas [52]). Fewer studies have focused on the DNA sequence variations in each of the gene members. Lorenz et al. [53] identified the sequence differences in lpsA between two C. purpurea strains (P1 and ECC93) that were associated with the different alkaloid types; however, they could not find differences in cloA between C. fusiformiis and C. hirtella that could explain why this gene was functional in the former but not in the latter. Phylogenetic analyses of DNA sequences of four core genes (dmaW, easF, easC, and easE) from selected samples across Clavicipitaceae (with emphasis on Epichloë) uncovered extensive gene losses, and the origin of EA clusters on Clavicipitaceous fungi was determined to be direct descent rather than horizontal transfer [13]. The present study is the first, to our knowledge, to examine the variations of each gene on a fine scale, i.e., among 28 strains of C. purpurea. Both DNA polymorphism analyses of the DNA sequence alignments through DnaSP and tree-based diversity and divergence analyses using the DEVIEN software indicated that the evolutionary rate of early step genes, i.e., dmaW and easF is much higher than the middle step genes, i.e., easA, C, D, and E (Figure 6, Table 4). The pattern matches with the hourglass model in ontogeny, which was also evidenced in genomic studies [39]. The hourglass model (HGM) and early conservation model (ECM) in ontogeny are explained by developmental constraints. HGM considers that, at the middle stage, the meta- and cis-interactions reach the highest complexity, posing constraints for development [54,55], whereas ECM considers the constraints at early stage to be critical because any alterations at early stage would cause cascading effects [56]. The EA pathway was reported as an unusually inefficient one such that a high volume of certain intermediates were accumulated more than needed for producing the end-products [57]. This may impose less selective pressure on the middle steps. The sclerotia of C. purpurea from tall fescue contained chanoclavine (4 ± 3 µg/g) and agroclavine (2 ± 1 µg/g) in addition to the end-products, i.e., ergopeptines and ergnovine [57]. The extra amount of chanoclavine coincides with the lowest evolutionary rates of easD and easA inferred in the present study (Figure 6). The role of easD is to oxidize chanoclavine to chanoclavine aldehyde, followed by the reactions of easA and easG to yield agroclavine. It is likely that easD is under less selective pressure because plenty of supplies are available. Alternatively, it might be under a high level of functional constraints because of its pivotal position in the pathway (first step of closure of the D-ring). A different isoform of easA in C. africana and C. gigantean reacts differently, creating a shunt yielding dihydroergot alkaloids (Figure 1). This diversification may result from the change in ecological niches. Nevertheless, the rates of diversity and divergence of easA were the second lowest after easD, even though it is physically located in between lpsB and lpsC. Both of these later step genes had much higher rates than easA, possibly due to fewer constraints or more direct positive selection, as they are involved in the final steps. The cloA gene represents another point of the pathway where shunts may take place. Presumably depending on the different isoforms of cloA, varied levels of oxidation occur, resulting in different end-products [13,15]. The high rates of diversity and divergence of cloA may reflect a high level of positive selection. The signatures of selective pressure in DNA sequences could be detected through neutrality tests. For instance, if the value of Tajima’s D significantly deviates from zero, it indicates the presence of selective pressures, i.e., negative values suggest a positive selection, whereas positive values indicate balancing selection [58]. We conducted neutrality tests and found that none of the genes departed significantly from neutrality (results not shown). These results are contradictory to Liu et al. [59], in that easE and easA were under positive selection in Canadian and western USA C. purpurea populations. We speculate here that the small sample sizes in present study (28 sequences versus 200–300 in the previous study) might be the factor limiting the ability of the Tajima’s D test to detect selective pressures. Compared with eas gene pathways, it is difficult to evaluate whether or not the evolutionary pattern of ltm genes conformed with the hourglass model because the sequential order of steps was uncertain. Even if we assume that paspaline-derived compounds are the main products, in the absence of ltmG, there are only two to three sequential steps to paspaline. Nevertheless, ltmM had the lowest rate of divergence and diversity compared with earlier (ltmC) and later steps (ltmP and Q). Our results provide evidence for the first time that eas gene evolution follows the hourglass model. Whether this pattern exists in other metabolic gene pathways and the mechanisms that underpin this or other patterns are questions to be answered in future work.

4. Materials and Methods

4.1. Genome Aquisition

Fifty-four genomes of 19 Claviceps spp. were studied. The assemblies of 17 genomes and the raw reads of another 34 genomes were from previous studies (Table 1) [44,45], which outlined the protocols for the DNA extraction, library preparation, and sequencing platforms. In the present study, three additional genomes were sequenced (LM63, LM65, and LM72) using a protocol similar to that described in [44]. Briefly, the gDNA samples were normalized to 300 ng and sheared to 350 bp fragments using an M220 Covaris Focused-Ultrasonicator instrument (Covaris, Woburn, MA, USA). The obtained inserts were used as a template to construct PCR-free libraries using the NxSeq AmpFREE Low DNA Library kit (LGC, Biosearch Technologies, Middleton, WI, USA)) following LGC’s library protocol. Balanced libraries in equimolar ratios were pooled, and paired-end sequencing was carried on a NextSeq500/550 (Illumina, San Diego, CA, USA) using 2 × 150 bp NextSeq Mid Output Reagent Kit (Illumina, San Diego, CA, USA) according to the manufacturer’s recommendations. The new assemblies of 37 genomes were achieved using the following protocols: raw reads were trimmed using BBDuk, a component of BBTools downloaded from the Joint Genome Institute website (https://jgi.doe.gov/data-and-tools/bbtools/ accessed on 9 November 2021). Both quality-trim and kmer-trim were applied using the parameters qtrim = rl, trimq = 20, forcetrimleft = 10, minlength = 36, ftm = 5, ref = adapters/adapters.fa, ktrim = r, k = 22, mink = 11, hdist = 1, tbo tpe. The qualities of initial reads and post-trimming reads were assessed using FastQC version 0.11.9, setting parameters as quiet, noextract. Pairs of trimmed reads for each strain were assembled using the SPAdes version 3.14.0 genome assembly toolkit with the default parameters [60]. QUAST version 5.0.2 was used to evaluate the resulting assemblies and to obtain statistics about the assembled contigs [61]. To assess the completeness of the genome assemblies, BUSCO 4.1.4 was run on the contigs using the fungal database (fungi odb10) (Creation date: 10 September 2020, number of species: 549, number of BUSCOs: 758) [62].

4.2. Alkaloid Gene Screening and Extraction

To investigate the presence/absence of the four classes of alkaloid synthesis genes in 54 genomes, BLAST searches were conducted to interrogate the genomes with the reference genes of interest using an in-house perl script (running blastn with an E-value of E−99 as the cutoff). Alternatively, each individual genome assembly was mapped onto the reference genes using the ‘Map to Reference’ function in Geneious prime 2020.1.2 (https://www.geneious.com, accessed on 9 November 2021). The reference gene clusters were downloaded from GenBank and applied as follows: the clusters of 14 ergot alkaloid synthesis (eas) genes and six indole-diterpene/lolitrem genes (IDT/ltm) from C. purpurea strain 20.1 (JN186799 containing cloA, dmaW, easA, C–G, easH1, easH2, lpsA1, lpsA2, lpsB, and lpsC; JX402756 containing idt/ltmB, C, M, P, Q, and S) and C. paspali RRC-1481 JN186800 (easO) were first applied as a query to interrogate each genome. In addition, the cluster from C. fusiformis PRL1980 EU006773 (10 genes: cloA, dmaW, easA, C–H, and lpsB) were applied to further interrogate genomes in C. sect. Pusillae and C. citrina. For the IDT/ltm genes that were not previously reported in Claviceps purpurea 20.1, the reference sequences from C. paspali JN613321 (ltmF and ltmG) and Epichloë (ltmE and J on JN613318, and K on JN613320) were used to conduct lower stringency megablast searches (https://www.geneious.com, accessed in 9 November 2021) with E-values E−50 and E−20. Megablast searches were also conducted for loline alkaloid genes (lolA, D, E, M–P, T, and U on JF830816, lolC FJ464781, and lolF FJ594413) and peramine (perA JN640287) in all 54 genomes. Genes that were present in genomes were extracted manually. Split fragments of a single gene on different contigs were concatenated on the basis of reference sequences. DNA sequences of genes extracted from the new genomes were submitted to GenBank. When multiple copies of certain genes were present (such as dmaW, easE, easF, ltmB, and ltmQ), the copy on the main cluster was designated as copy 1, as determined by examining the contig numbers. The exception was easH, which was determined on the basis of the similarity to the two copies determined by previous studies [14]. Disconnected fragments shorter than 300 bps were not considered.

4.3. Phylogenetic Analyses

The extracted sequences for each gene were aligned individually through the Geneious Prime (https://www.geneious.com, accessed on 9 November 2021) Align/Assemble function using Global alignment with free end gaps, 93% similarity (5.0/−9.026168) as the cost matrix, a gap open penalty of 12, a gap extension penalty of 3, and two refinement iterations. This protocol is particularly suitable for aligning sequences with large gaps or shorter fragments to full-length sequences. Maximum likelihood phylogenetic trees were developed using the PhyML 3.3.20180621 [63] plugin of Geneious Prime (https://www.geneious.com, accessed on 9 November 2021). Both GTR and HKY substitution models were attempted; branch supports were evaluated through bootstrapping analyses of 100 replicates. Reference sequences of lpsB of C. paspali has only 52% similarity with C. purpurea, causing spurious alignment and a significantly long branch; therefore, they were not included in the analyses.

4.4. Intraspecific Gene Diversity and Divergence Analyses

Population demographic parameters are suitable for investigating genetic differentiation and gene evolution at an intraspecific level. We investigated the DNA polymorphisms, nucleotide diversity (Pi), and average number of nucleotide differences (K) among 27 strains of C. purpurea using DnaSP [64]. Another reason for choosing this sub-set of data, instead of all 53 samples, is that all but three strains (LM65, LM2, and LM582 lacked easH2) contained all 12 genes, making the results more comparable. Nonetheless, the sequences with long gaps causing a significant reduction in alignment length in dmaW and easF were excluded from the DnaSP analyses. In addition, the tree-based diversity and divergence from the center of the tree (COT) were calculated through the web-based DIVEIN software (https://indra.mullins.microbiol.washington.edu/DIVEIN/diver.html, accessed on 9 November 2021) [65]. The following parameters were applied: GTR substitution model, optimized equilibrium frequencies, the best of NNI and SPR tree improvement, and topology + branch length tree optimization algorithm. For multicopy genes (dmaW, easE, easF, and easH), we calculated the parameters for each individual copy and combined them as one gene (Table 4).
  51 in total

1.  Evolutionary biology: Genomic hourglass.

Authors:  Benjamin Prud'homme; Nicolas Gompel
Journal:  Nature       Date:  2010-12-09       Impact factor: 49.962

2.  A "developmental hourglass" in fungi.

Authors:  Xuanjin Cheng; Jerome Ho Lam Hui; Yung Yung Lee; Patrick Tik Wan Law; Hoi Shan Kwan
Journal:  Mol Biol Evol       Date:  2015-02-27       Impact factor: 16.240

Review 3.  The birth, evolution and death of metabolic gene clusters in fungi.

Authors:  Antonis Rokas; Jennifer H Wisecaver; Abigail L Lind
Journal:  Nat Rev Microbiol       Date:  2018-12       Impact factor: 60.633

4.  Molecular cloning and genetic analysis of a symbiosis-expressed gene cluster for lolitrem biosynthesis from a mutualistic endophyte of perennial ryegrass.

Authors:  C A Young; M K Bryant; M J Christensen; B A Tapper; G T Bryan; B Scott
Journal:  Mol Genet Genomics       Date:  2005-07-01       Impact factor: 3.291

Review 5.  Ergotamine in the acute treatment of migraine: a review and European consensus.

Authors:  P Tfelt-Hansen; P R Saxena; C Dahlöf; J Pascual; M Láinez; P Henry; H Diener; J Schoenen; M D Ferrari; P J Goadsby
Journal:  Brain       Date:  2000-01       Impact factor: 13.501

6.  Temporal colinearity and the phylotypic progression: a basis for the stability of a vertebrate Bauplan and the evolution of morphologies through heterochrony.

Authors:  D Duboule
Journal:  Dev Suppl       Date:  1994

Review 7.  Human and cattle ergotism since 1900: symptoms, outbreaks, and regulations.

Authors:  Sarah Belser-Ehrlich; Ashley Harper; John Hussey; Robert Hallock
Journal:  Toxicol Ind Health       Date:  2012-08-17       Impact factor: 2.273

8.  Tremorgenic indole alkaloids potently inhibit smooth muscle high-conductance calcium-activated potassium channels.

Authors:  H G Knaus; O B McManus; S H Lee; W A Schmalhofer; M Garcia-Calvo; L M Helms; M Sanchez; K Giangiacomo; J P Reuben; A B Smith
Journal:  Biochemistry       Date:  1994-05-17       Impact factor: 3.162

9.  Links between Genetic Groups, Indole Alkaloid Profiles and Ecology within the Grass-Parasitic Claviceps purpurea Species Complex.

Authors:  Mariell Negård; Silvio Uhlig; Håvard Kauserud; Tom Andersen; Klaus Høiland; Trude Vrålstad
Journal:  Toxins (Basel)       Date:  2015-04-28       Impact factor: 4.546

10.  Phylogenetic analyses reveal monophyletic origin of the ergot alkaloid gene dmaW in fungi.

Authors:  Miao Liu; Daniel G Panaccione; Christopher L Schardl
Journal:  Evol Bioinform Online       Date:  2009-06-04       Impact factor: 1.625

View more
  2 in total

1.  Evolution of the Ergot Alkaloid Biosynthetic Gene Cluster Results in Divergent Mycotoxin Profiles in Claviceps purpurea Sclerotia.

Authors:  Carmen Hicks; Thomas E Witte; Amanda Sproule; Tiah Lee; Parivash Shoukouhi; Zlatko Popovic; Jim G Menzies; Christopher N Boddy; Miao Liu; David P Overy
Journal:  Toxins (Basel)       Date:  2021-12-02       Impact factor: 4.546

2.  Global Impact of Ergot Alkaloids.

Authors:  James L Klotz
Journal:  Toxins (Basel)       Date:  2022-03-03       Impact factor: 4.546

  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.