Janneke Aylward1, Emma T Steenkamp2, Léanne L Dreyer1, Francois Roets3, Brenda D Wingfield4, Michael J Wingfield2. 1. Department of Botany and Zoology, Stellenbosch University, Private Bag X1, Matieland 7602, South Africa. 2. Department of Microbiology and Plant Pathology, University of Pretoria, Pretoria 0002, South Africa. 3. Department of Conservation Ecology and Entomology, Stellenbosch University, Private Bag X1, Matieland 7602, South Africa. 4. Department of Genetics, University of Pretoria, Pretoria 0002, South Africa.
Abstract
The majority of plant pathogens are fungi, and many of these adversely affect food security. This mini-review analyses the plant pathogenic fungi for which genome sequences are publicly available, assesses their general genome characteristics, and considers how genomics has affected plant pathology. A list of sequenced fungal species was assembled, the taxonomy of all species was verified, and the probable reason for sequencing each species was considered. The genomes of 1090 fungal species were in the public domain in October 2016, and this number is rising rapidly. Pathogenic species comprised the largest category (35.5 %) and, amongst these, plant pathogens predominated. Of the 191 plant pathogenic fungal species with available genomes, 61.3 % cause diseases on food crops, more than half of these on staple crops. The genomes of plant pathogens are slightly larger than those of other fungal species sequenced to date, and they contain fewer coding sequences relative to their genome size. Both patterns may be attributed to the expansion of repeat elements. Sequenced genomes of plant pathogens provide blueprints from which potential virulence factors can be identified and genes associated with different pathogenic strategies can be predicted. Genome sequences have also made it possible to evaluate the adaptability of pathogen genomes and to identify genomic regions that experience selection pressure. Some genomic patterns, however, remain poorly understood, and plant pathogen genomes alone are not sufficient to unravel complex pathogen-host interactions. Genomes therefore cannot replace experimental studies, even though these can be complex and tedious. The most promising application lies in using fungal plant pathogen genomics to inform disease management and risk assessment strategies, which will help to minimise the risk of future disease outbreaks and to prepare for emerging pathogens.
Sequencing of fungal genomes is being driven by various groups of scientists with different interests in, and needs from, genomic data. Mycologists want genome data to understand how fungi live and evolve, while industries require information on how to improve metabolic pathways or find new sources of natural products. The medical and plant pathology sectors need this information to understand diseases and how pathogens function, to improve diagnoses, and ultimately to prevent, or at least manage, disease outbreaks (Kelman 1985). By 2007, the genomes of 42 eukaryotes were available (Cornell), and by 2008 the number of fungal genomes exceeded 90 (Park). Today, more than 3000 fungi are in completed or ongoing genome projects, and the genomes of more than 900 fungal species have been released. The substantial and growing investment in genome sequencing reflects the positive impact that this field is having on research. Our question here is what that impact has been for plant pathogenic fungi.
This mini-review aims to summarise the number of available fungal plant pathogen genomes, determine their general characteristics, and consider the impact that these genomes are having on the study of plant pathology. To determine which fungal plant pathogens have been sequenced, we surveyed the fungal species (including Microsporidia, but excluding Oomycota) listed in 11 online genome repositories (Table 1), including MycoCosm (Grigoriev 2013), NCBI Genome (www.ncbi.nlm.nih.gov/genome), the Broad Institute (www.broadinstitute.org), and the universal cataloguing database, Genomes OnLine Database (GOLD; Reddy). Species found in more than one database were clustered, and the current classification of all species was verified up to ordinal level using MycoBank (Robert) and Index Fungorum (www.IndexFungorum.org). Where these two online reference databases disagreed, the most recent scientific literature was consulted.
Synonymous names associated with each species were noted by consulting MycoBank. We used this non-redundant list to accurately determine the number of fungal species with available genome sequences and, specifically, the extent to which fungal plant pathogens have been sequenced.
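The clustering and synonym-resolution steps described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes that every repository entry begins with a binomial name, and the repository entries and the synonym table (which in practice would be compiled from MycoBank) are hypothetical.

```python
from collections import defaultdict

def build_species_list(repositories, synonyms):
    """Merge repository entries into a non-redundant species list.

    repositories: dict mapping a repository name to the organism names it lists
    synonyms: dict mapping a synonym to its currently accepted name
    Returns a dict: accepted species name -> set of repositories listing it.
    """
    merged = defaultdict(set)
    for repo, names in repositories.items():
        for raw_name in names:
            # Keep genus + specific epithet only, so strain and forma specialis
            # suffixes collapse onto the species (assumes the binomial comes first)
            binomial = " ".join(raw_name.split()[:2]).capitalize()
            # Resolve synonyms to the accepted name before clustering
            accepted = synonyms.get(binomial, binomial)
            merged[accepted].add(repo)
    return dict(merged)


repositories = {  # hypothetical entries for illustration
    "NCBI Genome": ["Ustilago maydis 521", "Fusarium oxysporum f. sp. lycopersici 4287"],
    "MycoCosm": ["Mycosarcoma maydis", "fusarium oxysporum Fo47"],
}
synonyms = {"Ustilago maydis": "Mycosarcoma maydis"}  # hypothetical synonym table

species = build_species_list(repositories, synonyms)
for name, repos in sorted(species.items()):
    print(name, sorted(repos))
```

Names such as "Fusarium sp." or undescribed taxa would need extra handling, which is one reason a manual check against MycoBank and Index Fungorum remains necessary.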
Table 1.
Fungal genome resources used to populate the genome species list.
The lower cost of genome sequencing, made possible by high-throughput technologies, has encouraged large-scale genome initiatives. These include the 5000 Insect Genome Project (i5k; Robinson), Genome 10K (Genome 10K Community of Scientists 2009) and 1000 Plants (www.onekp.com). These projects aim to sample species diversity by sequencing, respectively, the whole genomes of insects and vertebrates and the transcriptomes of plant species. Similarly, fungal genome sequencing programmes such as the Fungal Genome Initiative (Fungal Research Community 2002, The Fungal Genome Initiative Steering Committee 2003), the Fungal Genomics Program (Grigoriev, Martin) and its extension, the 1000 Fungal Genomes (1KFG) Project (Spatafora 2011), have contributed significantly to the number of fungal genomes currently available and continue to do so.
The prevalence of fungi as study organisms is evident from the ongoing and completed genome sequencing projects. GOLD (Reddy), a catalogue of genome projects, began in 1997 with six genome entries (Bernal), and in October 2016 it included 7422 eukaryote whole-genome sequencing projects, of which 3515 (47.4 %) were fungal. Although the genome databases (Table 1) list numerous completed and ongoing fungal projects, many entries do not represent different species. Of the 1459 completed fungal genome projects in GOLD, slightly more than half (ca. 775) are distinct species, whereas the remainder are additional strains of already-sequenced species. To illustrate the extent and distribution of sequenced genomes across the fungal kingdom, we mapped species with publicly available genomes onto ordinal consensus trees (Fig. 1A–C).
Fig. 1.
Ordinal consensus trees depicting the taxonomic (subphylum, class and order) distribution of publicly available genomes for the Ascomycota (A), Basidiomycota (B) and early-diverging fungi (C). The number of sequenced genomes in each order is given after the order name. Sequenced species that are not classified into a family, or that have not been described, are indicated as incertae sedis (inc. sed.) or unknown (unk.), respectively. The number of families with sequenced representatives out of the total number of described families is given in brackets. For each order, horizontal bars show the current progress towards sequencing two genomes per family, according to the scale bar below the figure. The Dikarya consensus trees follow Hibbett, while the classification of Spatafora was incorporated into the tree of early-diverging fungi. Orders described after Hibbett have been added in grey (see Supplementary File 3 for references). The figures do not include unclassified fungi that have not been sequenced.
The most recent of the fungal genome sequencing initiatives, the five-year international collaborative 1KFG Project, aims to sequence and annotate two species from each of the more than 500 known fungal families (Spatafora 2011). Within three years, the focus thus shifted from obtaining representative genomes for all fungal phyla (Buckley 2008) to targeting genome sequencing at the family level. By October 2016, the genomes of 1090 different fungal species were publicly available (Supplementary File 1), approximately 60 % of which had been released by the 1KFG Project.
Although the target of 1000 sequenced fungal species has been reached, the goal of two genomes from each family is a larger task than simply reaching 1000 genomes (Fig. 1). At least two representatives have been sequenced for only 85 families in the Ascomycota, 66 in the Basidiomycota and 11 in the remaining fungi. Not surprisingly, some economically and medically important families (e.g. Aspergillaceae, Clavicipitaceae, Mucoraceae, Mycosphaerellaceae, Saccharomycetaceae, Tremellaceae, Ustilaginaceae) have many more than two representatives. Moreover, taxonomic revision and new species descriptions continue to generate new fungal families and orders. In the almost ten years since the publication of the Hibbett consensus tree, more than 50 fungal orders have been described, further increasing the workload of the 1KFG Project. In addition, fewer than 10 % of the conservatively estimated 1.5 million fungal species are known (Hawksworth 2012), and new species are continuously being described. Sampling fungal biodiversity and sequencing representative genomes will therefore remain an ongoing process.
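Why reaching 1000 genomes does not mean that the family-level goal has been met can be illustrated with a minimal sketch: surplus genomes in heavily sampled families do nothing for families with no genomes at all. The function name, the hypothetical family counts and the way progress is summarised are assumptions for illustration; they are not how the bars in Fig. 1 were computed.

```python
def family_progress(genomes_per_family, target=2):
    """Summarise progress towards sequencing `target` genomes per family.

    genomes_per_family: dict mapping family name -> number of sequenced species
    Returns (fraction of families meeting the target, extra genomes still needed).
    """
    met = sum(1 for n in genomes_per_family.values() if n >= target)
    # Surplus genomes in well-sampled families do not count towards other families
    still_needed = sum(max(0, target - n) for n in genomes_per_family.values())
    return met / len(genomes_per_family), still_needed


# Hypothetical order: 12 genomes in total, more than the 4 x 2 = 8 targeted,
# yet only half of its families have two representatives
order = {"Family A": 10, "Family B": 2, "Family C": 0, "Family D": 0}
print(family_progress(order))  # (0.5, 4)
```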
ARE PATHOGENS PREFERENTIALLY SEQUENCED?
More than 90 % of known fungal species reside in the subkingdom Dikarya (Kirk), which comprises the two largest phyla, Ascomycota and Basidiomycota. The large number of ascomycete and basidiomycete species for which genome sequences have been determined (Table 2) therefore does not represent an over-emphasis of these phyla, but rather reflects the size and diversity of the Dikarya (Fig. 2). In most cases, the proportion of sequenced species in the phyla of early-diverging fungi matches their proportion of known species, suggesting that genome projects have not neglected them (Fig. 2). Mucoromycota has a larger proportion of sequenced than known species owing to the sequencing of several Mucoraceae species that cause human mucormycosis. One phylum (Olpidiomycota) and one subphylum (Zoopagomycotina) of early-diverging fungi, however, have no sequenced representatives and no “targeted” or “in progress” projects listed on GOLD. Olpidiomycota was described only recently (Doweld 2013), and its members appear to be poorly known. The lack of Zoopagomycotina genomes can probably be ascribed to the scarcity of pure cultures of this predominantly parasitic group of fungi (Spatafora).
Table 2.
Number of fungal species from each phylum and subphylum with at least one available genome.
Phylum and subphylum | Sequenced species | Known species (a)
ASCOMYCOTA | 684 | > 64 000
Pezizomycotina | 547 |
Saccharomycotina | 123 |
Taphrinomycotina | 11 |
Incertae sedis | 3 |
BASIDIOMYCOTA | 311 | > 31 000
Agaricomycotina | 227 |
Pucciniomycotina | 43 |
Ustilaginomycotina | 41 |
BLASTOCLADIOMYCOTA | 2 | > 175
CHYTRIDIOMYCOTA | 5 | > 700
CRYPTOMYCOTA | 1 | ?
MICROSPORIDIA | 25 | > 1 300
MUCOROMYCOTA | 47 |
Glomeromycotina | 1 | > 165
Mortierellomycotina | 5 |
Mucoromycotina | 41 | > 325
NEOCALLIMASTIGOMYCOTA | 5 | > 20
ZOOPAGOMYCOTA | 8 |
Entomophthoromycotina | 4 | > 275
Kickxellomycotina | 4 | > 260
UNKNOWN | 2 |
Total sequenced | 1090 |
(a) According to Kirk.
Fig. 2.
Comparison between the proportion of known and sequenced fungal species in the major fungal taxonomic groups. The numbers of known species were obtained from Kirk.
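The comparison in Fig. 2 amounts to dividing each group's count by the column total, once for sequenced and once for known species. The sketch below approximates it from the Table 2 counts under stated assumptions: each “> n” lower bound is treated as n, groups without a known-species estimate (Cryptomycota and the “unknown” category) are left out, and the known counts for Mucoromycota and Zoopagomycota are the sums of the subphylum values given. It will therefore not reproduce Fig. 2 exactly.

```python
# Sequenced species (Table 2) and lower-bound known species (Kirk), per phylum.
PHYLA = {
    # phylum: (sequenced, known lower bound)
    "Ascomycota": (684, 64000),
    "Basidiomycota": (311, 31000),
    "Blastocladiomycota": (2, 175),
    "Chytridiomycota": (5, 700),
    "Microsporidia": (25, 1300),
    "Mucoromycota": (47, 165 + 325),   # Mortierellomycotina has no estimate
    "Neocallimastigomycota": (5, 20),
    "Zoopagomycota": (8, 275 + 260),
}

def proportions(table):
    """Return phylum -> (share of sequenced species, share of known species)."""
    total_seq = sum(seq for seq, _ in table.values())
    total_known = sum(known for _, known in table.values())
    return {p: (seq / total_seq, known / total_known) for p, (seq, known) in table.items()}


for phylum, (seq_share, known_share) in proportions(PHYLA).items():
    print(f"{phylum:<22} sequenced {seq_share:6.1%}  known {known_share:6.1%}")
```

With these assumptions, Mucoromycota makes up about 4 % of the sequenced species but well under 1 % of the known species, the pattern described in the text.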
Within the subphyla of Ascomycota and Basidiomycota, the proportion of sequenced species also largely corresponds to the proportion of known species (Fig. 3), the major exception being Saccharomycotina (budding yeasts; Fig. 3A). The emphasis placed on this subphylum is even more pronounced when considering not only the number of species, but also the number of strains that have been sequenced. Although Pezizomycotina (filamentous fungi) is by far the most species-rich subphylum on the genome list (547 spp.), most sequenced strains belong to Saccharomycotina (416), which maintains its previous status as the most sequenced subphylum in the kingdom (Cuomo & Birren 2010). Apart from members of Saccharomycotina, seven highly sequenced species (≥ 10 strains) are listed in GOLD (Table 3). One is of industrial importance (Rhodotorula toruloides), whereas the others influence food security (Aspergillus flavus, Fusarium oxysporum and Magnaporthe oryzae) or human health (Cryptococcus gattii, Coccidioides posadasii and Trichophyton rubrum).
Fig. 3.
Comparison between the proportion of known and sequenced fungal species in the subphyla of Ascomycota (A) and Basidiomycota (B). The numbers of known species were obtained from Kirk.
Table 3.
Fungal species on the Genomes OnLine Database (Bernal) with 10 or more completed whole-genome sequencing projects.
Species | Strains | Phylum | Subphylum
Saccharomyces cerevisiae | 166 | Ascomycota | Saccharomycotina
Magnaporthe oryzae | 48 | Ascomycota | Pezizomycotina
Candida albicans | 35 | Ascomycota | Saccharomycotina
Komagataella pastoris | 32 | Ascomycota | Saccharomycotina
Saccharomyces kudriavzevii | 20 | Ascomycota | Saccharomycotina
Cryptococcus gattii | 18 | Basidiomycota | Agaricomycotina
Fusarium oxysporum | 17 | Ascomycota | Pezizomycotina
Trichophyton rubrum | 12 | Ascomycota | Pezizomycotina
Aspergillus flavus | 10 | Ascomycota | Pezizomycotina
Coccidioides posadasii | 10 | Ascomycota | Pezizomycotina
Rhodotorula toruloides | 10 | Basidiomycota | Pucciniomycotina
Saccharomyces pastorianus | 10 | Ascomycota | Saccharomycotina
A 2008 report by the American Academy of Microbiology (Buckley 2008) stated that fungal genome sequencing is “heavily” skewed in favour of human pathogens. At that time, whole-genome sequencing of eukaryotes, especially fungi, was in its infancy, and the statement was based on “100-150 fungal representatives”. The initially high cost of genome sequencing would have favoured fungi of medical importance, but as the number of sequenced fungi grew and costs decreased, this pattern was bound to change. We assessed whether pathogens are preferentially sequenced by consulting the recent scientific literature (where available) on each fungal species on the genome list and categorising the species according to their significance and reason for being sequenced. The largest category (41.4 %) consisted of pathogenic fungi and fungi of medical importance (Fig. 4), of which plant pathogens were the most prevalent group (49.4 %). Currently, 191 plant pathogenic species have publicly available genomes (Supplementary File 2), all of which belong to Dikarya. Of these, 117 are pathogens of at least one food crop and 43 affect gymnosperms, most of which are commercially important (Table 4). The 117 food crop pathogens include pathogens of cereals, fruit, vegetables and legumes. At least 60 of these species cause diseases on 10 of the 15 global staple food crops (FAO 1995).
Fig. 4.
Categories of significance identified in the 1090 sequenced fungal species. Pathogens comprise the largest category, within which plant pathogens are predominant. “Medically important” species are fungi that are not directly pathogenic, but cause food or environmental contamination. “Interesting” species are studied for their development or metabolism. “Niche-specific” refers to species occupying abiotic niches, whereas “symbiotic” species are associated with other organisms. “Economically important” species have an economic use (e.g. in the culinary, biocontrol or pharmaceutical industries). Most of the parasites belong to Microsporidia.
Table 4.
Categories of plants affected by the 191 sequenced fungal plant pathogens.
Plant pathogen categories | Genomes available | %
Cash Crop Pathogens | 8 | 4.2
Food Crop Pathogens | 117 | 61.3
Grains | 48 | 25.1
Fruit | 37 | 19.4
Vegetables | 10 | 5.2
Legumes | 11 | 5.8
Multiple crop types | 11 | 5.8
Gymnosperm Pathogens | 43 | 22.5
Other (a) | 23 | 12.0
TOTAL | 191 | 100
(a) Non-gymnosperms not cultivated for food.
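Each percentage in Table 4 is a category count divided by the 191 sequenced plant pathogens. A minimal sketch of that arithmetic, which also checks that the food crop subcategories add up to 117 (the dictionary names are illustrative):

```python
# Counts from Table 4 (top-level categories; food crop subcategories listed separately)
CATEGORIES = {
    "Cash crop pathogens": 8,
    "Food crop pathogens": 117,
    "Gymnosperm pathogens": 43,
    "Other": 23,
}
FOOD_SUBCATEGORIES = {"Grains": 48, "Fruit": 37, "Vegetables": 10,
                      "Legumes": 11, "Multiple crop types": 11}

total = sum(CATEGORIES.values())  # number of sequenced plant pathogens
# The subcategories partition the food crop pathogens
assert sum(FOOD_SUBCATEGORIES.values()) == CATEGORIES["Food crop pathogens"]

def percent(count, denominator=total):
    """Percentage of all sequenced plant pathogens, rounded to one decimal."""
    return round(100 * count / denominator, 1)

for name, count in {**CATEGORIES, **FOOD_SUBCATEGORIES}.items():
    print(f"{name:<22} {count:>4} {percent(count):>5.1f} %")
```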
Clearly, genome sequencing projects have placed an emphasis on plant pathogenic species, specifically those affecting food security or commercial forestry. In general, fungal pathogens are highly represented in the genome list. Furthermore, as the number of available genomes has increased, plant pathogens have replaced human pathogens as the predominantly sequenced category of species. Since most plant pathogens are fungi (Carris), the emphasis placed on fungal genome sequencing may (at least partly) be attributed to food security. For example, M. oryzae, a plant pathogen for which numerous strains have been sequenced (Table 3), is predicted to increase its distribution range and impact due to increased temperature and carbon dioxide levels (Gautam). The sheer number of plant species and their associated disease-causing fungi makes this change in the focus of genome sequencing understandable. Sequencing a large number of plant pathogens that affect a range of plant species is, after all, less of a bias than sequencing many pathogenic species associated with a single species (humans).
GENOME SIZE AND GENE NUMBERS IN PLANT PATHOGENIC FUNGAL GENOMES
As far as we are aware, this review includes the first comprehensive list of the plant pathogenic fungal genomes sequenced to date. We therefore briefly present an overview of the genome characteristics of these species in comparison with other sequenced fungal species. We looked specifically at genome size and the number of genes encoded, because previous studies have revealed a link between plant pathogenicity and both genome size and gene content (Duplessis, Ohm, Spanu).
The 1090 fungal species with publicly available genome sequences have haploid genome sizes ranging from 2 to 336 million base pairs (Mbp; Fig. 5A). Most of these genomes fall within the 30–40 Mbp range (mean = 37.2 Mbp, median = 33.6 Mbp), consistent with the size distribution of the 1940 entries in the Fungal Genome Size Database (Kullman). The genomes of sequenced plant pathogens are only slightly, but significantly, larger than this “norm”. The difference was most apparent in the pathogenic ascomycetes, for which Mann-Whitney U tests indicated the highest level of significance (median = 38.0 Mbp; U = 48131, P < 0.01). The average genome size of plant pathogenic basidiomycetes (57.3 Mbp) was much larger than that of plant pathogenic ascomycetes (39.4 Mbp) and of the remaining fungal genomes (34.8 Mbp), owing to several pathogenic pucciniomycete (rust) species with genomes larger than 100 Mbp.
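The Mann-Whitney U test used above compares two samples by their ranks rather than their means, which suits skewed genome-size distributions. The following is a pure-Python sketch for illustration only; in practice a statistics library would be used (for example SciPy's `mannwhitneyu`). It returns U for the first sample with a normal-approximation P value and no tie or continuity correction. The genome sizes are hypothetical, and the review does not state which U convention or corrections were applied.

```python
import math

def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U_x, two-sided P); U_x counts pairs with x > y, ties counting 1/2."""
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    return u_x, math.erfc(abs(z) / math.sqrt(2))  # normal approximation


# Hypothetical genome sizes (Mbp), not the data analysed in this review
pathogens = [38.0, 41.5, 36.2, 45.1, 39.9, 52.3, 37.4]
others = [31.2, 33.6, 29.8, 35.0, 34.1, 30.5, 36.2, 32.7]
u, p = mann_whitney_u(pathogens, others)
print(f"U = {u}, P = {p:.4f}")
```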
Fig. 5.
Genome size (A and B) and number of open reading frames (ORFs) per million base pairs (Mbp; C and D) in the plant pathogen genomes compared with the remainder of the genome list and other pathogens. Boxplots were drawn in BoxPlotR (Spitzer) using the Tukey whisker extent. The width of the boxes is proportional to the square root of the sample size; notches show the 95 % confidence interval of the median. Opp. = opportunistic pathogens. In B and D, animal pathogens include entomopathogenic fungi, and Microsporidia are excluded from “other” genomes because they have small genomes with many ORFs per Mbp.
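The box-plot conventions named in this caption can be made concrete with a short sketch. Tukey whiskers reach the most extreme observations within 1.5 × IQR of the box, and points beyond them are drawn individually; the notch half-width of 1.58 × IQR/√n is the convention of R's `boxplot`, on which BoxPlotR is, to our understanding, built. One assumption is labelled in the code: quartiles here come from Python's `statistics.quantiles`, whereas R uses Tukey's hinges, so values can differ slightly. The genome sizes are hypothetical.

```python
import math
import statistics

def box_summary(values):
    """Quartiles, Tukey whiskers, outliers and median notch for one box.

    Assumption: quartiles use statistics.quantiles(method="inclusive");
    R/BoxPlotR use Tukey's hinges, which can differ slightly.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Tukey whiskers: most extreme observations within 1.5 x IQR of the box
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    # Notch half-width 1.58 x IQR / sqrt(n): an approximate 95 % interval for the median
    half_notch = 1.58 * iqr / math.sqrt(len(data))
    return {
        "median": median,
        "box": (q1, q3),
        "whiskers": (min(inside), max(inside)),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
        "notch": (median - half_notch, median + half_notch),
    }


# Hypothetical genome sizes (Mbp): one rust-like genome far above the rest
sizes = [30, 32, 33, 34, 35, 36, 38, 40, 120]
summary = box_summary(sizes)
print(summary["whiskers"], summary["outliers"])  # (30, 40) [120]
```

A single very large genome, such as that of a rust fungus, appears as an outlier point without shifting the median, which is why medians and notches are a robust way to compare the groups in Fig. 5.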
The somewhat larger genomes of plant pathogens are consistent with the hypothesis that they often contain more repetitive elements than those of other species (Castanera, Ma; discussed below). Sequenced plant pathogens also have larger genomes than human, animal and opportunistic fungal pathogens (Fig. 5B). Although sequencing has thus far sampled the genome size distribution of most of the fungal kingdom, species with very large genomes have been omitted. This is probably due not only to the higher cost of sequencing large genomes, but also to the difficulty of obtaining sufficient biomaterial from a single obligately parasitic individual cultured on a live host (Barnes & Szabo 2008). Since most species in Pucciniomycotina belong to the Pucciniales, an order of obligate plant pathogens (Kirk), this difficulty may also explain why the proportion of sequenced species in this subphylum is slightly lower than its proportion of known species (Fig. 3B). Genome-sequencing efforts so far therefore most likely underestimate the maximum size of plant pathogen genomes.
Of the available genomes, 714 have publicly accessible gene annotations from which the number of predicted open reading frames (ORFs) can be obtained. The sequenced fungi have, on average, 11 256 ± 3 873 predicted ORFs at a density of 351.8 ± 104.0 ORFs per Mbp (Fig. 5C). Compared with the other genomes, plant pathogenic fungi do not differ in the number of predicted ORFs, but they have significantly fewer ORFs per Mbp of genome (U = 35647; P < 0.01). This trend was also observed in the animal pathogenic fungi (including entomopathogens; U = 8847; P < 0.01). Previous whole-genome studies suggest that the number of coding genes does not necessarily increase with genome size, since transposable elements and repetitive sequences proliferate in large genomes (Kidwell 2002).
The lower ORF density in the genomes of plant pathogenic ascomycetes is, therefore, consistent with their larger genomes being due, at least in part, to repetitive elements. In addition, some pathogens have lost genes that are redundant in their life cycles (Spanu), which may also decrease their ORF density. This trend was, however, not detected in the genomes of human and opportunistic pathogens (Fig. 5D).
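ORF density is simply the number of predicted ORFs divided by the assembly size in Mbp, so gene-poor repeats added to a genome lower it even when no genes are lost. A minimal worked example with a hypothetical genome (the function name is illustrative):

```python
def orf_density(n_orfs, genome_size_bp):
    """Predicted ORFs per million base pairs (Mbp) of assembled genome."""
    return n_orfs / (genome_size_bp / 1_000_000)


# Hypothetical genome: 12 000 ORFs in 30 Mbp
base = orf_density(12_000, 30_000_000)
# The same gene set after 10 Mbp of gene-poor repeats has been added
expanded = orf_density(12_000, 40_000_000)
print(base, expanded)  # 400.0 300.0
```

Gene loss, as described for some pathogens, acts on the numerator instead and lowers the density in the same direction.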
IMPACT OF GENOMES ON PLANT PATHOLOGY
Ever since the advent of plant pathology, researchers have been interested in the biology of plant pathogens and how this knowledge can be translated into means of disease control. In this regard, genome data are not used in isolation, but provide context for observational and experimental data, thereby accelerating traditional research. For emerging pathogens, a disease may be known, but the mechanisms of infection and virulence are not necessarily understood. In these cases, a genome can provide the first glimpse of the potential effectors and toxins that are present (e.g. Ellwood). In some plant pathogens, genomes have overturned conventional paradigms. A classic example is the discovery of entire horizontally transferable, pathogenicity-related chromosomes in Fusarium oxysporum f. sp. lycopersici (Ma). The primary impacts of genome sequences on plant pathology have been a better understanding of the pathogenicity, lifestyle and genome evolution of pathogens. Genomes are, furthermore, resources from which genetic tools can be mined.
Pathogenicity and life-style
Secreted and cell surface proteins mediate the interaction between pathogen and host and are often the first to be characterised from plant pathogen genomes. Genome sequences enable in silico prediction of secreted virulence proteins (effectors), bypassing traditional enzyme assays and chromatography/spectrometry techniques that were ineffective at detecting less abundant effectors. For the corn smut fungus, Mycosarcoma maydis (syn. Ustilago maydis), earlier experimental studies could not identify the virulence factors that were eventually highlighted by interrogating the genome sequence (Kamper). The proteins identified in silico could then be used to determine experimentally the function of specific effectors in the infection process (Liu). Such genome-based effector identification and subsequent phenotype determination have contributed significantly to the online Pathogen Host Interactions Database (Urban). Similarly, genomic prediction of genes for the biosynthesis of fungal products is well established (Keller), and gene deletion systems can subsequently be used to determine the phenotypes that these genes confer (Lee). Gene inventories of plant pathogens have also revealed proteins not previously known to be involved in pathogenicity; for example, the high diversity of membrane transporters in the F. oxysporum and Pyrenochaeta lycopersici genomes strongly implicates them in the pathogenicity of these fungi (Aragona).
Intuitively, cell wall degrading enzymes can be expected to be important in plant pathogenesis, and their presence in plant pathogens was well established before the genomic era (Jones, Schneider & Collmer 2010). Genome sequences have confirmed that most plant pathogens encode an array of cell wall degrading enzymes, specifically pectinolytic enzymes in dicot pathogens (Klosterman, Olson).
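A frequent first pass in the in silico effector prediction described above is to flag small, secreted, cysteine-rich proteins. The sketch below shows that rule-based filter only; it is not the method of the studies cited. It assumes that signal-peptide and transmembrane-helix predictions have already been made by external tools (for example SignalP and TMHMM), and the length and cysteine thresholds are illustrative defaults, since published pipelines vary. The protein names and sequences are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Protein:
    name: str
    sequence: str          # amino-acid sequence, one-letter codes
    signal_peptide: bool   # assumed precomputed by a signal-peptide predictor
    tm_helices: int        # assumed precomputed by a transmembrane-helix predictor

def candidate_effectors(proteins, max_length=300, min_cysteines=4):
    """Flag small, secreted, cysteine-rich proteins as candidate effectors.

    Thresholds are illustrative; real pipelines often add further evidence,
    such as expression in planta or absence of homologues in non-pathogens.
    """
    hits = []
    for p in proteins:
        secreted = p.signal_peptide and p.tm_helices == 0
        small = len(p.sequence) <= max_length
        cys_rich = p.sequence.upper().count("C") >= min_cysteines
        if secreted and small and cys_rich:
            hits.append(p.name)
    return hits


proteins = [  # hypothetical proteins
    Protein("prot1", "M" + "AC" * 40, signal_peptide=True, tm_helices=0),
    Protein("prot2", "M" + "AC" * 40, signal_peptide=True, tm_helices=1),
    Protein("prot3", "M" + "A" * 500 + "CCCC", signal_peptide=True, tm_helices=0),
    Protein("prot4", "M" + "A" * 80, signal_peptide=True, tm_helices=0),
]
print(candidate_effectors(proteins))  # ['prot1']
```

Such a filter only prioritises candidates; as the text notes, their roles still have to be established experimentally.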
The diversity of cell wall degrading enzymes in a genome appears to increase with host range, as exemplified by the large number of carbohydrate degrading enzymes in Macrophomina phaseolina, which infects over 500 plant species (Islam). Exceptions to this apparent norm typically occur in specialised modes of pathogenesis. For example, cell wall degrading enzymes are absent from the genome of the anther smut fungus Microbotryum lychnidis-dioicae (Perlin). Rather than attacking plant cells, this pathogen has an array of enzymes that influence host development, enabling fungal spores to be substituted for pollen. In contrast, gene inventories suggest that necrotrophic pathogens induce apoptosis in host cells rather than breaking down their cell walls (McDonald). Metabolism-related enzymes in fungal genomes therefore have great potential for predicting infection strategies and lifestyle.
Beyond the analysis of single genome sequences, comparing the genomes of ecologically different strains and species has substantial value. For example, analysis of the Rhizoctonia solani AG2-2IIIB genome alone would have revealed only an abundance of cell wall degrading enzymes. However, comparisons with less aggressive R. solani strains revealed that its virulence can be linked to a significant expansion of polysaccharide lyase enzymes (Wibberg). Similarly, comparisons of resistant and non-resistant Penicillium digitatum strains have enabled the identification of mutations conferring tolerance to antifungal compounds (Marcet-Houben).
Comparisons between the genomes of 18 dothideomycete species have suggested that the number of effectors encoded by these fungi is linked to their pathogenic lifestyle (Ohm). The greatest number of effectors was identified in necrotrophic pathogens, whereas hemibiotrophs have apparently reduced their effector arsenal to evade plant defences before they switch to necrotrophy (Ohm).
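Comparisons such as the R. solani example rest on contrasting gene-family counts between genomes. The sketch below is a simple fold-change screen for illustration; real studies typically rely on careful family annotation and statistical models of gene-family evolution (CAFE is one such tool). PL1 and GH28 are CAZy family labels used here only as example names, and all counts are hypothetical.

```python
import math

def expanded_families(focal, comparison, min_fold=2.0):
    """Gene families at least `min_fold` times more numerous in the focal genome
    than the mean of the comparison genomes.

    focal: dict family -> gene count in the genome of interest
    comparison: list of such dicts for the other strains or species
    """
    result = {}
    for family, count in focal.items():
        mean_other = sum(g.get(family, 0) for g in comparison) / len(comparison)
        if mean_other == 0:
            fold = math.inf if count > 0 else 0.0  # family absent from all others
        else:
            fold = count / mean_other
        if fold >= min_fold:
            result[family] = fold
    return result


# Hypothetical counts for an aggressive strain and two less aggressive strains
aggressive = {"PL1": 12, "GH28": 20, "GH5": 9}
less_aggressive = [{"PL1": 4, "GH28": 18, "GH5": 10}, {"PL1": 5, "GH28": 19, "GH5": 8}]
print(expanded_families(aggressive, less_aggressive))
```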
Furthermore, multiple genome comparisons have been used to highlight specific genes under diversifying selection, revealing that evolutionary pressure on plant pathogen effector proteins drives their adaptation (Schirawski, Stukenbrock). Comparisons between plant and fungal genomes have also become essential for teasing apart pathogen and plant RNA sequences when analysing in planta transcript data (McDonald). The value of multiple genome comparisons has prompted projects such as the Fungal Genome Initiative and the 1KFG Project to focus not on sequencing single species, but on groups of species that are useful in a comparative context (Grigoriev, The Fungal Genome Initiative Steering Committee 2004).
The evolution of different fungal lifestyles is a fascinating topic addressed by many comparative genomics studies. Some plant pathogenic fungi with different lifestyles have surprisingly similar gene contents (De Wit), yet unique genes mediate their host interactions. The large proportion of unique secreted effector proteins and host-specific hydrolytic enzymes in plant pathogenic fungi implies that host association drives their adaptation and, therefore, their evolution (De Wit, Duplessis, O’Connell, Spanu). The effect of host association is further emphasised by the diversification of both effectors and hydrolytic enzymes in broad host range pathogens such as Colletotrichum higginsianum (O’Connell). In contrast, host association cannot explain the loss of primary metabolism genes that led to obligate biotrophy in powdery mildew fungi (Spanu). Similarly, the selective pressures that mediate the evolution of a hemibiotrophic strategy, in which the pathogen transitions from a biotrophic to a necrotrophic lifestyle, are poorly understood.
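One widely used measure of diversifying selection on genes such as effectors is ω = dN/dS, the ratio of non-synonymous to synonymous substitutions per site; values above 1 suggest diversifying selection. The review does not state which estimator the cited studies used (likelihood codon models, as in PAML, are common), so the sketch below shows only the simpler counting approach. It assumes that synonymous and non-synonymous sites and differences have already been counted for an aligned pair of coding sequences, and it applies the Jukes-Cantor correction. All counts are hypothetical.

```python
import math

def jukes_cantor(p):
    """Jukes-Cantor correction of a proportion of differing sites for multiple hits."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)

def dn_ds(syn_sites, nonsyn_sites, syn_diffs, nonsyn_diffs):
    """omega = dN/dS from site and difference counts (Nei-Gojobori-style counting).

    Counting the sites and differences in the codon alignment is assumed
    to have been done beforehand.
    """
    d_s = jukes_cantor(syn_diffs / syn_sites)
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    if d_s == 0:
        raise ValueError("no synonymous differences: omega is undefined")
    return d_n / d_s


# Hypothetical effector gene pair: 90 synonymous and 300 non-synonymous sites
omega = dn_ds(syn_sites=90, nonsyn_sites=300, syn_diffs=3, nonsyn_diffs=30)
print(f"omega = {omega:.2f}")  # omega > 1 suggests diversifying selection
```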
Genome evolution of plant pathogens
The arms race between pathogen and host (Stahl & Bishop 2000) makes pathogen adaptability, or evolutionary potential, particularly interesting (McDonald & Linde 2002). Reproduction and gene diversity are two of the factors that influence evolutionary potential (McDonald & Linde 2002), and both can be estimated from genome sequences. For example, an analysis of the mating type genes that govern sexual reproduction can provide insights into the mating strategy of a fungus. Heterothallic ascomycete fungi are identified by the occurrence of a single mating type in a genome (Kronstad & Staben 1997), whereas homothallic fungi contain both mating types, either in the same genome or in a dikaryotic cellular state (Wilson). Many fungi propagate only vegetatively, or their sexual reproduction is difficult to observe. In such cases, genomic analyses have been able to reveal the presence of mating type genes (e.g. Bihon, Marcet-Houben), suggesting that these species could have a cryptic sexual cycle (Bihon). The mating type sequence information can subsequently be used to determine the distribution of different mating types in a population (Aylward, Haasbroek). Conversely, genomes can also reveal the importance of mitotic recombination for generating new allelic combinations. For example, a whole-genome survey concluded that mating type genes are completely absent from the tomato pathogen P. lycopersici and that it has an expansion of gene modules associated with heterokaryon incompatibility (Aragona).
Adaptability to a changing environment or a resistant host can also be mediated by genome plasticity. Transposons are repetitive DNA elements known to contribute to genome plasticity and evolution (Wöstemeyer & Kreibich 2002).
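Repeat content, including transposons, is often summarised as the fraction of an assembly that is masked by repeat annotation. A minimal sketch, assuming a soft-masked FASTA assembly in which repeat-derived bases are written in lower case (as repeat-masking tools such as RepeatMasker can produce) and ignoring N gaps; the assembly is hypothetical.

```python
def repeat_fraction(fasta_text):
    """Fraction of assembled bases that are soft-masked (lower case).

    Assumes repeat-derived bases are in lower case; N gaps are not counted.
    """
    masked = total = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            continue  # skip FASTA headers
        for base in line.strip():
            if base in "Nn":
                continue  # assembly gaps are neither repeat nor unique sequence
            total += 1
            if base.islower():
                masked += 1
    return masked / total if total else 0.0


# Hypothetical two-contig assembly
assembly = """>contig1
ACGTACGTacgtacgtNNNN
>contig2
GGCCttaaGGCCttaa
"""
print(repeat_fraction(assembly))  # 0.5
```

Multiplying this fraction by the assembly size gives the number of repeat-derived Mbp, which is the quantity that, on the hypothesis discussed above, inflates the genome sizes of many plant pathogens.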
The expansion of repeats in many plant pathogen genomes points to their role in diversification and adaptation (Raffaele & Kamoun 2012, Spanu, Thon) and has been directly implicated in the pathogenicity of the wheat necrotroph Pyrenophora tritici-repentis (Manning). Surveys of transposons across plant pathogen genomes have revealed differences in their number and activity between the essential core chromosomes and the dispensable supernumerary chromosomes (Ohm, Vanheule). In Fusarium poae, repeat expansion in the core chromosomes is contained, whereas the non-essential supernumerary chromosomes have many active transposons that invade the core chromosomes (Vanheule). The supernumerary chromosomes also provide opportunities for the duplication and diversification of core genes, thereby facilitating adaptation. An example of such diversification and adaptation in the post-harvest spoilage fungus Penicillium digitatum is the association of DNA transposons with ABC transporters in drug-resistant strains (Sun).
Horizontal gene transfer (HGT) may add novel ecological capabilities to the genomes of recipient species. Although HGT was not historically considered relevant to eukaryotic evolution, genome-level investigations have revealed multiple HGT events in fungi, often from other kingdoms (e.g. Marcet-Houben & Gabaldón 2010, Sun). Such phylogenomic studies have reported, amongst others, the horizontal acquisition of genes that mediate pathogenicity (Friesen, Kroken, Slot & Rokas 2011, Thynne), tolerance to host defences (Marcet-Houben & Gabaldón 2010, Sun), and nutrient uptake and metabolism (Soanes & Richards 2014, Sun). Moreover, a comparative genomics study found evidence of HGT at the chromosome level in F. oxysporum f. sp. lycopersici, as entire pathogenicity-related chromosomes could be transferred between strains (Ma).
Acquiring new ecological capabilities through HGT has previously played a causal role in the emergence of new pathogens and will likely do so again in the future (Friesen, Soanes & Richards 2014, Thynne).
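One common way to analyse the population-level mating-type data mentioned earlier in this section is a chi-square goodness-of-fit test of whether MAT1-1 and MAT1-2 isolates occur in the 1:1 ratio expected under frequent, random sexual reproduction. The Python sketch below is illustrative only: the function name `mating_type_ratio_test` and the isolate counts are hypothetical, and the cited studies may have used different tests. It relies on the identity that, for one degree of freedom, the chi-square upper-tail probability equals erfc(sqrt(χ²/2)), so only the standard library is needed.

```python
import math


def mating_type_ratio_test(mat1_1: int, mat1_2: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit test against a 1:1 MAT1-1:MAT1-2 ratio.

    Hypothetical helper. Returns (chi-square statistic, p-value), 1 degree of freedom.
    """
    n = mat1_1 + mat1_2
    if n == 0:
        raise ValueError("at least one isolate is required")
    expected = n / 2  # equal numbers of each idiomorph under the null hypothesis
    chi2 = ((mat1_1 - expected) ** 2 + (mat1_2 - expected) ** 2) / expected
    # With 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts of isolates screened for each mating-type idiomorph.
    chi2, p = mating_type_ratio_test(mat1_1=27, mat1_2=21)
    print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

A non-significant result means only that a 1:1 ratio cannot be rejected; it is consistent with, but does not prove, a functional sexual cycle in the population.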
Resources for genetic tools
Beyond facilitating whole-genome studies, genome sequences have become ideal resources for mining genetic tools. Previously, species-specific population genetic tools such as microsatellites had to be developed painstakingly by cloning and genome walking (Barnes, Burgess). Now, any genome sequence enables rapid identification of such genetic markers (e.g. Haasbroek). The same holds for diagnostic markers: genome regions that unambiguously and rapidly identify a pathogen, or differentiate between pathogens, can be designed by inspecting whole genome sequences. Although the development of such markers for fungi lags behind that for viral and bacterial pathogens, some examples have recently become available. A pathotype-specific marker has been developed from the genome of M. oryzae f. sp. triticum (Pieck), and comparative genomics has detected diagnostic regions in two Calonectria species (Malapi-Wight) and in the oomycete Pseudoperonospora cubensis (Withers). Continued application of fungal genomes to generate identification tools should increase the efficiency of quarantine procedures (McTaggart).
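To illustrate how a genome assembly can be mined for candidate microsatellite (SSR) markers, the Python sketch below scans a sequence for perfect tandem repeats with a regular expression. The helper names, the motif lengths (2 to 6 bp) and the five-copy threshold are illustrative assumptions, not the criteria used in the cited studies; dedicated SSR tools also handle imperfect repeats and primer design, which this sketch does not.

```python
import re
from typing import NamedTuple


class Microsatellite(NamedTuple):
    start: int   # 0-based start position in the scanned sequence
    motif: str   # repeat unit, e.g. "AG"
    copies: int  # number of consecutive exact copies of the motif


def is_primitive(unit: str) -> bool:
    """True if `unit` is not itself a repeat of a shorter unit (e.g. "AC", not "ACAC")."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_microsatellites(seq: str, min_unit: int = 2, max_unit: int = 6,
                         min_copies: int = 5) -> list[Microsatellite]:
    """Report perfect tandem repeats (SSRs) in a DNA sequence.

    Hypothetical helper: thresholds are illustrative defaults, not published criteria.
    """
    if min_copies < 2:
        raise ValueError("min_copies must be at least 2")
    seq = seq.upper()
    # A unit of min_unit..max_unit bases (shortest tried first, via the lazy
    # quantifier) followed by at least min_copies - 1 further exact copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        if not is_primitive(unit):
            continue  # e.g. a poly-A run matched as "AA" repeats: skip it
        hits.append(Microsatellite(match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


if __name__ == "__main__":
    # Toy sequence standing in for a contig from a genome assembly.
    contig = "TTGACACACACACACGGTCATCATCATCATCATCATAAA"
    for ssr in find_microsatellites(contig):
        print(ssr)
```

Each hit would still need flanking primers and testing for polymorphism across isolates before it could serve as a population genetic marker.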
CHALLENGES
Although the availability of fungal genomes has dramatically increased our knowledge and understanding of infection processes and genome evolution, there remains much to learn. For example, the regulatory elements in most genomes remain poorly annotated, and their accurate identification requires complex experimental methodologies (e.g. Shen). In a recent review, Schatz (2015) commented that sequencing human genomes has been one of the greatest accomplishments of the past two decades, but “one of the greatest pursuits for the next twenty years will be trying to understand what it all means”. The same can be said for fungal genomes. The information that can be gleaned from a genome sequence will increase as our understanding of these sequences improves.
Genome sequences should not be seen as “silver bullets”, although they are often presented this way. They provide a blueprint of potential cellular activities, but are not sufficient to unravel the complexity of pathogen-host interactions. For example, the cell wall degrading enzymes secreted by Fusarium oxysporum during infection of tomato displayed a clear succession (Jones), an ecologically relevant process that could not be deduced from a gene inventory alone. In combination with transcriptome data, however, genomic data have revealed how pathogens tolerate host defences (DiGuistini) and how hosts can resist pathogen infection (Zhu). Experimental work, both in vitro and in planta, will remain essential to the study of fungal plant pathogens.
The end goal of studying any host-pathogen relationship is to inform disease management and control. Thus far, identifying specific molecular targets has had little impact on the development of new antifungal inhibitors (Odds 2005), so integrative management strategies must be a priority.
Ultimately, the elucidated effector proteins, host targets, and the overall insights gained into the biology of pathogens must inform disease management strategies (Maloy 2005). It is also crucial that they inform the risk assessment protocols governing biosecurity (McTaggart). It is, therefore, essential to study the ecological significance of genome patterns so that this knowledge can be extrapolated to emerging pathogen threats.
As comparative genomics has shown, deciphering plant pathogen evolution often depends on comparisons with species that have other lifestyles. A large-scale example is the revised classification of species previously known as Zygomycetes (Spatafora), an endeavour made possible by the availability of multiple genome sequences for this group. Filling the gaps in the list of sequenced species is therefore crucial to our understanding of relationships and pathogenesis. The challenge is to continue sequencing apparently uninteresting or unimportant taxonomic groups alongside economically important ones, in order to gain a holistic view.
CONCLUSIONS
The activities of independent research groups and several fungal sequencing initiatives (Fungal Research Community 2002, Grigoriev, Martin, Spatafora 2011, The Fungal Genome Initiative Steering Committee 2003) have caused the number of publicly available fungal genomes to grow exponentially since 1996, when the first fungal genome, that of Saccharomyces cerevisiae, was sequenced (Goffeau). The taxonomic distribution of sequenced fungal genomes is currently roughly congruent with the number of species known from each phylum and subphylum. This is an important and impressive achievement towards the goal of sampling biodiversity and representing the phylogenetic groups of the fungal kingdom (Fungal Research Community 2002, The Fungal Genome Initiative Steering Committee 2003). Many of the genomes have been sequenced to sample environmental and ecological diversity. However, investment remains focused primarily on projects of direct human importance. The emphasis on genomes of plant pathogenic fungi has increased notably since Buckley's (2008) overview of sequenced fungal species.
The genomes of more than 1 000 fungal species are already publicly available, and this number is growing rapidly. Fungal genomics has enabled rapid characterization of plant pathogen genomes and revealed features that allow a better understanding of the biology of these species. It has also made it possible to develop tools to study pathogen biology and genetics rapidly. In a field where delayed action has profound consequences for livelihoods and food security, genome sequences provide essential tools to prepare for the emergence of new plant pathogens and future disease outbreaks. In this regard, Bill Gates's argument (Gates 2015) that applying available technologies could have significantly reduced the impact of the recent Ebola epidemic also holds for plant pathology.
In the era of genomics in particular, we have powerful tools to deal with the plant disease arms race, and we must apply them more actively and aggressively.
Authors: T B K Reddy; Alex D Thomas; Dimitri Stamatis; Jon Bertsch; Michelle Isbandi; Jakob Jansson; Jyothi Mallajosyula; Ioanna Pagani; Elizabeth A Lobos; Nikos C Kyrpides Journal: Nucleic Acids Res Date: 2014-10-27
Authors: A Goffeau; B G Barrell; H Bussey; R W Davis; B Dujon; H Feldmann; F Galibert; J D Hoheisel; C Jacq; M Johnston; E J Louis; H W Mewes; Y Murakami; P Philippsen; H Tettelin; S G Oliver Journal: Science Date: 1996-10-25
Authors: Richard J O'Connell; Michael R Thon; Stéphane Hacquard; Stefan G Amyotte; Jochen Kleemann; Maria F Torres; Ulrike Damm; Ester A Buiate; Lynn Epstein; Noam Alkan; Janine Altmüller; Lucia Alvarado-Balderrama; Christopher A Bauser; Christian Becker; Bruce W Birren; Zehua Chen; Jaeyoung Choi; Jo Anne Crouch; Jonathan P Duvick; Mark A Farman; Pamela Gan; David Heiman; Bernard Henrissat; Richard J Howard; Mehdi Kabbage; Christian Koch; Barbara Kracher; Yasuyuki Kubo; Audrey D Law; Marc-Henri Lebrun; Yong-Hwan Lee; Itay Miyara; Neil Moore; Ulla Neumann; Karl Nordström; Daniel G Panaccione; Ralph Panstruga; Michael Place; Robert H Proctor; Dov Prusky; Gabriel Rech; Richard Reinhardt; Jeffrey A Rollins; Steve Rounsley; Christopher L Schardl; David C Schwartz; Narmada Shenoy; Ken Shirasu; Usha R Sikhakolli; Kurt Stüber; Serenella A Sukno; James A Sweigard; Yoshitaka Takano; Hiroyuki Takahara; Frances Trail; H Charlotte van der Does; Lars M Voll; Isa Will; Sarah Young; Qiandong Zeng; Jingze Zhang; Shiguo Zhou; Martin B Dickman; Paul Schulze-Lefert; Emiel Ver Loren van Themaat; Li-Jun Ma; Lisa J Vaillancourt Journal: Nat Genet Date: 2012-08-12
Authors: Michael H Perlin; Joelle Amselem; Eric Fontanillas; Su San Toh; Zehua Chen; Jonathan Goldberg; Sebastien Duplessis; Bernard Henrissat; Sarah Young; Qiandong Zeng; Gabriela Aguileta; Elsa Petit; Helene Badouin; Jared Andrews; Dominique Razeeq; Toni Gabaldón; Hadi Quesneville; Tatiana Giraud; Michael E Hood; David J Schultz; Christina A Cuomo Journal: BMC Genomics Date: 2015-06-16
Authors: Maria Aragona; Andrea Minio; Alberto Ferrarini; Maria Teresa Valente; Paolo Bagnaresi; Luigi Orrù; Paola Tononi; Gianpiero Zamperin; Alessandro Infantino; Giampiero Valè; Luigi Cattivelli; Massimo Delledonne Journal: BMC Genomics Date: 2014-04-27
Authors: Adriaan Vanheule; Kris Audenaert; Sven Warris; Henri van de Geest; Elio Schijlen; Monica Höfte; Sarah De Saeger; Geert Haesaert; Cees Waalwijk; Theo van der Lee Journal: BMC Genomics Date: 2016-08-23