
Genus-Wide Comparative Genome Analyses of Colletotrichum Species Reveal Specific Gene Family Losses and Gains during Adaptation to Specific Infection Lifestyles.

Pamela Gan1, Mari Narusaka2, Naoyoshi Kumakura1, Ayako Tsushima3, Yoshitaka Takano4, Yoshihiro Narusaka2, Ken Shirasu5.   

Abstract

Members of the genus Colletotrichum adopt a diverse range of lifestyles during infection of plants and represent a group of agriculturally devastating pathogens. In this study, we present the draft genome of Colletotrichum incanum from the spaethianum clade of Colletotrichum and comparative analyses with five other Colletotrichum species from distinct lineages. We show that the C. incanum strain, originally isolated from Japanese daikon radish, is able to infect both eudicot plants, such as certain ecotypes of Arabidopsis, and monocot plants, such as lily. Being closely related to Colletotrichum species both in the graminicola clade, whose members are restricted strictly to monocot hosts, and in the destructivum clade, whose members are mostly associated with dicot infections, C. incanum provides an interesting model system for comparative genomics to study how fungal pathogens adapt to monocot and dicot hosts. Genus-wide comparative genome analyses reveal that Colletotrichum species have tailored profiles of their carbohydrate-degrading enzymes according to their infection lifestyles. In addition, we show evidence that positive selection acting on secreted and nuclear localized proteins that are highly conserved may be important in adaptation to specific hosts or ecological niches.
© The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Keywords:  Colletotrichum; comparative genomics; evolutionary biology; genome assembly; hemibiotrophic fungi; plant pathogen

Year:  2016        PMID: 27189990      PMCID: PMC4898803          DOI: 10.1093/gbe/evw089

Source DB:  PubMed          Journal:  Genome Biol Evol        ISSN: 1759-6653            Impact factor:   3.416


Introduction

The genus Colletotrichum is of considerable interest in studies of plant–pathogen interactions due to its diversity as well as the commercial impact of its various members (Crouch et al. 2014). This has led to it being named one of the top ten most important fungi in a recent survey of plant pathologists in terms of scientific importance (Dean et al. 2012). Within the genus, considerable variation exists, with many known hosts, including important crop species, such as maize, which is infected by Colletotrichum graminicola (Crouch and Beirn 2009), fruits like strawberries, citrus fruits, and bananas (Cannon et al. 2012), as well as model plants, such as Arabidopsis, which is a host of Colletotrichum higginsianum (Narusaka et al. 2004; O’Connell et al. 2004). In addition, different members adopt a variety of infection lifestyles: although most members are identified as hemibiotrophic plant pathogens, some have been categorized as endophytes (Gangadevi and Muthumary 2008; Sharma et al. 2011; Mejía et al. 2014). Fungi within this genus classified as hemibiotrophs undergo different phases of infection which have previously been characterized by the expression of distinct classes of genes at different stages (Kleemann et al. 2012; O’Connell et al. 2012; Gan et al. 2013). This lifestyle includes a biotrophic phase of infection in living plant cells, followed by a necrotrophic phase, in which there is massive death of host cells, similar to other commercially important pathogens, such as Magnaporthe oryzae. In the past few years, resources to study the molecular mechanisms underlying the diversity of lifestyles adopted by different members of this genus have expanded rapidly, with genomes now sequenced for strains representing distinct lineages within the genus (O’Connell et al. 2012; Alkan et al. 2013; Gan et al. 2013; Baroncelli, Sanz-Martín, et al. 2014; Baroncelli, Sreenivasaprasad, et al. 2014).
Phylogenetic studies have shown that the genus can be divided into distinct lineages (fig. 1; Cannon et al. 2012), with specific characteristics. Among the different clades, members of the graminicola lineage, including C. graminicola, stand out as being exclusively associated with graminaceous monocots, whereas the other sequenced species are associated with infection of dicotyledonous plants (Crouch et al. 2014). The mechanisms underlying the adaptation of the graminicola lineage as a monocot-specific pathogen are still largely unknown.
Fig. 1. Phylogenetic tree showing relationship between Colletotrichum incanum and other known Colletotrichum species based on the combined alignment of chitin synthase, actin, internal transcribed spacer (ITS), and tubulin sequences. Arrowhead indicates sequenced C. incanum strain. Values at the branch points represent bootstrap support values out of 1,000 replicates.

Here we present the draft genome of Colletotrichum incanum, a member of the spaethianum clade, a group with no previously sequenced member within the Colletotrichum genus. Colletotrichum incanum belongs to a distinct group that is closely related to the graminicola and destructivum clades. While graminicola clade members are graminicolous, destructivum clade members are mostly associated with eudicots (Crouch et al. 2014), although at least one recent study has reported the isolation of Colletotrichum destructivum as an asymptomatic endophyte on orchid (Tao et al. 2013). In contrast, several members of the spaethianum clade have been reported to infect both dicot and nongraminaceous monocot plants (Crouch et al. 2014). In this study, we sequenced a strain of C. incanum and show that it is able to infect both monocot and dicot plants, providing a new and unique model for studying host specificities in plant–fungal interactions. We performed genus-wide analyses and found potential pathogenic lifestyle-specific expansions and contractions of gene families, particularly in carbohydrate-degrading enzymes. Interestingly, secreted proteins of members from the gloeosporioides and acutatum clades, important postharvest pathogens with many phenotypic similarities, were found to be more conserved to one another despite their phylogenetic separation. 
Furthermore, analysis of positively selected sequences conserved throughout the genus indicated that genes encoding proteins which are targeted for secretion or to the nucleus may undergo higher levels of diversifying selection in a lineage-specific manner compared with those that are targeted to other localizations.

Materials and Methods

Fungal Culture and Infection Conditions

All fungal cultures were maintained on potato dextrose agar (Becton, Dickinson and Company, Franklin Lakes, NJ) at 24 °C under 12 h black light fluorescent bulb (BLB) light/12 h dark conditions. Conidia were harvested after 6 days and sprayed onto plants at a concentration of 1 × 10⁶ conidia/ml. Arabidopsis plants were grown on mixed Supermix A (Sakata Seed Corp., Yokohama, Japan) and vermiculite soil at 21 °C under short day conditions (8 h light/16 h dark) in 70% relative humidity and transplanted 8 days-post-germination (DPG). Col-0 plants with the eds1-2 null mutation, Col-0 eds1-2 (Falk et al. 1999; Bartsch et al. 2006), were also grown as described. Plants at 30–35 DPG were sprayed with Colletotrichum conidia and incubated in 100% relative humidity. Hyphae were observed with a TCS SP5 confocal microscope (Leica Microsystems, Wetzlar, Germany). For infections on maize and lily, leaves were inoculated with 5 µl droplets of conidia at a concentration of 5 × 10⁵ conidia/ml. Maize was transplanted to vermiculite soil at 3 DPG, grown at 24 °C under 12 h light/12 h dark, and inoculated at 10 DPG. After incubation for 3 days at 4 °C in the dark, Brachypodium distachyon Bd3-1 seeds were transferred to pots of a 1:1 mix of perlite:vermiculite and maintained at 22 °C under long day conditions (16 h light/8 h dark), and leaves were inoculated at 4 weeks after germination. Lily plants were grown at 25 °C under long day conditions. Mature leaves were detached from “casa blanca” lily plants after budding and maintained under 100% humidity at 25 °C under long day conditions during infection. Transformation of C. incanum for expression of green fluorescent protein (GFP) under control of the translation elongation factor (TEF) promoter and scd1 terminator was carried out on protoplasts using the polyethylene glycol transformation protocol as described previously (Kubo et al. 1991). 
For rice infections, detached leaves were taken from 5-week-old plants grown in a rhizotron as described (Mutuku et al. 2015). Rice leaves were inoculated with 5 µl droplets of conidia at a concentration of 1 × 10⁶ conidia/ml and then maintained in 16 h light (28 °C)/8 h dark (23 °C).

Genome Sequencing and Assembly

All cultures were maintained on potato dextrose agar plates or broth at 24 °C. Genomic DNA was extracted using CTAB and QIAGEN genomic tips as described for the 1000 Fungal Genomes Project. Paired-end 100 bp sequencing was performed on 150 and 500 bp insert libraries prepared using the Illumina TruSeq PCR-free DNA sample prep kit (Illumina) with an Illumina HiSeq2000 (RIKEN Omics Science Center, Yokohama, Japan). Jellyfish was utilized to calculate k-mer multiplicity for genome size estimation. Reads were trimmed using Trimmomatic with the options “LEADING:15 TRAILING:15 MINLEN:36” (Bolger et al. 2014) and assembled using the 127mer version of SOAPdenovo (version 1.05) (Luo et al. 2012) with map_len = 32 and a k-mer value of 69. The sequences were deposited at DDBJ/EMBL/GenBank under the accession JTLR00000000. The version described in this article is version JTLR01000000. Files containing all predicted proteins, transcripts, and annotations are available for download at https://sites.google.com/site/colletotrichumgenome/ (last accessed May 2, 2016). The Core Eukaryotic Genes Mapping Approach (CEGMA) pipeline, which identifies a conserved set of eukaryotic genes, was utilized with default settings to assess coverage of gene coding regions in the assembly (Parra et al. 2007).
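The text does not state how the k-mer multiplicities were turned into a genome size. A common approach, shown here only as a minimal sketch and not necessarily the authors' procedure, divides the total number of counted k-mers by the multiplicity at the main peak of the k-mer histogram. The function name, the toy histogram, and the choice to ignore multiplicity-1 k-mers as likely sequencing errors are all assumptions made for illustration.

```python
def estimate_genome_size(kmer_histogram):
    """Estimate genome size from a k-mer histogram.

    kmer_histogram maps multiplicity -> number of distinct k-mers seen
    that many times. Multiplicity-1 k-mers are skipped (assumption: they
    are mostly sequencing errors).
    """
    usable = {mult: n for mult, n in kmer_histogram.items() if mult > 1}
    # Multiplicity with the most distinct k-mers, taken as the coverage peak
    peak = max(usable, key=usable.get)
    # Total k-mer occurrences contributed by the usable part of the histogram
    total_kmers = sum(mult * n for mult, n in usable.items())
    return total_kmers / peak


# Toy histogram (hypothetical values, not data from this study)
toy_histogram = {1: 1000, 9: 50, 10: 100, 11: 50}
estimated_size = estimate_genome_size(toy_histogram)
```

On real data, the histogram would come from the k-mer counter's output, and the peak would normally be checked by eye before the estimate is trusted.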

Gene Prediction and Annotations

The MAKER pipeline (Cantarel et al. 2008) was utilized for gene prediction to combine annotations from the ab initio gene predictors Augustus (Stanke et al. 2006), GeneMark-ES (Ter-Hovhannisyan et al. 2008), and SNAP (available from http://korflab.ucdavis.edu/software.html, last accessed May 2, 2016) using additional evidence from proteins from C. higginsianum. GeneMark-ES was automatically trained on the C. incanum 9503 genome, while Augustus was trained on Scipio (Keller et al. 2008) gene structures derived from the CEGMA gene set from C. incanum using the optimize_augustus.pl script included with Augustus. In addition, SNAP was trained using MAKER combined gene evidence derived using SNAP and alignments from C. higginsianum proteins. Proteins from C. graminicola were also mapped to the genome using exonerate within the MAKER pipeline and used to help improve annotations when manually assessing the gene models. Gene ontology (GO) terms were assigned to predicted proteins by InterProScan5 (Jones et al. 2014), which matches sequences to InterPro protein signatures, and enrichment of GO terms associated with specific gene sets was assessed using hypergeometric tests with the GOstats (Falcon and Gentleman 2007) package in R. For analyses of GO terms associated with orthoMCL clusters, GO terms were assigned to specific clusters only when present in at least half of the Colletotrichum sequences within each cluster. Carbohydrate active enzymes (CAzymes) were classified using dbCAN release 3 (Yin et al. 2012). In order to identify lineage-specific expansions, sequences that were identified as GH43 enzymes were aligned in CLCGenomicsWorkbench8 (CLC Bio) and alignments were used to draw a phylogenetic tree using FastTree (Price et al. 2010) using the JTT + CAT model. The output tree was displayed using the iTOL web-based tool (Letunic and Bork 2011). 
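The hypergeometric enrichment test mentioned above can be illustrated with a short standard-library sketch. This is not the GOstats implementation; it only computes the upper-tail probability of seeing at least a given number of term-annotated genes in a selected gene set. The function name and the toy numbers are assumptions for illustration.

```python
from math import comb


def hypergeom_enrichment_p(universe, annotated, selected, overlap):
    """Upper-tail hypergeometric P-value, P(X >= overlap).

    universe  : number of genes in the reference set
    annotated : genes in the reference set carrying the GO term
    selected  : size of the gene set being tested
    overlap   : genes in the tested set carrying the GO term
    """
    upper = min(annotated, selected)
    total = comb(universe, selected)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy example (hypothetical counts): 3 of 10 genes carry the term,
# and all 3 genes in the tested set carry it.
p_value = hypergeom_enrichment_p(universe=10, annotated=3, selected=3, overlap=3)
```

The same tail probability corresponds to a one-sided Fisher's exact test on the equivalent 2 × 2 table.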
Transporters were classified by performing BLASTP with an E-value cutoff of 1 × 10⁻⁵ using all Colletotrichum sequences with at least one predicted transmembrane domain as predicted by TMHMM (Krogh et al. 2001) against all Transporter Classification Database (TCDB) sequences (Saier et al. 2014). Proteases were classified at the MEROPS database (Rawlings et al. 2012) using the batch BLAST search service available on the web. Only complete hits with all active sites conserved that were predicted to be secreted were included in the counts. Secondary metabolite clusters were classified using version 3 of the antiSMASH program, setting a threshold of a minimum of five genes to form a cluster (Blin et al. 2013). The localizations of secreted proteins were predicted using SignalP 4.1 (Petersen et al. 2011) to predict proteins with signal peptides, which target proteins to the secretory pathway, removing proteins that were predicted to have transmembrane domains according to TMHMM (Krogh et al. 2001) or glycosylphosphatidylinositol (GPI) anchors according to fungal BigPI (Eisenhaber et al. 2004). Proteins were classified as being targeted to membranes if they were predicted to have signal peptides and transmembrane domains according to SignalP (Petersen et al. 2011) and TMHMM (Krogh et al. 2001) analysis. In addition, NLStradamus (Ba et al. 2009) was utilized to predict proteins with nuclear localization signals. In each case, the default setting of each program was utilized. The cysteine content of predicted protein sequences was assessed using the pepstats package from the EMBOSS 6.4.0.0 suite (Rice et al. 2000). Repeats were identified using RepeatMasker (Smit et al. 1996) using a custom repeat library generated by RepeatScout (Price et al. 2005) after filtering out identified genomic repeats of less than 50 bp and occurring less than ten times in the C. incanum genome assembly. 
Hierarchical clustering of species according to their secreted protein profiles in supplementary figure S7, Supplementary Material online, was performed and visualized using the pheatmap package (Kolde 2015) within R (R Core Team 2012).
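The localization rules described in this section can be summarized as a small decision function. The sketch below assumes that the per-protein predictions (signal peptide, transmembrane domain, GPI anchor, nuclear localization signal) have already been parsed into booleans. The function name, the category labels, and the order in which the categories are checked are assumptions, because the text does not say how proteins matching more than one rule were handled.

```python
def classify_localization(has_signal_peptide, has_tm_domain, has_gpi_anchor, has_nls):
    """Assign a predicted localization from four boolean predictions.

    Rule order is an assumption made for this sketch.
    """
    # Signal peptide plus transmembrane domain: targeted to membranes
    if has_signal_peptide and has_tm_domain:
        return "membrane"
    # Signal peptide without transmembrane domain or GPI anchor: secreted
    if has_signal_peptide and not has_gpi_anchor:
        return "secreted"
    # Predicted nuclear localization signal
    if has_nls:
        return "nuclear"
    return "other"


# Hypothetical proteins, for illustration only
examples = {
    "protein_a": classify_localization(True, False, False, False),
    "protein_b": classify_localization(True, True, False, False),
    "protein_c": classify_localization(False, False, False, True),
    "protein_d": classify_localization(True, False, True, False),
}
```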

Comparative Genomics

From the Broad Institute, Neurospora crassa or74a version 12, M. oryzae 70-15 version 6, Fusarium oxysporum f. sp. lycopersici 4287 version 2, Fusarium graminearum PH-1 version 3, Aspergillus nidulans fgsg a4 1, C. graminicola, C. higginsianum, Sclerotinia sclerotiorum version 2, Botrytis cinerea B04.10 from the JGI Institute, protein sequences from Nectria haematococca version 2 (Coleman et al. 2009), Trichoderma virens Gv29-8 version 2 (Kubicek et al. 2011), Metarhizium robertsii ARSEF 23 (Gao et al. 2011), Verticillium dahliae version 1 (Klosterman et al. 2011), Chaetomium globosum version 1 (Berka et al. 2011), Podospora anserina S mat+ (Espagne et al. 2008), Eutypa lata UCREL1 (Blanco-Ulate et al. 2013), Taphrina deformans (Cissé et al. 2013), Saccharomyces cerevisiae M3707 (Brown et al. 2013), Leptosphaeria maculans version 1 (Rouxel et al. 2011), and sequences from Colletotrichum fructicola Nara gc5 (ANPB00000000.1), Colletotrichum orbiculare 104-T (AMCV00000000.1), Colletotrichum gloeosporioides Cg-14 (Alkan et al. 2013; AMYD01000001.1), Colletotrichum sublineola (JMSE00000000.1), and Colletotrichum fioriniae (JARH00000000.1) were utilized for various analyses. OrthoMCL with an inflation value of 1.5 was performed to identify orthogroups among 22 different fungi (Li et al. 2003) with a blastp E-value cutoff of 1 × 10⁻⁵. For analysis of secreted protein families, Colletotrichum proteins grouped by orthoMCL were analyzed using numbers for secreted Colletotrichum proteins only. DAGchainer was utilized to identify syntenic regions with minimum chain length equal to five colinear genes that were a maximum of ten genes apart using BLASTP results as an input to identify matching protein-encoding genes (Haas et al. 2004). Similarly, sequences from 874 single gene orthogroups identified by orthoMCL analysis of the 22 fungi were aligned using MAFFT and trimmed using trimAl (Katoh et al. 2002; Capella-Gutiérrez et al. 2009). 
Trimmed alignments concatenated in the commercially available program CLCGenomicsWorkbench8 (CLC bio) resulted in a data set of 14909 positions that was partitioned according to gene and was then utilized to draw a maximum-likelihood species phylogeny with RAxML using WAG as a model for substitution and the autoMRE setting to determine the appropriate number of bootstrap samples and specifying the basal ascomycete species Ta. deformans as the outgroup. The best tree (supplementary fig. S8, Supplementary Material online) was then converted into an ultrametric chronogram using the r8s (Sanderson 2003) program applying the nonparametric rate smoothing approach. Divergence times were then estimated using previously derived estimates from Beimforde et al. (2014) of 443–695 Myr for the divergence between Pezizomycotina–Saccharomycotina, 400–583 Myr for the Pezizomycotina crown group, 267–430 Myr for Leotiomycetes–Sordariomycetes, 207–339 Myr for divergence of Sordariomycetes, 487–773 Myr for Ascomycete crown (Beimforde et al. 2014), and on the previously estimated divergence time of 47 Myr between C. graminicola and C. higginsianum (O’Connell et al. 2012). The branch including members from Colletotrichum and V. dahliae was used as the input for the CAFE program version 3 (De Bie et al. 2006) to identify OrthoMCL-defined gene families experiencing gain/loss with P ≤ 0.01 using the filter function to exclude families that are inferred to have no genes at the root of the tree. For this analysis, only families with at least one member present in the analyzed taxa were used. The phylogenetic tree to classify C. incanum (fig. 1) was drawn using previously identified sequences from other Colletotrichum species (Cannon et al. 2012), C. incanum (Yang et al. 2014), and Monilochaetes infuscans as an outgroup (O’Connell et al. 2012). 
In brief, sequences from actin (ACT), tubulin-2 (TUB2), internal transcribed spacer (ITS), and chitin synthase I (CHS-1) were aligned using MAFFT and trimmed with trimAl with the automated1 settings (Katoh et al. 2002; Capella-Gutiérrez et al. 2009). The concatenated trimmed alignments were then utilized to estimate the maximum-likelihood species phylogeny with RAxML version 8.2.4 (Stamatakis 2014) using the GTRGAMMA model with 1,000 bootstrap replicates.

Positive Selection

To test for positive selection, Colletotrichum sequences conserved in C. incanum, C. graminicola, C. higginsianum, C. fructicola, C. fioriniae, and C. orbiculare were aligned using PRANK (Löytynoja and Goldman 2005). Protein alignments were trimmed using trimAl (Capella-Gutiérrez et al. 2009) to remove regions where more than 70% of the sequences were gapped. Trees were generated based on the trimmed alignments using the PhyML program (Guindon et al. 2010). The ETE toolkit (Huerta-Cepas et al. 2010) was utilized to automatically label branches being tested under the branch-site model of CodeML (Yang 2007). Likelihood-ratio tests were carried out on PRANK (http://wasabiapp.org/software/prank/, last accessed May 2, 2016) nucleotide alignments using the branch-site model of CodeML using the “cleandata” option to remove any sites with gaps in at least one sequence (Yang 2007). To test for branch-site diversifying selection, likelihood-ratio tests comparing the null hypothesis, where dN/dS was fixed at 1 across all branches and sites (model A1), and the alternative hypothesis, where dN/dS was allowed to vary across the branches and sites (model A), were carried out. P-values corresponding to the chi-square values were obtained and False Discovery Rate (FDR) estimates were computed using the Benjamini–Hochberg procedure. Positive selection was considered significant when FDR ≤ 0.05. One-sided Fisher’s exact tests were carried out in R (R Core Team 2012) to test for enrichment of positively selected genes according to predicted localizations.
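The statistical steps described above (likelihood-ratio test, chi-square P-value, Benjamini–Hochberg correction) can be sketched with the standard library alone. The sketch assumes one degree of freedom for the chi-square distribution, which the text does not state explicitly, and uses made-up log-likelihoods and P-values. The function names are hypothetical and are not part of CodeML or the ETE toolkit.

```python
from math import erfc, sqrt


def lrt_pvalue(lnl_null, lnl_alt):
    """P-value for a likelihood-ratio test, assuming chi-square with 1 df."""
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of chi-square (1 df): erfc(sqrt(x / 2))
    return erfc(sqrt(statistic / 2.0))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical values, for illustration only
p_single = lrt_pvalue(lnl_null=-1000.0, lnl_alt=-998.07927)
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
significant = [q <= 0.05 for q in fdr]
```

With these toy inputs, only the first test passes the FDR ≤ 0.05 threshold used in the text.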

Results

Colletotrichum incanum Is Able to Infect Arabidopsis but Not Maize

Colletotrichum incanum (MAFF238712, strain 9503) was originally isolated from Japanese daikon radish and was previously identified as a strain of Colletotrichum dematium from the dematium clade (Sato et al. 2005). However, molecular phylogenetic analysis with sequences from CHS, ACT, ITS, and TUB genes indicated that it is a member of the spaethianum clade (fig. 1) and that it is in fact a strain of the C. incanum species, a recently described species that has been shown to infect soybean (Yang et al. 2014). Given that the strain was isolated from Japanese daikon radish, a member of the Brassicaceae family, we thought that C. incanum may also be able to infect the model plant, Arabidopsis. To this end, 30 accessions of Arabidopsis were tested as potential hosts for infection, with results showing distinct virulence phenotypes in different ecotype accessions (fig. 2 and supplementary table S1, Supplementary Material online). Host susceptibility profiles differed from that of the known Colletotrichum pathogen of Arabidopsis, C. higginsianum (Narusaka et al. 2004, 2009) which belongs to the destructivum clade, indicating that there are fungal strain-specific factors that determine host compatibility between the different species rather than a general resistance against fungal infection. Intriguingly, nonsusceptible Arabidopsis plants lacking a functional copy of the eds1 gene (Bartsch et al. 2006), which is required for the function of many plant resistance proteins, still did not show any increase in susceptibility to C. incanum infection (supplementary fig. S1, Supplementary Material online).
Fig. 2. Infection phenotypes of Colletotrichum incanum on various host plants. The ability of C. incanum to infect (A, B) Arabidopsis, (C) lily, and (D) maize. (A) Trypan blue staining of Arabidopsis leaves infected by C. incanum at 6 days post infection (dpi). The growth of intracellular hyphae during infection of C. incanum transformed with GFP during leaf infection of (B) the susceptible Arabidopsis accession Bay-0 at 2 dpi and (C) lily at 5 dpi. (D) Infection of maize leaves by (I) C. incanum and (II) Colletotrichum graminicola at 8 dpi. Necrotic lesions were observed on drop-inoculated C. graminicola-infected but not C. incanum-infected leaves. White arrowheads: conidia showing site of initial infection; unfilled arrowheads: host cell wall.

In addition, because members of the spaethianum clade are known to associate with both dicot and monocot hosts, specifically lilies, we tested if C. incanum could also infect lily plants. Lily leaves were shown to be compatible with infection, allowing for growth of intracellular hyphae and subsequent invasion of neighboring cells from the primary site of infection (fig. 2C). In comparison, at the same time point, C. higginsianum showed the formation of appressoria but no further hyphal growth. We also tested whether C. incanum could infect maize, because it is relatively closely related to C. graminicola. However, C. incanum 9503 was not found to be able to infect maize, Brachypodium, or rice under the conditions tested (fig. 2D and supplementary fig. S2, Supplementary Material online).

Assembly of the Colletotrichum incanum Genome

The genome of C. incanum strain 9503 was sequenced using Illumina HiSeq paired-end reads. After filtering of low quality reads, reads were assembled into 1,036 scaffolds to a final assembly of 53.25 Mb size with an estimated 153× coverage and a scaffold N50 of 292 kb (table 1). The size of the genome was estimated to be 58.92 Mb according to k-mer analysis, indicating that most of the genome is included in the assembly. Furthermore, according to CEGMA analysis, where the presence of a set of conserved eukaryotic genes is assessed, 97.98% (complete) and 99.19% (partial) of these highly conserved genes were present in the assembly, indicating that gene-encoding regions are well represented. This level of gene coverage was comparable with previously sequenced Colletotrichum genomes. For example, CEGMA-assessed complete coverage was 91.9% for the C. higginsianum, 98.8% for the C. graminicola, 97.98% for the C. orbiculare, and 96.37% for the C. fructicola (previously reported as C. gloeosporioides) assemblies (O’Connell et al. 2012; Gan et al. 2013). A total of 11,852 protein-coding genes were predicted in the C. incanum 9503 genome, which is lower compared with the Arabidopsis-infecting Colletotrichum species, C. higginsianum, which possesses 16,172 protein-coding genes. However, the number of predicted genes in C. incanum is comparable with that of the C. graminicola and C. sublineola genomes (O’Connell et al. 2012; Baroncelli, Sanz-Martín, et al. 2014), which have 12,006 and 12,699 protein-coding genes, respectively. GO terms could be assigned to 6,967 genes, representing 58.8% of the predicted coding sequences, using InterProScan5. In addition, 8,878 (74.9%) genes could be assigned a PFAM domain. Approximately 5.86% of the assembly was indicated to be repeat sequences.
Table 1
Genome Assembly Statistics of Colletotrichum incanum

Assembly size                  53,254,579 bp
Genome estimated size          58.92 Mb
Estimated coverage             153×
G + C%                         52.15
Scaffold N50                   292,512 bp
Contig N50                     139,052 bp
Number of sequences (a)        1,036
Max scaffold size              1,056,626 bp
Max contig size                790,451 bp
Number of genes                11,852
Number of secreted proteins    1,002
CEGMA coverage                 97.98% (complete)/99.19% (partial)

(a) Number of sequences greater than 200 bp.
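For readers unfamiliar with the N50 statistic reported in table 1, the sketch below shows the usual definition: the length of the shortest sequence in the set of longest sequences that together cover at least half of the assembly. It also shows the arithmetic behind the statement that most of the estimated genome is in the assembly. The scaffold lengths are toy values; only the assembly size and estimated genome size are taken from the table.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # First sequence at which the cumulative length reaches half the total
        if running >= half_total:
            return length
    return 0


# Toy scaffold lengths (hypothetical), total 30, so N50 is reached at 8
toy_n50 = n50([10, 8, 6, 4, 2])

# Fraction of the k-mer estimated genome captured by the assembly (table 1 values)
assembly_size_bp = 53_254_579
estimated_genome_bp = 58.92e6
assembled_fraction = assembly_size_bp / estimated_genome_bp  # about 0.90
```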

Gene Family Expansion/Contraction Relative to Other Colletotrichum Genomes

OrthoMCL was used to group proteins from the C. incanum strain with proteins from other Colletotrichum species and the nonplant pathogenic fungi S. cerevisiae, N. crassa, P. anserina, and A. nidulans, as well as the more distantly related plant-interacting fungi, M. oryzae, F. graminearum, B. cinerea, E. lata, Ta. deformans, T. virens, V. dahliae, Ch. globosum, Nec. haematococca, and the insect-pathogenic fungus Metarhizium anisopliae. In total, 19,281 orthogroups were identified between the different species. A total of 7,306 of these were found to be conserved in all Colletotrichum species, representing core Colletotrichum genes, with 491 of these including at least one C. incanum predicted secreted protein. According to GO analysis using all C. incanum genes as a reference, noncore genes in C. incanum were enriched in oxidoreductases. In addition, 2,421 gene families were identified, which were specific to members of the Colletotrichum genus. However, only 234 of these Colletotrichum-specific orthogroups were conserved in all members of Colletotrichum. Significantly over-represented GO terms associated with C. incanum genes within these conserved Colletotrichum-specific orthogroups included ion binding, ribonuclease activity, oxidoreductase activity, peptidase and carbohydrate binding, and transport, indicating the function of groups that may have been expanded specifically in the Colletotrichum lineage. The majority of Colletotrichum genes were found to be single-copy genes within individual genomes (supplementary fig. S3, Supplementary Material online). Out of the 11,263 C. incanum proteins included in the various orthogroups, only 60 proteins, belonging to 20 groups, were predicted to be coded by genes specific to C. incanum, indicating that the majority of genes predicted are conserved in other members within the genus. More genes were found to be shared with C. higginsianum than any other species tested, as may be expected from the close evolutionary distance between the two fungi and overlapping host range. Among the C. incanum-specific orthogroups (supplementary table S2, Supplementary Material online) were 5 with homology to transposable elements, 1 with homology to kinase-like proteins, 1 with homology to alcohol dehydrogenase-like domain-containing proteins, 1 with homology to superoxide dismutase, and 11 groups encoding hypothetical proteins. Analysis using the program CAFE (De Bie et al. 2006) indicated that the number of gene families experiencing gene loss is greater in Colletotrichum members than the corresponding numbers of families undergoing expansions at each speciation event, with the exception of C. higginsianum, where gene family expansion appeared to be more dominant, and C. graminicola, where there were equal numbers of gene families experiencing gain and loss (fig. 3). A total of 20 gene families were expanded and 316 contracted in the graminicola-clade strains C. sublineola, which infects sorghum, and C. graminicola, which infects maize, relative to their most recent common ancestor (MRCA) with C. incanum (fig. 3). Out of the 336 orthogroups undergoing changes in copy number at P < 0.01, 82 gene families experiencing rapid evolution in the graminicola clade were identified using the Viterbi algorithm in the CAFE program (Han et al. 2013) to assign P-values to the expansions/contractions experienced at each branch and using a cutoff of P < 0.05. GO terms over-represented among proteins in the orthogroups reduced in C. graminicola but present in C. incanum include ATPase activity, hydrolase activity, transporter activity, and chitinase activity (supplementary table S3, Supplementary Material online).
Fig. 3. Comparison of gene family sizes in Colletotrichum incanum relative to related fungi. Maximum-likelihood tree constructed from 1,697 single-copy gene families. Divergence dates were estimated using the r8s program. +: numbers of gene families estimated to have experienced expansions, -: numbers of gene families estimated to have experienced contractions at each node.

In order to gain insights about the mechanism underlying the gene losses experienced by the graminicola lineage, the genomic context of genes from gene families experiencing significant gene loss or gain in the graminicola lineage was assessed within the C. incanum genome. Thus, the number of genes in C. incanum in regions with synteny to other Colletotrichum genomes was assessed (supplementary table S4, Supplementary Material online). A total of 8,990 (75.9%) and 7,851 (66.2%) genes out of the 11,852 predicted genes in C. incanum were found to be in 297 and 683 regions of conserved gene order of at least 5 genes or more in comparison with C. graminicola and C. sublineola, respectively. Relative to C. graminicola, the largest region with synteny included 151 genes. In contrast, only 1,428 (12.0%) genes were found to be syntenic to genes in the C. higginsianum genome assembly. However, 7,838 (66.1%), 7,096 (59.9%), and 6,514 (55.0%) genes were found in regions with synteny to the more distantly related species, C. fioriniae from the acutatum clade (Baroncelli, Sreenivasaprasad, et al. 2014), C. orbiculare from the orbiculare clade, and C. fructicola from the gloeosporioides clade, indicating that the lack of synteny with C. higginsianum genome is unusual, and that in general the majority of Colletotrichum genes among different members of the genus are in conserved order. This analysis also indicated that C. incanum is likely to have a genome organization that is similar to C. graminicola despite the difference in host range. 
To further characterize the genomic regions associated with genes from gene families that were changing significantly in copy number, we investigated whether these genes were being lost or inserted within regions with synteny to C. graminicola or within nonsyntenic regions, which may be experiencing genomic rearrangements or high rates of mutation. Out of the 214 genes in C. incanum associated with gene families found to be rapidly evolving in the graminicola lineage in terms of gene copy number, 20 (9.3%) were identified at the borders of the regions with synteny to C. graminicola, making up 2.6% of the genes that fell into this category. A total of 50 of these genes were found within the regions with synteny to C. graminicola, making up a small proportion (0.6%) of genes within these syntenic regions. Thus, the majority (69.6%) of genes associated with rapidly evolving gene families lay outside these syntenic regions, even though genes outside these regions represent only 16.8% of predicted protein sequences. In contrast to these rapidly changing gene families, genes encoding secondary metabolite synthesis proteins, which are normally found in clusters of conserved gene order (Keller et al. 2005), were also analyzed. Analysis of secondary metabolite backbone synthesis genes in C. incanum indicated the presence of 63 potential clusters, a number similar to that in previously sequenced members of Colletotrichum (supplementary table S5, Supplementary Material online). In this case, 41% of genes were found to be conserved in a syntenic region in C. graminicola. Similarly, 37.9% of genes were found to be in some region of synteny with the more distantly related C. orbiculare genome.

Carbohydrate Active Enzymes

CAzymes are among the gene families known to be important for virulence in plant pathogens. Graminaceous monocots have lower pectin contents in their primary cell walls compared with dicots and gymnosperms (McNeil et al. 1984). As C. incanum is able to infect monocot (Lilium sp.) plants as well as dicots, the CAzyme complement of C. incanum was investigated. The number of potential plant cell wall hemicellulose- and pectin-degrading enzymes in the C. incanum genome was most similar to that of C. higginsianum despite the lower number of genes in the C. incanum genome. In C. incanum, expansions were noted in GH10 (hemicellulose) and PL1, PL3, PL4, PL9, and GH28 pectin-degrading enzyme-encoding genes relative to C. graminicola and C. sublineola (fig. 4). The graminaceous monocot-specific Colletotrichum members showed CAzyme profiles that were distinctly different from those of all the other Colletotrichum species analyzed. This indicates that gene loss from members of the graminicola clade may have been a consequence of their monocot-specific lifestyle. Interestingly, C. fioriniae, C. gloeosporioides, and C. fructicola were also noted to cluster together with expansions in GH43-encoding genes despite belonging to phylogenetically separate branches of the Colletotrichum lineage, with C. fructicola and C. gloeosporioides belonging to the gloeosporioides clade and C. fioriniae belonging to the acutatum clade (figs. 1 and 3). Importantly, all three species are important postharvest pathogens that exhibit phenotypic similarities, suggesting a link between the GH43-encoding gene expansions and their postharvest infection strategy. Phylogenetic analysis of all identified Colletotrichum GH43 proteins shows lineage-specific expansions of GH43 members within the acutatum and gloeosporioides genomes, with duplications of specific genes within the respective lineages (supplementary fig. S4, Supplementary Material online).

Comparative analysis of selected families of Colletotrichum incanum carbohydrate-active enzymes which may be involved during plant infection listed according to common substrates (Zhao et al. 2013).

Transporters

Given that GO terms associated with transporters were also enriched among gene families experiencing significant changes in copy number in the graminicola clade relative to other Colletotrichum genomes, the transporter-encoding gene families identified as experiencing significant gene gain or loss compared with other Colletotrichum species were assessed. In most cases, these gene families were associated with major facilitator superfamily (MFS) type transporters, including those encoding myo-inositol transporters (2.A.1.1.8), general glucose transporters (2.A.1.1.11), and several members of the anion:cation symporter (ACS) family (2.A.1.14.17, 2.A.1.14.11, and 2.A.1.14.3). Global analysis of all MFS transporters encoded in the genomes showed that the graminicola clade members indeed clustered separately from the other Colletotrichum genomes, with the exception of C. incanum. As mentioned, C. incanum can infect monocot plants as well as dicot plants; thus, this result might suggest a link between the MFS transporter repertoire and monocot plant infection. However, even compared with C. incanum, the graminicola clade members showed reduced numbers of myo-inositol transporters (2.A.1.1.8), glucose transporters (2.A.1.1.11 and 2.A.1.1.68), and selected ACS family transporters (2.A.1.14.11 and 2.A.1.14.17) (supplementary fig. S5, Supplementary Material online). Potentially, these could be dispensable for a graminicolous infection lifestyle.

Proteases

The protease profiles of various members of Colletotrichum were analyzed. In contrast to the CAzyme genes, the protease-encoding gene profile of C. incanum showed more similarity to those of the graminicola clade members C. sublineola and C. graminicola, despite the difference in host range between C. incanum and the graminicola clade members. However, as observed for CAzymes, C. fioriniae and C. fructicola, which belong to the acutatum and gloeosporioides clades, respectively, clustered together based on their secreted protease profiles rather than with the Colletotrichum species to which they are more closely related (supplementary fig. S6, Supplementary Material online). In both these fungi, expansions were noted among the S10 serine carboxypeptidases, which have broad substrate specificities and are active in acidic environments, in contrast to the other serine proteases, which are normally active in neutral/alkaline environments (Laskar et al. 2012), indicating that these enzymes may be important for their common infection lifestyle. In addition, a search for homologs of subtilisins (S08A proteases), which were previously shown to be more closely related to plant than to fungal proteins (Armijos Jaramillo et al. 2013; Gan et al. 2013), revealed the presence of two related sequences in the genome, indicating that they are also conserved in C. incanum.

Secreted Proteins

A total of 1,002 genes (8.2% of predicted protein-encoding genes) were predicted to encode secreted proteins in C. incanum. Out of these, 972 were assigned to 840 orthogroups, with only 32 predicted secreted proteins that were C. incanum-specific. GO terms over-represented among these secreted proteins included those associated with hydrolase activity including peptidase and carbohydrate-degrading activity. In addition, a number of homologs to known effectors from other plant pathogenic fungi were identified including AvrPi54, MC69 from M. oryzae, and secreted in xylem 6 from Fusarium spp. (supplementary table S6, Supplementary Material online). Homologs of AvrPi54 are also conserved in other Colletotrichum fungi although its expression has not been detected. Notably, no homolog of DN3, which was found to be essential for suppression of the conserved effector NIS1 (Yoshino et al. 2012), was identified in C. incanum, as well as in the C. sublineola and C. graminicola assemblies, despite conservation of the NIS1 gene. Gene families identified by orthoMCL that were predicted to include secreted proteins were analyzed based on their conservation in members from representatives of the six major sequenced groups within the Colletotrichum genus (fig. 5). The majority of secreted proteins which were present in orthogroups consisting of two or more secreted proteins are not lineage-specific. The gene families associated with higher copy numbers were found to be widely conserved and the majority of these were found to consist of proteins that could be associated with known PFAM domains. It was also noted that proteins that were less widely conserved (in four species or less) were enriched in cysteine residues relative to more widely conserved proteins (fig. 5). Further, among these, fewer proteins could be assigned PFAM domains, indicating an enrichment of genes with putatively unknown function among the less conserved proteins (fig. 5). The C. fructicola secreted protein profile showed more similarity to C. fioriniae than to the more closely related C. orbiculare in agreement with the findings on CAzymes and proteases described above (supplementary fig. S7, Supplementary Material online). Significantly, this similarity was not noted when clustering all proteases identified within the genomes rather than only those predicted to be secreted (supplementary fig. S6, Supplementary Material online).
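The cysteine enrichment described above rests on a simple per-protein statistic, the percentage of residues that are cysteine, averaged over the members of an orthogroup. The sketch below illustrates one way such a value could be computed. The function names and the short toy sequences are hypothetical and are not taken from the C. incanum annotation or from the authors' scripts.

```python
def cysteine_percent(sequence):
    """Percentage of residues that are cysteine (C) in one protein sequence."""
    # Normalise case and drop a trailing stop symbol if one is present.
    seq = sequence.strip().upper().rstrip("*")
    if not seq:
        raise ValueError("empty protein sequence")
    return 100.0 * seq.count("C") / len(seq)


def mean_cysteine_percent(sequences):
    """Average cysteine percentage over all members of an orthogroup."""
    values = [cysteine_percent(s) for s in sequences]
    return sum(values) / len(values)


# Hypothetical toy orthogroup, for illustration only (not real proteins).
toy_orthogroup = ["MKCTCLAACG", "MKCSCLAVCG*", "MRCTALAACG"]
print(f"mean Cys%: {mean_cysteine_percent(toy_orthogroup):.1f}")
```

Averaging per-protein percentages, rather than pooling all residues, gives each orthogroup member equal weight regardless of its length; whether the original analysis used this convention is an assumption.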

Conservation of secreted protein clusters in Colletotrichum across six major clades of the Colletotrichum genus. Ticks indicate number of clusters containing two or more genes in each category. Single-copy genes were not included in this diagram. Tracks represent heatmaps indicating the numbers of genes from each cluster present in the genomes of the following: Ci: Colletotrichum incanum, Ch: Colletotrichum higginsianum, Cg: Colletotrichum graminicola, Cfi: Colletotrichum fioriniae, Cfr: Colletotrichum fructicola, Co: Colletotrichum orbiculare. Len: Scatter plot indicating average length of proteins in each gene family where each line represents 200 amino acids (aa) and green: 0–240 aa, orange: <480 aa, purple: <720 aa, red: <960 aa, light green: <1200 aa, yellow: ≥1440 aa; Ann: Dark blue marks denote gene families where at least half the members can be annotated with a known PFAM domain. Cys %: Average cysteine content of predicted proteins in orthogroup where the line represents 10% average cysteine content and where green: 0–1.6 Cys%, purple: <3.1%, orange: <4.7%, yellow: <6.3%, blue: <7.8%, red: <9.4%, brown: <10.9%, gray: ≤12.5%.

Comparisons between the C. incanum and C. graminicola genomes indicated that 58 orthogroups containing secreted proteins were absent in C. graminicola and C. sublineola but present in all other Colletotrichum species analyzed, and may represent genes that are dispensable for infection of graminaceous monocots (supplementary table S2, Supplementary Material online). These orthogroups included the PL1, PL3, and GH28 pectin-degrading enzymes and the glucooligosaccharide oxidases discussed above, as well as families of three potential effectors, EC16, EC20, and EC34, which were previously identified by Kleemann et al. (2012) in C. higginsianum. Apart from changes in gene family copy number, mutations in coding sequences are also important for adaptation to specific hosts.
Signatures of positive selection, in the form of elevated ratios of nonsynonymous to synonymous mutations, indicate genes that are under diversifying selection. Secreted protein-encoding gene sequences were assessed for positive selection using PAML. Because analysis using fewer than six closely related sequences reduces the accuracy of predicting positively selected sites (Anisimova et al. 2002), only gene families with a 1:1 orthology conserved in six sequenced genomes, representing each of the major clades sequenced, were assessed, allowing for a more reliable assessment of selection. These proteins were then divided based on the predicted localizations of their C. incanum homologs. In addition, rather than analyzing dN/dS ratios over whole protein sequences, branch-site models were used, allowing for the detection of positive selection that acts on only certain sites of a protein and in specified lineages, to identify sequences important for lineage-specific differences. In these tests, only conserved regions were tested for positive selection. Out of the 5,940 single-gene orthogroups that were tested, 310 were predicted to include a secreted protein from the C. incanum genome, 1,010 were predicted membrane proteins, and 1,367 were predicted to have a nuclear localization signal. Out of the secreted protein orthogroups, only ten sequences were predicted to have experienced positive selection specifically in the C. incanum lineage (FDR ≤ 0.05). Furthermore, it was estimated that in C. graminicola, seven sequences (supplementary table S7, Supplementary Material online) were under significant levels of positive selection. Notably, GLRG_09110 is a homolog of the cas1 appressorium-specific protein, and GLRG_05601 and GLRG_04689 are hypothetical proteins.
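Branch-site tests of this kind are usually evaluated with a likelihood ratio test between a null and an alternative model, followed by multiple-testing correction across orthogroups. The sketch below illustrates that general procedure only. It assumes a chi-square reference distribution with one degree of freedom and Benjamini-Hochberg correction, neither of which is stated in the text, and the log-likelihood values are hypothetical rather than PAML output from this study.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """P-value of a likelihood ratio test against chi-square with 1 df.

    For one degree of freedom, P(X > x) equals erfc(sqrt(x / 2)).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return math.erfc(math.sqrt(stat / 2.0))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical (null, alternative) log-likelihoods for three orthogroups.
fits = [(-5230.4, -5221.9), (-4102.7, -4102.1), (-6120.0, -6114.2)]
pvals = [lrt_pvalue(null, alt) for null, alt in fits]
for (null, alt), p, q in zip(fits, pvals, benjamini_hochberg(pvals)):
    flag = "candidate" if q <= 0.05 else "not significant"
    print(f"2*dlnL={2 * (alt - null):.2f}  p={p:.3g}  FDR={q:.3g}  {flag}")
```

Orthogroups whose adjusted value falls at or below 0.05 would correspond to the FDR ≤ 0.05 threshold quoted above.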
Although signatures of positive selection could be detected in proteins localized to all predicted compartments, even among these highly conserved single-copy genes, higher proportions of genes encoding secreted proteins were predicted to be under positive selection relative to those targeted to other localizations in C. graminicola, C. higginsianum, C. fioriniae, and C. fructicola (P < 0.05; fig. 6). In addition, in C. fructicola, C. fioriniae, and C. orbiculare, genes predicted to encode proteins with nuclear localization signals were also enriched for positively selected sequences relative to those targeted to other localizations (P < 0.05). In contrast, no such enrichment was observed for membrane-localized proteins. This indicates that the diversification of both secreted and nuclear-localized proteins could play an important role in adaptation to lineage-specific infection lifestyles.
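The enrichment comparison above was made with Fisher's exact test on the proportions of positively selected genes in one localization class versus all other proteins. The following minimal sketch shows a one-sided version of that test on a 2x2 table using only the standard library. Whether the original analysis was one- or two-sided is not stated, and the count of 60 positively selected genes among the other proteins is a hypothetical placeholder; only the totals of 5,940 orthogroups, 310 secreted, and 10 positively selected secreted genes come from the text.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, where a is the count
    of positively selected genes in the class of interest.
    """
    row1 = a + b          # genes in the class of interest
    col1 = a + c          # all positively selected genes
    n = a + b + c + d     # all genes tested
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return numerator / comb(n, col1)


secreted_total = 310           # from the text
secreted_selected = 10         # from the text (C. incanum lineage)
other_total = 5940 - 310       # remaining tested orthogroups
other_selected = 60            # hypothetical value, for illustration only

p = fisher_exact_greater(
    secreted_selected,
    secreted_total - secreted_selected,
    other_selected,
    other_total - other_selected,
)
print(f"one-sided Fisher's exact P = {p:.3g}")
```

Exact integer arithmetic with `math.comb` avoids the rounding problems of factorial-based formulas at these table sizes.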

Analysis of positive selection in all single-copy genes identified in Colletotrichum according to predicted localizations. Proportions of lineage-specific positively selected genes among (A) secreted, (B) nuclear, or (C) membrane-localized genes relative to all other proteins. Fisher’s exact test was used to test for significant differences among the proportions within each species. Ci: Colletotrichum incanum, Ch: Colletotrichum higginsianum, Cg: Colletotrichum graminicola, Cfi: Colletotrichum fioriniae, Cfr: Colletotrichum fructicola, Co: Colletotrichum orbiculare.

Discussion

With the inclusion of the C. incanum genome presented in this study, the genomes of six major clades within the genus are now available (O’Connell et al. 2012; Gan et al. 2013; Baroncelli, Sreenivasaprasad, et al. 2014), enabling analysis of evolution between different members of Colletotrichum at a genus-wide level. Members of the spaethianum clade including C. incanum are interesting in that they are able to associate with a range of hosts, from dicots such as Arabidopsis to the monocot lily, as shown in this study. Interestingly, C. incanum shows a slightly different host range on different Arabidopsis accessions compared with that of the closely related species C. higginsianum, which has been widely utilized as a model for hemibiotrophic fungal infection in the model plant Arabidopsis (Narusaka et al. 2004; O’Connell et al. 2004). In many plant–pathogen interactions, it is thought that pathogen host range may be determined by the presence or absence of specific pathogen effector proteins that promote virulence on specific hosts. In some cases, effectors may be recognized by cognate host plant resistance proteins, leading to pathogen death. In the case of the Col-0 Arabidopsis accession, which is susceptible to C. higginsianum but not C. incanum, we show that plants lacking the eds1 gene, a key signaling component for a large number of plant resistance proteins, were not compromised in resistance to C. incanum. Based on this result, it is not clear whether resistance of Arabidopsis to C. incanum is mediated by host resistance proteins. Future analyses of crosses between susceptible and resistant Arabidopsis lines may provide further information regarding the molecular mechanism of C. incanum resistance in Arabidopsis. Gene family expansions and contractions have been shown to be important for evolution as an adaptation to new ecological niches (Lespinet et al. 2002).
Indeed, gene losses were associated with the change in host range of the smut fungus Melanopsichium pennsylvanicum, a dicot-infecting fungus that evolved from an ancestral monocot-infecting fungus (Sharma et al. 2014). This study shows that gene family losses are more common than gene family expansions during the evolution of Colletotrichum species. One possible explanation is that genes required for infection of different host plants existed in ancestral Colletotrichum species and that losses from existing families occurred after host specialization. However, it is noted that the CAFE analysis used in this case is limited to the birth–death model of gene family evolution, which may result in an overestimation of families with members present in the MRCA of the analyzed taxa (De Bie et al. 2006) and thus an increase in the number of families estimated to have experienced losses. Also, such a model is limited because it simplifies the nature of gene gain and loss that occurs in nature. For example, it does not take into account more complex gene gain or loss mechanisms such as horizontal gene transfer between two phylogenetically distinct taxa (Ames et al. 2012; Librado et al. 2012). Of the Colletotrichum species studied, only C. higginsianum showed a greater number of gene families experiencing expansions compared with contractions. The C. higginsianum genome assembly is fragmented relative to those of the other analyzed Colletotrichum species, consisting of more than 10,000 scaffolds of greater than 1 kb in length. Thus, it is possible that some gene families may be artificially inflated in number if two or more partial gene models corresponding to the same transcript are split onto different scaffolds. For the same reason, synteny and conservation of potential secondary metabolite biosynthesis clusters in C. higginsianum relative to those of other Colletotrichum species could not be accurately assessed.
An important feature that has emerged from the comparative analyses has been that the CAzyme complement of closely related fungi within the genus was shown to differ based on pathogen lifestyle rather than according to their phylogenetic relationship. It has previously been shown that the dicot-infecting C. higginsianum encodes more pectin-degrading enzymes compared with the monocot-adapted pathogen C. graminicola (O’Connell et al. 2012). In keeping with the hypothesis that the types and numbers of fungal CAzyme proteins are influenced by their host ranges, C. incanum, which can infect both Arabidopsis and the monocot lily, and C. higginsianum, which infects Arabidopsis, show CAzyme profiles that are more similar to that of other dicot-infecting Colletotrichum species, despite being more closely related to the monocot-specific graminicola clade members. Interestingly, it was also observed that members of the gloeosporioides clade showed more similarities in terms of their plant cell wall-degrading CAzymes to that of C. fioriniae, rather than to more closely related species such as C. orbiculare. Lineage-specific expansions especially in GH43 were noted in both C. fioriniae and C. fructicola indicating that expansions are likely to have occurred after the two lineages diverged and may have occurred independently within the genus. Together, this genus-wide analysis indicates that gene loss and gene gain are important mechanisms for Colletotrichum to tailor their genes according to their specific pathogenic lifestyles. Intriguingly, both C. fructicola and C. fioriniae also showed a similar clustering when grouped according to the number and types of predicted secreted protease-encoding genes, especially with expansions in S10 proteases. Members from both the gloeosporioides and the acutatum clades are noted for broad host ranges and their ability to infect fruit, causing fruit rot on a variety of plants. 
Indeed, until molecular methods were developed, they were difficult to distinguish by traditional taxonomical methods and strains from one group were often confused for the other (Wharton and Diéguez-Uribeondo 2004). The genetic similarities observed indicate that similar molecular mechanisms may underlie the phenotypic similarities observed between the two clades of fungi despite their phylogenetic separation. In addition, in our analysis of secreted proteins, it was observed that groups of less conserved protein-encoding genes were associated with higher average cysteine contents and lack of PFAM domain assignments (fig. 5). This is interesting given that hallmarks of effector proteins that are known to contribute to infection in other plant pathogenic fungi include lack of known protein domains and higher cysteine contents, which may be important for structural stabilization of these proteins (Stergiopoulos and de Wit 2009; Sperschneider et al. 2015). Because some effectors are also recognized by host immune components that differ from host to host, the reduced conservation of these proteins among different members of Colletotrichum may be due to effector diversification or loss to avoid recognition by different hosts. In addition, specific gene families associated with transmembrane transport were also found to be reduced among graminicola clade members relative to that of other Colletotrichum species. Among transporter families with reduced numbers were those encoding myo-inositol transporters. Interestingly, there is evidence that the presence of exogenous myo-inositol can differentially affect monocot plants, such as perennial ryegrass, and the dicot plant Arabidopsis (Zhang et al. 2013). Differences include lignin and starch accumulation in the presence of exogenous myo-inositol and the reduction of defense responses in its absence in monocots, which were not detected in Arabidopsis (Zhang et al. 2013). 
It is possible that the reduced number of myo-inositol transporters in monocot-specific Colletotrichum may have occurred during host specialization although this has yet to be explored. Another important mechanism of adaptation to new ecological niches and lifestyles has been functional mutation of genes. A recent study analyzing genome sequence evolution among eight different C. graminicola strains indicated that noncoding and coding sequences associated with effectors as well as genes upregulated during infection were shown to have higher levels of polymorphisms (Rech et al. 2014). In this study, we searched for signatures of positive selection in specific lineages with the hypothesis that genes under positive selection would be rapidly evolving in response to different ecological niches. Interestingly, even among highly conserved proteins, it was noted that genes predicted to encode secreted proteins were enriched for lineage-specific positively selected genes. In addition, it was found that in C. orbiculare, C. fioriniae, and C. fructicola, genes encoding proteins with potential nuclear localization sequences were also enriched for positively selected proteins to a similar degree as the secreted proteins. Conceivably, the diversification of transcription regulators may also be important in adaptation to different lifestyles.

Supplementary Material

Supplementary tables S1–S7 and figures S1–S8 are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).
References

1.  Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes.

Authors:  A Krogh; B Larsson; G von Heijne; E L Sonnhammer
Journal:  J Mol Biol       Date:  2001-01-19       Impact factor: 5.469

2.  BadiRate: estimating family turnover rates by likelihood-based methods.

Authors:  P Librado; F G Vieira; J Rozas
Journal:  Bioinformatics       Date:  2011-11-10       Impact factor: 6.937

3.  Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3.

Authors:  Mira V Han; Gregg W C Thomas; Jose Lugo-Martinez; Matthew W Hahn
Journal:  Mol Biol Evol       Date:  2013-05-24       Impact factor: 16.240

4.  FastTree 2--approximately maximum-likelihood trees for large alignments.

Authors:  Morgan N Price; Paramvir S Dehal; Adam P Arkin
Journal:  PLoS One       Date:  2010-03-10       Impact factor: 3.240

5.  Estimating the Phanerozoic history of the Ascomycota lineages: combining fossil and molecular data.

Authors:  Christina Beimforde; Kathrin Feldberg; Stephan Nylinder; Jouko Rikkinen; Hanna Tuovila; Heinrich Dörfelt; Matthias Gube; Daniel J Jackson; Joachim Reitner; Leyla J Seyfullah; Alexander R Schmidt
Journal:  Mol Phylogenet Evol       Date:  2014-04-30       Impact factor: 4.286

6.  Determining the evolutionary history of gene families.

Authors:  Ryan M Ames; Daniel Money; Vikramsinh P Ghatge; Simon Whelan; Simon C Lovell
Journal:  Bioinformatics       Date:  2011-10-28       Impact factor: 6.937

7.  Enhanced Agrobacterium-mediated transformation efficiencies in monocot cells is associated with attenuated defense responses.

Authors:  Wan-Jun Zhang; Ralph E Dewey; Wendy Boss; Brian Q Phillippy; Rongda Qu
Journal:  Plant Mol Biol       Date:  2012-12-15       Impact factor: 4.076

8.  Lifestyle transitions in plant pathogenic Colletotrichum fungi deciphered by genome and transcriptome analyses.

Authors:  Richard J O'Connell; Michael R Thon; Stéphane Hacquard; Stefan G Amyotte; Jochen Kleemann; Maria F Torres; Ulrike Damm; Ester A Buiate; Lynn Epstein; Noam Alkan; Janine Altmüller; Lucia Alvarado-Balderrama; Christopher A Bauser; Christian Becker; Bruce W Birren; Zehua Chen; Jaeyoung Choi; Jo Anne Crouch; Jonathan P Duvick; Mark A Farman; Pamela Gan; David Heiman; Bernard Henrissat; Richard J Howard; Mehdi Kabbage; Christian Koch; Barbara Kracher; Yasuyuki Kubo; Audrey D Law; Marc-Henri Lebrun; Yong-Hwan Lee; Itay Miyara; Neil Moore; Ulla Neumann; Karl Nordström; Daniel G Panaccione; Ralph Panstruga; Michael Place; Robert H Proctor; Dov Prusky; Gabriel Rech; Richard Reinhardt; Jeffrey A Rollins; Steve Rounsley; Christopher L Schardl; David C Schwartz; Narmada Shenoy; Ken Shirasu; Usha R Sikhakolli; Kurt Stüber; Serenella A Sukno; James A Sweigard; Yoshitaka Takano; Hiroyuki Takahara; Frances Trail; H Charlotte van der Does; Lars M Voll; Isa Will; Sarah Young; Qiandong Zeng; Jingze Zhang; Shiguo Zhou; Martin B Dickman; Paul Schulze-Lefert; Emiel Ver Loren van Themaat; Li-Jun Ma; Lisa J Vaillancourt
Journal:  Nat Genet       Date:  2012-08-12       Impact factor: 38.330

9.  trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses.

Authors:  Salvador Capella-Gutiérrez; José M Silla-Martínez; Toni Gabaldón
Journal:  Bioinformatics       Date:  2009-06-08       Impact factor: 6.937

10.  SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler.

Authors:  Ruibang Luo; Binghang Liu; Yinlong Xie; Zhenyu Li; Weihua Huang; Jianying Yuan; Guangzhu He; Yanxiang Chen; Qi Pan; Yunjie Liu; Jingbo Tang; Gengxiong Wu; Hao Zhang; Yujian Shi; Yong Liu; Chang Yu; Bo Wang; Yao Lu; Changlei Han; David W Cheung; Siu-Ming Yiu; Shaoliang Peng; Zhu Xiaoqian; Guangming Liu; Xiangke Liao; Yingrui Li; Huanming Yang; Jian Wang; Tak-Wah Lam; Jun Wang
Journal:  Gigascience       Date:  2012-12-27       Impact factor: 6.524
