Jana Sperschneider, Donald M Gardiner, Jennifer M Taylor, James K Hane, Karam B Singh, John M Manners.
Abstract
BACKGROUND: Fungal pathogens cause devastating losses in economically important cereal crops by utilising pathogen proteins to infect host plants. Secreted pathogen proteins are referred to as effectors and have thus far been identified by selecting small, cysteine-rich peptides from the secretome, despite increasing evidence that not all effectors share these attributes.
Year: 2013 PMID: 24252298 PMCID: PMC3914424 DOI: 10.1186/1471-2164-14-807
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. The comparative pipeline for predicting candidate virulence genes and effectors. Our comparative analysis pipeline predicts proteins in a query pathogen genome that have HMM sequence similarity hits predominantly in fungal pathogens. For a protein to be associated with pathogenicity, at least 80% of its phmmer hits must be to proteins from pathogen species (including the query genome itself). These pathogen-associated proteins are then analyzed according to two criteria: (1) their degree of conservation across other fungal pathogens with and without a cereal host and (2) their potential to act as effectors outside the fungal cell. Proteins that our F-measure ranking identifies as highly conserved across a diverse range of other pathogens are prime candidates for virulence-related proteins. To identify putative effector candidates in an unbiased way, an unsupervised clustering technique based on 35 sequence-derived protein features is used to search for protein clusters enriched for a secretion signal. These clusters are then investigated further with regard to predicted sequence motifs, novel secretion signals, amino acid composition and functional annotation to find and characterise novel putative effector families.
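The 80% pathogen-hit rule can be illustrated with a short Python sketch. It assumes each phmmer hit has already been mapped to a species name and that a set of pathogen species is available; the function name `is_pathogen_associated` and the toy species lists are hypothetical, and counting individual hits (rather than distinct genomes) is an assumption about how the rule is applied.

```python
from typing import Iterable, Set


def is_pathogen_associated(hit_species: Iterable[str],
                           pathogen_species: Set[str],
                           threshold: float = 0.8) -> bool:
    """Return True if at least `threshold` of the phmmer hits are to pathogen species.

    `hit_species` holds one species name per phmmer hit, including hits to the
    query genome itself. Counting per hit is an assumption of this sketch.
    """
    hits = list(hit_species)
    if not hits:
        return False  # no hits: nothing to associate with pathogenicity
    pathogen_hits = sum(1 for species in hits if species in pathogen_species)
    return pathogen_hits / len(hits) >= threshold


if __name__ == "__main__":
    pathogens = {"F. graminearum", "S. nodorum", "P. tritici-repentis"}
    hits = ["F. graminearum", "S. nodorum", "P. tritici-repentis", "S. nodorum", "N. crassa"]
    print(is_pathogen_associated(hits, pathogens))  # True (4 of 5 hits = 80%)
```

Only proteins passing this filter would go on to the conservation ranking and the clustering step.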
Figure 2. A visualization of the F-measure for finding highly conserved virulence genes in pathogens with and without a cereal host. The set of 174 fungal proteomes from the JGI MycoCosm tree [33] is shown with the subsets of pathogenic fungi and cereal-infecting fungi. The genera of cereal-infecting fungi are colored according to fungal lifestyle (biotrophs, hemibiotrophs, necrotrophs). The F-measure is the harmonic mean of precision and recall and ranges from 0 to 1. For a query pathogen protein, an F-measure of 0 means that the protein is found only in the query pathogen genome. A perfect pathogen F-measure of 1 means that the protein is found in all pathogen genomes and, at the same time, only in pathogenic fungi, indicating a protein highly relevant to pathogenicity. Intermediate values correspond to proteins found mostly in pathogens but also in some non-pathogens. We also introduce the more specific cereal F-measure, which identifies proteins that are highly conserved in and exclusive to cereal-infecting fungi, to reveal host-specific infection patterns.
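The caption defines the F-measure only in words. As a minimal sketch, assume precision is the fraction of genomes with a hit (query genome excluded) that belong to the target group, and recall is the fraction of target-group genomes (query excluded) that have a hit; these definitions are an assumption, chosen because they reproduce both endpoints described above. Passing the set of cereal-infecting genomes as the target group gives a cereal F-measure under the same assumption. The function and genome names are hypothetical.

```python
from typing import Set


def f_measure(genomes_with_hit: Set[str], target_group: Set[str], query: str) -> float:
    """Harmonic mean of precision and recall for one query protein.

    precision: fraction of genomes with a hit that belong to the target group
    recall:    fraction of target-group genomes that have a hit
    The query genome is excluded from both counts (assumption of this sketch).
    """
    hits = genomes_with_hit - {query}
    group = target_group - {query}
    true_pos = len(hits & group)
    if true_pos == 0:
        return 0.0  # covers "found only in the query genome"
    precision = true_pos / len(hits)
    recall = true_pos / len(group)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    pathogens = {"Q", "P1", "P2", "P3", "P4"}  # "Q" is the query genome
    print(round(f_measure({"Q", "P1", "P2", "N1"}, pathogens, "Q"), 3))  # 0.571
    print(f_measure({"Q", "P1", "P2", "P3", "P4"}, pathogens, "Q"))      # 1.0
```

In the first call, two of three non-query hits are pathogens (precision 2/3) and two of four non-query pathogens are hit (recall 1/2), giving 4/7.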
Figure 3. HMM search returns ToxA-like proteins in isolates C4 and C5. A phmmer search with SnToxA (SNOG_16571) returns sequence similarity hits to Pyrenophora tritici-repentis (PTRG_04889) and to potential ToxA-like domains in two proteins from Cochliobolus heterostrophus isolates C4 and C5. The corresponding multiple alignment of the phmmer domain hits is visualized in Jalview [39].
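A phmmer search like the one in Figure 3 can write a per-target summary table with HMMER's `--tblout` option (for example `phmmer --tblout hits.tbl query.fasta targets.fasta`; the file names are hypothetical). The sketch below reads such a table and keeps targets below an E-value cutoff; the cutoff of 1e-5 and the function name are illustrative assumptions, not the settings used in the study.

```python
import io
from typing import List, TextIO, Tuple


def parse_tblout(handle: TextIO, max_evalue: float = 1e-5) -> List[Tuple[str, float]]:
    """Return (target name, full-sequence E-value) pairs below `max_evalue`.

    In HMMER's --tblout format, field 1 is the target name and field 5 is the
    full-sequence E-value; lines starting with '#' are comments.
    """
    hits = []
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue < max_evalue:
            hits.append((target, evalue))
    return hits


if __name__ == "__main__":
    # Toy table with made-up values, not real search results.
    demo = io.StringIO(
        "# target name  accession  query name  accession  E-value ...\n"
        "targetA - query - 1.2e-50 170.0 0.1\n"
        "targetB - query - 0.5 10.0 0.0\n"
    )
    print(parse_tblout(demo))  # [('targetA', 1.2e-50)]
```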
The top 10 pathogen-associated proteins in terms of cereal and pathogen F-measure
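| Protein | F-measure | Pfam domain (E-value, accession) | Phyre2 structure prediction | Molecular weight (kDa) | Signal peptide (SignalP) | Taxonomic distribution of phmmer hits in NR |
|---|---|---|---|---|---|---|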
| FGSG_03333 | 0.71 | Haloacid dehalogenase-like hydrolase (8e-10, PF12710) | Phosphorylcholine phosphatase (100%/90%) | 41.1 | Yes | 93% Bacteria, 7% Ascomycota |
| FGSG_03338 | 0.7 | – | Serine protease inhibitor 1 (98.4%/52%) | 14.4 | No | 100% Ascomycota |
| FGSG_09148 | 0.69 | Peptidase C69 (3.1e-25, PF03577) | Acyl-coenzyme (99.2%/44%) | 57.2 | Yes | 90% Bacteria, 9% Eukaryota, 1% Archaea |
| FGSG_03339 | 0.66 | – | Serine protease inhibitor 1 (99.2%/55%) | 14.7 | No | 100% Ascomycota |
| FGSG_09328 | 0.66 | Fungal specific transcription factor domain (3.5e-05, PF04082) | Centromere DNA-binding protein complex cbf3 (98.2%/72%) | 62.6 | No | 94% Ascomycota, 6% Basidiomycota |
| FGSG_04015 | 0.66 | – | – | 57.4 | No | 100% Ascomycota |
| FGSG_03861 | 0.64 | DUF3425 (1.8e-14, PF11095) | Pyrimidine pathway regulator 1 (99.4%/15%) | 62.9 | No | 94% Ascomycota, 5% Basidiomycota, 1% others |
| FGSG_04507 | 0.63 | C2 domain (1.1e-10, PF00168) | Endocytosis, exocytosis, synaptotagmin-1 (100%/54%) | 52.1 | No | 41% Metazoan, 29% Viridiplantae, 22% Ascomycota, 8% others |
| FGSG_07909 | 0.62 | Homeobox KN domain (2.3e-15, PF05920) | Homeobox domain (99.7%/11%) | 84.6 | Yes | 56% Metazoan, 29% Viridiplantae, 12% Ascomycota, 3% others |
| FGSG_07846 | 0.61 | FMO-like (5.8e-16, PF00734) | Monooxygenase (100%/75%) | 62.6 | No | 37% Bacteria, 32% Metazoan, 15% Viridiplantae, 12% Ascomycota, 4% others |
| | | | | | | |
| FGSG_04060 | 0.65 | Rare lipoprotein A like double-psi beta barrel (3.1e-05, PF03330) | Beta-expansin 1a (100%/94%) | 22.2 | Yes | 48% Bacteria, 16% Ascomycota, 13% Basidiomycota, 8% Phytophthora, 5% Dictyostelium, 10% others |
| FGSG_09841 | 0.65 | – | – | 20.8 | Yes | 76% Ascomycota, 15% Bacteria, 6% Archaea, 3% others |
| FGSG_11496 | 0.64 | Rare lipoprotein A like double-psi beta barrel (3.1e-05, PF03330) | Beta-expansin 1a (100%/93%) | 25.2 | Yes | 23% Viridiplantae, 21% Bacteria, 18% Ascomycota, 13% Basidiomycota, 11% Phytophthora, 6% Dictyostelium, 8% others |
| FGSG_03333 | 0.58 | Haloacid dehalogenase-like hydrolase (8e-10, PF12710) | Phosphorylcholine phosphatase (100%/90%) | 41.1 | Yes | 93% Bacteria, 7% Ascomycota |
| FGSG_09148 | 0.55 | Peptidase C69 (3.1e-25, PF03577) | Acyl-coenzyme (99.2%/44%) | 57.2 | Yes | 90% Bacteria, 9% Eukaryota, 1% Archaea |
| FGSG_04507 | 0.52 | C2 domain (1.1e-10, PF00168) | Endocytosis, exocytosis, synaptotagmin-1 (100%/54%) | 52.1 | No | 41% Metazoan, 29% Viridiplantae, 22% Ascomycota, 8% others |
| FGSG_03549 | 0.52 | – | – | 28.2 | No | 100% Ascomycota |
| FGSG_11152 | 0.49 | – | Coronatine-insensitive protein 1 (99.9%/94%) | 44.4 | No | 100% Ascomycota |
| FGSG_09328 | 0.48 | Fungal specific transcription factor domain (3.5e-05, PF04082) | Centromere DNA-binding protein complex cbf3 (98.2%/72%) | 62.6 | No | 94% Ascomycota, 6% Basidiomycota |
| FGSG_07909 | 0.48 | Homeobox KN domain (2.3e-15, PF05920) | Homeobox domain (99.7%/11%) | 84.6 | Yes | 56% Metazoan, 29% Viridiplantae, 12% Ascomycota, 3% others |
For each protein, the Pfam annotation, Phyre2 structure prediction, molecular weight and SignalP signal peptide prediction are given, together with the taxonomic distribution of significant phmmer hits against the non-redundant protein database (NR).
Figure 4. FGSG_03333 encodes a putative phosphorylcholine phosphatase that shares a selection process with bacterial plant pathogens. For protein FGSG_03333, a BLASTp search returns 231 hits with E-value < 1.0e-05; these were merged with our results from the MycoCosm tree. A MUSCLE alignment [48] followed by maximum likelihood phylogenetic tree estimation with PhyML [49] yields the phylogenetic tree shown here with branch support values, visualized in FigTree [50]. The tree indicates that this protein has been selectively retained in hemibiotrophic and necrotrophic fungal pathogens of cereals, plants and insects. Furthermore, this protein is closely related to a highly conserved protein in a large number of pseudomonads, indicating a shared selection process and suggesting a common role in pathogenicity.
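The E-value < 1.0e-05 filter on the BLASTp hits can be sketched as follows, assuming BLAST+ tabular output (`-outfmt 6` with its default 12 columns, where the E-value is the 11th column). Collapsing several HSPs of the same subject into a single hit is an assumption of this sketch, and `significant_subjects` is a hypothetical name.

```python
import csv
import io
from typing import List, TextIO

EVALUE_COL = 10  # 0-based index of "evalue" in the default BLAST+ -outfmt 6 columns


def significant_subjects(blast_tab: TextIO, max_evalue: float = 1.0e-5) -> List[str]:
    """Unique subject IDs (in file order) with at least one HSP below `max_evalue`."""
    seen = set()
    subjects = []
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        subject = row[1]
        if float(row[EVALUE_COL]) < max_evalue and subject not in seen:
            seen.add(subject)
            subjects.append(subject)
    return subjects


if __name__ == "__main__":
    # Toy rows with made-up values: two HSPs for subjA, one weak hit for subjB.
    demo = io.StringIO(
        "query1\tsubjA\t45.0\t200\t100\t2\t1\t200\t1\t200\t1e-30\t300\n"
        "query1\tsubjA\t40.0\t50\t30\t0\t210\t260\t5\t55\t2e-06\t60\n"
        "query1\tsubjB\t30.0\t100\t70\t1\t1\t100\t1\t100\t0.001\t40\n"
    )
    print(significant_subjects(demo))  # ['subjA']
```

The retained subject IDs would then be fetched and merged with the MycoCosm hits before alignment.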
Selected properties for the clusters of pathogen-associated proteins
| ↑ | ↑ | ↑ | | | | | | | | | | |
| | | ↑ | ↑ | | | | | | | | | |
| | | ↓ | | ↓ | ↓ | ↓ | ↓ | ↑ | ↑ | | ↑ | |
| | ↑ | ↑ | | ↓ | | ↓ | ↓ | | ↑ | ↑ | ↓ | |
| | ↑ | ↑ | | ↓ | ↑ | | ↓ | | | ↑ | ↓ | |
| ↑ | | | | ↑ | ↑ | | ↓ | ↓ | | ↓ | | |
| ↑ | | ↓ | | | ↓ | ↓ | ↑ | ↓ | | | ↑ | |
| ↓ | ↓ | | | ↓ | | ↑ | | ↑ | | ↑ | | |
| ↓ | ↓ | ↓ | | | | ↑ | ↑ | ↑ | | | | |
| ↓ | ↓ | ↓ | | | ↓ | | | ↑ | ↑ | | | |
| ↓ | | | | | | ↑ | | | ↓ | | | |
| | | | | ↓ | | | | | ↑ | | | |
| | | ↑ | | | | | | | | | | |
| ↑ | | ↓ | | ↑ | | | | ↓ | | | | |
| | ↑ | | | | | ↓ | | | | | | |
| | ↑ | | | | | | | | | | | |
| ↑ | | | | | | | | | | | | |
For each feature in the 35-dimensional feature vector, Mann–Whitney U tests were used to test whether the distribution within each cluster is identical to the background distribution across all clusters; only highly significant results (p-value < 2.2e-16) are shown, for both directions (↓, lower than background; ↑, greater than background). Secretion refers to the predicted SignalP score and the WoLF PSORT extracellular score. The following amino acid groups are used: tiny (A, C, G, S, T), small (A, C, D, G, N, P, S, T, V), aliphatic (A, I, L, V), aromatic (F, H, W, Y), polar (D, E, H, K, N, Q, R, S, T), charged (D, E, H, K, R), basic (H, K, R) and acidic (D, E).
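The amino acid groups listed above can be turned into per-protein composition features. This Python sketch computes, for one sequence, the fraction of residues falling in each group; whether the study's 35 features use exactly these fractions is an assumption, and `class_composition` is a hypothetical name.

```python
from typing import Dict, FrozenSet

# Group memberships as listed in the table caption.
AA_CLASSES: Dict[str, FrozenSet[str]] = {
    "tiny": frozenset("ACGST"),
    "small": frozenset("ACDGNPSTV"),
    "aliphatic": frozenset("AILV"),
    "aromatic": frozenset("FHWY"),
    "polar": frozenset("DEHKNQRST"),
    "charged": frozenset("DEHKR"),
    "basic": frozenset("HKR"),
    "acidic": frozenset("DE"),
}


def class_composition(seq: str) -> Dict[str, float]:
    """Fraction of residues in `seq` that belong to each amino acid group."""
    seq = seq.upper()
    if not seq:
        return {name: 0.0 for name in AA_CLASSES}
    return {
        name: sum(aa in members for aa in seq) / len(seq)
        for name, members in AA_CLASSES.items()
    }


if __name__ == "__main__":
    print(class_composition("ACDE")["small"])  # 0.75 (A, C and D are small)
```

Per-protein values like these could then be compared between a cluster and the background with a Mann–Whitney U test, as described in the caption.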
Figure 5. A highly conserved N-terminal [SG]-P-C-[KR]-P motif in a subset of proteins. MEME returns a [SG]-P-C-[KR]-P motif (E-value 3.4e-91) adjacent to the predicted signal peptide cleavage site for 37 of the 110 proteins in cluster C8. A MUSCLE alignment of the N-terminal region of a core set of 31 well-aligned proteins from this subset is visualized in Jalview. The MEME logos show the [SG]-P-C-[KR]-P motif followed by a serine/threonine-rich stretch. An iterative search of public databases with these aligned sequences (jackhmmer) returned additional members of this putative protein family in F. graminearum, F. pseudograminearum, F. oxysporum and F. solani.
Figure 6. A protein family with a conserved N-terminal [WYF]-C-x-T-Y-x-S-T-Y-L motif in F. graminearum. MEME returns a [WYF]-C-x-T-Y-x-S-T-Y-L motif (E-value 1.2e-28) following the predicted signal peptide cleavage site for seven of the proteins in cluster C8. A MUSCLE alignment of the seven truncated sequences from cluster C8 is visualized in Jalview.
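The motifs in Figures 5 and 6 are written in a PROSITE-like notation: `-` separates positions, `[..]` lists allowed residues and `x` matches any residue. A minimal sketch that converts this subset of the notation into a Python regular expression and reports match start positions might look like this. It deliberately rejects other PROSITE features such as `{..}` exclusions or `x(n)` repeats, and the function names are hypothetical.

```python
import re
from typing import List


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-like pattern such as '[SG]-P-C-[KR]-P' into a regex.

    Supports single residues, [..] alternatives and 'x' (any residue) only.
    """
    parts = []
    for element in pattern.split("-"):
        element = element.strip()
        if element.lower() == "x":
            parts.append(".")
        elif element.startswith("[") and element.endswith("]"):
            parts.append(element)
        elif len(element) == 1 and element.isalpha():
            parts.append(element)
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")
    return "".join(parts)


def find_motif(pattern: str, seq: str) -> List[int]:
    """0-based start positions of non-overlapping motif matches in `seq`."""
    regex = re.compile(prosite_to_regex(pattern))
    return [m.start() for m in regex.finditer(seq)]


if __name__ == "__main__":
    print(prosite_to_regex("[WYF]-C-x-T-Y-x-S-T-Y-L"))      # [WYF]C.TY.STYL
    print(find_motif("[SG]-P-C-[KR]-P", "MKLAAGPCKPSS"))    # [5] (toy sequence)
```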
Figure 7. Distribution of predicted protein clusters across the genome. A Circos plot [62] visualizes the four F. graminearum chromosomes with ticks in kilobase-pair (Kbp) units. The following bands are shown for F. graminearum: (I) recombination frequency (blue bars) and SNP density (line), (II) gene density, (III) GC content, (IV) the set of 2830 pathogen-associated proteins, (V) protein cluster C7, (VI) protein cluster C8 and (VII) proteins with the [SG]-P-C-[KR]-P motif. For each gene set, the locations of the genes on the chromosomes are shown together with a red-shaded heatmap of gene counts in 500 Kbp bins. Proteins from the putative effector clusters C7 and C8 occur predominantly in regions of genome innovation.
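The 500 Kbp heatmap bands can be sketched by binning gene start coordinates per chromosome. This Python example assumes 0-based start positions and assigns each gene to a bin by its start; the chromosome names are hypothetical. The output lines follow the general chromosome, start, end, value layout of Circos data files, but the chromosome IDs would have to match whatever karyotype file is used, and only non-empty bins are written.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

BIN_SIZE = 500_000  # 500 Kbp bins, matching the heatmap bands described in Figure 7


def bin_gene_counts(genes: Iterable[Tuple[str, int]],
                    bin_size: int = BIN_SIZE) -> Dict[Tuple[str, int], int]:
    """Count genes per (chromosome, bin start), assigning each gene by its start coordinate."""
    counts: Counter = Counter()
    for chrom, start in genes:
        counts[(chrom, (start // bin_size) * bin_size)] += 1
    return dict(counts)


def to_circos_lines(counts: Dict[Tuple[str, int], int],
                    bin_size: int = BIN_SIZE) -> List[str]:
    """Format non-empty bins as 'chromosome start end value' lines."""
    return [f"{chrom} {start} {start + bin_size - 1} {n}"
            for (chrom, start), n in sorted(counts.items())]


if __name__ == "__main__":
    genes = [("chr1", 10), ("chr1", 499_999), ("chr1", 500_000), ("chr2", 1_200_000)]
    for line in to_circos_lines(bin_gene_counts(genes)):
        print(line)
    # chr1 0 499999 2
    # chr1 500000 999999 1
    # chr2 1000000 1499999 1
```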
N-terminal motif search results for the pathogen-associated protein clusters in F. graminearum
| Cluster | Motif | Sites | E-value | | Notes |
|---|---|---|---|---|---|
| C4 | Q[QS]QQQQQ[QH]QQ[QDM][QAP][QP]Q[QH]QQ[QM][QN]Q[QS]QQ[QL][QH]Q[QAP]QQ[QA][QAP][QM][QL][QP]Q[QP][QH][QP][QL][QH]Q | 10 | 3.5e-31 | 143 | Domain seems widespread throughout the eukaryotic kingdom. |
| | AC[DE]RC[RK]R[LR]K[IT][KR]C[DS] | 13 | 8.0e-24 | 56 | Domain seems restricted to the fungal kingdom, with a large number of hits, predominantly to the Ascomycota. Some protein hits are annotated as fungal transcriptional regulatory proteins or GTP-binding proteins. |
| | CSRCV[KR]xKLDC[DE]Y | 6 | 5.1e-03 | 48 | Domain seems restricted to the fungal kingdom, with a large number of hits, predominantly to the Ascomycota. Some protein hits are annotated with a fungal transcriptional regulatory function. Weak Pfam domain annotation as a zinc cluster. |
| C5 | MCVxV[HT]KDITC[PS]TC | 6 | 9.2e-03 | 4 | Domain seems widespread throughout the eukaryotic kingdom. There are weak similarities to a zinc-finger protein domain. |
| C8 | [AL][LA]Axx[AV]xA[GS]PC[KR]PSS | 37 | 9.3e-91 | 15 | Motif is adjacent to the predicted signal peptide cleavage site. Domain is restricted to pathogenic |
| | WCITY[LE]STYL[VA][PA][VI]SN | 8 | 1.6e-29 | 47 | Search converged after three iterations. Domain is restricted to |
| C12 | FH[PL]F[SL]RLPPE[LIV]RL[MQ]I[WY]RHALT | 11 | 2.4e-16 | 11 | Domain is restricted to the fungal kingdom; hits are predominantly to the Ascomycota, including a large number of pathogens, e.g. |
To identify potential novel signalling, subcellular targeting or uptake motifs, we report motifs in F. graminearum that are distinct from conventional signal peptides, are conserved in more than five sequences and occur within the first 150 amino acids.
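The two reporting criteria (conserved in more than five sequences, located within the first 150 amino acids) can be expressed as a small filter. This sketch assumes a motif counts only if the whole match lies within the first 150 residues, takes the motif as a regular expression, and does not implement the separate check that a motif is distinct from conventional signal peptides. The function names and toy sequences are hypothetical.

```python
import re
from typing import Dict, List


def nterm_hits(regex: str, sequences: Dict[str, str], window: int = 150) -> List[str]:
    """Names of sequences whose motif match lies entirely within the first `window` residues."""
    pattern = re.compile(regex)
    return [name for name, seq in sequences.items() if pattern.search(seq[:window])]


def keep_motif(regex: str, sequences: Dict[str, str],
               window: int = 150, min_count: int = 5) -> bool:
    """True if the motif occurs N-terminally in more than `min_count` sequences."""
    return len(nterm_hits(regex, sequences, window)) > min_count


if __name__ == "__main__":
    toy = {f"s{i}": "M" + "A" * 19 + "GPCKP" + "S" * 30 for i in range(6)}
    print(keep_motif("[SG]PC[KR]P", toy))  # True (6 sequences > 5)
```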
The set of 19 fungal pathogen genomes from the MycoCosm tree that have a cereal host, as used in our comparative study
| Species | Host | Lifestyle | Reference |
|---|---|---|---|
| | Wheat, barley, maize | Hemibiotroph | [ |
| | Wheat, barley, maize | Hemibiotroph | [ |
| | Rice, barley | Hemibiotroph | [ |
| | Wheat | Biotroph | [ |
| | Maize | Biotroph | [ |
| | Maize | Necrotroph | - |
| | Maize | Necrotroph | - |
| | Maize | Necrotroph | [ |
| | Maize | Necrotroph | [ |
| | Sorghum | Necrotroph | - |
| | Rice | Necrotroph | - |
| | Cereal generalist | Hemibiotroph | [ |
| | Oat | Necrotroph | - |
| | Wheat | Hemibiotroph | [ |
| | Barley | Necrotroph | [ |
| | Wheat | Necrotroph | [ |
| | Maize | Hemibiotroph | [ |
| | Wheat | Necrotroph | [ |
| | Barley | Biotroph | [ |