Robert A Syme, Kar-Chun Tan, Kasia Rybak, Timothy L Friesen, Bruce A McDonald, Richard P Oliver, James K Hane.
Abstract
We report a fungal pan-genome study involving Parastagonospora spp., including 21 isolates of the wheat (Triticum aestivum) pathogen Parastagonospora nodorum, 10 of the grass-infecting Parastagonospora avenae, and 2 of a closely related undefined sister species. We observed substantial variation in the distribution of polymorphisms across the pan-genome, including repeat-induced point mutations, diversifying selection and gene gains and losses. We also discovered chromosome-scale inter and intraspecific presence/absence variation of some sequences, suggesting the occurrence of one or more accessory chromosomes or regions that may play a role in host-pathogen interactions. The presence of known pathogenicity effector loci SnToxA, SnTox1, and SnTox3 varied substantially among isolates. Three P. nodorum isolates lacked functional versions for all three loci, whereas three P. avenae isolates carried one or both of the SnTox1 and SnTox3 genes, indicating previously unrecognized potential for discovering additional effectors in the P. nodorum-wheat pathosystem. We utilized the pan-genomic comparative analysis to improve the prediction of pathogenicity effector candidates, recovering the three confirmed effectors among our top-ranked candidates. We propose applying this pan-genomic approach to identify the effector repertoire involved in other host-microbe interactions involving necrotrophic pathogens in the Pezizomycotina.
Year: 2018 PMID: 30184068 PMCID: PMC6152946 DOI: 10.1093/gbe/evy192
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Summary of Source Material and Resequencing Data for Strains of (A) P. nodorum, (B) P. avenae, and (C) the P2 clade
| Isolate ID | Source | Collection Year | Total Length of Error-corrected Reads (Mb) | Estimated Genome Coverage (X) | Isolation Source | Contributor |
|---|---|---|---|---|---|---|
| B2.1b | Iran | 2005 | 686.1 | 9.2 | | Mohammad Razavi, Iranian Research Institute of Plant Protection |
| C1.2a | | | 886.1 | 11.9 | | |
| IR10_9.1a | | 2010 | 528.4 | 7.1 | | |
| FIN-2 | Finland | unknown | 1,998.7 | 26.9 | | Andrea Ficke, Nibio, As, Norway |
| SWE-3 | Sweden | | 1,517.8 | 20.4 | | |
| Sn Cp2052 | | | 1,031.9 | 13.9 | | Hans Jorgensen, University of Copenhagen |
| BRSn9870 | Brazil | | 3,316.9 | 44.6 | | Flavio Santana, EMBRAPA |
| Sn99CH 1A7a | Switzerland | 1999 | 3,888.2 | 52.3 | | Bruce McDonald, ETH Zurich |
| SnChi01 40a | China | 2001 | 2,796.4 | 37.6 | | R. Wu, Louyuan, Fujian Province |
| SnSA95.103 | South Africa | 1994 | 1,019 | 13.7 | | Pedro Crous, University of Stellenbosch, South Africa |
| AR1-1 | Arkansas, USA | 2011 | 1,986.2 | 26.7 | | Christina Cowger, USDA-ARS |
| GA9-1 | Georgia, USA | | 4,296.5 | 57.8 | | |
| MD4-1 | Maryland, USA | 2012 | 4,325.8 | 58.2 | | |
| VA 5-2 | Virginia, USA | | 2,288.3 | 30.8 | | |
| OH03 Sn-1501 | Ohio, USA | 2003 | 2,229.8 | 30 | | Pat Lipps, Ohio State University |
| SNOV92X D1.3 | Texas, USA | 1992 | 2,524.6 | 34 | | Bruce McDonald, Texas A&M University |
| SnOre11-1 | Oregon, USA | 2011 | 4,763.4 | 64.1 | | – |
| WAC8410 | Australia | 2010 | 6,032.4 | 81.2 | | Department of Agriculture and Food, Western Australia |
| IR10_5.2b | Iran | 2010 | 850.5 | 11.4 | | Mohammad Razavi, Iranian Research Institute of Plant Protection |
| Hartney99 | Canada | 2005 | 1,132.5 | 15.2 | | – |
| Jansen #4_55 | | | 480.1 | 6.5 | | – |
| SN11IR_2_1.1 | Iran | 2011 | 1,415.3 | 19 | | Mohammad Razavi, Iranian Research Institute of Plant Protection |
| 82-4841 | North Dakota, USA | 1982 | 1,207.7 | 16.2 | Unknown grass | Joe Krupinsky, USDA-ARS |
| 83-6011-2 | | 1983 | 1,136.3 | 15.3 | | |
| SN11IR_6_1.1 | Iran | 2011 | 2,150.2 | 28.9 | | Mohammad Razavi, Iranian Research Institute of Plant Protection |
| SN11IR_7_2.3 | Iran | 2011 | 1,039.4 | 14 | | |
| Mt. Baker | Washington, USA | 2009 | 585.8 | 7.9 | | – |
| s258 | Netherlands | 2005 | 1,163.2 | 15.6 | | – |
| A1 3.1a | Iran | 2005 | 1,813.1 | 24.4 | | Mohammad Razavi, Iranian Research Institute of Plant Protection |
| H6.2b | | 2005 | 3,030 | 40.8 | | |
Fig. 1.—A phylogeny of the P. nodorum, P. avenae, and P2 strains used in this study and the presence or absence of known effector loci: ToxA, Tox1, and Tox3. Green boxes indicate the presence of a known effector locus in that strain and a yellow box indicates the presence of a pseudogenized version.
Summary of Genome Assembly Metrics for Resequenced Strains of (A) P. nodorum, (B) P. avenae, and (C) the P2 clade
| Isolate ID | № Scaffolds | Largest Scaffold (kb) | Total Length (Mb) | N50 (kb) | Whole SN15 Gene Count by QUAST | Partial SN15 Gene Count by QUAST |
|---|---|---|---|---|---|---|
| B2.1b | 2,906 | 386.4 | 37.38 | 60.3 | 12,628 | 784 |
| C1.2a | 1,557 | 325.3 | 37.43 | 80.4 | 12,751 | 649 |
| IR10_9.1a | 3,673 | 140.6 | 37.28 | 20.7 | 11,442 | 1,920 |
| FIN-2 | 1,381 | 451.6 | 38.41 | 117.9 | 12,952 | 479 |
| SWE-3 | 1,714 | 335.5 | 37.85 | 58.8 | 12,734 | 727 |
| Sn Cp2052 | 3,026 | 190.2 | 37.32 | 38.2 | 12,310 | 1,148 |
| BRSn9870 | 4,911 | 309.9 | 41.23 | 52.4 | 12,342 | 1,148 |
| Sn99CH 1A7a | 853 | 521 | 37.9 | 180.7 | 13,062 | 361 |
| SnChi01 40a | 779 | 1,268.1 | 37.88 | 206.3 | 13,118 | 332 |
| SnSA95.103 | 11,772 | 48 | 49.94 | 6.5 | 9,545 | 3,882 |
| AR1-1 | 882 | 748.5 | 36.61 | 135.6 | 12,999 | 412 |
| GA9-1 | 664 | 938.6 | 36.53 | 215.3 | 13,074 | 356 |
| MD4-1 | 701 | 599.9 | 36.53 | 191 | 13,045 | 384 |
| VA 5-2 | 1,115 | 485.6 | 36.48 | 89.1 | 12,787 | 606 |
| OH03 Sn-1501 | 1,281 | 349.9 | 37.1 | 85 | 12,844 | 577 |
| SNOV92X D1.3 | 785 | 524.1 | 36.66 | 145.6 | 13,009 | 442 |
| SnOre11-1 | 748 | 875.4 | 37.42 | 249.6 | 13,103 | 352 |
| WAC8410 | 384 | 1,060.5 | 40.27 | 316.8 | 13,223 | 277 |
| IR10_5.2b | 1,681 | 267 | 35.51 | 57.3 | 18 | 14 |
| Hartney99 | 3,381 | 124.3 | 36.58 | 27.6 | 47 | 20 |
| Jansen 4_55 | 10,109 | 83.1 | 32.06 | 4.3 | 20 | 166 |
| SN11IR_2_1.1 | 5,762 | 168.2 | 41.54 | 38.6 | 12 | 15 |
| 82-4841 | 2,444 | 218.3 | 38.53 | 50.5 | 21 | 13 |
| 83-6011-2 | 2,367 | 193.8 | 37.52 | 43.9 | 21 | 13 |
| SN11IR_6_1.1 | 1,174 | 737.1 | 33.51 | 102.8 | 3 | 7 |
| SN11IR_7_2.3 | 2,215 | 244.5 | 33.6 | 44.8 | 6 | 12 |
| Mt. Baker | 8,309 | 38.5 | 34.14 | 6.2 | 15 | 81 |
| s258 | 4,090 | 183.1 | 39.49 | 32.8 | 16 | 21 |
| H6.2b | 1,613 | 411.9 | 38.68 | 74.6 | 2 | 6 |
| A1 3.1a | 1,764 | 419.9 | 39.05 | 68.3 | 3 | 6 |
Note.—P. avenae and P2 strains show a low number of P. nodorum gene matches from QUAST due to dissimilarities in the coding sequence relative to the P. nodorum SN15 reference.
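The N50 column above follows the usual assembly definition: sort the scaffold lengths from longest to shortest, accumulate them until at least half of the total assembly length is covered, and report the length of the scaffold that crosses that threshold. A minimal Python sketch of that calculation follows; the scaffold lengths are made up for illustration and are not taken from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest scaffold down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in kb (illustrative only).
example_scaffolds = [400, 300, 200, 100, 50, 50]
example_n50 = n50(example_scaffolds)
print(example_n50)
```

A higher N50 together with a lower scaffold count generally indicates a more contiguous assembly, which is why the two columns tend to move in opposite directions in the table.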
Fig. 3.—Presence and absence of protein orthologs between P. nodorum and P. avenae strains. The number of P. nodorum strains that have contributed a protein to the cluster determines the x-axis location and the number of P. avenae strains that have contributed a protein to the cluster determines the y-axis location. Core conserved genes with members from all strains are at the top-right and strain-specific genes are at the bottom-left. About 10,798 (53.8%) of clusters are missing from at most 2 strains. About 6,229 clusters (29.0%) are present in at most 3 strains. The three known P. nodorum effectors (red, blue, and green stars) are present in only some of the isolates.
Fig. 4.—Intergenic distances at the 3′ and 5′ ends for all genes of P. nodorum WAC8410. Hexagonal cells are colored to indicate the number of genes whose flanking intergenic distances place them within the cell’s bounds. Strain-specific genes (highlighted as hollow circles) generally have higher distances to the neighboring gene, placing them in the top-right quadrant. The mean intergenic flank distance is 6,558 bp for strain-specific genes and 1,162 bp for all other genes.
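The flank distances plotted in this figure can be derived from gene coordinates alone. The Python sketch below is one possible way to do it, not the authors' implementation: the `Gene` record and `flank_distances` function are hypothetical names, all genes are assumed to sit on a single scaffold, and the upstream (5′) and downstream (3′) sides are swapped for genes on the minus strand.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def flank_distances(genes):
    """Map gene_id -> (five_prime_bp, three_prime_bp).

    A flank with no neighbouring gene (scaffold edge) is None.
    Overlapping neighbours are reported as a distance of 0.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    result = {}
    for i, gene in enumerate(ordered):
        left = right = None
        if i > 0:
            left = max(0, gene.start - ordered[i - 1].end - 1)
        if i < len(ordered) - 1:
            right = max(0, ordered[i + 1].start - gene.end - 1)
        # On the minus strand the 5' end faces the right-hand neighbour.
        if gene.strand == "+":
            result[gene.gene_id] = (left, right)
        else:
            result[gene.gene_id] = (right, left)
    return result


# Hypothetical coordinates, for illustration only.
toy_genes = [
    Gene("g1", 100, 200, "+"),
    Gene("g2", 1201, 1500, "-"),
    Gene("g3", 9000, 9500, "+"),
]
toy_flanks = flank_distances(toy_genes)
print(toy_flanks)
```

Genes with large values on both flanks are the ones that would land in the top-right quadrant of the plot. The sketch ignores nested genes, where the nearest left neighbour by start position is not necessarily the one with the closest end.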
Summary of Protein Conservation Across the P. nodorum and P. avenae Strains
| Reference Protein Set | 13,836 proteins |
|---|---|
| Missing from 0 strains | 8,660 clusters |
| Missing from at most 1 strain | 9,921 clusters |
| Missing from at most 2 strains | 10,798 clusters |
| Missing from 0 | 11,366 clusters (11,821 SN15 proteins) |
| Missing from at most 1 | 12,473 clusters (12,877 SN15 proteins) |
| Missing from at most 2 | 13,049 clusters (13,139 SN15 proteins) |
| Observed in only 1 strain | 3,995 clusters (108 SN15 proteins) |
| Observed in at most 2 strains | 5,438 clusters (240 SN15 proteins) |
| Observed in at most 3 strains | 6,229 clusters (291 SN15 proteins) |
| Observed in between 4 and 30 strains (inclusive) | 6,620 clusters (321 SN15 proteins) |
| Present in all nodorum, absent from all | 216 |
| Present in all | 44 |
Note.—About 8,660 protein clusters are observed in all strains. There are 204 protein clusters containing members of only one species. The set of “dispensable” proteins is defined here as proteins that are not species-specific (observed in fewer than four isolates) and not well conserved (missing in fewer than three isolates). This “dispensable” set of 2,192 proteins contains 213 SN15 proteins, including all of the known effectors.
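The thresholds in the note can be expressed as a simple rule on the number of strains that contribute a protein to each cluster. The Python sketch below applies those thresholds to a toy data set; the category labels, the function name, and the strain identifiers are hypothetical, and the order in which the rules are checked is an assumption for cases where more than one could apply.

```python
def classify_cluster(strains_present, n_strains):
    """Classify one ortholog cluster by how many strains contain it."""
    n_present = len(set(strains_present))
    if not 0 < n_present <= n_strains:
        raise ValueError("strain count out of range")
    n_missing = n_strains - n_present
    if n_missing == 0:
        return "core"          # observed in every strain
    if n_missing < 3:
        return "conserved"     # missing in fewer than three isolates
    if n_present < 4:
        return "rare"          # observed in fewer than four isolates
    return "dispensable"       # neither well conserved nor rare


# Toy example with ten hypothetical strains s0..s9.
all_strains = [f"s{i}" for i in range(10)]
toy_clusters = {
    "c1": all_strains,
    "c2": all_strains[:9],
    "c3": all_strains[:2],
    "c4": all_strains[:5],
}
toy_classes = {
    name: classify_cluster(members, len(all_strains))
    for name, members in toy_clusters.items()
}
print(toy_classes)
```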
Fig. 5.—Alignment of genes surrounding the P. avenae Pat5 Tox3 homolog with the corresponding loci from the P. nodorum SN15 reference and a representative P. avenae Pat4 region. The Tox3 gene is completely absent from the Pat4 genome and present in SN15, but on a different scaffold from the one shown here.
Counts of the Numbers of SN15 Reference Proteins That Match Each Effector Prediction Criteria
| Criteria | № proteins |
|---|---|
| Small—less than 30 kDa | 4,362 |
| Cysteine-rich—encodes an amino acid sequence with >4% cysteine residues | 616 |
| Near repeats—less than 5 kb from repetitive sequence | 3,417 |
| Absent from SN79—no blast hits to the avirulent strain | 798 |
| Low gene density—encoded in a region with large intergenic space | 2,414 |
| Secreted—includes a signal peptide | 1,475 |
| Under positive selection | 945 |
| EffectorP (subset of secreted) | 288 |
| OcculterCut proximity to GC-AT border | 451 |
| Core Set—missing in at most one strain | 10,294 |
| Strain specific—only found in SN15 | 108 |
| Membrane bound—predicted to encode a transmembrane domain | 2,381 |
Note.—Each predicted protein is assessed against each of these criteria and assigned a total score calculated as the sum of the criteria scores (table 5).
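The scoring scheme described in the note, a per-protein sum over criteria, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' pipeline: every criterion is given a weight of 1 because the actual per-criterion scores are not listed here, the criterion names are hypothetical labels for rows of the table above, and the protein sequence is invented.

```python
# Assumption: unit weights; the real per-criterion scores are not given here.
UNIT_WEIGHTS = {
    "secreted": 1,
    "small": 1,
    "cysteine_rich": 1,
    "near_repeats": 1,
    "absent_from_sn79": 1,
    "low_gene_density": 1,
    "positive_selection": 1,
}


def cysteine_rich(protein_seq, threshold=0.04):
    """True if more than `threshold` of the residues are cysteine."""
    seq = protein_seq.upper().rstrip("*")  # drop a trailing stop symbol
    if not seq:
        return False
    return seq.count("C") / len(seq) > threshold


def candidate_score(flags, weights=UNIT_WEIGHTS):
    """Sum the weights of every criterion flagged True for one protein."""
    return sum(w for name, w in weights.items() if flags.get(name, False))


# Invented 20-residue sequence with 3 cysteines (15%).
toy_seq = "MKCCAAAAAACAAAAAAAAA"
toy_flags = {
    "secreted": True,
    "small": True,
    "cysteine_rich": cysteine_rich(toy_seq),
    "near_repeats": False,
}
toy_score = candidate_score(toy_flags)
print(toy_score)
```

Keeping the weights in a dictionary makes it straightforward to substitute different per-criterion scores, or negative scores for criteria that count against a candidate.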
Top Effector Candidates with Scores ≥5
| Locus | Predicted Secreted | Absent in SN79 | <1 Gene/2 kb | ≤30 kDa | Positive Selection | ≥4% Cys | AT-Rich Regions | Effector P Score ≥0.9 | Candidate Score ≥5 |
|---|---|---|---|---|---|---|---|---|---|
| | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | – | ✓ | 7 |
| | ✓ | – | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 7 |
| | ✓ | ✓ | ✓ | ✓ | – | ✓ | ✓ | ✓ | 7 |
| | ✓ | ✓ | ✓ | ✓ | – | ✓ | ✓ | ✓ | 7 |
| | ✓ | ✓ | ✓ | ✓ | – | ✓ | ✓ | ✓ | 7 |
| | ✓ | – | ✓ | ✓ | – | ✓ | ✓ | – | 7 |
| | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | – | ✓ | 7 |
| | ✓ | ✓ | ✓ | ✓ | – | ✓ | ✓ | ✓ | 7 |
| | ✓ | ✓ | ✓ | ✓ | – | – | ✓ | ✓ | 6 |
| | ✓ | ✓ | ✓ | ✓ | – | – | ✓ | ✓ | 6 |
| | ✓ | – | ✓ | ✓ | ✓ | – | ✓ | ✓ | 6 |
| | ✓ | ✓ | ✓ | ✓ | – | ✓ | – | ✓ | 6 |
| | ✓ | – | ✓ | ✓ | ✓ | ✓ | – | ✓ | 6 |
| | ✓ | ✓ | – | ✓ | ✓ | – | ✓ | ✓ | 6 |
| | ✓ | ✓ | ✓ | ✓ | – | ✓ | – | ✓ | 6 |
| | ✓ | ✓ | – | ✓ | – | ✓ | ✓ | ✓ | 6 |
| | ✓ | ✓ | ✓ | ✓ | – | ✓ | – | ✓ | 6 |
| | ✓ | – | ✓ | ✓ | ✓ | – | – | ✓ | 5 |
| | ✓ | – | – | ✓ | – | ✓ | ✓ | ✓ | 5 |
| | ✓ | ✓ | ✓ | ✓ | – | – | – | ✓ | 5 |
| | ✓ | – | ✓ | ✓ | – | ✓ | – | ✓ | 5 |
| | ✓ | ✓ | ✓ | ✓ | – | ✓ | – | – | 5 |
| | ✓ | – | ✓ | ✓ | – | ✓ | – | ✓ | 5 |
| | ✓ | ✓ | – | ✓ | – | ✓ | – | ✓ | 5 |
| | ✓ | – | ✓ | ✓ | – | ✓ | – | ✓ | 5 |
| | ✓ | – | ✓ | ✓ | – | ✓ | – | ✓ | 5 |
| | – | – | ✓ | ✓ | – | – | ✓ | – | 5 |
| | ✓ | ✓ | – | ✓ | – | ✓ | – | ✓ | 5 |
| | ✓ | – | ✓ | ✓ | – | ✓ | – | ✓ | 5 |
| | – | – | ✓ | ✓ | ✓ | ✓ | ✓ | – | 5 |
| | ✓ | ✓ | – | ✓ | – | ✓ | – | ✓ | 5 |
| | ✓ | ✓ | – | ✓ | – | ✓ | – | ✓ | 5 |
| | ✓ | – | ✓ | ✓ | – | – | – | – | 5 |
| | – | ✓ | ✓ | ✓ | ✓ | – | ✓ | – | 5 |
| | ✓ | – | – | ✓ | ✓ | ✓ | – | ✓ | 5 |
| | ✓ | ✓ | – | ✓ | – | ✓ | – | ✓ | 5 |
| | ✓ | ✓ | ✓ | ✓ | – | ✓ | – | – | 5 |
| | ✓ | – | ✓ | ✓ | – | ✓ | – | ✓ | 5 |
| | ✓ | – | ✓ | ✓ | – | ✓ | – | ✓ | 5 |
| | ✓ | ✓ | – | ✓ | – | ✓ | – | ✓ | 5 |
| | ✓ | ✓ | ✓ | ✓ | – | – | – | ✓ | 5 |
| | ✓ | – | ✓ | ✓ | ✓ | – | – | ✓ | 5 |
| | ✓ | – | ✓ | ✓ | – | – | ✓ | ✓ | 5 |
| | ✓ | – | ✓ | ✓ | – | ✓ | – | ✓ | 5 |
| | – | ✓ | ✓ | ✓ | – | ✓ | ✓ | – | 5 |
| | ✓ | – | – | ✓ | ✓ | ✓ | – | ✓ | 5 |
| | ✓ | – | ✓ | ✓ | – | ✓ | – | ✓ | 5 |
| | ✓ | – | ✓ | ✓ | – | – | – | – | 5 |
| | ✓ | – | ✓ | ✓ | – | ✓ | – | ✓ | 5 |
| | ✓ | – | ✓ | ✓ | – | ✓ | – | ✓ | 5 |
| | ✓ | – | ✓ | ✓ | – | ✓ | – | ✓ | 5 |
| | – | ✓ | ✓ | ✓ | – | ✓ | ✓ | – | 5 |
| | ✓ | ✓ | – | ✓ | ✓ | – | – | ✓ | 5 |
| | ✓ | ✓ | – | ✓ | – | – | ✓ | ✓ | 5 |
| | ✓ | ✓ | – | ✓ | ✓ | – | ✓ | – | 5 |
| | – | ✓ | ✓ | ✓ | ✓ | – | ✓ | – | 5 |
| | – | ✓ | ✓ | ✓ | ✓ | ✓ | – | – | 5 |
| | ✓ | ✓ | – | ✓ | – | ✓ | – | ✓ | 5 |
| | ✓ | – | – | ✓ | ✓ | ✓ | – | ✓ | 5 |
| | ✓ | – | ✓ | ✓ | – | ✓ | – | ✓ | 5 |
| | ✓ | – | ✓ | ✓ | – | ✓ | ✓ | – | 5 |
| | ✓ | – | ✓ | ✓ | – | ✓ | – | ✓ | 5 |
| | ✓ | – | ✓ | ✓ | – | ✓ | – | ✓ | 5 |
Fig. 6.—Presence–absence detection is more accurate using alignments of de novo assembled sequences than read mappings. (A) Mapped read depth of a region on scaffold_004 in the SN15 reference assembly shows a putative sectional absence of seven genes. (B) Dotplot of the alternate strain’s (P. nodorum IR10_2.1a) de novo assembly at the region (marked red in A) shows that only two of the reference genes (marked in pink) are absent in the alternate strain. Highly variable regions around sectional absences can frustrate mapping algorithms, leading to an inflated estimate of absent genes.