Robert A Syme, Kar-Chun Tan, James K Hane, Kejal Dodhia, Thomas Stoll, Marcus Hastie, Eiko Furuki, Simon R Ellwood, Angela H Williams, Yew-Foon Tan, Alison C Testa, Jeffrey J Gorman, Richard P Oliver.
Abstract
Parastagonospora nodorum, the causal agent of Septoria nodorum blotch (SNB), is an economically important pathogen of wheat (Triticum spp.) and a model for the study of necrotrophic pathology and genome evolution. The reference P. nodorum strain SN15 was the first Dothideomycete with a published genome sequence, and it has been used as the basis for comparisons within and between species. Here we present an updated reference genome assembly, in which SNP and indel errors in the underlying assembly were corrected using deep resequencing data, together with extensive manual annotation of gene models using transcriptomic and proteomic sources of evidence (https://github.com/robsyme/Parastagonospora_nodorum_SN15). The updated assembly and annotation include 8,366 genes with modified protein sequences and 866 new genes. This study shows the benefits of using a wide variety of experimental methods allied to expert curation to generate a reliable set of gene models.
Year: 2016 PMID: 26840125 PMCID: PMC4739733 DOI: 10.1371/journal.pone.0147221
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of corrections made to the P. nodorum SN15 genome assembly.
| Description | Before | After | Change |
|---|---|---|---|
|  | 107 | 91 | -16 |
|  | 0 | 12,911 | 12,911 |
|  | 0 | 16,820 | 16,820 |
|  | 0 | 1,005 | 1,005 |
|  | 164,388 | 126,887 | -37,501 |
|  | 93,867,773 | 94,594,136 | 726,363 |
|  | 0.5623 | 0.4851 | -0.0772 |
|  | 0.0615 | 0.0062 | -0.0553 |
|  | 99.6402 | 99.6427 | 0.0025 |
|  | 5,872,361,103 | 10,842,396,864 | 4,970,035,761 |
|  | 0.0348 | 0.0043 | -0.0305 |
|  | 95.0119 | 96.1274 | 1.1155 |
a deletion of erroneous sequence from the original assembly.
b insertion of sequence missing from the original assembly.
c rate of mismatched bases relative to the reference sequence over all aligned regions.
d number of short insertions/deletions observed in reads divided by the total number of aligned bases.
e percentage of reads with aligned mate pair.
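The rates defined in footnotes c to e reduce to simple ratios of alignment counts, and the Change column is After minus Before. The sketch below assumes the raw counts have already been extracted from some read-alignment summary; the `AlignmentCounts` fields and all function names are hypothetical. The table does not state whether its rates are fractions or percentages, so the rate helpers here return fractions.

```python
from dataclasses import dataclass


@dataclass
class AlignmentCounts:
    """Hypothetical container for counts taken from a read-alignment summary."""
    aligned_bases: int            # total bases aligned to the reference
    mismatched_bases: int         # aligned bases that differ from the reference
    short_indels: int             # short insertion/deletion events seen in reads
    reads: int                    # total reads considered
    reads_with_mate_aligned: int  # reads whose mate pair also aligned


def mismatch_rate(c: AlignmentCounts) -> float:
    """Footnote c: mismatched bases over all aligned bases (a fraction)."""
    return c.mismatched_bases / c.aligned_bases


def indel_rate(c: AlignmentCounts) -> float:
    """Footnote d: short indels observed in reads / total aligned bases."""
    return c.short_indels / c.aligned_bases


def percent_mate_aligned(c: AlignmentCounts) -> float:
    """Footnote e: percentage of reads with an aligned mate pair."""
    return 100.0 * c.reads_with_mate_aligned / c.reads


def change(before: float, after: float) -> float:
    """The table's Change column: After minus Before."""
    return after - before


if __name__ == "__main__":
    # Values from the first row of the table above.
    print(change(107, 91))  # -16
    # Hypothetical counts, for illustration only.
    toy = AlignmentCounts(1_000, 5, 2, 200, 190)
    print(mismatch_rate(toy), indel_rate(toy), percent_mate_aligned(toy))
```

Applying `change` to each row reproduces the published Change column, e.g. 126,887 minus 164,388 gives -37,501.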
New scaffold joins improving the P. nodorum SN15 genome assembly.
Joins were predicted either from mesosyntenic patterns or from terminal matches to long-insert Sanger sequence reads. Orientations are given relative to the scaffolds of the original assembly.
| Center scaffold | Right scaffold | Orientation | Evidence |
|---|---|---|---|
| scaffold_26 |  |  | Mesosynteny |
| scaffold_48 |  |  | Mesosynteny |
| scaffold_48 |  |  | Mesosynteny |
| scaffold_55 |  |  | Mesosynteny |
| scaffold_107 |  |  | Long-insert library |
| scaffold_105 |  |  | Long-insert library |
| scaffold_36 |  |  | Long-insert library |
| scaffold_77 |  |  | Long-insert library |
| scaffold_49 |  |  | Long-insert library |
| scaffold_64 |  |  | Long-insert library |
| scaffold_72 |  |  | Long-insert library |
| scaffold_61 |  |  | Long-insert library |
| scaffold_85 |  |  | Long-insert library |
| scaffold_17 |  |  | Long-insert library |
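The long-insert evidence above rests on read pairs whose two mates land near the ends of different scaffolds. A minimal sketch of that idea, assuming mate alignments have already been reduced to (scaffold, position) pairs; `MateHit`, `candidate_joins`, the 5 kb end window and the two-pair support threshold are hypothetical choices, not the parameters used for this assembly, and the mesosynteny-based joins are not covered.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class MateHit(NamedTuple):
    scaffold: str
    pos: int  # 0-based alignment position of the mate on its scaffold


def end_of(hit: MateHit, lengths: Dict[str, int], window: int) -> Optional[str]:
    """Return 'L' or 'R' if the hit lies within `window` bp of a scaffold end."""
    if hit.pos < window:
        return "L"
    if hit.pos >= lengths[hit.scaffold] - window:
        return "R"
    return None


def candidate_joins(
    pairs: Iterable[Tuple[MateHit, MateHit]],
    lengths: Dict[str, int],
    window: int = 5_000,
    min_support: int = 2,
):
    """Count mate pairs linking the ends of two different scaffolds."""
    support: Counter = Counter()
    for a, b in pairs:
        if a.scaffold == b.scaffold:
            continue  # both mates on one scaffold: no join evidence
        ea, eb = end_of(a, lengths, window), end_of(b, lengths, window)
        if ea is None or eb is None:
            continue  # at least one mate is internal to its scaffold
        # Order-independent key, e.g. (("s1", "R"), ("s2", "L")).
        key = tuple(sorted([(a.scaffold, ea), (b.scaffold, eb)]))
        support[key] += 1
    return {k: n for k, n in support.items() if n >= min_support}
```

The pair of ends also hints at orientation: a right-end-to-left-end link fits both scaffolds in their current orientation, while right-to-right or left-to-left links suggest one scaffold must be reverse-complemented.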
Summary of the characteristics of annotated P. nodorum SN15 genes and their protein products before and after manual re-annotation.
Manually annotated genes are longer, have more annotated transcripts, are more likely to accord with proteomic data, and are more likely to have conserved protein domains.
| Description | Before | After |
|---|---|---|
|  | 12,199 | 13,569 |
|  | 2.6 | 2.5 |
|  | 1,271.4 | 1,368.7 |
|  | 600 | 1,010 |
|  | 1,616 | 2,057 |
|  | 89.9 | 66.6 |
|  | 84.3 | 52.4 |
|  | 11,464 | 13,248 |
|  | 11,287 | 13,184 |
|  | 1,122 | 1,476 |
|  | 2,665 | 4,352 |
|  | 150 | 0 |
|  | 0 | 299 |
Fig 1. Sources of evidence used to re-annotate P. nodorum SN15 genes.
In total, 12,143 annotations were supported by at least one source of experimental evidence. Additional annotations were supported by non-experimental sources, including the presence of conserved domains or homology to genes of other species.
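The split between experimental and non-experimental support amounts to a per-annotation rule: an annotation counts as experimentally supported if at least one of its evidence sources is experimental. A minimal sketch, with hypothetical evidence labels standing in for the transcriptomic and proteomic (experimental) and conserved-domain and homology (non-experimental) sources named in the text; the gene IDs are also hypothetical.

```python
from collections import Counter
from typing import Dict, Set

# Hypothetical labels for the evidence categories named in the text.
EXPERIMENTAL = {"transcriptomic", "proteomic"}
NON_EXPERIMENTAL = {"conserved_domain", "homology"}


def support_class(evidence: Set[str]) -> str:
    """Classify one annotation by its strongest kind of support."""
    if evidence & EXPERIMENTAL:
        return "experimental"
    if evidence & NON_EXPERIMENTAL:
        return "non-experimental only"
    return "unsupported"


def tally(annotations: Dict[str, Set[str]]) -> Counter:
    """Count annotations in each support class."""
    return Counter(support_class(ev) for ev in annotations.values())


if __name__ == "__main__":
    toy = {  # hypothetical gene IDs and evidence sets
        "gene_a": {"transcriptomic", "homology"},
        "gene_b": {"conserved_domain"},
        "gene_c": {"proteomic"},
        "gene_d": set(),
    }
    print(tally(toy))
```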
Fig 2. Coverage of the top BLASTP hit for re-annotated P. nodorum SN15 predicted proteins.
The manually curated set (left) agrees more closely with sequences in the NCBI Protein NR database than the original set of annotations (right). Contour lines (blue) indicate ‘kernel density’, depicting the relative number of proteins within a localised region of the plot.
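Top-hit coverage can be computed from BLAST+ tabular output that includes sequence lengths, for example with `-outfmt "6 qseqid sseqid qlen slen qstart qend sstart send"`. The sketch below assumes that column order, treats the first line for each query as its top hit (hits are listed in order of significance by default), and ignores further HSPs of the same hit. The exact coverage definition behind the figure is not stated, so reporting both query and subject coverage is an assumption.

```python
import csv
import io
from typing import Dict, Tuple

# Assumed column order, matching -outfmt "6 qseqid sseqid qlen slen qstart qend sstart send".
FIELDS = ["qseqid", "sseqid", "qlen", "slen", "qstart", "qend", "sstart", "send"]


def top_hit_coverage(tsv_text: str) -> Dict[str, Tuple[float, float]]:
    """Map each query ID to (query coverage, subject coverage) of its first-listed hit."""
    coverage: Dict[str, Tuple[float, float]] = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(FIELDS, row))
        if rec["qseqid"] in coverage:
            continue  # keep only the first (top-ranked) line per query
        q_span = abs(int(rec["qend"]) - int(rec["qstart"])) + 1
        s_span = abs(int(rec["send"]) - int(rec["sstart"])) + 1
        coverage[rec["qseqid"]] = (q_span / int(rec["qlen"]), s_span / int(rec["slen"]))
    return coverage
```

A protein whose top hit spans both sequences end to end scores (1.0, 1.0); truncated or fragmented gene models tend to pull one of the two values down.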
Summary of carbohydrate-active enzyme (CAZyme) family numbers in P. nodorum SN15 before and after manual re-annotation.
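| CAZyme Family | Original match count | Corrected match count |
|---|---|---|
| AA (Auxiliary Activities) | 122 | 139 |
| CBM (Carbohydrate-Binding Modules) | 64 | 110 |
| CE (Carbohydrate Esterases) | 142 | 174 |
| - | 1 | 1 |
| GH (Glycoside Hydrolases) | 264 | 280 |
| GT (GlycosylTransferases) | 96 | 105 |
| PL (Polysaccharide Lyases) | 10 | 10 |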
Fig 3. Proportion of cysteines in P. nodorum SN15 predicted proteins before and after gene re-annotation.
New proteins are more likely to be cysteine-rich. Of the 54 cysteine-rich proteins in the new annotation set (> 9% Cys by length), 16 are the products of newly annotated loci.
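The cysteine-rich criterion (> 9% Cys by length) is the cysteine count divided by protein length. A minimal sketch; the function names are hypothetical, and the strict `>` comparison follows the caption's wording. For example, the 66-residue protein with 9 cysteines in the first row of the table of novel cysteine-rich proteins gives 9 / 66 ≈ 13.6%.

```python
def cysteine_percentage(protein: str) -> float:
    """Percentage of residues that are cysteine (C) in a protein sequence."""
    seq = protein.upper().rstrip("*")  # ignore a trailing stop symbol, if present
    if not seq:
        raise ValueError("empty protein sequence")
    return 100.0 * seq.count("C") / len(seq)


def is_cysteine_rich(protein: str, threshold: float = 9.0) -> bool:
    """True if cysteines make up more than `threshold` percent of the sequence."""
    return cysteine_percentage(protein) > threshold


if __name__ == "__main__":
    toy = "C" * 9 + "A" * 57  # synthetic 66-residue sequence with 9 cysteines
    print(round(cysteine_percentage(toy), 1), is_cysteine_rich(toy))  # 13.6 True
```

Because the threshold is strict, a protein at exactly 9.0% Cys would not be counted.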
Summary of cysteine-rich protein-products of previously unannotated genes in P. nodorum SN15.
Novel cysteine-rich annotations have few BLAST hits and include potential effector candidates, e.g. SNOG_30451, a degraded and truncated homolog of Tox1.
| Gene name | Protein length | Cysteine count | Cysteine percentage | BLAST hits |
|---|---|---|---|---|
|  | 66 | 9 | 13.6 | No |
|  | 74 | 10 | 13.5 | No |
|  | 94 | 11 | 11.7 | No |
|  | 70 | 8 | 11.4 | No |
|  | 53 | 6 | 11.3 | No |
|  | 56 | 6 | 10.7 | No |
|  | 355 | 37 | 10.4 | Carbohydrate-binding |
|  | 58 | 6 | 10.3 | No |
|  | 79 | 8 | 10.1 | No |
|  | 60 | 6 | 10 | No |
|  | 62 | 6 | 9.7 | Fungal hypothetical genes |
|  | 104 | 10 | 9.6 | No |
|  | 84 | 8 | 9.5 | No |
| SNOG_30451 | 84 | 8 | 9.5 | Tox1 |
|  | 76 | 7 | 9.2 | No |
|  | 55 | 5 | 9.1 | No |
Polyketide synthase genes of P. nodorum SN15.
| Gene name | PKS Type |
|---|---|
|  | Type III PKS |
|  | Hybrid nonribosomal peptide synthetase/PKS |
|  | Partially reducing PKS |
|  | Highly reducing PKS |
|  | Highly reducing PKS |
|  | Highly reducing PKS |
|  | Highly reducing PKS |
|  | Highly reducing PKS |
|  | Highly reducing PKS |
|  | Highly reducing PKS |
|  | Highly reducing PKS |
|  | Highly reducing PKS |
|  | Highly reducing PKS |
|  | Highly reducing PKS |
|  | Highly reducing PKS |
|  | Highly reducing PKS |
|  | Highly reducing PKS |
|  | Non-reducing PKS |
|  | Non-reducing PKS |
|  | Non-reducing PKS |
|  | Non-reducing PKS |
|  | Non-reducing PKS |
|  | Non-reducing PKS |
|  | Non-reducing PKS |
Summary of changes to protein-products with functional annotations of high relevance to plant pathogenicity in P. nodorum SN15, before and after re-annotation.
| Pfam ID | Domain name | Before | After |
|---|---|---|---|
| Hce2 |  |  |  |
| Ricin_B_lectin |  |  |  |
| CAP |  | 4 | 4 |
| NEP |  | 0 | 0 |
| LysM |  | 2 | 2 |
| NPP1 |  | 2 | 2 |
| Toxin_ToxA |  | 1 | 1 |
Fig 4. A whole-genome dotplot of nucmer matches between scaffolds of P. nodorum and of P. tritici-repentis.
The ‘dots-in-boxes’ pattern is indicative of mesosyntenic relationships between chromosomes. P. nodorum scaffolds 8 and 26 are ‘mesosyntenic’ with P. tritici-repentis scaffold 4, as indicated by black boxes.