Jonathan Grandaubert, Amitava Bhattacharyya, Eva H Stukenbrock.
Abstract
The fungal pathogen Zymoseptoria tritici (synonym Mycosphaerella graminicola) is a prominent pathogen of wheat. The reference genome of the isolate IPO323 is one of the best-assembled eukaryotic genomes and encodes more than 10,000 predicted genes. However, a large proportion of the previously annotated gene models are incomplete, with either no start or no stop codons. The availability of RNA-seq data allows better predictions of gene structure. We here used two different RNA-seq datasets, de novo transcriptome assemblies, homology-based comparisons, and trained ab initio gene callers to generate a new gene annotation of Z. tritici IPO323. The annotation pipeline was also applied to re-sequenced genomes of three closely related species of Z. tritici: Z. pseudotritici, Z. ardabiliae, and Z. brevis. Comparative analyses of the predicted gene models using the four Zymoseptoria species revealed sets of species-specific orphan genes enriched with putative pathogenicity-related genes encoding small secreted proteins that may play essential roles in virulence and host specificity. De novo repeat identification allowed us to show that few families of transposable elements are shared between Zymoseptoria species while we observe many species-specific invasions and expansions. The annotation data presented here provide a high-quality resource for future studies of Z. tritici and its sister species and provide detailed insight into gene and genome evolution of fungal plant pathogens.
Keywords: comparative genomics; gene annotation; species-specific genes; transposable elements
Year: 2015 PMID: 25917918 PMCID: PMC4502367 DOI: 10.1534/g3.115.017731
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
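The abstract describes incomplete gene models as those lacking a start or a stop codon. A minimal sketch of such a check is given below. It assumes the standard genetic code and a coding sequence supplied as a plain DNA string; the function name `classify_cds` and the example sequences are hypothetical and are not taken from the authors' pipeline.

```python
# Hypothetical helper: classify a coding sequence (CDS) by the presence of
# a start codon and a stop codon, assuming the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_cds(cds):
    """Return a short label describing whether the CDS looks complete."""
    seq = cds.upper()
    has_start = seq.startswith("ATG")
    has_stop = len(seq) >= 3 and seq[-3:] in STOP_CODONS
    if has_start and has_stop:
        return "complete"
    if has_start:
        return "no stop codon"
    if has_stop:
        return "no start codon"
    return "no start or stop codon"


# Toy sequences for illustration only (not real Z. tritici genes).
examples = ["ATGGCCTAA", "ATGGCC", "GCCTAA", "GCCGCC"]
labels = [classify_cds(seq) for seq in examples]
for seq, label in zip(examples, labels):
    print(seq, "->", label)
```

A fuller check would also confirm that the sequence length is a multiple of three and that no in-frame stop codon occurs before the end; those steps are left out here to keep the sketch short.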
Annotation features for the four members of the Zymoseptoria species complex
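| Feature | Z. tritici (previous annotation) | Z. tritici (new annotation) | Z. pseudotritici | Z. ardabiliae | Z. brevis |
|---|---|---|---|---|---|
| Assembly size (Mb) | 39.7 | 39.7 | 32.7 | 31.5 | 31.9 |
| No. of scaffolds | 21 | 21 | 1164 | 868 | 6116 |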
| No. of transcripts (de novo) | — | 68,653 | 16,056 | 16,353 | 17,870 |
| No. of transcripts (genome-guided) | — | 73,127 | 19,076 | 20,193 | 15,331 |
| No. of gene models | — | 13,847 | 12,027 | 12,719 | 10,649 |
| No. of predicted gene models | 10,952 | 11,839 | 11,044 | 10,787 | 10,557 |
| No. of complete gene models | 9397 | 11,795 | 10,957 | 10,686 | 10,342 |
| No. of partial gene models | 1555 | 44 | 87 | 101 | 215 |
| No. of gene models with RNA-seq support | 9423 | 10,048 | 7618 | 8297 | 9939 |
| Average gene length (bp) | 1599.8 | 1620.9 | 1594.3 | 1584.9 | 1592.8 |
| Average transcript length (bp) | 1388.8 | 1462.1 | 1459.4 | 1440.9 | 1462.5 |
| Average protein length (aa) | 436.6 | 487.8 | 488.0 | 482.0 | 487.5 |
| No. of exons | 28,309 | 30,068 | 26,699 | 26,231 | 25,367 |
| Average exon length (bp) | 505.7 | 575.2 | 604.7 | 593.7 | 608.1 |
| Average no. of exons per gene | 2.59 | 2.54 | 2.42 | 2.43 | 2.40 |
| No. of introns | 17,357 | 18,226 | 15,653 | 15,445 | 14,809 |
| Average intron length (bp) | 124.1 | 91.6 | 90.9 | 98.4 | 94.6 |
| Average no. of introns per gene | 2.27 | 2.27 | 2.16 | 2.16 | 2.16 |
| No. of genes with introns | 7654 | 8044 | 7234 | 7165 | 6883 |
| Gene density (genes/Mb) | 276.0 | 298.3 | 338.2 | 330.3 | 331.1 |
| % of predicted proteins | — | 22.9 | 18.8 | 19.2 | 18.1 |
| % of hypothetical proteins | — | 31.3 | 31.2 | 31.4 | 32.8 |
| % of | — | 45.8 | 50 | 49.4 | 49.1 |
| No. of secreted proteins | 970 | 874 | 838 | 965 | 700 |
| No. of small secreted proteins (<300 aa) | 441 | 441 | 399 | 540 | 331 |
Based on alignment with reconstructed transcripts (e-value < 1e-5).
Extracted from Morais do Amaral et al. 2012.
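The gene density and completeness figures for Z. tritici can be approximately re-derived from the other rows of the table. The sketch below uses only values copied from the previous-annotation and new-annotation columns; the variable and function names are illustrative. Because the assembly size is given to one decimal place, the recomputed densities may differ from the tabulated ones in the last digit.

```python
# Values copied from the table (Z. tritici IPO323, previous vs new annotation).
ASSEMBLY_MB = 39.7

annotations = {
    "previous": {"gene_models": 10_952, "complete": 9_397, "partial": 1_555},
    "new": {"gene_models": 11_839, "complete": 11_795, "partial": 44},
}


def gene_density(gene_models, assembly_mb):
    """Genes per megabase of assembled sequence."""
    return gene_models / assembly_mb


def summarize(name):
    """Return rounded gene density and percentage of complete gene models."""
    row = annotations[name]
    return {
        "density": round(gene_density(row["gene_models"], ASSEMBLY_MB), 1),
        "pct_complete": round(100 * row["complete"] / row["gene_models"], 1),
    }


summary = {name: summarize(name) for name in annotations}
for name, stats in summary.items():
    print(name, stats)
```

Under these inputs the share of complete gene models rises from roughly 86% in the previous annotation to more than 99% in the new one, which is the improvement the abstract refers to.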
Figure 1. Venn diagram showing the distribution of predicted gene models in Z. tritici, Z. pseudotritici, Z. ardabiliae, and Z. brevis. Genes were categorized as core Zymoseptoria genes, orphan genes, or genes shared by two or three species on the basis of a detailed characterization of gene orthology.
Transposable element content in genomes of the Zymoseptoria species complex
| Order | Superfamily | Family No. | DNA Amount (kb) | % of Genome | Family No. | DNA Amount (kb) | % of Genome | Family No. | DNA Amount (kb) | % of Genome | Family No. | DNA Amount (kb) | % of Genome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LTR | RLC | 12 | 1017 | 2.56 | 6 | 90 | 0.23 | 5 | 232 | 0.62 | 7 | 827 | 2.08 |
| LTR | RLG | 14 | 2649 | 6.67 | 17 | 2906 | 7.50 | 16 | 1074 | 2.88 | 18 | 3616 | 9.09 |
| LINE | RII | 4 | 737 | 1.86 | 0 | 0 | 0.00 | 3 | 76 | 0.20 | 4 | 547 | 1.38 |
| LINE | RIL | 2 | 747 | 1.88 | 0 | 0 | 0.00 | 1 | 32 | 0.09 | 0 | 0 | 0.00 |
| LINE | RIX | 1 | 5 | 0.01 | 1 | 83 | 0.22 | 1 | 67 | 0.18 | 1 | 193 | 0.48 |
| DIRS | RYN | 1 | 23 | 0.06 | 0 | 0 | 0.00 | 0 | 0 | 0.00 | 0 | 0 | 0.00 |
| SINE | RSX | 1 | 1 | 0.00 | 0 | 0 | 0.00 | 0 | 0 | 0.00 | 0 | 0 | 0.00 |
| TRIM | RLX-TRIM | 4 | 35 | 0.09 | 0 | 0 | 0.00 | 0 | 0 | 0.00 | 1 | 4 | 0.01 |
| TIR | DTT | 14 | 246 | 0.62 | 3 | 51 | 0.13 | 4 | 29 | 0.08 | 9 | 254 | 0.64 |
| TIR | DTA | 7 | 150 | 0.38 | 1 | 35 | 0.09 | 1 | 4 | 0.01 | 5 | 175 | 0.44 |
| TIR | DTH | 10 | 184 | 0.46 | 0 | 0 | 0.00 | 0 | 0 | 0.00 | 11 | 292 | 0.74 |
| TIR | DTM | 7 | 145 | 0.36 | 1 | 44 | 0.11 | 0 | 0 | 0.00 | 4 | 213 | 0.54 |
| TIR | DTX | 6 | 364 | 0.92 | 6 | 432 | 1.11 | 5 | 76 | 0.20 | 3 | 315 | 0.79 |
| Unknown | DXX | 2 | 40 | 0.10 | 0 | 0 | 0.00 | 0 | 0 | 0.00 | 0 | 0 | 0.00 |
| Crypton | DYX | 1 | 6 | 0.02 | 0 | 0 | 0.00 | 0 | 0 | 0.00 | 0 | 0 | 0.00 |
| Helitron | DHH | 4 | 470 | 1.18 | 7 | 357 | 0.92 | 1 | 39 | 0.10 | 5 | 391 | 0.98 |
| Maverick | DMM | 0 | 0 | 0.00 | 2 | 476 | 1.23 | 0 | 0 | 0.00 | 0 | 0 | 0.00 |
| MITE | DTX-MITE | 11 | 48 | 0.12 | 1 | 2 | 0.01 | 1 | 5 | 0.01 | 3 | 11 | 0.03 |
| NoCat | NoCat | 10 | 516 | 1.30 | 60 | 1518 | 3.92 | 79 | 1303 | 3.49 | 93 | 2634 | 6.62 |
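The per-superfamily DNA amounts in this table can be aggregated into totals per genome and per transposable element class. The sketch below copies the "DNA Amount (kb)" values for the four genomes in the order the table lists them. It assumes the usual reading of the three-letter codes, in which a leading `R` marks class I retrotransposons and a leading `D` marks class II DNA transposons; `NoCat` is treated as uncategorized. All names in the code are illustrative.

```python
from collections import defaultdict

# (superfamily code, DNA amount in kb for genome columns 1 to 4, table order)
TE_KB = [
    ("RLC", (1017, 90, 232, 827)),
    ("RLG", (2649, 2906, 1074, 3616)),
    ("RII", (737, 0, 76, 547)),
    ("RIL", (747, 0, 32, 0)),
    ("RIX", (5, 83, 67, 193)),
    ("RYN", (23, 0, 0, 0)),
    ("RSX", (1, 0, 0, 0)),
    ("RLX-TRIM", (35, 0, 0, 4)),
    ("DTT", (246, 51, 29, 254)),
    ("DTA", (150, 35, 4, 175)),
    ("DTH", (184, 0, 0, 292)),
    ("DTM", (145, 44, 0, 213)),
    ("DTX", (364, 432, 76, 315)),
    ("DXX", (40, 0, 0, 0)),
    ("DYX", (6, 0, 0, 0)),
    ("DHH", (470, 357, 39, 391)),
    ("DMM", (0, 476, 0, 0)),
    ("DTX-MITE", (48, 2, 5, 11)),
    ("NoCat", (516, 1518, 1303, 2634)),
]


def te_class(code):
    """Assumed mapping from the first letter of the code to a TE class."""
    if code.startswith("R"):
        return "class I"
    if code.startswith("D"):
        return "class II"
    return "uncategorized"


def totals_by_class(rows):
    """Sum the kb amounts per TE class, separately for each genome column."""
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for code, amounts in rows:
        bucket = totals[te_class(code)]
        for i, kb in enumerate(amounts):
            bucket[i] += kb
    return dict(totals)


by_class = totals_by_class(TE_KB)
column_totals = [sum(col) for col in zip(*(kb for _, kb in TE_KB))]

print("total TE content (kb) per genome column:", column_totals)
for name, amounts in by_class.items():
    print(name, amounts)
```

Dividing a column total by the corresponding genome size gives the overall TE fraction for that genome, in the same way the "% of Genome" columns are expressed per superfamily.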
Figure 2. Transposable element (TE) characteristics along the 21 chromosomes of Zymoseptoria tritici. (A) Proportions of TEs in the chromosomes of Z. tritici. The red dashed lines show the mean TE proportions on core (1–13) and accessory (14–21) chromosomes. (B) The five major TE classes show broadly similar distributions and frequencies on core and accessory chromosomes of Z. tritici.
Gene models with protein domains associated with transposable elements
| Protein ID | Length (aa) | Domain Name | Domain Description | Interpro or Pfam Accession No. | Overlapping Repeat Family | RNA-seq Support |
|---|---|---|---|---|---|---|
| Zt09_chr_1_00001 | 143 | RVT_1 | Reverse-transcriptase (RNA-dependent DNA polymerase) | PF00078 | RIL_element2_ZTIPO323 | — |
| Zt09_chr_1_00017 | 272 | DDE_Tnp_4 | DDE superfamily endonuclease | PF13359 | DTH_element5_ZTIPO323 | — |
| Zt09_chr_1_01076 | 477 | DDE_3 | DDE superfamily endonuclease | PF13358 | DTT_element10_ZTIPO323 | Yes |
| Zt09_chr_2_00093 | 477 | DDE_3 | DDE superfamily endonuclease | PF13358 | DTT_element10_ZTIPO323 | — |
| Zt09_chr_2_00258 | 409 | DDE_Tnp_1_7 | Transposase IS4 | PF13843 | DTX_element1_ZPST04IR55 | Yes |
| Zt09_chr_2_00383 | 648 | DDE_1 | DDE superfamily endonuclease | PF03184 | DTT_element1_ZTIPO323 | Yes |
| Zt09_chr_2_00481 | 532 | Chromo | Chromo (CHRromatin Organization MOdifier) domain | PF00385 | RLG_element5_ZPST04IR55 | — |
| Zt09_chr_2_00619 | 272 | RVT_2 | Reverse-transcriptase (RNA-dependent DNA polymerase) | PF07727 | RLC_element3_ZTIPO323 | — |
| Zt09_chr_2_00906 | 219 | DDE_1 | DDE superfamily endonuclease | PF03184 | DTT_element5_ZTIPO323 | Yes |
| Zt09_chr_2_00907 | 269 | DDE_1 | DDE superfamily endonuclease | PF03184 | DTT_element5_ZTIPO323 | Yes |
| Zt09_chr_3_00204 | 734 | RNaseH-like_dom | Ribonuclease H-like domain | IPR012337 | DTA_element2_ZTIPO323 | Yes |
| Zt09_chr_3_00229 | 349 | DDE_3 | DDE superfamily endonuclease | PF13358 | DTT_element14_ZTIPO323 | Yes |
| Zt09_chr_4_00051 | 195 | DDE_3 | DDE superfamily endonuclease | PF13358 | DTT_element4_ZB163 | — |
| Zt09_chr_4_00288 | 371 | RNaseH-like_dom | Ribonuclease H-like domain | IPR012337 | DTA_element3_ZTIPO323 | Yes |
| Zt09_chr_5_00137 | 477 | DDE_3 | DDE superfamily endonuclease | PF13358 | DTT_element10_ZTIPO323 | Yes |
| Zt09_chr_7_00478 | 338 | RNaseH-like_dom | Ribonuclease H-like domain | IPR012337 | DTA_element6_ZTIPO323 | Yes |
| Zt09_chr_7_00479 | 108 | RNaseH-like_dom | Ribonuclease H-like domain | IPR012337 | DTA_element6_ZTIPO323 | Yes |
| Zt09_chr_9_00003 | 1506 | RVT_1 | Reverse-transcriptase (RNA-dependent DNA polymerase) | PF00078 | RIL_element2_ZTIPO323 | — |
| Zt09_chr_9_00154 | 621 | DDE_1 | DDE superfamily endonuclease | PF03184 | DTT_element1_ZTIPO323 | Yes |
| Zt09_chr_9_00557 | 1061 | rve | Integrase core domain | PF00665 | RLC_element7_ZTIPO323 | Yes |
| Zt09_chr_9_00615 | 458 | DDE_Tnp_1_7 | Transposase IS4 | PF13843 | DTX_element1_ZAST11IR611 | Yes |
| Zt09_chr_10_00001 | 218 | RVT_2 | Reverse-transcriptase (RNA-dependent DNA polymerase) | PF07727 | RLC_element7_ZTIPO323 | — |
| Zt09_chr_11_00364 | 162 | Chromo | Chromo (CHRromatin Organization MOdifier) domain | PF00385 | RLG_element7_ZPST04IR55 | — |
| Zt09_chr_13_00054 | 869 | DDE_3 | DDE superfamily endonuclease | PF13358 | DTT_element10_ZTIPO323 | Yes |
| Zt09_chr_13_00057 | 150 | RVT_2 | Reverse-transcriptase (RNA-dependent DNA polymerase) | PF07727 | RLC_element2_ZTIPO323 | — |
| Zt09_chr_13_00271 | 480 | RNaseH-like_dom | Ribonuclease H-like domain | IPR012337 | DTA_element1_ZTIPO323 | Yes |
| Zt09_chr_17_00004 | 876 | RVT_1 | Reverse-transcriptase (RNA-dependent DNA polymerase) | PF00078 | RIL_element2_ZTIPO323 | — |
| Zt09_chr_17_00088 | 523 | RVT_1 | Reverse-transcriptase (RNA-dependent DNA polymerase) | PF00078 | RIL_element2_ZTIPO323 | — |
| Zt09_chr_18_00008 | 750 | RNaseH-like_dom | Ribonuclease H-like domain | IPR012337 | NoCat_element60_ZAST11IR611 | — |
| Zt09_chr_19_00003 | 169 | RVT_2 | Reverse-transcriptase (RNA-dependent DNA polymerase) | PF07727 | RLC_element5_ZTIPO323 | — |
| Zt09_chr_19_00040 | 947 | RVT_2 | Reverse-transcriptase (RNA-dependent DNA polymerase) | PF07727 | RLC_element9_ZTIPO323 | — |
| Zt09_chr_19_00041 | 107 | UBN2_3 | gag-polypeptide of LTR copia-type | PF14244 | RLC_element9_ZTIPO323 | — |
| Zt09_chr_21_00003 | 1609 | RVT_1 | Reverse-transcriptase (RNA-dependent DNA polymerase) | PF00078 | RLG_element9_ZTIPO323 | — |
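The table above can be summarized by domain, RNA-seq support, and chromosome type. The sketch below re-encodes three columns of the table (protein ID, domain name, RNA-seq support, with `yes`/`no` standing in for "Yes" and "—") and tallies them. The chromosome number is read from the protein ID, and the split into core (1–13) and accessory (14–21) chromosomes follows the Figure 2 caption. The parsing helper and variable names are illustrative.

```python
from collections import Counter

# Protein ID, domain name, RNA-seq support, copied from the table above.
TABLE = """
Zt09_chr_1_00001 RVT_1 no
Zt09_chr_1_00017 DDE_Tnp_4 no
Zt09_chr_1_01076 DDE_3 yes
Zt09_chr_2_00093 DDE_3 no
Zt09_chr_2_00258 DDE_Tnp_1_7 yes
Zt09_chr_2_00383 DDE_1 yes
Zt09_chr_2_00481 Chromo no
Zt09_chr_2_00619 RVT_2 no
Zt09_chr_2_00906 DDE_1 yes
Zt09_chr_2_00907 DDE_1 yes
Zt09_chr_3_00204 RNaseH-like_dom yes
Zt09_chr_3_00229 DDE_3 yes
Zt09_chr_4_00051 DDE_3 no
Zt09_chr_4_00288 RNaseH-like_dom yes
Zt09_chr_5_00137 DDE_3 yes
Zt09_chr_7_00478 RNaseH-like_dom yes
Zt09_chr_7_00479 RNaseH-like_dom yes
Zt09_chr_9_00003 RVT_1 no
Zt09_chr_9_00154 DDE_1 yes
Zt09_chr_9_00557 rve yes
Zt09_chr_9_00615 DDE_Tnp_1_7 yes
Zt09_chr_10_00001 RVT_2 no
Zt09_chr_11_00364 Chromo no
Zt09_chr_13_00054 DDE_3 yes
Zt09_chr_13_00057 RVT_2 no
Zt09_chr_13_00271 RNaseH-like_dom yes
Zt09_chr_17_00004 RVT_1 no
Zt09_chr_17_00088 RVT_1 no
Zt09_chr_18_00008 RNaseH-like_dom no
Zt09_chr_19_00003 RVT_2 no
Zt09_chr_19_00040 RVT_2 no
Zt09_chr_19_00041 UBN2_3 no
Zt09_chr_21_00003 RVT_1 no
"""


def parse(table):
    """Yield (chromosome number, domain, supported) for each table row."""
    for line in table.strip().splitlines():
        protein_id, domain, flag = line.split()
        chromosome = int(protein_id.split("_")[2])
        yield chromosome, domain, flag == "yes"


records = list(parse(TABLE))
total = Counter(domain for _, domain, _ in records)
supported = Counter(domain for _, domain, ok in records if ok)
accessory = [r for r in records if r[0] >= 14]  # chromosomes 14 to 21
accessory_supported = sum(1 for _, _, ok in accessory if ok)

for domain, n in total.most_common():
    print(f"{domain}: {supported[domain]} of {n} with RNA-seq support")
print(f"accessory chromosomes: {accessory_supported} of {len(accessory)} supported")
```

Read this way, the listed gene models carrying DDE transposase domains mostly have RNA-seq support, whereas none of the models with reverse-transcriptase domains (RVT_1, RVT_2) do. This is a tally of the table as given, not an additional result.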