Peri A Tobias1,2, Benjamin Schwessinger3, Cecilia H Deng4, Chen Wu4, Chongmei Dong5, Jana Sperschneider6, Ashley Jones3, Zhenyan Lou3, Peng Zhang5, Karanjeet Sandhu5, Grant R Smith7, Josquin Tibbits8, David Chagné9, Robert F Park5.
Abstract
Austropuccinia psidii, originating in South America, is a globally invasive fungal plant pathogen that causes rust disease on Myrtaceae. Several biotypes are recognized, with the most widely distributed pandemic biotype spreading throughout the Asia-Pacific and Oceania regions over the last decade. Austropuccinia psidii has a broad host range with more than 480 myrtaceous species. Since it was first detected in Australia in 2010, the pathogen has caused the near extinction of at least three species and negatively affected commercial production of several Myrtaceae. To enable molecular and evolutionary studies into A. psidii pathogenicity, we assembled a highly contiguous genome for the pandemic biotype. With an estimated haploid genome size of just over 1 Gb (gigabases), it is the largest assembled fungal genome to date. The genome has undergone massive expansion via distinct transposable element (TE) bursts. Over 90% of the genome is covered by TEs predominantly belonging to the Gypsy superfamily. These TE bursts have likely been followed by deamination events of methylated cytosines to silence the repetitive elements. This in turn led to the depletion of CpG sites in TEs and a very low overall GC content of 33.8%. Compared to other Pucciniales, the intergenic distances are increased by an order of magnitude, indicating a general insertion of TEs between genes. Overall, we show how TEs shaped the genome evolution of A. psidii and provide a greatly needed resource for strategic approaches to combat disease spread.
Keywords: Myrtaceae; Pucciniomycotina; fungal genome evolution; myrtle rust; transposable elements
Year: 2021 PMID: 33793741 PMCID: PMC8063080 DOI: 10.1093/g3journal/jkaa015
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.542
(A) Comparative statistics for the (Roach ) before, (C) after Hi-C scaffolding
| (A) Raw data statistics | 21 SMRT cells | 29 SMRT cells |
|---|---|---|
| Raw read numbers | 2,696,004 | 5,798,431 |
| Raw bases | 36,099,416,447 | 72,410,490,565 |
| Coverage (X) | 36.1 | 72.4 |
| Corrected mean length (bp) | 11,733 | 21,811 |
| Corrected N50 (bp) | 47,581 | 46,751 |
| Assembly statistics | | |
| Assembly parameters | Diploid | Diploid |
| Number of contigs | 22,474 | 13,361 |
| Total length (bp) | 1,520,827,096 | |
| N50 (bp) | 96,784 | 287,498 |
| Shortest contig | 1,004 | 1,003 |
| Longest contig | 1,037,919 | 2,557,042 |
| Assembly time (approx.) | 2.5 months | 4.5 months |

| (B) Before Hi-C scaffolding | | | | Largest contig | Haploid genome size | GC content (%) |
|---|---|---|---|---|---|---|
| Primary | 3,187 | 520,311 | 619 | | | |
| Secondary | 8,626 | 159,727 | 1,794 | 957,974 | 933,887,333 | 34.26 |

| (C) After Hi-C scaffolding | | | | Largest scaffold | Haploid genome size | GC content (%) |
|---|---|---|---|---|---|---|
| Primary | 66 | 56,243,252 | 7 | | | 33.80 |
| Secondary | 67 | 52,409,407 | 7 | 89,073,602 | 934,744,333 | 34.26 |
Bold values indicate (A) the final diploid assembled genome size; (B) the largest contig, haploid genome size, and GC content; and (C) the largest scaffold and haploid genome size.
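The N50 and coverage figures in the table above follow standard definitions, which can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names and the toy contig lengths are hypothetical, and the 1 Gb genome size used for the coverage check is an assumption taken from the abstract's "just over 1 Gb" estimate.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths.

    N50 is the length of the contig at which the running total of
    lengths, sorted longest first, reaches half of the assembly size.
    L50 is the number of contigs needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("lengths must be non-empty")


def coverage(raw_bases, genome_size):
    """Sequencing depth as raw bases divided by genome size."""
    return raw_bases / genome_size


# Hypothetical toy assembly, not data from the paper.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50_l50(toy_lengths))  # (70, 2)

# Raw base counts are from the table; the 1 Gb genome size is assumed.
print(round(coverage(36_099_416_447, 1_000_000_000), 1))  # 36.1
print(round(coverage(72_410_490_565, 1_000_000_000), 1))  # 72.4
```

Under the 1 Gb assumption, the computed depths match the reported coverage of 36.1X and 72.4X.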
Figure 1. Austropuccinia psidii has a high repeat content driven by expansion of the Gypsy superfamily. Repetitive element annotation on primary scaffolds. (A) Percentages of genome coverage for all repetitive elements and different subcategories with some overlaps. These include TEs of class I (RNA retrotransposons) and class II (DNA transposons), simple sequence repeats (SSR), and unclassifiable repeats (no category). (B) Percentages of genome coverage of Class I and Class II TEs categorized to the class, order, and superfamily levels wherever possible. Repetitive elements were identified using the REPET pipeline, and classifications were inferred from the closest BLAST hit (see Materials and Methods). The TE order is color coded in each Class I and Class II TE plot. (C) A subset of TE superfamilies has driven the genome expansion of A. psidii. The blue line indicates the mean TE family percentage identity distribution relative to the consensus sequence of TE families as a proxy of TE age. Individual points indicate the relative frequency of a specific TE family plotted at their mean percentage identity relative to the consensus sequence. Points are color coded according to the TE superfamily. Only highly abundant TE families are included in the plot.
Figure 2. AT-richness in the A. psidii genome is caused by TEs. (A) Austropuccinia psidii displays two distinct genome compartments in relation to GC content. Relative GC content of genome regions identified by genes, TEs, 1-kb sliding windows, or as identified by OcculterCut is shown. (B) AT-enriched regions are specific to A. psidii. Relative GC content of genome regions identified by OcculterCut in A. psidii and P. striiformis f. sp. tritici (see also Supplementary Figure S2). (C) Karyoplots of scaffolds APSI_025 and APSI_P027. Gene and repeat density are shown at 20,000-bp windows. Mean GC content of 33.8% is shown with a red line. OcculterCut regions of GC-content segmentation are shown as black lines.
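The windowed GC content referred to in this caption can be illustrated with a short sketch. It assumes non-overlapping windows (the caption gives the 1-kb window size but not the step), and the function names and toy sequence are hypothetical rather than taken from the study.

```python
def gc_fraction(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return float("nan")  # window made up entirely of N or gaps
    return (seq.count("G") + seq.count("C")) / acgt


def window_gc(seq, window=1000, step=1000):
    """Return (start, GC fraction) for each full window along seq.

    The default step equals the window size (non-overlapping windows);
    this is an assumption, since the step is not stated in the caption.
    """
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence: an AT-only block followed by a GC-only block.
toy = "AT" * 10 + "GC" * 10
print(window_gc(toy, window=20, step=20))  # [(0, 0.0), (20, 1.0)]
```

A genome with two GC compartments, as described in panel (A), would show up as a bimodal distribution of these per-window values.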
Figure 3. Depletion of CpG dinucleotides in TEs leads to AT-richness over time and not classic RIP mutations. Austropuccinia psidii does not show classic RIP signatures in its TEs (A, B). (A) TpA/ApT and (B) (CpA + TpG)/(ApC + GpT) ratios in A. psidii and two ascomycetes with and without RIP mutations. Dinucleotide ratios are plotted for regions grouped according to their identity of non-TE, Gypsy, and Copia superfamily. The horizontal dashed black line indicates the median of the dinucleotide ratio in non-TE regions. Deviation from this median in other genome regions indicates RIP mutations as shown in M. brunnea and R. commune. AT-richness and depletion of CpG increase over time in TE sequences of the A. psidii genome (C, D). (C) Percentage GC content of TE consensus sequences grouped by percentage identity relative to the consensus sequence as proxy for TE age. Plot shows younger TEs on the right. Horizontal lines show median and quantiles. Black stars indicate the weighted mean relative to genome coverage. Kruskal–Wallis H-test indicates a significant difference between the samples (P-value < 2.1e−57). Lines with * indicate significant differences with P-values < 1e−5 using Mann–Whitney U tests with multiple testing correction. (D) Mean CpG content per kbp of individual TE insertions grouped by percentage identity relative to the consensus sequence as proxy for TE age. Plot shows younger TEs on the right. Horizontal lines show median and quantiles. Kruskal–Wallis H-test indicates a significant difference between the samples (P-value = 0.0).
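The two dinucleotide ratios in panels (A) and (B) and the CpG density in panel (D) can be computed from overlapping dinucleotide counts. The sketch below is illustrative only: the ratio definitions come from the caption, while the function names, the single-strand counting, and the toy sequence are assumptions, not the authors' implementation.

```python
from collections import Counter


def dinucleotide_counts(seq):
    """Count overlapping dinucleotides on the given strand."""
    seq = seq.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def _ratio(numerator, denominator):
    return numerator / denominator if denominator else float("nan")


def rip_ratios(seq):
    """Return the two dinucleotide ratios named in the caption."""
    c = dinucleotide_counts(seq)
    return {
        "TpA/ApT": _ratio(c["TA"], c["AT"]),
        "(CpA+TpG)/(ApC+GpT)": _ratio(c["CA"] + c["TG"], c["AC"] + c["GT"]),
    }


def cpg_per_kbp(seq):
    """CpG dinucleotides per 1,000 bp of sequence."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    return 1000 * dinucleotide_counts(seq)["CG"] / len(seq)


# Hypothetical toy sequence, far too short for a real analysis.
toy = "TATATCGCAACGT"
print(rip_ratios(toy))
print(cpg_per_kbp(toy))
```

In practice these values would be computed per genome region (non-TE, Gypsy, Copia) and compared against the non-TE median, as the caption describes.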
Telomere repeat regions identified for the A. psidii primary assembly before and after scaffolding
| n (hexamers) | >1000 | 800–999 | 500–799 | 90–499 | <25 | Total |
|---|---|---|---|---|---|---|
| Average length (nt) | 8,159 | 5,223 | 3,754 | 1,409 | 87 | |
| Prescaffold | 6 | 6 | 4 | 8 | 5 | 29 |
| Postscaffold | 10 | 3 | 2 | 9 | 5 | 29 |
Numbers and average length in nucleotides (nt) of contigs/scaffolds with n(TTAGGG) sequences. n = number of hexamers, for example >1,000 means more than 1,000 × (TTAGGG) or more than 6,000 nt.
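To make the n(TTAGGG) notation concrete, the sketch below counts the hexamers in the longest uninterrupted telomere repeat run of a sequence. It is a simplified illustration with hypothetical names: it requires perfect tandem repeats and also checks the reverse-complement motif CCCTAA, neither of which is stated in the source.

```python
import re

# Perfect tandem runs of the telomere hexamer on either strand.
TELOMERE_RUN = re.compile(r"(?:TTAGGG)+|(?:CCCTAA)+")


def longest_telomere_run(seq):
    """Return n, the hexamer count of the longest (TTAGGG)n run in seq."""
    longest_nt = max(
        (m.end() - m.start() for m in TELOMERE_RUN.finditer(seq.upper())),
        default=0,
    )
    return longest_nt // 6


# Hypothetical toy sequences.
print(longest_telomere_run("ACAC" + "TTAGGG" * 5 + "ACAC"))  # 5
print(longest_telomere_run("CCCTAA" * 3))  # 3

# A run in the ">1,000" bin spans more than 1,000 * 6 = 6,000 nt.
print(1000 * 6)  # 6000
```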
Figure 4. Structural annotation comparisons of gene (A) and intergenic (B) length, incorporating UTR, across six species of Pucciniales reveal dramatically large intergenic expansions in A. psidii.
Figure 5. Austropuccinia psidii putative effectors are not found in gene-sparse regions. Nearest-neighbor gene distance density hexplots for three gene categories including all genes, BUSCOs, and candidate effectors. Each subplot represents a distance density hexplot with the log10 3′-flanking and 5′-flanking distance to the nearest-neighboring gene plotted along the x-axis and y-axis, respectively.
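The flanking distances plotted in these hexplots can be derived from gene coordinates on a scaffold. The following is a minimal sketch under stated assumptions: genes are given as hypothetical (start, end, strand) tuples, genes do not overlap, and the 5′ and 3′ sides are assigned by strand. The authors' actual procedure is not described in this caption.

```python
import math


def flanking_distances(genes):
    """Return (five_prime, three_prime) distances in bp for each gene.

    genes: list of (start, end, strand) tuples on one scaffold,
    assumed non-overlapping. A side with no neighboring gene
    (at a scaffold end) is reported as None.
    """
    ordered = sorted(genes)
    result = []
    for i, (start, end, strand) in enumerate(ordered):
        left = start - ordered[i - 1][1] if i > 0 else None
        right = ordered[i + 1][0] - end if i < len(ordered) - 1 else None
        # On the minus strand the 5' side faces higher coordinates.
        five, three = (left, right) if strand == "+" else (right, left)
        result.append((five, three))
    return result


def log10_or_none(distance):
    """log10 of a positive distance, or None when it is undefined."""
    return math.log10(distance) if distance else None


# Hypothetical gene models, not taken from the A. psidii annotation.
toy_genes = [(100, 200, "+"), (1200, 1500, "-"), (11500, 12000, "+")]
for five, three in flanking_distances(toy_genes):
    print(log10_or_none(five), log10_or_none(three))
```

Each gene with both neighbors contributes one point to the hexplot, with the log10 3′ distance on the x-axis and the log10 5′ distance on the y-axis.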
Comparative assembly data across several available Pucciniales genomes
| Common name (rust) | A.p. | P.s.t. | P.t. | P.g. | P.c. | P.s. | M.l-p. |
|---|---|---|---|---|---|---|---|
| NCBI/ENA project | | PRJNA396589 | PRJNA36323 | PRJNA18535 | PRJNA398546 | PRJNA277993 | PRJNA242542 |
| Genome (Mbp) | | 83 | 135 | 89 | 150 | 100 | 101 |
| Scaffolds | | 156 | 14,818 | 393 | 1,636 | 15,715 | 462 |
| N50 | | 1,304,018 | 10,369 | 39,493 | 163,229 | 19,078 | 112,314 |
| L50 | | 57 | 2,866 | 557 | 241 | 1,530 | 265 |
| Repeats (%) | | 54 | 40 | nd | nd | nd | 45 |
| Coding genes | | 15,928 | 15,685 | 15,979 | 26,323 | 21,078 | 16,372 |
| GC (%) | | 44 | 37 | 44 | 45 | 45 | 41 |
Abbreviated names indicated; A.p. (Austropuccinia psidii pandemic biotype from the current study in bold), P.s.t. (Puccinia striiformis f. sp. tritici), P.t. (P. triticina), P.g. (P. graminis f. sp. tritici), P.c. (P. coronata), P.s. (P. sorghi), and M.l-p. (M. larici-populina). nd, no data found.
Predicted coding genes.
Figure 6. (A) Protein–protein species rooted tree based on multiple species alignment of orthogroups identified with Orthofinder (Emms and Kelly 2019). Scale represents substitutions per site and internal node values are species tree inference from all genes (STAG) supports (Emms and Kelly 2018). (B) Protein–protein comparisons across rust fungal species with each concentric ring indicating a different species. Numbers external to the rings represent counts of ortholog groups and numbers within each concentric ring represent the number of ortholog genes in that species per section. The figure shows every possible combination of species included in this proteome ortholog analysis, using concentric circles to graphically present an overview of “closeness” between the genomes. The species color code is consistent for Figures 5 and 6.