Ramisah Mohd Shah, Angela H Williams, James K Hane, Julie A Lawrence, Lina M Farfan-Caceres, Johannes W Debler, Richard P Oliver, Robert C Lee.
Abstract
Ascochyta rabiei is the causal organism of ascochyta blight of chickpea and is present in chickpea crops worldwide. Here we report the release of a high-quality PacBio genome assembly for the Australian A. rabiei isolate ArME14. We compare the ArME14 genome assembly with an Illumina assembly for the Indian A. rabiei isolate ArD2. The ArME14 assembly has gapless sequences for nine chromosomes with telomere sequences at both ends and 13 large contig sequences that extend to one telomere. The total length of the ArME14 assembly was 40,927,385 bp, which was 6.26 Mb longer than the ArD2 assembly. Division of the genome by OcculterCut into GC-balanced and AT-dominant segments reveals that 21% of the genome consists of gene-sparse, AT-rich isochores. Transposable elements and repetitive DNA sequences in the ArME14 assembly made up 15% of the genome. A total of 11,257 protein-coding genes were predicted, compared with 10,596 for ArD2. Many of the predicted genes missing from the ArD2 assembly were in genomic regions adjacent to AT-rich sequence. We compared the complement of predicted transcription factors and secreted proteins for the two A. rabiei genome assemblies and found that the isolates contain almost the same set of proteins. The small number of differences could represent real differences in the gene complement between isolates or could result from the different sequencing methods used. Prediction pipelines were applied for carbohydrate-active enzymes, secondary metabolite clusters and putative protein effectors. We predict that ArME14 contains between 450 and 650 CAZymes, 39 putative protein effectors and 26 secondary metabolite clusters.
Keywords: Dothideomycetes; PacBio; Pleosporales; chickpea; plant pathogen
Year: 2020 PMID: 32345704 PMCID: PMC7341154 DOI: 10.1534/g3.120.401265
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Summary assembly and annotation statistics for Illumina sequencing of A. rabiei isolate ArD2 (Verma et al.) and PacBio SMRT sequencing of ArME14
| Assembly statistics | ArD2 Illumina (13) | ArME14 PacBio SMRT |
|---|---|---|
| Genome size (bp) | 34,658,250 | 40,927,385 |
| Total sequenced bases | 100 Gb | ∼6.8 Gb |
| Coverage | 178x | 166x (928,353 reads) |
| Number of scaffolds/contigs | 338 | 33 |
| Largest scaffold/contig size (bp) | 1,160,210 | 3,373,759 |
| L50 | 64 | 9 |
| N50 (bp) | 154,808 | 1,812,190 |
| GC (%) | 51.6 | 49.2 |
| % Repetitive sequence | 9.9 | 12.6 |
| Complete chromosomes | — | 12 |
| Number of protein coding genes | 10,596 | 11,257 |
| Predicted secreted proteins | 758 | 1,145 |
| Predicted effectors | 328 | 39 |
| Predicted sec. metabolite clusters | 26 | 26 |
| Predicted no. of CAZymes | 1,727 | 451 |
Scaffolds for ArD2 Illumina assembly (GCA_001630375.1).
Contigs for ArME14 PacBio SMRT assembly.
Differences in numbers are likely due largely to different selection criteria.
Secretome and effector predictions for ArD2 assembly using the same methods applied to ArME14.
Unknown prediction method for secondary metabolite clusters.
CAZyme prediction using dbCAN2 meta server in this study.
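The N50, L50 and coverage rows in the table above follow standard definitions, which can be sketched as below. This is a minimal illustration and not the authors' pipeline: the function names `n50_l50` and `mean_coverage` and the toy contig lengths are assumptions introduced here, and only the ArME14 genome size and approximate sequenced bases are taken from the table.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths in bp.

    N50 is the length of the contig at which the running total of
    lengths, sorted from longest to shortest, first reaches half of the
    assembly size. L50 is the number of contigs needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must be non-empty")


def mean_coverage(total_bases, genome_size):
    """Mean sequencing depth: sequenced bases divided by assembly size."""
    return total_bases / genome_size


# Toy contig lengths (hypothetical, for illustration only).
toy_contigs = [50, 40, 30, 20, 10]
print(n50_l50(toy_contigs))

# ArME14 values from the table: about 6.8 Gb of reads, 40,927,385 bp assembly.
print(round(mean_coverage(6.8e9, 40_927_385)))
```

With the ArME14 values, the computed depth rounds to the 166x reported in the Coverage row.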
Figure 1. Genome contigs for the reference assembly of A. rabiei ArME14, produced from PacBio SMRT sequencing with polishing using Illumina sequencing. Nuclear contigs are labeled ctg01 to ctg33 as archived in NCBI BioProject PRJNA510692 and the mitochondrial contig is labeled mito. Gene-dense regions of the genome are shown as dark-shaded blocks, joined by gene-sparse and interspersed repeat-rich regions indicated by thin lines. Telomeres are indicated in the figure by triangles at the ends of respective contigs.
Figure 2. Alignment of A. rabiei ArD2 scaffolds to the 33 nuclear ArME14 contigs using NUCMER. Unique matches representing homologous nucleotide sequences between the two assemblies are indicated in blue. Non-unique matches, which mark the repeat-rich sequence characteristic of repetitive and AT-rich genomic regions, are shown in red. Presumed non-assembled or absent sections of the ArD2 genome are represented by white space along each of the ArME14 reference contigs.
Transposable element and repetitive DNA sequences from A. rabiei ArME14
| Class | Type | Number of sequences | % of total sequences | Total nucleotides (bp) | % of total nucleotides | Average size (bp) |
|---|---|---|---|---|---|---|
| Class I | LTR | 780 | 43 | 3336780 | 54 | 4278 |
| Class I | LINE | 177 | 9.7 | 450418 | 7.3 | 2545 |
| Class I | LARD | 26 | 1.4 | 194939 | 3.2 | 7498 |
| Class I | TRIM | 5 | 0.3 | 5668 | 0.1 | 1134 |
| Class I | SINE | 2 | 0.1 | 1034 | 0.02 | 517 |
| Class II | TIR | 683 | 38 | 1820841 | 30 | 2666 |
| Class II | Helitron | 44 | 2.4 | 171920 | 2.8 | 3907 |
| Class II | MITE | 7 | 0.4 | 4193 | 0.1 | 599 |
| | No cat | 82 | 4.5 | 137974 | 2.2 | 1682 |
| | Host gene | 4 | 0.2 | 8223 | 0.1 | 2056 |
| | SSR | 6 | 0.3 | 5102 | 0.1 | 850 |
“No Cat” and “Host gene” are categories assigned by the PiRATE Galaxy server, and describe unclassified (no category) and potential host gene, respectively.
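The percentage and average-size columns of the table above are derived from the sequence counts and nucleotide totals, so they can be recomputed as a consistency check. The sketch below does this under stated assumptions: the names `TE_ROWS`, `GENOME_SIZE` and `summarize` are introduced here for illustration, and the input figures are copied from the table and from the ArME14 assembly size given in the abstract.

```python
# (type, number of sequences, total nucleotides) as listed in the table.
TE_ROWS = [
    ("LTR", 780, 3_336_780),
    ("LINE", 177, 450_418),
    ("LARD", 26, 194_939),
    ("TRIM", 5, 5_668),
    ("SINE", 2, 1_034),
    ("TIR", 683, 1_820_841),
    ("Helitron", 44, 171_920),
    ("MITE", 7, 4_193),
    ("No cat", 82, 137_974),
    ("Host gene", 4, 8_223),
    ("SSR", 6, 5_102),
]
GENOME_SIZE = 40_927_385  # ArME14 assembly length in bp


def summarize(rows):
    """Recompute the derived columns from counts and nucleotide totals."""
    total_seqs = sum(n for _, n, _ in rows)
    total_nt = sum(nt for _, _, nt in rows)
    derived = {}
    for name, n, nt in rows:
        derived[name] = {
            "pct_sequences": 100 * n / total_seqs,
            "pct_nucleotides": 100 * nt / total_nt,
            "average_size": nt / n,
        }
    return total_seqs, total_nt, derived


total_seqs, total_nt, derived = summarize(TE_ROWS)
for name, cols in derived.items():
    print(f"{name:10s} {cols['pct_sequences']:5.1f} "
          f"{cols['pct_nucleotides']:5.2f} {cols['average_size']:7.0f}")
print(f"share of genome: {100 * total_nt / GENOME_SIZE:.1f}%")
```

The summed nucleotides come to roughly 15% of the assembly, in line with the figure quoted in the abstract.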
Figure 3. Summary of genome compartmentalisation into GC-equilibrated and AT-rich regions of the A. rabiei genome assemblies. (A) GC content (%) distribution for Dothideomycete genome assemblies as calculated by OcculterCut (Testa et al.). (B) Size distributions of AT-rich and GC-balanced regions for A. rabiei ArME14 (note logarithmic scale).
Summary of potential pathogenicity genome features, including secondary metabolite clusters, predicted effector genes and CAZyme genes identified from the A. rabiei ArME14 genome assembly. Detailed tables are provided in Supplementary File, File_S4.
| Class | Number |
|---|---|
| Predicted effectors | |
| EffP > 0.8, MW < 25 kDa | 39 |
| EffP > 0.8, MW < 15 kDa | 27 |
| EffP > 0.9, MW < 15 kDa | 15 |
| Secondary metabolite clusters (total) | 26 |
| T1PKS | 7 |
| T3PKS | 1 |
| NRPS | 2 |
| NRPS-like | 7 |
| NRPS/NRPS-like – T1PKS | 4 |
| Indole | 1 |
| Terpene | 4 |
| CAZymes (total) | 451 |
| AA - Auxiliary activities | 77 |
| CBM - Carbohydrate-binding module | 3 |
| CE - Carbohydrate esterase | 31 |
| GH - Glycoside hydrolase | 227 |
| GT - Glycosyl transferase | 82 |
| PL - Polysaccharide lyase | 31 |
MW is the molecular weight of the mature protein.
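The three effector rows in the table above apply progressively stricter cut-offs to the same candidate set. A minimal sketch of that kind of threshold filter follows. It assumes that "EffP" is a per-protein effector prediction score between 0 and 1; the `SecretedProtein` record, the `count_candidates` function and the toy proteins are all hypothetical and do not reproduce the study's data or pipeline.

```python
from dataclasses import dataclass


@dataclass
class SecretedProtein:
    name: str
    effector_score: float  # assumed meaning of "EffP": a score in [0, 1]
    mature_mw_kda: float   # molecular weight of the mature protein, in kDa


def count_candidates(proteins, min_score, max_mw_kda):
    """Count proteins scoring above min_score and lighter than max_mw_kda."""
    return sum(
        1
        for p in proteins
        if p.effector_score > min_score and p.mature_mw_kda < max_mw_kda
    )


# Toy records, invented for illustration only.
toy = [
    SecretedProtein("p1", 0.95, 12.0),
    SecretedProtein("p2", 0.85, 14.0),
    SecretedProtein("p3", 0.85, 20.0),
    SecretedProtein("p4", 0.70, 10.0),
    SecretedProtein("p5", 0.95, 30.0),
]

# The three threshold pairs used in the table.
for min_score, max_mw in [(0.8, 25), (0.8, 15), (0.9, 15)]:
    print(min_score, max_mw, count_candidates(toy, min_score, max_mw))
```

Because each threshold pair is at least as strict as the one before it, the counts can only stay the same or fall, which matches the 39, 27 and 15 pattern in the table.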
Figure 4. CIRCOS plot of key features of the A. rabiei ArME14 reference genome. Tracks labeled from outside: (A) The 26 largest contigs with size scale in Mb; (B) Annotated genes in scale (green) and telomeres (not to scale) (blue); (C) Locations (not to scale) of predicted effector genes (red), CAZyme genes (green), and predicted secondary metabolite clusters (purple and in scale); (D) Gene density with 20 kb moving average window; (E) Percent GC content with 50 kb moving average window; (F) Transposable elements and repetitive DNA sequence regions: LINE, LTR, LARD, SINE and TRIM elements (blue), TIR, Helitron and MITE elements (red), SSRs (orange), No category, LTR/TIR, PLE/LARD and potential host gene (gray); (G) TE and repetitive DNA density with 20 kb moving average window. An SVG version of this figure (Figure_S2) is included in the Supplementary Data to enable closer inspection of genome features.
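Track (E) plots percent GC content over a moving window along each contig. The sketch below shows one simple way such a windowed GC profile could be computed. It is an assumption-laden illustration, not the script used for the figure and not the OcculterCut algorithm; the function name `gc_percent_windows` and the toy sequence are introduced here.

```python
def gc_percent_windows(sequence, window, step):
    """Return (start, GC%) pairs for fixed-size windows along a sequence.

    Windows that would run past the end of the sequence are skipped.
    Any character other than G or C (including N) counts as non-GC.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = sequence.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        profile.append((start, 100 * gc / window))
    return profile


# Toy sequence: a GC-only block followed by an AT-only block.
toy_sequence = "GGCCGGCC" + "ATATATAT"
print(gc_percent_windows(toy_sequence, window=8, step=8))
```

On real contigs the window would be set to the size quoted in the caption (50 kb), with a smaller step if overlapping windows are wanted.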