Soumya Rao, Saphy Sharda, Vineesha Oddi, Madhusudan R Nandineni.
Abstract
The ascomycete fungus Colletotrichum truncatum is a major phytopathogen with a broad host range and causes anthracnose disease of chilli. Sequencing of its genome led to the discovery of functional categories of genes that may play important roles in fungal pathogenicity. However, gaps in the C. truncatum draft assembly prevented accurate prediction of repetitive elements, which are key determinants of genome architecture and drivers of evolution and host adaptation. We re-sequenced the genome using single-molecule real-time (SMRT) sequencing technology to obtain a refined assembly with fewer and smaller gaps and ambiguities. This enabled us to study the genome architecture by characterising repetitive sequences such as transposable elements (TEs) and simple sequence repeats (SSRs), which constituted 4.9% and 0.38% of the assembled genome, respectively. Comparative analysis among Colletotrichum species revealed extensive repeat-rich regions, dominated by the Gypsy superfamily of long terminal repeat (LTR) elements, and differences in the SSR composition of their genomes. Our study revealed a recent burst of LTR amplification in C. truncatum, C. higginsianum, and C. scovillei. TEs in C. truncatum were significantly associated with the secretome, effectors, and genes in secondary metabolism clusters. Some TE families in C. truncatum showed cytosine-to-thymine transitions indicative of repeat-induced point mutation (RIP). C. orbiculare and C. graminicola showed strong signatures of RIP across their genomes and "two-speed" genomes with extensive AT-rich, gene-sparse regions. Comparative genomic analyses of Colletotrichum species provided insight into species-specific SSR profiles. Analysis of SSRs in coding and non-coding regions revealed the composition of trinucleotide repeat motifs in exons, which have the potential to alter the structure of the translated protein through amino acid repeats.
This is the first genome-wide study of TEs and SSRs in C. truncatum and their comparison with six other Colletotrichum species. It should serve as a useful resource for future research into the potential role of TEs in genome expansion and evolution of Colletotrichum fungi, and for the development of SSR-based molecular markers for population genomic studies.
Keywords: Colletotrichum truncatum; comparative genomics; repetitive DNA sequences; simple sequence repeats (SSRs); transposable elements (TEs); whole genome sequence
Year: 2018 PMID: 30337918 PMCID: PMC6180176 DOI: 10.3389/fmicb.2018.02367
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
The Colletotrichum species used for the comparative analysis of TEs.
| Organism | Host | Country of isolation | Genome size (Mb) | GC% | Number of scaffolds | Availability of gene annotation (.gff) file | Accession number (NCBI/dryad) |
|---|---|---|---|---|---|---|---|
| C. truncatum | | India | 57.91 | 49.38 | 70 | Yes | NBAU02000000 |
| | | United States | 50.91 | 49.1 | 654 | Yes | ACOD00000000.1 |
| | | Trinidad and Tobago | 50.72 | 54.4 | 25 | Yes | LTAN00000000.1 |
| | | South Korea | 52.13 | 51.7 | 34 | No | LUXP01000000.1 |
| | | Japan | 52.33 | 50.06 | 512 | Yes | MPGH00000000.1 |
| | | United Kingdom | 48.56 | 51.1 | 321 | Yes | MJBS00000000.1 |
| | | Japan | 90.83 | 37.52 | 526 | No | dryad.45076 |
Summary of C. truncatum assemblies.
| Statistics | Illumina | Illumina + PacBio |
|---|---|---|
| Number of scaffolds | 80 | 70 |
| Total length (Mb) | 55.37 | 57.91 |
| Mean scaffold length (kb) | 683.5 | 827 |
| Number of gaps | 6,793 | 3,738 |
| Total gap length (Mb) | 2.6 | 1.3 |
| Mean gap length (bp) | 387 | 351 |
| Gap content (% of total length) | 4.75% | 2.26% |
| Protein coding genes | 13,724 | 13,768 |
| Secretome | 1245 | 1213 |
| Effectors | 310 | 311 |
| SM clusters | 73 | 64 |
| Complete genes | 3563 | 3576 |
| Fragmented genes | 150 | 141 |
| Missing genes | 12 | 8 |
The composition of major families of TEs in C. truncatum.
| Class | Family | Count | Size (bp) | Proportion of genome (%) |
|---|---|---|---|---|
| Total sequences | | 70 | 57912832 | |
| Ancestral repeats | | 19879 | 3446297 | 5.95 |
| Lineage specific repeats | | 259 | 74689 | 0.13 |
| Total repeats | | 20138 | 3520986 | 6.08 |
| Total TEs | | 6035 | 2831668 | 4.89 |
| LTR | Gypsy | 833 | 1350797 | 2.33 |
| LTR | Copia | 195 | 397087 | 0.69 |
| LINE | CRE-Cnl1 | 56 | 60017 | 0.10 |
| LINE | CRE | 30 | 22296 | 0.04 |
| SINE | | 120 | 10372 | 0.02 |
| DNA | MULE-MuDR | 52 | 93987 | 0.16 |
| DNA | TcMar-Fot1 | 27 | 21534 | 0.04 |
| DNA | PiggyBac | 70 | 13388 | 0.02 |
| Unknown | | 1804 | 559668 | 0.97 |
| Unspecified | | 2848 | 302522 | 0.52 |
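The proportions in this table are consistent with dividing each element's size by the total assembly length (57,912,832 bp). The following is a minimal sketch that recomputes a few rows; the names `GENOME_SIZE_BP`, `TE_SIZES_BP`, and `genome_proportion` are illustrative and do not come from the study.

```python
# Total assembly length and element sizes (bp), copied from the table above.
GENOME_SIZE_BP = 57_912_832

TE_SIZES_BP = {
    "Total repeats": 3_520_986,
    "Total TEs": 2_831_668,
    "Gypsy": 1_350_797,
    "Copia": 397_087,
}


def genome_proportion(size_bp: int, genome_bp: int = GENOME_SIZE_BP) -> float:
    """Return size_bp as a percentage of genome_bp, rounded to two decimals."""
    return round(100.0 * size_bp / genome_bp, 2)


# Hypothetical helper structure: percentage of the genome per category.
proportions = {name: genome_proportion(size) for name, size in TE_SIZES_BP.items()}

if __name__ == "__main__":
    for name, pct in proportions.items():
        print(f"{name}: {pct}%")
```

Rounded to two decimals, the recomputed values agree with the reported column for these four rows.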
The comparison of major TE families among Colletotrichum species (values are the proportion of each genome, %).
| TE family | C. truncatum | | | | | | |
|---|---|---|---|---|---|---|---|
| Gypsy | 2.33 | 1.23 | 5.18 | 3.94 | 1.51 | 6.12 | 3.41 |
| Copia | 0.69 | 0.68 | 3.16 | 12.08 | - | 2.38 | - |
| TcMar-Fot1 | 0.04 | 1.63 | 2.17 | - | 0.65 | 0.26 | 0.48 |
| Unknown | 0.97 | 1.03 | 3.56 | 26.41 | 1.69 | 0.15 | 0.26 |
| Unspecified | 0.52 | 0.46 | 0.38 | 0.19 | 0.41 | 0.43 | 0.40 |
| Total TEs | 4.89 | 6.01 | 14.79 | 44.88 | 4.31 | 9.54 | 5.41 |
The analysis of RIP indices and dinucleotide bias in TE families.
| Repeat family | TpA/ApT index | (CpA + TpG)/(ApC + GpT) index | Dinucleotide bias |
|---|---|---|---|
| Copia-1 | 1.64309 | 0.123305 | CpT |
| Copia-2 | 1.655783 | 0.12236 | CpT |
| Copia-3 | 1.66795 | 0.091286 | CpT |
| Gypsy-1 | 1.622181 | 0.263649 | CpA |
| Gypsy-2 | 1.676275 | 0.144421 | CpA and CpT |
| Gypsy-3 | 1.589147 | 0.420066 | CpA |
| Gypsy-4 | 1.611704 | 0.119303 | CpT |
| Gypsy-5 | 1.610562 | 0.081724 | CpT |
| Gypsy-6 | 1.589147 | 0.420066 | CpA |
| MULE-MuDR-1 | 0.929878 | 1.055794 | CpA |
| MULE-MuDR-2 | 1.287078 | 0.670805 | CpA |
| Tc-Mariner-1 | 1.136585 | 1.213974 | CpA |
| Tc-Mariner-2 | 1.409766 | 0.748148 | CpA |
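The two indices in this table are ratios of dinucleotide counts. The sketch below shows one way to compute them from a DNA sequence; it is not the pipeline used in the study. The function names are hypothetical, and the default cut-offs in `looks_ripped` (TpA/ApT above 0.89 and (CpA + TpG)/(ApC + GpT) below 1.03) are commonly cited values from the RIP literature, assumed here because this record does not state which thresholds were applied.

```python
from collections import Counter
from typing import Tuple


def dinucleotide_counts(seq: str) -> Counter:
    """Count overlapping dinucleotides in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def _ratio(numerator: int, denominator: int) -> float:
    """Divide safely, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def rip_indices(seq: str) -> Tuple[float, float]:
    """Return (TpA/ApT, (CpA + TpG)/(ApC + GpT)) for a sequence."""
    counts = dinucleotide_counts(seq)
    product_index = _ratio(counts["TA"], counts["AT"])
    substrate_index = _ratio(counts["CA"] + counts["TG"],
                             counts["AC"] + counts["GT"])
    return product_index, substrate_index


def looks_ripped(product_index: float, substrate_index: float,
                 product_min: float = 0.89,
                 substrate_max: float = 1.03) -> bool:
    """Flag a sequence as RIP-like under the ASSUMED default cut-offs."""
    return product_index > product_min and substrate_index < substrate_max


if __name__ == "__main__":
    # Index values copied from the table above.
    print("Copia-1:", looks_ripped(1.64309, 0.123305))
    print("MULE-MuDR-1:", looks_ripped(0.929878, 1.055794))
```

Under these assumed cut-offs, the Copia and Gypsy families in the table would be flagged as RIP-like, whereas MULE-MuDR-1 and Tc-Mariner-1 would not, because their second index exceeds 1.03.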