Shahin S Ali, Asman Asman, Jonathan Shao, Amanda P Firmansyah, Agung W Susilo, Ade Rosmana, Peter McMahon, Muhammad Junaid, David Guest, Tee Yei Kheng, Lyndel W Meinhardt, Bryan A Bailey.
Abstract
BACKGROUND: Ceratobasidium theobromae, a member of the family Ceratobasidiaceae, is the causal agent of vascular-streak dieback (VSD) of cacao, a major threat to the chocolate industry in South-East Asia. This fastidious pathogen is very hard to isolate and maintain in pure culture, which is a major bottleneck in studying its genetic diversity and genome.
RESULTS: This study describes, for the first time, a 33.90 Mbp de novo assembled genome of a putative C. theobromae isolate from cacao. Ab initio gene prediction identified 9264 protein-coding genes, of which 800 are unique to C. theobromae when compared with Rhizoctonia spp., a closely related group. Transcriptome analysis of RNA isolated from 4 independent VSD-symptomatic cacao stems identified 3550 transcriptionally active genes when compared with the assembled C. theobromae genome, whereas transcripts of only 4 C. theobromae genes were detected in 2 asymptomatic stems. De novo assembly of the non-cacao reads from the VSD-symptomatic stems consistently produced genes with higher identity to predicted genes in the C. theobromae genome than to genes of Rhizoctonia spp. or other genes in GenBank. Further analysis of the predicted C. theobromae transcriptome identified CAZy gene classes, KEGG pathway-associated genes, and 138 putative effector proteins.
Keywords: Ceratobasidiaceae; Chocolate; RNA-Seq; Rhizoctonia; VSD
Year: 2019 PMID: 31583107 PMCID: PMC6767637 DOI: 10.1186/s40694-019-0077-6
Source DB: PubMed Journal: Fungal Biol Biotechnol ISSN: 2054-3085
Fig. 1 Vascular streak dieback (VSD) symptoms on cacao trees. a Branches killed by VSD on mature cacao trees. b Visible streaks in the xylem of young cacao stems. c, d Infection in transverse sections of cacao stems
Fig. 2 Molecular phylogenetic analysis of Ceratobasidium theobromae isolated from vascular streak dieback (VSD)-symptomatic cacao samples from Soppeng (CT), in planta strains present in VSD-symptomatic cacao samples from Luwu District, South Sulawesi Province, Indonesia (INS), a C. theobromae isolate previously reported from Indonesia [11], and different Rhizoctonia spp. and Ceratobasidium spp. representing 30 anastomosis groups. The distance tree with the highest log likelihood (−924.178), based on 1000 bootstrapped data sets, is shown. Branch lengths are measured in substitutions per site.
Clustering of the initial 105,861 genome assembly contigs (44.64 Mbp) into genome bins using MetaBAT2
| Genome bin | Bin 1 | Bin 2 | Bin 3 | Bin 4 |
|---|---|---|---|---|
| Closest representationᵃ | | | | |
| Total contig length (bp) | 28,287,668 | 496,199 | 368,868 | 2,326,605 |
| Total contig number | 600 | 5 | 8 | 447 |
| BUSCO completeness (%) | 93.4 | NA | NA | 83.5 |
| Max contig size (bp) | 589,277 | 263,186 | 109,092 | 30,075 |
| Min contig size (bp) | 2653 | 25,328 | 9627 | 1509 |
| Mean contig size (bp) | 47,146 | 99,240 | 46,109 | 5205 |
| N50 contig length (bp) | 93,535 | 263,186 | 81,760 | 6567 |
| Mean GC content (%) | 49.33 | 49.23 | 50.41 | 72.25 |
| Max GC content (%) | 54.13 | 49.61 | 51.48 | 79.16 |
| Min GC content (%) | 44.57 | 47.88 | 49.44 | 64.56 |
ᵃ Based on a BLASTx search of the contigs against the NCBI nr database
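The N50 and mean contig size rows follow the usual definitions; for example, the Bin 1 mean is simply 28,287,668 bp / 600 contigs ≈ 47,146 bp. The Python sketch below shows how such per-bin statistics can be computed from a list of contig lengths. The function name `contig_stats` is hypothetical and the input lengths are toy values, not data from the study.

```python
from typing import Iterable


def contig_stats(lengths: Iterable[int]) -> dict:
    """Summary statistics of the kind reported for each genome bin."""
    sizes = sorted(lengths, reverse=True)  # longest contig first
    if not sizes:
        raise ValueError("at least one contig length is required")
    total = sum(sizes)

    # N50: walk from the longest contig downwards and report the length of the
    # contig at which the running total first reaches half of the total length.
    running = 0
    n50 = sizes[0]
    for size in sizes:
        running += size
        if 2 * running >= total:
            n50 = size
            break

    return {
        "total_length": total,
        "contig_number": len(sizes),
        "max": sizes[0],
        "min": sizes[-1],
        "mean": round(total / len(sizes)),
        "n50": n50,
    }


# Toy contig lengths (not data from the study).
print(contig_stats([100, 80, 50, 30, 20]))
# {'total_length': 280, 'contig_number': 5, 'max': 100, 'min': 20, 'mean': 56, 'n50': 80}
```

When one contig holds more than half of a bin's total length, as in Bin 2, the N50 equals the maximum contig size.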
Fig. 3 Detection of bacterial contamination in the initial genome assembly of Ceratobasidium theobromae. a Percentage of small contigs (< 200 bp), of contigs sorted into bins by MetaBAT2, and of contigs with bacterial or Rhizoctonia BLAST hits. b, c Distribution of GC content in the sequences classified as bacterial and as putative C. theobromae
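The GC-content panels rest on a simple per-sequence calculation. In the bin table, Bin 4 (mean GC 72.25%, minimum 64.56%) does not overlap Bins 1 to 3 (maximum 54.13%), so a GC screen can act as a quick cross-check. The sketch below illustrates that idea only; it is not the procedure used in the study, which relied on MetaBAT2 binning and BLAST hits. The function names and the 60% threshold are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Percentage of G + C among the unambiguous A/C/G/T bases of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / acgt


def flag_high_gc(contigs: dict, threshold: float = 60.0) -> dict:
    """Map each contig name to True if its GC% exceeds `threshold`.

    The 60% default is a hypothetical cut-off placed between the GC ranges of
    Bins 1-3 and Bin 4; it is not a parameter reported in the study.
    """
    return {name: gc_content(seq) > threshold for name, seq in contigs.items()}


# Toy sequences, not contigs from the assembly.
print(flag_high_gc({"contig_a": "ATGCAT", "contig_b": "GCGCGA"}))
# {'contig_a': False, 'contig_b': True}
```

Ambiguous bases such as N are excluded from the denominator so that gaps in scaffolds do not dilute the GC percentage.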
Genome assembly and annotation statistics of Ceratobasidium theobromae
| Statistic | Value |
|---|---|
| Total contig length (bp) | 33,899,105 |
| Contig number | 6878 |
| BUSCO completeness (%) | 99.1 |
| GC content (%) | 44.81 |
| N50 contig length (bp) | 70,517 |
| Max contig size (bp) | 589,277 |
| Min contig size (bp) | 200 |
| Mean contig size (bp) | 4930 |
| Gene number | 9264 |
| Total gene length (bp) | 20,074,964 |
| Average gene length (bp) | 2168.07 |
| Gene densityᵃ | 0.592 |
| Number of expressed genesᵇ | 3550 |
| Genes with GO annotationᶜ | 5364 |
| Genes within KEGG pathways | 3055 |
ᵃ CDS bases / total genome bases
ᵇ Only gene models with ≥ 10 raw reads in at least one of the infected plant samples
ᶜ Gene models with E < 10⁻⁵ in a BLASTn search against the UniProt Gene Ontology database
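Two of the rows above are defined by simple rules in the footnotes. The sketch below applies them; the function names and the read-count layout are hypothetical. It also assumes that the "Total gene length" row counts CDS bases, which is consistent with the reported density (20,074,964 / 33,899,105 ≈ 0.592).

```python
def gene_density(cds_bases: int, genome_bases: int) -> float:
    """Gene density as defined in the footnote: CDS bases / total genome bases."""
    return cds_bases / genome_bases


def expressed_genes(counts: dict, min_reads: int = 10) -> set:
    """Gene IDs with at least `min_reads` raw reads in any infected sample.

    `counts` is a hypothetical layout: gene ID -> list of raw read counts,
    one entry per infected stem sample.
    """
    return {
        gene
        for gene, per_sample in counts.items()
        if any(n >= min_reads for n in per_sample)
    }


# Assumes "Total gene length" counts CDS bases; this reproduces the reported 0.592.
print(round(gene_density(20_074_964, 33_899_105), 3))  # 0.592
print(expressed_genes({"g1": [0, 12, 3, 0], "g2": [9, 9, 9, 9]}))  # {'g1'}
```

Note that the expression rule is "any sample", not a total across samples: `g2` has 36 reads overall but never reaches 10 in a single sample, so it is not counted.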
List of filtered transcriptome sequence variants (QUAL ≥ 999, DP ≥ 30, and GQ ≥ 40) in RNA-Seq libraries from VSD-infected stems
| RNA-Seq library | No. of RNA reads in library | Homozygous alternate SNPs | Homozygous alternate INDELs | Heterozygous SNPs | Heterozygous INDELs |
|---|---|---|---|---|---|
| INS_19 | 108,224 | 258 | 39 | 10 | 0 |
| INS_23 | 283,192 | 620 | 135 | 26 | 0 |
| INS_31 | 888,258 | 1357 | 280 | 34 | 2 |
| INS_57 | 109,242 | 279 | 37 | 16 | 4 |
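Each row of this table is a per-library tally of called variants that pass all three quality thresholds, split by genotype (homozygous alternate vs. heterozygous) and by variant type (SNP vs. INDEL). A minimal Python sketch of that tally follows. The `Variant` record and helper names are hypothetical, VCF parsing and the variant caller are out of scope (neither is named here), and each record is assumed to carry a single ALT allele.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Variant:
    """One biallelic call; a hypothetical stand-in for a parsed VCF record."""
    ref: str
    alt: str
    qual: float  # site quality (QUAL)
    dp: int      # read depth (DP)
    gq: int      # genotype quality (GQ)
    gt: str      # genotype, e.g. "1/1" or "0/1"


def passes_filter(v: Variant) -> bool:
    # Thresholds from the table: QUAL >= 999, DP >= 30 and GQ >= 40.
    return v.qual >= 999 and v.dp >= 30 and v.gq >= 40


def zygosity(gt: str) -> Optional[str]:
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles or all(a == "0" for a in alleles):
        return None  # missing or homozygous reference: not counted
    return "homozygous_alt" if len(set(alleles)) == 1 else "heterozygous"


def variant_type(ref: str, alt: str) -> str:
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    return "INDEL" if len(ref) != len(alt) else "other"


def tally(variants: Iterable[Variant]) -> Counter:
    """Count filtered variants by (zygosity, type), as in the table columns."""
    counts: Counter = Counter()
    for v in variants:
        if not passes_filter(v):
            continue
        z = zygosity(v.gt)
        if z is not None:
            counts[(z, variant_type(v.ref, v.alt))] += 1
    return counts


calls = [
    Variant("A", "G", 1000, 40, 50, "1/1"),   # kept: homozygous alternate SNP
    Variant("A", "AT", 1200, 35, 45, "0/1"),  # kept: heterozygous INDEL
    Variant("C", "T", 500, 40, 50, "1/1"),    # dropped: QUAL < 999
]
print(tally(calls))
```

Multi-nucleotide substitutions (equal-length REF and ALT longer than one base) fall into an "other" bucket rather than being forced into either table column.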
Number of genes in CAZyme families of Ceratobasidium theobromae
| CAZyme familyᵃ | Total genes | Secretedᵇ | Non-secretedᵇ |
|---|---|---|---|
| AA3 | 17 | 5 | 12 |
| AA1 | 15 | 13 | 2 |
| AA5 | 12 | 8 | 4 |
| AA9 | 22 | 22 | 0 |
| Other AA (5) | 14 | 8 | 6 |
| CBM1 | 15 | 7 | 8 |
| CBM13 | 24 | 15 | 9 |
| CBM5 | 11 | 7 | 4 |
| Other CBM (8) | 22 | 12 | 10 |
| CE4 | 18 | 14 | 4 |
| CE16 | 13 | 5 | 8 |
| CE8 | 11 | 10 | 1 |
| Other CE (6) | 17 | 11 | 6 |
| GH0 | 13 | 7 | 6 |
| GH43 | 15 | 7 | 8 |
| GH13 | 13 | 6 | 7 |
| GH3 | 12 | 6 | 6 |
| GH28 | 19 | 17 | 2 |
| GH5 | 24 | 13 | 11 |
| GH16 | 30 | 16 | 14 |
| GH18 | 19 | 7 | 12 |
| GH7 | 12 | 10 | 2 |
| Other GH (44) | 134 | 71 | 63 |
| GT2 | 15 | 0 | 15 |
| GT4 | 15 | 0 | 15 |
| Other GT (28) | 70 | 5 | 65 |
| PL1 | 26 | 23 | 3 |
| PL3_2 | 19 | 17 | 2 |
| Other PL (5) | 16 | 11 | 5 |
ᵃ Numbers in parentheses indicate how many CAZyme families are grouped in that row
ᵇ As determined by SignalP version 5.1 and a BLASTp search against the Carbohydrate-Active enZymes (CAZy) database at E < 10⁻¹⁰ and > 40% similarity
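Footnote b combines a secretion prediction with a similarity search. The sketch below shows one way per-family totals of this kind could be tallied from BLAST tabular output; it is not the study's pipeline. The subject-to-family map, the secreted-gene set (which in the study came from SignalP), and the function names are hypothetical, and BLAST's percent identity (`pident`) is used as a stand-in for the reported similarity threshold.

```python
import csv
import io
from collections import defaultdict


def parse_hits(tsv_text: str, max_evalue: float = 1e-10, min_pident: float = 40.0):
    """Yield (query, subject) pairs from BLAST tabular output (-outfmt 6).

    Default -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore. pident is percent identity, used
    here as a stand-in for the "similarity" threshold in the footnote.
    """
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        pident, evalue = float(row[2]), float(row[10])
        if evalue < max_evalue and pident > min_pident:
            yield row[0], row[1]


def tally_families(hits, family_of: dict, secreted: set) -> dict:
    """Count genes per CAZy family, split by predicted secretion.

    family_of: hypothetical map from database subject ID to CAZy family.
    secreted: gene IDs predicted to carry a signal peptide (e.g. by SignalP).
    """
    table = defaultdict(lambda: {"total": 0, "secreted": 0, "non_secreted": 0})
    seen = set()
    for gene, subject in hits:
        family = family_of[subject]
        if (gene, family) in seen:  # count each gene once per family
            continue
        seen.add((gene, family))
        row = table[family]
        row["total"] += 1
        row["secreted" if gene in secreted else "non_secreted"] += 1
    return dict(table)


# Toy BLAST output, not results from the study.
blast = "\n".join([
    "g1\tdbA\t55.0\t300\t10\t1\t1\t300\t1\t300\t1e-50\t500",
    "g2\tdbB\t35.0\t300\t10\t1\t1\t300\t1\t300\t1e-30\t400",  # fails identity
    "g4\tdbC\t90.0\t300\t10\t1\t1\t300\t1\t300\t0.0\t800",
])
print(tally_families(parse_hits(blast), {"dbA": "AA9", "dbB": "GH28", "dbC": "GT2"}, {"g1"}))
```

Counting each (gene, family) pair once lets a multi-domain protein, such as a glycoside hydrolase with a carbohydrate-binding module, contribute to both a GH and a CBM row, while repeated hits to the same family are not double-counted.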