Xiyi Ren, Yongxiang Liu, Yumei Tan, Yonghui Huang, Zuoyi Liu, Xuanli Jiang.
Abstract
Shiraia bambusicola is a rare medicinal fungus found in China that causes bamboo plants to decay and, in severe infections, die. Hypocrellin, its main active ingredient, is widely used in fields such as medicine, agriculture, and the food industry. In this study, to clarify the genomic components, taxonomic status, pathogenic genes, secondary metabolite synthesis pathways, and regulatory mechanisms of S. bambusicola, whole-genome sequencing, assembly, and functional annotation were performed using high-throughput sequencing and bioinformatics approaches. The S. bambusicola genome is 33 Mb in size, with 48.89% GC content, 333 scaffolds, 2590 contigs, 10,703 genes, 82 tRNAs, and 21 rRNAs. The total length of the repeat sequence is 2,151,640 bp. Annotations for 5945 proteins were obtained from InterProScan hits based on the Gene Ontology database. Phylogenetic analysis showed that S. bambusicola belongs to Shiraiaceae, a new family of Pleosporales. It was speculated that Shiraiaceae contains more than two species or genera. According to the annotation, 777 secreted proteins were associated with virulence or detoxification, including 777 predicted by the PHI database, 776 by the CAZY and Fungal CytochromeP450 database, and 441 by the Proteases database. The 252 genes associated with the secondary metabolism of S. bambusicola were screened and enriched into 28 pathways, among which the terpenoid, staurosporine, aflatoxin, and folate synthesis pathways have not previously been reported in S. bambusicola. T1PKS was the main gene cluster type among the 28 secondary metabolite synthesis gene clusters in S. bambusicola. Analysis of the T3PKS gene cluster related to hypocrellin synthesis showed some similarity between S. bambusicola and 10 other fungal species; however, the similarity was very low, with a maximum of 17%. The genomic information of S. bambusicola obtained in this study is valuable for understanding its genetic function and pathogenicity. It revealed that several enzyme genes and secreted proteins might be related to host interactions and pathogenicity. The annotation and analysis of its secondary metabolite synthesis genes and gene clusters will be an important reference for future studies on the biosynthesis and regulation of secondary metabolites, contributing to the discovery of new metabolites and accelerating drug development and application.
Keywords: Functional annotation; Genomic sequencing; Pathogenic gene; Phylogenetic analysis; Secondary metabolism
Year: 2020 PMID: 31712259 PMCID: PMC6945017 DOI: 10.1534/g3.119.400694
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
The statistical results of the genome assembly of Shiraia bambusicola
| Sample Name | Contig | Scaffold |
|---|---|---|
| | 2,590 | 333 |
| Total length (bp) | 30,897,795 | 33,146,786 |
| | 2,590 | 333 |
| | 1,986 | 262 |
| | 28,267 | 546,034 |
| | 1,922 | 132,645 |
| | 137,842 | 1,295,965 |
| | 200 | 1,000 |
| GC content (%) | 48.89 | 48.89 |
The statistical results of genomic components of Shiraia bambusicola
| Item | Value |
|---|---|
| Genome size (Mb) | 33.14 |
| | 10,703 |
| | 30,545 |
| | 10,703 |
| | 19,842 |
| | 18,123,210 |
| | 15,921,153 |
| | 15,901,268 |
| | 2,202,057 |
| | 1,693.28 |
| | 521.24 |
| | 1,485.68 |
| | 110.98 |
Repeat sequence statistics
| Method | Repeat size (bp) | % in genome |
|---|---|---|
| Repbase | 551,548 | 1.6661 |
| ProteinMask | 388,081 | 1.1723 |
| Denovo | 506,412 | 1.5298 |
| TRF | 1,566,152 | 4.7311 |
| Total | 3,012,193 | 9.0993 |
Description: Method, the method used to predict repeats; Repeat size, the total length of repeats; % in genome, the percentage of the genome occupied by repeats; Total, the deduplicated total of the four methods.
Transposon classification information statistics
| Type | Repbase TEs length (bp) | Repbase TEs % in genome | ProteinMask TEs length (bp) | ProteinMask TEs % in genome | De novo length (bp) | De novo % in genome | Combined TEs length (bp) | Combined TEs % in genome |
|---|---|---|---|---|---|---|---|---|
| DNA | 212,137 | 0.6408 | 154,002 | 0.4652 | 212,137 | 0.6408 | 308,987 | 0.9334 |
| LINE | 90,148 | 0.2723 | 58,770 | 0.1775 | 90,148 | 0.2723 | 139,272 | 0.4207 |
| LTR | 242,851 | 0.7336 | 175,309 | 0.5296 | 242,851 | 0.7336 | 316,920 | 0.9574 |
| SINE | 2,417 | 0.0073 | 0 | 0.0000 | 2,417 | 0.0073 | 2,417 | 0.0073 |
| Other | 66 | 0.0002 | 0 | 0.0000 | 66 | 0.0002 | 66 | 0.0002 |
| Unknown | 3,929 | 0.0119 | 0 | 0.0000 | 3,929 | 0.0119 | 3,929 | 0.0119 |
| Total | 551,548 | 1.6661 | 388,081 | 1.1723 | 551,548 | 1.6661 | 726,355 | 2.1942 |
Description: Type, the type of transposon: DNA transposon (DNA), long interspersed nuclear element (LINE), long terminal repeat retrotransposon (LTR), short interspersed nuclear element (SINE), other types (Other), and Unknown; Repbase TEs, transposons predicted using the Repbase database; ProteinMask TEs, transposons predicted with RepeatProteinMasker; De novo, transposons predicted using the de novo method; Combined TEs, the deduplicated results of the three methods; Total, the combined result across transposon types with redundancy removed.
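The "Combined TEs" column is described as a deduplicated result, so it cannot be obtained by summing the three method columns. One common way to remove such redundancy is to take the union of overlapping genomic intervals. The sketch below illustrates that idea only; it is an assumption about what deduplication means here, not the authors' pipeline, and the coordinates, the helper names `merge_intervals` and `combined_length`, and the 10,000 bp toy scaffold are all hypothetical.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single scaffold


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Return the union of the intervals as sorted, non-overlapping intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def combined_length(*call_sets: Iterable[Interval]) -> int:
    """Non-redundant number of bases covered by any of the call sets."""
    all_calls = [iv for calls in call_sets for iv in calls]
    return sum(end - start for start, end in merge_intervals(all_calls))


# Hypothetical coordinates, for illustration only (not data from the study).
repbase = [(100, 400), (1000, 1200)]
protein_mask = [(300, 500)]
de_novo = [(100, 400), (1150, 1300)]

total = combined_length(repbase, protein_mask, de_novo)
naive_sum = sum(e - s for calls in (repbase, protein_mask, de_novo) for s, e in calls)
percent = 100 * total / 10_000  # assumed 10,000 bp toy scaffold
```

In this toy case the merged length is smaller than the naive sum of the three call sets, which mirrors the table: for DNA transposons the three method columns add up to 578,276 bp, while the combined value is 308,987 bp.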
Noncoding RNA statistics
| Type | Copy number | Average length (bp) | Total length (bp) | % in genome |
|---|---|---|---|---|
| tRNA | 82 | 93.76 | 7,689 | 0.0232 |
| rRNA | 21 | 420.95 | 8,840 | 0.0267 |
| | 2 | 242.5 | 485 | 0.0015 |
| | 32 | 89.71 | 2,871 | 0.0087 |
Description: Type, the type of ncRNA; Copy number, the number of ncRNAs; Average length, the average length of ncRNA; Total length, the total length of ncRNA; % in the genome, the proportion of ncRNA in the genome.
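The definitions above reduce to two ratios: average length is total length divided by copy number, and the genome percentage is total length divided by genome length. The sketch below recomputes both from the table values as a consistency check. The genome length of 33,146,786 bp is an assumption taken from the scaffold column of the assembly statistics, and the function names are illustrative.

```python
# (copy number, total length in bp) for the four rows of the ncRNA table
rows = [(82, 7_689), (21, 8_840), (2, 485), (32, 2_871)]

# Assumption: the denominator is the scaffold-level assembly length.
GENOME_BP = 33_146_786


def average_length(copies: int, total_bp: int) -> float:
    """Mean ncRNA length in bp."""
    return total_bp / copies


def percent_of_genome(total_bp: int, genome_bp: int = GENOME_BP) -> float:
    """Share of the genome occupied by this ncRNA class, in percent."""
    return 100 * total_bp / genome_bp


# One (average length, percent of genome) pair per table row.
summary = [
    (average_length(copies, total_bp), percent_of_genome(total_bp))
    for copies, total_bp in rows
]
```

Under that assumed genome length, the recomputed percentages agree with the table to four decimal places, and the averages agree to within 0.01 bp.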
Figure 1. GO-based functional annotation of genes present in the Shiraia bambusicola P. Hennings genome. The first one indicates biological process domains, the second indicates cellular component domains, and the third indicates molecular function domains.
Figure 2. The phylogenetic relationship among the Shiraia bambusicola genomes, 33 other fungi with sequenced genomes, and one other S. bambusicola transcriptome.
Figure 3. Venn graph showing the intersections among the proteases (blue), CYP450 enzymes (pink), CAZymes (green), and PHI proteins (light red).
Figure 4. The gene distribution of pathogen-host interaction.
Results of classification and annotation of carbohydrate enzymes
| Classification | Number of CAZymes proteins | With transmembrane domain | With signal peptide | Number of families |
|---|---|---|---|---|
| CBM | 322 | 53 | 8 | 25 |
| CE | 179 | 25 | 4 | 14 |
| GH | 667 | 106 | 14 | 68 |
| GT | 381 | 17 | 8 | 39 |
| PL | 14 | 5 | 1 | 3 |
Description: CBM, carbohydrate-binding module; CE, carbohydrate esterases; GH, glycoside hydrolases; GT, glycosyl transferase; PL, polysaccharide lyase.
Figure 5. Partial results of the secondary metabolite pathway enrichment. The size of the black dot indicates the number of genes: the larger the dot, the more genes are enriched. The blue shade indicates the significance of enrichment: the lighter the color, the closer it is to 1, the more significant is the enrichment.
The secondary metabolite synthesis gene clusters in Shiraia bambusicola
| Type | Count |
|---|---|
| T1PKS | 14 |
| T3PKS | 1 |
| NRPS | 4 |
| Terpene | 4 |
| T1PKS-NRPS | 2 |
| undefined | 3 |
| Total | 28 |
Figure 6. T3PKS gene cluster analysis.