Yuan Sui, Michael Wisniewski, Samir Droby, Edoardo Piombo, Xuehong Wu, Junyang Yue.
Abstract
Candida oleophila is an effective biocontrol agent used to control post-harvest diseases of fruits and vegetables. C. oleophila I-182 was the active agent in Aspire®, the first-generation yeast-based commercial product for post-harvest disease management. Several modes of action have been demonstrated for the biocontrol activity of C. oleophila against post-harvest pathogens, including competition for nutrients and space, induction of pathogenesis-related genes in host tissues, and production of extracellular lytic enzymes. In the present study, the whole genome of C. oleophila I-182 was sequenced using PacBio and Illumina shotgun sequencing technologies, yielding an estimated genome size of 14.73 Mb, similar to that of the model yeast strain Saccharomyces cerevisiae S288c. Based on the assembled genome, protein-coding sequences were identified and annotated. The predicted genes were further assigned gene ontology terms and clustered into specific functional groups. A comparative analysis of the C. oleophila proteome with the proteomes of 11 representative yeasts revealed 2 unique and 124 expanded protein families in C. oleophila. The availability of the genome sequence will facilitate a better understanding of the properties of biocontrol yeasts at the molecular level.
Keywords: Candida oleophila; biocontrol agent; genome annotation; genome assembly; post-harvest disease management
Year: 2020 PMID: 32158440 PMCID: PMC7052047 DOI: 10.3389/fmicb.2020.00295
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Summary of the sequencing data obtained with PacBio and Illumina technology and used for the genome assembly of C. oleophila I-182.
| Metric | PacBio | Illumina | Illumina |
|---|---|---|---|
| Raw data | 1,516 Mb | 862 Mb | 1,259 Mb |
| Clean data | 1,509 Mb | 741 Mb | 699 Mb |
| Read number | 103,064 | 5,749,278 | 8,397,144 |
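The table lists only totals, but a few derived figures make it easier to read. The sketch below assumes, from the read counts, that the first data column is the PacBio library and the other two are Illumina libraries, and that "Read number" counts raw reads; the assembly length of 14,129,745 bp comes from the assembly statistics table. Under those assumptions, raw bases divided by read count gives a mean read length of roughly 14.7 kb for PacBio and about 150 bp for both Illumina columns, and the clean PacBio data correspond to roughly 107x depth over the assembly. The column names in the code are placeholders, not labels taken from the paper.

```python
# Values transcribed from the sequencing summary table (Mb = megabases).
# Column names are placeholders inferred from read counts: the first column
# looks like PacBio (~14.7 kb mean read), the other two like Illumina (~150 bp).
SEQ_DATA = {
    "PacBio": {"raw_mb": 1516, "clean_mb": 1509, "reads": 103_064},
    "Illumina (column 2)": {"raw_mb": 862, "clean_mb": 741, "reads": 5_749_278},
    "Illumina (column 3)": {"raw_mb": 1259, "clean_mb": 699, "reads": 8_397_144},
}

# Total assembly length (14,129,745 bp) expressed in Mb.
ASSEMBLY_MB = 14.129745


def summarize(raw_mb: float, clean_mb: float, reads: int) -> dict:
    """Derive simple per-library statistics from one column of the table."""
    return {
        # Assumes "Read number" counts raw reads.
        "mean_read_len_bp": raw_mb * 1e6 / reads,
        # Fraction of raw bases retained after quality filtering.
        "clean_fraction": clean_mb / raw_mb,
        # Approximate sequencing depth of the clean data over the assembly.
        "clean_depth_x": clean_mb / ASSEMBLY_MB,
    }


if __name__ == "__main__":
    for name, row in SEQ_DATA.items():
        s = summarize(**row)
        print(f"{name}: mean read {s['mean_read_len_bp']:,.0f} bp, "
              f"{s['clean_fraction']:.1%} retained, "
              f"~{s['clean_depth_x']:.0f}x clean depth")
```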
Genome assembly statistics for C. oleophila.
| Statistic | Scaffold | Contig |
|---|---|---|
| Total number | 8 | 10 |
| Total length (bp) | 14,129,745 | 14,129,104 |
| N50 length (bp) | 2,030,489 | 1,848,245 |
| N90 length (bp) | 1,455,442 | 1,455,442 |
| Maximum length (bp) | 3,488,600 | 2,315,880 |
| Minimum length (bp) | 74,302 | 1,795 |
| GC content (%) | 39.39 | 39.39 |
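N50 and N90 summarize assembly contiguity. Sort the sequences from longest to shortest, then report the length at which the running total first reaches 50% (or 90%) of the total assembly length. The per-sequence lengths behind this table are not listed, so the sketch below uses hypothetical toy lengths. `nx` is an illustrative helper, not a function from any particular assembly tool.

```python
from typing import Sequence


def nx(lengths: Sequence[int], x: float) -> int:
    """Return Nx: sort lengths longest first and return the length at which
    the running total first reaches x percent of the total length."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    threshold = sum(lengths) * x / 100
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    # Not reached: on the last step the running total equals the full sum.
    raise AssertionError("unreachable")


if __name__ == "__main__":
    toy = [100, 80, 60, 40, 20]  # hypothetical sequence lengths (bp)
    print(nx(toy, 50), nx(toy, 90))  # N50, N90
```

For the toy lengths (total 300 bp), the running total passes 150 bp at the 80 bp sequence and 270 bp at the 40 bp sequence, so N50 is 80 and N90 is 40.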
Annotation of the predicted genes using a variety of databases.
| Database | Description | Annotated genes | Percentage (%) |
|---|---|---|---|
| nr | Non-redundant protein database | 4,779 | 85.11 |
| Swiss-Prot | The UniProtKB/Swiss-Prot database | 2,839 | 50.56 |
| KEGG | Kyoto Encyclopedia of Genes and Genomes | 3,162 | 56.31 |
| GO | Gene Ontology | 3,745 | 66.69 |
| COG | Clusters of Orthologous Groups of proteins | 727 | 12.94 |
| P450 | Fungal cytochrome P450 | 349 | 6.21 |
| VFDB | Virulence factors of pathogenic bacteria | 37 | 0.65 |
| ARDB | Antibiotic resistance genes database | 1 | 0.01 |
| TF | Transcription factor database | 255 | 4.54 |
| TrEMBL | Translated EMBL nucleotide sequence data library | 4,751 | 84.61 |
| CAZy | Carbohydrate-active enzymes database | 103 | 1.83 |
| PHI | Pathogen-host interactions | 468 | 8.33 |
| IPR | The InterPro database | 4,881 | 86.92 |
| T3SS | Type III secretion system effector proteins | 2,072 | 36.90 |
| Total | | 5,356 | 95.38 |
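The percentages in this table appear to share a single denominator. It is presumably the total number of predicted protein-coding genes, which this section does not state. Back-calculating count / (percentage / 100) for the larger rows gives about 5,615 in every case. The sketch below shows that arithmetic. Read the result as an inference from rounded percentages, not a reported figure.

```python
# (annotated genes, percentage) pairs transcribed from the annotation table.
# Small-count rows (e.g. ARDB, VFDB) are left out: their two-decimal
# percentages are too coarse to pin down a denominator.
ANNOTATION = {
    "nr": (4779, 85.11),
    "Swiss-Prot": (2839, 50.56),
    "KEGG": (3162, 56.31),
    "GO": (3745, 66.69),
    "TrEMBL": (4751, 84.61),
    "IPR": (4881, 86.92),
    "T3SS": (2072, 36.90),
    "Total": (5356, 95.38),
}


def implied_denominator(count: int, percent: float) -> float:
    """Gene total that would make `count` equal `percent` percent of it."""
    return count / (percent / 100)


if __name__ == "__main__":
    for db, (count, pct) in ANNOTATION.items():
        print(f"{db:>10}: {implied_denominator(count, pct):,.1f}")
```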
FIGURE 1. (A) Percent distribution of E-values from the alignment of Candida oleophila predicted genes against sequences in the nr database. (B) Species distribution of the top BLAST hits for the best alignment of each C. oleophila predicted gene against the nr database.
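Panel A bins the E-value of each gene's best nr hit. The caption gives neither the bin boundaries nor the search settings. The sketch below therefore assumes BLAST+ tabular output (`-outfmt 6`, whose 12 default columns put the E-value 11th and the bit score 12th) and keeps the highest-scoring hit per query. Its bin edges are hypothetical. Percentages are computed over queries that have at least one hit.

```python
import csv
import io
from collections import Counter

# Hypothetical bin edges (inclusive upper bounds); the figure's actual
# E-value categories are not stated in the caption.
BINS = [
    (0.0, "0"),
    (1e-100, "(0, 1e-100]"),
    (1e-50, "(1e-100, 1e-50]"),
    (1e-20, "(1e-50, 1e-20]"),
    (1e-5, "(1e-20, 1e-5]"),
]
OVERFLOW = ">1e-5"


def best_hits(outfmt6_text: str) -> dict:
    """Map each query to (evalue, bitscore) of its highest-bitscore hit.

    Expects BLAST+ tabular output (-outfmt 6, 12 default columns):
    column 11 is the E-value and column 12 the bit score.
    """
    best = {}
    for row in csv.reader(io.StringIO(outfmt6_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query, evalue, bitscore = row[0], float(row[10]), float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (evalue, bitscore)
    return best


def evalue_bin(evalue: float) -> str:
    """Return the label of the first bin whose upper bound is >= evalue."""
    for upper, label in BINS:
        if evalue <= upper:
            return label
    return OVERFLOW


def percent_distribution(outfmt6_text: str) -> dict:
    """Percentage of queries (with at least one hit) falling in each bin."""
    hits = best_hits(outfmt6_text)
    counts = Counter(evalue_bin(ev) for ev, _ in hits.values())
    return {label: 100 * n / len(hits) for label, n in counts.items()}
```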
FIGURE 2. Distribution of 727 predicted C. oleophila genes across 21 COG functional categories.
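Counts like those in Figure 2 come from tallying genes per COG functional category. A COG assignment may carry more than one category letter (for example "KL"), and the caption does not say how such genes were counted. The sketch below assumes each letter is counted once, so per-category totals can sum to more than the number of genes. The gene IDs used to exercise it are hypothetical.

```python
from collections import Counter
from typing import Mapping


def cog_category_counts(assignments: Mapping[str, str]) -> Counter:
    """Count genes per single-letter COG functional category.

    `assignments` maps gene ID -> category letters, e.g. "K" or "KL".
    Assumption: a gene with several letters counts once toward each letter.
    """
    counts: Counter = Counter()
    for letters in assignments.values():
        for letter in set(letters):  # set() avoids double-counting repeats
            counts[letter] += 1
    return counts
```

For example, `cog_category_counts({"g1": "K", "g2": "KL", "g3": "E"})` yields K: 2, L: 1 and E: 1, which is four category assignments from three genes.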
FIGURE 3GO classification of all the identified genes in C. oleophila was summarized as three main categories: biological process, molecular function and cellular component.
Statistics of different types of ncRNA in the C. oleophila genome.
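| Type | Copy number | Average length (bp) | Total length (bp) | Percentage of genome (%) |
|---|---|---|---|---|
| tRNA | 246 | 79 | 19,159 | 0.1356 |
| rRNA | 19 | 1,900 | 36,107 | 0.2555 |
| sRNA | 100 | 72 | 7,219 | 0.0511 |
| snRNA | 38 | 110 | 4,162 | 0.0295 |
| miRNA | 132 | 56 | 7,417 | 0.0525 |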
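The last column appears to be each ncRNA type's total length as a percentage of the assembled genome (14,129,745 bp, the larger total length in the assembly statistics table). Recomputing it that way reproduces the listed values to the four decimal places shown. The interpretation of that column is inferred from this agreement. The short check below illustrates it.

```python
# Assembly length in bp (larger total length in the assembly statistics table).
GENOME_BP = 14_129_745

# Total length (bp) per ncRNA type, transcribed from the ncRNA table.
NCRNA_TOTAL_BP = {
    "tRNA": 19_159,
    "rRNA": 36_107,
    "sRNA": 7_219,
    "snRNA": 4_162,
    "miRNA": 7_417,
}


def percent_of_genome(total_bp: int, genome_bp: int = GENOME_BP) -> float:
    """Share of the assembled genome covered by one ncRNA type, in percent."""
    return 100 * total_bp / genome_bp


if __name__ == "__main__":
    for rna, bp in NCRNA_TOTAL_BP.items():
        print(f"{rna}: {percent_of_genome(bp):.4f}%")
```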
FIGURE 4. Venn diagram showing the numbers of shared and species-specific gene families among C. oleophila and 11 other representative yeast species. The number in the central white circle is the number of shared families, with the number of shared genes in parentheses. Each colored section shows the number of species-specific gene families, with the number of genes in those families in parentheses. Each species is abbreviated with a three-letter acronym.
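Once genes have been clustered into families, the shared and species-specific counts in a Venn diagram like Figure 4 reduce to set operations. A shared family is present in every species. A species-specific family is present in that species and in no other. The caption does not name the clustering tool, so the sketch below starts from already-assigned family IDs. The species acronyms and family IDs used to exercise it are hypothetical.

```python
from typing import Dict, Set, Tuple


def shared_and_unique(
    families: Dict[str, Set[str]],
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Split gene families into those shared by all species and those
    specific to one species.

    `families` maps a species acronym to the set of family IDs present in it
    (at least one species is required).
    """
    shared = set.intersection(*families.values())
    unique = {}
    for species, fams in families.items():
        # Union of every other species' families.
        others = set().union(*(f for s, f in families.items() if s != species))
        unique[species] = fams - others
    return shared, unique
```

With three hypothetical species, `{"Col": {"F1", "F2", "F3"}, "Sce": {"F1", "F2"}, "Cal": {"F1", "F4"}}`, only F1 is shared, F3 is specific to "Col", F4 to "Cal", and "Sce" has no specific family.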
FIGURE 5. Expansion and contraction of gene families among the 12 yeast species. The phylogenetic tree was constructed from 538 high-quality 1:1 single-copy orthologous genes. The numbers on each branch indicate gene families that underwent gain (red) or loss (green) events. The number of gene families predicted in the most recent common ancestor (MRCA) was 6,383. Each species name is abbreviated with a three-letter acronym.
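The tree in Figure 5 is built from 1:1 single-copy orthologs, meaning gene families with exactly one member in each of the 12 species. The caption does not describe the extra quality filters behind "high-quality", so the sketch below applies only the one-copy-per-species criterion. The orthogroup, species and gene IDs used to exercise it are hypothetical.

```python
from typing import Dict, List


def single_copy_orthogroups(
    orthogroups: Dict[str, Dict[str, List[str]]],
    species: List[str],
) -> List[str]:
    """Return IDs of orthogroups with exactly one gene in every species.

    `orthogroups` maps group ID -> {species acronym: [gene IDs]}; a species
    missing from a group counts as zero copies.
    """
    return [
        group_id
        for group_id, members in orthogroups.items()
        if all(len(members.get(sp, [])) == 1 for sp in species)
    ]
```

A group with two paralogs in any species, or with no member in some species, is excluded. The remaining groups supply one sequence per species for alignment and tree building.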
FIGURE 6. Identification of two distinct secondary metabolite clusters in the genome of C. oleophila. (A) The non-ribosomal peptide synthetase (NRPS)-like cluster is composed of 18 functional genes. (B) The terpene cluster is composed of nine functional genes. Each rectangle denotes a functional gene, and the red arrow above it indicates that gene's transcriptional direction.