Ricardo Franco-Duarte, Neža Čadež, Teresa Rito, João Drumonde-Neves, Yazmid Reyes Dominguez, Célia Pais, Maria João Sousa, Pedro Soares.
Abstract
Clavispora santaluciae was recently described as a novel non-Saccharomyces yeast species, isolated from grapes grown in vineyards of the Azores, a Portuguese archipelago with particular environmental conditions, and from Italian grapes infested with Drosophila suzukii. In the present work, the genomes of five Clavispora santaluciae strains were sequenced, assembled, and annotated for the first time, using robust pipelines and a combination of long- and short-read sequencing platforms. Genome comparisons revealed specific differences between strains of Clavispora santaluciae that reflect their isolation in two separate ecological niches (Azorean and Italian vineyards), as well as mechanisms of adaptation to the demanding environmental features of the geographical locations from which they were isolated. In particular, relevant differences were detected in the number of coding genes (shared and unique) and transposable elements, the amount and diversity of non-coding RNAs, and the enzymatic potential of each strain, assessed through analysis of its CAZyome. A comparative study was also conducted between the Clavispora santaluciae genome and those of the remaining species of the Metschnikowiaceae family. Our phylogenetic and genomic analysis, comprising 126 yeast strains (alignment of 2362 common proteins), allowed the establishment of a robust phylogram of Metschnikowiaceae and detailed incongruences to be clarified in the future.
Keywords: Azores; Metschnikowiaceae; adaptation; biotechnology; functional gene analysis; genomics; phylogenomics; wine yeasts
Year: 2022; PMID: 35049992; PMCID: PMC8781136; DOI: 10.3390/jof8010052
Source DB: PubMed; Journal: J Fungi (Basel); ISSN: 2309-608X
Genome assembly statistics of Clavispora santaluciae strains.
| Assembler | Metric | A1.18T | A1.5 | A1.7 | A1.19 | LB-NB-3.3 |
|---|---|---|---|---|---|---|
| Canu assembler | Assembly length (bp) | 11,088,431 | 11,018,248 | 10,921,443 | 10,861,576 | 11,019,028 |
| | Number of scaffolds | 43 | 13 | 46 | 86 | 30 |
| | N50 (bp) | 315,943 | 802,369 | 355,153 | 1,329,122 | 494,470 |
| | L50 | 7 | 3 | 6 | 14 | 7 |
| | Number of N's per 100 kb | 0 | 0 | 0 | 0 | 0 |
| | Number of scaffolds >5000 bp | 29 | 11 | 28 | 53 | 23 |
| | Total length >5000 bp | 10,780,215 | 10,974,595 | 10,557,056 | 10,118,395 | 10,856,688 |
| MaSuRCA assembler | Substitution errors revised | 42 | 8 | 141 | 428 | 70 |
| | Insertion/deletion errors revised | 1686 | 536 | 2825 | 6144 | 1607 |
| | Assembly length (bp) | 11,089,145 | 11,018,616 | 10,922,446 | 10,863,639 | 11,019,715 |
| | Number of scaffolds | 43 | 13 | 46 | 86 | 30 |
| | N50 (bp) | 532,329 | 1,048,728 | 654,799 | 218,696 | 650,701 |
| | L50 | 7 | 3 | 6 | 14 | 7 |
| | Number of N's per 100 kb | 0 | 0 | 0 | 0 | 0 |
| | Number of scaffolds >5000 bp | 29 | 11 | 28 | 53 | 23 |
| | Total length >5000 bp | 10,780,920 | 10,974,959 | 10,558,055 | 10,120,373 | 10,857,358 |
| RagTag assembler | Assembly length (bp) | 11,092,545 | 11,019,016 | 10,925,846 | 10,870,339 | 11,021,815 |
| | Number of scaffolds/chromosomes (row incomplete in source; surviving values: 12, 19) | | | | | |
| | Number of N's per 100 kb | 30.65 | 3.63 | 31.12 | 61.64 | 19.05 |
| | Number of scaffolds >5000 bp | 4 | 8 | 4 | 3 | 7 |
| | Total length >5000 bp | 11,025,073 | 11,000,234 | 10,766,609 | 10,606,285 | 10,966,127 |
| | Ploidy | haploid | haploid | haploid | haploid | haploid |
| | GC content (%) | 49.66 | 49.70 | 49.73 | 49.66 | 49.76 |
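For readers less familiar with the contiguity metrics in the table above, the following is a minimal sketch of how N50, L50, and the number of N's per 100 kb are conventionally computed. The function names and the toy scaffold lengths are hypothetical, and the last lines assume that scaffolding changes assembly length only by inserting N gap characters; this is an illustration, not the authors' pipeline.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold lengths in bp.

    N50 is the length of the scaffold at which the running total of
    lengths, sorted from longest to shortest, first reaches half of the
    assembly; L50 is the number of scaffolds needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must be non-empty")


def n_per_100kb(n_count, assembly_length):
    """Number of N characters per 100,000 bp of assembly."""
    return 100_000 * n_count / assembly_length


# Hypothetical toy assembly, not taken from the table.
toy_lengths = [400, 300, 200, 100]
print(n50_l50(toy_lengths))

# Assumption: the RagTag assembly differs from the MaSuRCA assembly only
# by inserted N gaps, so the length difference equals the N count (A1.5).
masurca_len, ragtag_len = 11_018_616, 11_019_016
print(round(n_per_100kb(ragtag_len - masurca_len, ragtag_len), 2))
```

Under that assumption, the length difference between the two A1.5 assemblies is consistent with the N's per 100 kb value reported for the scaffolded assembly.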
Clavispora santaluciae genome annotation statistics.
| Category | Metric | A1.18T | A1.5 | A1.7 | A1.19 | LB-NB-3.3 |
|---|---|---|---|---|---|---|
| Protein-coding genes | Total number | 6092 | 6034 | 6067 | 6015 | 6038 |
| | Range of protein lengths (aa) | 66–4974 | 63–4974 | 57–4974 | 60–4974 | 66–5293 |
| | Average protein length (aa) | 557.6 | 556.6 | 550.9 | 543.5 | 518.3 |
| Non-coding RNAs | microRNAs (miRNA) | 32 | 32 | 33 | 31 | 21 |
| | small RNAs (sRNA) | 20 | 21 | 22 | 20 | 23 |
| | small nuclear RNAs (snRNA) | 7 | 7 | 6 | 7 | 7 |
| | small nucleolar RNAs (snoRNA) | 93 | 91 | 99 | 94 | 98 |
| | long non-coding RNAs (lncRNA) | 8 | 8 | 9 | 8 | 12 |
| | ribosomal RNAs (rRNA) | 96 | 63 | 42 | 69 | 124 |
| | transfer RNAs (tRNA) | 276 | 259 | 279 | 299 | 248 |
| | Other | 29 | 32 | 32 | 35 | 32 |
| BUSCO (n = 1706) | Genome completeness (%) | 93.5 | 94.4 | 93.4 | 90.7 | 93.6 |
| | Complete BUSCOs | 1595 | 1611 | 1594 | 1547 | 1597 |
| | Fragmented BUSCOs | 17 | 14 | 18 | 21 | 4 |
| | Missing BUSCOs | 94 | 81 | 94 | 138 | 94 |
| BUSCO (n = 2137) | Genome completeness (%) | 98.0 | 99.1 | 98.0 | 95.1 | 98.2 |
| | Complete BUSCOs | 2094 | 2118 | 2094 | 2032 | 2099 |
| | Fragmented BUSCOs | 14 | 11 | 12 | 17 | 13 |
| | Missing BUSCOs | 29 | 8 | 31 | 88 | 25 |
| Functional annotation | Genes with KO assigned | 3130 (51.4%) | 3129 (51.9%) | 3125 (51.6%) | 3130 (52.0%) | 3101 (51.4%) |
| | Genes with COG assigned | 4180 (68.6%) | 4171 (69.1%) | 4166 (68.7%) | 4119 (68.5%) | 4141 (68.6%) |
| | Number of genes annotated | 120 | 121 | 118 | 112 | 117 |
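The percentages in the annotation table can be reproduced from the raw counts. The sketch below uses the A1.18T column and assumes that genome completeness is the share of complete BUSCOs among all BUSCOs searched (complete + fragmented + missing), and that the KO and COG percentages are relative to the total number of coding genes; the helper name `percent` is hypothetical.

```python
def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# A1.18T values from the first BUSCO block of the table.
complete, fragmented, missing = 1595, 17, 94
total_buscos = complete + fragmented + missing

# Assumption: completeness = complete BUSCOs / all BUSCOs searched.
completeness = percent(complete, total_buscos)

# Assumption: annotation shares are relative to the 6092 coding genes.
total_genes = 6092
ko_share = percent(3130, total_genes)
cog_share = percent(4180, total_genes)

print(total_buscos, completeness, ko_share, cog_share)
```

With these assumptions the computed values agree with the completeness, KO, and COG percentages listed for A1.18T.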
Figure 1. Comparative genomics of Clavispora santaluciae genomes: (A) whole-genome dot-plot comparison between the sequenced strains in pairwise mode. Homologous regions are plotted as dots. Red lines link parallel homologous pairs, and blue lines link anti-parallel pairs; (B) Venn diagram indicating the number of shared coding genes among Clavispora santaluciae strains.
Figure 2. Functional annotation of the Clavispora santaluciae genome: (A) proteome classification into 23 functional categories, corresponding to clusters of orthologous groups (COGs): A, RNA processing and modification; B, chromatin structure and dynamics; C, energy production and conversion; D, cell cycle control and mitosis; E, amino acid metabolism and transport; F, nucleotide metabolism and transport; G, carbohydrate metabolism and transport; H, coenzyme metabolism; I, lipid metabolism; J, translation; K, transcription; L, replication and repair; M, cell wall/membrane/envelope biogenesis; O, posttranslational modification, protein turnover, chaperone functions; P, inorganic ion transport and metabolism; Q, secondary metabolites biosynthesis, transport, and catabolism; S, function unknown; T, signal transduction; U, intracellular trafficking and secretion; Y, nuclear structure; Z, cytoskeleton; (B) classification of the annotated genes into four large functional categories; (C) comparison of the proportions of the large functional categories between the five Clavispora santaluciae strains and other relevant yeast species; (D) comparison among relevant yeast species of the classification of annotated genes into 23 COG categories; (E) percentage of CAZymes in the five sequenced genomes of Clavispora santaluciae and other relevant yeasts, showing the distribution of predicted proteins into major families.
Figure 3. Comparative genomics of Metschnikowiaceae yeasts: (A) average nucleotide identity (% ANI), genome size (Mbp), number of coding genes, and percentage of GC among the complete genomes of Metschnikowiaceae species; (B) number of coding genes across the Metschnikowiaceae family.
Figure 4. Phylogram of the Metschnikowiaceae family based on the core proteome of 126 yeast strains (alignment of 2362 common proteins). Clavispora santaluciae genomes, sequenced in the present work, are highlighted using a red box, while Clavispora/Candida and Metschnikowia genera are highlighted by blue and green boxes, respectively. Incongruent locations are highlighted by purple boxes. Phylogenetic reconstruction was performed by considering maximum likelihood and 500 bootstrap replicates of the concatenated alignments. Bootstrap values were omitted as they were 100% for all branches.
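The phrase "bootstrap replicates of the concatenated alignments" in the Figure 4 caption can be illustrated with a minimal sketch. It concatenates per-protein alignments into one supermatrix and then resamples alignment columns with replacement, which is the standard nonparametric bootstrap. The taxon names, sequences, and function names are hypothetical toy values, gap-padding of absent taxa is an added assumption, and the maximum likelihood tree inference itself (normally done with dedicated software on each replicate) is not shown; this is not the authors' pipeline.

```python
import random


def concatenate(alignments, taxa):
    """Join per-protein alignments (dict: taxon -> aligned sequence)
    into a single supermatrix, one concatenated sequence per taxon."""
    supermatrix = {taxon: "" for taxon in taxa}
    for alignment in alignments:
        width = len(next(iter(alignment.values())))
        for taxon in taxa:
            # Assumption: a taxon absent from an alignment is padded with gaps.
            supermatrix[taxon] += alignment.get(taxon, "-" * width)
    return supermatrix


def bootstrap_replicate(supermatrix, rng):
    """Resample alignment columns with replacement (one replicate)."""
    width = len(next(iter(supermatrix.values())))
    columns = [rng.randrange(width) for _ in range(width)]
    return {
        taxon: "".join(sequence[i] for i in columns)
        for taxon, sequence in supermatrix.items()
    }


# Hypothetical toy data: two aligned proteins for three strains.
taxa = ["strainA", "strainB", "strainC"]
alignments = [
    {"strainA": "MKT", "strainB": "MKS", "strainC": "MRT"},
    {"strainA": "LLV-", "strainB": "LIVA", "strainC": "LLVA"},
]

matrix = concatenate(alignments, taxa)
# 500 replicates, matching the number given in the caption.
replicates = [bootstrap_replicate(matrix, random.Random(seed)) for seed in range(500)]
print(matrix["strainA"], len(replicates))
```

Each replicate has the same dimensions as the original supermatrix; a tree would be inferred from every replicate, and the support of a branch is the fraction of replicate trees that contain it.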