Genome sequence data of the antagonistic soil-borne yeast Cyberlindnera sargentensis (SHA 17.2).

Maria Paula Rueda-Mejia, Lukas Nägeli, Stefanie Lutz, Raúl A Ortiz-Merino, Daniel Frei, Jürg E Frey, Kenneth H Wolfe, Christian H Ahrens, Florian M Freimoser.

Abstract

Cyberlindnera sargentensis strain SHA 17.2, isolated from a Swiss soil sample, exhibited strong antagonistic activity against several plant pathogenic fungi in vitro and was highly competitive against other yeasts in soil. As a basis for identifying the mechanisms underlying this antagonistic activity, we sequenced the genome of C. sargentensis (SHA 17.2) by long- and short-read sequencing, assembled the reads de novo into seven contigs/chromosomes and a mitogenome (total genome size 11.4 Mbp), and annotated 5455 genes. This high-quality genome is the reference for transcriptome and proteome analyses that aim to elucidate the mode of action of C. sargentensis against fungal plant pathogens. It will thus serve as a resource for identifying potential biocontrol genes and for comparative genomics analyses of yeast genomes.
© 2022 The Authors.

Keywords:  Antagonism; Biocontrol; Genome assembly and annotation; Mechanism; Plant protection; Yeast

Year:  2022        PMID: 35071701      PMCID: PMC8762083          DOI: 10.1016/j.dib.2022.107799

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Specifications Table

Value of the Data

• The genome of C. sargentensis (SHA 17.2; 7 contigs/chromosomes plus mitogenome) is the basis for identifying the biocontrol mode of action of this strongly antagonistic yeast.

• The annotated genome sequence released here can be used by biologists, microbiologists or mycologists who study fundamental aspects of microbial interactions or who are interested in developing new and improved biocontrol applications.

• Bioinformaticians and genome biologists may include the genome in comparative analyses and evolutionary studies.

• The high-quality genome of C. sargentensis (SHA 17.2) presented here is a reference for functional genomics studies and represents the basis for identifying potential biocontrol genes and similarly active biocontrol strains through genome mining.

Data Description

Cyberlindnera sargentensis (SHA 17.2; CCoS1011) was isolated from an agricultural soil sample collected near Wädenswil (47.223140 °N, 8.676699 °E, 470 m a.s.l.) in Switzerland. The strain was identified based on the ITS sequence as the species hypothesis SH1545207.08FU, which is currently labelled as Cyberlindnera sargentensis (Wick. & Kurtzman) Minter [1,2,3]. The isolate was one of the most strongly antagonistic yeasts against a range of saprophytic and plant pathogenic filamentous fungi (e.g., Botrytis, Fusarium, and Monilinia strains) and was also highly competitive against other yeasts in soil [2,4]. Cyberlindnera sargentensis (SHA 17.2) has thus been selected as a promising yeast for potential biocontrol applications and for further characterising the mechanisms responsible for the strong biocontrol phenotype.

The initial de novo assembly of the C. sargentensis (SHA 17.2) genome consisted of 13 contigs, which, after ONT scaffolding, extensive polishing and manual curation, were reduced to a total of seven chromosomes and one mitogenome (Table 1). To assemble the mitogenome correctly, a reference-based approach was followed (see Methods), which yielded a 66 kb circular mitogenome. No additional plasmids could be identified. The total genome size was 11,378,532 bp.

Variant calling detected only 55 and 12 variants in the Illumina and PacBio data, respectively, which suggested that C. sargentensis SHA 17.2 is a haploid strain. This was confirmed by the presence of only the MATa1 and MATa2 genes (CYSA0D04350 and CYSA0D04340, respectively) and the flanking genes SLA2 (CYSA0D04360) and VPS75 (CYSA0D04330), which often adjoin yeast MAT loci [5,6]. C. sargentensis is thus a heterothallic species and the strain SHA 17.2 a haploid of the mating type a. Overall, the small number of contigs and the high coverage of the genome assembly (see Table 1) suggest that the C. sargentensis SHA 17.2 genome is of high quality and completeness.
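To put the reported variant counts in proportion to the genome size, they can be expressed as a density per megabase. The following is a minimal sketch that only does arithmetic on the figures quoted above; the function name is hypothetical, and the calculation does not by itself establish ploidy.

```python
# Figures quoted in the text above.
GENOME_SIZE_BP = 11_378_532
VARIANTS = {"Illumina": 55, "PacBio": 12}


def variants_per_mb(n_variants: int, genome_size_bp: int) -> float:
    """Return the number of variants per million base pairs."""
    return n_variants / (genome_size_bp / 1_000_000)


for platform, n in VARIANTS.items():
    density = variants_per_mb(n, GENOME_SIZE_BP)
    print(f"{platform}: {density:.2f} variants per Mb")
```

Both data sets give only a handful of variants per megabase, which is the observation the authors interpret as evidence for a haploid strain.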
Table 1

Overview of the final, nearly complete C. sargentensis (SHA 17.2) de novo genome assembly.

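| Chromosome / scaffold / contig | I | II | III | IV | V | VI | VII | Mitogenome |
|---|---|---|---|---|---|---|---|---|
| Length [bp] | 2,886,691 | 2,560,583 | 1,739,817 | 1,341,035 | 1,204,786 | 1,140,646 | 438,574 | 66,400 |
| PacBio > 5 kb: coverage (mapped: 99.82 %) | 53x | 54x | 58x | 63x | 64x | 69x | 64x | 1418x |
| ONT > 20 kb: coverage (mapped: 100 %) | 4x | 5x | 5x | 7x | 6x | 8x | 8x | 298x |
| Illumina 2 × 300 bp: coverage (mapped: 99.13 %) | 65x | 67x | 74x | 79x | 82x | 97x | 105x | 64x |
| No. of telomere patterns 5’ | 20 | 18 | 24 | 22 | 21 | 0 | 18 | |
| No. of telomere patterns 3’ | 48 | 40 | 34 | 26 | 65 | 32 | 0 | |
| No. of genes | 1403 | 1254 | 841 | 650 | 560 | 542 | 205 | Not annotated |
| No. of tRNAs | 49 | 44 | 14 | 16 | 22 | 21 | 1 | Not annotated |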

Comments:

• I, II, IV, and V: Complete

• III: Complete apart from 10 kb of scaffolded Ns at 670 kb

• Mitogenome: Complete, circular

• VI: First 10 kb consist of collapsed rRNA operons. Two copies are present. The coverage is, however, ∼20x higher. Thus, there should be ∼40 copies, which can only be resolved using very long reads.

• VII: Scaffolds VI and VII might be on the same chromosome since the telomeres are missing at one end, and thus, are not complete. They also have a very similar coverage.
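The per-contig figures in Table 1 can be cross-checked against the totals reported in the text. The sketch below is a simple consistency check, not part of the authors' pipeline; the per-contig split of gene and tRNA counts assumed here is the one whose sums reproduce the reported 5455 genes and 167 tRNAs.

```python
# Per-contig values as listed in Table 1 (the mitogenome was not annotated).
LENGTHS_BP = {
    "I": 2_886_691, "II": 2_560_583, "III": 1_739_817, "IV": 1_341_035,
    "V": 1_204_786, "VI": 1_140_646, "VII": 438_574, "Mitogenome": 66_400,
}
GENES = {"I": 1403, "II": 1254, "III": 841, "IV": 650, "V": 560, "VI": 542, "VII": 205}
TRNAS = {"I": 49, "II": 44, "III": 14, "IV": 16, "V": 22, "VI": 21, "VII": 1}

total_bp = sum(LENGTHS_BP.values())
total_genes = sum(GENES.values())
total_trnas = sum(TRNAS.values())

print(f"Total genome size: {total_bp:,} bp")
print(f"Protein coding genes: {total_genes}")
print(f"tRNA genes: {total_trnas}")

# Gene density per contig, expressed as genes per 100 kb.
for name, n_genes in GENES.items():
    density = n_genes / LENGTHS_BP[name] * 100_000
    print(f"{name}: {density:.1f} genes per 100 kb")
```

The three sums agree with the values stated in the text (11,378,532 bp, 5455 protein coding genes and 167 tRNA genes).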

The C. sargentensis nuclear genome contained 5455 protein coding genes and 167 tRNA genes. The mitochondrial genome was not annotated. Of all protein coding genes, 5019 sequences were annotated with at least one KEGG orthology identifier (KO identifier, K number). Overall, 3157 K numbers with a score above the predefined thresholds for individual KOs were assigned to 3044 predicted C. sargentensis genes. Many KEGG pathway modules (functional units of gene sets in metabolic pathways) were complete or lacked only a few blocks, as indicated by the KEGG Mapper Reconstruct tool [7] (Fig. 1).

Based on the KofamKOALA KEGG Orthology analysis of the annotated genome, only one secondary metabolite biosynthesis gene (CYSA_0A07570; K06998, similar to a trans-2,3-dihydro-3-hydroxyanthranilate isomerase [EC:5.3.3.17]) was identified. However, the fungal antiSMASH v.6.0 online tool [8] identified two potential secondary metabolite clusters. The first was an NRPS-like cluster predicted to consist of 15 genes and located on scaffold 1 (CYSA_0A11890-CYSA_0A12070). The second was a predicted terpene cluster with seven genes on scaffold 5 (CYSA_0E04890-CYSA_0E04950). Since antiSMASH uses different principles to predict genes, the annotations of the predicted secondary metabolite cluster genes were not identical to those from YGAP.

Specific transcriptome and proteome analyses enabled by the C. sargentensis SHA 17.2 reference genome will help to identify a set of potential biocontrol genes, following the strategy recently used for Aureobasidium pullulans [9].
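The distinction between genes with any KO hit and genes with a hit scoring above the KO-specific threshold can be illustrated with a small sketch. The hit records, gene names and KO labels below are invented for illustration, and treating a score exactly equal to the threshold as "not above" is an assumption made here, not a statement about KofamKOALA's behaviour. Only the counts in the last lines come from the text.

```python
from typing import NamedTuple


class KoHit(NamedTuple):
    gene: str
    ko: str
    threshold: float  # KO-specific score threshold
    score: float      # score of this gene against the KO profile


# Hypothetical hits, invented for illustration only.
HITS = [
    KoHit("gene_1", "KO_a", 100.0, 250.5),
    KoHit("gene_1", "KO_b", 300.0, 120.0),
    KoHit("gene_2", "KO_c", 80.0, 95.0),
    KoHit("gene_3", "KO_d", 150.0, 40.2),
]


def above_threshold(hits: list[KoHit]) -> list[KoHit]:
    """Keep only hits whose score exceeds the threshold of their KO."""
    return [h for h in hits if h.score > h.threshold]


kept = above_threshold(HITS)
genes_with_ko = {h.gene for h in kept}
kos_assigned = {h.ko for h in kept}
print(f"{len(kos_assigned)} KOs assigned to {len(genes_with_ko)} genes")

# Fractions implied by the counts reported in the text.
TOTAL_GENES = 5455
print(f"Any KO identifier: {5019 / TOTAL_GENES:.1%}")
print(f"Above threshold:   {3044 / TOTAL_GENES:.1%}")
```

With the reported counts, about 92 % of the protein coding genes received at least one KO identifier, while about 56 % were matched with a score above the respective threshold.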
Fig. 1

Analysis of the C. sargentensis SHA 17.2 genome revealed many complete or nearly complete KEGG pathway modules. Out of the 5455 annotated protein coding genes, 3044 predicted C. sargentensis genes were matched with 3157 K numbers with a score above the predefined thresholds for individual KOs.

Experimental Design, Materials and Methods

Genomic DNA was extracted using a phenol/chloroform extraction protocol. Oxford Nanopore Technologies (ONT) sequencing was carried out in-house. The ONT library was prepared using a 1D² Sequencing Kit (SQK-LSK308) and sequenced on a FLO-MIN107 (R9.5) flow cell (all from Oxford Nanopore Technologies, Oxford, UK). One 2 × 300 bp Illumina paired-end library was prepared in-house using the Nextera XT DNA kit and sequenced on a MiSeq platform (all from Illumina, Inc., San Diego, CA, USA). PacBio sequencing was carried out at the Functional Genomics Centre Zurich (FGCZ) on a Sequel machine (1 SMRT cell shared between three strains) (PacBio, Menlo Park, CA, USA). Size selection was performed using the BluePippin system (Labgene Scientific, Châtel-St-Denis, Switzerland).

PacBio and ONT subreads were filtered with Filtlong (v.0.2.0) using length cut-offs of 5 kb and 20 kb, respectively. The Illumina reads were filtered and trimmed using Trimmomatic (v0.39; parameters: phred 33, “LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36”, only keep paired reads) [10]. The filtered PacBio reads were assembled using Flye (v.2.4; default parameters, except: estimated genome size of 11 Mb) [11], an assembly algorithm capable of resolving long, nearly identical repeat sequences [12]. Three short contigs were submitted to BLAT [13] and subsequently removed since they appeared spurious. The remaining 10 contigs were polished with the PacBio reads using 3 Arrow runs. The polished contigs were further scaffolded using the longer ONT reads (> 20 kb) and LRScaf (v.1.1.6).

To correctly assemble the mitogenome, the mitogenome sequences of three Cyberlindnera strains (NC_022167.1, NC_022163.1, KC993181.1) were downloaded from NCBI and PacBio reads were individually mapped to the three references using minimap2 (set parameters: -a, -x map-pb). Mapping reads were filtered from the bam file using samtools (-F 4) and extracted into a fastq file using bam2fastq (v1.1.0).
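The samtools filter `-F 4` used for the mitogenome reads excludes every alignment record whose FLAG field has bit 0x4 ("segment unmapped") set. A minimal sketch of that bit test is shown below; the read names and flag values are hypothetical, and the function only mimics the selection logic rather than reading a real BAM file.

```python
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def keep_mapped(records: list[tuple[str, int]]) -> list[str]:
    """Mimic `samtools view -F 4`: drop records whose FLAG has bit 0x4 set."""
    return [name for name, flag in records if not flag & UNMAPPED]


# Hypothetical (read name, FLAG) pairs for illustration.
records = [
    ("read_a", 0),     # mapped, forward strand
    ("read_b", 4),     # unmapped
    ("read_c", 16),    # mapped, reverse strand
    ("read_d", 2048),  # supplementary alignment of a mapped read
    ("read_e", 20),    # 16 + 4: unmapped bit set
]

print(keep_mapped(records))
```

Only reads that aligned to one of the reference mitogenomes pass this filter, which is what allows them to be collected for the separate mitogenome assembly.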
The mapped reads were filtered by length (> 10 kb) and randomly subsampled (500 sequences) using awk to achieve a suitable coverage. The reads were assembled using Flye in plasmid mode (v.2.4; default parameters, except: estimated genome size of 50 kb, --plasmids) [11]. The circularity and completeness of the mitogenome were confirmed by mapping the PacBio reads to the start-aligned contig using minimap2 (set parameters: -a, -x map-pb) and visual inspection in the Integrative Genomics Viewer (IGV) [14].

All contigs were polished using the PacBio reads and 8 Arrow runs. The contigs were further polished using the Illumina reads and 3 FreeBayes (v.1.2.0) [15] runs to correct potential small errors (e.g., homopolymer errors). The PacBio (> 5 kb), ONT (> 20 kb) and Illumina reads were mapped to the polished contigs using minimap2 for PacBio (-x map-pb) and ONT (-x map-ont) and bwa for Illumina to verify the completeness and contiguity of the assembly by visual inspection in the IGV. plasmidSPAdes [16] was run on the Illumina data in order to detect smaller plasmids.

The mean telomere lengths (pattern “TGTGGTGTCTGGAT”) could not be calculated using the Illumina reads and Computel (v.1.2) [17]. The number of telomere patterns at both ends of each contig was thus counted manually (see Table 1).

The ploidy level of the genome was estimated with the Illumina data by using ploidyNGS (v.3.1.2) [18] and nQuire [19]. Variants were called using the Illumina data and FreeBayes (v.1.2.0; parameter: -C 5, the minimum count of observations supporting an alternate allele) [15], as well as the PacBio data and Longshot (v.0.3.3) [20]. The variants were filtered using vcffilter and a quality cut-off of 20 (parameter: -f “QUAL > 20”).

The C. sargentensis (SHA 17.2) genome was annotated using the Yeast Genome Annotation Pipeline (YGAP) [21]. Predictions were assessed for errors (i.e., internal stop codons, no ATG start codon) and manually corrected (indicated by the suffix “ed” in gene names).
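The two error classes checked in the gene predictions (internal stop codons and a missing ATG start codon) can be detected with a few lines of code. This is a minimal sketch and not the authors' procedure: the function name is hypothetical, the three standard stop codons are assumed, and the additional length check is an illustrative extra.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard stop codons


def check_cds(seq: str) -> list[str]:
    """Return a list of problems found in a coding sequence (empty if none)."""
    seq = seq.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length not a multiple of 3")
    if not seq.startswith("ATG"):
        problems.append("no ATG start codon")
    # Split into complete codons; the last codon may be the regular stop.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal stop codon")
    return problems


# Toy sequences for illustration.
print(check_cds("ATGAAATAA"))     # well-formed
print(check_cds("ATGTAAAAATAA"))  # stop codon before the end
print(check_cds("CTGAAATAA"))     # does not start with ATG
```

Gene models flagged in this way would then be candidates for manual inspection and correction.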
KEGG Orthologs (KOs; K numbers) were assigned to the predicted proteins by KofamKOALA [22]. The KEGG Mapper Reconstruct tool was used to assign the KOs to pathway modules [7].
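The telomere pattern counts in Table 1 were obtained manually. A comparable count could be scripted roughly as follows; this is a sketch under stated assumptions, not the method used by the authors. The window size of 2000 bp is hypothetical, and because the text does not say which strand carries the pattern at which end, both orientations are counted.

```python
TELOMERE_PATTERN = "TGTGGTGTCTGGAT"  # pattern given in the Methods
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_telomere_patterns(contig: str, window: int = 2000) -> tuple[int, int]:
    """Count pattern occurrences (either strand) within `window` bp of each end.

    `window` is an assumed value; str.count counts non-overlapping matches.
    """
    contig = contig.upper()
    rc_pattern = reverse_complement(TELOMERE_PATTERN)

    def count_both(segment: str) -> int:
        return segment.count(TELOMERE_PATTERN) + segment.count(rc_pattern)

    return count_both(contig[:window]), count_both(contig[-window:])


# Toy contig: three repeats at the 5' end, two reverse-complement repeats at the 3' end.
toy = TELOMERE_PATTERN * 3 + "ACGT" * 1000 + reverse_complement(TELOMERE_PATTERN) * 2
print(count_telomere_patterns(toy, window=100))
```

A count of zero at one end, as reported for scaffolds VI and VII, would indicate a missing telomere at that end.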

CRediT authorship contribution statement

Maria Paula Rueda-Mejia: Investigation, Resources. Lukas Nägeli: Investigation, Resources. Stefanie Lutz: Software, Formal analysis. Raúl A. Ortiz-Merino: Software, Data curation, Formal analysis. Daniel Frei: Investigation, Resources. Jürg E. Frey: Resources, Supervision. Kenneth H. Wolfe: Software, Data curation, Supervision. Christian H. Ahrens: Conceptualization, Software, Supervision. Florian M. Freimoser: Conceptualization, Writing – review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships, which have or could be perceived to have influenced the work reported in this article.

Specifications Table

Subject: Agricultural Microbiology
Specific subject area: Genome analysis of a yeast that strongly antagonises fungal plant pathogens.
Type of data: High-quality draft genome sequence data, genome annotation, table and figure
How data were acquired: Genomic DNA sequencing by Oxford Nanopore Technologies (ONT), Illumina MiSeq, and PacBio platforms, de novo assembly
Data format: Raw data: annotated draft genome assembly. Secondary data: table of annotated genes, the encoded proteins, and functional prediction
Parameters for data collection: Genomic DNA was extracted from a pure culture of C. sargentensis (SHA 17.2) using a phenol/chloroform protocol.
Description of data collection: Sequencing: Oxford Nanopore Technologies (ONT), Illumina MiSeq, PacBio. Assembly: filtering using length cut-offs, de novo assembly of PacBio reads, scaffolding with long ONT reads, reference-based assembly of the mitogenome. Annotation: Yeast Genome Annotation Pipeline (YGAP) and KEGG Orthologs assignment with KofamKOALA.
Data source location: Cyberlindnera sargentensis SHA 17.2 was isolated from a fallow farmland soil sample collected near Wädenswil (47.223140 °N, 8.676699 °E, 470 m a.s.l.), Switzerland. The strain is available at the Culture Collection of Switzerland under CCOS1011.
Data accessibility: The assembled genome is deposited at NCBI's GenBank under the BioProject PRJNA763105 and the accession numbers CP083464-CP083471 (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA763105). Additional data (PacBio and ONT long read data; Illumina MiSeq short read data; genome annotation) are available at https://dataverse.harvard.edu/dataverse/Csar_genome.
Related research article: Hilber-Bodmer, M., Schmid, M., Ahrens, C.H., Freimoser, F.M., 2017. Competition assays and physiological experiments of soil and phyllosphere yeasts identify Candida subhashii as a novel antagonist of filamentous fungi. BMC Microbiol. 17, 4. doi:10.1186/s12866-016-0908-z
1. W James Kent. BLAT--the BLAST-like alignment tool. Genome Res. 2002-04.

2. Minoru Kanehisa; Yoko Sato. KEGG Mapper for inferring cellular functions from protein sequences. Protein Sci. 2019-08-29.

3. Renato Augusto Corrêa Dos Santos; Gustavo Henrique Goldman; Diego Mauricio Riaño-Pachón. ploidyNGS: visually exploring ploidy with Next Generation Sequencing data. Bioinformatics. 2017-08-15.

4. Lilit Nersisyan; Arsen Arakelyan. Computel: computation of mean telomere length from whole-genome next-generation sequencing data. PLoS One. 2015-04-29.

5. Michael Schmid; Daniel Frei; Andrea Patrignani; Ralph Schlapbach; Jürg E Frey; Mitja N P Remus-Emsermann; Christian H Ahrens. Pushing the limits of de novo genome assembly for complex prokaryotic genomes harboring very long, near identical repeats. Nucleic Acids Res. 2018-09-28.

6. Tadeusz Krassowski; Jacek Kominek; Xing-Xing Shen; Dana A Opulente; Xiaofan Zhou; Antonis Rokas; Chris Todd Hittinger; Kenneth H Wolfe. Multiple Reinventions of Mating-type Switching during Budding Yeast Evolution. Curr Biol. 2019-07-25.

7. Peter Edge; Vikas Bansal. Longshot enables accurate variant calling in diploid genomes from single-molecule long read sequencing. Nat Commun. 2019-10-11.

8. Estelle Proux-Wéra; David Armisén; Kevin P Byrne; Kenneth H Wolfe. A pipeline for automated annotation of yeast genome sequences by a conserved-synteny approach. BMC Bioinformatics. 2012-09-17.

9. Anthony M Bolger; Marc Lohse; Bjoern Usadel. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014-04-01.
