
Draft Genome Sequence of Sporidiobolus salmonicolor CBS 6832, a Red-Pigmented Basidiomycetous Yeast.

Marco A Coelho, João M G C F Almeida, Chris Todd Hittinger, Paula Gonçalves.

Abstract

We report the genome sequencing and annotation of the basidiomycetous red-pigmented yeast Sporidiobolus salmonicolor strain CBS 6832. The current assembly contains 395 scaffolds, for a total size of about 20.5 Mb and a G+C content of ~61.3%. The genome annotation predicts 5,147 putative protein-coding genes.
Copyright © 2015 Coelho et al.


Year: 2015; PMID: 25999568; PMCID: PMC4440948; DOI: 10.1128/genomeA.00444-15

Source DB: PubMed; Journal: Genome Announc


GENOME ANNOUNCEMENT

The carotenoid-producing yeast Sporidiobolus salmonicolor belongs to the order Sporidiobolales (1), which is classified in the subphylum Pucciniomycotina, the earliest branching lineage of Basidiomycota (2). The species is free-living, distributed worldwide, and recognized mainly as a phyllosphere yeast. It has been recovered from a broad spectrum of substrates, including fresh and marine water, soil, and even clinical samples (3). It is of research interest for studying the evolution of sexual reproduction in basidiomycetes (4–6) and has the potential to serve as a natural source of carotenoids (7) for the pharmaceutical, cosmetics, and food industries (8, 9).
Here we report the genome sequencing of S. salmonicolor strain CBS 6832, isolated as a contaminant of a clinical sample (10), using a combination of Illumina and Pacific Biosciences (PacBio) single-molecule real-time (SMRT) sequencing technologies. For Illumina sequencing, 0.4-kb paired-end and 2- to 5-kb mate-pair libraries were generated and sequenced using the GAIIx and HiSeq 2000 platforms, respectively. For PacBio sequencing, a 10-kb SMRTbell library was generated and sequenced on a PacBio RS II platform. Illumina sequencing data were preprocessed with Trimmomatic version 0.32 (11): adapter sequences were removed, and low-quality bases were trimmed from the ends of the reads and wherever the average quality within a sliding window fell below a Phred score of 20. The de novo hybrid assembly of the Illumina and PacBio data was performed with the SPAdes version 3.1 assembler (12) using the options “--careful” and “-k 35,55,65,77”. PacBio circular consensus sequences (CCS) were used as unpaired single reads, and continuous long reads (CLR) were used for gap closure and repeat resolution.
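The sliding-window quality trimming described above can be illustrated with a minimal Python sketch. It shows the general idea only and is not Trimmomatic's actual implementation. The window size of 4 bases and the function names are assumptions made for illustration; the text specifies only the Phred threshold of 20.

```python
def phred33_scores(quality_string):
    """Decode a FASTQ quality string (Phred+33 encoding) into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_keep(scores, window=4, min_avg=20):
    """Return how many leading bases survive sliding-window trimming.

    The read is scanned from the 5' end and cut at the start of the first
    window whose mean quality is below min_avg.
    window=4 is an assumed value; the source only states the threshold (20).
    """
    n = len(scores)
    if n < window:
        # Read shorter than one window: judge it by its overall mean quality.
        return n if n and sum(scores) / n >= min_avg else 0
    for start in range(n - window + 1):
        if sum(scores[start:start + window]) / window < min_avg:
            return start
    return n


def trim_read(sequence, quality_string, window=4, min_avg=20):
    """Trim a read and its quality string to the surviving prefix."""
    keep = sliding_window_keep(phred33_scores(quality_string), window, min_avg)
    return sequence[:keep], quality_string[:keep]


if __name__ == "__main__":
    # 'I' encodes Phred 40 and '+' encodes Phred 10 in Phred+33.
    print(trim_read("ACGTACGT", "IIII++++"))  # prints ('ACG', 'III')
```

In this toy read the window starting at the fourth base is the first whose mean (17.5) drops below 20, so only the first three bases are kept.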
The accuracy of the resulting assembly was evaluated with REAPR (13), which uses read pairs mapped to the initial assembly to pinpoint misassemblies, such as scaffolding errors. The final draft assembly consists of 395 scaffolds (165 of which are longer than 500 bp), for a total size of 20,549,402 bp (N50, 538 kb) and a G+C content of about 61.3%, as assessed by QUAST (14).
Genes were predicted using Maker version 2.10 (15) with RepBase version 19.5 (16), SNAP, and Augustus trained on the Rhodosporidium toruloides NP11 model (PRJNA169538). Protein-coding sequences were annotated using the SIMAP database (17), as of late May 2014. Overall, we predicted 5,147 putative protein-coding genes. These encompass 798 superfamilies (18); 4,019 genes fall into 2,198 PANTHER families, of which 3,031 were annotated to the subfamily level (19). The most represented families are involved in transport, carbohydrate metabolism, regulation, and steroid metabolism. About 21% of the annotated proteins display a putative transmembrane region, and 477 of them have four or more such regions. The products of 1,031 genes show coiled-coil signatures, suggesting involvement in protein–protein interactions (20). About 220 of the proteins carry an identifiable signal peptide and are expected to be secreted (21, 22). This genome will enable direct access to genes encoding enzymes with potential biotechnological applications and foster comparative genomics studies to elucidate fundamental biological processes, such as the evolution of sexual reproduction in fungi.
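For readers unfamiliar with the assembly statistics quoted above, the following Python sketch shows how N50 and G+C content are conventionally computed from a set of scaffolds. It is an illustration under stated assumptions, not QUAST's code: the function names are hypothetical, and counting only unambiguous A/C/G/T bases in the G+C denominator is a choice made for this sketch.

```python
def n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together contain at least half of the total assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def gc_fraction(sequences):
    """Return the G+C fraction over all unambiguous bases (A, C, G, T).

    Ignoring N and other ambiguity codes is an assumption of this sketch.
    """
    gc = 0
    acgt = 0
    for seq in sequences:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        acgt += sum(seq.count(base) for base in "ACGT")
    return gc / acgt if acgt else 0.0


if __name__ == "__main__":
    toy_lengths = [2, 3, 4, 5, 6]           # total 20; 6 + 5 = 11 >= 10
    print(n50(toy_lengths))                  # prints 5
    print(gc_fraction(["GGCC", "AATT"]))     # prints 0.5
```

Applied to the 395 deposited scaffolds, functions like these would be expected to reproduce values close to the reported N50 of 538 kb and G+C content of about 61.3%, subject to how ambiguous bases are handled.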

Nucleotide sequence accession numbers.

The genome of S. salmonicolor strain CBS 6832 has been deposited in DDBJ/ENA/GenBank under the accession numbers CENE01000001 to CENE01000395. The version described in this paper is the first version.
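The accession range above covers one record per scaffold. As a small convenience sketch, the individual accession strings can be enumerated in Python as follows. The helper name is hypothetical, and the code assumes that both endpoints share the same alphabetic prefix and the same zero-padded numeric width, as they do here.

```python
def expand_accessions(first, last):
    """List every accession from first to last, inclusive.

    Assumes both accessions share a letter prefix followed by a
    zero-padded number of the same width (e.g. CENE01000001).
    """
    split = next(i for i, ch in enumerate(first) if ch.isdigit())
    prefix, start, end = first[:split], first[split:], last[split:]
    if last[:split] != prefix or len(first) != len(last):
        raise ValueError("accessions must share a prefix and a width")
    width = len(start)
    return [f"{prefix}{n:0{width}d}" for n in range(int(start), int(end) + 1)]


if __name__ == "__main__":
    accessions = expand_accessions("CENE01000001", "CENE01000395")
    print(len(accessions), accessions[0], accessions[-1])
    # prints: 395 CENE01000001 CENE01000395
```

The count of 395 matches the number of scaffolds reported for the draft assembly.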

1.  Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure.

Authors:  J Gough; K Karplus; R Hughey; C Chothia
Journal:  J Mol Biol       Date:  2001-11-02       Impact factor: 5.469

2.  A combined transmembrane topology and signal peptide prediction method.

Authors:  Lukas Käll; Anders Krogh; Erik L L Sonnhammer
Journal:  J Mol Biol       Date:  2004-05-14       Impact factor: 5.469

3.  SignalP 4.0: discriminating signal peptides from transmembrane regions.

Authors:  Thomas Nordahl Petersen; Søren Brunak; Gunnar von Heijne; Henrik Nielsen
Journal:  Nat Methods       Date:  2011-09-29       Impact factor: 28.547

4.  Repbase Update, a database of eukaryotic repetitive elements.

Authors:  J Jurka; V V Kapitonov; A Pavlicek; P Klonowski; O Kohany; J Walichiewicz
Journal:  Cytogenet Genome Res       Date:  2005       Impact factor: 1.636

5.  Sporobolomyces salmonicolor var. fischerii, a new yeast.

Authors:  V C Misra; H S Randhawa
Journal:  Arch Microbiol       Date:  1976-05-03       Impact factor: 2.552

6.  Molecular breeding of carotenoid biosynthetic pathways.

Authors:  C Schmidt-Dannert; D Umeno; F H Arnold
Journal:  Nat Biotechnol       Date:  2000-07       Impact factor: 54.908

7.  A deviation from the bipolar-tetrapolar mating paradigm in an early diverged basidiomycete.

Authors:  Marco A Coelho; José Paulo Sampaio; Paula Gonçalves
Journal:  PLoS Genet       Date:  2010-08-05       Impact factor: 5.917

8.  Evidence for maintenance of sex determinants but not of sexual stages in red yeasts, a group of early diverged basidiomycetes.

Authors:  Marco A Coelho; Paula Gonçalves; José P Sampaio
Journal:  BMC Evol Biol       Date:  2011-08-31       Impact factor: 3.260

9.  REAPR: a universal tool for genome assembly evaluation.

Authors:  Martin Hunt; Taisei Kikuchi; Mandy Sanders; Chris Newbold; Matthew Berriman; Thomas D Otto
Journal:  Genome Biol       Date:  2013-05-27       Impact factor: 13.583

10.  Trimmomatic: a flexible trimmer for Illumina sequence data.

Authors:  Anthony M Bolger; Marc Lohse; Bjoern Usadel
Journal:  Bioinformatics       Date:  2014-04-01       Impact factor: 6.937


