
CandidaDB: a multi-genome database for Candida species and related Saccharomycotina.

Tristan Rossignol, Pierre Lechat, Christina Cuomo, Qiandong Zeng, Ivan Moszer, Christophe d'Enfert.

Abstract

CandidaDB (http://genodb.pasteur.fr/CandidaDB) was established in 2002 to provide the first genomic database for the human fungal pathogen Candida albicans. The availability of an increasing number of fully or partially completed genome sequences of related fungal species has opened the path for comparative genomics and prompted us to migrate CandidaDB to a multi-genome database. The new version of CandidaDB houses the latest versions of the genomes of C. albicans strains SC5314 and WO-1. It also contains six genome sequences from species closely related to C. albicans that all belong to the CTG clade of Saccharomycotina, namely Candida tropicalis, Candida (Clavispora) lusitaniae, Candida (Pichia) guilliermondii, Lodderomyces elongisporus, Debaryomyces hansenii and Pichia stipitis, together with the reference Saccharomyces cerevisiae genome. CandidaDB includes sequences coding for 54 170 proteins, with annotations collected from other databases. These annotations are enriched with illustrations of structural features and functional domains and with data from comparative analyses. To take advantage of the integration of multiple genomes in a single database, we have implemented new tools that use pre-calculated or user-defined comparisons and allow rapid comparative analysis at the genomic scale.

Year:  2007        PMID: 18039716      PMCID: PMC2238939          DOI: 10.1093/nar/gkm1010

Source DB: PubMed    Journal: Nucleic Acids Res    ISSN: 0305-1048


INTRODUCTION

Candida species are the most important opportunistic fungal pathogens of humans, responsible for both superficial and systemic infections (1). Among these species, Candida albicans causes the majority of infections, but other species are becoming increasingly common (1). Because of its predominance, C. albicans has been the focus of genomic and molecular studies over the last 20 years and has become a model organism for other pathogenic Candida species and fungal pathogens. The Stanford Genome Technology Center made the C. albicans genome sequence publicly available at the end of the 1990s, and several assemblies and annotations have been released since (2–4). These releases have been accompanied by two main genomic databases: CandidaDB (5) and the Candida Genome Database (6,7). As hospital infections due to non-albicans Candida species have increased (8), research on these emerging species has recently expanded. Genome sequencing projects for these species, and for related non-pathogenic yeast species, have been completed or are nearing completion (4,9–12). The availability of numerous related genomes paves the way for comparative genomic approaches. These approaches have already contributed to our understanding of the evolutionary processes that underlie speciation in the Saccharomycotina subphylum (10,13–15). Applied to closely related pathogenic and non-pathogenic yeast species, comparative genomics should provide insights into virulence processes. To date, however, most yeast genomes are distributed across different databases, and no resource enables online comparative analysis. The current aim of CandidaDB is to provide such a comparative resource for species of the CTG clade of the subphylum Saccharomycotina. This clade is characterized by the translation of the CUG codon as serine instead of leucine. The CTG clade includes C. albicans and several of the most important human pathogenic fungi (16–18).
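The CUG reassignment matters for anyone who re-translates CTG-clade gene models. A generic translator that uses the standard genetic code will place leucine wherever these yeasts actually incorporate serine. The minimal Python sketch below is illustrative and is not part of CandidaDB; the function names are hypothetical. It builds the standard code and optionally switches CTG to serine, which matches the amino-acid assignments of NCBI translation table 12, the "Alternative Yeast Nuclear Code".

```python
from itertools import product

# Nucleotides in the conventional TCAG order used by NCBI genetic-code tables.
BASES = "TCAG"
# Standard genetic code (NCBI table 1): codons enumerated with the first base
# varying slowest and the third base fastest; '*' marks a stop codon.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_code(ctg_clade: bool = True) -> dict[str, str]:
    """Return a codon -> amino-acid map; CTG becomes Ser for CTG-clade yeasts."""
    code = {"".join(codon): aa
            for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)}
    if ctg_clade:
        code["CTG"] = "S"  # CUG read as serine instead of leucine
    return code


def translate(cds: str, ctg_clade: bool = True) -> str:
    """Translate a coding sequence up to the first in-frame stop codon."""
    code = build_code(ctg_clade)
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA input
    protein = []
    for i in range(0, len(seq) - 2, 3):  # ignore an incomplete trailing codon
        aa = code.get(seq[i:i + 3], "X")  # 'X' for ambiguous codons (e.g. with N)
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    cds = "ATGCTGAAATAA"
    print(translate(cds))                   # CTG clade: MSK
    print(translate(cds, ctg_clade=False))  # standard code: MLK
```

The same CTG codon therefore yields different proteins depending on the table chosen. Biopython users can obtain the CTG-clade result with `Seq(cds).translate(table=12, to_stop=True)`.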
CandidaDB provides genome sequences of four pathogenic [C. albicans, Candida tropicalis, Candida (Clavispora) lusitaniae, Candida (Pichia) guilliermondii] and three non-pathogenic (Lodderomyces elongisporus, Debaryomyces hansenii, Pichia stipitis) species belonging to the CTG clade (Table 1). It also provides the Saccharomyces cerevisiae genome sequence as a reference (19). CandidaDB includes sequences coding for 54 170 proteins, with annotations collected from other databases. It has been enriched with illustrations of structural features and functional domains, and with tools for sequence comparison and analysis. Moreover, we have implemented new comparative genomics tools to take advantage of the integration of multiple genomes in a single database. Importantly, pre-computed comparisons provide rapid access to comparative analyses at the protein and genomic scale.
Table 1.

Characteristics of the nine genomes available in the current release of CandidaDB

Species | Strain | Number of proteins | Number of chromosomes and/or supercontigs | Status and release date | Sequencing center/Database repository | Database links
Candida albicans | SC5314 | 6098 | 8 | Draft assembly, 13 September 2006 | CGD | http://www.candidagenome.org/
Candida albicans | WO1 | 6159 | 16 | Draft assembly, 15 March 2006 | Broad Institute | http://www.broad.mit.edu/annotation/genome/candida_albicans/
Candida guilliermondii | ATCC6260 | 5920 | 9 | Draft assembly, 15 March 2006 | Broad Institute | http://www.broad.mit.edu/annotation/genome/candida_guilliermondii/
Candida tropicalis | MYA-3404 | 6258 | 23 | Draft assembly, 12 June 2006 | Broad Institute | http://www.broad.mit.edu/annotation/genome/candida_tropicalis/
Candida lusitaniae | ATCC42720 | 5941 | 9 | Draft assembly, 25 January 2006 | Broad Institute | http://www.broad.mit.edu/annotation/genome/candida_lusitaniae/
Debaryomyces hansenii | CBS767 | 6318 | 7 | Complete, 3 July 2004 | Génolevures | http://cbi.labri.fr/Genolevures/elt/DEHA
Pichia stipitis | CBS 6054 | 5816 | 9 | Complete, 17 April 2007 | JGI | http://genome.jgi-psf.org/Picst3/Picst3.home.html
Lodderomyces elongisporus | NRRL YB-4239 | 5802 | 27 | Draft assembly, 12 June 2006 | Broad Institute | http://www.broad.mit.edu/annotation/genome/lodderomyces_elongisporus/
Saccharomyces cerevisiae | S288C | 5858 | 16 | Complete, 27 March 2007 | SGD | http://www.yeastgenome.org/
Total (9 genomes) | | 54 170 | 124 | | |

SOURCE DATA AND COMPATIBILITY WITH OTHER DATABASES

Eight publicly available genome sequences of seven closely related species belonging to the CTG clade are included in the new release of CandidaDB:

- the genomes of C. albicans strains SC5314 (2) and WO1 (20);
- the genomes of three other pathogenic species: C. tropicalis strain MYA-3404 (21), C. lusitaniae strain ATCC42720 (22) and C. guilliermondii strain ATCC6260 (23);
- the genomes of three non-pathogenic species: L. elongisporus strain NRRL YB-4239 (24), an ascosporogenous species; D. hansenii strain CBS767 (10), a halotolerant yeast found in fish and salted dairy products that plays a role in agro-food processes; and P. stipitis strain CBS6054 (12), a xylose-fermenting yeast.

The new release of CandidaDB also includes the genome of S. cerevisiae strain S288C (19). This species is not part of the CTG clade, but it belongs to the Saccharomycotina subphylum (17), and its high level of annotation makes it a valuable reference. These genome sequences and their associated annotations were obtained from the sources indicated in Table 1, which summarizes general information for the nine genomes available in the current version of CandidaDB. The new version of CandidaDB uses Assembly 20 of the C. albicans strain SC5314 genome sequence, available at the Candida Genome Database (CGD) (4,7). Previous releases of CandidaDB used annotations contributed by the Galar Fungail consortium (5). CandidaDB now uses the sequences, descriptions, accession numbers and annotations available at CGD, the reference repository for C. albicans. This homogenizes the nomenclature for this organism and will simplify literature curation. Accession numbers from previous CandidaDB releases remain available as synonyms. The genomes of P. stipitis, D. hansenii and S. cerevisiae available through CandidaDB are considered complete and have been published (10,12,19). The other genomes are draft assemblies that are close to completion and have a low number of contigs. For clarity, inter-database consistency and faster updates, CandidaDB follows the accession numbers for open reading frames (ORFs) assigned by the institutions that sequenced each genome.

IMPLEMENTATION

CandidaDB is built on GenoList, a general framework for multi-genome databases (25). GenoList is an integrated environment for multiple genomes, based on a relational database accessed through a web interface. In addition to gene descriptions, it provides comparative genomic and proteomic tools. Its structure and design are detailed in the accompanying paper (25). GenoList was originally developed as a multi-genome database for the comparative analysis of bacterial genomes (25), and it has been adapted to eukaryotes to manage CandidaDB. When connecting to CandidaDB, users are prompted to register with a login and password. Registration is optional and registered users are not tracked. It lets users set parameters for CandidaDB usage (see below) and keep these parameters when they return to the database. Registered or not, users then reach a web interface whose main window supports different forms of queries and analyses at the gene, genome and multi-genome scale. Query results are presented in the main window as gene lists. Each gene can be opened in a gene-specific window. This window provides reports, a dynamic map of the genomic environment, pre-computed comparative proteomic data, and tools for sequence analysis and download, as described below. An important feature of CandidaDB is that users can choose which of the available genomes they wish to query. Users can define a favourite genome, a query list of genomes and a comparative list of genomes. Through these selections, CandidaDB can be turned into a database focused on a favourite organism that provides comparative data only for the genomes in the comparative list. The search and comparative tools described below use the query list. Several comparative and query lists can be specified, and they remain available to registered users when they return to the database.

ANALYSIS AND VISUALIZATION TOOLS

The migration of CandidaDB to the GenoList multi-genome environment, combined with the integration of nine genomes, expands the possibilities for genome and proteome analysis and gives access to comparative genomics. Search options are the same as in the previous version of CandidaDB. The left panel of the main window allows searches by gene name and synonym, accession number, text and location. These searches can cover the set of genomes defined by the user (favourite organism, query or comparative lists) or all genomes present in CandidaDB. BLAST (26) and pattern search tools are also accessible from the left panel, as are two new tools for comparative genomic analysis, FindTarget and DiffTool. FindTarget (27) identifies genes from a given genome (‘Query genome’, the user-defined favourite organism) that meet tuneable criteria (percentage of identity, E-value, etc.). These genes are specifically present in a set of genomes (‘Reference genomes’, by default the user-defined query list) and, optionally, absent from another set of genomes (‘Exclusion genomes’, by default the user-defined comparative list). The algorithm uses pre-computed BLASTP best hits obtained from a systematic comparison of all proteins versus all proteins available in CandidaDB. DiffTool (28) identifies protein families whose members are shared by one set of organisms (‘Reference genomes’) as compared with another set of organisms (‘Exclusion genomes’). Protein families have been pre-computed in CandidaDB from the systematic all-versus-all BLASTP comparisons. Several family sets are available, depending on the criteria used in the clustering procedure (e.g. proteins that share at least 40, 50 or 60% sequence similarity over 80% of the protein length). Results are provided in the main window as a list of annotated protein families, each linked to the list of its member proteins and to a ClustalW multiple alignment (29).
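The logic of the two comparative tools can be illustrated with a small Python sketch. It is not the FindTarget or DiffTool code. Several details are assumptions made for illustration:

- the data layout, a `best_hits` map from each protein to its best BLASTP hit in each genome;
- the default thresholds;
- the alignment-coverage definition;
- the use of single-linkage clustering (connected components) to build families.

All function names and identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    subject: str      # best-hit protein in the target genome
    identity: float   # percent identity of the BLASTP alignment
    evalue: float


def find_targets(query_proteins, best_hits, reference, exclusion=(),
                 min_identity=30.0, max_evalue=1e-5):
    """FindTarget-like filter (sketch): keep query proteins with a qualifying
    best hit in every reference genome and in no exclusion genome."""
    def qualifies(hit):
        return (hit is not None and hit.identity >= min_identity
                and hit.evalue <= max_evalue)

    targets = []
    for protein in query_proteins:
        hits = best_hits.get(protein, {})  # genome -> Hit
        present = all(qualifies(hits.get(g)) for g in reference)
        absent = not any(qualifies(hits.get(g)) for g in exclusion)
        if present and absent:
            targets.append(protein)
    return targets


def cluster_families(proteins, alignments, min_similarity=50.0, min_coverage=0.8):
    """Link two proteins when their alignment reaches min_similarity (%) over
    min_coverage of the protein length, then return connected components
    (single linkage) computed with a union-find structure."""
    parent = {p: p for p in proteins}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, similarity, coverage in alignments:
        if similarity >= min_similarity and coverage >= min_coverage:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    families = {}
    for p in proteins:
        families.setdefault(find(p), set()).add(p)
    return list(families.values())


def diff_families(families, genome_of, reference, exclusion=()):
    """DiffTool-like filter (sketch): families with members in every
    reference genome and in no exclusion genome."""
    selected = []
    for family in families:
        genomes = {genome_of[p] for p in family}
        if set(reference) <= genomes and not genomes & set(exclusion):
            selected.append(family)
    return selected
```

Passing the query list as `reference` and the comparative list as `exclusion` roughly mirrors the default roles described above. Raising `min_similarity` from 40 to 60 gives the tighter family sets mentioned in the text. Single linkage is the simplest clustering choice, but it can chain distantly related proteins into one family.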
Results of the different searches are displayed in the main window as gene lists. Each gene is linked to a specific page that provides a description, annotation and a graphical view of the gene's genomic environment (Figure 1). This page systematically offers pre-computed comparative results for protein families (DiffTool) and best hits (FindTarget), together with a regularly updated BLASTP comparison against the non-redundant protein databank (30) (Figure 1). ClustalW pairwise or multiple alignments with the best hits found in the genomes of the comparative list are provided, together with a list of bi-directional best hits (BDBH). Additional protein features are displayed graphically: signal peptides and membrane-spanning domains predicted with the Phobius software (31), and PFAM domains (32) (Figure 1). Direct links to relevant databases are listed in the cross-references panel (Figure 1). Search tools with user-defined parameters (BLAST, DiffTool, FindTarget) are accessible from the Analysis tab of this gene window, and sequence retrieval tools from its Sequence tab.
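Bi-directional best hits are a common operational proxy for orthologs. Protein a from genome A and protein b from genome B form a BDBH pair when b is the best hit of a in B and a is, in turn, the best hit of b in A. The Python sketch below implements that definition. Ranking hits by BLAST bit score and the protein identifiers are assumptions for illustration, not details taken from CandidaDB.

```python
def best_hits(scored_hits):
    """Reduce (query, subject, bit_score) tuples to query -> best subject.
    Ties keep the first hit seen."""
    best = {}
    for query, subject, score in scored_hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return [(a, b) for a, b in best_ab.items() if best_ba.get(b) == a]


if __name__ == "__main__":
    # Hypothetical hits between genome A (ca_*) and genome B (sc_*).
    a_vs_b = [("ca_1", "sc_1", 500.0), ("ca_1", "sc_2", 300.0), ("ca_2", "sc_1", 200.0)]
    b_vs_a = [("sc_1", "ca_1", 495.0), ("sc_2", "ca_1", 310.0)]
    print(bidirectional_best_hits(a_vs_b, b_vs_a))  # [('ca_1', 'sc_1')]
```

One-way best hits, such as those used by FindTarget, would report `ca_2 -> sc_1`. The reciprocal requirement discards that pair because the best hit of `sc_1` in genome A is `ca_1`.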
Figure 1.

Snapshot of a gene window for the C. albicans OPT1 gene. The gene window displays annotation data, a dynamic map of the genomic region surrounding the OPT1 gene, access to a protein cluster including the Opt1 protein, a list of best hits identified in genomes of the comparative list with links to pairwise and multiple ClustalW alignments, a list of bi-directional best hits in other genomes available in CandidaDB, a graphical representation of predicted signal peptide, transmembrane domains and PFAM domains, and links to relevant pages in other databases. Other tabs in the gene window allow access to dynamic analysis tools and tools for sequence retrieval.


CONCLUSION AND PERSPECTIVES

The integration of a large number of genome sequences from related yeast species into a single database provides an unprecedented tool for comparative yeast genomics. The new version of CandidaDB aims to complement the information available at the Candida Genome Database in two ways. It implements comparative genomic tools, and it provides data on functionally relevant protein domains that were not previously directly available. Pre-computed multi-genome analyses, which would otherwise be CPU-intensive, make these data easy to access. CandidaDB nevertheless also allows similar queries to be run with user-defined parameters, avoiding the limitations of these static results. User-defined genome lists let users restrict searches and results to selected organisms. This option will become increasingly useful as more genomes become available through the database. CandidaDB is a convenient entry point for researchers working on Candida species other than C. albicans, since any Candida genome can be set as the favourite genome. It should be particularly helpful for those working with genomes that are still being annotated. In this regard, the comparative tools available in CandidaDB can be used to refine some of the gene models provided by sequencing centers. They can also be used to focus functional genomic studies. Such studies should eventually identify the gains or losses of function that underlie the differences in pathogenicity, virulence and morphogenesis observed between species of the CTG clade of Saccharomycotina. Other genomes of species within the CTG clade, e.g. C. parapsilosis and C. dubliniensis, have recently been sequenced and are being annotated. The same is true for Saccharomycotina species that do not belong to the CTG clade.
Our aim is to incorporate these genomes into CandidaDB as they become publicly available, to update sequences and annotations regularly and to provide new tools for comparative and structural analysis. In particular, adding a synteny visualisation tool to CandidaDB will greatly help in interpreting the outputs of comparative analyses.
REFERENCES

Chetouani F, Glaser P, Kunst F. FindTarget: software for subtractive genome analysis. Microbiology. 2001 Oct.

Chetouani F, Glaser P, Kunst F. DiffTool: building, visualizing and querying protein clusters. Bioinformatics. 2002 Aug.

van der Walt JP. Lodderomyces, a new genus of the Saccharomycetaceae. Antonie Van Leeuwenhoek. 1966.

Krcmery V, Barnes AJ. Non-albicans Candida spp. causing fungaemia: pathogenicity and antifungal resistance. J Hosp Infect. 2002 Apr.

Diezmann S, Cox CJ, Schönian G, Vilgalys RJ, Mitchell TG. Phylogeny and evolution of medical species of Candida and related taxa: a multigenic analysis. J Clin Microbiol. 2004 Dec.

Jeffries TW, Grigoriev IV, Grimwood J, Laplaza JM, Aerts A, Salamov A, Schmutz J, Lindquist E, Dehal P, Shapiro H, Jin YS, Passoth V, Richardson PM. Genome sequence of the lignocellulose-bioconverting and xylose-fermenting yeast Pichia stipitis. Nat Biotechnol. 2007 Mar 4.

Joly S, Pujol C, Schröppel K, Soll DR. Development of two species-specific fingerprinting probes for broad computer-assisted epidemiological studies of Candida tropicalis. J Clin Microbiol. 1996 Dec.

Goffeau A, Barrell BG, Bussey H, Davis RW, Dujon B, Feldmann H, Galibert F, Hoheisel JD, Jacq C, Johnston M, Louis EJ, Mewes HW, Murakami Y, Philippsen P, Tettelin H, Oliver SG. Life with 6000 genes. Science. 1996 Oct 25.

Pappagianis D, Collins MS, Hector R, Remington J. Development of resistance to amphotericin B in Candida lusitaniae infecting a human. Antimicrob Agents Chemother. 1979 Aug.

Jones T, Federspiel NA, Chibana H, Dungan J, Kalman S, Magee BB, Newport G, Thorstenson YR, Agabian N, Magee PT, Davis RW, Scherer S. The diploid genome sequence of Candida albicans. Proc Natl Acad Sci U S A. 2004 May 3.