
CandidaDB: a multi-genome database for Candida species and related Saccharomycotina.

Tristan Rossignol, Pierre Lechat, Christina Cuomo, Qiandong Zeng, Ivan Moszer, Christophe d'Enfert.

Abstract

CandidaDB (http://genodb.pasteur.fr/CandidaDB) was established in 2002 to provide the first genomic database for the human fungal pathogen Candida albicans. The availability of an increasing number of fully or partially completed genome sequences of related fungal species has opened the path for comparative genomics and prompted us to migrate CandidaDB into a multi-genome database. The new version of CandidaDB houses the latest versions of the genomes of C. albicans strains SC5314 and WO-1, along with six genome sequences from species closely related to C. albicans that all belong to the CTG clade of Saccharomycotina (Candida tropicalis, Candida (Clavispora) lusitaniae, Candida (Pichia) guilliermondii, Lodderomyces elongisporus, Debaryomyces hansenii and Pichia stipitis) and the reference Saccharomyces cerevisiae genome. CandidaDB includes sequences coding for 54 170 proteins with annotations collected from other databases, enriched with illustrations of structural features and functional domains and with data from comparative analyses. In order to take advantage of the integration of multiple genomes in a single database, new tools using pre-calculated or user-defined comparisons have been implemented that allow rapid access to comparative analysis at the genomic scale.

Substances: Fungal Proteins

Year: 2007 PMID: 18039716 PMCID: PMC2238939 DOI: 10.1093/nar/gkm1010

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Candida species are the most important opportunistic fungal pathogens of humans and are responsible for both superficial and systemic infections (1). Among these species, Candida albicans is responsible for the majority of infections, but other species are becoming increasingly common (1). Because of its predominance, C. albicans has been the focus of genomic and molecular studies over the last 20 years, becoming a model organism for other pathogenic Candida species and fungal pathogens. The C. albicans genome was made publicly available by the Stanford Genome Technology Center at the end of the 1990s, and different assemblies and annotations have been released since (2–4). This has been accompanied by the implementation of two main genomic databases: CandidaDB (5) and the Candida Genome Database (6,7). As infections due to non-albicans Candida in hospitals have increased (8), research on these emerging species has recently developed. Genome sequencing projects for these species, as well as related non-pathogenic yeast species, have been completed or are nearing completion (4,9–12). The availability of numerous related genomes paves the way for comparative genomic approaches that have already contributed to our understanding of the evolutionary processes that underlie speciation in the Saccharomycotina subphylum (10,13–15). Applied to closely related pathogenic and non-pathogenic yeast species, comparative genomics should provide insights into virulence processes. To date, most yeast genomes are available from different databases and there is no resource that enables online comparative analysis. The current aim of the CandidaDB database is to provide such a comparative resource for species of the CTG clade of the subphylum Saccharomycotina, a clade characterized by the translation of the CUG codon as serine instead of leucine. The CTG clade includes C. albicans and several of the most important human pathogenic fungi (16–18).
CandidaDB provides genome sequences of four pathogenic [C. albicans, Candida tropicalis, Candida (Clavispora) lusitaniae, Candida (Pichia) guilliermondii] and three non-pathogenic (Lodderomyces elongisporus, Debaryomyces hansenii, Pichia stipitis) species belonging to the CTG clade (Table 1). It also provides the Saccharomyces cerevisiae genome sequence as a reference (19). CandidaDB includes sequences coding for 54 170 proteins with annotations collected from other databases. It has been enriched with illustrations of structural features and functional domains and with tools for sequence comparison and analysis. Moreover, new tools for comparative genomics have been implemented in order to take advantage of the integration of multiple genomes in a single database. Importantly, pre-calculated comparisons provide rapid access to comparative analysis at the protein and genomic scale.
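The CUG reassignment that defines the CTG clade has a practical consequence for anyone translating coding sequences from these genomes. The following is a minimal sketch, not part of CandidaDB: it builds the standard genetic code (NCBI translation table 1) and derives a CTG-clade variant in which only the CTG codon differs, as in the alternative yeast nuclear code (NCBI translation table 12). The function name and the short example sequence are illustrative assumptions.

```python
# Minimal sketch: translate a coding sequence with CUG (CTG in DNA) read as
# serine, as in the CTG clade, versus leucine under the standard code.
# Names and the example sequence are hypothetical, chosen for illustration.

BASES = "TCAG"
# Amino acids of the standard genetic code, codons enumerated in TCAG order.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD_CODE = {
    b1 + b2 + b3: STANDARD_AAS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# CTG-clade code: identical to the standard code except CTG -> Ser.
CTG_CLADE_CODE = dict(STANDARD_CODE, CTG="S")


def translate(dna, code):
    """Translate a DNA (or RNA) coding sequence codon by codon."""
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(code[dna[i:i + 3]] for i in range(0, usable, 3))


example = "ATGCTGTTGTAA"  # ATG, CTG, TTG, TAA
print(translate(example, STANDARD_CODE))   # CTG read as leucine
print(translate(example, CTG_CLADE_CODE))  # CTG read as serine
```

Translating a CTG-clade gene with the standard table silently substitutes leucine at every CUG position, which matters for downstream similarity searches and domain predictions.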

Table 1.

Characteristics of the nine genomes available in the current release of CandidaDB

Species	Strain	Number of proteins	Number of chromosomes and/or supercontigs	Status and release date	Sequencing center/Database repository	Database links
Candida albicans	SC5314	6098	8	Draft assembly 13 September 2006	CGD	http://www.candidagenome.org/
Candida albicans	WO1	6159	16	Draft assembly 15 March 2006	Broad Institute	http://www.broad.mit.edu/annotation/genome/candida_albicans/
Candida guilliermondii	ATCC6260	5920	9	Draft assembly 15 March 2006	Broad Institute	http://www.broad.mit.edu/annotation/genome/candida_guilliermondii/
Candida tropicalis	MYA-3404	6258	23	Draft assembly 12 June 2006	Broad Institute	http://www.broad.mit.edu/annotation/genome/candida_tropicalis/
Candida lusitaniae	ATCC42720	5941	9	Draft assembly 25 January 2006	Broad Institute	http://www.broad.mit.edu/annotation/genome/candida_lusitaniae/
Debaryomyces hansenii	CBS767	6318	7	Complete 3 July 2004	Génolevures	http://cbi.labri.fr/Genolevures/elt/DEHA
Pichia stipitis	CBS 6054	5816	9	Complete 17 April 2007	JGI	http://genome.jgi-psf.org/Picst3/Picst3.home.html
Lodderomyces elongisporus	NRRL YB-4239	5802	27	Draft assembly 12 June 2006	Broad Institute	http://www.broad.mit.edu/annotation/genome/lodderomyces_elongisporus/
Saccharomyces cerevisiae	S288C	5858	16	Complete 27 March 2007	SGD	http://www.yeastgenome.org/
Total	9	54 170	124
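The totals in the last row of Table 1 can be checked directly from the per-genome figures. The short sketch below copies the protein and chromosome/supercontig counts from the table and sums them; the variable names are illustrative.

```python
# Counts copied from Table 1: (genome, number of proteins,
# number of chromosomes and/or supercontigs).
GENOMES = [
    ("Candida albicans SC5314", 6098, 8),
    ("Candida albicans WO1", 6159, 16),
    ("Candida guilliermondii ATCC6260", 5920, 9),
    ("Candida tropicalis MYA-3404", 6258, 23),
    ("Candida lusitaniae ATCC42720", 5941, 9),
    ("Debaryomyces hansenii CBS767", 6318, 7),
    ("Pichia stipitis CBS 6054", 5816, 9),
    ("Lodderomyces elongisporus YB-4239", 5802, 27),
    ("Saccharomyces cerevisiae S288C", 5858, 16),
]

total_proteins = sum(proteins for _, proteins, _ in GENOMES)
total_units = sum(units for _, _, units in GENOMES)

print(len(GENOMES), total_proteins, total_units)  # 9 54170 124
```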

SOURCE DATA AND COMPATIBILITY WITH OTHER DATABASES

Eight publicly available genome sequences from seven closely related species of the CTG clade are included in the new release of CandidaDB: the genomes of C. albicans strains SC5314 (2) and WO1 (20); the genomes of three other pathogenic species, C. tropicalis strain MYA-3404 (21), C. lusitaniae strain ATCC42720 (22) and C. guilliermondii strain ATCC6260 (23); and the genomes of three non-pathogenic species, namely L. elongisporus strain NRRL YB-4239 (24), an ascosporogenous species, D. hansenii strain CBS767 (10), a halotolerant yeast found in fish and salted dairy products that has a role in agro-food processes, and P. stipitis strain CBS6054 (12), a xylose-fermenting yeast. The new release of CandidaDB also includes the S. cerevisiae strain S288C genome (19) in order to take advantage of the high level of annotation available for this species, which is not part of the CTG clade but does belong to the Saccharomycotina subphylum (17). These genome sequences and associated annotations were obtained from the sources indicated in Table 1, which summarizes the general information for the nine genomes available in the current version of CandidaDB. The new version of CandidaDB uses Assembly 20 of the genome sequence of C. albicans strain SC5314 available at the Candida Genome Database (CGD) (4,7). While previous releases of CandidaDB used annotations contributed by the Galar Fungail consortium (5), CandidaDB now uses the sequences, descriptions, accession numbers and annotations available at CGD, which is the reference repository for C. albicans. This homogenizes the nomenclature for this organism and will simplify literature curation. Accession numbers from previous CandidaDB releases are still available as synonyms. The genomes of P. stipitis, D. hansenii and S. cerevisiae available through CandidaDB are considered complete and have been published (10,12,19), while the other genomes are draft assemblies that are close to completion and have a low number of contigs. CandidaDB aims to retain the accession numbers for Open Reading Frames (ORFs) assigned by the institutions that performed the sequencing, for clarity, easier cross-referencing between databases and faster update procedures.

IMPLEMENTATION

CandidaDB is based on a generic framework called GenoList (25). GenoList is an integrated environment for multiple genomes, built on a relational database and accessed through a web user interface, that provides comparative genomic and proteomic tools in addition to the gene descriptions. Its structure and design are detailed in the accompanying paper (25). GenoList was originally developed as a multi-genome database for comparative analysis of bacterial genomes (25) and has been adapted to eukaryotes in order to manage the CandidaDB database. When connecting to CandidaDB, users are prompted to register and provide a login and password. Although registration is optional and registered users are not tracked, it allows users to specify parameters for CandidaDB usage (see below) and to retain these parameters upon return to the database. Whether or not they have registered, users have access to a web interface composed of a main window that allows different forms of queries and analysis at the gene, genome and multi-genome scale. Results of the queries are presented in the main window as gene lists. Each gene can be accessed through a gene-specific window providing reports, a dynamic map of the genomic environment, pre-computed data from comparative proteomic analysis, and tools for sequence analysis and downloads, as described below. An important feature of CandidaDB is that users can select the genomes they wish to query from the list of all available genomes. Users can define a favourite genome, a query list of genomes and a comparative list of genomes. Through these selections, CandidaDB can be focused on a favourite organism and provide comparative data only for genomes of the comparative list. The query list is used in the search and comparative tools described below. Several comparative and query lists can be specified and remain accessible to registered users upon return to the database.
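The three user-defined selections (favourite genome, query list, comparative list) can be pictured as a small data structure. The sketch below is only an illustration of that idea and makes no claim about the GenoList schema: the class name, field names, genome labels and the `search_scope` method are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the nine genomes in the current release.
ALL_GENOMES = (
    "C. albicans SC5314", "C. albicans WO1", "C. guilliermondii",
    "C. tropicalis", "C. lusitaniae", "D. hansenii",
    "P. stipitis", "L. elongisporus", "S. cerevisiae",
)


@dataclass
class GenomeSelection:
    """A user's genome choices: one favourite plus two named lists."""
    favourite: str
    query_list: list = field(default_factory=list)
    comparative_list: list = field(default_factory=list)

    def __post_init__(self):
        # Reject labels that are not among the available genomes.
        for genome in [self.favourite, *self.query_list, *self.comparative_list]:
            if genome not in ALL_GENOMES:
                raise ValueError(f"unknown genome: {genome}")

    def search_scope(self, scope="query"):
        """Return the genomes a search should cover for the given scope."""
        if scope == "favourite":
            return [self.favourite]
        if scope == "query":
            return list(self.query_list)
        if scope == "comparative":
            return list(self.comparative_list)
        if scope == "all":
            return list(ALL_GENOMES)
        raise ValueError(f"unknown scope: {scope}")


selection = GenomeSelection(
    favourite="C. tropicalis",
    query_list=["C. albicans SC5314", "C. lusitaniae"],
    comparative_list=["S. cerevisiae"],
)
print(selection.search_scope("query"))
```

Keeping the favourite genome separate from the two lists mirrors the way the text describes them being used: the favourite sets the focus, the query list scopes searches, and the comparative list scopes the comparative data shown.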

ANALYSIS AND VISUALIZATION TOOLS

The migration of CandidaDB to the GenoList multi-genome environment, combined with the integration of nine genomes, expands the possibilities for genome and proteome analysis and gives access to comparative genomics. Search options are identical to those available in the previous version of CandidaDB: the left panel of the main window allows searches by gene name and synonym, accession number, text and location, either in the set of genomes defined by the user (favourite organism, query or comparative lists) or in all genomes present in CandidaDB. BLAST search (26) and pattern search tools are also accessible from the left panel, as are two new tools for comparative genomic analysis, FindTarget and DiffTool. FindTarget (27) allows the user to identify genes from a given genome (‘Query genome’, the user-defined favourite organism) that, based on tuneable criteria (percentage of identity, E-value, etc.), are specifically present in a set of genomes (‘Reference genomes’, by default the user-defined query list) and, optionally, absent from another set of genomes (‘Exclusion genomes’, by default the user-defined comparative list). The algorithm makes use of pre-computed BLASTP best hits obtained from systematic comparisons of all proteins versus all proteins available in CandidaDB. DiffTool (28) allows the identification of protein families whose members are shared by a set of organisms (‘Reference genomes’) as compared to another set of organisms (‘Exclusion genomes’). Protein families have been pre-computed in CandidaDB from systematic BLASTP comparisons of every protein versus all proteins. Several family sets are available according to the criteria used in the clustering procedure (e.g. proteins that share at least 40, 50 or 60% sequence similarity over 80% of the protein length). Results are provided in the main window as a list of annotated protein families, each linked to the list of included proteins and a ClustalW multiple alignment (29).
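The subtractive logic described for FindTarget can be sketched as a filter over a table of pre-computed best hits. This is an illustrative reading, not the FindTarget implementation: it assumes a gene must have a qualifying best hit in every reference genome and in no exclusion genome, and the gene identifiers, thresholds and data layout are hypothetical.

```python
def has_hit(best_hits, gene, genome, min_identity, max_evalue):
    """True if the gene has a best hit in the genome that passes both cut-offs."""
    hit = best_hits.get((gene, genome))
    if hit is None:
        return False
    identity, evalue = hit
    return identity >= min_identity and evalue <= max_evalue


def find_target(query_genes, best_hits, reference, exclusion=(),
                min_identity=40.0, max_evalue=1e-5):
    """Keep query genes present in all reference genomes and absent from
    all exclusion genomes (assumed semantics, see the text above)."""
    selected = []
    for gene in query_genes:
        in_all_refs = all(
            has_hit(best_hits, gene, g, min_identity, max_evalue)
            for g in reference
        )
        in_any_excl = any(
            has_hit(best_hits, gene, g, min_identity, max_evalue)
            for g in exclusion
        )
        if in_all_refs and not in_any_excl:
            selected.append(gene)
    return selected


# Toy best-hit table: (query gene, target genome) -> (% identity, E-value).
best_hits = {
    ("geneA", "C. tropicalis"): (72.0, 1e-80),
    ("geneA", "S. cerevisiae"): (65.0, 1e-60),
    ("geneB", "C. tropicalis"): (58.0, 1e-40),
    ("geneC", "C. tropicalis"): (25.0, 1e-2),
}

hits = find_target(
    ["geneA", "geneB", "geneC"], best_hits,
    reference=["C. tropicalis"], exclusion=["S. cerevisiae"],
)
print(hits)  # ['geneB']
```

Because the best hits are looked up rather than recomputed, changing the identity or E-value cut-off only re-runs the filter, which is consistent with the rapid access that pre-computed comparisons are said to provide.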
Results of the different searches are displayed in the main window as gene lists, each gene being linked to a specific page that provides description, annotation and a graphical view of the genomic environment of the gene (Figure 1). Pre-computed results from comparative analysis for protein families (DiffTool) and best hits (FindTarget) and a regularly updated BLASTP comparison to the non-redundant protein databank (30) are systematically available (Figure 1). ClustalW pairwise or multiple alignments with best hits found in the genomes of the comparative list are provided. A list of bi-directional best hits (BDBH) is also provided. Additional protein features are displayed graphically showing signal peptide and membrane-spanning domains predicted using the Phobius software (31) and PFAM domains (32) (Figure 1). Direct links to relevant databases are listed in the cross-references panel (Figure 1). Tuneable, not pre-defined, search tools (BLAST, DiffTool, FindTarget) and sequence retrieval tools are accessible in the Analysis and Sequence tabs of this gene window, respectively.
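A bi-directional best hit (BDBH) between two genomes is a pair of proteins that are each other's best hit. The sketch below shows that definition on toy data; the dictionaries stand in for pre-computed best-hit tables and the protein identifiers are hypothetical.

```python
def bidirectional_best_hits(best_a_to_b, best_b_to_a):
    """Return (a, b) pairs where b is the best hit of a in genome B
    and a is the best hit of b in genome A."""
    return sorted(
        (a, b)
        for a, b in best_a_to_b.items()
        if best_b_to_a.get(b) == a
    )


# Toy tables: protein -> its best hit in the other genome.
best_a_to_b = {"a1": "b1", "a2": "b2", "a3": "b2"}
best_b_to_a = {"b1": "a1", "b2": "a3"}

pairs = bidirectional_best_hits(best_a_to_b, best_b_to_a)
print(pairs)  # [('a1', 'b1'), ('a3', 'b2')]
```

Here `a2` is dropped even though its best hit is `b2`, because `b2` points back to `a3`; this reciprocity requirement is what makes BDBH a stricter criterion than a one-way best hit.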

Figure 1.

Snapshot of a gene window for the C. albicans OPT1 gene. The gene window displays annotation data, a dynamic map of the genomic region surrounding the OPT1 gene, access to a protein cluster including the Opt1 protein, a list of best hits identified in genomes of the comparative list with links to pairwise and multiple ClustalW alignments, a list of bi-directional best hits in other genomes available in CandidaDB, a graphical representation of predicted signal peptide, transmembrane domains and PFAM domains, and links to relevant pages in other databases. Other tabs in the gene window allow access to dynamic analysis tools and tools for sequence retrieval.
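The protein families used by DiffTool, and the protein cluster shown in the gene window, are described as being pre-computed from all-versus-all BLASTP comparisons with similarity and length-coverage cut-offs. The text does not state which clustering procedure is used, so the sketch below assumes simple single-linkage clustering (union-find) followed by a reference/exclusion filter. Function names, protein identifiers and genome labels are hypothetical.

```python
def build_families(proteins, pairs, min_similarity=50.0, min_coverage=0.8):
    """Single-linkage clustering (an assumption): link two proteins when
    their similarity and alignment coverage both pass the cut-offs."""
    parent = {p: p for p in proteins}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, similarity, coverage in pairs:
        if similarity >= min_similarity and coverage >= min_coverage:
            parent[find(a)] = find(b)

    families = {}
    for p in proteins:
        families.setdefault(find(p), set()).add(p)
    return list(families.values())


def diff_families(families, genome_of, reference, exclusion=()):
    """Keep families with a member in every reference genome and
    no member in any exclusion genome (assumed semantics)."""
    kept = []
    for family in families:
        genomes = {genome_of[p] for p in family}
        if set(reference) <= genomes and not genomes & set(exclusion):
            kept.append(family)
    return kept


genome_of = {"p1": "Ca", "p2": "Ct", "p3": "Sc", "p4": "Ca", "p5": "Sc"}
# (protein, protein, % similarity, fraction of protein length aligned)
pairs = [
    ("p1", "p2", 62.0, 0.90),
    ("p2", "p3", 45.0, 0.95),  # passes only at the 40% cut-off
    ("p4", "p5", 70.0, 0.85),
    ("p1", "p4", 80.0, 0.50),  # fails the 80% coverage requirement
]

families_50 = build_families(list(genome_of), pairs, min_similarity=50.0)
shared = diff_families(families_50, genome_of,
                       reference=["Ca", "Ct"], exclusion=["Sc"])
print(shared)  # [{'p1', 'p2'}] (set ordering may vary)
```

Lowering the cut-off to 40% merges `p3` into the first family, which then contains an exclusion-genome member and is no longer reported. This illustrates why offering several pre-computed family sets at different thresholds can change the outcome of a differential query.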

CONCLUSION AND PERSPECTIVES

The integration in a single database of a large number of genome sequences from related yeast species provides an unprecedented tool for comparative genomics of yeasts. The new version of CandidaDB aims to provide information complementary to that available at the Candida Genome Database by implementing comparative genomic tools and by providing data on functionally relevant protein domains that were not yet directly available. Access to these data is facilitated by the use of pre-computed multi-genome analyses, which are normally CPU-intensive. CandidaDB nevertheless also provides the ability to perform similar queries with user-defined parameters, avoiding the limitations of these static results. The user-defined lists of genomes allow the user to limit searches and results to selected organisms, an option that will be increasingly useful as a larger number of genomes becomes available through the database. CandidaDB is a convenient entry point for the community working on Candida species other than C. albicans, since any Candida genome can be used as the favourite genome. It should be helpful for those who are working with genomes that are still undergoing annotation. In this regard, the comparative tools available in CandidaDB can be used to refine some of the gene models provided by sequencing centers. They can also be used to focus functional genomic studies that should eventually identify the gains or losses of function that underlie the differences in pathogenicity, virulence and morphogenesis observed between the different species of the CTG clade of Saccharomycotina. Other genomes of species within the CTG clade, e.g. C. parapsilosis and C. dubliniensis, have recently been sequenced and are undergoing annotation. The same is true for species of the Saccharomycotina that do not belong to the CTG clade. Our aim is to incorporate these genomes into CandidaDB as they become publicly available, to update sequences and annotations regularly and to provide new tools for comparative and structural analysis. In particular, the incorporation in CandidaDB of a synteny visualisation tool will greatly help in the interpretation of the comparative data outputs.

32 in total

1. FindTarget: software for subtractive genome analysis.

Authors: Farid Chetouani; Philippe Glaser; Frank Kunst
Journal: Microbiology Date: 2001-10 Impact factor: 2.777

2. DiffTool: building, visualizing and querying protein clusters.

Authors: Farid Chetouani; Philippe Glaser; Frank Kunst
Journal: Bioinformatics Date: 2002-08 Impact factor: 6.937

3. Lodderomyces, a new genus of the Saccharomycetaceae.

Authors: J P van der Walt
Journal: Antonie Van Leeuwenhoek Date: 1966 Impact factor: 2.271

Review 4. Non-albicans Candida spp. causing fungaemia: pathogenicity and antifungal resistance.

Authors: V Krcmery; A J Barnes
Journal: J Hosp Infect Date: 2002-04 Impact factor: 3.926

5. Phylogeny and evolution of medical species of Candida and related taxa: a multigenic analysis.

Authors: Stephanie Diezmann; Cymon J Cox; Gabriele Schönian; Rytas J Vilgalys; Thomas G Mitchell
Journal: J Clin Microbiol Date: 2004-12 Impact factor: 5.948

6. Genome sequence of the lignocellulose-bioconverting and xylose-fermenting yeast Pichia stipitis.

Authors: Thomas W Jeffries; Igor V Grigoriev; Jane Grimwood; José M Laplaza; Andrea Aerts; Asaf Salamov; Jeremy Schmutz; Erika Lindquist; Paramvir Dehal; Harris Shapiro; Yong-Su Jin; Volkmar Passoth; Paul M Richardson
Journal: Nat Biotechnol Date: 2007-03-04 Impact factor: 54.908

7. Development of two species-specific fingerprinting probes for broad computer-assisted epidemiological studies of Candida tropicalis.

Authors: S Joly; C Pujol; K Schröppel; D R Soll
Journal: J Clin Microbiol Date: 1996-12 Impact factor: 5.948

Review 8. Life with 6000 genes.

Authors: A Goffeau; B G Barrell; H Bussey; R W Davis; B Dujon; H Feldmann; F Galibert; J D Hoheisel; C Jacq; M Johnston; E J Louis; H W Mewes; Y Murakami; P Philippsen; H Tettelin; S G Oliver
Journal: Science Date: 1996-10-25 Impact factor: 47.728

9. Development of resistance to amphotericin B in Candida lusitaniae infecting a human.

Authors: D Pappagianis; M S Collins; R Hector; J Remington
Journal: Antimicrob Agents Chemother Date: 1979-08 Impact factor: 5.191

10. The diploid genome sequence of Candida albicans.

Authors: Ted Jones; Nancy A Federspiel; Hiroji Chibana; Jan Dungan; Sue Kalman; B B Magee; George Newport; Yvonne R Thorstenson; Nina Agabian; P T Magee; Ronald W Davis; Stewart Scherer
Journal: Proc Natl Acad Sci U S A Date: 2004-05-03 Impact factor: 11.205
