Genolevures complete genomes provide data and tools for comparative genomics of hemiascomycetous yeasts.

David Sherman, Pascal Durrens, Florian Iragne, Emmanuelle Beyne, Macha Nikolski, Jean-Luc Souciet

Abstract

The Génolevures online database (http://cbi.labri.fr/Genolevures/) provides tools and data relative to 4 complete and 10 partial genome sequences determined and manually annotated by the Génolevures Consortium, to facilitate comparative genomic studies of hemiascomycetous yeasts. With their relatively small and compact genomes, yeasts offer a unique opportunity for exploring eukaryotic genome evolution. The new version of the Génolevures database provides truly complete (subtelomere to subtelomere) chromosome sequences, 25 000 protein-coding and tRNA genes, and in silico analyses for each gene element. A new feature of the database is a novel collection of conserved multi-species protein families and their mapping to metabolic pathways, coupled with an advanced search feature. Data are presented with a focus on relations between genes and genomes: conservation of genes and gene families, speciation, chromosomal reorganization and synteny. The Génolevures site includes an area for specific studies by members of its international community.

Substances: Fungal Proteins

Year: 2006 PMID: 16381905 PMCID: PMC1347522 DOI: 10.1093/nar/gkj160

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Comparative analysis of genomes is greatly facilitated when their sequences are complete, fully assembled and carefully annotated. Detailed analysis of species- and clade-specific gain or loss of function, and of expansions or contractions of gene families, provides useful insight into the mechanisms of molecular evolution and can be performed with confidence when data are complete. The Génolevures online database provides such data for the complete genomes of four species from the class of Hemiascomycete yeasts, together with search and analysis tools for comparing these genomes and community pages for ongoing developments. New complete genomes will be added in 2006.
With their relatively small and compact genomes, yeasts offer a unique opportunity to explore eukaryotic genome evolution by comparative analysis of several species. Yeasts are widely used as cell factories for the production of beer, wine and bread and, more recently, of various metabolic products such as vitamins, ethanol, citric acid and lipids. Yeasts can assimilate hydrocarbons (genera Candida, Yarrowia and Debaryomyces), depolymerise tannin extracts (Zygosaccharomyces rouxii) and produce hormones and vaccines in industrial quantities through heterologous gene expression. For a review, see Ref. (1).
Several yeast species are pathogenic for humans. Among the most frequent disease agents are the Hemiascomycetes Candida albicans, Candida glabrata and Candida tropicalis, and the Basidiomycete Cryptococcus neoformans. Even Saccharomyces cerevisiae may be pathogenic in immunocompromised patients (2).
The best-known yeast in the Hemiascomycete class is S.cerevisiae (3), widely used as a model organism for molecular genetics and cell biology studies, and as a cell factory. As the most thoroughly annotated genome among the small eukaryotes, it is a common reference for the annotation of other species.
The hemiascomycetous yeasts represent a homogeneous phylogenetic group of eukaryotes with a relatively large diversity at the physiological and ecological levels. Comparative genomic studies within this group have proved very informative (4–7). The Génolevures program is devoted to large-scale comparisons of yeast genomes from various branches of the Hemiascomycete class, with the aim of addressing basic questions of molecular evolution such as the degrees of gene conservation, the identification of species-specific, clade-specific or class-specific genes, the distribution of genes among functional families, the rate of sequence and map divergences and mechanisms of chromosome shuffling.

COMPLETE SEQUENCING AND ANNOTATION OF YEAST GENOMES

The Genoscope and the Institut Pasteur provide high-quality sequence data at 10× or better coverage, assembled into complete chromosomes from subtelomere to subtelomere, usually with no more than one gap per chromosome. Protein-coding and tRNA genes are identified using a variety of in silico methods reported elsewhere and are manually annotated by a network of volunteer experts. Comparative analysis of four genomes was reported in (8). Ongoing Génolevures sequencing projects are reported on and included in the online database as data are released. Currently the database contains 55 693 317 nt comprising 24 147 protein-coding genes and 1124 tRNA or snRNA genes. The focus of the Génolevures database is to describe the relations between genes and genomes. We curate relations of orthology and paralogy between genes, as individuals or as members of protein families, as well as chromosomal map reorganization and gain and loss of genes and functions. We do not provide detailed annotations of individual genes and proteins of S.cerevisiae, which are already carefully maintained by MIPS and CYGD (9) and SGD (10) as well as in general-purpose databases such as UniProt (11) and EMBL (12).
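As a quick consistency check on the release figures quoted above, the short sketch below adds up the two gene counts and derives a crude average spacing of annotated genes along the sequence. The function name is illustrative only, and the spacing is a rough density estimate under the assumption that all annotated genes are counted once, not a measure of gene length.

```python
# Figures quoted in the text for the current release.
TOTAL_NT = 55_693_317
PROTEIN_CODING_GENES = 24_147
RNA_GENES = 1_124  # tRNA or snRNA genes


def release_summary(total_nt, protein_coding, rna_genes):
    """Return (total annotated genes, mean nucleotides of sequence per gene)."""
    total_genes = protein_coding + rna_genes
    return total_genes, total_nt / total_genes


total_genes, nt_per_gene = release_summary(
    TOTAL_NT, PROTEIN_CODING_GENES, RNA_GENES
)
print(total_genes)         # 25271, consistent with the "25 000" of the abstract
print(f"{nt_per_gene:.0f} nt of sequence per annotated gene")
```

The result, a little over 2 kb of sequence per annotated gene, is in line with the compact genomes described in the Introduction.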

GÉNOLEVURES PROTEIN FAMILIES

While extensive chromosomal rearrangements combined with segmental and massive duplications make comparisons of yeast genome sequences difficult (13), relations of homology between protein-coding genes can be identified despite their great diversity at the molecular level (8). Families of homologous proteins provide a powerful tool for appreciating conservation, gain and loss of function within yeast genomes. Génolevures provides a unique collection of paralogous and orthologous protein families, identified using a novel consensus clustering algorithm (M. Nikolski, manuscript submitted) applied to a complementary set of homeomorphic [sharing full-length sequence similarity and similar domain architectures, see (14)] and nonhomeomorphic systematic Smith–Waterman (15) and Blast (16) sequence alignments. Similar approaches are developed on a wider scale (14) and are complementary to these yeast-specific families.
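The consensus clustering algorithm itself is not described here, so the following is only a generic illustration of the idea of reconciling several clusterings, not the Génolevures method. It assumes that each alignment source (for example one Smith–Waterman-based and one Blast-based clustering) has already been turned into a partition of the proteins, and it computes the strictest possible consensus: two proteins stay together only if every input partition groups them together. All protein and cluster identifiers are hypothetical.

```python
from collections import defaultdict


def strict_consensus(partitions):
    """Strict consensus of several clusterings.

    partitions: list of dicts mapping protein id -> cluster label.
    Returns a list of sets; two proteins share a set only if they
    share a cluster in every input partition.
    """
    # Only proteins present in all partitions can be compared.
    common = set(partitions[0])
    for partition in partitions[1:]:
        common &= set(partition)

    groups = defaultdict(set)
    for protein in common:
        # The tuple of labels across partitions identifies the consensus group.
        signature = tuple(partition[protein] for partition in partitions)
        groups[signature].add(protein)
    return sorted(groups.values(), key=lambda group: sorted(group))


# Hypothetical input: two clusterings of five proteins.
sw_clusters = {"p1": "A", "p2": "A", "p3": "A", "p4": "B", "p5": "B"}
blast_clusters = {"p1": "X", "p2": "X", "p3": "Y", "p4": "Y", "p5": "Y"}

consensus = strict_consensus([sw_clusters, blast_clusters])
print(consensus)  # three groups: {p1, p2}, {p3}, {p4, p5}
```

A practical consensus method would typically be less strict, for instance by weighting the partitions or tolerating partial agreement; the intersection shown here is simply the easiest variant to reason about.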

EXPLORING GENOLEVURES DATA

The Génolevures online database is designed to help scientists gain insight into the mechanisms of eukaryotic molecular evolution by asking specific questions about the relationships between DNA and protein sequences (Figure 1; examples are shown in online Supplementary Data).

Figure 1

Links between Génolevures data and tools showing the principal workflows used by scientific users. Dark gray boxes represent dynamic, database-backed web pages, white boxes represent static web pages. Shorthand URL prefixes for these pages are shown in a monospaced font.

What genes exist, as orthologs for my favorite gene or as members of a functional class? (URL prefixes /concordance and /blast) Génolevures data can be searched by keyword, S.cerevisiae gene name, alignment to an arbitrary DNA or protein sequence and protein family identifier. A query simultaneously searches for and can return genes that have or may have a translation product, RNA and other genes that may have a transcription product only, cis-active elements and cross-genome protein families.

What is known about a given chromosomic element? (URL prefixes /elt) Each element, coding or not, has a summary page with a linkable URL that presents what is known about that element: annotation, chromosomic neighborhood and inter-genome alignments (with a clickable map), membership in a protein family, sequence data and domain architecture when known. Protein family membership is indicated both with the phyletic pattern and the phylogenetic profile of the family, which provides an immediate impression of the degree of conservation of that gene in hemiascomycete yeast species.

What relations exist in a protein family? (URL prefixes /fam) A protein family contains proteins with an observable evolutionary relationship that generally speaking lets one infer functional similarity. Each protein family is described on a summary page with a linkable URL that shows a cartoon of the pairwise relations between family members, linked annotations of the individual genes and a decorated multiple alignment of the family members computed with T-COFFEE (17). Links are provided to a pairwise distance matrix, a FastA file of protein sequences and a position-specific scoring matrix; the latter can be used to jump-start an iterative PSI-BLAST (16) search in public databanks for proteins similar to family members.

How are the individual genomes organized? (URL prefixes /elt and /perl/gbrowse) Chromosomal maps can be explored starting from the species page (e.g. /elt/CAGL for C.glabrata or /elt/YALI for Y.lipolytica) or directly through the genome browser (18), which provides a zoomable view of a chromosomal neighborhood with annotation tracks for different gene types and sequence features, and relations to orthologs in protein families showing conservation of function and synteny (when observable).

How are metabolic pathways conserved? (URL prefixes /path) Conservation of genes participating in KEGG (19) metabolic pathways may be explored, which makes it possible to emit hypotheses concerning the conservation of those pathways or the necessity of a particular gene for a given enzymatic function. Pathway conservation in a species is computed by coloring S.cerevisiae KEGG pathways with orthologs identified by Génolevures protein families. Each colored pathway contains both a summary and a detailed table of orthologs for each enzyme with useful information such as gene deletion effects.

How are membrane proteins and transporters classified? (URL prefixes /yeti) The YETI classification of these proteins from André Goffeau's lab (20), which indicates evolutionary relationships traced using non-ambiguous functional and phylogenetic criteria derived from the TCDB (21) classification system, can be explored and searched across the sequenced species.

Can I obtain sequence data? (URL prefixes /seq and /download) The latest release of annotated sequence data and protein family classification may be downloaded for local analysis. All Génolevures DNA and protein sequence data are also publicly available in EMBL and UniProt.
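The phyletic pattern of a family and the coloring of an S.cerevisiae pathway with orthologs can be illustrated with a small sketch. It is not the site's implementation: the family, gene and enzyme identifiers are hypothetical, "SACE" is an assumed code for S.cerevisiae (only CAGL and YALI appear in the text), and treating every family member from the target species as an ortholog is a simplification, since families also contain paralogs.

```python
def phyletic_pattern(family_members, species_order):
    """Presence/absence string of a family across species.

    family_members: iterable of (species_code, gene_id) pairs.
    species_order: list of species codes fixing the column order.
    """
    present = {species for species, _ in family_members}
    return "".join("1" if sp in present else "0" for sp in species_order)


def color_pathway(pathway_enzymes, gene_to_family, families, species):
    """List candidate orthologs in `species` for each enzyme of a pathway.

    pathway_enzymes: dict enzyme label -> list of S.cerevisiae gene ids.
    gene_to_family: dict S.cerevisiae gene id -> family id.
    families: dict family id -> list of (species_code, gene_id) pairs.
    An empty list means no family member was found in that species.
    """
    colored = {}
    for enzyme, reference_genes in pathway_enzymes.items():
        orthologs = set()
        for gene in reference_genes:
            family_id = gene_to_family.get(gene)
            if family_id is None:
                continue  # reference gene not assigned to any family
            orthologs.update(
                gene_id for sp, gene_id in families[family_id] if sp == species
            )
        colored[enzyme] = sorted(orthologs)
    return colored


# Hypothetical data for two families and a two-enzyme pathway.
families = {
    "FAM1": [("SACE", "sc_gene_1"), ("CAGL", "cg_gene_1"), ("YALI", "yl_gene_1")],
    "FAM2": [("SACE", "sc_gene_2"), ("CAGL", "cg_gene_2")],
}
gene_to_family = {"sc_gene_1": "FAM1", "sc_gene_2": "FAM2"}
pathway = {"enzyme_A": ["sc_gene_1"], "enzyme_B": ["sc_gene_2"]}

pattern = phyletic_pattern(families["FAM2"], ["SACE", "CAGL", "YALI"])
colored = color_pathway(pathway, gene_to_family, families, "YALI")
print(pattern)   # 110
print(colored)   # {'enzyme_A': ['yl_gene_1'], 'enzyme_B': []}
```

In this toy case the empty entry for enzyme_B in YALI is the kind of observation that would prompt a hypothesis about pathway loss or about an unrelated gene taking over the enzymatic function.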

ONGOING DEVELOPMENTS

The Consortium is currently sequencing other yeast genomes from the Hemiascomycete class, which will benefit from the same annotation pipeline. These genomes will be particularly helpful in refining Génolevures protein families, and in ongoing work on the construction of comparative views of cell function through inference of networks of protein–protein and protein–ligand interactions. Consortium member laboratories will continue to contribute results from a variety of focused studies, e.g. (22–24).

TECHNICAL NOTES

The Génolevures database uses a straightforward object model mapped to a relational database. Flexibility in the design is guaranteed through the use of controlled vocabularies: the Sequence Ontology (25) for DNA sequence features and GLO, our own ontology for comparative genomics (D. Sherman, unpublished data). Browsing of genomic maps and sequence features is provided by the Generic Genome Browser (18). The Blast service is provided by NCBI Blast 2.2.6 (16). The Génolevures web site uses a REST architecture internally (26) and extensively uses the BioPerl package (27) for manipulation of sequence data.
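To make the idea of an object model mapped to a relational database with a controlled vocabulary more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema, table names and identifiers are hypothetical and are not taken from the Génolevures database; the only points illustrated are that each sequence element carries a type drawn from a vocabulary (here the Sequence Ontology terms "gene" and "tRNA") and that family membership is a relation between elements and families.

```python
import sqlite3

# Hypothetical schema: elements typed by a Sequence Ontology term,
# protein families, and a membership relation between the two.
SCHEMA = """
CREATE TABLE element (
    id      TEXT PRIMARY KEY,
    species TEXT NOT NULL,
    so_type TEXT NOT NULL
);
CREATE TABLE family (
    id TEXT PRIMARY KEY
);
CREATE TABLE membership (
    element_id TEXT NOT NULL REFERENCES element(id),
    family_id  TEXT NOT NULL REFERENCES family(id),
    PRIMARY KEY (element_id, family_id)
);
"""


def build_demo_db():
    """Create an in-memory database populated with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO element VALUES (?, ?, ?)",
        [
            ("cg_gene_1", "CAGL", "gene"),
            ("yl_gene_1", "YALI", "gene"),
            ("yl_trna_1", "YALI", "tRNA"),
        ],
    )
    conn.execute("INSERT INTO family VALUES (?)", ("FAM1",))
    conn.executemany(
        "INSERT INTO membership VALUES (?, ?)",
        [("cg_gene_1", "FAM1"), ("yl_gene_1", "FAM1")],
    )
    conn.commit()
    return conn


def family_members(conn, family_id):
    """Return (element id, species) rows for the members of one family."""
    rows = conn.execute(
        "SELECT e.id, e.species FROM element e "
        "JOIN membership m ON m.element_id = e.id "
        "WHERE m.family_id = ? ORDER BY e.id",
        (family_id,),
    )
    return rows.fetchall()


conn = build_demo_db()
members = family_members(conn, "FAM1")
print(members)  # [('cg_gene_1', 'CAGL'), ('yl_gene_1', 'YALI')]
```

Keeping the element type as a vocabulary term rather than as one table per feature kind is what allows new feature types to be added without changing the schema.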

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

25 in total

1. KEGG: kyoto encyclopedia of genes and genomes.

Authors: M Kanehisa; S Goto
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. T-Coffee: A novel method for fast and accurate multiple sequence alignment.

Authors: C Notredame; D G Higgins; J Heringa
Journal: J Mol Biol Date: 2000-09-08 Impact factor: 5.469

3. The generic genome browser: a building block for a model organism system database.

Authors: Lincoln D Stein; Christopher Mungall; ShengQiang Shu; Michael Caudy; Marco Mangone; Allen Day; Elizabeth Nickerson; Jason E Stajich; Todd W Harris; Adrian Arva; Suzanna Lewis
Journal: Genome Res Date: 2002-10 Impact factor: 9.043

4. The Bioperl toolkit: Perl modules for the life sciences.

Authors: Jason E Stajich; David Block; Kris Boulez; Steven E Brenner; Stephen A Chervitz; Chris Dagdigian; Georg Fuellen; James G R Gilbert; Ian Korf; Hilmar Lapp; Heikki Lehväslaiho; Chad Matsalla; Chris J Mungall; Brian I Osborne; Matthew R Pocock; Peter Schattner; Martin Senger; Lincoln D Stein; Elia Stupka; Mark D Wilkinson; Ewan Birney
Journal: Genome Res Date: 2002-10 Impact factor: 9.043

5. Emergence of species-specific transporters during evolution of the hemiascomycete phylum.

Authors: Benoît De Hertogh; Frédéric Hancy; André Goffeau; Philippe V Baret
Journal: Genetics Date: 2005-08-22 Impact factor: 4.562

6. MIPS: a database for genomes and protein sequences.

Authors: H W Mewes; D Frishman; U Güldener; G Mannhaupt; K Mayer; M Mokrejs; B Morgenstern; M Münsterkötter; S Rudd; B Weil
Journal: Nucleic Acids Res Date: 2002-01-01 Impact factor: 16.971

7. Genomic exploration of the hemiascomycetous yeasts: 18. Comparative analysis of chromosome maps and synteny with Saccharomyces cerevisiae.

Authors: B Llorente; A Malpertuy; C Neuvéglise; J de Montigny; M Aigle; F Artiguenave; G Blandin; M Bolotin-Fukuhara; E Bon; P Brottier; S Casaregola; P Durrens; C Gaillardin; A Lépingle; O Ozier-Kalogéropoulos; S Potier; W Saurin; F Tekaia; C Toffano-Nioche; M Wésolowski-Louvel; P Wincker; J Weissenbach; J Souciet; B Dujon
Journal: FEBS Lett Date: 2000-12-22 Impact factor: 4.124

8. Genomic exploration of the hemiascomycetous yeasts: 21. Comparative functional classification of genes.

Authors: C Gaillardin; G Duchateau-Nguyen; F Tekaia; B Llorente; S Casaregola; C Toffano-Nioche; M Aigle; F Artiguenave; G Blandin; M Bolotin-Fukuhara; E Bon; P Brottier; J de Montigny; B Dujon; P Durrens; A Lépingle; A Malpertuy; C Neuvéglise; O Ozier-Kalogéropoulos; S Potier; W Saurin; M Termier; M Wésolowski-Louvel; P Wincker; J Souciet; J Weissenbach
Journal: FEBS Lett Date: 2000-12-22 Impact factor: 4.124

9. Finding functional features in Saccharomyces genomes by phylogenetic footprinting.

Authors: Paul Cliften; Priya Sudarsanam; Ashwin Desikan; Lucinda Fulton; Bob Fulton; John Majors; Robert Waterston; Barak A Cohen; Mark Johnston
Journal: Science Date: 2003-05-29 Impact factor: 47.728

10. Sequencing and comparison of yeast species to identify genes and regulatory elements.

Authors: Manolis Kellis; Nick Patterson; Matthew Endrizzi; Bruce Birren; Eric S Lander
Journal: Nature Date: 2003-05-15 Impact factor: 49.962
