Burkhard R Braun, Marco van Het Hoog, Christophe d'Enfert, Mikhail Martchenko, Jan Dungan, Alan Kuo, Diane O Inglis, M Andrew Uhl, Hervé Hogues, Matthew Berriman, Michael Lorenz, Anastasia Levitin, Ursula Oberholzer, Catherine Bachewich, Doreen Harcus, Anne Marcil, Daniel Dignard, Tatiana Iouk, Rosa Zito, Lionel Frangeul, Fredj Tekaia, Kim Rutherford, Edwin Wang, Carol A Munro, Steve Bates, Neil A Gow, Lois L Hoyer, Gerwald Köhler, Joachim Morschhäuser, George Newport, Sadri Znaidi, Martine Raymond, Bernard Turcotte, Gavin Sherlock, Maria Costanzo, Jan Ihmels, Judith Berman, Dominique Sanglard, Nina Agabian, Aaron P Mitchell, Alexander D Johnson, Malcolm Whiteway, André Nantel.
Abstract
Recent sequencing and assembly of the genome for the fungal pathogen Candida albicans used simple automated procedures for the identification of putative genes. We have reviewed the entire assembly, both by hand and with additional bioinformatic resources, to accurately map and describe 6,354 genes and to identify 246 genes whose original database entries contained sequencing errors (or possibly mutations) that affect their reading frame. Comparison with other fungal genomes permitted the identification of numerous fungus-specific genes that might be targeted for antifungal therapy. We also observed that, compared to other fungi, the protein-coding sequences in the C. albicans genome are especially rich in short sequence repeats. Finally, our improved annotation permitted a detailed analysis of several multigene families, and comparative genomic studies showed that C. albicans has a far greater catabolic range, encoding respiratory Complex 1, several novel oxidoreductases and ketone body degrading enzymes, malonyl-CoA and enoyl-CoA carriers, several novel amino acid degrading enzymes, a variety of secreted catabolic lipases and proteases, and numerous transporters to assimilate the resulting nutrients. The results of these efforts will ensure that the Candida research community has uniform and comprehensive genomic information for medical research as well as for future diagnostic and therapeutic applications.
Year: 2005 PMID: 16103911 PMCID: PMC1183520 DOI: 10.1371/journal.pgen.0010001
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. Visualization of Protein Sequence Similarities
Sample from a Web page used by annotators of the C. albicans genome to visualize the significance of the best hit from whole-proteome BLASTP searches. Each putative ORF was compared to the NR database, the Candida ORF list itself (Ca19; showing the top four hits), and amino acid sequences from the proteomes of S. cerevisiae (Sac), S. pombe (S.p), M. grisea (Mag), N. crassa (Neu), H. sapiens (H.S), M. musculus (M.m), D. melanogaster (Dro), C. elegans (C.e), and A. thaliana (A.t). The BLASTP e-value of the top hit was converted to the color scale shown. Examples of C. albicans genes with notable similarity patterns are labeled.
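The color coding in Figure 1 amounts to binning each top-hit e-value. A minimal sketch follows, assuming a -log10 scale; the bin thresholds and color names are hypothetical (the actual scale appears only in the figure), and `evalue_to_color` and `similarity_row` are illustrative names, not part of the annotators' tools.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical scale: (minimum -log10(e-value), color). The published figure's
# actual thresholds and colors are not reproduced here.
COLOR_BINS = [
    (50.0, "red"),
    (20.0, "orange"),
    (10.0, "yellow"),
    (5.0, "green"),
    (3.0, "blue"),
]


def evalue_to_color(evalue: Optional[float]) -> str:
    """Map the e-value of the top BLASTP hit to a color label."""
    if evalue is None:
        return "white"  # no hit against this database
    # BLAST can report 0.0 for extremely strong hits; treat that as the top bin.
    score = math.inf if evalue <= 0.0 else -math.log10(evalue)
    for threshold, color in COLOR_BINS:
        if score >= threshold:
            return color
    return "white"  # hit too weak to count as significant on this scale


def similarity_row(top_hits: Dict[str, float], databases: Sequence[str]) -> List[str]:
    """One row of the display: a color per database, in a fixed column order."""
    return [evalue_to_color(top_hits.get(db)) for db in databases]


row = similarity_row({"Sac": 1e-60, "H.S": 1e-4}, ["Sac", "S.p", "H.S"])
print(row)  # ['red', 'white', 'blue']
```

Binning on -log10 rather than on raw e-values keeps the bins evenly spaced across the many orders of magnitude that BLAST e-values span.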
Features of Completed Fungal Genomes
ᵃ Number of base pairs in genome divided by number of genes.
ᵇ Number and proportion of proteins with no significant similarity to known proteins.
nd, not determined.
Statistics of the C. albicans Annotation
ᵃ Excluding “unknown.”
Number, Abundance Ranking, and Proportion of Gene Products Containing the Indicated InterPro Protein Domain in C. albicans and Other Eukaryotes
Numbers represent how many gene products contain the given domain; the abundance rank of each domain is given in parentheses. Percentages represent the proportion of gene products that contain at least one of the domains.
DOI: 10.1371/journal.pgen.0010001.t003
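The counts, ranks, and percentages in this table can be derived from per-protein domain assignments. A minimal sketch, assuming each gene product maps to the set of InterPro domain IDs it contains (so several copies of one domain in a product count once); giving tied domains a shared rank is an assumption about how the table handles ties, and the IDs `DOM_A` to `DOM_C` are placeholders, not real InterPro entries.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def domain_table(annotations: Dict[str, Set[str]]) -> List[Tuple[str, int, int, float]]:
    """Return (domain, product count, rank, percent of all products) rows.

    `annotations` maps each gene product to the set of domain IDs it contains,
    including products with no domain (empty set), which still count in the total.
    """
    total = len(annotations)
    counts: Counter = Counter()
    for domains in annotations.values():
        counts.update(domains)  # a set, so each product counts once per domain

    # Most abundant first; break count ties alphabetically for a stable order.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    rows = []
    rank, previous = 0, None
    for position, (domain, count) in enumerate(ordered, start=1):
        if count != previous:  # tied domains share the rank of the first tie
            rank, previous = position, count
        rows.append((domain, count, rank, 100.0 * count / total))
    return rows


example = {
    "orf1": {"DOM_A", "DOM_B"},
    "orf2": {"DOM_A"},
    "orf3": {"DOM_C"},
    "orf4": set(),
}
for row in domain_table(example):
    print(row)
# ('DOM_A', 2, 1, 50.0)
# ('DOM_B', 1, 2, 25.0)
# ('DOM_C', 1, 2, 25.0)
```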
Genes from C. albicans with a Strong Homolog in the S. cerevisiae, S. pombe, A. niger, M. grisea, and N. crassa Genomes but Absent from the H. sapiens and M. musculus Genomes
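The selection described by this table title is a presence/absence filter over per-species best hits. A minimal sketch, assuming the best BLASTP e-value of each C. albicans gene against each proteome is already available; the `strong` and `absent` cutoffs are hypothetical, since the title does not define a strong homolog, and the ORF names in the example are made up.

```python
from typing import Dict, List

FUNGI = ("S. cerevisiae", "S. pombe", "A. niger", "M. grisea", "N. crassa")
MAMMALS = ("H. sapiens", "M. musculus")


def fungus_specific(
    best_evalues: Dict[str, Dict[str, float]],
    strong: float = 1e-20,  # hypothetical cutoff for a "strong" homolog
    absent: float = 1e-3,   # hypothetical cutoff above which a hit counts as absent
) -> List[str]:
    """Genes with a strong hit in every listed fungus and no hit in either mammal.

    best_evalues[gene][species] is the best BLASTP e-value of the gene against
    that species' proteome; a missing species means no hit at all.
    """
    no_hit = float("inf")
    selected = []
    for gene, hits in best_evalues.items():
        in_all_fungi = all(hits.get(sp, no_hit) <= strong for sp in FUNGI)
        absent_in_mammals = all(hits.get(sp, no_hit) > absent for sp in MAMMALS)
        if in_all_fungi and absent_in_mammals:
            selected.append(gene)
    return sorted(selected)


fungal_hits = {sp: 1e-50 for sp in FUNGI}
data = {
    "orfW": {**fungal_hits, "M. musculus": 0.5},              # weak mammal hit only
    "orfX": dict(fungal_hits),                                 # fungi only
    "orfY": {**fungal_hits, "H. sapiens": 1e-30},              # has a human homolog
    "orfZ": {sp: 1e-50 for sp in FUNGI if sp != "A. niger"},   # missing A. niger
}
print(fungus_specific(data))  # ['orfW', 'orfX']
```

Using two separate cutoffs leaves a gap between "strong" and "absent", so borderline mammalian hits neither qualify a gene nor silently pass as absent unless they are above the weaker cutoff.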
Frequency and Characteristics of Short Tandem Repeats in the Coding Sequences of Fungal Genomes
ᵃ STRs with less than a 5% probability of occurring by chance.
Figure 2. Identification of Spurious Genes
Assessing criteria that identify candidate spurious genes in S. cerevisiae, using a reference set of known spurious genes [16].
(A) For every gene in S. cerevisiae, the average Pearson correlation coefficient with all other genes was calculated. Shown are histograms of the correlations associated with genes characterized as spurious in the reading frame conservation test ([16]; red) and all genes in the genome (black).
(B) The distribution of gene lengths is shown for genes characterized as spurious (red) and for all genes of the genome (black).
(C) Assessing the likelihood of being spurious as a function of gene length and correlation score. Shown is the proportion of spurious genes out of all genes whose length and correlation score fall into each of the intervals. The proportion is color-coded according to the color bar shown. S. cerevisiae genes with an ortholog in C. albicans were excluded from the analysis.
Genes Encoding Members of the ABC Transporter Family
ᵃ Subfamily nomenclature as proposed by Bauer et al. [42].
ᵇ Published names are underlined.
Assembly 19 ORFs That Correspond to ALS Genes
Phospholipases in C. albicans
ᵃ Published names are underlined.