Geraldine Butler, Matthew D Rasmussen, Michael F Lin, Manuel A S Santos, Sharadha Sakthikumar, Carol A Munro, Esther Rheinbay, Manfred Grabherr, Anja Forche, Jennifer L Reedy, Ino Agrafioti, Martha B Arnaud, Steven Bates, Alistair J P Brown, Sascha Brunke, Maria C Costanzo, David A Fitzpatrick, Piet W J de Groot, David Harris, Lois L Hoyer, Bernhard Hube, Frans M Klis, Chinnappa Kodira, Nicola Lennard, Mary E Logue, Ronny Martin, Aaron M Neiman, Elissavet Nikolaou, Michael A Quail, Janet Quinn, Maria C Santos, Florian F Schmitzberger, Gavin Sherlock, Prachi Shah, Kevin A T Silverstein, Marek S Skrzypek, David Soll, Rodney Staggs, Ian Stansfield, Michael P H Stumpf, Peter E Sudbery, Thyagarajan Srikantha, Qiandong Zeng, Judith Berman, Matthew Berriman, Joseph Heitman, Neil A R Gow, Michael C Lorenz, Bruce W Birren, Manolis Kellis, Christina A Cuomo.
Abstract
Candida species are the most common cause of opportunistic fungal infection worldwide. Here we report the genome sequences of six Candida species and compare these and related pathogens and non-pathogens. There are significant expansions of cell wall, secreted and transporter gene families in pathogenic species, suggesting adaptations associated with virulence. Large genomic tracts are homozygous in three diploid species, possibly resulting from recent recombination events. Surprisingly, key components of the mating and meiosis pathways are missing from several species. These include major differences at the mating-type loci (MTL); Lodderomyces elongisporus lacks MTL, and components of the a1/α2 cell identity determinant were lost in other species, raising questions about how mating and cell types are controlled. Analysis of the CUG leucine-to-serine genetic-code change reveals that 99% of ancestral CUG codons were erased and new ones arose elsewhere. Lastly, we revise the Candida albicans gene catalogue, identifying many new genes.
Year: 2009 PMID: 19465905 PMCID: PMC2834264 DOI: 10.1038/nature08064
Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 49.962
Figure 1. Phylogeny of sequenced Candida and Saccharomyces clade species. Tree topology and branch lengths were inferred with MrBayes (see Supplementary Methods S5). Posterior probabilities are indicated for each branch. The * marks a branch that was constrained based on syntenic conservation (ref. 40).
Candida genome features
| Species | Genome size (Mb) | %GC | Genes (#) | Gene size avg. (bp) | Intergenic avg. (bp) | Ploidy | Pathogen |
|---|---|---|---|---|---|---|---|
|  | 14.4 | 33.5% | 6,159 | 1,444 | 921 | diploid | ++ |
|  | 14.3 | 33.5% | 6,107 | 1,468 | 858 | diploid | ++ |
|  | 14.5 | 33.1% | 6,258 | 1,454 | 902 | diploid | ++ |
|  | 13.1 | 38.7% | 5,733 | 1,533 | 752 | diploid | ++ |
|  | 15.4 | 37.0% | 5,802 | 1,530 | 1,174 | diploid | − |
|  | 10.6 | 43.8% | 5,920 | 1,402 | 426 | haploid | + |
|  | 12.1 | 44.5% | 5,941 | 1,382 | 770 | haploid | + |
|  | 12.2 | 36.3% | 6,318 | 1,382 | 550 | haploid | − |
C. albicans SC5314 assembly 21 and gene set dated 28-Jan-2008 were downloaded from the Candida Genome Database (www.candidagenome.org); the D. hansenii assembly is from GenBank (ref. 9). The remaining assemblies are reported as part of this work, and are available in GenBank and at http://www.broad.mit.edu/annotation/genome/candida_group/MultiHome.html
Relative level of pathogen strength (++, strong pathogen; +, moderate pathogen; −, rare pathogen).
Figure 2. C. albicans WO-1 is highly homozygous. Red lines show SNPs per kb, normalized by coverage, within WO-1, and blue lines show SNPs per kb between WO-1 and SC5314. While both copies of chromosome 5 have rearranged at the MRS (yellow box) in WO-1, we show this as a single chromosome to allow a haploid reference for polymorphism. Relative to SC5314, chromosomes 1, 4, and 6 are in the opposite orientation (Supplementary Figure 6).
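The "SNPs per kb, normalized by coverage" quantity plotted in Figure 2 can be illustrated with a minimal sketch. The caption does not say how the normalization was done, so the version below assumes one plausible reading: SNP counts in fixed windows are divided by the number of adequately covered ("callable") bases in each window. The function name, the window size, and the `callable_bases` input are all hypothetical.

```python
def snps_per_kb(snp_positions, chrom_length, callable_bases, window=1000):
    """Windowed SNP density in SNPs per kb of callable sequence.

    snp_positions  : 0-based SNP coordinates on one chromosome
    chrom_length   : chromosome length in bp
    callable_bases : per-window count of bases with adequate read coverage
                     (ASSUMPTION: this is how coverage enters the normalization)
    window         : window size in bp (hypothetical default)
    """
    n_windows = (chrom_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for pos in snp_positions:
        counts[pos // window] += 1

    densities = []
    for count, covered in zip(counts, callable_bases):
        if covered > 0:
            densities.append(count * 1000 / covered)
        else:
            densities.append(float("nan"))  # no usable coverage in this window
    return densities
```

Under this reading, a long run of windows with density near zero within one strain would correspond to a homozygous tract of the kind the figure highlights.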
Figure 3. Evolutionary effects of CUG coding. A. Average percent of genes with 0, 1, or 2 CUG and CUA codons in Candida (grey bars) and Saccharomyces (black bars). Error bars indicate standard deviations. All differences are significant with p ≤ 0.0004 (t-test) except for genes with two CUA codons (p = 0.02). “Candida” and “Saccharomyces” here refer to the CTG and WGD clades in Figure 1, but including P. stipitis and excluding C. dubliniensis. B. CUN codon usage for all codon counts. C. Decoding rules for CUN codons in Saccharomyces and Candida.
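The per-gene tally behind panel A of Figure 3 can be sketched as follows. This is an illustrative reconstruction, not the authors' pipeline: it assumes each gene is supplied as an in-frame coding sequence (DNA or RNA), and both function names are hypothetical.

```python
from collections import Counter

def count_codons(cds, targets=("CTG", "CTA")):
    """Count in-frame occurrences of the target codons in one coding sequence."""
    cds = cds.upper().replace("U", "T")      # accept RNA (CUG) or DNA (CTG)
    usable = len(cds) - len(cds) % 3         # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    return {codon: counts[codon] for codon in targets}

def fraction_by_count(cds_list, codon, max_count=2):
    """Fraction of genes containing exactly 0, 1, ..., max_count copies of codon."""
    tally = Counter(count_codons(cds, (codon,))[codon] for cds in cds_list)
    return {k: tally[k] / len(cds_list) for k in range(max_count + 1)}

# Toy sequences, not real genes.
genes = ["ATGCTGTAA", "ATGCTACTGCTGTAA", "ATGAAATAA", "ATGACTGATAA"]
print(fraction_by_count(genes, "CTG"))  # {0: 0.5, 1: 0.25, 2: 0.25}
```

The last toy sequence contains CTG only out of frame, so it is counted as having zero CUG codons; reading strictly in frame is what distinguishes a codon count from a simple substring search.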
Gene families enriched in pathogenic Candida sp.
| # Annotation | Pathogen | Nonpathogen | Pval. | Dup. | Loss | Gene rate | C.alb | C.tro | C.par | L.elo | C.gui | C.lus | D.han | C.gla | Yeast (avg.) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 GPI-family 18 (Hyr/Iff-like) | 56 | 10 | 1.4E-16 | 52 | 11 | 16.2 | 11 | 18 | 17 | 9 | 3 | 7 | 1 | 0 | 0.0 |
| 2 Leucine-rich repeat (IFA/FGR38-like) | 34 | 0 | 4.2E-16 | 32 | 5 | 18.3 | 33 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
| 3 Ferric reductase family | 45 | 10 | 1.9E-12 | 30 | 25 | 2.5 | 12 | 19 | 7 | 7 | 3 | 4 | 2 | 0 | 0.1 |
| 4 Reductase family | 43 | 11 | 3.2E-11 | 31 | 30 | 2.3 | 7 | 12 | 9 | 6 | 13 | 2 | 4 | 0 | 0.1 |
| 5 GPI-family 17 (ALS-like Adhesins) | 31 | 5 | 4.4E-10 | 29 | 4 | 20.5 | 8 | 16 | 5 | 4 | 2 | 0 | 1 | 0 | 0.0 |
| 6 GPI-family 13 (Pga30-like) | 34 | 7 | 5.0E-10 | 25 | 5 | 14.8 | 12 | 14 | 6 | 6 | 1 | 1 | 1 | 0 | 0.0 |
| 7 Unclassified | 20 | 0 | 9.0E-10 | 13 | 3 | 15.9 | 9 | 9 | 0 | 0 | 2 | 0 | 0 | 0 | 0.0 |
| 8 Cell wall mannoprotein biosynthesis | 38 | 18 | 7.2E-07 | 19 | 34 | 2.1 | 8 | 7 | 8 | 8 | 11 | 4 | 9 | 0 | 0.1 |
| 9 Major facilitator transporters | 25 | 7 | 9.2E-07 | 14 | 17 | 2.0 | 3 | 3 | 7 | 3 | 10 | 2 | 4 | 0 | 0.0 |
| 10 Oligopeptide transporters | 31 | 13 | 2.2E-06 | 23 | 11 | 6.7 | 6 | 9 | 9 | 4 | 4 | 3 | 1 | 0 | 0.9 |
| 11 Unclassified | 25 | 9 | 6.3E-06 | 15 | 6 | 11.1 | 7 | 9 | 3 | 5 | 3 | 1 | 4 | 2 | 0.2 |
| 12 Amino acid permeases | 27 | 11 | 7.7E-06 | 11 | 18 | 1.7 | 6 | 6 | 6 | 4 | 6 | 3 | 6 | 0 | 0.1 |
| 13 Sphingomyelin phosphodiesterases | 18 | 5 | 3.2E-05 | 11 | 9 | 7.4 | 4 | 5 | 4 | 2 | 3 | 2 | 1 | 0 | 0.2 |
| 14 FGR6 family (filamentous growth) | 12 | 1 | 3.3E-05 | 7 | 1 | 14.5 | 8 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0.0 |
| 15 Secreted lipases | 20 | 7 | 4.6E-05 | 17 | 8 | 9.6 | 10 | 5 | 4 | 4 | 1 | 0 | 3 | 0 | 0.0 |
| 16 Cytochrome p450 family | 34 | 21 | 5.5E-05 | 23 | 22 | 6.0 | 6 | 8 | 10 | 7 | 5 | 4 | 6 | 1 | 1.0 |
| 17 Amino acid permeases | 16 | 4 | 5.6E-05 | 14 | 10 | 1.5 | 2 | 3 | 6 | 3 | 2 | 3 | 1 | 0 | 0.0 |
| 18 Zinc-finger transcription factors | 31 | 18 | 6.2E-05 | 17 | 14 | 12.3 | 5 | 8 | 7 | 7 | 7 | 4 | 11 | 0 | 0.0 |
| 19 Unclassified | 13 | 2 | 6.3E-05 | 8 | 0 | 8.1 | 3 | 1 | 6 | 1 | 2 | 1 | 1 | 0 | 0.0 |
| 20 Predicted transmembrane family | 17 | 5 | 7.2E-05 | 9 | 2 | 7.5 | 4 | 4 | 5 | 3 | 3 | 1 | 2 | 0 | 0.0 |
| 21 Unclassified secreted family | 20 | 8 | 1.1E-04 | 7 | 6 | 9.3 | 4 | 4 | 6 | 4 | 4 | 2 | 4 | 0 | 0.0 |
Pathogen genes = total genes in family for C. albicans, C. tropicalis, C. parapsilosis, C. guilliermondii, C. lusitaniae, and C. glabrata. Nonpathogen genes = total genes in family for L. elongisporus, D. hansenii, and all Saccharomyces clade species (Figure 1) except C. glabrata. Pval. = P value of the hypergeometric test; all families shown above have a false discovery rate less than 0.05 (Supplementary text S10c). Dup., Loss = duplications and losses (Supplementary Methods 10b). Gene rate = average mutation rate for each family (Supplementary text S10d); the average gene rate across all families is 5.8. Yeast (avg.) = average count for all Saccharomyces species.
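A hypergeometric enrichment test of the kind named in the table note can be sketched in a few lines. The exact setup is in the supplement and is not reproduced here, so this version assumes a one-sided test: given N genes overall, of which K belong to pathogens, it returns the probability that a family of n genes contains at least k pathogen genes by chance. The function name is hypothetical, and the two genome-wide totals in the usage lines are placeholder values, not figures from the paper.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n)."""
    upper = min(n, K)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)

# Row 1 of the table: 56 pathogen and 10 nonpathogen genes in the family.
# The two totals below are HYPOTHETICAL placeholders, not values from the paper.
total_pathogen_genes = 35_000
total_genes = 90_000
p = hypergeom_upper_tail(k=56, n=56 + 10, K=total_pathogen_genes, N=total_genes)
print(f"{p:.2e}")
```

Exact integer arithmetic via `math.comb` avoids the underflow that a naive floating-point product would hit for P values as small as those in the table. With many families tested, the resulting P values would still need a multiple-testing correction, which is what the false discovery rate threshold in the note addresses.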
Figure 4. Conserved domains of Als and Hyr/Iff cell wall families. The N-terminal Als domain (red) and Hyr/Iff domain (green) are shown as ovals. Intragenic tandem repeats (ITRs, see Supplementary text S11) are shown as rectangles, colored to represent similar amino acid sequences.
Figure 5. Organization of MTL loci in the Candida clade. A. MTLa-specific genes are shown in grey, MTLα-specific in black, and other orthologs in color. Two idiomorphs from C. albicans and C. tropicalis are shown. Arrows indicate inversions relative to C. albicans. “X” symbols show gene losses and the C. parapsilosis MTLa1 pseudogene. There is no MTL locus in L. elongisporus; gene order around the PAPa, OBPa and PIKa genes is shown. For C. guilliermondii and C. lusitaniae, the genome project sequenced one idiomorph; the second was obtained by J. Reedy. In D. hansenii, PAP, OBP and PIK are separate from the fused MTL locus (ref. 32). B. Placement of gene losses on the phylogenetic tree.