Diane O Inglis, Martha B Arnaud, Jonathan Binkley, Prachi Shah, Marek S Skrzypek, Farrell Wymore, Gail Binkley, Stuart R Miyasato, Matt Simison, Gavin Sherlock.
Abstract
The Candida Genome Database (CGD, http://www.candidagenome.org/) is an internet-based resource that provides centralized access to genomic sequence data and manually curated functional information about genes and proteins of the fungal pathogen Candida albicans and other Candida species. As the scope of Candida research, and the number of sequenced strains and related species, has grown in recent years, the need for expanded genomic resources has also grown. To answer this need, CGD has expanded beyond storing data solely for C. albicans, now integrating data from multiple species. Herein we describe the incorporation of this multispecies information, which includes curated gene information and the reference sequence for C. glabrata, as well as orthology relationships that interconnect Locus Summary pages, allowing easy navigation between genes of C. albicans and C. glabrata. These orthology relationships are also used to predict GO annotations of their products. We have also added protein information pages that display domains, structural information and physicochemical properties; bibliographic pages highlighting important topic areas in Candida biology; and a laboratory strain lineage page that describes the lineage of commonly used laboratory strains. All of these data are freely available at http://www.candidagenome.org/. We welcome feedback from the research community at candida-curator@lists.stanford.edu.
Year: 2011 PMID: 22064862 PMCID: PMC3245171 DOI: 10.1093/nar/gkr945
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
CGD curation statistics
| | Candida albicans | Candida glabrata |
|---|---|---|
| Number of ORFs | 6108 | 5212 |
| Number of tRNAs | 156 | 230 |
| Verified ORFs | 1403 | 178 |
| Uncharacterized ORFs | 4705 | 5034 |
| Dubious ORFs | 152 | N/A |
| Manual GO annotations | 4697 | 4689 |
| Features with manual GO annotations | 13 707 | 2622 |
| Orthology-based GO annotations | 13 246 | 19 655 |
| Features with orthology-based GO annotations | 3099 | 4157 |
| Protein-domain (InterPro)-based GO annotations | 6048 | 5087 |
| Features with protein-domain (InterPro)-based GO annotations | 2963 | 2583 |
| Features with orthology-based description lines | 1352 | 3982 |
Figure 1. Updates to the CGD Locus Summary Page (LSP). The LSP is the hub around which the CGD gene information is organized. LSPs for both C. albicans and C. glabrata now feature new expanded orthology information sections, orthology-based description lines for uncharacterized genes, orthology-based GO term predictions and protein domain-based GO term predictions.
Figure 2. CGD genome snapshots. Pie chart from the CGD Genome Snapshots, comparing the current extent of the characterization of the predicted protein-coding genes in the C. albicans and C. glabrata genomes. ORFs are classified as ‘Verified’ if there is experimental evidence for a functional gene product. ‘Uncharacterized’ ORFs are predicted based on sequence analysis but currently lack experimental characterization. Candida albicans ORFs labeled as ‘Dubious’ have no experimental characterization and appear to be indistinguishable from random non-coding sequences (5).
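The proportions compared in the Genome Snapshots pie charts can be approximated from the ORF counts in the curation statistics table. The short Python sketch below is illustrative only and is not CGD code: the names `ORF_COUNTS` and `characterization_shares` are hypothetical, and it assumes that Dubious ORFs are counted in addition to the Verified and Uncharacterized ORFs (in the table, Verified plus Uncharacterized already equals the listed number of ORFs for both species).

```python
# ORF counts copied from the "CGD curation statistics" table.
# C. glabrata has no Dubious category (listed as N/A), so it is omitted.
ORF_COUNTS = {
    "Candida albicans": {"Verified": 1403, "Uncharacterized": 4705, "Dubious": 152},
    "Candida glabrata": {"Verified": 178, "Uncharacterized": 5034},
}


def characterization_shares(counts):
    """Return each category's share, in percent, of all ORFs listed in `counts`.

    Assumption: the categories are mutually exclusive, so the denominator
    is simply the sum of the category counts.
    """
    total = sum(counts.values())
    return {status: 100.0 * n / total for status, n in counts.items()}


if __name__ == "__main__":
    for species, counts in ORF_COUNTS.items():
        shares = characterization_shares(counts)
        summary = ", ".join(f"{status}: {pct:.1f}%" for status, pct in shares.items())
        print(f"{species}: {summary}")
```

If Dubious ORFs are instead a subset of one of the other categories, the `"Dubious"` entry should be dropped from the C. albicans counts before the shares are computed.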
Figure 3. Protein information page. The Protein Information page provides data including structural information inferred from homologs in PDB (RCSB Protein Data Bank), an interactive domains/motifs browser, protein sequence and physicochemical property details, BLASTP search against other CGD sequences and links to external protein resources such as UniProt.