C d'Enfert, S Goyard, S Rodriguez-Arnaveilhe, L Frangeul, L Jones, F Tekaia, O Bader, Antje Albrecht, L Castillo, A Dominguez, J F Ernst, C Fradin, C Gaillardin, S Garcia-Sanchez, P de Groot, B Hube, F M Klis, S Krishnamurthy, D Kunze, M-C Lopez, A Mavor, N Martin, I Moszer, D Onésime, J Perez Martin, R Sentandreu, E Valentin, A J P Brown.
Abstract
CandidaDB is a database dedicated to the genome of the most prevalent systemic fungal pathogen of humans, Candida albicans. CandidaDB is based on an annotation of the Stanford Genome Technology Center C.albicans genome sequence data by the European Galar Fungail Consortium. CandidaDB Release 2.0 (June 2004) contains information pertaining to Assembly 19 of the genome of C.albicans strain SC5314. The current release contains 6244 annotated entries corresponding to 130 tRNA genes and 5917 protein-coding genes. For these, it provides tentative functional assignments along with numerous pre-run analyses that can assist the researcher in the evaluation of gene function for the purpose of specific or large-scale analysis. CandidaDB is based on GenoList, a generic relational data schema and a World Wide Web interface that has been adapted to the handling of eukaryotic genomes. The interface allows users to browse easily through genome data and retrieve information. CandidaDB also provides more elaborate tools, such as pattern searching, that are tightly connected to the overall browsing system. As the C.albicans genome is diploid and still incompletely assembled, CandidaDB provides tools to browse the genome by individual supercontigs and to examine information about allelic sequences obtained from complementary contigs. CandidaDB is accessible at http://genolist.pasteur.fr/CandidaDB.
Year: 2005 PMID: 15608215 PMCID: PMC540078 DOI: 10.1093/nar/gki124
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Summary of the protein features in CandidaDB
| Characteristics of proteins | Number of entries (%) |
|---|---|
| Total | 6114 (100) |
| With an allelic counterpart | 5315 (86.9) |
| With PFAM match(es) | 3712 (60.7) |
| Within a partition<sup>a</sup> | 2103 (34.4) |
| With a tentative signal peptide<sup>b</sup> | 694 (11.4) |
| With at least one membrane-spanning domain<sup>c</sup> | 1238 (20.3) |
| With a homologue<sup>d</sup> | 5611 (91.8) |
| With a homologue in all three phylogenetic domains | 1781 (29.1) |
| With a homologue only in the eukaryotic domain | 2606 (42.6) |
| With a homologue only in Ascomycetes | 1712 (28.0) |
| Unique to C.albicans | 608 (9.9) |
| With a direct orthologue<sup>a</sup> in | 3809 (62.3) |
<sup>a</sup> See Source Data and Methods for definition.
<sup>b</sup> As defined using SignalP (16).
<sup>c</sup> As defined using TMHMM 2.0 (15).
<sup>d</sup> C.albicans proteins were compared with 540 687 proteins identified within the proteomes of fully sequenced eukaryotic (n = 27), bacterial (n = 53) and archaeal (n = 19) species, and with UniProt (31) after removal of those 540 687 sequences. Homology was deemed significant when the BLASTP E-value was <10<sup>−10</sup> for eukaryotic proteins and <10<sup>−4</sup> for most bacterial or archaeal proteins.
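
The significance rule in footnote d can be illustrated with a short Python sketch. It is not the authors' pipeline: the `Hit` record, the domain labels and the function names are hypothetical, and the footnote's qualifier "most" is ignored, so the 10⁻⁴ cutoff is applied to every bacterial and archaeal hit as a simplification.

```python
from dataclasses import dataclass

# E-value cutoffs quoted in footnote d. The dictionary keys are
# hypothetical labels chosen for this sketch.
EVALUE_CUTOFFS = {
    "eukaryote": 1e-10,
    "bacteria": 1e-4,
    "archaea": 1e-4,
}


@dataclass(frozen=True)
class Hit:
    """One BLASTP hit (hypothetical record layout)."""
    query: str           # C.albicans protein identifier
    subject_domain: str  # phylogenetic domain of the matched protein
    evalue: float        # BLASTP E-value


def is_significant(hit):
    """True when the E-value is strictly below the cutoff for the subject's domain."""
    return hit.evalue < EVALUE_CUTOFFS[hit.subject_domain]


def domains_with_homologue(hits):
    """Map each query protein to the set of domains where it has a significant hit."""
    found = {}
    for hit in hits:
        if is_significant(hit):
            found.setdefault(hit.query, set()).add(hit.subject_domain)
    return found
```

A protein absent from the returned mapping has no significant hit in any domain under these cutoffs; a protein mapped to all three labels would count towards the "all three phylogenetic domains" row.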
Figure 1. Overall genome redundancy as deduced from partition analysis. The number of partitions of each size is shown. Partitions were established as described in the Source Data and Methods section. Proteins within a partition have at least one reciprocal BLASTP link with another protein in the partition with an E-value
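
The partition rule in the Figure 1 caption (every member has at least one reciprocal BLASTP link to another member) corresponds to single-linkage grouping, which can be sketched in Python with a union-find structure. This is an illustrative sketch and not the procedure from the Source Data and Methods section: the function name, the input layout and the `max_evalue` parameter are assumptions, and the E-value cutoff used by the authors is not given in this excerpt.

```python
from collections import defaultdict


def build_partitions(hits, max_evalue):
    """Group proteins connected by reciprocal BLASTP links.

    hits: iterable of (query, subject, evalue) tuples (hypothetical layout).
    max_evalue: cutoff supplied by the caller; the source's value is not shown here.
    Returns a list of sorted member lists, largest partition first.
    """
    # Directed links that pass the cutoff, ignoring self-hits.
    passing = {(q, s) for q, s, e in hits if q != s and e < max_evalue}
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for q, s in passing:
        # A link is reciprocal when the reverse hit also passes the cutoff.
        if (s, q) in passing:
            root_q, root_s = find(q), find(s)
            if root_q != root_s:
                parent[root_q] = root_s

    groups = defaultdict(set)
    for protein in list(parent):
        groups[find(protein)].add(protein)
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))
```

Proteins without any reciprocal link never enter the structure, so every returned partition has at least two members. Counting partitions by size, as plotted in Figure 1, is then `collections.Counter(len(p) for p in partitions)`.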