Abstract
We present 'dcGO' (http://supfam.org/SUPERFAMILY/dcGO), a comprehensive ontology database for protein domains. Domains are often the functional units of proteins, so it sometimes makes more sense to associate ontological terms with individual domains rather than only with full-length proteins. Domain-centric GO, 'dcGO', provides associations between ontological terms and protein domains at the superfamily and family levels. Some functional units consist of more than one domain acting together or acting at an interface between domains; therefore, ontological terms associated with pairs of domains, triplets and longer supra-domains are also provided. At the time of writing, the ontologies in dcGO include the Gene Ontology (GO); Enzyme Commission (EC) numbers; pathways from UniPathway; the human phenotype ontology and phenotype ontologies from five model organisms, including plants; anatomy ontologies from three organisms; the human disease ontology; and drugs from DrugBank. All ontological terms have probabilistic scores for their associations. In addition to associations with domains and supra-domains, the ontological terms have been transferred to proteins through homology, providing annotations of >80 million sequences covering 2414 complete genomes, hundreds of meta-genomes, thousands of viruses and so forth. The dcGO database is updated fortnightly, and its website provides downloads, search, browse, phylogenetic context and other data-mining facilities.
Year: 2012 PMID: 23161684 PMCID: PMC3531119 DOI: 10.1093/nar/gks1080
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
A summary of the dcGO database contents (on 15 August 2012)
| Ontology | Domains (superfamily level): number of terms (a) | Domains (superfamily level): number of domains (b) | Domains (superfamily level): number of annotations (c) | Domains (family level): number of terms (a) | Domains (family level): number of domains (b) | Domains (family level): number of annotations (c) | Supra-domains (superfamily level): number of terms (a) | Supra-domains (superfamily level): number of supra-domains (b) | Supra-domains (superfamily level): number of annotations (c) | Data source |
|---|---|---|---|---|---|---|---|---|---|---|
| Functions | ||||||||||
| Gene Ontology (GO), high-quality version (d) | 4761 | 481 | 27 833 | 4652 | 657 | 29 529 | | | | UniProtKB-GOA ( |
| Gene Ontology (GO), high-coverage version (e) | 10 497 | 1265 | 139 958 | 10 032 | 2026 | 157 618 | 13 306 | 7761 | 587 917 | UniProtKB-GOA ( |
| Diseases | ||||||||||
| Disease Ontology (DO) | 357 | 115 | 1345 | 364 | 145 | 1702 | 402 | 276 | 2765 | DO ( |
| Phenotypes | ||||||||||
| Human Phenotype (HP) | 670 | 147 | 2079 | 605 | 141 | 1930 | 750 | 289 | 4104 | HPO ( |
| Mammalian Phenotype (MP) | 1858 | 299 | 8555 | 2040 | 368 | 12 008 | 2202 | 844 | 23 149 | MGI ( |
| Worm Phenotype (WP) | 556 | 296 | 4349 | 540 | 320 | 4572 | 571 | 507 | 6976 | WormBase ( |
| Yeast Phenotype (YP) | 76 | 271 | 1070 | 72 | 256 | 1039 | 79 | 392 | 1529 | SGD ( |
| Fly Phenotype (FP) | 64 | 140 | 268 | 62 | 167 | 314 | 69 | 283 | 557 | FlyBase ( |
| Fly Anatomy (FA) | 502 | 191 | 3210 | 555 | 210 | 4183 | 551 | 349 | 8151 | FlyBase ( |
| Zebrafish Anatomy (ZA) | 158 | 66 | 694 | 164 | 57 | 701 | 173 | 121 | 1316 | ZFIN ( |
| Xenopus Anatomy (XA) | 243 | 474 | 8376 | 245 | 583 | 11 187 | 253 | 875 | 17 730 | Xenbase ( |
| Arabidopsis Plant (AP) | 259 | 579 | 20 689 | 253 | 778 | 32 405 | 266 | 1093 | 45 311 | TAIR ( |
| Others | ||||||||||
| Enzyme Commission (EC) | 1918 | 830 | 8483 | 1973 | 1565 | 10 278 | 1947 | 2958 | 21 028 | IntEnz ( |
| DrugBank ATC_code (DB) | 964 | 143 | 2801 | 904 | 145 | 2659 | 984 | 230 | 3950 | DrugBank ( |
| UniProtKB KeyWords (KW) | 857 | 1573 | 19 312 | 840 | 2798 | 25 841 | 866 | 5579 | 84 815 | UniProt ( |
| UniPathway (UP) | 664 | 474 | 6332 | 626 | 796 | 7395 | 665 | 1356 | 12 918 | UniPathway ( |
(a) The total number of ontology terms used to annotate.
(b) The number of annotatable domains (or supra-domains).
(c) The number of domain-centric annotations.
(d) This version is truly domain-centric, supported both by single-domain proteins and all proteins (including multi-domain proteins).
(e) This version is only supported by all proteins, suitable for large-scale studies.
Figure 1. The dcGO website uses the ‘Faceted Search’ interface as a hub for mining the resource. By searching for keywords of interest, the user can access the resource in an organized manner and can link to additional analysis tools.
Figure 2. Using ‘PSnet’ to cross-link phenotypes and other ontologies based on shared domain-centric annotations. (A) A list of superfamilies and families annotated by the disease term ‘immune system cancer’. (B) The most strongly correlated ontological terms returned for the disease term in this query.
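The idea of cross-linking terms through shared domain-centric annotations can be illustrated with a minimal sketch. The caption does not state which correlation measure PSnet uses, so the Jaccard index below is only a stand-in, and the term names, domain identifiers and the function `rank_related_terms` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two domain sets (stand-in measure, not PSnet's own)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_related_terms(
    query: str,
    term_to_domains: Dict[str, Set[str]],
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Rank other terms by how many annotated domains they share with the query."""
    query_domains = term_to_domains[query]
    scored = [
        (term, jaccard(query_domains, domains))
        for term, domains in term_to_domains.items()
        if term != query
    ]
    # Keep only terms that share at least one domain with the query.
    scored = [(term, score) for term, score in scored if score > 0]
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Toy annotations: every identifier here is made up for illustration.
TERM_TO_DOMAINS: Dict[str, Set[str]] = {
    "DO:immune system cancer": {"sf_A", "sf_B", "sf_C"},
    "HP:toy phenotype": {"sf_A", "sf_B"},
    "MP:toy phenotype": {"sf_C", "sf_D"},
    "EC:toy enzyme": {"sf_E"},
}

if __name__ == "__main__":
    for term, score in rank_related_terms("DO:immune system cancer", TERM_TO_DOMAINS):
        print(f"{term}\t{score:.2f}")
```

In this toy data the enzyme term shares no domain with the query and is therefore dropped from the ranking.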
Figure 3. Converting genome sequences to knowledge about function, phenotype and disease using the ‘dcGO Predictor’. (A) A batch query facility allows the user to upload up to 1000 sequences for prediction of function, disease, phenotype and other information, such as enzyme classification, drugs and pathways. (B) The result page provides a summary of the prediction content. Switching to another ontology instantly produces new predictions. In addition to downloading the results, the user can explore predictions for each of the input sequences, such as Q01826 (human SATB1 protein; see C). (C) The domain architecture of the human SATB1 protein is graphically displayed using the SCOP domains at the superfamily level, whereas the bottom panel shows the predicted Disease Ontology terms.
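A rough sketch of how domain-centric annotations might be transferred to a protein from its domain architecture is given below. It is not the dcGO Predictor's actual implementation: treating supra-domains as contiguous runs of domains, keeping the highest score per term, and all identifiers and scores are assumptions made for illustration. Only the 1000-sequence batch limit comes from the caption.

```python
from typing import Dict, List, Tuple

MAX_BATCH = 1000  # batch limit stated in the figure caption

Unit = Tuple[str, ...]                      # a domain or a supra-domain
Annotations = Dict[Unit, Dict[str, float]]  # unit -> {term: score}


def contiguous_units(architecture: List[str]) -> List[Unit]:
    """All contiguous runs of domains: singles, pairs, triplets and longer.

    Assumption: a supra-domain is modelled as a contiguous run of domains.
    """
    n = len(architecture)
    return [tuple(architecture[i:j]) for i in range(n) for j in range(i + 1, n + 1)]


def predict_terms(architecture: List[str], annotations: Annotations) -> Dict[str, float]:
    """Collect terms from every annotated unit found in the architecture.

    Assumption: when several units carry the same term, keep the highest score.
    """
    best: Dict[str, float] = {}
    for unit in contiguous_units(architecture):
        for term, score in annotations.get(unit, {}).items():
            if score > best.get(term, float("-inf")):
                best[term] = score
    return best


def predict_batch(
    batch: Dict[str, List[str]], annotations: Annotations
) -> Dict[str, Dict[str, float]]:
    """Predict terms for a batch of proteins keyed by sequence identifier."""
    if len(batch) > MAX_BATCH:
        raise ValueError(f"batch of {len(batch)} exceeds the limit of {MAX_BATCH}")
    return {seq_id: predict_terms(arch, annotations) for seq_id, arch in batch.items()}


# Hypothetical annotations and architecture, invented for this example.
TOY_ANNOTATIONS: Annotations = {
    ("sf_X",): {"DO:toy disease": 0.6},
    ("sf_X", "sf_Y"): {"DO:toy disease": 0.9, "GO:toy function": 0.7},
}

if __name__ == "__main__":
    print(predict_batch({"toy_protein": ["sf_X", "sf_Y"]}, TOY_ANNOTATIONS))
```

Here the pair `("sf_X", "sf_Y")` contributes a term that neither domain carries alone, mirroring the motivation for supra-domain annotations given in the abstract.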