David A de Lima Morais, Hai Fang, Owen J L Rackham, Derek Wilson, Ralph Pethica, Cyrus Chothia, Julian Gough.
Abstract
The SUPERFAMILY resource provides protein domain assignments at the Structural Classification of Proteins (SCOP) superfamily level for over 1400 completely sequenced genomes, over 120 metagenomes and other gene collections such as UniProt. All models and assignments are available to browse and download at http://supfam.org. A new hidden Markov model library based on SCOP 1.75 has been created, and a previously ignored class of SCOP, coiled coils, is now included. Our scoring component now uses HMMER3, which is orders of magnitude faster and produces superior results. A cloud-based pipeline was implemented and is publicly available on the Amazon Web Services Elastic Compute Cloud. The SUPERFAMILY reference tree of life has been improved, allowing the user to highlight a chosen superfamily, family or domain architecture on the tree of life. The most significant advance in SUPERFAMILY is that it now contains a domain-based Gene Ontology (GO) at the superfamily and family levels. A new methodology was developed to ensure high-quality GO annotation. The new methodology is general purpose and has been used to produce domain-based phenotypic ontologies in addition to GO.
Year: 2010 PMID: 21062816 PMCID: PMC3013712 DOI: 10.1093/nar/gkq1130
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Presence/absence of the fibronectin type III superfamily in selected genomes, shown by automatic highlighting in green of the branches of the phylogenetic tree that contain the superfamily.
SUPERFAMILY 1.75
| Release date | September 2010 |
|---|---|
| Number of HMM models | 15 438 |
| Number of completely sequenced genomes, strains and collections | 1628 |
| Eukaryotes | 341 |
| Archaebacterial | 87 |
| Eubacterial | 1077 |
| Metagenomes | 118 |
| Plasmids | 2354 |
SUPERFAMILY 1.75 statistics
| | Proteins with assignments (%) | Amino acid coverage (%) |
|---|---|---|
| Eukaryotes | 59.11 | 38.9 |
| Archaebacteria | 65.13 | 61.67 |
| Eubacteria | 68.08 | 63.4 |
| UniProt | 64 | 56 |
| Metagenomes | 51.47 | 54.1 |
| Plasmids | 47 | 47 |
Figure 2. Functional and phenotypic annotations of structural domains at the SCOP superfamily (SF) and family (FA) levels. (A) Flowchart for inferring domain-centric GOAs from the UniProtKB-GOA database and the domain assignments in the SUPERFAMILY database. (B) Illustration of the procedure used to create SDFO, based on information-theoretic analysis of Domain2 GOA profiles. (C) Area-proportional Venn diagram showing the differences and intersections among domains annotated to the GO term 'DNA binding' [GO:0003677] using all UniProt sequences (90, circled in green), domains annotated to the term using only singleton-domain UniProt sequences (20, circled in blue), and domains in DBD that can be found in at least one UniProt sequence annotated to the term (24, circled in red). (D) Venn diagram showing the differences and intersections among domains annotated to the GO term 'transcription regulator activity' [GO:0030528] using all UniProt sequences, domains annotated using only singleton-domain UniProt sequences, and domains in DBD that can be found in at least one UniProt sequence annotated to the term. (E) The total number (shown in parentheses) of domains annotated to each ontology. GO covers three biological concepts: BP, Biological Process; MF, Molecular Function; CC, Cellular Component. Results are based on Domain2 GOAs supported both by singleton-domain UniProt sequences and by all UniProt sequences. MPO describes mammalian phenotypes (MP) of mice carrying a specific genetic mutation. HPO has three sub-ontologies: IN, inheritance; ON, onset and clinical course; OA, organ abnormality.
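The distinction drawn in panels A, C and E between annotations derived from all UniProt sequences and those derived only from singleton-domain sequences can be illustrated with a toy sketch. The caption does not spell out the inference rules, so the code below simply propagates a protein's GO terms to every domain it contains, which is an assumption and not the authors' method. The protein and domain identifiers (`P1`, `sf_a`, and so on) and both function names are hypothetical; only the two GO identifiers come from the caption.

```python
from collections import defaultdict


def domain_go_annotations(protein_domains, protein_go, singleton_only=False):
    """Propagate each protein's GO terms to the domains it contains.

    With singleton_only=True, only proteins holding exactly one domain
    contribute, so a term cannot be credited to the wrong domain of a
    multi-domain protein.
    """
    annotations = defaultdict(set)
    for protein, domains in protein_domains.items():
        if singleton_only and len(domains) != 1:
            continue
        for domain in set(domains):
            annotations[domain].update(protein_go.get(protein, ()))
    return dict(annotations)


def supported_by_both(all_ann, single_ann):
    """Keep only the (domain, GO term) pairs present in both annotation sets."""
    both = {}
    for domain, terms in all_ann.items():
        shared = terms & single_ann.get(domain, set())
        if shared:
            both[domain] = shared
    return both


# Hypothetical toy input: two single-domain proteins and one two-domain protein.
protein_domains = {"P1": ["sf_a"], "P2": ["sf_a", "sf_b"], "P3": ["sf_b"]}
protein_go = {
    "P1": {"GO:0003677"},
    "P2": {"GO:0003677", "GO:0030528"},
    "P3": {"GO:0030528"},
}

all_ann = domain_go_annotations(protein_domains, protein_go)
single_ann = domain_go_annotations(protein_domains, protein_go, singleton_only=True)
both = supported_by_both(all_ann, single_ann)

for domain in sorted(both):
    print(domain, sorted(both[domain]))
```

In this toy input the two-domain protein `P2` causes both terms to be attached to both domains when all sequences are used, while the singleton-only pass keeps one term per domain. That is the kind of over-assignment the singleton-domain evidence is meant to filter out.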