Derek Wilson, Ralph Pethica, Yiduo Zhou, Charles Talbot, Christine Vogel, Martin Madera, Cyrus Chothia, Julian Gough.
Abstract
SUPERFAMILY provides structural, functional and evolutionary information for proteins from all completely sequenced genomes, and from large sequence collections such as UniProt. Protein domain assignments for over 900 genomes are included in the database, which can be accessed at http://supfam.org/. Hidden Markov models based on Structural Classification of Proteins (SCOP) domain definitions at the superfamily level are used to provide structural annotation. We recently produced a new model library based on SCOP 1.73. Family level assignments are also available. From the web site, users can submit sequences for SCOP domain classification; search for keywords such as superfamilies, families, organism names, models and sequence identifiers; find over- and underrepresented families or superfamilies within a genome relative to other genomes or groups of genomes; compare domain architectures across selections of genomes; and build multiple sequence alignments between Protein Data Bank (PDB), genomic and custom sequences. Recent extensions to the database include InterPro abstracts and Gene Ontology terms for superfamilies, taxonomic visualization of the distribution of families across the tree of life, searches for functionally similar domain architectures and phylogenetic trees. The database, models and associated scripts are available for download from the FTP site.
Year: 2008 PMID: 19036790 PMCID: PMC2686452 DOI: 10.1093/nar/gkn762
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. TaxViz displays the distribution of domains across the major taxonomic kingdoms, and across the organisms within each kingdom. Shown here is the distribution of the P-loop containing nucleoside triphosphate hydrolase domains. Each circle, or node, represents the features of a single taxonomic group or individual organism. The nodes are arranged hierarchically in concentric rings. The higher taxonomic groups (superkingdoms: Eukaryota, Bacteria and Archaea), located in the centre, lead recursively outwards towards their children (the kingdoms or phyla within each superkingdom). For taxonomic groups, the size of the node increases logarithmically with the mean number of domains found per organism in the taxonomic group. The outer nodes lead to the distribution of domains in individual species. There are three specialized nodes, which display the distribution of domains in (i) selected model organisms; (ii) organisms containing the maximum number of domains; and (iii) organisms containing the minimum number of domains.
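The logarithmic node sizing described in the Figure 1 caption can be illustrated with a minimal sketch. This is not the TaxViz implementation: the function name `node_radius`, the parameters `base_radius` and `scale`, and the choice of a base-10 logarithm with a +1 offset are all assumptions made here for illustration only.

```python
import math


def node_radius(domain_counts, base_radius=4.0, scale=3.0):
    """Return a display radius for a taxonomic-group node.

    domain_counts: number of domains found in each organism of the group.
    base_radius, scale: hypothetical display constants (not from the source).
    """
    if not domain_counts:
        # An empty group falls back to the minimum node size.
        return base_radius
    mean_domains = sum(domain_counts) / len(domain_counts)
    # The radius grows with the logarithm of the mean number of domains
    # per organism; the +1 keeps the value defined when the mean is zero.
    return base_radius + scale * math.log10(1.0 + mean_domains)
```

Under these assumptions, a tenfold increase in the mean number of domains per organism adds a roughly constant increment to the radius, so groups with very large domain counts do not crowd out the rest of the plot.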
Figure 2. Phylogenetic tree example, showing all sequenced Drosophila species and primates plotted on the same tree, using the relationships calculated from all genomes.