Corin Yeats, Michael Maibaum, Russell Marsden, Mark Dibley, David Lee, Sarah Addou, Christine A Orengo.
Abstract
The Gene3D release 4 database and web portal (http://cathwww.biochem.ucl.ac.uk:8080/Gene3D) provide a combined structural, functional and evolutionary view of the protein world. It is focussed on providing structural annotation for protein sequences without structural representatives, including the complete proteome sets of over 240 different species. The protein sequences have also been clustered into whole-chain families to aid functional prediction. The structural annotation is generated using HMM models based on the CATH domain families; CATH is a repository for manually deduced protein domains. Changes since the last publication include the addition of over 100 genomes and the UniProt sequence database, domain data from Pfam, metabolic pathway and functional data from COGs, KEGG and GO, and protein-protein interaction data from MINT and BIND. The website has been rebuilt to allow more sophisticated querying, and the data returned is presented in a clearer format with greater functionality. Furthermore, all data can be downloaded in a simple XML format, allowing users to carry out complex investigations on their own computers.
Year: 2006 PMID: 16381865 PMCID: PMC1347420 DOI: 10.1093/nar/gkj057
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Distribution of multi-domain proteins in 240 genomes. For each genome the approximate percentage of multi-domain proteins was calculated. The likely domain content of those proteins that have no known domains was approximated on the basis of length (for details see text). The length threshold was calculated for each genome individually. Of note, the multi-domain percentage for Eukaryotes was within the range displayed by Eubacteria, but the mean for Eukaryotes is substantially higher than for Prokaryotes.
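The per-genome calculation described in the caption can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the `Protein` record, the `multi_domain_percentage` function and the example values are hypothetical, and the rule that an unannotated protein counts as multi-domain when it is longer than the genome's threshold is an assumption. How the threshold itself is derived is described only in the full text, so it is passed in as a parameter here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Protein:
    """Hypothetical record for one protein in a genome."""
    length: int               # sequence length in residues
    n_domains: Optional[int]  # number of assigned domains, None if no known domains


def multi_domain_percentage(proteins: Sequence[Protein], length_threshold: int) -> float:
    """Approximate percentage of multi-domain proteins in one genome.

    Annotated proteins count as multi-domain when they have two or more domains.
    Proteins without known domains are (by assumption) counted as multi-domain
    when they are longer than the genome-specific length threshold.
    """
    if not proteins:
        return 0.0
    multi = 0
    for p in proteins:
        if p.n_domains is not None:
            multi += p.n_domains >= 2
        else:
            multi += p.length > length_threshold
    return 100.0 * multi / len(proteins)


# Illustrative values only, not data from the paper.
genome = [Protein(120, 1), Protein(450, 3), Protein(600, None), Protein(90, None)]
print(multi_domain_percentage(genome, length_threshold=300))
```

Because the threshold is supplied per call, the same function can be applied to each genome with its own individually calculated value.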