Corin Yeats, Jonathan Lees, Adam Reid, Paul Kellam, Nigel Martin, Xinhui Liu, Christine Orengo.
Abstract
Gene3D provides comprehensive structural and functional annotation of most available protein sequences, including the UniProt, RefSeq and Integr8 resources. The main structural annotation is generated by scanning these sequences against the CATH structural domain database profile-HMM library. CATH is a database of manually derived PDB-based structural domains, placed within a hierarchy reflecting topology, homology and conservation, and is able to infer more ancient and divergent homology relationships than sequence-based approaches. This data is supplemented with Pfam-A, other non-domain structural predictions (i.e. coiled coils) and experimental data from UniProt. To enhance the investigations possible with this data, we have also incorporated a variety of protein annotation resources, including protein-protein interaction data, GO functional assignments, KEGG pathways, FUNCAT functional descriptions and links to microarray expression data. All of this data can be accessed through a newly redesigned website that focuses on flexibility and clarity, with searches that can be restricted to a single genome or run across the entire sequence database. Currently, Gene3D contains over 3.5 million domain assignments for nearly 5 million proteins, including 527 completed genomes. Gene3D is available at: http://gene3d.biochem.ucl.ac.uk/
Year: 2007 PMID: 18032434 PMCID: PMC2238970 DOI: 10.1093/nar/gkm1019
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Gene coverage of completed genomes in Gene3D. The figure shows the percentages of genes in bacteria, archaea and eukaryotes that have at least one domain assigned by either (A) CATH, (B) Pfam or (C) both. Note that not all the genomes have been completely scanned with Pfam, so the coverage is lower than would be expected.
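As a rough illustration of the coverage measure described in this caption, the following minimal sketch computes the percentage of genes with at least one domain assignment per source. The function name, the input layout (a mapping from gene identifier to the set of sources that assigned a domain) and the example gene identifiers are all hypothetical, and reading "(C) both" as "assigned by CATH and by Pfam" is an assumption rather than something the caption states.

```python
from typing import Dict, Set


def coverage_percentages(assignments: Dict[str, Set[str]]) -> Dict[str, float]:
    """Return the percentage of genes with at least one domain per source.

    `assignments` maps a gene identifier to the set of sources
    ("CATH", "Pfam") that assigned at least one domain to that gene.
    This layout is a hypothetical one chosen for illustration.
    """
    total = len(assignments)
    if total == 0:
        # No genes: report zero coverage instead of dividing by zero.
        return {"CATH": 0.0, "Pfam": 0.0, "both": 0.0}

    cath = sum(1 for sources in assignments.values() if "CATH" in sources)
    pfam = sum(1 for sources in assignments.values() if "Pfam" in sources)
    # Assumption: "both" counts genes covered by CATH and by Pfam.
    both = sum(1 for sources in assignments.values() if {"CATH", "Pfam"} <= sources)

    return {
        "CATH": 100.0 * cath / total,
        "Pfam": 100.0 * pfam / total,
        "both": 100.0 * both / total,
    }


# Made-up example genome with four genes.
example = {
    "gene1": {"CATH"},
    "gene2": {"CATH", "Pfam"},
    "gene3": set(),
    "gene4": {"Pfam"},
}
print(coverage_percentages(example))
```

If "both" is instead meant as "CATH or Pfam", the subset test would be replaced by a non-empty check on the set of sources.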
Figure 2. The Gene3D search bar. This bar appears at the top of every Gene3D page and is used to navigate the site. It consists of two main components, the query (A) and the filter (B), which together allow sophisticated data retrieval. Each component in turn consists of two inputs. (A) The first box sets the identifier type, with the default being any. Different resources often use identical identifiers to represent different proteins or protein families, so the returned data can be ambiguous; users can restrict the identifier to a certain resource to remove the ambiguity. The second box accepts the search term. (B) The filter allows the results to be restricted to particular subsets of the database. The first input is the filter type: currently ‘Genomes’, ‘GO Term’, ‘FunCat Category’ or ‘Affymetrix platform’. The second box accepts the filter term, for instance ‘human’, ‘9606’ or ‘Mammalia’. (C) Possible terms for the query and the filter are shown as a drop-down list while the user types.
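The query-plus-filter behaviour described in this caption can be sketched as follows. This is only an illustration of how restricting the identifier type removes ambiguity; it is not the Gene3D implementation, and the `Record` class, the `search` function, the resource names and the identifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    resource: str    # resource that issued the identifier (hypothetical names)
    identifier: str  # identifier as typed into the query box
    genome: str      # genome the protein belongs to


def search(
    records: List[Record],
    term: str,
    id_type: str = "any",
    genome: Optional[str] = None,
) -> List[Record]:
    """Match `term`, optionally restricted by identifier type and genome."""
    hits = [r for r in records if r.identifier == term]
    if id_type != "any":
        # Query component: restrict to one resource to remove ambiguity.
        hits = [r for r in hits if r.resource == id_type]
    if genome is not None:
        # Filter component: restrict to a subset of the database.
        hits = [r for r in hits if r.genome == genome]
    return hits


# Made-up records: the same identifier is used by two different resources.
records = [
    Record("ResourceA", "X1", "human"),
    Record("ResourceB", "X1", "mouse"),
    Record("ResourceA", "X2", "human"),
]

print(len(search(records, "X1")))                       # ambiguous query
print(len(search(records, "X1", id_type="ResourceA")))  # restricted by type
print(len(search(records, "X1", genome="mouse")))       # restricted by filter
```

With the default identifier type the query returns records from both resources, while setting either the identifier type or the genome filter narrows the result to a single record.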