Jonathan Lees, Corin Yeats, Oliver Redfern, Andrew Clegg, Christine Orengo.
Abstract
Over the last 2 years the Gene3D resource has been significantly improved: it is now more accurate and offers a much richer interactive display via the Gene3D website (http://gene3d.biochem.ucl.ac.uk/). Gene3D provides accurate structural domain family assignments for over 1100 genomes and nearly 10,000,000 proteins. A hidden Markov model library, constructed from the manually curated CATH structural domain hierarchy, is used to search UniProt, RefSeq and Ensembl protein sequences. The resulting matches are refined into simple multi-domain architectures using a recently developed in-house algorithm, DomainFinder 3 (available at: ftp://ftp.biochem.ucl.ac.uk/pub/gene3d_data/DomainFinder3/). The domain assignments are integrated with multiple external protein function descriptions (e.g. Gene Ontology and KEGG), structural annotations (e.g. coiled coils, disordered regions and sequence polymorphisms) and family resources (e.g. Pfam and eggNOG) and displayed on the Gene3D website. The website allows users to view descriptions for single proteins and genes as well as for large protein sets, such as superfamilies or genomes. Subsets can then be selected for detailed investigation, or associated functions and interactions can be used to expand explorations to new proteins. Gene3D also provides a set of services, including an interactive genome coverage graph visualizer, DAS annotation resources, sequence search facilities and SOAP services.
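To illustrate what refining overlapping HMM matches into a multi-domain architecture can involve, the following is a minimal sketch only. It is not DomainFinder 3, whose method the abstract does not describe; it substitutes a standard weighted interval scheduling approach that keeps the non-overlapping set of matches with the highest total score. The `Match` fields, the score values and the CATH-style identifiers in the usage example are illustrative assumptions.

```python
from bisect import bisect_right
from typing import NamedTuple


class Match(NamedTuple):
    superfamily: str  # CATH-style superfamily identifier (illustrative)
    start: int        # first residue of the match, inclusive
    end: int          # last residue of the match, inclusive
    score: float      # hypothetical match score, higher is better


def best_architecture(matches: list[Match]) -> list[Match]:
    """Return the non-overlapping subset of matches with the highest total score."""
    ordered = sorted(matches, key=lambda m: m.end)
    ends = [m.end for m in ordered]
    # best[i] holds (total score, chosen matches) using only the first i matches.
    best: list[tuple[float, list[Match]]] = [(0.0, [])]
    for i, m in enumerate(ordered):
        # Number of earlier matches that end strictly before this one starts.
        j = bisect_right(ends, m.start - 1, 0, i)
        take_score = best[j][0] + m.score
        if take_score > best[i][0]:
            best.append((take_score, best[j][1] + [m]))
        else:
            best.append(best[i])
    return best[-1][1]


example = [
    Match("3.30.505.10", 1, 100, 50.0),
    Match("2.30.30.40", 90, 150, 20.0),
    Match("1.10.10.10", 101, 200, 40.0),
]
architecture = [m.superfamily for m in best_architecture(example)]
```

In this toy input the middle match overlaps both neighbours and scores lowest, so the sketch keeps the first and third matches. A production tool would also need to handle discontinuous domains and tolerated overlaps, which this sketch ignores.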
Year: 2009 PMID: 19906693 PMCID: PMC2808988 DOI: 10.1093/nar/gkp987
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
List of imported resources, type and reference
| Name | New? | Type | Description |
|---|---|---|---|
| UniProt | N | SD | Protein database |
| RefSeq | N | SD | Protein database |
| Ensembl metazoa | Y | SD/G | Genome assemblies |
| Integr8 | N | G | UniProt-based genome sets |
| CATH | N | SF | Domain family classification |
| Pfam | N | SF | Domain family classification |
| Superfamily | N | SF | Domain family classification |
| SMART | N | SF | Domain family classification |
| Ensembl Variation | Y | SF | Sequence polymorphisms |
| UniProt features | Y (partly) | SF | Functional elements |
| TMHMM v2.0 | Y | SF | Transmembrane helices |
| Seg | N | SF | Low complexity regions |
| Coils | N | SF | Coiled-coil regions |
| Panther | N | SF | Protein family classification |
| DisoPred2 | Y | SF | Disordered regions |
| GO pathways | N | P | Pathway descriptions |
| MINT | N | P | Protein–protein interactions |
| IntAct | N | P | Protein–protein interactions |
| GO cellular loc. | N | P | Cellular locations |
| KEGG pathways | N | P | Pathway assignments |
| KEGG orthologue | N | MF | Molecular function |
| GO molecular | N | MF | Molecular function |
| UniProt descriptions | Y | MF | Molecular function |
| eggNOG | Y | MF | Molecular function |
| NCBI taxonomy | N | T | Taxonomic hierarchy |
a Obtained via SIMAP (26).
If the resource has been added to Gene3D since 2008 it is marked as ‘Y’ for ‘New’.
Types: ‘SD’: imported protein sequence databases; ‘G’: genome assemblies; ‘SF’: sequence feature annotation; ‘P’: metabolic, regulatory and biological pathways; ‘MF’: molecular function; ‘T’: taxonomy tree.
Figure 1. The Gene3D genome coverage visualizer. Genome coverage (the number of genes with a match to a CATH superfamily) can be viewed using the new visualization web tool (http://gene3d.biochem.ucl.ac.uk:8090/GenomeCoverageGraphs/). Coverage for a specific superfamily can be retrieved using the top query bar. In the left-hand column, species can be removed from the graph or their names edited manually. New species can be added via tables found by selecting the ‘Database Summaries’, ‘Ensembl Genome Data’ and ‘Integr8 Genome Data’ tab headers.
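The coverage measure defined in the Figure 1 caption can be sketched as a simple count. The snippet below is a minimal illustration, assuming domain assignments are available as `(genome, gene_id, superfamily_id)` tuples; that input layout, the function name and the toy identifiers are assumptions, not the Gene3D data model or API.

```python
from collections import defaultdict


def genome_coverage(assignments, superfamily=None):
    """Count, per genome, the distinct genes with at least one CATH match.

    assignments: iterable of (genome, gene_id, superfamily_id) tuples
                 (hypothetical input layout).
    superfamily: if given, only matches to that superfamily are counted.
    """
    genes = defaultdict(set)
    for genome, gene_id, sf in assignments:
        if superfamily is None or sf == superfamily:
            # A set ensures a gene with several matching domains counts once.
            genes[genome].add(gene_id)
    return {genome: len(ids) for genome, ids in genes.items()}


toy = [
    ("genome_A", "gene1", "3.30.505.10"),
    ("genome_A", "gene1", "2.30.30.40"),
    ("genome_A", "gene2", "2.30.30.40"),
    ("genome_B", "gene9", "3.30.505.10"),
]
overall = genome_coverage(toy)
per_superfamily = genome_coverage(toy, superfamily="2.30.30.40")
```

Passing `superfamily` mirrors the per-superfamily query described in the caption; genomes with no matching genes are simply absent from the result.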
Figure 2. Sequence feature annotation for human proto-oncogene vav. Displayed is the feature annotation returned for UniProt identifier VAV_HUMAN. Laying out each type and source of annotation against each other allows the user to easily cross-reference functional information. Clicking on domains brings up available functional descriptions and a link to the original resource. Detailed tables of assignments can be found in separate tabs. To aid users, each family is given a unique colour that is consistent across the website; for CATH-Gene3D domains the colour-picking algorithm makes common superfamilies brighter, and ensures superfamilies within the same fold have similar hues.
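The colour scheme described in the Figure 2 caption could be approximated as follows. This is a hedged sketch, not the website's algorithm, which the caption does not specify: it assumes the hue is derived from a stable hash of the fold (the first three numbers of a CATH identifier) with a small per-superfamily shift, and that brightness scales with a relative `frequency` in [0, 1]. The function names, the saturation of 0.8 and the jitter width are all illustrative choices.

```python
import colorsys
import hashlib


def _unit_hash(text: str) -> float:
    """Map a string to a stable number in [0, 1), identical across runs."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") / 2**32


def superfamily_colour(cath_id: str, frequency: float) -> str:
    """Return a hex colour for a CATH superfamily ID such as '3.30.505.10'.

    frequency: relative abundance of the superfamily in [0, 1]
               (hypothetical input).
    """
    fold = cath_id.rsplit(".", 1)[0]             # C.A.T prefix shared by a fold
    base_hue = _unit_hash(fold)                  # same fold, same base hue
    jitter = (_unit_hash(cath_id) - 0.5) * 0.06  # small shift per superfamily
    hue = (base_hue + jitter) % 1.0
    clamped = min(max(frequency, 0.0), 1.0)
    value = 0.5 + 0.5 * clamped                  # common superfamilies are brighter
    r, g, b = colorsys.hsv_to_rgb(hue, 0.8, value)
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )


common = superfamily_colour("3.30.505.10", 0.9)
rare = superfamily_colour("3.30.505.10", 0.1)
```

Hashing the identifier rather than assigning colours in encounter order keeps a family's colour the same on every page, which matches the consistency requirement in the caption.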