Jonathan G Lees, David Lee, Romain A Studer, Natalie L Dawson, Ian Sillitoe, Sayoni Das, Corin Yeats, Benoit H Dessailly, Robert Rentzsch, Christine A Orengo.
Abstract
Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of protein domain structure annotations for protein sequences. Domains are predicted using a library of profile HMMs from 2738 CATH superfamilies. Gene3D assigns domain annotations to Ensembl and UniProt sequence sets including >6000 cellular genomes and >20 million unique protein sequences. This represents an increase of 45% in the number of protein sequences since our last publication. Thanks to improvements in the underlying data and pipeline, we see large increases in the domain coverage of sequences. We have expanded this coverage by integrating Pfam and SUPERFAMILY domain annotations, and we now resolve domain overlaps to provide highly comprehensive composite multi-domain architectures. To make these data more accessible for comparative genome analyses, we have developed novel algorithms for searching genomes to identify related multi-domain architectures. In addition to providing domain family annotations, we have now developed a pipeline for 3D homology modelling of domains in Gene3D. This has been applied to the human genome and will be rolled out to other major organisms over the next year.
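The abstract does not say how overlapping CATH, Pfam and SUPERFAMILY hits are resolved into one composite architecture. As a minimal sketch, assuming each hit carries a single positive score and that the goal is the highest-scoring set of non-overlapping hits, the problem can be treated as weighted interval scheduling and solved by dynamic programming. This is a generic technique, not necessarily the method used in the Gene3D pipeline; `DomainHit`, `resolve_overlaps` and the example hits are hypothetical.

```python
from bisect import bisect_left
from typing import NamedTuple


class DomainHit(NamedTuple):
    """Hypothetical domain match on one sequence (1-based, inclusive residues)."""
    label: str      # e.g. a CATH superfamily or Pfam family ID (illustrative)
    start: int
    end: int
    score: float    # any positive match score, e.g. a bit score


def resolve_overlaps(hits: list[DomainHit]) -> list[DomainHit]:
    """Return the non-overlapping subset of hits with the maximum total score
    (weighted interval scheduling via dynamic programming)."""
    ordered = sorted(hits, key=lambda h: h.end)
    ends = [h.end for h in ordered]
    n = len(ordered)
    best = [0.0] * (n + 1)   # best[i]: optimal total using the first i hits
    take = [False] * (n + 1)
    prev = [0] * (n + 1)
    for i in range(1, n + 1):
        hit = ordered[i - 1]
        # Count of earlier hits that end strictly before this hit starts.
        p = bisect_left(ends, hit.start, 0, i - 1)
        prev[i] = p
        with_hit = best[p] + hit.score
        if with_hit > best[i - 1]:
            best[i], take[i] = with_hit, True
        else:
            best[i] = best[i - 1]
    # Trace back through the table to recover the chosen hits.
    chosen = []
    i = n
    while i > 0:
        if take[i]:
            chosen.append(ordered[i - 1])
            i = prev[i]
        else:
            i -= 1
    return sorted(chosen, key=lambda h: h.start)


example_hits = [
    DomainHit("A", 10, 200, 150.0),
    DomainHit("B", 15, 180, 120.0),
    DomainHit("C", 190, 260, 60.0),
    DomainHit("D", 270, 330, 40.0),
]
print([h.label for h in resolve_overlaps(example_hits)])  # ['B', 'C', 'D']
```

Dynamic programming is used here because a greedy pass that keeps the top-scoring hit A first can then only add D, ending at a total of 190 instead of the 220 reached by B, C and D.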
Year: 2013 PMID: 24270792 PMCID: PMC3965083 DOI: 10.1093/nar/gkt1205
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Workflow of the Gene3D pipeline, from data input from external resources (parallelogram-shaped boxes) to user-facing functions (diamond-shaped boxes). Rectangular boxes represent data-processing steps.
Figure 2. The multi-domain architecture (MDA) alignment method in Gene3D uses the Needleman–Wunsch (NW) algorithm. (A) Domain matches in the substitution matrix for the NW algorithm can take place at multiple levels. The highest-scoring match is at the FunFam level (FunFam Match). The next highest is between different FunFams from the same superfamily, scored by their similarity in a hierarchical tree of FunFams built from profile–profile comparisons (FunFam-Tree Match). Next is a match at the homologous superfamily level (Superfamily Match). Finally, domains with the same fold can also contribute a positive similarity score to the domain alignment (Fold Match). (B) Domain alignments can be used to find functionally similar proteins by identifying proteins with a similar MDA. (C) All-versus-all MDA alignments have been carried out to identify proteins with distinctive domain combinations in a genome.
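The caption ranks the match levels but gives no scores. The sketch below aligns two MDAs, each an ordered list of domains, with a standard Needleman–Wunsch recurrence and a tiered substitution score. All numeric values, the linear gap penalty, and the treatment of FunFam-tree similarity as a precomputed value in [0, 1) per FunFam pair are assumptions for illustration. It relies on the CATH convention that a superfamily code has four levels (C.A.T.H), so the fold (topology) is the first three. `Domain`, `domain_score`, `align_mdas` and the FunFam IDs are hypothetical names.

```python
from typing import NamedTuple, Optional

# Illustrative scores (hypothetical); only their ordering follows the caption.
FUNFAM_MATCH = 4.0
SUPERFAMILY_MATCH = 2.0
FOLD_MATCH = 1.0
MISMATCH = -1.0
GAP = -1.0  # linear penalty per domain aligned to a gap


class Domain(NamedTuple):
    superfamily: str  # CATH-style code "C.A.T.H", e.g. "3.40.50.300"
    funfam: str       # hypothetical FunFam identifier


def fold_of(d: Domain) -> str:
    # The first three levels (C.A.T) of a CATH code identify the fold/topology.
    return d.superfamily.rsplit(".", 1)[0]


def domain_score(a: Domain, b: Domain, funfam_sim: dict) -> float:
    """Tiered match score: FunFam > FunFam-tree > superfamily > fold."""
    if a.funfam == b.funfam:
        return FUNFAM_MATCH
    if a.superfamily == b.superfamily:
        # FunFam-tree match: a similarity in [0, 1) raises the score above the
        # plain superfamily level but keeps it below an exact FunFam match.
        sim = funfam_sim.get(frozenset((a.funfam, b.funfam)), 0.0)
        return SUPERFAMILY_MATCH + sim * (FUNFAM_MATCH - SUPERFAMILY_MATCH)
    if fold_of(a) == fold_of(b):
        return FOLD_MATCH
    return MISMATCH


def align_mdas(x: list, y: list, funfam_sim: Optional[dict] = None) -> float:
    """Global (Needleman-Wunsch) alignment score between two MDAs."""
    funfam_sim = funfam_sim or {}
    n, m = len(x), len(y)
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = i * GAP
    for j in range(1, m + 1):
        f[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(
                f[i - 1][j - 1] + domain_score(x[i - 1], y[j - 1], funfam_sim),
                f[i - 1][j] + GAP,  # domain in x aligned to a gap
                f[i][j - 1] + GAP,  # domain in y aligned to a gap
            )
    return f[n][m]


query = [Domain("3.40.50.300", "FF-a"), Domain("1.10.10.10", "FF-b")]
target = [Domain("3.40.50.300", "FF-a"), Domain("1.10.10.10", "FF-c")]
sims = {frozenset(("FF-b", "FF-c")): 0.5}
print(align_mdas(query, target, sims))  # 7.0
```

Here the first domains match at the FunFam level (4.0) and the second pair is a FunFam-tree match (2.0 + 0.5 × 2.0 = 3.0). For an all-versus-all comparison like panel (C), one could call `align_mdas` on every pair of proteins in a genome.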
Figure 3. Individual domain summary page (example: the C-terminal domain of human SIAH2, SIAH2_HUMAN) showing a modelled structure along with the Ramachandran plot from the Rampage software package (25), used as part of the quality-control step. Residues in the sequence and structure are coloured by conservation across the FunFam (blue to red indicates increasing conservation).