Elon Portugaly, Nathan Linial, Michal Linial.
Abstract
Protein domains are subunits of proteins that recur throughout the protein world. There are many definitions attempting to capture the essence of a protein domain, and several systems that identify protein domains and classify them into families. EVEREST, recently described in Portugaly et al. (2006) BMC Bioinformatics, 7, 277, is one such system that performs the task automatically, using protein sequence alone. Herein we describe EVEREST release 2.0, consisting of 20,029 families, each defined by one or more HMMs. The current EVEREST database was constructed by scanning UniProt 8.1 and all PDB sequences (over 3,000,000 sequences in total) with each of the EVEREST families. EVEREST annotates 64% of all sequences and covers 59% of all residues. EVEREST is available at http://www.everest.cs.huji.ac.il/. The website provides annotations given by SCOP, CATH, Pfam A and EVEREST. It allows browsing through the families of each of those sources and graphically visualizes the domain organization of the proteins in each family. The website also provides access to analyses of relationships between domain families, within and across domain definition systems. Users can upload sequences for analysis by the set of EVEREST families. Finally, an advanced search form allows querying for families that match criteria regarding novelty, phylogenetic composition and more.
Year: 2006 PMID: 17099230 PMCID: PMC1669739 DOI: 10.1093/nar/gkl850
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Example protein record. Excerpt from the protein page of HMUU_YERPE – ‘Hemin transport system permease protein hmuU’ showing the graphical representation of the domains on the protein. The width of the record is proportional to the length of the protein sequence. Colored segments mark domains found by different systems (here EVEREST and Pfam) on the sequence. EVEREST segments are color coded by the best score their family receives with respect to any reference family in the database. Other segments are color coded by family. A color legend is available in a vertical stripe on the left side of the page.
Figure 2. Relationship between EV02.00096 and SCOP c.69.1.12. Excerpt from the family page of EV02.00096 is shown. (A) Record for PDB sequence 1BRT is highlighted. The EV02.00096 domain, in red, is a sub-domain of the SCOP c.69.1.12 domain, in striped dark blue. (B) The relationship between EV02.00096 and c.69.1.12 is described by (1) the keyword ‘Super’, indicating that c.69.1.12 domains are super-domains of EV02.00096 domains, (2) the left bar graph, whose bar height indicates that fewer than a quarter of EV02.00096 domains participate in this relationship and (3) the right bar graph, indicating that all of the domains of c.69.1.12 participate in this relationship. (C) EV02.00096 is also a super-family of sub-domains of c.69.1.11.
Figure 3. Five types of relations between domain instances. Illustration of the five defined relation types between two domain instances on the same protein. 1. sub-domain: domain a is a sub-segment of domain b. 2. super-domain: domain a is a super-segment of domain b. 3. same: domain a is the same segment as domain b. 4. N-neighbor: domain a is N-terminal to domain b. 5. C-neighbor: domain a is C-terminal to domain b.
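The five relation types can be illustrated with a minimal Python sketch. It is not the EVEREST implementation: the tolerance parameters that EVEREST uses to decide these relations are not reproduced in this record, so the sketch assumes strict interval comparisons on 1-based, inclusive residue coordinates. The names `Domain` and `relation` are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Optional


class Domain(NamedTuple):
    """A domain instance on a protein (hypothetical type for illustration)."""
    start: int  # first residue, 1-based, inclusive
    end: int    # last residue, 1-based, inclusive


def relation(a: Domain, b: Domain) -> Optional[str]:
    """Return the relation of domain a to domain b on the same protein.

    Assumption: exact containment and strict ordering, with no tolerance
    for small boundary differences. Returns None for partial overlaps,
    which fall outside the five types under this strict reading.
    """
    if a == b:
        return "same"
    if b.start <= a.start and a.end <= b.end:
        return "sub-domain"    # a lies entirely within b
    if a.start <= b.start and b.end <= a.end:
        return "super-domain"  # a entirely contains b
    if a.end < b.start:
        return "N-neighbor"    # a ends before b begins (a is N-terminal to b)
    if b.end < a.start:
        return "C-neighbor"    # a begins after b ends (a is C-terminal to b)
    return None


if __name__ == "__main__":
    print(relation(Domain(10, 50), Domain(1, 100)))
    print(relation(Domain(1, 40), Domain(50, 90)))
```

A real classifier would likely relax these comparisons, for example by allowing domain boundaries to differ by a few residues before two segments stop counting as the same; any such thresholds would have to come from the parameter table.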
Parameters for defining relations between two domain instances
| Relation | Conditions | |||||
|---|---|---|---|---|---|---|
| Strongly following | Possibly following | |||||
| Sub-domain | ||||||
| Super-domain | ||||||
| Same | ||||||
| N-neighbor | ||||||
| C-neighbor | ||||||