Year: 2008 PMID: 18974821 PMCID: PMC2518264 DOI: 10.1371/journal.pcbi.1000160
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Homology-based annotation transfer: Problems.
(A) Paralogy problem: Paralogs are more likely than orthologs to diverge functionally. If the putative template is a paralog, the probability that the query has a similar function decreases. (B) Moonlighting problem: If the template performs multiple functions, the query may have retained only some of them (and vice versa: if the query is a moonlighting protein, using a non-moonlighting template would yield an incomplete annotation of the query). (C) Multi-domain proteins problem: If the template is annotated based on the function of a domain that is not aligned to the query, annotation transfer is not possible. (D) Database mis-annotations problem: Database entries may have been mis-annotated; the risk is especially high if the annotation was performed automatically via homology transfer.
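The multi-domain problem in panel (C) can be checked mechanically before any annotation is transferred. The following is a minimal sketch, not a method from the article: the `Region` type, the function names, and the 0.8 coverage cutoff are all hypothetical choices made for illustration, and coordinates are assumed to be 1-based and inclusive on the template sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A 1-based, inclusive residue range on the template sequence (assumed convention)."""
    start: int
    end: int


def overlap_fraction(domain: Region, aligned: Region) -> float:
    """Fraction of the annotated template domain covered by the query alignment."""
    overlap = min(domain.end, aligned.end) - max(domain.start, aligned.start) + 1
    domain_length = domain.end - domain.start + 1
    return max(overlap, 0) / domain_length


def can_transfer(domain: Region, aligned: Region, min_coverage: float = 0.8) -> bool:
    """Allow transfer only if the annotated domain is mostly inside the alignment.

    min_coverage is an illustrative cutoff, not a value taken from the article.
    """
    return overlap_fraction(domain, aligned) >= min_coverage


# The template's annotated domain spans residues 10-109;
# the query aligns to template residues 60-300, covering only half of it.
print(overlap_fraction(Region(10, 109), Region(60, 300)))
print(can_transfer(Region(10, 109), Region(60, 300)))
```

A check like this addresses only panel (C); paralogy, moonlighting, and database mis-annotation need other evidence.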
Do's and Don'ts of annotation transfer by homology.
| Functional property to be conserved | Sequence identity | Conservation rate | Reference |
| Non-enzyme | 50% | 98% | |
| All 4 EC numbers | 70% | 90% | |
| All 4 EC numbers | 40% | 70% | |
| First 3 EC numbers | 50% | 90% | |
| First 3 EC numbers | 30% | 70% | |
| All 4 EC numbers | 50% | 30% | |
| First 3 EC numbers | 25% | 70% | |
| SWISS-PROT keywords | 40% | 70% | |
| Subcellular localization (11 classes) | 70% | 90% | |
*: 98% of non-enzymes that have at least one enzyme homolog.
**: Global identity, defined in [89].
Note: different estimates for the same functional aspects reflect the different methods, procedures, and datasets used to assess sequence similarity by the various groups.
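Because the table reports several, sometimes conflicting, estimates for the same functional property, it can help to look at the whole range of applicable estimates rather than a single number. The sketch below encodes the table rows as data and returns that range. It assumes that each row means "at or above this sequence identity, the property is conserved in roughly this fraction of pairs"; that reading, along with the names `Estimate`, `applicable_estimates`, and `conservation_range`, is an assumption made for illustration.

```python
from typing import NamedTuple, Optional


class Estimate(NamedTuple):
    prop: str          # functional property to be conserved
    min_identity: int  # sequence identity threshold, in percent
    conservation: int  # reported conservation rate, in percent


# Rows copied from the table above, in the same order.
TABLE = [
    Estimate("Non-enzyme", 50, 98),
    Estimate("All 4 EC numbers", 70, 90),
    Estimate("All 4 EC numbers", 40, 70),
    Estimate("First 3 EC numbers", 50, 90),
    Estimate("First 3 EC numbers", 30, 70),
    Estimate("All 4 EC numbers", 50, 30),
    Estimate("First 3 EC numbers", 25, 70),
    Estimate("SWISS-PROT keywords", 40, 70),
    Estimate("Subcellular localization (11 classes)", 70, 90),
]


def applicable_estimates(prop: str, identity: float) -> list[Estimate]:
    """Rows for this property whose identity threshold the pair meets (assumed semantics)."""
    return [e for e in TABLE if e.prop == prop and identity >= e.min_identity]


def conservation_range(prop: str, identity: float) -> Optional[tuple[int, int]]:
    """Lowest and highest applicable conservation rate, or None if no row applies."""
    rates = [e.conservation for e in applicable_estimates(prop, identity)]
    return (min(rates), max(rates)) if rates else None


# At 55% identity, the published estimates for full EC-number conservation disagree.
print(conservation_range("All 4 EC numbers", 55))
```

The width of the returned range is a reminder of the note above: the estimates come from different methods and datasets and should not be averaged naively.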
Do's and Don'ts of annotation transfer by homology.
| Yes | No | |||
| Homology | = | Same function | √ | |
| Orthology | = | Same function | √ | |
| Paralogy | = | Same function | √ | |
| Orthology | = | >Probability of same function | √ | |
| Paralogy | = | <Probability of same function | √ | |
| Same sequence | = | Same function | √ | |
| Sequence similarity>threshold | = | Same function | √ | |
| Homology+conservation of functional residues | = | Same function | √ | |
| Similar structure | = | Similar function | √ | |
| >Sequence similarity | = | >Probability of same function | √ | |
| >Structure similarity | = | >Probability of same function | √ |
Figure 2. Using structure to predict function.
The protein represented here is PDBid: 2eve. All figures are derived from the Northeast Structural Genomics Consortium structure gallery (http://nmr.cabm.rutgers.edu:9090/gallery/jsp/Gallery.jsp). AstexViewer 2.0 [49] is used for visualization. (A) Superposition of the 2eve structure (gray) and the structure of a homolog (blue, PDBid: 2ar1), using Skan [59]. 2eve hosts three co-crystallized small non-functional ligands (green; ball and stick). Three structurally aligned residues of 2eve and 2ar1 are also shown (red and yellow; ball and stick). (B) Surface residue conservation: Conserved residues (mauve) versus variable residues (cyan). Conservation is calculated as follows: homologs of 2eve are collected using three iterations of PSI-BLAST [15], retaining all homologs with E-value < 10⁻³ and reducing redundancy at 80% sequence identity with CD-HIT [85]. Then, a multiple sequence alignment is created using CLUSTALW [86]. Finally, the multiple sequence alignment is used as input to ConSurf [54], which calculates residue conservation from it. (C) Residue conservation within the protein's largest cavity (as defined by SCREEN [87]). (D) 2eve surface electrostatic potential (using GRASP2 [59]) (positive in blue, negative in red).
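The last step of the conservation procedure in panel (B), scoring each alignment column, can be illustrated with a much simpler measure than the one ConSurf computes. The sketch below uses normalized Shannon entropy as a stand-in; it is not ConSurf's method, and the function name, the gap handling, and the toy alignment are assumptions made for illustration.

```python
import math
from collections import Counter


def column_conservation(msa: list[str], gap: str = "-") -> list[float]:
    """Score each alignment column from 0 (variable) to 1 (fully conserved).

    Uses 1 - H / log2(20), where H is the Shannon entropy of the residues
    in the column. Gaps are ignored; an all-gap column scores 0.
    """
    if not msa:
        raise ValueError("alignment is empty")
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for column in zip(*msa):
        residues = [r for r in column if r != gap]
        if not residues:
            scores.append(0.0)
            continue
        total = len(residues)
        counts = Counter(residues)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        scores.append(1.0 - entropy / math.log2(20))
    return scores


# Toy alignment (hypothetical sequences): the first column is invariant,
# the last column differs in every sequence.
toy_msa = ["ACD", "ACE", "AGF", "A-G"]
for position, score in enumerate(column_conservation(toy_msa), start=1):
    print(position, round(score, 3))
```

In practice the alignment would come from the PSI-BLAST, CD-HIT, and CLUSTALW steps described in the caption; entropy ignores phylogeny, which is one reason a rate-based tool may rank positions differently.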