Marc A Marti-Renom, Andrea Rossi, Fátima Al-Shahrour, Fred P Davis, Ursula Pieper, Joaquín Dopazo, Andrej Sali.
Abstract
BACKGROUND: Advances in structural biology, including structural genomics, have resulted in a rapid increase in the number of experimentally determined protein structures. However, about half of the structures deposited by the structural genomics consortia have little or no information about their biological function. Therefore, there is a need for tools for automatically and comprehensively annotating the function of protein structures. We aim to provide such tools by applying comparative protein structure annotation that relies on detectable relationships between protein structures to transfer functional annotations. Here we introduce two programs, AnnoLite and AnnoLyze, which use the structural alignments deposited in the DBAli database.

DESCRIPTION: AnnoLite predicts the SCOP, CATH, EC, InterPro, PfamA, and GO terms with an average sensitivity of ~90% and average precision of ~80%. AnnoLyze predicts ligand binding site and domain interaction patches with an average sensitivity of ~70% and average precision of ~30%, correctly localizing binding sites for small molecules in ~95% of its predictions.
Year: 2007 PMID: 17570147 PMCID: PMC1892083 DOI: 10.1186/1471-2105-8-S4-S4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Sensitivity and precision of AnnoLite.
| 10^-6 | 92.7 | 88.4 | |
| 10^-3 | 95.7 | 90.1 | |
| 10^-3 | 88.4 | 78.2 | |
| 10^-4 | 90.5 | 82.8 | |
| 10^-4 | 93.3 | 79.7 | |
| 10^-1 | 84.3 | 80.9 | |
| 10^-3 | 85.5 | 74.8 | |
| 10^-2 | 77.6 | 58.6 |
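The sensitivity and precision figures reported above can be illustrated with a minimal sketch. It assumes the usual definitions (sensitivity = TP / (TP + FN), precision = TP / (TP + FP)) applied to sets of predicted and known annotation terms; the source does not spell out how AnnoLite counts matches, and the function name and the example term sets are hypothetical.

```python
def sensitivity_precision(predicted, actual):
    """Return (sensitivity, precision) in percent for two collections of terms.

    Assumes the standard definitions; this is not taken from AnnoLite itself.
    """
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)   # terms predicted and truly annotated
    fn = len(actual - predicted)   # true terms that were missed
    fp = len(predicted - actual)   # predicted terms that are not annotated
    sensitivity = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    precision = 100.0 * tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision


# Hypothetical example: three predicted GO terms, two of them correct.
predicted = {"GO:0003677", "GO:0004519", "GO:0005506"}
actual = {"GO:0003677", "GO:0004519"}
sens, prec = sensitivity_precision(predicted, actual)
print(f"sensitivity={sens:.1f}% precision={prec:.1f}%")
```

In this made-up case every known term is recovered, so sensitivity is high, while the one extra prediction lowers precision.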
AnnoLite comparison against BLAST-based searches.
| AUC | COV | AUC | COV | AUC | COV | |
| 0.85 | 93.6 | 0.79 | 90.8 | 0.62 | 89.5 | |
| 0.82 | 91.2 | 0.81 | 87.7 | 0.57 | 86.6 | |
| 0.80 | 86.8 | 0.71 | 79.8 | 0.65 | 77.8 | |
| 0.83 | 91.1 | 0.76 | 86.4 | 0.62 | 82.8 | |
| 0.75 | 87.9 | 0.77 | 83.0 | 0.71 | 82.5 | |
| 0.68 | 86.0 | 0.60 | 78.8 | 0.65 | 76.9 | |
| 0.80 | 88.3 | 0.72 | 82.4 | 0.63 | 80.1 | |
| 0.69 | 93.2 | 0.63 | 90.0 | 0.61 | 86.9 | |
AUC: area under the curve; COV: coverage (%).
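The record does not say which curve the AUC refers to. As a rough sketch, assuming it is a ROC-style curve given as (x, y) points, the area can be approximated with the trapezoidal rule. The function name and the example points are hypothetical and are not taken from the paper.

```python
def auc_trapezoid(points):
    """Approximate the area under a curve given as (x, y) points.

    Assumption: the points describe a ROC-style curve; they are sorted by x
    and the area is summed one trapezoid at a time.
    """
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical curve with three points.
roc = [(0.0, 0.0), (0.5, 1.0), (1.0, 1.0)]
print(auc_trapezoid(roc))
```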
Sensitivity and precision of AnnoLyze.
| 30 | 71.9 | 13.7 | |
| 40 | 72.9 | 55.7 |
Accuracy of AnnoLyze in locating binding sites for small ligands.
| ADENOSINE-5'-DIPHOSPHATE | 172 | 80.2 | 93.2 | 100.0 | 45.6 | |
| ADENOSINE-5'-MONOPHOSPHATE | 56 | 80.4 | 91.4 | 100.0 | 31.0 | |
| PHOSPHOAMINOPHOSPHONIC ACID – ADENYLATE ESTER | 74 | 68.9 | 91.0 | 100.0 | 51.3 | |
| ADENOSINE-5'-TRIPHOSPHATE | 107 | 72.9 | 92.7 | 97.4 | 57.7 | |
| B-OCTYLGLUCOSIDE | 31 | 54.8 | 71.9 | 76.5 | 18.2 | |
| CITRIC ACID | 52 | 61.5 | 82.2 | 90.6 | 33.3 | |
| FLAVIN-ADENINE DINUCLEOTIDE | 110 | 91.8 | 96.1 | 100.0 | 60.9 | |
| FLAVIN MONONUCLEOTIDE | 62 | 85.5 | 94.5 | 100.0 | 60.0 | |
| FUCOSE | 35 | 82.9 | 67.9 | 72.4 | 0.0 | |
| D-GALACTOSE | 70 | 80.0 | 84.5 | 92.9 | 41.7 | |
| GUANOSINE-5'-DIPHOSPHATE | 72 | 95.8 | 93.2 | 97.1 | 33.3 | |
| GLUCOSE | 115 | 80.0 | 84.1 | 93.5 | 35.3 | |
| HEME C | 42 | 95.2 | 96.9 | 100.0 | 55.6 | |
| PROTOPORPHYRIN IX CONTAINING FE | 360 | 94.7 | 97.1 | 99.7 | 73.4 | |
| ALPHA D-MANNOSE | 52 | 86.5 | 84.6 | 95.6 | 15.4 | |
| ETHANESULFONIC ACID | 53 | 43.4 | 78.2 | 82.6 | 29.4 | |
| NICOTINAMIDE ADENINE DINUCLEOTIDE | 183 | 85.8 | 95.9 | 100.0 | 55.6 | |
| N-ACETYL-D-GLUCOSAMINE | 153 | 86.9 | 84.8 | 94.7 | 4.6 | |
| NADP NICOTINAMIDE-ADENINE-DINUCLEOTIDE PHOSPHATE | 73 | 84.9 | 93.6 | 98.4 | 65.4 | |
| NADPH DIHYDRO-NICOTINAMIDE-ADENINE-DINUCLEOTIDE PHOSPHATE | 64 | 85.9 | 94.9 | 100.0 | 58.3 | |
| 97 | 79.9 | 88.4 | 94.6 | 41.3 |
The last column shows the percentage of correct predictions by the Patcher algorithm.
AnnoLite functional predictions for pairs of enzyme-non-enzyme homologous structures.
| 1a73A | Hydrolase/DNA | -- | 0004519 | 1mhdA | Transcription/DNA | -- | 0003700 |
| 1xikA | Oxidoreductase | 1.17.4.1 | 0005506 | 1dpsA | DNA binding | -- | 0008199 |
| 2fha | Iron storage | 1.16.3.1 | 0008199 | ||||
| 1pda | Lyase | 2.5.1.61 | 0004418 | 1ixh | Phosphate transport | -- | 0005315 |
| 1crxA | Replication/DNA | -- | 0003677 | 1bl0A | Transcription/DNA | -- | |
| 1qjgA | Isomerase | 5.3.3.1 | 0004769 | 1ounA | Transport | -- | 0008565 |
| 1ndoB | Dioxygenase | 1.14.12.12 | 0016702 | ||||
| 1aozA | Oxidoreductase | 1.10.3.3 | 0008447 | 1nwpA | Electron transport | -- | 0005507 |
| 1bugA | Oxidoreductase | 1.10.3.1 | 0004097 | 1oxy | Oxygen transport | -- | 0005344 |
Underlined functional predictions are not annotated in the MSD.
Figure 1. Functional annotation of a newly determined protein structure. Application to the target APC28983 from the Midwest Center for Structural Genomics Consortium (PDB code 2azw chain A). (a) Known annotation of chain 2azwA. (b) Significant AnnoLite predictions. (c) Significant AnnoLyze predictions.
Filtering cutoffs for removing redundancy in the testing sets.
| Initial set | Sequence Identity (%) | Equivalent positions (%) | RMSD (Å) | Difference in length | Final set | |
| 10,997 | N/A | ≥60.0 | ≤2.0 | ≤30 | 1,879 | |
| 30,126 | ≥30.0 | ≥75.0 | ≤4.0 | ≤50 | 4,948 | |
| 30,425 | ≥30.0 | ≥75.0 | ≤4.0 | ≤50 | 4,613 | |
| 30,126 | ≥90.0 | ≥90.0 | ≤2.0 | N/A | 1,936 |
Sequence identity is the percentage of identical residues with respect to the aligned positions in the structural alignment. Equivalent position is the percentage of residues that align within 4 Å with respect to the shorter of the two aligned structures. The RMSD is calculated using Cα atoms of the two aligned structures.
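The three definitions above, together with the cutoffs in the table, can be sketched as follows. This is an illustrative reading, not the DBAli implementation. It assumes that the aligned positions are given as residue pairs with already superposed Cα coordinates, that "within 4 Å" is inclusive, and that the RMSD is taken over all aligned pairs. All function and parameter names are hypothetical, and the default cutoffs are copied from one row of the table (≥30.0, ≥75.0, ≤4.0, ≤50).

```python
import math


def alignment_metrics(pairs, len_a, len_b, within=4.0):
    """Compute the four quantities used in the filtering table.

    pairs: list of (res_a, res_b, ca_a, ca_b), one entry per aligned position;
           ca_a and ca_b are (x, y, z) Cα coordinates after superposition.
    len_a, len_b: number of residues in each structure.
    """
    if not pairs:
        raise ValueError("no aligned positions")
    dists = [math.dist(ca_a, ca_b) for _, _, ca_a, ca_b in pairs]
    identical = sum(1 for ra, rb, _, _ in pairs if ra == rb)
    close = sum(1 for d in dists if d <= within)
    return {
        # identical residues relative to the aligned positions
        "identity": 100.0 * identical / len(pairs),
        # residues aligned within the distance cutoff, relative to the shorter structure
        "equivalent": 100.0 * close / min(len_a, len_b),
        # Cα RMSD over the aligned pairs (assumption: all pairs are included)
        "rmsd": math.sqrt(sum(d * d for d in dists) / len(pairs)),
        "length_diff": abs(len_a - len_b),
    }


def meets_cutoffs(m, min_identity=30.0, min_equivalent=75.0,
                  max_rmsd=4.0, max_length_diff=50):
    """Return True if a pair of structures satisfies every cutoff."""
    return (m["identity"] >= min_identity
            and m["equivalent"] >= min_equivalent
            and m["rmsd"] <= max_rmsd
            and m["length_diff"] <= max_length_diff)


# Hypothetical four-position alignment between structures of length 4 and 5.
pairs = [
    ("A", "A", (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
    ("G", "G", (1.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    ("L", "V", (2.0, 0.0, 0.0), (2.0, 3.0, 0.0)),
    ("K", "K", (3.0, 0.0, 0.0), (3.0, 0.0, 5.0)),
]
metrics = alignment_metrics(pairs, len_a=4, len_b=5)
print(metrics, meets_cutoffs(metrics))
```

How pairs that satisfy the cutoffs are then grouped to remove redundancy is not described in this record, so the sketch stops at the pairwise test.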
Figure 2. Flowchart of main steps in AnnoLite.
Figure 3. Flowchart of main steps in AnnoLyze.