Ting-Ying Chien, Darby Tien-Hao Chang, Chien-Yu Chen, Yi-Zhong Weng, Chen-Ming Hsu.
Abstract
Large-scale automatic annotation of protein sequences remains challenging in the postgenomic era. E1DS is designed for annotating enzyme sequences based on a repository of 1D signatures. The sequence signatures are derived using a novel pattern mining approach that discovers long motifs consisting of several sequential blocks (conserved segments). Each sequential block is considerably conserved among the protein members of an EC group. Moreover, a signature includes at least three sequential blocks that are concurrently conserved, i.e. frequently observed together in sequences. In other words, a sequence signature consists of residues from multiple regions of the protein sequence, which echoes the observation that an enzyme catalytic site is usually composed of residues that are widely separated in the sequence. E1DS currently contains 5421 sequence signatures that together cover 932 four-digit EC numbers. E1DS is evaluated on a collection of enzymes with catalytic sites annotated in the Catalytic Site Atlas. Compared with the widely used pattern database PROSITE, predictions based on E1DS signatures are more sensitive in identifying catalytic sites and the residues involved. E1DS is available at http://e1ds.ee.ncku.edu.tw/ and a mirror site can be found at http://e1ds.csbb.ntu.edu.tw/.
Year: 2008 PMID: 18524800 PMCID: PMC2447799 DOI: 10.1093/nar/gkn324
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Workflow of the analysis procedures incorporated in E1DS. In this figure, the procedures under ‘Signature Construction’ are performed only once, while the other procedures are performed each time a new query arrives.
Figure 2. An example demonstrating the ‘Signature Matching’ procedure adopted by E1DS. Yellow residues on the reference sequence are ‘covered’ by the signature. On the query sequence, green residues are those that align with the covered residues of the reference sequence and are not Ala, Ile, Leu, Pro or Val. The residues marked in green are predicted to be functionally important residues of the query sequence, based on the signature shown.
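The filtering rule in the Figure 2 caption can be illustrated with a minimal Python sketch. It assumes the reference and query are already given as a gapped pairwise alignment and that the signature's covered positions are known; the function name, the argument names and the `-` gap convention are hypothetical choices made for illustration, not part of E1DS.

```python
# Residue types that are never reported, per the Figure 2 caption:
# Ala (A), Ile (I), Leu (L), Pro (P), Val (V).
EXCLUDED = set("AILPV")


def predict_important_residues(ref_aln, query_aln, covered_ref_positions):
    """Return 0-based query positions predicted as functionally important.

    ref_aln, query_aln: equal-length aligned strings, '-' marks a gap
        (assumed input format).
    covered_ref_positions: 0-based indices into the ungapped reference
        sequence that the signature covers (assumed input format).
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")

    covered = set(covered_ref_positions)
    predicted = []
    ref_i = 0    # index into the ungapped reference sequence
    query_i = 0  # index into the ungapped query sequence

    for r, q in zip(ref_aln, query_aln):
        aligned_pair = r != "-" and q != "-"
        if aligned_pair and ref_i in covered and q.upper() not in EXCLUDED:
            predicted.append(query_i)
        if r != "-":
            ref_i += 1
        if q != "-":
            query_i += 1
    return predicted


# Toy alignment (invented sequences, for illustration only).
hits = predict_important_residues("MKH-DSA", "MRHGD-L", {1, 2, 3, 5})
print(hits)
```

In this toy input the last covered reference residue aligns with a Leu in the query, so that position is dropped, while covered positions that align with a gap contribute nothing.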
Figure 3. The structure panel of E1DS, which provides a 3D view of the signature. The list control on the right provides an interactive interface for selecting the protein structure to render.
Performance statistics for E1DS and PROSITE on CSA831 and CSA346
| Metric | E1DS, CSA831 | E1DS, CSA346 | PROSITE, CSA831 | PROSITE, CSA346 |
|---|---|---|---|---|
| Prediction of catalytic site | | | | |
| Correct predictions (%) | 35.5 | 51.7 | 18.9 | 38.2 |
| Partially correct predictions (%) | 14.1 | 18.2 | 14.8 | 31.2 |
| Total success rate (%) | 49.6 | 69.9 | 33.7 | 69.4 |
| Prediction of catalytic residue | | | | |
| Sensitivity (%) | 30.0 | 40.9 | 16.3 | 31.6 |
| Specificity (%) | 96.7 | 95.8 | 98.6 | 97.2 |
| Average no. of predicted residues | 12.7 | 15.9 | 5.6 | 11.0 |
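The record does not say how the residue-level sensitivity and specificity were computed. As a rough sketch, assuming the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)) applied to a single protein, the calculation could look like the following. The function name and its arguments are hypothetical, and whether the reported figures are pooled or averaged per protein is not stated here.

```python
def residue_metrics(sequence_length, predicted, catalytic):
    """Sensitivity and specificity for one protein, standard definitions.

    sequence_length: number of residues in the protein.
    predicted: positions predicted as catalytic residues.
    catalytic: positions annotated as catalytic residues.
    """
    predicted, catalytic = set(predicted), set(catalytic)
    tp = len(predicted & catalytic)        # predicted and annotated
    fp = len(predicted - catalytic)        # predicted, not annotated
    fn = len(catalytic - predicted)        # annotated, missed
    tn = sequence_length - tp - fp - fn    # everything else

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Invented example: 100 residues, 4 predicted, 4 annotated, 2 in common.
sens, spec = residue_metrics(100, {1, 2, 3, 4}, {3, 4, 5, 6})
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Because catalytic residues are a small fraction of any sequence, specificity stays high even when several extra residues are predicted, which is consistent with the table showing specificity above 95% alongside much lower sensitivity.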
Performance statistics for E1DS on CatRes177 and CatRes121
| Metric | CatRes177 | CatRes121 |
|---|---|---|
| Prediction of catalytic site | | |
| Correct predictions (%) | 41.8 | 45.0 |
| Partially correct predictions (%) | 15.2 | 19.2 |
| Total success rate (%) | 57.0 | 64.2 |
| Prediction of catalytic residue | | |
| Sensitivity (%) | 32.9 | 39.8 |
| Specificity (%) | 96.9 | 95.8 |