Lusheng Chen, Wei Wang, Shaoping Ling, Caiyan Jia, Fei Wang.
Abstract
Predicting protein domains is an important and challenging problem in computational biology because domains play a significant role in understanding the complexity of proteomes. Although many template-based prediction servers have been developed, ab initio methods still need to be designed and improved to complement template-based methods. In this paper, we present KemaDom, a novel domain prediction system that ensembles three kernel machines using local context information among neighboring amino acids. KemaDom, an alternative ab initio predictor, achieves high performance in predicting the number of domains in proteins. It is freely accessible at http://www.iipl.fudan.edu.cn/lschen/kemadom.htm and http://www.iipl.fudan.edu.cn/~lschen/kemadom.htm.
Year: 2006 PMID: 16844982 PMCID: PMC1538912 DOI: 10.1093/nar/gkl331
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1 Comparison between the average profile of critical residues (a) and the average profile of six physico-chemical classes (b).
Features of the sub-models
| Model | Unit position | Description |
|---|---|---|
| KemaSelf | 1–5 | Secondary structure and solvent accessibility of the center residue |
| | 6–11 | Physico-chemical properties of the center residue |
| | 12–31 | Secondary structure and solvent accessibility of residues with 0 < d ≤ 2 |
| | 32 | Amino acid entropy of the center residue |
| KemaNeiOne | 1–6 | Secondary structure of the residues with |
| | 7–10 | Solvent accessibility of the residues with |
| | 11–22 | Physico-chemical properties of the residues with |
| | 23–24 | Amino acid entropy of the neighboring residues with |
| | 25–26 | Flags marking positions beyond the N-terminus or C-terminus of the chain |
| KemaNeiTwo | 1–6 | Secondary structure of the left residues with |
| | 7–10 | Solvent accessibility of the left residues with |
| | 11–22 | Physico-chemical properties of the left residues with |
| | 23–24 | Amino acid entropy of the neighboring residues with |
| | 25–26 | Flags marking positions beyond the N-terminus or C-terminus of the chain |
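The unit layout above can be illustrated with a small encoder. This is a minimal sketch, not the authors' implementation: it assumes three secondary-structure classes (H, E, C) and two solvent-accessibility classes (buried `b`, exposed `e`), which is consistent with the five units 1–5, plus a hypothetical six-group physico-chemical partition (`PC_GROUPS`), precomputed per-residue entropies, and zero padding beyond the chain ends. The neighbor `offsets` parameter is also hypothetical, because the exact neighbor positions used by KemaNeiOne and KemaNeiTwo are not recoverable from the table.

```python
from typing import List, Sequence

# Hypothetical encodings; the paper's exact class definitions are not given here.
SS_CLASSES = "HEC"  # helix, strand, coil -> 3 units
SA_CLASSES = "be"   # buried, exposed -> 2 units
PC_GROUPS = ["AVLIMC", "FWY", "STNQ", "KRH", "DE", "GP"]  # 6 hypothetical groups


def one_hot(symbol: str, alphabet: str) -> List[float]:
    """One-hot vector for symbol; all zeros if symbol is not in the alphabet."""
    return [1.0 if symbol == a else 0.0 for a in alphabet]


def pc_units(aa: str) -> List[float]:
    """Six physico-chemical units: 1.0 for the group containing aa, else 0.0."""
    return [1.0 if len(aa) == 1 and aa in group else 0.0 for group in PC_GROUPS]


def ss_sa_units(ss: str, sa: str) -> List[float]:
    """Five units: secondary structure (3) followed by solvent accessibility (2)."""
    return one_hot(ss, SS_CLASSES) + one_hot(sa, SA_CLASSES)


def kema_self_features(seq: str, ss: str, sa: str,
                       entropy: Sequence[float], i: int) -> List[float]:
    """32-unit KemaSelf-style vector for residue i (0-based)."""
    units = ss_sa_units(ss[i], sa[i])      # units 1-5
    units += pc_units(seq[i])              # units 6-11
    for d in (-2, -1, 1, 2):               # units 12-31: residues with 0 < |d| <= 2
        j = i + d
        if 0 <= j < len(seq):
            units += ss_sa_units(ss[j], sa[j])
        else:
            units += [0.0] * 5             # assumption: zero padding outside the chain
    units.append(entropy[i])               # unit 32
    return units


def kema_pair_features(seq: str, ss: str, sa: str, entropy: Sequence[float],
                       i: int, offsets: Sequence[int] = (-1, 1)) -> List[float]:
    """26-unit neighbor vector following the KemaNeiOne/KemaNeiTwo unit layout.

    `offsets` is hypothetical; unit numbers in the comments assume two offsets.
    """
    n = len(seq)
    nbrs = []
    for d in offsets:
        j = i + d
        if 0 <= j < n:
            nbrs.append((ss[j], sa[j], seq[j], entropy[j]))
        else:
            nbrs.append(("-", "-", "-", 0.0))  # assumption: padding outside the chain
    units: List[float] = []
    for s, _, _, _ in nbrs:
        units += one_hot(s, SS_CLASSES)        # units 1-6
    for _, a, _, _ in nbrs:
        units += one_hot(a, SA_CLASSES)        # units 7-10
    for _, _, aa, _ in nbrs:
        units += pc_units(aa)                  # units 11-22
    units += [e for _, _, _, e in nbrs]        # units 23-24
    units.append(1.0 if any(i + d < 0 for d in offsets) else 0.0)   # unit 25: past N-terminus
    units.append(1.0 if any(i + d >= n for d in offsets) else 0.0)  # unit 26: past C-terminus
    return units
```

Under these assumptions the KemaSelf vector has 5 + 6 + 4 × 5 + 1 = 32 units and a two-neighbor vector has 3 × 2 + 2 × 2 + 6 × 2 + 2 + 2 = 26 units, matching the unit positions listed in the table.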
Figure 2 The architecture of KemaDom for domain prediction.
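KemaDom ensembles three kernel machines (KemaSelf, KemaNeiOne and KemaNeiTwo), but how their outputs are combined is not specified in this section. The sketch below assumes the simplest option: each trained sub-model is treated as a black-box function returning a per-residue boundary score, the three scores are averaged, and each contiguous run of above-threshold residues is counted as one domain boundary. The `Scorer` type, the `threshold` parameter and the boundary-counting rule are all hypothetical illustrations, not the published method.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a trained sub-model maps a residue index to a score.
Scorer = Callable[[int], float]


def ensemble_scores(scorers: Sequence[Scorer], length: int) -> List[float]:
    """Average the decision values of all sub-models at every residue."""
    return [sum(s(i) for s in scorers) / len(scorers) for i in range(length)]


def count_domains(scores: Sequence[float], threshold: float = 0.0) -> int:
    """Hypothetical post-processing: every contiguous run of residues whose
    averaged score exceeds `threshold` is treated as one domain boundary,
    so the number of domains is (number of runs) + 1."""
    boundaries = 0
    in_run = False
    for s in scores:
        above = s > threshold
        if above and not in_run:
            boundaries += 1
        in_run = above
    return boundaries + 1
```

For instance, three toy scorers that agree on a boundary around residues 4–5 of a 10-residue chain yield one above-threshold run and therefore two predicted domains. Averaging is only one of several reasonable ensembling rules; majority voting or a learned weighting would fit the same interface.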
Performance of KemaDom and its sub-models (SN, sensitivity; SP, specificity; 1D and 2D, one-domain and two-domain proteins)
| Model/Sub-model | 1D SN | 1D SP | 2D SN | 2D SP | Accuracy |
|---|---|---|---|---|---|
| KemaDom | 0.88 | 0.83 | 0.41 | 0.57 | 0.76 |
| KemaSelf | 0.89 | 0.81 | 0.36 | 0.55 | 0.74 |
| KemaNeiOne | 0.90 | 0.79 | 0.23 | 0.44 | 0.71 |
| KemaNeiTwo | 0.90 | 0.79 | 0.22 | 0.42 | 0.71 |
| Baseline | 0.74 | 0.72 | 0.26 | 0.23 | 0.60 |
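Values such as those in the table can be computed from true and predicted domain counts. This sketch assumes definitions commonly used for domain-number prediction but not stated in this section: SN = TP/(TP+FN), SP = TP/(TP+FP), overall accuracy = fraction of proteins whose domain-number class is predicted correctly, and all proteins with three or more domains grouped into a single class. The function name `domain_count_metrics` is hypothetical.

```python
from typing import Dict, Sequence


def _domain_class(n: int) -> int:
    """Collapse 3, 4, ... domains into a single '3+' class (assumption)."""
    return min(n, 3)


def domain_count_metrics(true_counts: Sequence[int],
                         pred_counts: Sequence[int]) -> Dict[str, float]:
    """Per-class SN/SP for one- and two-domain proteins plus overall accuracy."""
    t = [_domain_class(n) for n in true_counts]
    p = [_domain_class(n) for n in pred_counts]
    out: Dict[str, float] = {}
    for k in (1, 2):
        tp = sum(1 for a, b in zip(t, p) if a == k and b == k)
        actual = sum(1 for a in t if a == k)      # TP + FN
        predicted = sum(1 for b in p if b == k)   # TP + FP
        out[f"{k}D SN"] = tp / actual if actual else float("nan")
        out[f"{k}D SP"] = tp / predicted if predicted else float("nan")
    out["Accuracy"] = sum(1 for a, b in zip(t, p) if a == b) / len(t)
    return out
```

For example, with true counts [1, 1, 1, 2, 2, 3] and predictions [1, 1, 2, 2, 1, 3], this gives 1D SN = 1D SP = 2/3, 2D SN = 2D SP = 0.5 and an accuracy of 4/6.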
Performance of ab initio predictors^a
| Predictor name | 1D SN | 1D SP | 2D SN | 2D SP | Accuracy | Dataset |
|---|---|---|---|---|---|---|
| KemaDom | 0.88 | 0.83 | 0.41 | 0.57 | 0.76 | (10) |
| DOMpro | 0.76 | 0.85 | 0.59 | 0.38 | 0.69 | (10) |
| CHOPnet^b | 0.42–0.73 | N/A | 0.40–0.59 | N/A | 0.69 | (9) |
| KemaDom | 0.95 | 0.77 | 0.24 | 0.57 | 0.74 | CAFASP 4 |
| DOMpro | 0.85 | 0.76 | 0.35 | 0.50 | 0.70 | CAFASP 4 |
| Biozon | 0.10 | 1.00 | 0.35 | 0.19 | 0.17 | CAFASP 4 |
| Globplot | 0.83 | 0.71 | 0.18 | 0.60 | 0.64 | CAFASP 4 |
| Dompred-DPS | 0.68 | 0.78 | 0.47 | 0.50 | 0.62 | CAFASP 4 |
| Mateo | 0.51 | 0.78 | 0.12 | 0.15 | 0.40 | CAFASP 4 |
^a Values taken from previous publications and from the CAFASP 4 website.
^b CHOPnet was tested on multiple datasets with cross-validation of its networks; specificity (SP) values were not reported in that paper and are marked N/A in this table.
Figure 3 The interface of KemaDom. An email address and a custom job name are required. The target sequence must be supplied in a format that BioPerl (Bio::SeqIO) can recognize.
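Bio::SeqIO accepts several sequence formats, of which FASTA is the most common. As a hypothetical pre-submission check (not part of the KemaDom server), a minimal FASTA reader can confirm that the input starts with a `>` header line and parses into header/sequence records. The function `parse_fasta` is an illustrative assumption; it does not reproduce BioPerl's own parser.

```python
from typing import List, Optional, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA-formatted text."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                       # ignore blank lines
        if line.startswith(">"):
            if header is not None:         # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data before the first '>' header line")
        else:
            chunks.append(line.replace(" ", "").upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

A single-record input such as `>query1` followed by the residue lines returns one pair, with the wrapped sequence lines joined and upper-cased.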