Svetlana Kirillova, Suresh Kumar, Oliviero Carugo.
Abstract
Structural biology is an important field of application for computational tools that predict domain boundaries. Such tools can be used to design protein constructs that must be expressed in a stable and functional form and must produce diffraction-quality crystals. However, predicting protein domain boundaries from amino acid sequences remains very problematic. In the present study, the performance of several computational approaches is compared. The statistical significance of most of the predictions is found to be rather poor. Nevertheless, when the number of domains is correctly predicted, domain boundaries are predicted within very few residues of their real location. It can be concluded that prediction methods cannot yet be used as routine tools in structural biology, though some of them are rather promising.
Year: 2009 PMID: 19401756 PMCID: PMC2669640 DOI: 10.2174/1874091X00903010001
Source DB: PubMed Journal: Open Biochem J ISSN: 1874-091X
Bioinformatics Tools Examined in CASP7 (Names were Taken from CASP)
| Tools | URL | Reference |
|---|---|---|
| baker | | |
| chop | | |
| chophomo | | |
| distill | | |
| domfold | - | - |
| domssea | | |
| dps | | |
| foldpro | | |
| hhpred1 | | |
| hhpred3 | | |
| maopus | - | - |
| metadp | | |
| NNput | - | - |
| Robetta | | |

- No information provided by authors.
Matthews Correlation Coefficient (mcc) at Various Threshold Values (t)
| t | mcc |
|---|---|
| 70 | 0.063 |
| 80 | 0.111 |
| 90 | 0.173 |
| 100 | 0.233 |
| 110 | 0.276 |
| 120 | 0.307 |
| 130 | 0.367 |
| 140 | 0.397 |
| 150 | 0.469 |
| 160 | 0.535 |
| 170 | 0.582 |
| 180 | 0.586 |
| 190 | 0.614 |
| 200 | 0.628 |
| 210 | 0.544 |
| 220 | 0.559 |
| 230 | 0.510 |
| 240 | 0.445 |
| 250 | 0.462 |
| 260 | 0.346 |
| 270 | 0.330 |
A protein is predicted to contain a single domain if it has fewer than t residues, and more than one domain if it has more than t residues. Data are taken from the proteins examined in the CASP7 experiment.
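The length-threshold rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `mcc` and `threshold_mcc` and the toy protein list are hypothetical, the data are not the CASP7 set, and a protein with exactly t residues is assumed here to count as single-domain, since the text does not cover that case.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when one row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def threshold_mcc(proteins, t):
    """Score the rule 'more than t residues => multi-domain'.

    proteins: iterable of (length, is_multi_domain) pairs.
    Assumption: a protein with exactly t residues is called single-domain.
    """
    tp = tn = fp = fn = 0
    for length, is_multi in proteins:
        predicted_multi = length > t
        if predicted_multi and is_multi:
            tp += 1
        elif predicted_multi and not is_multi:
            fp += 1
        elif not predicted_multi and is_multi:
            fn += 1
        else:
            tn += 1
    return mcc(tp, tn, fp, fn)


# Hypothetical toy data (length, is_multi_domain); NOT the CASP7 proteins.
toy = [
    (85, False), (120, False), (150, False), (190, True),
    (230, False), (260, True), (310, True), (420, True),
]

if __name__ == "__main__":
    for t in (100, 200, 300):
        print(t, round(threshold_mcc(toy, t), 3))
```

Scanning t over a range, as in the table, simply repeats `threshold_mcc` for each value; the threshold with the highest mcc gives the best length-only baseline against which sequence-based predictors can be compared.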
Matthews's Correlation Coefficients (mcc) Associated with the Prediction of Multi-Domain Proteins by Various Methods Used in the CASP7 Experiment
| Method | mcc |
|---|---|
| baker | 0.722 |
| chop | 0.178 |
| chophomo | 0.230 |
| distill | 0.260 |
| domfold | 0.262 |
| domssea | 0.410 |
| dps | 0.277 |
| foldpro | 0.840 |
| hhpred1 | 0.304 |
| hhpred3 | 0.272 |
| maopus | 0.696 |
| metadp | 0.189 |
| NNput | 0.097 |
| robetta | 0.734 |
Average Values of the Indices J, R, and FM and of the Probabilities pJ, pR, and pFM that a Value Higher than the One Observed Might Be Obtained by Chance. Standard Deviations of the Mean Are Reported in Parentheses
| Method | J | R | FM | pJ | pR | pFM |
|---|---|---|---|---|---|---|
| baker | 0.80(0.02) | 0.82(0.02) | 0.88(0.01) | 39(4) | 35(4) | 37(4) |
| chop | 0.66(0.03) | 0.70(0.03) | 0.79(0.02) | 66(5) | 63(5) | 63(5) |
| chophomo | 0.66(0.03) | 0.69(0.03) | 0.79(0.02) | 67(5) | 65(5) | 64(5) |
| distill | 0.70(0.02) | 0.73(0.02) | 0.82(0.01) | 58(4) | 56(4) | 55(4) |
| domfold | 0.76(0.02) | 0.77(0.02) | 0.86(0.01) | 49(5) | 48(5) | 46(5) |
| domssea | 0.76(0.03) | 0.78(0.02) | 0.86(0.02) | 50(5) | 48(5) | 48(5) |
| dps | 0.74(0.03) | 0.77(0.02) | 0.84(0.02) | 55(5) | 52(5) | 52(5) |
| foldpro | 0.82(0.02) | 0.84(0.02) | 0.90(0.01) | 34(4) | 32(4) | 31(4) |
| hhpred1 | 0.77(0.02) | 0.78(0.02) | 0.86(0.01) | 46(4) | 45(4) | 42(4) |
| hhpred3 | 0.76(0.02) | 0.78(0.02) | 0.86(0.01) | 46(4) | 45(4) | 43(4) |
| maopus | 0.80(0.02) | 0.83(0.02) | 0.88(0.01) | 42(5) | 36(5) | 39(5) |
| metadp | 0.76(0.03) | 0.77(0.03) | 0.86(0.02) | 49(5) | 48(5) | 46(5) |
| NNput | 0.71(0.02) | 0.73(0.02) | 0.83(0.01) | 56(4) | 55(4) | 53(4) |
| robetta | 0.79(0.02) | 0.81(0.02) | 0.87(0.01) | 40(4) | 36(4) | 37(4) |
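The indices J, R, and FM are not defined in this excerpt. Assuming they denote the Jaccard, Rand, and Fowlkes-Mallows indices, which compare two partitions by counting residue pairs, the following sketch shows how they might be computed from per-residue domain labels. The function names and the toy labels are hypothetical, and the chance probabilities pJ, pR, and pFM are not reproduced here.

```python
import math
from itertools import combinations


def pair_counts(real, predicted):
    """Count residue pairs by whether they share a domain in each assignment.

    Returns (a, b, c, d):
      a: same domain in both assignments
      b: same domain in the real assignment only
      c: same domain in the predicted assignment only
      d: different domains in both assignments
    """
    if len(real) != len(predicted):
        raise ValueError("assignments must cover the same residues")
    a = b = c = d = 0
    for i, j in combinations(range(len(real)), 2):
        same_real = real[i] == real[j]
        same_pred = predicted[i] == predicted[j]
        if same_real and same_pred:
            a += 1
        elif same_real:
            b += 1
        elif same_pred:
            c += 1
        else:
            d += 1
    return a, b, c, d


def partition_indices(real, predicted):
    """Return (Jaccard, Rand, Fowlkes-Mallows) under the assumption above."""
    a, b, c, d = pair_counts(real, predicted)
    total = a + b + c + d
    if total == 0:
        # Fewer than two residues: the partitions agree trivially.
        return 1.0, 1.0, 1.0
    jaccard = a / (a + b + c) if (a + b + c) else 1.0
    rand = (a + d) / total
    fm = a / math.sqrt((a + b) * (a + c)) if (a + b) and (a + c) else 0.0
    return jaccard, rand, fm


# Hypothetical six-residue example: the predicted boundary is one residue early.
real_labels = [1, 1, 1, 2, 2, 2]
pred_labels = [1, 1, 2, 2, 2, 2]

if __name__ == "__main__":
    print(partition_indices(real_labels, pred_labels))
```

All three indices equal 1 for identical assignments and decrease as the predicted domains diverge from the real ones, which is consistent with the ordering of methods in the table.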
Accuracy with which the Domain Boundaries are Identified by Various Prediction Methods
| Method | Pc_c | Delta_b | Delta_e |
|---|---|---|---|
| baker | 56.2 | -1.2(0.3) | 2.2(0.5) |
| chop | 26.1 | -2.9(1.0) | 1.9(0.7) |
| chophomo | 25.0 | -2.6(1.0) | 2.9(1.0) |
| distill | 33.6 | -1.5(0.6) | 3.2(0.8) |
| domfold | 38.0 | -1.9(0.6) | 2.9(0.7) |
| domssea | 42.9 | -1.7(0.6) | 2.5(0.7) |
| dps | 38.7 | -2.2(0.8) | 1.6(0.9) |
| foldpro | 62.8 | -1.3(0.4) | 2.0(0.4) |
| hhpred1 | 43.3 | -2.1(0.5) | 2.6(0.5) |
| hhpred3 | 43.4 | -2.1(0.5) | 2.7(0.5) |
| maopus | 54.2 | -1.4(0.6) | 3.0(0.8) |
| metadp | 39.8 | -1.3(0.7) | 3.3(0.7) |
| NNput | 30.8 | -1.9(0.7) | 2.4(0.8) |
| robetta | 57.9 | -1.0(0.3) | 1.5(0.5) |
The following data are shown: the percentage of domains that are correctly predicted, Pc_c (see text for details); the average deviation between the real and the predicted beginning of the domain, Delta_b; and the average deviation between the real and the predicted end of the domain, Delta_e. Standard deviations of the mean are given in parentheses.
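As an illustration of how average boundary deviations and their standard deviations of the mean could be obtained, the sketch below works on a hypothetical list of real and predicted domain ranges. The sign convention (predicted minus real) is an assumption, since the excerpt does not state it, and the function names and numbers are illustrative rather than taken from the study.

```python
import math
import statistics


def mean_and_sem(values):
    """Return the mean and the standard deviation of the mean (s / sqrt(n))."""
    mean = statistics.fmean(values)
    if len(values) < 2:
        return mean, float("nan")  # spread is undefined for a single value
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


def boundary_deviations(domains):
    """Summarise signed deviations at domain beginnings and ends.

    domains: list of ((real_begin, real_end), (pred_begin, pred_end)).
    Assumption: deviation = predicted - real, in residues.
    """
    delta_b = [pred[0] - real[0] for real, pred in domains]
    delta_e = [pred[1] - real[1] for real, pred in domains]
    return mean_and_sem(delta_b), mean_and_sem(delta_e)


# Hypothetical domain ranges; NOT data from the CASP7 experiment.
toy_domains = [
    ((10, 100), (8, 103)),
    ((120, 200), (119, 201)),
    ((5, 80), (5, 84)),
    ((50, 150), (46, 150)),
]

if __name__ == "__main__":
    (mean_b, sem_b), (mean_e, sem_e) = boundary_deviations(toy_domains)
    print(f"Delta_b = {mean_b:.1f}({sem_b:.1f})")
    print(f"Delta_e = {mean_e:.1f}({sem_e:.1f})")
```

Under this convention, a negative Delta_b paired with a positive Delta_e would mean that predicted domains tend to start slightly before and end slightly after the real ones.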