Marc Vincent, Andrea Passerini, Matthieu Labbé, Paolo Frasconi.
Abstract
BACKGROUND: Prediction of disulfide bridges from protein sequences is useful for characterizing structural and functional properties of proteins. Several methods based on different machine learning algorithms have been applied to this problem, and public domain prediction services exist. These methods are, however, still potentially subject to significant improvements in both prediction accuracy and overall architectural complexity.
Year: 2008 PMID: 18194539 PMCID: PMC2375136 DOI: 10.1186/1471-2105-9-20
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Statistics of data sets
| Data set | # chains | All | None | Mix |
| PDBselect | 1,589 | 488 | 1,051 | 50 |
| SPX- | 2,547 | 1,650 | 757 | 140 |
Statistics for the PDBselect and the SPX- data sets. The three types of chains are defined as follows. All: all cysteines are intra-chain bonded half-cystines. None: all cysteines are either free, metal bound, or inter-chain bonded. Mix: Both cases are present.
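The three chain types defined in the caption can be made concrete with a short sketch. The per-cysteine state labels (`"intra"`, `"free"`, `"metal"`, `"inter"`) and the function name `chain_type` are illustrative assumptions, not identifiers taken from the paper.

```python
from typing import Sequence

# Hypothetical label for a cysteine that is an intra-chain bonded half-cystine.
# Any other label ("free", "metal", "inter") is treated as non-bonded here.
INTRA = "intra"


def chain_type(cysteine_states: Sequence[str]) -> str:
    """Return 'All', 'None' or 'Mix' for a chain, following the caption."""
    if not cysteine_states:
        raise ValueError("chain has no cysteines")
    bonded = [state == INTRA for state in cysteine_states]
    if all(bonded):
        return "All"   # every cysteine is intra-chain bonded
    if not any(bonded):
        return "None"  # every cysteine is free, metal bound, or inter-chain bonded
    return "Mix"       # both cases are present
```

For example, a chain whose cysteines are labeled `["intra", "intra", "free"]` would be counted in the Mix column.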
Binary classification of chains
| Method | PDBselect Q1 | PDBselect P1 | PDBselect R1 | SPX- Q1 | SPX- P1 | SPX- R1 |
| APTK (PSSM) | 87 | 83 | 77 | 82 | 79 | 67 |
| APTK (profiles) | 86 | 83 | 75 | 82 | 80 | 64 |
| DISULFIND (PSSM) | 86 | 82 | 75 | 81 | 80 | 60 |
| DISULFIND (profiles) | 86 | 82 | 73 | 81 | 80 | 63 |
| D-simple (PSSM) | 85 | 81 | 74 | 82 | 81 | 64 |
| D-simple (profiles) | 86 | 84 | 74 | 82 | 82 | 62 |
| DIpro [6] | - | - | - | 74 | 83 | 56 |
Experimental comparison of various algorithms for binary classification of chains. A chain is a positive example if and only if it has at least one intra-chain disulfide bridge. We report two-state prediction accuracy (Q1), precision (P1) and recall (R1) on both the PDBselect and the SPX- data sets.
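As a reading aid for the accuracy, precision and recall columns, the sketch below computes the three quantities from binary labels. It assumes the standard definitions (accuracy as the fraction of correct predictions, precision as TP/(TP+FP), recall as TP/(TP+FN)) and reports percentages; the function name `binary_metrics` is hypothetical.

```python
from typing import Sequence, Tuple


def binary_metrics(y_true: Sequence[bool],
                   y_pred: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (accuracy, precision, recall) as percentages."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    accuracy = 100.0 * (tp + tn) / len(pairs)
    # Report 0.0 when a ratio is undefined (no predicted or no actual positives).
    precision = 100.0 * tp / (tp + fp) if tp + fp else 0.0
    recall = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```

Here a positive label would mean that the chain has at least one intra-chain disulfide bridge.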
Binary classification of cysteines
| Method | PDBselect Q | PDBselect P | PDBselect R | SPX- Q | SPX- P | SPX- R | PDB4136 Q | PDB4136 P | PDB4136 R |
| APTK (PSSM) | 88.8 | 83.8 | 87.7 | 86.1 | 79.0 | 82.8 | 88.2 | 84.0 | 82.5 |
| APTK (profiles) | 87.7 | 83.1 | 85.1 | 85.3 | 78.6 | 80.5 | 89.7 | 81.0 | 88.5 |
| DISULFIND (PSSM) | 88.3 | 85.0 | 84.3 | 85.3 | 82.6 | 74.1 | 88.0 | 79.1 | 85.5 |
| DISULFIND (profiles) | 88.6 | 87.4 | 82.1 | 86.5 | 83.0 | 77.5 | 89.4 | 81.2 | 87.4 |
| D-simple (PSSM) | 82.2 | 77.0 | 76.4 | 81.3 | 74.5 | 71.5 | 83.0 | 79.5 | 69.3 |
| D-simple (profiles) | 81.5 | 76.0 | 75.3 | 81.1 | 74.3 | 71.1 | 83.0 | 77.1 | 73.4 |
| APTK + DISULFIND | 89.9 | 87.8 | 85.5 | 87.0 | 82.6 | 80.2 | 90.3 | 82.1 | 89.2 |
| multiple SVM + CSS [14] | - | - | - | - | - | - | 90 | 91 | 77 |
Experimental comparison of various algorithms for binary classification of cysteines. Positive examples are disulfide-bonded cysteines. We report two-state prediction accuracy (Q), precision (P) and recall (R) on the PDBselect, SPX- and PDB4136 data sets.
Prediction of bridges and connectivity patterns
| # bridges | 1-NN | | | DISULFIND | | | DISULFIND+1-NN | | | DIpro | | |
| 1 | 65 | 61 | 58 | 66 | 62 | 59 | 68 | 63 | 59 | 71 | 47 | 58 |
| 2 | 59 | 61 | 52 | 53 | 54 | 49 | 68 | 69 | 63 | 59 | 59 | 55 |
| 3 | 70 | 71 | 63 | 46 | 46 | 35 | 73 | 73 | 64 | 59 | 65 | 50 |
| 4 | 58 | 59 | 42 | 24 | 24 | 9 | 59 | 59 | 48 | 44 | 49 | 27 |
| all | 60 | 59 | 52 | 49 | 48 | 41 | 64 | 62 | 55 | 71 | 47 | 48 |
Chains with at least one intra-chain bridge (SPX+ data set): comparison between 1-NN, DISULFIND, DISULFIND+1-NN and DIpro.
Prediction of bridges and connectivity patterns
| # bridges | 1-NN | | DISULFIND | | DIpro | | SOSVM | | CSP | | SVMpattern | |
| 2 | 76 | 76 | 73 | 73 | 74 | 74 | 77 | 77 | 73 | 73 | 74 | 74 |
| 3 | 66 | 55 | 51 | 41 | 61 | 51 | 62 | 52 | 66 | 55 | 69 | 61 |
| 4 | 53 | 38 | 37 | 24 | 44 | 27 | 51 | 36 | 49 | 33 | 40 | 30 |
| 5 | 39 | 18 | 30 | 13 | 41 | 11 | 43 | 13 | 36 | 17 | 31 | 12 |
| 2–5 | 64 | 55 | 49 | 44 | 56 | 49 | 65 | 53 | 62 | 53 | 57 | 55 |
Results obtained assuming knowledge of the bonding state (SP39 data set): comparison between 1-NN, DISULFIND, DIpro, SOSVM, CSP and SVMpattern.
Prediction of bridges and connectivity patterns from scratch
| # bridges | APTK(PSSM)+1-NN | | | DISULFIND | | | DISULFIND+1-NN | | | DIpro | | |
| 1 | 30 | 30 | 27 | 30 | 30 | 30 | 30 | 30 | 30 | - | - | - |
| 2 | 51 | 54 | 47 | 38 | 39 | 36 | 51 | 51 | 49 | - | - | - |
| 3 | 63 | 65 | 58 | 27 | 27 | 15 | 66 | 67 | 61 | - | - | - |
| 4 | 50 | 51 | 40 | 30 | 30 | 10 | 48 | 49 | 37 | - | - | - |
| all | 43 | 44 | 37 | 29 | 29 | 23 | 43 | 44 | 39 | 32 | 48 | - |
Results on the SPX- data set: comparison between APTK (PSSM)+1-NN, DISULFIND, DISULFIND+1-NN and DIpro.
Figure 1. Observed connectivity patterns. Number of observed distinct patterns on the SPX+ data set, grouped by number of cysteines.
Figure 2. Observed frequencies on chains having 6 cysteines. Histogram of the number of occurrences of distinct patterns for chains having 6 cysteines on the SPX+ data set. Patterns are sorted by rank.
Figure 3. Observed frequencies on chains having 8 cysteines. Histogram of the number of occurrences of distinct patterns for chains having 8 cysteines on the SPX+ data set. Patterns are sorted by rank.
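To put the observed pattern counts in context, the sketch below counts and enumerates the connectivity patterns that are combinatorially possible, under the simplifying assumption that every cysteine in the chain takes part in exactly one intra-chain bridge. With 2B cysteines there are (2B)! / (2^B B!) such pairings. The function names are hypothetical and this is not the authors' code.

```python
from math import factorial
from typing import Iterator, List, Tuple


def count_connectivity_patterns(n_cysteines: int) -> int:
    """Number of ways to pair all cysteines into bridges (assumes all are bonded)."""
    if n_cysteines < 0 or n_cysteines % 2:
        raise ValueError("need a non-negative, even number of cysteines")
    b = n_cysteines // 2
    return factorial(2 * b) // (2 ** b * factorial(b))


def enumerate_patterns(cysteines: List[int]) -> Iterator[List[Tuple[int, int]]]:
    """Yield every pairing of the given cysteine positions into bridges."""
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    # Pair the first cysteine with each possible partner, then recurse on the rest.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in enumerate_patterns(remaining):
            yield [(first, partner)] + tail
```

Under this assumption, chains with 6 cysteines admit 15 possible patterns and chains with 8 cysteines admit 105, which gives an upper bound against which the observed histograms can be compared.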