Jiangning Song, Kevin Burrage, Zheng Yuan, Thomas Huber.
Abstract
BACKGROUND: The majority of peptide bonds in proteins occur in the trans conformation. For proline residues, however, a considerable fraction of prolyl peptide bonds adopt the cis form. Proline cis/trans isomerization is known to play a critical role in protein folding, splicing, cell signaling and transmembrane active transport. Accurate prediction of proline cis/trans isomerization in proteins would have many important applications towards the understanding of protein structure and function.
Year: 2006 PMID: 16526956 PMCID: PMC1450308 DOI: 10.1186/1471-2105-7-124
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Distribution of the Xaa-Pro cis peptide bonds. Protein chains are grouped according to the number of Xaa-Pro cis peptide bonds.
Figure 2. Distribution of the Xaa-Pro trans peptide bonds. Protein chains are grouped according to the number of Xaa-Pro trans peptide bonds.
Prediction accuracy comparison with different kernel functions and parameters. The results were obtained by 5-fold cross-validation.
| SVM model | Kernel function | Parameters | Accuracy (%) |
| 1 | Polynomial | α = 1, β = 1, d = 2 | 59.0 |
| 2 | Polynomial | α = 1, β = 1, d = 5 | 60.5 |
| 3 | RBF | γ = 0.01, C = 2.0 | 62.8 |
| 4 | RBF | γ = 0.06, C = 2.0 | 60.5 |
| 5 | RBF | γ = 0.2, C = 1.0 | 62.6 |
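The two kernel families compared in the table above can be sketched as follows. This is a minimal illustration, assuming the common forms K(x, y) = (α·x·y + β)^d for the polynomial kernel and K(x, y) = exp(-γ·||x - y||²) for the RBF kernel; the record itself does not spell out the formulas, and the function names and example vectors are hypothetical. The parameter C in the table is the SVM regularization constant and is not part of the kernel.

```python
import math

def polynomial_kernel(x, y, alpha=1.0, beta=1.0, d=2):
    """Assumed polynomial form: (alpha * <x, y> + beta) ** d."""
    dot = sum(a * b for a, b in zip(x, y))
    return (alpha * dot + beta) ** d

def rbf_kernel(x, y, gamma=0.01):
    """Standard RBF form: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Hypothetical feature vectors, for illustration only.
x = [1.0, 0.0, 1.0]
y = [1.0, 1.0, 0.0]

print(polynomial_kernel(x, y, alpha=1.0, beta=1.0, d=2))  # (1 * 1 + 1) ** 2 = 4.0
print(rbf_kernel(x, y, gamma=0.2))                        # exp(-0.2 * 2)
```

A larger γ makes the RBF kernel decay faster with distance, so the decision boundary becomes more local.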
Figure 3. ROC curves of five different SVM models. A ROC curve is a graphical representation of the relationship between the true-positive and false-positive prediction rates of an SVM model. It is obtained by plotting 1-Specificity (false-positive rate) on the X axis against Sensitivity (true-positive rate) on the Y axis. The area under the ROC curve is an important index for evaluating classification performance: the highest and leftmost ROC curve in the plot represents the best SVM model.
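The ROC construction described in this caption can be sketched in a few lines. The example below is an illustrative sketch, not the authors' code: it sweeps a threshold over classifier scores, records (false-positive rate, true-positive rate) points, and integrates the area with the trapezoidal rule. The scores and labels are made up for the example.

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) points from sweeping a threshold over the scores.

    labels: 1 for the positive class, 0 for the negative class.
    Samples with tied scores are added to the curve together.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Made-up decision values and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(round(auc(roc_points(scores, labels)), 4))  # 0.8889
```

An area of 1.0 corresponds to a perfect ranking of positives above negatives, while 0.5 corresponds to random ranking.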
Predictive performance of SVM based on single sequence inputs of different local window sizes. More details for prediction accuracy measurement are given in the Methods section. The results were obtained by 5-fold cross-validation.
| Window size | Prediction accuracy (%) | MCC | | |
| 3 | 61.2 | 0.22 | 64.4 | 57.9 |
| 5 | 62.5 | 0.25 | 63.3 | 61.6 |
| 7 | 61.8 | 0.24 | 61.4 | 62.3 |
| 9 | 62.1 | 0.24 | 61.1 | 63.2 |
| 11 | 62.8 | 0.26 | 56.6 | 68.7 |
| 13 | 61.7 | 0.23 | 59.2 | 63.8 |
| 15 | 61.6 | 0.23 | 55.4 | 67.6 |
| 17 | 61.0 | 0.22 | 56.3 | 65.6 |
| 19 | 59.8 | 0.19 | 55.4 | 63.9 |
Figure 4. The prediction accuracy. The local window size is defined as the number of residues in the local sequence window centered on proline.
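To make the notion of a local window centered on proline concrete, here is a small sketch. It assumes a 20-dimensional one-hot vector per residue and '-' padding at the chain termini; both choices, as well as the function names and the example sequence, are assumptions for illustration and may differ from the encoding used in the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def proline_windows(sequence, window_size):
    """Yield (index, window) for every proline in the sequence.

    The window has window_size residues with the proline in the middle;
    positions beyond the chain termini are padded with '-'.
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd")
    half = window_size // 2
    padded = "-" * half + sequence + "-" * half
    for i, residue in enumerate(sequence):
        if residue == "P":
            yield i, padded[i:i + window_size]

def one_hot(window):
    """Encode each residue as 20 binary values; padding becomes all zeros."""
    vector = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector

# Hypothetical sequence with two prolines.
windows = list(proline_windows("MKAPLGPW", window_size=5))
print(windows)                  # [(3, 'KAPLG'), (6, 'LGPW-')]
print(len(one_hot("KAPLG")))    # 100
```

With this scheme a window of 11 residues yields a 220-dimensional input vector per proline.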
Predictive performance of SVM based on amino acid compositions of different local window sizes. More details for prediction accuracy measurement are given in the Methods section. The results were obtained by 5-fold cross-validation.
| Window size | Prediction accuracy (%) | MCC | | |
| 9 | 59.9 | 0.20 | 62.1 | 57.9 |
| 11 | 60.6 | 0.21 | 60.3 | 60.9 |
| 15 | 61.6 | 0.23 | 59.8 | 63.2 |
| 21 | 60.4 | 0.21 | 50.4 | 69.9 |
| 25 | 59.5 | 0.19 | 56.0 | 62.7 |
| Full length | 59.3 | 0.18 | 72.6 | 44.5 |
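The amino acid composition encoding summarized in the table above replaces the ordered window with 20 residue frequencies. The sketch below shows one plausible way to compute it; the normalization by the number of standard residues in the window, the function name, and the example window are assumptions made for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(window):
    """Return the 20 amino acid frequencies of a window (or full sequence).

    Non-standard characters such as padding are ignored.
    """
    counts = Counter(r for r in window if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]

# Hypothetical window centered on proline.
freqs = composition("AAPGA")
print(freqs[AMINO_ACIDS.index("A")])  # 0.6
print(freqs[AMINO_ACIDS.index("P")])  # 0.2
```

Unlike the one-hot window encoding, this representation has a fixed length of 20 regardless of window size, at the cost of discarding residue order.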
Comparison of predictive performance of SVM based on different encoding input information. More details for prediction accuracy measurement are given in the Methods section. The results were obtained by 5-fold cross-validation.
| Methods | Prediction accuracy (%) | MCC | | |
| LSᵃ | 62.8 | 0.26 | 56.6 | 68.7 |
| AAᵇ | 61.6 | 0.23 | 59.8 | 63.2 |
| MSᶜ | 69.8 | 0.40 | 70.5 | 68.7 |
| SSᵈ | 63.6 | 0.27 | 57.8 | 69.3 |
| MS+SSᵉ | 71.5 | 0.43 | 70.7 | 72.2 |
ᵃ LS: prediction performance for the local sequence encoding scheme;
ᵇ AA: prediction performance for the amino acid composition encoding scheme of local sequence;
ᶜ MS: prediction performance for the multiple sequence alignment encoding scheme in the form of PSI-BLAST profile;
ᵈ SS: prediction performance for the predicted secondary structure encoding scheme by PSIPRED;
ᵉ MS+SS: prediction performance for the multiple sequence alignment plus secondary structure encoding scheme.
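The tables report overall accuracy and the Matthews correlation coefficient (MCC), and the ROC captions refer to sensitivity and specificity. As a reminder of how these quantities relate to a confusion matrix, here is a short sketch using the standard definitions. The counts in the example are hypothetical and are not taken from the paper.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }

# Hypothetical counts for a balanced test set of 200 prolyl bonds.
metrics = binary_metrics(tp=70, tn=72, fp=28, fn=30)
print(metrics["accuracy"])       # 0.71
print(round(metrics["mcc"], 2))  # 0.42
```

MCC ranges from -1 to 1 and stays informative when the two classes are unevenly represented, which is why it is reported alongside raw accuracy.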
Figure 5. ROC curves of five different SVM models. Five SVM models were constructed using five different sequence encoding schemes: single local sequence ("LS"), amino acid compositions of local sequence ("AA"), multiple sequence alignment ("MS"), secondary structure information ("SS"), and multiple sequence alignment with secondary structure ("MS+SS").
Comparison of predictive performance with the Naïve Bayes, logistic regression, IBk and J48 classifiers. More details for prediction accuracy measurement are given in the Methods section. The results were obtained by 5-fold cross-validation.
| Methods | Prediction accuracy (%) | MCC | | |
| SVM | 71.5 | 0.43 | 70.7 | 72.2 |
| Naïve Bayes | 59.1 | 0.18 | 57.0 | 61.1 |
| Logistic regression | 58.7 | 0.17 | 56.6 | 60.8 |
| IBk (k-nearest neighbours) | 52.9 | 0.06 | 44.9 | 60.5 |
| J48 (decision trees) | 54.2 | 0.09 | 53.6 | 54.7 |
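All results above were obtained by 5-fold cross-validation. The following sketch shows the general idea of splitting sample indices into k folds so that each sample is used for testing exactly once. It is a generic illustration; the shuffling, the seed, and the function name are assumptions rather than details of the authors' protocol.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

for train, test in k_fold_indices(10, k=5):
    print(len(train), len(test))  # prints "8 2" for each of the five folds
```

The reported accuracy is then typically the average of the per-fold test accuracies, with the model retrained on the training part of each fold.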
Comparison of predictive performance with other methods.
| Methods | Prediction accuracy (%) | MCC | Dataset used | Prediction performance evaluation method |
| Statistical patternᵃ | 72.7 | - | 242 Xaa-Pro bonds | self-consistency |
| COPSᵇ | 63.6 | - | 8584 proteins | 10-fold cross-validation |
| SVM single sequenceᶜ | 69.8 | - | 2193 proteins | independence test |
| SVM single sequenceᵈ | 76.6 | 0.53 | 2193 proteins | Jack-knife |
| SVM single sequenceᵉ | 62.8 | 0.26 | 2424 proteins | 5-fold cross-validation |
| SVM PSI-BLASTᶠ | 69.8 | 0.40 | 2424 proteins | 5-fold cross-validation |
| SVM PSIPREDᵍ | 63.6 | 0.27 | 2424 proteins | 5-fold cross-validation |
| SVM PSI-BLAST and PSIPREDʰ | 71.5 | 0.43 | 2424 proteins | 5-fold cross-validation |
ᵃ Prediction accuracy reported by Frömmel and Preissner [16]. The result cannot be determined from the paper.
ᵇ Prediction accuracy estimated based on the average statistical results of COPS [18].
ᶜ Prediction accuracy using independence test reported by Wang et al. [17].
ᵈ Prediction accuracy using jack-knife test reported by Wang et al. [17].
ᵉ Prediction accuracy of SVM based on single sequence encoding scheme using our dataset.
ᶠ Prediction accuracy of SVM based on PSI-BLAST encoding scheme using our dataset.
ᵍ Prediction accuracy of SVM based on PSIPRED encoding scheme using our dataset.
ʰ Prediction accuracy of SVM based on PSI-BLAST and PSIPRED encoding scheme using our dataset.