Jayashree Ramana, Dinesh Gupta.
Abstract
Progression through the cell cycle involves the coordinated activities of a suite of cyclin/cyclin-dependent kinase (CDK) complexes. The activities of these complexes are regulated by CDK inhibitors (CDKIs). Apart from their role as cell cycle regulators, CDKIs are involved in apoptosis, transcriptional regulation, cell fate determination, cell migration and cytoskeletal dynamics. Because the complexes perform crucial and diverse functions, they are important drug targets for tumour and stem cell therapeutic interventions. However, CDKIs are represented by proteins with considerable sequence heterogeneity and may fail to be identified by simple similarity search methods. In this work we developed and evaluated machine learning methods for the identification of CDKIs. We used different compositional features, and evolutionary information in the form of position-specific scoring matrices (PSSMs), from CDKIs and non-CDKIs to generate support vector machine (SVM) and artificial neural network (ANN) classifiers. In the first stage, both the ANN and SVM models were evaluated using leave-one-out cross-validation (LOO CV); in the second stage, they were tested on independent data sets. The PSSM-based SVM model emerged as the best classifier in both stages and is publicly available through a user-friendly web interface at http://bioinfo.icgeb.res.in/cdkipred.
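As an illustration of what "compositional features" can look like, the sketch below computes two fixed-length vectors from a protein sequence. It assumes that the AAC and DPC labels used in the tables stand for amino acid composition (20 values) and dipeptide composition (400 values), which is the usual reading but is not spelled out in this record. The function names `aac` and `dpc` are hypothetical and are not taken from the authors' code.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aac(seq: str) -> list[float]:
    """Fraction of each standard amino acid in the sequence (20 values).

    Assumption: non-standard symbols (X, B, Z, ...) are simply ignored.
    """
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def dpc(seq: str) -> list[float]:
    """Fraction of each ordered pair of adjacent residues (400 values).

    Assumption: pairs containing a non-standard symbol are skipped.
    """
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    return [pairs.count(d) / len(pairs) for d in DIPEPTIDES]


if __name__ == "__main__":
    toy = "ACAC"  # toy sequence, not a real CDKI
    print(len(aac(toy)), len(dpc(toy)))
```

Vectors of this kind have the same length for every protein, which is what allows sequences of different lengths to be fed to an SVM or ANN.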
Year: 2010 PMID: 20967128 PMCID: PMC2954193 DOI: 10.1371/journal.pone.0013357
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Performance of different SVM classifiers in LOO CV.
| Feature | | | Th | SN (%) | SP (%) | Accuracy (%) | MCC | PPV |
|---|---|---|---|---|---|---|---|---|
| AAC | 1 | 0.01 | −0.7 | 87.50 | 90.33 | 89.88 | 0.68 | 0.62 |
| SAAC | 5 | 0.001 | −0.6 | 83.92 | 83.33 | 83.42 | 0.55 | 0.48 |
| DPC | 0 | 0.01 | −0.7 | 85.71 | 86.33 | 86.23 | 0.60 | 0.53 |
| 2-gram | 0 | 0.01 | −0.8 | 82.14 | 79.33 | 79.77 | 0.48 | 0.42 |
| PSSM | 5 | 5.00 | −0.6 | 87.50 | 91.00 | 90.44 | 0.69 | 0.64 |
Th: threshold; SN: sensitivity; SP: specificity; MCC: Matthews correlation coefficient; PPV: positive predictive value.
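The measures abbreviated above can all be derived from the four cells of a confusion matrix. The following is a minimal sketch using the standard definitions. The counts in the example are an assumption: they are not reported in this record, but are one set of counts that reproduces the percentages in the AAC row to within the last digit. The function name `binary_metrics` is hypothetical.

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Standard binary classification measures from confusion-matrix counts."""
    sn = tp / (tp + fn)                      # sensitivity (true positive rate)
    sp = tn / (tn + fp)                      # specificity (true negative rate)
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    ppv = tp / (tp + fp) if (tp + fp) else 0.0  # positive predictive value
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation
    return {"SN": 100 * sn, "SP": 100 * sp, "ACC": 100 * acc, "MCC": mcc, "PPV": ppv}


# Assumed counts (not from the source): 56 positives and 300 negatives.
m = binary_metrics(tp=49, fn=7, tn=271, fp=29)
print({k: round(v, 3) for k, v in m.items()})
```

MCC is the more informative single number here because the classes are imbalanced: a classifier can reach high accuracy by favouring the larger negative class, while MCC penalizes that.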
Performance of best ANN classifiers in LOO CV.
| Feature | SN (%) | SP (%) | Accuracy (%) | MCC | PPV |
|---|---|---|---|---|---|
| AAC | 67.85 | 88.33 | 85.11 | 0.50 | 0.27 |
| SAAC | 64.28 | 89.00 | 85.11 | 0.49 | 0.52 |
| DPC | 82.14 | 92.00 | 90.44 | 0.67 | 0.65 |
| 2-gram | 69.64 | 89.66 | 86.51 | 0.54 | 0.27 |
| PSSM | 67.85 | 91.66 | 87.92 | 0.56 | 0.60 |
SN: sensitivity; SP: specificity; MCC: Matthews correlation coefficient; PPV: positive predictive value.
Performance on benchmark datasets.
| Classifier | Positive set (48) | Negative set (308) | PPV |
|---|---|---|---|
| SVM-AAC | 38 (79.16) | 274 (88.96) | 0.52 |
| SVM-DPC | 45 (93.75) | 285 (92.53) | 0.66 |
| SVM-PSSM | 47 (97.91) | 294 (95.45) | 0.77 |
| ANN-DPC | 42 (87.50) | 277 (89.93) | 0.57 |
PPV: positive predictive value. The numbers are the correctly predicted sequences out of the totals shown in the first row (48 for the positive set and 308 for the negative set). The sensitivity and specificity percentages are given in brackets in the second and third columns, respectively.
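The bracketed percentages and the PPV column follow directly from the counts in the benchmark table and the two set sizes. The sketch below recomputes them; the counts and set sizes are taken from the table and its note, while the helper name `summarize` is hypothetical.

```python
POSITIVES, NEGATIVES = 48, 308  # set sizes stated in the table note

# classifier -> (correctly predicted positives, correctly predicted negatives)
benchmark = {
    "SVM-AAC": (38, 274),
    "SVM-DPC": (45, 285),
    "SVM-PSSM": (47, 294),
    "ANN-DPC": (42, 277),
}


def summarize(tp: int, tn: int) -> dict[str, float]:
    """Sensitivity and specificity (in percent) and PPV for one classifier."""
    fp = NEGATIVES - tn  # negatives wrongly called CDKIs
    return {
        "SN": 100 * tp / POSITIVES,
        "SP": 100 * tn / NEGATIVES,
        "PPV": tp / (tp + fp),
    }


results = {name: summarize(tp, tn) for name, (tp, tn) in benchmark.items()}
for name, r in results.items():
    print(f"{name}: SN={r['SN']:.2f}% SP={r['SP']:.2f}% PPV={r['PPV']:.2f}")
```

The recomputed values agree with the table up to the last digit; small differences there are consistent with the published figures having been truncated rather than rounded.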
Figure 1. ROC plot of the PSSM-based SVM model.
The ROC curve depicts the trade-off between true positives and false positives. The green line is the reference line, and the blue curve is the ROC curve for the PSSM-based SVM model, which has an area under the curve (AUC) of 0.933.
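To make the AUC figure concrete, here is a small sketch of how an ROC curve and its area can be obtained from classifier scores. The scores are toy values, not the paper's data, and the function names are hypothetical. The AUC is computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative, which is equivalent to the area under the ROC curve.

```python
def roc_points(pos_scores: list[float], neg_scores: list[float]) -> list[tuple[float, float]]:
    """(false positive rate, true positive rate) pairs for every score threshold."""
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing called positive
    for t in thresholds:
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    return points


def roc_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Toy decision scores, not taken from the paper.
    positives = [0.9, 0.4, 0.7]
    negatives = [0.5, -0.3, -0.8]
    print(roc_points(positives, negatives))
    print(roc_auc(positives, negatives))
```

A classifier that scores at random would give an AUC near 0.5, which is what a diagonal reference line on an ROC plot usually represents.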
Figure 2. Snapshot of web server sample output.
The web server predicts CDKIs using the best classifier, i.e. the PSSM-based SVM model. The server accepts FASTA-formatted sequences and allows user-defined prediction thresholds ranging from −1.0 to 1.0.
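The input format and the threshold setting can be illustrated with a short sketch. It does not reproduce the server's code. It assumes that a sequence is called a CDKI when its SVM decision score is at or above the chosen threshold, and the default threshold of 0.0, the function names and the example scores are all hypothetical.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {identifier: sequence}."""
    records: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""  # first word of the header
            records[current] = ""
        elif current is not None:
            records[current] += line  # sequences may span several lines
    return records


def call_cdki(score: float, threshold: float = 0.0) -> bool:
    """Apply a user-defined threshold in [-1.0, 1.0] to a decision score."""
    if not -1.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie between -1.0 and 1.0")
    return score >= threshold


if __name__ == "__main__":
    fasta = ">seq1 toy example\nMSEPAGDV\nRQNPCG\n>seq2\nMKTAYIAK\n"
    sequences = parse_fasta(fasta)
    scores = {"seq1": 0.35, "seq2": -0.72}  # hypothetical decision scores
    for name in sequences:
        print(name, len(sequences[name]), call_cdki(scores[name], threshold=-0.6))
```

Lowering the threshold makes the call more permissive, so more true CDKIs are recovered at the cost of more false positives; raising it has the opposite effect.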