Mirva Piippo, Niina Lietzén, Olli S Nevalainen, Jussi Salmi, Tuula A Nyman.
Abstract
BACKGROUND: Caspases are a family of proteases that have central functions in programmed cell death (apoptosis) and inflammation. Caspases mediate their effects through aspartate-specific cleavage of their target proteins, and at present almost 400 caspase substrates are known. Several methods have been developed to predict caspase cleavage sites in individual proteins, but none of them can currently predict cleavage sites across multiple proteins or entire proteomes, or combine several classifiers. A database of predicted caspase cleavage products for the whole genome could significantly aid the identification of novel caspase targets in tandem mass spectrometry-based proteomic experiments.
Year: 2010 PMID: 20546630 PMCID: PMC2893604 DOI: 10.1186/1471-2105-11-320
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Optimized parameter values for trained classifiers.
| Classifier | Parameter | Value |
|---|---|---|
| SVM | SVM method | NU-SVC |
| | Kernel type | RBF |
| | Kernel function parameter γ | 2^-5.5 |
| | Error parameter ν | 0.536 |
| | Stopping criterion ε | 0.00001 |
| | Sequence length | 10 (6 before cut site, 4 after cut site) |
| Random forest | Maximum depth | unlimited |
| | Number of features | 4 |
| | Number of trees | 143 |
| | Sequence length | 24 (12 before cut site, 12 after cut site) |
| J48 | Confidence factor | 0.285 |
| | Minimum number of objects | 5 |
| | Number of folds | 3 |
| | Binary splits | true |
| | Reduced error pruning | false |
| | Subtree raising | true |
| | Unpruned | false |
| | Use Laplace | false |
| | Sequence length | 6 (4 before cut site, 2 after cut site) |
Figure 1. The ROC curves of the trained classifiers. The area under the ROC curve (AUC) is highest for SVM-6-4 and lowest for J48-4-2; thus SVM-6-4 and RF-12-12 perform better than the J48-4-2 classifier.
Quality measures of trained classifiers and comparison with publicly available caspase cut site prediction models.
| Classifier | ACC | PRC | FDR | SPC | MCC |
|---|---|---|---|---|---|
| SVM-6-4 | 87.4% | 85.3% | 14.7% | 84.4% | 74.8% |
| RF-12-12 | 85.7% | 82.5% | 17.5% | 82.5% | 71.7% |
| J48-4-2 | 81.6% | 79.7% | 20.3% | 78.3% | 63.3% |
| Vote | 96.8% | 100% | 0.0% | 100% | 93.7% |
| PeptideCutter | 50.8% | 63.0% | 37.0% | 97.7% | 4.6% |
| GraBCas | 67.7% | 67.6% | 32.4% | 67.5% | 35.4% |
| CASVM P4-P1 | 62.3% | 83.0% | 17.0% | 93.7% | 31.6% |
| CASVM P4-P2' | 72.7% | 73.1% | 26.9% | 73.6% | 45.4% |
| CASVM P14-P10' | 83.1% | 81.6% | 18.4% | 80.1% | 66.2% |
PeptideCutter, GraBCas and CASVM were all tested with the complete training data. The values for SVM-6-4, J48-4-2 and RF-12-12 were obtained from the test sequences during training with the leave-one-out method. The default cutoff value of 1.2 was used in GraBCas. The Vote classifier combined the RF-12-12 and SVM-6-4 classifiers. Abbreviations: ACC, accuracy; PRC, precision; FDR, false discovery rate; SPC, specificity; MCC, Matthews correlation coefficient.
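The quality measures abbreviated above follow from the four confusion-matrix counts. The following is a minimal sketch using the standard textbook definitions; the function name `quality_measures` and the example counts are hypothetical and are not taken from the paper. Note that the table reports MCC as a percentage, whereas the sketch returns it as a value between -1 and 1.

```python
import math


def quality_measures(tp, fp, tn, fn):
    """Compute ACC, PRC, FDR, SPC and MCC from confusion-matrix counts.

    tp, fp, tn, fn: true positives, false positives, true negatives,
    false negatives. Ratios with an empty denominator are returned as 0.0
    (an illustrative convention, not something stated in the source).
    """
    def ratio(num, den):
        return num / den if den else 0.0

    acc = ratio(tp + tn, tp + fp + tn + fn)   # accuracy
    prc = ratio(tp, tp + fp)                  # precision
    fdr = ratio(fp, tp + fp)                  # false discovery rate = 1 - precision
    spc = ratio(tn, tn + fp)                  # specificity
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_den)   # Matthews correlation coefficient
    return {"ACC": acc, "PRC": prc, "FDR": fdr, "SPC": spc, "MCC": mcc}


# Hypothetical counts, for illustration only.
example = quality_measures(tp=90, fp=10, tn=80, fn=20)
```

Because FDR is defined as 1 - PRC, the PRC and FDR columns of the table always sum to 100%.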
Figure 2. ROC curve comparisons of caspase cleavage site classifiers. The data points represent the average true positive and false positive rates from the 10 000 bootstrap values. The lines surrounding the data points represent the standard deviation calculated from the bootstrap data. The ROC curve shows that the Vote classifier, which combined the RF-12-12 and SVM-6-4 classifiers, appears to be the best among the compared classifiers, since its true positive rate is very close to one and its false positive rate is zero.
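The bootstrap summary described in this caption can be illustrated with a short sketch. The source does not describe the exact resampling procedure, so the code below assumes a plain nonparametric bootstrap (sampling sequences with replacement) over binary labels and binary predictions; the function name `bootstrap_rates` and its parameters are hypothetical.

```python
import random
import statistics


def bootstrap_rates(labels, predictions, n_boot=1000, seed=0):
    """Estimate mean and standard deviation of TPR and FPR by bootstrap.

    labels, predictions: equal-length sequences of 0/1 values.
    Resamples that contain no positives or no negatives are skipped,
    because the corresponding rate would be undefined.
    Assumes n_boot is large enough that at least two resamples are kept.
    """
    rng = random.Random(seed)
    n = len(labels)
    tprs, fprs = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        tp = sum(1 for i in idx if labels[i] == 1 and predictions[i] == 1)
        fn = sum(1 for i in idx if labels[i] == 1 and predictions[i] == 0)
        fp = sum(1 for i in idx if labels[i] == 0 and predictions[i] == 1)
        tn = sum(1 for i in idx if labels[i] == 0 and predictions[i] == 0)
        if tp + fn == 0 or fp + tn == 0:
            continue
        tprs.append(tp / (tp + fn))
        fprs.append(fp / (fp + tn))
    return {
        "tpr_mean": statistics.mean(tprs),
        "tpr_sd": statistics.stdev(tprs),
        "fpr_mean": statistics.mean(fprs),
        "fpr_sd": statistics.stdev(fprs),
    }
```

Each classifier would contribute one (FPR, TPR) point with error bars given by the two standard deviations; the figure reports 10 000 resamples, which corresponds to `n_boot=10000` in this sketch.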
Predicted caspase cut sites of the human protein sequences downloaded from UniProtKB database.
| | SVM-6-4 | RF-12-12 | J48-4-2 | Vote |
|---|---|---|---|---|
| Predicted number of cut proteins | 70 343 | 73 104 | 79 926 | 66 513 |
| % of all proteins (96 123) | 73.2 | 76.1 | 83.1 | 69.2 |
| Average number of cut sites per sequence | 4.3 | 4.8 | 5.9 | 3.7 |
| Predicted cut sites total | 301 678 | 348 600 | 468 865 | 246 692 |
| % of all possible cut sites (1 749 441) | 17.2 | 19.9 | 26.8 | 14.1 |
The cut sites were predicted with Pripper using Vote classification (RF-12-12 and SVM-6-4). The cut sites were predicted only from positions that had Asp (D) in the cut site; the given number of all possible cut sites is the number of Asp (D) residues in the protein sequences.
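As an illustration of how candidate cut sites can be enumerated, the sketch below scans a protein sequence for Asp (D) residues and extracts a fixed-length window around each one, here 6 residues before and 4 after the cut as for SVM-6-4. It assumes that the cut lies immediately after the Asp residue, so that D is the last residue of the "before" part, and that windows running past a terminus are padded with `X`. Neither the padding scheme nor the function name `candidate_windows` comes from the source; they are assumptions made for this example, not a description of Pripper's implementation.

```python
def candidate_windows(sequence, before=6, after=4, pad="X"):
    """Return (cut_position, window) pairs for every Asp (D) in a sequence.

    cut_position is the number of residues N-terminal to the cut, i.e. the
    cut is assumed to fall directly after the D residue.
    The window holds `before` residues ending at D plus `after` residues
    following the cut, padded with `pad` at the sequence termini.
    """
    windows = []
    for i, residue in enumerate(sequence):
        if residue != "D":
            continue
        cut = i + 1
        left = sequence[max(0, cut - before):cut].rjust(before, pad)
        right = sequence[cut:cut + after].ljust(after, pad)
        windows.append((cut, left + right))
    return windows


# Hypothetical toy sequence, for illustration only.
sites = candidate_windows("MADEVDGKL")
```

The number of windows equals the number of D residues in the sequence, which corresponds to the "all possible cut sites" count used as the denominator in the table above.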
Figure 3. Caspase cleavage products identified based on MS/MS data. Protein identifications were done with Mascot against a database of predicted caspase cleavage products created with Pripper (Vote classifier: SVM-6-4 and RF-12-12). Complete protein sequences are shown with all the identified peptides (red), caspase cleavage motifs (yellow) and peptides identified at the cleavage site (black box). Peptide spectrum matches for the cleavage site peptides are also shown. A) Cytokeratin 18 identified from the 2-DE sample. B) Potential new caspase target leukosialin identified from the iTRAQ sample.