Yu Bao, Simone Marini, Takeyuki Tamura, Mayumi Kamada, Shingo Maegawa, Hiroshi Hosokawa, Jiangning Song, Tatsuya Akutsu.
Abstract
As one of the few irreversible protein posttranslational modifications, proteolytic cleavage is involved in nearly all aspects of cellular activity, ranging from gene regulation to cell life-cycle regulation. Among the various protease-specific types of proteolytic cleavage, cleavage by caspases and granzyme B is considered essential for the initiation and execution of programmed cell death and inflammation. Although a number of substrates for both types of proteolytic cleavage have been experimentally identified, the complete repertoire of caspase and granzyme B substrates remains to be fully characterized. To tackle this issue and complement experimental efforts for substrate identification, systematic bioinformatics studies of known cleavage sites provide important insights into caspase/granzyme B substrate specificity and facilitate the discovery of novel substrates. In this article, we review and benchmark 12 state-of-the-art sequence-based bioinformatics approaches and tools for caspase/granzyme B cleavage prediction. We evaluate and compare these methods in terms of their input/output, algorithms, prediction performance, validation methods, and software availability and utility. In addition, we construct independent data sets of caspase/granzyme B substrates from different species and use them to assess how well these predictors identify cleavage sites. We find that prediction results are highly variable among predictors. Furthermore, we experimentally validate the predictions of a case study by performing a caspase cleavage assay. We anticipate that this comprehensive review and survey analysis will provide an insightful resource for biologists and bioinformaticians who are interested in using and/or developing tools for caspase/granzyme B cleavage prediction.
Keywords: caspase; cleavage sites; prediction tool
Year: 2019 PMID: 29860277 PMCID: PMC6917222 DOI: 10.1093/bib/bby041
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
A summary of key features of each tool evaluated in this article
| Tools | SitePrediction | Cascleave | PoPS | Pripper |
|---|---|---|---|---|
| Species | Multispecies | Multispecies | Multispecies | Multispecies |
| Web server availability | | | | No server |
| Algorithm | Combination of frequency score representing amino acids occurrence and position similarity | BEAA trained and tested support vector regression (SVR) model | PSSM matrix | Combination of SVM/random forest and J48 algorithm |
| Option of batch prediction | Yes | No | Yes | Yes |
| Adjustment of prediction thresholds | No | No | Yes | No |
| Standalone software availability | No | No | Yes | No |
| Language implemented | C++ | Perl | Java | Java |
| Dataset origin | Data from MEROPS | Multiple resources | Data from MEROPS | Data from EBI |
| Ratio of positive to negative samples | – | 1:3 | – | 1:1 |
| Sliding window size | – | 16 amino acids | – | 10 amino acids |
| Computing time for processing a sequence | Within a second | 5 min | Within a second | Within a second |
| Whether structural information considered | Secondary structure prediction, SA and PEST sequence occurrence considered | Secondary structure, SA and natively disordered regions considered | Secondary or tertiary structure of the substrate considered | Not considered |
| Types of caspases applicable | Specific training sets corresponding to caspases 1, 3, 6, 7, 8 | Mixed training sets for all caspases | Mixed training sets for all caspases | Mixed training sets for all caspases |

| Tools | CAT3 | PCSS | Blast | PROSPER |
|---|---|---|---|---|
| Species | Multispecies | Multispecies | N.A. | Multispecies |
| Web server availability | No web server | | N.A. | |
| Algorithm | PSSM matrix | SVM with radial basis function (RBF) kernel | N.A. | BEAA trained and tested SVR model with RBF kernel combined with MDGI feature selection |
| Option of batch prediction | Yes | Yes | N.A. | No |
| Adjustment of prediction thresholds | Yes | Yes | N.A. | No |
| Standalone software availability | Yes | No | N.A. | No |
| Language implemented | Perl | – | N.A. | Perl |
| Dataset origin | Data from PubMed | Multiple resources | N.A. | Data from MEROPS, CutDB and PMAP |
| Ratio of positive to negative samples | – | – | N.A. | 1:3 |
| Sliding window size | – | – | N.A. | Six amino acids |
| Computing time for processing a sequence | Within a second | A few minutes | N.A. | A few minutes |
| Whether structural information considered | Not considered | Regular secondary structure considered | N.A. | Secondary structure, SA and native disorder considered |
| Types of caspases applicable | Training sets corresponding to caspase-3 | Separate training sets for caspases and granzyme B | N.A. | Mixed training sets for all caspases |

| Tools | GraBCas | CasPredictor | CASVM | Cascleave 2.0 |
|---|---|---|---|---|
| Species | Multispecies | Multispecies | Multispecies | Multispecies |
| Web server availability | | | | |
| Algorithm | Scoring matrices | BLOSUM 62 Substitution Matrix-based CCSearcher algorithm | SVM | Maximum relevance, minimum redundancy and forward feature selection techniques trained SVM model |
| Option of batch prediction | – | – | – | – |
| Adjustment of prediction thresholds | – | – | – | – |
| Standalone software availability | – | – | – | – |
| Language implemented | Java | Visual Basic | Perl | Java |
| Dataset origin | – | Various databases, including SwissProt | Various resources | MEROPS |
| Ratio of positive to negative samples | – | – | – | 1:1 |
| Sliding window size | – | – | Three scanning window sizes are available: P4-P1, P4-P2’ and P14-P10’ | – |
| Computing time for processing a sequence | – | – | – | – |
| Whether structural information considered | Not considered | Not considered | Not considered | Secondary structure, SA and natively disordered regions considered |
| Types of caspases applicable | Specific training sets corresponding to caspase-3 and granzyme B | Mixed training sets for all caspases | Mixed training sets for all caspases | Mixed training sets for all caspases |
Note: These features include applicable species, whether a web server exists, the algorithm used, whether a batch prediction option is available, whether the prediction threshold is adjustable, whether standalone software exists, the programming language used to implement the program, the origin of the training data set, the ratio of positive to negative samples, the sliding window size (if any), the computing time to process one sequence, whether structural information such as solvent accessibility (SA) and secondary structure (SS) is considered, and the types of caspases to which each tool applies. A dash (–) means not available or not mentioned in the original paper.
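Two recurring table entries, the PSSM-based algorithms (PoPS, CAT3) and the sliding window sizes, both amount to scoring a fixed-length window of residues around a candidate scissile bond. The sketch below is a minimal illustration under stated assumptions: an Asp at P1 (the canonical caspase requirement), an 8-residue P4-P4′ window written in the same P4P3P2P1/P1′P2′P3′P4′ layout as the cleavage assay tables, and an additive per-position score. The matrix `TOY_PSSM` and the helper names are hypothetical; the values are invented and do not reproduce the model of any tool in the table.

```python
"""Toy sliding-window PSSM scan for caspase-like cleavage sites (illustration only)."""
from typing import Dict, List, Tuple

# Window positions: four residues N-terminal of the scissile bond (P4..P1)
# and four residues C-terminal of it (P1'..P4').
POSITIONS = ["P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'"]

# HYPOTHETICAL log-odds scores chosen by hand for this sketch; they are NOT the
# matrices used by PoPS, CAT3 or any other tool in the table.
TOY_PSSM: Dict[str, Dict[str, float]] = {
    "P4": {"D": 1.5, "E": 1.0, "I": 0.8, "L": 0.5},
    "P3": {"E": 1.2, "D": 0.5},
    "P2": {"V": 1.0, "T": 0.8},
    "P1": {"D": 3.0},
    "P1'": {"G": 1.0, "S": 0.8, "A": 0.5, "P": -2.0},
    "P2'": {},
    "P3'": {},
    "P4'": {},
}
DEFAULT_SCORE = 0.0  # residues without an entry contribute nothing


def candidate_windows(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based P1 index, 8-residue P4..P4' window) for each Asp at P1."""
    windows = []
    for i, residue in enumerate(seq):
        if residue != "D":
            continue
        start, end = i - 3, i + 5  # P4 is 3 residues before P1; P4' is 4 after
        if start >= 0 and end <= len(seq):
            windows.append((i, seq[start:end]))
    return windows


def score_window(window: str) -> float:
    """Sum the per-position scores over the window (additive PSSM model)."""
    return sum(TOY_PSSM[pos].get(res, DEFAULT_SCORE) for pos, res in zip(POSITIONS, window))


def predict_sites(seq: str, threshold: float) -> List[Tuple[int, str, float]]:
    """Return (1-based P1 position, site label, score) for hits, highest score first."""
    hits = []
    for p1, window in candidate_windows(seq):
        score = score_window(window)
        if score >= threshold:
            hits.append((p1 + 1, f"{window[:4]}-{window[4:]}", score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


if __name__ == "__main__":
    for position, site, score in predict_sites("MKDEVDGANDLLK", threshold=4.0):
        print(position, site, round(score, 2))
```

For the toy sequence this prints `6 DEVD-GAND 7.7`: the Asp residues at positions 3 and 10 are skipped because a full P4-P4′ window does not fit around them. Real predictors differ mainly in how the per-position scores are learned and in whether extra features (SA, SS, disorder) are added on top of the window score.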
Detailed description of the eight test data sets used in this study
| Test set name | Positive or negative | Test set description |
|---|---|---|
| Cas1-all | Positive set | Combination of caspase-1 substrates from |
| | Negative set | Combination of proteins from |
| Cas3-all | Positive set | Combination of caspase-3 substrates from |
| | Negative set | Combination of proteins from |
| Cas1-homo | Positive set | Caspase-1 substrates from |
| | Negative set | Proteins excluding caspase-1 substrates from |
| Cas3-homo | Positive set | Caspase-3 substrates from |
| | Negative set | Proteins excluding caspase-3 substrates from |
| Cas1-mus | Positive set | Caspase-1 substrates from |
| | Negative set | Proteins excluding caspase-1 substrates from |
| Cas3-mus | Positive set | Caspase-3 substrates from |
| | Negative set | Proteins excluding caspase-3 substrates from |
| Cas1-coli | Positive set | Caspase-1 substrates from |
| | Negative set | Proteins excluding caspase-1 substrates from |
| Cas3-coli | Positive set | Caspase-3 substrates from |
| | Negative set | Proteins excluding caspase-3 substrates from |
Figure 1. ROC curves of Blast, Cascleave, PCSS, PoPS, Pripper and SitePrediction on the Cas1-all set.
Figure 2. ROC curves of Blast, Cascleave, PCSS, PoPS, Pripper, CAT3 and SitePrediction on the Cas3-all set.
Figure 3. ROC curves of Blast, Cascleave, PCSS, PoPS, Pripper and SitePrediction on the Cas1-homo set.
Figure 4. ROC curves of Blast, Cascleave, PCSS, PoPS, Pripper, CAT3 and SitePrediction on the Cas3-homo set.
Figure 5. ROC curves of Blast, Cascleave, PCSS, PoPS, Pripper, CAT3 and SitePrediction on the Cas3-mus set.
Figure 6. ROC curves of Blast, Cascleave, PCSS, PoPS, Pripper, CAT3 and SitePrediction on the Cas3-coli set.
Top three tools ranked by AUC for each evaluated data set
| Data set | First (AUC) | Second (AUC) | Third (AUC) |
|---|---|---|---|
| Cas1-all | Cascleave (0.796) | PoPS (0.739) | Pripper (0.655) |
| Cas3-all | SitePrediction (0.754) | CAT3 (0.711) | Cascleave (0.693) |
| Cas1-homo | Cascleave (0.771) | PoPS (0.744) | Pripper (0.663) |
| Cas3-homo | SitePrediction (0.787) | Cascleave (0.745) | CAT3 (0.703) |
| Cas3-mus | SitePrediction (0.760) | Cascleave (0.729) | PoPS (0.712) |
| Cas3-coli | SitePrediction (0.702) | CAT3 (0.638) | PoPS (0.627) |
Note: The data sets evaluated are Cas1-all, Cas3-all, Cas1-homo, Cas3-homo, Cas3-mus and Cas3-coli.
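Each AUC value in the table summarizes one ROC curve from Figures 1 to 6. As a hedged sketch, not the authors' evaluation code, the AUC can be obtained in two equivalent ways: sweep a threshold down through the predictor scores and integrate the resulting ROC curve with the trapezoidal rule, or compute the probability that a randomly chosen positive site outscores a randomly chosen negative site, counting ties as one half (the Mann-Whitney form). The function names and the scores used below are made-up toy values, not data from this study.

```python
"""ROC curve and AUC from predictor scores (toy example, not the paper's data)."""
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[bool]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points obtained by lowering the threshold through each distinct score."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples tied at this score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def pairwise_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Mann-Whitney form: P(positive score > negative score), ties counted as 1/2."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    positives = [0.9, 0.8, 0.4]       # toy scores at true cleavage sites
    negatives = [0.7, 0.3, 0.2, 0.4]  # toy scores at non-cleaved sites
    pts = roc_points(positives + negatives, [True] * 3 + [False] * 4)
    print(round(trapezoid_auc(pts), 3), round(pairwise_auc(positives, negatives), 3))
```

Both methods give 0.875 for these toy scores. The tie at 0.4 between a positive and a negative is what produces a diagonal ROC segment in the first method and half a win in the second, which is why the two agree.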
Figure 7. A flowchart of the procedure for predicting caspase-3 and caspase-8 substrate cleavage sites in the human proteome.
Figure 8. Western blot analysis of the caspase cleavage assay. Recombinant GST-mycGFP (75 kDa band) was detected both with and without caspase-8 treatment. For recombinant GST-IETD-mycGFP, a 75 kDa band was detected without caspase-8 treatment, whereas a 50 kDa band was detected after caspase-8 treatment, indicating that the IETD linker was cleaved by caspase-8.
Caspase cleavage assay results for potential caspase-3 substrate cleavage sites predicted by PoPS, SitePrediction and Cascleave
| Predicted caspase-3 substrate cleavage site | PoPS score | SitePrediction score | Cascleave score | Experimental result | Corresponding annotations in MEROPS |
|---|---|---|---|---|---|
| DVVD—GADT | 21.32 | 1560.39 | 1.578 | ◯ | – |
| EEVD—GSSP | 20.08 | 1515.922 | 1.461 | ◯ | – |
| EEVD—GSQG | 20.08 | 1515.922 | 1.461 | ◯ | C14 homologue |
| DETD—SGAG | 21.77 | 3400.88 | 1.345 | ◯ | C14.003: caspase-3, C14.005: caspase-6 |
| EEVD—GAPR | 20.08 | 1888.277 | 1.307 | ◯ | C14.005: caspase-6 |
| DSVD—GSLT | 21.26 | 1909.074 | 1.21 | ◯ | – |
| DDTD—GLTP | 17.79 | 791.345 | 1.157 | ◯ | C14.005: caspase-6, C14.006: caspase-2 |
| AEVD—GVDE | 19.93 | 295.25 | 1.061 | × | C14 homologue |
| DDPD—SAYL | 18.08 | 680.822 | 1.058 | ◯ | – |
| SEVD—GNDS | 20.05 | 449.294 | 1.039 | ◯ | C14.004: caspase-7, C14.006: caspase-2 |
| AEVD—GATP | 19.94 | 623.306 | 1.034 | ◯ | – |
| EEPD—GGFR | 16.97 | 414.973 | 0.969 | ◯ | – |
| TEPD—SPSP | Non-cleavage | Non-cleavage | 0.961 | × | – |
| SEID—GLKG | 18.7 | 220.873 | 0.911 | ◯ | – |
| EEPD—SANS | 17.14 | 761.635 | 0.82 | ◯ | C14.005: caspase-6, C14.006: caspase-2, C14 homologue |
| NEVD—GSNE | 20.01 | 223.501 | 0.766 | ◯ | – |
| EETD—GLDP | 16.86 | 886.89 | 0.747 | ◯ | C14.001: caspase-1, C14.005: caspase-6, C14.006: caspase-2, C14 homologue |
| EETD—GLHE | 16.86 | 886.89 | 0.747 | ◯ | – |
| GEVD—GKAI | 19.85 | 271.729 | 0.691 | ◯ | – |
| TEMD—SETL | Non-cleavage | Non-cleavage | 0.632 | × | – |
| LESD—SESL | Non-cleavage | Non-cleavage | 0.585 | × | – |
Note: ‘◯’ indicates the sequence is cleaved in the cleavage assay experiment, while ‘×’ indicates the sequence is not cleaved in the cleavage assay experiment.
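To illustrate how the caspase-3 assay table can be read quantitatively, the sketch below tallies confirmed cleavages directly from its rows (site labels use a hyphen at the scissile bond). The Cascleave cutoff of 1.0 and the names `CASP3_ASSAY` and `cleaved_count` are hypothetical choices for this sketch, not values or code from the paper. Because only the 21 predicted sites were tested, the counts describe the precision of these predictions, not sensitivity or overall accuracy.

```python
"""Tally of the caspase-3 cleavage assay table (values transcribed from the table)."""
from typing import List, Optional, Sequence, Tuple

# (site, PoPS score or None for "Non-cleavage", Cascleave score, cleaved in assay)
Row = Tuple[str, Optional[float], float, bool]

CASP3_ASSAY: List[Row] = [
    ("DVVD-GADT", 21.32, 1.578, True),
    ("EEVD-GSSP", 20.08, 1.461, True),
    ("EEVD-GSQG", 20.08, 1.461, True),
    ("DETD-SGAG", 21.77, 1.345, True),
    ("EEVD-GAPR", 20.08, 1.307, True),
    ("DSVD-GSLT", 21.26, 1.21, True),
    ("DDTD-GLTP", 17.79, 1.157, True),
    ("AEVD-GVDE", 19.93, 1.061, False),
    ("DDPD-SAYL", 18.08, 1.058, True),
    ("SEVD-GNDS", 20.05, 1.039, True),
    ("AEVD-GATP", 19.94, 1.034, True),
    ("EEPD-GGFR", 16.97, 0.969, True),
    ("TEPD-SPSP", None, 0.961, False),
    ("SEID-GLKG", 18.7, 0.911, True),
    ("EEPD-SANS", 17.14, 0.82, True),
    ("NEVD-GSNE", 20.01, 0.766, True),
    ("EETD-GLDP", 16.86, 0.747, True),
    ("EETD-GLHE", 16.86, 0.747, True),
    ("GEVD-GKAI", 19.85, 0.691, True),
    ("TEMD-SETL", None, 0.632, False),
    ("LESD-SESL", None, 0.585, False),
]


def cleaved_count(rows: Sequence[Row]) -> Tuple[int, int]:
    """Return (number cleaved in the assay, number tested)."""
    return sum(1 for row in rows if row[3]), len(rows)


CUTOFF = 1.0  # hypothetical Cascleave cutoff, for illustration only

groups = {
    "all predicted sites": CASP3_ASSAY,
    f"Cascleave >= {CUTOFF}": [r for r in CASP3_ASSAY if r[2] >= CUTOFF],
    f"Cascleave < {CUTOFF}": [r for r in CASP3_ASSAY if r[2] < CUTOFF],
    "PoPS scored": [r for r in CASP3_ASSAY if r[1] is not None],
    "PoPS 'Non-cleavage'": [r for r in CASP3_ASSAY if r[1] is None],
}

if __name__ == "__main__":
    for name, rows in groups.items():
        cleaved, tested = cleaved_count(rows)
        print(f"{name}: {cleaved}/{tested} cleaved")
```

For this table it reports 17/21 sites cleaved overall, 10/11 at or above the illustrative cutoff, 7/10 below it, and 0/3 for the sites PoPS labeled "Non-cleavage" (SitePrediction labels the same three rows "Non-cleavage").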
Caspase cleavage assay results for potential caspase-8 substrate cleavage sites predicted by PoPS, SitePrediction and Cascleave
| Predicted caspase-8 substrate cleavage site | PoPS score | SitePrediction score | Cascleave score | Experimental result | Corresponding annotations in MEROPS |
|---|---|---|---|---|---|
| DVVD—GADT | 17.9 | 206.97 | 1.578 | ◯ | – |
| EEVD—GSSP | 21.24 | 771.98 | 1.461 | ◯ | – |
| EEVD—GSQG | 21.24 | 771.98 | 1.461 | ◯ | C14 homologue |
| DETD—SGAG | 17.77 | 3194.444 | 1.345 | ◯ | C14.003: caspase-3, C14.005: caspase-6 |
| EEVD—GAPR | 21.24 | 1621.17 | 1.307 | ◯ | C14.005: caspase-6 |
| DEVD—GAND | 22.46 | 3371.648 | 1.261 | ◯ | – |
| DETD—SPTV | 21.14 | 4921.875 | 1.236 | ◯ | C14.005: caspase-6, C14.006: caspase-2 |
| DSVD—GSLT | 17.59 | 525.68 | 1.21 | ◯ | C14 homologue |
| AEVD—GVDE | 22.46 | 1010.936 | 1.061 | ◯ | – |
| SEVD—GNDS | Non-cleavage | Non-cleavage | 1.039 | ◯ | C14.004: caspase-7, C14.006: caspase-2 |
| AEVD—GATP | 21.21 | 1268.26 | 1.034 | ◯ | – |
| TETD—SVGT | 20.01 | 854.701 | 0.999 | ◯ | – |
| EEPD—GGFR | Non-cleavage | Non-cleavage | 0.969 | ◯ | – |
| TEPD—SPSP | 17.38 | 92.307 | 0.961 | × | – |
| LEMD—SVLK | 19.27 | 412.088 | 0.935 | ◯ | C14.005: caspase-6, C14.006: caspase-2, C14 homologue |
| EEPD—SANS | Non-cleavage | Non-cleavage | 0.82 | ◯ | – |
| EETD—GLDP | 22.17 | 559.69 | 0.747 | ◯ | C14.001: caspase-1, C14.005: caspase-6, C14.006: caspase-2, C14 homologue |
| EETD—GLHE | 22.17 | 559.69 | 0.747 | ◯ | – |
| TEED—SVSV | 18.61 | 275.71 | 0.714 | ◯ | – |
| TEMD—SETL | 19.27 | 167.993 | 0.632 | × | – |
| LESD—SESL | 18.58 | 526.556 | 0.585 | × | – |
Note: ‘◯’ indicates the sequence is cleaved in the cleavage assay experiment, while ‘×’ indicates the sequence is not cleaved in the cleavage assay experiment.
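The caspase-8 table can likewise be reduced to a confusion matrix for a tool's yes/no calls. This minimal sketch assumes that any numeric PoPS score counts as a predicted cleavage and "Non-cleavage" as a negative call; SitePrediction marks the same three rows "Non-cleavage", so its counts would be identical. The names `CASP8_ASSAY` and `confusion_counts` are hypothetical. Because only the 21 listed sites were tested, the false-negative and true-negative cells describe this set only, not the proteome.

```python
"""Agreement between PoPS calls and the caspase-8 assay (values transcribed from the table)."""
from typing import Dict, List, Optional, Sequence, Tuple

# (site, PoPS score or None for "Non-cleavage", cleaved in assay)
Row = Tuple[str, Optional[float], bool]

CASP8_ASSAY: List[Row] = [
    ("DVVD-GADT", 17.9, True),
    ("EEVD-GSSP", 21.24, True),
    ("EEVD-GSQG", 21.24, True),
    ("DETD-SGAG", 17.77, True),
    ("EEVD-GAPR", 21.24, True),
    ("DEVD-GAND", 22.46, True),
    ("DETD-SPTV", 21.14, True),
    ("DSVD-GSLT", 17.59, True),
    ("AEVD-GVDE", 22.46, True),
    ("SEVD-GNDS", None, True),
    ("AEVD-GATP", 21.21, True),
    ("TETD-SVGT", 20.01, True),
    ("EEPD-GGFR", None, True),
    ("TEPD-SPSP", 17.38, False),
    ("LEMD-SVLK", 19.27, True),
    ("EEPD-SANS", None, True),
    ("EETD-GLDP", 22.17, True),
    ("EETD-GLHE", 22.17, True),
    ("TEED-SVSV", 18.61, True),
    ("TEMD-SETL", 19.27, False),
    ("LESD-SESL", 18.58, False),
]


def confusion_counts(rows: Sequence[Row]) -> Dict[str, int]:
    """Treat any PoPS score as a positive call and 'Non-cleavage' as a negative call."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for _site, pops_score, cleaved in rows:
        called = pops_score is not None
        if called and cleaved:
            counts["TP"] += 1
        elif called:
            counts["FP"] += 1
        elif cleaved:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


if __name__ == "__main__":
    c = confusion_counts(CASP8_ASSAY)
    precision = c["TP"] / (c["TP"] + c["FP"])
    print(c, f"precision={precision:.3f}")
```

For the caspase-8 table this gives 15 true positives, 3 false positives, 3 false negatives and no true negatives (precision about 0.833): all three "Non-cleavage" sites were in fact cleaved. In the caspase-3 table, by contrast, the three sites labeled "Non-cleavage" were all uncleaved.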