Castrense Savojardo1, Pier Luigi Martelli1, Piero Fariselli2, Rita Casadio1.
Abstract
Motivation: The identification of signal peptides in protein sequences is an important step toward protein localization and function characterization.
Year: 2018 PMID: 29280997 PMCID: PMC5946842 DOI: 10.1093/bioinformatics/btx818
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. The architecture of the DCNN processing an input protein sequence to detect signal peptides. Feature extraction involves the application of three convolution-pooling (conv-pool) stages. The final classification is performed by a standard fully-connected neural network.
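The data flow in the Fig. 1 caption (three conv-pool stages followed by a classifier) can be illustrated with a minimal pure-Python sketch. Everything numeric here is an assumption made for illustration and is not taken from the paper: the one-hot input encoding, kernel width 3, 8 filters per stage, ReLU activation, max pooling of size 2, global averaging, random untrained weights, and the reduction of the fully-connected network to a single logistic unit. The function names (`one_hot`, `conv1d_relu`, `max_pool`, `forward`) are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(sequence):
    """Encode each residue as a 20-dimensional one-hot vector (assumed encoding)."""
    return [[1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS] for aa in sequence]

def conv1d_relu(x, kernels):
    """Valid 1D convolution along the sequence, followed by ReLU.

    x is a list of positions, each a list of input channels.
    kernels holds one width-by-channels weight matrix per output channel.
    """
    width = len(kernels[0])
    n_in = len(x[0])
    out = []
    for start in range(len(x) - width + 1):
        window = x[start:start + width]
        row = []
        for k in kernels:
            s = sum(k[i][c] * window[i][c] for i in range(width) for c in range(n_in))
            row.append(max(0.0, s))
        out.append(row)
    return out

def max_pool(x, size):
    """Non-overlapping max pooling along the sequence axis (assumed pooling type)."""
    channels = len(x[0])
    return [
        [max(x[p][c] for p in range(start, start + size)) for c in range(channels)]
        for start in range(0, len(x) - size + 1, size)
    ]

def random_kernels(rng, n_out, width, n_in):
    """Untrained random weights; a real model would learn these."""
    return [[[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(width)]
            for _ in range(n_out)]

def forward(sequence, seed=0):
    """Run three conv-pool stages, then a single logistic unit as a stand-in classifier.

    Returns the output score in (0, 1) and the sequence length after each stage.
    """
    rng = random.Random(seed)
    x = one_hot(sequence)
    lengths = [len(x)]
    n_in = len(AMINO_ACIDS)
    for n_out in (8, 8, 8):  # hypothetical filter counts, one per conv-pool stage
        x = max_pool(conv1d_relu(x, random_kernels(rng, n_out, 3, n_in)), 2)
        lengths.append(len(x))
        n_in = n_out
    # Global average over positions gives a fixed-size feature vector.
    features = [sum(row[c] for row in x) / len(x) for c in range(n_in)]
    weights = [rng.uniform(-0.1, 0.1) for _ in features]
    score = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score)), lengths

if __name__ == "__main__":
    prob, lengths = forward("ACDEFGHIKLMNPQRSTVWY" * 2)
    print(lengths, prob)
```

With untrained weights the score itself is meaningless; the sketch only shows how each conv-pool stage shortens the sequence axis before classification.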
Fig. 2. The signal-peptide GRHCRF model capturing the modular structure of the signal peptide. States labeled with N, H, and C represent the positively charged N-region, the hydrophobic H-region, and the cleavage C-region, respectively (see Section 2.4 for further details).
Statistics of the three datasets adopted in this study
| Dataset | Organism | SP | T | N/C | Total |
|---|---|---|---|---|---|
| SignalP4.0 | Eukaryotes | 1640 | 987 | 5133 | 7760 |
| | Gram-positive | 208 | 117 | 360 | 685 |
| | Gram-negative | 423 | 523 | 912 | 1858 |
| SPDS17 | Eukaryotes | 46 | 323 | 689 | 1058 |
| | Gram-positive | 9 | 189 | 240 | 438 |
| | Gram-negative | 23 | 89 | 99 | 211 |
| – | | 573 | 1024 | 4375 | 5972 |
Note: SP, signal-peptide proteins; T, transmembrane proteins (with a single alpha helix in the N-terminal region); N/C, Nuclear and/or Cytosolic proteins (proteins without signal peptide); Total, total sum.
Performance of different versions of SignalP and DeepSig on signal peptide detection and cleavage site prediction in 5-fold cross-validation on the SignalP4.0 dataset (Petersen )
| Method | Eukaryotes MCC | Eukaryotes FPRT | Eukaryotes F1cs | Gram-positive MCC | Gram-positive FPRT | Gram-positive F1cs | Gram-negative MCC | Gram-negative FPRT | Gram-negative F1cs |
|---|---|---|---|---|---|---|---|---|---|
| SignalP 4.0 | 0.874 | 6.1 | 67.1 | 0.851 | 2.6 | 77.8 | 0.848 | 1.5 | 68.0 |
| SignalP-TM | 0.871 | 3.3 | 67.2 | 0.851 | 2.6 | 77.8 | 0.815 | 1.1 | 67.7 |
| SignalP-noTM | 0.674 | 38.1 | 54.6 | 0.556 | 47.9 | 49.4 | 0.497 | 35.8 | 67.7 |
| DeepSig (no relevance) | 0.910 | 2.6 | 71.1 | 0.878 | 5.9 | 69.7 | 0.900 | 1.5 | 83.5 |
| DeepSig | 0.910 | 2.6 | 73.3 | 0.878 | 5.9 | 72.3 | 0.900 | 1.5 | 86.2 |
Note: MCC, Matthews Correlation Coefficient; FPRT, false positive rate on transmembrane proteins; F1cs, harmonic mean of precision and recall on cleavage-site detection; no relevance, without the relevance profile as a feature for cleavage-site prediction (Section 2.4).
Data taken from Petersen .
Comparative benchmark of different methods in signal peptide detection and cleavage site prediction on the SPDS17 independent dataset
| Method | Eukaryotes MCC | Eukaryotes FPRT | Eukaryotes F1cs | Gram-positive MCC | Gram-positive FPRT | Gram-positive F1cs | Gram-negative MCC | Gram-negative FPRT | Gram-negative F1cs |
|---|---|---|---|---|---|---|---|---|---|
| SPOCTOPUS | 0.54 | 16.7 | 0.20 | 0.28 | 20.2 | 0.37 | 0.63 | 14.3 | 0.12 |
| PRED-TAT | 0.55 | 9.3 | 0.33 | 0.26 | 2.2 | 0.72 | 0.82 | 9.9 | 0.14 |
| Philius | 0.62 | 6.5 | 0.46 | 0.31 | 3.4 | 0.72 | 0.87 | 7.4 | 0.22 |
| PolyPhobius | 0.73 | 7.4 | 0.42 | 0.44 | 11.2 | 0.53 | 0.80 | 7.9 | 0.06 |
| TOPCONS2.0 | 0.74 | 5.3 | 0.27 | 0.49 | 4.5 | 0.60 | 0.91 | 2.6 | 0.08 |
| SignalP4.1 | 0.82 | 4.0 | 0.69 | 0.50 | 0.0 | 0.79 | 0.93 | 4.2 | 0.33 |
| DeepSig | 0.86 | 2.5 | 0.72 | 0.54 | 0.0 | 0.82 | 0.95 | 2.6 | 0.36 |
Note: MCC, Matthews Correlation Coefficient; FPRT, false positive rate on transmembrane proteins; F1cs, harmonic mean of precision and recall on cleavage-site detection.
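The three scores named in the table notes can be computed from confusion-matrix counts. The sketch below uses the standard textbook definitions; the function names are hypothetical, and how true and false positives are counted for cleavage sites is an assumption here, since the notes do not spell it out. For FPRT, `fp` is taken to be the number of transmembrane proteins wrongly predicted to carry a signal peptide and `tn` the number correctly rejected.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; returns 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def false_positive_rate(fp, tn):
    """False positive rate as a percentage of all negative examples."""
    total = fp + tn
    return 100.0 * fp / total if total else 0.0

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # Made-up counts for illustration only; not data from the tables above.
    print(mcc(tp=90, tn=880, fp=10, fn=20))
    print(false_positive_rate(fp=5, tn=95))
    print(f1_score(tp=8, fp=2, fn=2))
```

Returning 0.0 for degenerate inputs (for example, no positive predictions at all) is a common convention rather than something the source specifies.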