Yadong Tang, Lu Xie, Lanming Chen.
Abstract
Apoptosis proteins (APs) control normal tissue homeostasis by regulating the balance between cell proliferation and death. The function of APs is strongly related to their subcellular location. Computational methods that reliably identify the subcellular location of APs have been reported; however, there is still room to improve prediction accuracy. In this study, we developed a novel method named iAPSL-IF (identification of apoptosis protein subcellular location-integrative features), which is based on integrative features captured from Markov chains, physicochemical property matrices, and position-specific score matrices (PSSMs) of amino acid sequences. Matrices of different lengths were transformed into fixed-length feature vectors using an auto cross-covariance (ACC) method. An optimal subset of the features was chosen using a recursive feature elimination (RFE) algorithm, and a support vector machine (SVM) classifier was trained on the sequences represented by these features. iAPSL-IF was evaluated with a jackknife cross-validation test on three datasets: ZD98, CL317, and ZW225. The results showed that iAPSL-IF outperformed the known predictors reported in the literature: its overall accuracy was 98.98% (ZD98), 94.95% (CL317), and 97.33% (ZW225); the Matthews correlation coefficient, sensitivity, and specificity for several classes of subcellular location proteins (e.g., membrane proteins, cytoplasmic proteins, endoplasmic reticulum proteins, nuclear proteins, and secreted proteins) in the datasets were 0.92-1.0, 94.23-100%, and 97.07-100%, respectively. Overall, this study provides a high-throughput, sequence-based method for better identification of the subcellular location of APs and facilitates further understanding of programmed cell death in organisms.
Keywords: Markov chains; apoptosis proteins; physicochemical properties; position specific scoring matrix; recursive feature elimination; support vector machine
Year: 2018; PMID: 29652843; PMCID: PMC5979326; DOI: 10.3390/ijms19041190
Source DB: PubMed; Journal: Int J Mol Sci; ISSN: 1422-0067; Impact factor: 5.923
Figure 1. The flowchart of the iAPSL-IF method. NR: the non-redundant database of the National Center for Biotechnology Information (NCBI) (available online: https://www.ncbi.nlm.nih.gov/).
Figure 2. The effect of g on overall accuracy based on the datasets ZD98 and CL317 after ACC transformation of the physicochemical property matrices.
Figure 3. The effect of g on overall accuracy based on the datasets ZD98 and CL317 after ACC transformation of the PSSMs.
Figure 4. The effect of the Top-K features on overall accuracy based on the datasets ZD98, CL317, and ZW225.
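The ACC transformation named in the abstract and in the captions of Figures 2 and 3 can be illustrated with a minimal sketch. It assumes the common textbook definition of auto cross-covariance (products of mean-centred column values at a given lag, averaged over the available positions) and assumes that g denotes the maximum lag; the function name `acc_transform` and the toy matrices are hypothetical and are not taken from the paper.

```python
def acc_transform(matrix, max_lag):
    """Map an L x N matrix (list of rows) to a vector of N * N * max_lag values.

    Assumed definition: for each lag and each ordered column pair (j1, j2),
    average (m[i][j1] - mean_j1) * (m[i + lag][j2] - mean_j2) over i.
    Pairs with j1 == j2 are the auto-covariance terms, the rest are
    cross-covariance terms.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    if max_lag >= n_rows:
        raise ValueError("max_lag must be smaller than the number of rows")
    # Column means over the whole sequence
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    features = []
    for lag in range(1, max_lag + 1):
        span = n_rows - lag  # number of position pairs separated by this lag
        for j1 in range(n_cols):
            for j2 in range(n_cols):
                total = sum(
                    (matrix[i][j1] - means[j1]) * (matrix[i + lag][j2] - means[j2])
                    for i in range(span)
                )
                features.append(total / span)
    return features


# Two toy "sequences" of different length (3 and 5 rows, 2 columns each)
short = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
long = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], [5.0, 0.0]]
vec_short = acc_transform(short, max_lag=2)
vec_long = acc_transform(long, max_lag=2)
print(len(vec_short), len(vec_long))
```

Both toy matrices yield vectors of the same length (2 columns × 2 columns × 2 lags), which is the property that lets sequences of different lengths share one classifier input size.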
Performance of the iAPSL-IF on the three datasets.
| Dataset | Location | Sensitivity (%) | Specificity (%) | MCC | Overall accuracy (%) |
|---|---|---|---|---|---|
| ZD98 | Cyto | 97.67 | 100 | 0.98 | 98.98 |
| | Memb | 100 | 98.53 | 0.98 | |
| | Mito | 100 | 100 | 1.0 | |
| | Other | 100 | 100 | 1.0 | |
| CL317 | Cyto | 95.54 | 97.07 | 0.92 | 94.95 |
| | Memb | 94.55 | 98.85 | 0.93 | |
| | Mito | 88.24 | 98.94 | 0.88 | |
| | Secr | 100 | 99.67 | 0.97 | |
| | Nucl | 94.23 | 98.87 | 0.93 | |
| | Endo | 97.87 | 100 | 0.99 | |
| ZW225 | Cyto | 100 | 98.71 | 0.98 | 97.33 |
| | Memb | 98.88 | 99.26 | 0.98 | |
| | Mito | 88.00 | 99.50 | 0.90 | |
| | Nucl | 95.12 | 98.98 | 0.94 | |
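The per-class measures in the table above can be illustrated with a short sketch that uses the standard one-vs-rest definitions of sensitivity, specificity, and the Matthews correlation coefficient (MCC). The label lists are an assumption made for illustration: they use the commonly cited ZD98 class sizes (43 Cyto, 30 Memb, 13 Mito, 12 Other) and a single Cyto protein assigned to Memb, which is one reading that happens to reproduce the ZD98 values; the actual predictions are not given here, and the function names are hypothetical.

```python
import math


def one_vs_rest_metrics(true_labels, predicted_labels, positive):
    """Return (sensitivity, specificity, MCC) treating `positive` as the target class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of proteins whose predicted location equals the true one."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Assumed labels: class sizes and the single error are illustrative assumptions
truth = ["Cyto"] * 43 + ["Memb"] * 30 + ["Mito"] * 13 + ["Other"] * 12
pred = list(truth)
pred[0] = "Memb"  # one Cyto protein assumed to be misassigned to Memb

cyto = one_vs_rest_metrics(truth, pred, "Cyto")
memb = one_vs_rest_metrics(truth, pred, "Memb")
accuracy = overall_accuracy(truth, pred)
print([round(100 * cyto[0], 2), round(100 * cyto[1], 2), round(cyto[2], 2)])
print([round(100 * memb[0], 2), round(100 * memb[1], 2), round(memb[2], 2)])
print(round(100 * accuracy, 2))
```

In a jackknife test each protein is predicted by a model trained on all the others, so `pred` would be filled one protein at a time before these measures are computed.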
Performance comparison of different methods on the ZD98 dataset.
| Method | Cyto (%) | Memb (%) | Mito (%) | Other (%) | Overall accuracy (%) | Reference |
|---|---|---|---|---|---|---|
| Covariant | 97.7 | 73.3 | 30.8 | 25.0 | 72.5 | [ |
| ID_SVM | 95.3 | 93.3 | 84.6 | 58.3 | 88.8 | [ |
| DWT_SVM | 95.4 | 93.3 | 53.9 | 91.7 | 88.8 | [ |
| ID | 90.7 | 90.0 | 92.3 | 91.7 | 90.8 | [ |
| EBGW_SVM | 97.7 | 90.0 | 92.3 | 83.3 | 92.9 | [ |
| PseAAC_SVM | 95.3 | 93.3 | 92.3 | 83.3 | 92.9 | [ |
| DF_SVM | 97.7 | 96.7 | 92.3 | 75.0 | 93.9 | [ |
| Dual_layer SVM | 95.4 | 96.7 | 92.3 | 91.7 | 94.9 | [ |
| APSLAP | 95.3 | 90.0 | 100 | 91.7 | 94.9 | [ |
| FKNN | 95.3 | 96.7 | 100 | 91.7 | 95.9 | [ |
| PSSM-AC | 97.7 | 96.7 | 100 | 83.3 | 95.9 | [ |
| PSSM-trigram | 95.3 | 100 | 100 | 91.7 | 96.9 | [ |
| iAPSL-IF | 97.7 | 100 | 100 | 100 | 99.0 | This study |
Performance comparison of different methods on the CL317 dataset.
| Method | Cyto (%) | Memb (%) | Mito (%) | Secr (%) | Nucl (%) | Endo (%) | Overall accuracy (%) | Reference |
|---|---|---|---|---|---|---|---|---|
| ID | 81.3 | 81.8 | 85.3 | 88.2 | 82.7 | 83.0 | 82.7 | [ |
| ID_SVM | 91.1 | 89.1 | 79.4 | 58.8 | 73.1 | 87.2 | 84.2 | [ |
| DF_SVM | 92.9 | 85.5 | 76.5 | 76.5 | 93.6 | 86.5 | 88.0 | [ |
| Auto_Cova | 86.4 | 90.7 | 93.8 | 85.7 | 92.1 | 93.8 | 90.0 | [ |
| FKNN | 93.8 | 92.7 | 82.4 | 76.5 | 90.4 | 93.6 | 90.9 | [ |
| PseAAC_SVM | 93.8 | 90.9 | 85.3 | 76.5 | 90.4 | 95.7 | 91.1 | [ |
| EN_FKNN | 98.2 | 83.6 | 79.4 | 82.4 | 90.4 | 97.9 | 91.5 | [ |
| PSSM-AC | 93.8 | 90.9 | 91.2 | 82.4 | 86.5 | 95.7 | 91.5 | [ |
| APSLAP | 99.1 | 89.1 | 85.3 | 88.2 | 84.3 | 95.8 | 92.4 | [ |
| EI_SVM | 94.6 | 95.7 | 92.7 | 82.4 | 90.4 | 70.6 | 91.1 | [ |
| iAPSL-IF | 95.5 | 94.5 | 88.2 | 100 | 94.2 | 97.9 | 95.0 | This study |
Performance comparison of different methods on the ZW225 dataset.
| Method | Cyto (%) | Memb (%) | Mito (%) | Nucl (%) | Overall accuracy (%) | Reference |
|---|---|---|---|---|---|---|
| EBGW_SVM | 90.0 | 93.3 | 60.0 | 63.4 | 83.1 | [ |
| DF_SVM | 87.1 | 92.1 | 64.0 | 73.2 | 84.0 | [ |
| PSSM-AC | 82.9 | 92.1 | 68.0 | 78.0 | 84.0 | [ |
| ID_SVM | 92.9 | 91.0 | 68.0 | 73.2 | 85.8 | [ |
| Auto_Cova | 81.3 | 93.3 | 85.7 | 84.6 | 87.1 | [ |
| EN_FKNN | 94.3 | 94.4 | 60.0 | 80.5 | 88.0 | [ |
| PSSM-trigram | 97.1 | 98.9 | 96.0 | 97.6 | 97.8 | [ |
| iAPSL-IF | 100 | 98.9 | 88.0 | 95.1 | 97.3 | This study |
The original values of the ten physicochemical properties for all amino acids [41].
| AA | P(1) | P(2) | P(3) | P(4) | P(5) | P(6) | P(7) | P(8) | P(9) | P(10) |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 8.100 | −1.302 | −0.733 | 1.57 | −0.146 | 0.620 | −0.500 | 27.500 | 0.046 | 1.181 |
| C | 5.500 | 0.465 | −0.862 | −1.02 | −0.255 | 0.290 | −1.000 | 44.600 | 0.128 | 1.461 |
| D | 13.000 | 0.302 | −3.656 | −0.259 | −3.242 | −0.900 | 3.000 | 40.000 | 0.105 | 1.587 |
| E | 12.300 | −1.453 | 1.477 | 0.113 | −0.837 | −0.740 | 3.000 | 62.000 | 0.151 | 1.862 |
| F | 5.200 | −0.59 | 1.891 | −0.397 | 0.412 | 1.190 | −2.500 | 115.500 | 0.290 | 2.228 |
| G | 9.000 | 1.652 | 1.33 | 1.045 | 2.064 | 0.480 | 0.000 | 0.000 | 0.000 | 0.881 |
| H | 10.400 | −0.417 | −1.673 | −1.474 | −0.078 | −0.400 | −0.500 | 79.000 | 0.230 | 2.025 |
| I | 5.200 | −0.547 | 2.131 | 0.393 | 0.816 | 1.380 | −1.800 | 93.500 | 0.186 | 1.810 |
| K | 11.300 | −0.561 | 0.533 | −0.277 | 1.648 | −1.500 | 3.000 | 100.000 | 0.219 | 2.258 |
| L | 4.900 | −0.987 | −1.505 | 1.266 | −0.912 | 1.060 | −1.800 | 93.500 | 0.186 | 1.931 |
| M | 5.700 | −1.524 | 2.219 | −1.005 | 1.212 | 0.640 | −1.300 | 94.100 | 0.221 | 2.034 |
| N | 11.600 | 0.828 | 1.299 | −0.169 | 0.933 | −0.780 | 2.000 | 58.700 | 0.134 | 1.655 |
| P | 8.000 | 2.081 | −1.628 | 0.421 | −1.392 | 0.120 | 0.000 | 41.900 | 0.131 | 1.468 |
| Q | 10.500 | −0.179 | −3.005 | −0.503 | −1.853 | −0.850 | 0.200 | 80.700 | 0.180 | 1.932 |
| R | 10.500 | −0.055 | 1.502 | 0.44 | 2.897 | −2.530 | 3.000 | 105.000 | 0.291 | 2.560 |
| S | 9.200 | 1.399 | −4.76 | 0.67 | −2.647 | −0.180 | 0.300 | 29.300 | 0.062 | 1.298 |
| T | 8.000 | 0.326 | 2.213 | 0.908 | 1.313 | −0.050 | −0.400 | 51.300 | 0.108 | 1.525 |
| V | 5.900 | −0.279 | −0.544 | 1.242 | −1.262 | 1.080 | −1.500 | 71.500 | 0.140 | 1.645 |
| W | 5.400 | 0.009 | 0.672 | −2.128 | −0.184 | 0.810 | −3.400 | 145.500 | 0.409 | 2.663 |
| Y | 6.200 | 0.83 | 3.097 | −0.838 | 1.512 | 0.260 | −2.300 | 117.300 | 0.298 | 2.368 |
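As a rough illustration of how such a table turns a sequence into a physicochemical property matrix, the sketch below maps each residue to a row of property values. It copies only the P(1) and P(6) columns from the table to stay short; the names `PROPERTIES` and `encode_sequence` and the example peptides are hypothetical, and any normalisation the authors may have applied before the ACC step is not shown.

```python
# (P(1), P(6)) values copied from the table above; the other eight
# properties are omitted to keep the sketch short.
PROPERTIES = {
    "A": (8.1, 0.62), "C": (5.5, 0.29), "D": (13.0, -0.9), "E": (12.3, -0.74),
    "F": (5.2, 1.19), "G": (9.0, 0.48), "H": (10.4, -0.4), "I": (5.2, 1.38),
    "K": (11.3, -1.5), "L": (4.9, 1.06), "M": (5.7, 0.64), "N": (11.6, -0.78),
    "P": (8.0, 0.12), "Q": (10.5, -0.85), "R": (10.5, -2.53), "S": (9.2, -0.18),
    "T": (8.0, -0.05), "V": (5.9, 1.08), "W": (5.4, 0.81), "Y": (6.2, 0.26),
}


def encode_sequence(sequence):
    """Return an L x 2 matrix (list of rows), one row per residue."""
    rows = []
    for residue in sequence.upper():
        if residue not in PROPERTIES:
            # Non-standard residues are rejected here; other policies are possible
            raise ValueError(f"unsupported residue: {residue!r}")
        rows.append(list(PROPERTIES[residue]))
    return rows


# Hypothetical peptides of different length give matrices with different row counts
matrix_a = encode_sequence("MKV")
matrix_b = encode_sequence("ACDEFG")
print(len(matrix_a), len(matrix_b))
```

The row count follows the sequence length while the column count stays fixed, which is why a length-independent transformation such as ACC is needed before classification.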