Ravindra Kumar, Bandana Kumari, Manish Kumar.
Abstract
BACKGROUND: The endoplasmic reticulum plays an important role in many cellular processes, including protein synthesis, folding and post-translational processing of newly synthesized proteins. It is also the site for quality control of misfolded proteins and the entry point of extracellular proteins to the secretory pathway. Hence, at any given time, the endoplasmic reticulum contains two different cohorts of proteins: (i) proteins involved in endoplasmic reticulum-specific functions, which reside in the lumen of the endoplasmic reticulum and are called endoplasmic reticulum resident proteins, and (ii) proteins that are in the process of moving to the extracellular space. Thus, endoplasmic reticulum resident proteins must somehow be distinguished from newly synthesized secretory proteins, which pass through the endoplasmic reticulum on their way out of the cell. Only about 50% of the proteins used as training data in this study had an endoplasmic reticulum retention signal, which shows that these signals are not necessarily present in all endoplasmic reticulum resident proteins. This also strongly indicates that additional factors play a role in retaining endoplasmic reticulum-specific proteins inside the endoplasmic reticulum.
Keywords: Amino acid composition; Compositional difference; Leave-one-out cross-validation; Pseudo amino acid composition; Split amino acid composition
Year: 2017 PMID: 28890846 PMCID: PMC5588793 DOI: 10.7717/peerj.3561
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Distribution of ER resident and non-resident proteins in different datasets.
| Proteins | Training dataset | Independent dataset |
|---|---|---|
| Endoplasmic Reticulum Resident Proteins | 124 | 65 |
| Non-Endoplasmic Reticulum Resident Proteins | 1,200 | 2,900 |
Figure 1: Prediction schema of endoplasmic reticulum resident proteins using split amino acid based input to SVM.
Figure 2: Relative enrichment and depletion profile of amino acids in ERRPs with reference to non-ERRPs.
A negative value indicates the depletion and a positive value indicates the enrichment of amino acid.
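The profile in Figure 2 compares per-residue frequencies between the two classes. The exact formula is not given in this excerpt, so the sketch below assumes the simplest definition: the mean percentage composition in ERRPs minus the mean percentage composition in non-ERRPs, which gives positive values for enrichment and negative values for depletion. The function names and the toy sequences are illustrative only and are not taken from the study.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def composition(seq):
    """Percentage of each standard amino acid in a single sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return {a: 100.0 * counts[a] / total for a in AMINO_ACIDS}

def mean_composition(seqs):
    """Average the per-sequence percentage compositions over a dataset."""
    comps = [composition(s) for s in seqs]
    return {a: sum(c[a] for c in comps) / len(comps) for a in AMINO_ACIDS}

def composition_difference(positives, negatives):
    """Assumed definition: mean % in positives minus mean % in negatives."""
    pos = mean_composition(positives)
    neg = mean_composition(negatives)
    return {a: pos[a] - neg[a] for a in AMINO_ACIDS}

# Toy sequences for illustration only (not real proteins).
errp_like = ["MKKDEL", "MLLKDEL"]
other = ["MAAGGS", "MGGSSA"]
diff = composition_difference(errp_like, other)
```

Because every composition sums to 100%, the differences across the 20 residues sum to zero: enrichment of some residues is always balanced by depletion of others.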
Performance of SVM models based on different input vectors during leave-one-out cross-validation.
| Input vector | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | AUC |
|---|---|---|---|---|---|
| AAC | 72.58 | 73.42 | 73.34 | 0.29 | 0.78 |
| Pseudo AAC | 70.97 | 75.25 | 74.85 | 0.30 | 0.77 |
| Dipeptide Composition | 69.35 | 72.58 | 72.28 | 0.26 | 0.76 |
| N-ter-SAAC | 75.00 | 80.00 | 79.53 | 0.37 | 0.83 |
| C-ter-SAAC | 72.58 | 70.92 | 71.07 | 0.27 | 0.77 |
| SAAC-3 parts | 79.84 | 81.58 | 81.42 | 0.42 | 0.85 |
Notes.
AAC: amino acid composition; Pseudo AAC: pseudo amino acid composition; N-ter-SAAC: composition of the 25 N-terminal residues and of the remaining sequence; C-ter-SAAC: composition of the 25 C-terminal residues and of the remaining sequence; SAAC-3 parts: composition of the 25 N-terminal residues, the 25 C-terminal residues and the remaining sequence. MCC and AUC denote the Matthews correlation coefficient and the area under the ROC curve, respectively.
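The measures in this table all follow from the four confusion-matrix counts. The counts themselves are not reported in this excerpt; the ones used below are inferred from the SAAC-3 parts row together with the training set sizes (124 ERRPs, 1,200 non-ERRPs) and should be read as an assumption. The function name is illustrative.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, accuracy (all in %) and MCC."""
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; 0.0 is used here by convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc

# Assumed counts: 99 of 124 ERRPs and 979 of 1,200 non-ERRPs classified correctly.
sens, spec, acc, mcc = binary_metrics(tp=99, fn=25, tn=979, fp=221)
print(round(sens, 2), round(spec, 2), round(acc, 2), round(mcc, 2))
# prints 79.84 81.58 81.42 0.42
```

With roughly ten negatives per positive, accuracy is dominated by the negative class, which is why the MCC stays modest even when sensitivity and specificity are both near 80%.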
Figure 3: ROC plots of ERRP prediction using different amino acid features.
AAC, PseudoAAC, Dipeptide, N-ter-SAAC, C-ter-SAAC and SAAC-3-parts represent amino acid composition, pseudo-amino acid composition, dipeptide composition, 25 N-terminal and remaining amino acid composition, 25 C-terminal and remaining amino acid composition, and 25 N-terminal, 25 C-terminal and remaining amino acid composition, respectively.
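As a rough illustration of the SAAC-3-parts feature, the sketch below splits a sequence into its 25 N-terminal residues, its 25 C-terminal residues and the remainder, then concatenates the three 20-value compositions into a 60-value vector. The ordering of the three blocks, the use of percentages, and the handling of sequences shorter than 50 residues are assumptions; this excerpt does not specify them. All names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def aac(segment):
    """Percentage composition of a segment; all zeros if the segment is empty.

    The denominator is the full segment length, so non-standard residues
    lower the percentages rather than being ignored.
    """
    n = len(segment)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * segment.count(a) / n for a in AMINO_ACIDS]

def saac_3_parts(seq, n_term=25, c_term=25):
    """Concatenate N-terminal, C-terminal and remainder compositions."""
    seq = seq.upper()
    if len(seq) < n_term + c_term:
        # Assumption: short sequences are rejected rather than padded.
        raise ValueError("sequence shorter than the two terminal windows")
    n_part = seq[:n_term]
    c_part = seq[len(seq) - c_term:]
    remainder = seq[n_term:len(seq) - c_term]
    return aac(n_part) + aac(c_part) + aac(remainder)

# Toy 60-residue sequence for illustration only (not a real protein).
toy = "M" * 25 + "A" * 10 + "K" * 21 + "KDEL"
vec = saac_3_parts(toy)
```

Keeping the terminal windows separate lets a classifier weight position-specific signals, such as a C-terminal retention motif, that whole-sequence composition would average away.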
Comparative performance of ERPred vis-à-vis Cello 2.5, iLoc-Euk and Euk-mPLoc 2.0 on the independent dataset.
| Methods | Sensitivity (%) | Specificity (%) |
|---|---|---|
| ERPred | 72.31 | 83.69 |
| Cello 2.5 | 16.92 | 99.86 |
| iLoc-Euk | 15.38 | 99.76 |
| Euk-mPLoc 2.0 | 66.15 | 99.00 |
Proteome level prediction of ERRPs using ERPred and comparison with ER-GolgiDB and Locate databases.
| Proteome | Number of proteins in complete proteome | Number of ERRPs predicted by ERPred | % of ERRPs in proteome | Number of proteins in ER-GolgiDB | Number of proteins in Locate |
|---|---|---|---|---|---|
| | 68,554 | 2,293 | 3.34 | 2,543 | 1,762 |
| | 45,185 | 1,781 | 3.94 | 2,248 | 1,588 |
| | 22,024 | 707 | 3.21 | 1,075 | – |
| | 26,109 | 1,014 | 3.88 | 1,196 | – |
| | 5,450 | 148 | 2.72 | 407 | – |
| | 31,527 | 1,089 | 3.45 | 1,765 | – |
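The percentage column in this table can be reproduced from the two count columns. The short check below assumes the percentage is simply predicted ERRPs divided by proteome size, rounded to two decimals; the variable and function names are illustrative.

```python
# (proteins in proteome, ERRPs predicted by ERPred, reported %) per table row
rows = [
    (68554, 2293, 3.34),
    (45185, 1781, 3.94),
    (22024, 707, 3.21),
    (26109, 1014, 3.88),
    (5450, 148, 2.72),
    (31527, 1089, 3.45),
]

def percent_predicted(total, predicted):
    """Share of the proteome predicted as ERRPs, in percent."""
    return round(100.0 * predicted / total, 2)

# Collect any row whose recomputed percentage differs from the reported one.
mismatches = [r for r in rows if percent_predicted(r[0], r[1]) != r[2]]
print(mismatches)  # prints []
```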