Yang-Yang Miao1,2, Wei Zhao1, Guang-Ping Li1, Yang Gao3, Pu-Feng Du1.
Abstract
Background: The endoplasmic reticulum (ER) is an important organelle in eukaryotic cells. It is involved in many important biological processes, such as cell metabolism, protein synthesis, and post-translational modification. The proteins that reside within the ER are called ER-resident proteins. These proteins are closely related to the biological functions of the ER. The difference between the ER-resident proteins and other non-resident proteins should be carefully studied.
Keywords: endoplasmic reticulum resident protein; leave-one-out cross-validation; weight transfer; pseudo-amino acid composition; support vector machine
Year: 2019 PMID: 31921288 PMCID: PMC6932965 DOI: 10.3389/fgene.2019.01231
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Breakdown of the dataset.
| Data set | ERRP | non-ERRP |
|---|---|---|
| Training set | 124 | 1200 |
| Independent testing set | 65 | 2900 |
ERRP, Endoplasmic reticulum resident proteins.
non-ERRP, Non-endoplasmic reticulum resident proteins.
Figure 1. Flowchart of the algorithm. The input sequence will be first converted to matrix-based notations. These notations will be converted into fixed-length numerical vectors, which can represent the sequence order information, the evolutionary information, and the importance of the terminal signaling peptides.
Prediction performance estimations using a jackknife test.
| Methods | Sensitivity | Specificity | Accuracy | MCC |
|---|---|---|---|---|
| This work | 83.1% | 86.4% | 86.1% | 50.6% |
| ERPred | 79.8% | 81.6% | 81.4% | 42.0% |
Prediction performance comparison using the independent dataset.
| Methods | Sensitivity | Specificity |
|---|---|---|
| This work | 85.7% | 67.2% |
| ERPred | 72.3% | 83.7% |
| Cello 2.5 | 16.9% | 99.9% |
| iLoc-Euk | 15.4% | 99.8% |
| Euk-mPloc 2.0 | 66.2% | 99.0% |
Figure 2. Illustration of the U-shaped weight-transfer function with various k values. The U-shaped function transfers weight from the middle part of a sequence to its terminals. The total weight of a sequence does not change after applying the U-shaped weight-transfer function. When the parameter k is 0, every residue in the sequence has equal weight, which gives the same results as using no weight-transfer function. As the value of k increases, more and more weight is transferred from the residues in the middle part of a sequence to the residues at its terminals.
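The properties listed in the caption (equal weights at k = 0, unchanged total weight, and more weight at the terminals as k grows) can be illustrated with a small sketch. The exact form of the authors' function is not given in this record, so the parabolic shape below is purely an assumption chosen to satisfy those three properties. The name `u_shaped_weights` is hypothetical.

```python
def u_shaped_weights(length, k):
    """Per-residue weights for a sequence of `length` residues.

    ASSUMED form (not the paper's formula): a parabola that is 0 at the
    middle of the sequence and 1 at both terminals, shifted to zero mean
    so that the weights always sum to `length`.
    """
    if length < 2:
        return [1.0] * length
    # Map position i to [-1, 1] and square it: 1 at the ends, 0 mid-sequence.
    shape = [(2 * i / (length - 1) - 1) ** 2 for i in range(length)]
    mean_shape = sum(shape) / length
    # Subtracting the mean keeps the total weight unchanged for any k.
    return [1.0 + k * (s - mean_shape) for s in shape]


for k in (0.0, 0.1, 0.5):
    w = u_shaped_weights(11, k)
    print(f"k={k}: terminal={w[0]:.3f}  middle={w[5]:.3f}  total={sum(w):.3f}")
```

In this assumed form the middle weights become negative once k is large (roughly k > 3 for long sequences), so a practical version would need to bound k or clip the weights.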
Figure 3. Performance analysis with different weight-transfer functions. Prediction performance varies with the value of the parameter k in the weight-transfer function and peaks at k = 0.1. This means that the residues at the terminals are slightly more important than those in the middle part for predicting ER-resident proteins.