Zhan-Heng Chen, Zhu-Hong You, Li-Ping Li, Yan-Bin Wang, Leon Wong, Hai-Cheng Yi.
Abstract
Predicting self-interacting proteins (SIPs) is an important problem in bioinformatics. SIPs are proteins in which two or more identical copies, expressed from the same gene, interact with each other. Such interactions play a major role in the evolution of protein‒protein interactions (PPIs) and in cellular functions. Because experimental identification of SIPs has limitations, it is increasingly important to develop computational tools that predict SIPs from protein sequence information. We therefore propose a novel prediction model, RP-FFT, which combines the Random Projection (RP) model with the Fast Fourier Transform (FFT) to detect SIPs. First, each protein sequence was transformed into a Position-Specific Scoring Matrix (PSSM) using Position-Specific Iterated BLAST (PSI-BLAST). Second, features were extracted from each PSSM with the FFT. Finally, we evaluated RP-FFT and compared the RP classifier with the state-of-the-art support vector machine (SVM) classifier and other existing methods on human and yeast datasets. Under five-fold cross-validation, RP-FFT achieved average accuracies of 96.28% and 91.87% on the human and yeast datasets, respectively. These results indicate that the RP-FFT prediction model is reasonable and robust.
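The feature-extraction step can be illustrated with a minimal sketch. The abstract does not specify how the FFT output is turned into a fixed-length vector, so the scheme below is an assumption rather than the authors' procedure: the FFT is taken along the sequence axis of an (L, 20) PSSM, and the magnitudes of the first `n_coeffs` frequency components of each column are kept. The function name `fft_features` and the parameter `n_coeffs` are hypothetical.

```python
import numpy as np

def fft_features(pssm: np.ndarray, n_coeffs: int = 20) -> np.ndarray:
    """Map an (L, 20) PSSM to a fixed-length vector of FFT magnitudes.

    Assumed scheme (not taken from the paper): keep the first n_coeffs
    frequency components of every PSSM column.
    """
    length = pssm.shape[0]
    # Zero-pad short sequences so at least n_coeffs components exist.
    spectrum = np.fft.fft(pssm, n=max(length, n_coeffs), axis=0)
    # Keep low-frequency magnitudes and flatten to n_coeffs * 20 values.
    return np.abs(spectrum[:n_coeffs]).ravel()

# Toy PSSM with random integer scores, standing in for PSI-BLAST output.
rng = np.random.default_rng(0)
toy_pssm = rng.integers(-5, 6, size=(37, 20)).astype(float)
features = fft_features(toy_pssm)
print(features.shape)  # (400,)
```

Whatever the sequence length, every protein yields a vector of the same size, which is what a downstream classifier such as RP or SVM needs.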
Keywords: fast Fourier transform; position-specific scoring matrix; random projection; self-interacting proteins
Year: 2019 PMID: 30795499 PMCID: PMC6412412 DOI: 10.3390/ijms20040930
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
The results of the RP-FFT method with 5-fold cross-validation on the human dataset.
| Testing Set | Acc. (%) | Sen. (%) | Spe. (%) | MCC (%) |
|---|---|---|---|---|
| 1 | 94.44 | 88.28 | 100.00 | 89.36 |
| 2 | 92.53 | 85.37 | 100.00 | 86.07 |
| 3 | 92.19 | 85.48 | 100.00 | 85.51 |
| 4 | 93.75 | 86.76 | 100.00 | 88.08 |
| 5 | 94.81 | 89.73 | 100.00 | 90.12 |
| Average | 93.54 ± 1.15 | 87.12 ± 1.87 | 100.00 ± 0.00 | 87.83 ± 2.01 |
The results of the RP-FFT method with 5-fold cross-validation on the yeast dataset.
| Testing Set | Acc. (%) | Sen. (%) | Spe. (%) | MCC (%) |
|---|---|---|---|---|
| 1 | 80.99 | 97.12 | 65.52 | 65.71 |
| 2 | 83.45 | 92.14 | 75.00 | 68.03 |
| 3 | 82.04 | 97.89 | 66.20 | 67.57 |
| 4 | 84.86 | 95.14 | 74.29 | 71.13 |
| 5 | 83.45 | 92.41 | 74.10 | 67.83 |
| Average | 82.96 ± 1.48 | 94.94 ± 2.63 | 71.02 ± 4.73 | 68.05 ± 1.95 |
The results of the RP-FFT method with 5-fold cross-validation on the human dataset.
| Testing Set | Acc. (%) | Sen. (%) | Spe. (%) | MCC (%) | B_Acc. (%) |
|---|---|---|---|---|---|
| 1 | 96.23 | 79.51 | 97.74 | 75.72 | 88.63 |
| 2 | 96.20 | 80.34 | 97.65 | 75.89 | 89.00 |
| 3 | 96.58 | 82.49 | 97.89 | 78.61 | 90.19 |
| 4 | 96.40 | 79.78 | 97.79 | 75.40 | 88.79 |
| 5 | 96.00 | 85.28 | 97.01 | 76.68 | 91.15 |
| Average | 96.28 ± 0.22 | 81.48 ± 2.43 | 97.62 ± 0.35 | 76.46 ± 1.29 | 89.55 ± 1.08 |
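The Average row can be reproduced from the per-fold values. The short check below recomputes the mean and spread for two columns of the table above; it suggests that the ± values are sample standard deviations (n − 1 in the denominator), since the population form would give 0.20 rather than 0.22 for accuracy. The helper name `summarize` is illustrative only.

```python
from statistics import mean, stdev

# Per-fold accuracy and sensitivity (%) copied from the human-dataset table.
acc = [96.23, 96.20, 96.58, 96.40, 96.00]
sen = [79.51, 80.34, 82.49, 79.78, 85.28]

def summarize(values):
    """Return (mean, sample standard deviation), rounded to two decimals."""
    return round(mean(values), 2), round(stdev(values), 2)

print(summarize(acc))  # (96.28, 0.22)
print(summarize(sen))  # (81.48, 2.43)
```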
The results of the RP-FFT method with 5-fold cross-validation on the yeast dataset.
| Testing Set | Acc. (%) | Sen. (%) | Spe. (%) | MCC (%) | B_Acc. (%) |
|---|---|---|---|---|---|
| 1 | 91.32 | 50.00 | 96.73 | 53.09 | 73.37 |
| 2 | 91.72 | 47.33 | 97.81 | 55.35 | 72.57 |
| 3 | 92.20 | 49.63 | 97.39 | 54.80 | 73.51 |
| 4 | 91.00 | 42.36 | 97.36 | 49.06 | 69.86 |
| 5 | 93.09 | 54.74 | 97.83 | 60.82 | 76.29 |
| Average | 91.87 ± 0.82 | 48.81 ± 4.50 | 97.42 ± 0.45 | 54.62 ± 4.25 | 73.12 ± 2.30 |
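For readers unfamiliar with the column headers, the sketch below computes the five reported metrics from confusion-matrix counts using their standard definitions. The counts are made up for illustration and are not taken from the paper; the function name `binary_metrics` is likewise hypothetical. The balanced-accuracy definition used here, the mean of sensitivity and specificity, is consistent with the B_Acc column in the tables.

```python
from math import sqrt

def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard binary-classification metrics as fractions in [0, 1] ([-1, 1] for MCC)."""
    sen = tp / (tp + fn)                      # sensitivity (recall on positives)
    spe = tn / (tn + fp)                      # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fn + tn + fp)     # overall accuracy
    # Matthews correlation coefficient; no guard for a zero denominator here.
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    b_acc = (sen + spe) / 2                   # balanced accuracy
    return {"acc": acc, "sen": sen, "spe": spe, "mcc": mcc, "b_acc": b_acc}

# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
m = binary_metrics(tp=40, fn=10, tn=45, fp=5)
print({k: round(v, 4) for k, v in m.items()})
```

As a check against the table, fold 1 of the yeast dataset gives (50.00 + 96.73) / 2 = 73.365, which matches the reported B_Acc of 73.37 up to rounding.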
The results of RP classifier based on different feature extraction methods on the yeast dataset.
| Feature Extraction Methods | Acc. (%) | Sen. (%) | Spe. (%) | MCC (%) | B_Acc. (%) |
|---|---|---|---|---|---|
| SVD | 88.73 ± 0.75 | 10.25 ± 2.93 | 98.86 ± 0.43 | 19.76 ± 2.96 | 54.55 ± 1.31 |
| DCT | 90.35 ± 0.84 | 20.38 ± 2.62 | 99.36 ± 0.32 | 37.57 ± 1.74 | 59.87 ± 1.18 |
| COV | 91.93 ± 0.81 | 42.43 ± 4.82 | 98.31 ± 0.25 | 53.10 ± 4.91 | 70.37 ± 2.49 |
| FFT | 91.87 ± 0.82 | 48.81 ± 4.50 | 97.42 ± 0.45 | 54.62 ± 4.25 | 73.12 ± 2.30 |
Performance comparison of RP and SVM on the human dataset.
| Model | Testing Set | Acc. (%) | Sen. (%) | Spe. (%) | MCC (%) | B_Acc. (%) |
|---|---|---|---|---|---|---|
| RP + FFT | 1 | 96.23 | 79.51 | 97.74 | 75.72 | 88.63 |
| | 2 | 96.20 | 80.34 | 97.65 | 75.89 | 89.00 |
| | 3 | 96.58 | 82.49 | 97.89 | 78.61 | 90.19 |
| | 4 | 96.40 | 79.78 | 97.79 | 75.40 | 88.79 |
| | 5 | 96.00 | 85.28 | 97.01 | 76.68 | 91.15 |
| | Average | 96.28 ± 0.22 | 81.48 ± 2.43 | 97.62 ± 0.35 | 76.46 ± 1.29 | 89.55 ± 1.08 |
| SVM + FFT | 1 | 93.55 | 22.22 | 100.00 | 45.57 | 61.11 |
| | 2 | 93.64 | 23.79 | 100.00 | 47.17 | 61.90 |
| | 3 | 93.21 | 20.54 | 100.00 | 43.73 | 60.27 |
| | 4 | 94.19 | 24.34 | 100.00 | 47.86 | 62.17 |
| | 5 | 93.82 | 28.09 | 100.00 | 51.30 | 64.05 |
| | Average | 93.68 ± 0.36 | 23.80 ± 2.82 | 100.00 ± 0.00 | 47.13 ± 2.82 | 61.90 ± 1.41 |
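The SVM rows combine high accuracy with low sensitivity, which is what one would expect when positives are rare. Overall accuracy is a weighted average of sensitivity and specificity, Acc = p · Sen + (1 − p) · Spe, where p is the fraction of positive samples. The sketch below solves this identity for p using fold 1 of each model; the function name is hypothetical, and because the inputs are rounded table values the result is only approximate.

```python
def implied_positive_fraction(acc: float, sen: float, spe: float) -> float:
    """Solve acc = p * sen + (1 - p) * spe for p (fraction of positives)."""
    return (spe - acc) / (spe - sen)

# Fold 1 values (%) from the human-dataset comparison table.
svm_fold1 = implied_positive_fraction(acc=93.55, sen=22.22, spe=100.00)
rp_fold1 = implied_positive_fraction(acc=96.23, sen=79.51, spe=97.74)
print(f"{svm_fold1:.3f} {rp_fold1:.3f}")  # 0.083 0.083
```

Both models imply roughly 8% positives in that test fold, so a classifier that rejects most positives can still score above 93% accuracy. This is why the MCC and B_Acc columns separate the two models far more clearly than accuracy does.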
Performance comparison of RP and SVM on the yeast dataset.
| Model | Testing Set | Acc. (%) | Sen. (%) | Spe. (%) | MCC (%) | B_Acc. (%) |
|---|---|---|---|---|---|---|
| RP + FFT | 1 | 91.32 | 50.00 | 96.73 | 53.09 | 73.37 |
| | 2 | 91.72 | 47.33 | 97.81 | 55.35 | 72.57 |
| | 3 | 92.20 | 49.63 | 97.39 | 54.80 | 73.51 |
| | 4 | 91.00 | 42.36 | 97.36 | 49.06 | 69.86 |
| | 5 | 93.09 | 54.74 | 97.83 | 60.82 | 76.29 |
| | Average | 91.87 ± 0.82 | 48.81 ± 4.50 | 97.42 ± 0.45 | 54.62 ± 4.25 | 73.12 ± 2.30 |
| SVM + FFT | 1 | 90.11 | 14.58 | 100.00 | 36.22 | 57.29 |
| | 2 | 90.84 | 24.00 | 100.00 | 46.62 | 62.00 |
| | 3 | 90.76 | 14.81 | 100.00 | 36.64 | 57.41 |
| | 4 | 90.51 | 18.06 | 100.00 | 40.38 | 59.03 |
| | 5 | 90.92 | 17.52 | 100.00 | 39.87 | 58.76 |
| | Average | 90.63 ± 0.33 | 17.79 ± 3.80 | 100.00 ± 0.00 | 39.95 ± 4.17 | 58.90 ± 1.90 |
Figure 1. Comparison of ROC curves between RP and SVM on the human dataset (5-fold cross-validation). (a) ROC curve of the SVM method on the human dataset. (b) ROC curve of the RP classifier on the human dataset.
Figure 2. Comparison of ROC curves between RP and SVM on the yeast dataset (5-fold cross-validation). (a) ROC curve of the SVM method on the yeast dataset. (b) ROC curve of the RP classifier on the yeast dataset.
Comparison of RP-FFT with the other existing models on the yeast dataset.
| Model | Acc. (%) | Spe. (%) | Sen. (%) | MCC (%) | B_Acc. (%) |
|---|---|---|---|---|---|
| SLIPPER | 71.90 | 72.18 | 69.72 | 28.42 | 70.95 |
| DXECPPI | 87.46 | 94.93 | 29.44 | 28.25 | 62.19 |
| PPIevo | 66.28 | 87.46 | 60.14 | 18.01 | 73.80 |
| LocFuse | 66.66 | 68.10 | 55.49 | 15.77 | 61.80 |
| CRS | 72.69 | 74.37 | 59.58 | 23.68 | 66.98 |
| SPAR | 76.96 | 80.02 | 53.24 | 24.84 | 66.63 |
| Proposed method | 91.87 | 97.42 | 48.81 | 54.62 | 73.12 |
Comparison of RP-FFT with the other existing models on the human dataset.
| Model | Acc. (%) | Spe. (%) | Sen. (%) | MCC (%) | B_Acc. (%) |
|---|---|---|---|---|---|
| SLIPPER | 91.10 | 95.06 | 47.26 | 41.97 | 71.16 |
| DXECPPI | 30.90 | 25.83 | 87.08 | 8.25 | 56.46 |
| PPIevo | 78.04 | 25.82 | 87.83 | 20.82 | 56.83 |
| LocFuse | 80.66 | 80.50 | 50.83 | 20.26 | 65.67 |
| CRS | 91.54 | 96.72 | 34.17 | 36.33 | 65.45 |
| SPAR | 92.09 | 97.40 | 33.33 | 38.36 | 65.37 |
| Proposed method | 96.28 | 97.62 | 81.48 | 76.46 | 89.55 |