Ying Wang, Lin-Lin Wang, Leon Wong, Yang Li, Lei Wang, Zhu-Hong You.
Abstract
Proteins are the basic organic constituents of the cell and underpin both life activity and biological function. Elucidating how proteins interact and function is therefore a central task in the life sciences. Self-interacting proteins (SIPs) are an important class of protein interaction. The rapid growth of high-throughput experimental techniques has produced a large volume of SIP data, and making effective use of these data has become a new challenge in related fields such as biology and medicine. In this work, we design SIPGCN, an SIP prediction method that applies a deep learning graph convolutional network (GCN) to protein sequences. First, protein sequences are characterized using a position-specific scoring matrix (PSSM), which captures evolutionary information; next, their hidden features are extracted by the GCN; finally, a random forest predicts whether a protein is self-interacting. In the cross-validation experiment, SIPGCN achieved 93.65% accuracy and 99.64% specificity on the human data set, and 90.69% accuracy and 99.08% specificity on the yeast data set. SIPGCN also compared favorably with other feature models and previous methods. These outcomes suggest that SIPGCN may be a suitable tool for predicting SIPs and may provide reliable candidates for future wet-lab experiments.
Keywords: graph convolutional networks; protein–protein interactions; random forest; self-interacting protein
Year: 2022 PMID: 35884848 PMCID: PMC9313220 DOI: 10.3390/biomedicines10071543
Source DB: PubMed Journal: Biomedicines ISSN: 2227-9059
Figure 1. The flowchart of SIPGCN.
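The GCN stage of the pipeline can be illustrated with a single propagation step. The sketch below assumes the widely used normalized form ReLU(D^-1/2 (A + I) D^-1/2 H W); this record does not state which GCN variant, graph construction, or layer sizes SIPGCN uses, so the toy graph, features, weights, and function names are all hypothetical.

```python
import math

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square adjacency matrix (nested lists)."""
    n = len(adj)
    # Add self-loops so every node keeps a share of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(x, y):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def gcn_layer(adj, features, weights):
    """One graph-convolution step: ReLU(A_norm @ H @ W)."""
    propagated = matmul(matmul(normalize_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in propagated]

# Hypothetical toy input: a 3-node path graph, 2 input features, 2 hidden units.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[0.5, -1.0], [0.25, 1.0]]
hidden = gcn_layer(adj, features, weights)
```

In a pipeline like the one described in the abstract, rows of `hidden` would play the role of the extracted features passed to the downstream classifier; here they are only toy values.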
The five-fold cross-validation (FFCV) outcomes attained using SIPGCN in the human data set.
| Testing Set | Acc. | Spe. | MCC | F1 | AUC |
|---|---|---|---|---|---|
| 1 | 93.53% | 99.68% | 49.63% | 44.72% | 0.6108 |
| 2 | 93.41% | 99.84% | 41.77% | 33.62% | 0.6198 |
| 3 | 92.78% | 99.03% | 37.20% | 34.81% | 0.5841 |
| 4 | 94.10% | 99.85% | 32.09% | 22.64% | 0.5422 |
| 5 | 94.42% | 99.78% | 54.36% | 49.74% | 0.6773 |
| Average | 93.65 ± 0.64% | 99.64 ± 0.35% | 43.01 ± 9.04% | 37.11 ± 10.54% | 0.6068 ± 0.0496 |
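The "Average" row can be checked directly from the per-fold values. The short sketch below recomputes the accuracy entry and suggests that the reported spread matches the sample standard deviation (n - 1 denominator); this is an inference from the numbers, not something stated in the source.

```python
from statistics import mean, stdev

# Per-fold accuracy (%) copied from the human data set table above.
fold_accuracy = [93.53, 93.41, 92.78, 94.10, 94.42]

avg = mean(fold_accuracy)
spread = stdev(fold_accuracy)  # sample standard deviation (n - 1 denominator)
summary = f"{avg:.2f} ± {spread:.2f}%"
print(summary)
```

The same check applied to the specificity column also reproduces the tabulated 99.64 ± 0.35%.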
The FFCV outcomes attained using SIPGCN in the yeast data set.
| Testing Set | Acc. | Spe. | F1 | MCC | AUC |
|---|---|---|---|---|---|
| 1 | 91.32% | 99.82% | 44.33% | 49.87% | 0.6599 |
| 2 | 91.08% | 98.84% | 35.09% | 37.04% | 0.6413 |
| 3 | 90.35% | 98.99% | 41.75% | 43.85% | 0.6838 |
| 4 | 90.11% | 98.72% | 37.56% | 39.07% | 0.6181 |
| 5 | 90.60% | 99.01% | 33.14% | 36.13% | 0.6122 |
| Average | 90.69 ± 0.50% | 99.08 ± 0.43% | 38.37 ± 4.63% | 41.19 ± 5.69% | 0.6430 ± 0.0297 |
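Both tables pair very high accuracy and specificity with much lower F1 and MCC, a pattern typical of class-imbalanced data. The sketch below uses the standard confusion-matrix definitions of these metrics, which the source is assumed to follow. The counts are hypothetical, chosen only to show the effect, and are not taken from the paper.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Standard accuracy, specificity, F1 and MCC from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return accuracy, specificity, f1, mcc

# Hypothetical imbalanced fold: 100 positives, 1400 negatives.
acc, spe, f1, mcc = binary_metrics(tp=20, tn=1395, fp=5, fn=80)
print(f"Acc={acc:.4f} Spe={spe:.4f} F1={f1:.4f} MCC={mcc:.4f}")
```

Because negatives dominate, correctly rejecting them keeps accuracy and specificity high even when most positives are missed, while F1 and MCC stay low.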
Figure 2. The ROC generated by SIPGCN in the human data set.
Figure 3. The ROC generated by SIPGCN in the yeast data set.
Figure 4. Learning curve trajectory generated by SIPGCN in the human and yeast data sets.
The outcomes of different classifier models in the human data set.
| Model | Testing Set | Acc. | Spe. | MCC | F1 | AUC |
|---|---|---|---|---|---|---|
| ELM | 1 | 86.88% | 93.55% | 14.33% | 21.38% | 58.32% |
| | 2 | 86.99% | 92.58% | 11.95% | 19.00% | 56.66% |
| | 3 | 88.26% | 94.26% | 16.77% | 23.02% | 56.21% |
| | 4 | 86.62% | 92.72% | 11.27% | 18.56% | 55.46% |
| | 5 | 87.21% | 93.17% | 14.56% | 21.52% | 54.59% |
| | Average | 87.19 ± 0.63% | 93.26 ± 0.68% | 13.78 ± 2.21% | 20.70 ± 1.87% | 56.25 ± 1.40% |
| KNN | 1 | 87.34% | 93.81% | 11.76% | 18.52% | 54.53% |
| | 2 | 87.63% | 93.49% | 15.11% | 21.82% | 59.00% |
| | 3 | 87.17% | 93.00% | 12.81% | 19.78% | 56.21% |
| | 4 | 86.30% | 92.65% | 13.44% | 20.93% | 53.86% |
| | 5 | 87.55% | 93.62% | 13.25% | 19.96% | 56.66% |
| | Average | 87.20 ± 0.53% | 93.31 ± 0.48% | 13.27 ± 1.22% | 20.20 ± 1.25% | 56.05 ± 2.01% |
| SIPGCN | Average | 93.65 ± 0.64% | 99.64 ± 0.35% | 43.01 ± 9.04% | 37.11 ± 10.54% | 60.68 ± 4.96% |
The outcomes of different classifier models in the yeast data set.
| Model | Testing Set | Acc. | Spe. | MCC | F1 | AUC |
|---|---|---|---|---|---|---|
| ELM | 1 | 79.18% | 87.01% | 8.93% | 20.80% | 55.50% |
| | 2 | 79.82% | 85.93% | 10.53% | 21.32% | 56.27% |
| | 3 | 80.14% | 86.47% | 12.70% | 23.53% | 55.02% |
| | 4 | 80.87% | 86.96% | 17.41% | 27.88% | 59.00% |
| | 5 | 78.39% | 86.04% | 10.14% | 22.48% | 51.64% |
| | Average | 79.68 ± 0.94% | 86.48 ± 0.50% | 11.94 ± 3.35% | 23.20 ± 2.82% | 55.49 ± 2.64% |
| KNN | 1 | 82.32% | 90.59% | 12.60% | 22.54% | 59.12% |
| | 2 | 83.44% | 90.68% | 10.92% | 20.16% | 52.72% |
| | 3 | 81.75% | 90.11% | 12.21% | 22.53% | 54.78% |
| | 4 | 82.88% | 91.35% | 11.35% | 20.82% | 53.99% |
| | 5 | 83.94% | 92.07% | 9.94% | 18.70% | 54.18% |
| | Average | 82.86 ± 0.87% | 90.96 ± 0.76% | 11.40 ± 1.06% | 20.95 ± 1.64% | 54.96 ± 2.44% |
| SIPGCN | Average | 90.69 ± 0.50% | 99.08 ± 0.43% | 41.19 ± 5.69% | 38.37 ± 4.63% | 64.30 ± 2.97% |
Figure 5. The outcomes of different classifier models in the human data set.
Figure 6. The outcomes of different classifier models in the yeast data set.
Comparison of SIPGCN with the AC feature model in the human data set.
| Model | Testing Set | Acc. | Spe. | MCC | F1 | AUC |
|---|---|---|---|---|---|---|
| AC | 1 | 84.12% | 90.71% | 5.05% | 13.75% | 49.99% |
| | 2 | 83.94% | 89.58% | 7.97% | 16.47% | 57.12% |
| | 3 | 83.22% | 89.87% | 6.17% | 15.38% | 52.51% |
| | 4 | 85.04% | 90.15% | 9.46% | 17.20% | 55.81% |
| | 5 | 85.23% | 91.35% | 7.97% | 16.01% | 55.79% |
| | Average | 84.31 ± 0.82% | 90.33 ± 0.71% | 7.32 ± 1.72% | 15.76 ± 1.30% | 54.24 ± 2.93% |
| SIPGCN | Average | 93.65 ± 0.64% | 99.64 ± 0.35% | 43.01 ± 9.04% | 37.11 ± 10.54% | 60.68 ± 4.96% |
Comparison of SIPGCN with the AC feature model in the yeast data set.
| Model | Testing Set | Acc. | Spe. | MCC | F1 | AUC |
|---|---|---|---|---|---|---|
| AC | 1 | 78.62% | 87.73% | 0.88% | 13.07% | 52.37% |
| | 2 | 78.62% | 85.69% | 7.51% | 19.39% | 54.44% |
| | 3 | 80.63% | 87.87% | 10.10% | 20.98% | 58.29% |
| | 4 | 79.02% | 86.57% | 6.37% | 18.18% | 55.90% |
| | 5 | 80.16% | 87.09% | 10.04% | 21.09% | 55.84% |
| | Average | 79.41 ± 0.93% | 86.99 ± 0.90% | 6.98 ± 3.77% | 18.54 ± 3.29% | 55.37 ± 2.17% |
| SIPGCN | Average | 90.69 ± 0.50% | 99.08 ± 0.43% | 41.19 ± 5.69% | 38.37 ± 4.63% | 64.30 ± 2.97% |
Comparison of accuracy among SIPGCN and previous models in the human and yeast data sets.
| Data Set | SIPGCN | SMOTE | PSPEL | RP-FFT | SPAR | LocFuse |
|---|---|---|---|---|---|---|
| Human | 93.65% | 91.68% | 91.30% | 93.54% | 92.09% | 80.66% |
| Yeast | 90.69% | 85.49% | 86.86% | 82.96% | 76.96% | 66.66% |