Meijing Li, Yingying Jiang, Keun Ho Ryu.
Abstract
Protein-protein interaction (PPI) prediction is important for deciphering cellular behaviors. Although many kinds of data and machine learning algorithms have been applied to PPI prediction, performance still needs to improve. In this paper, we propose InferSentPPI, a sentence-embedding-based text mining method that uses gene ontology (GO) information for PPI prediction. First, we design a novel weighted GO term-based protein sentence representation method that, during preprocessing, generates protein sentences carrying multi-semantic information. Gene ontology annotation (GOA) indicates how reliable the relationships between proteins and GO terms are, so GO term-based protein sentences can help improve prediction performance. We then propose InferSent_PN, an algorithm built on the protein sentences and the InferSent algorithm, to extract relations between proteins. In the experiments, we evaluate the effectiveness of InferSentPPI on several benchmark datasets. The results show that the proposed method outperforms state-of-the-art methods on a large PPI dataset.
Keywords: gene ontology; infersent; protein-protein interaction; sentence representations; text mining
Year: 2022 PMID: 35419026 PMCID: PMC8995897 DOI: 10.3389/fgene.2022.827540
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. The workflow of the InferSentPPI method.
FIGURE 2. The workflow of the annotation axiom generation.
FIGURE 3. The workflow of the protein sentence representation.
FIGURE 4. The workflow of the InferSent_PN model.
The number of PPIs in seven test datasets after the preprocessing.
| Label | STRING Yeast | STRING Human | DIP Yeast | HPRD Human | DIP E. coli | DIP H. sapiens | DIP M. musculus |
|---|---|---|---|---|---|---|---|
| Positive | 414,240 | 435,209 | 5,436 | 536 | 1,112 | 981 | 100 |
| Negative | 414,240 | 435,209 | 5,436 | 536 | — | — | — |
| Total | 828,480 | 870,418 | 10,872 | 1,072 | 1,112 | 981 | 100 |
Performance comparison of six methods on the yeast dataset from DIP.
| Method | Accuracy | Precision | Recall | F1 | AUC_ROC | AUC_PR |
|---|---|---|---|---|---|---|
| Resnik_BMA | 0.6957 | | 0.3933 | 0.5638 | 0.8275 | 0.8779 |
| Lin_BMA | 0.7794 | 0.7911 | 0.7591 | 0.7747 | 0.8435 | 0.8434 |
| Wang_BMA | 0.7775 | 0.9265 | 0.6029 | 0.7304 | 0.8406 | 0.8815 |
| Pekar_BMA | 0.7739 | 0.9209 | 0.5993 | 0.7260 | 0.8449 | 0.8828 |
| InferSentPPI_noweight_PGAA | 0.9476 | 0.9371 | 0.9595 | 0.9481 | 0.9868 | 0.9884 |
| InferSentPPI_weight_PGAA | | 0.9346 | | | | |
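
As a reading aid, the F1 column above can be cross-checked against the Precision and Recall columns. The sketch below assumes F1 is the standard harmonic mean of precision and recall, which this record does not state explicitly; the function name `f1_score` and the `reported` dictionary are illustrative, and the numbers are copied from three complete rows of the DIP yeast table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the DIP yeast table.
reported = {
    "Lin_BMA": (0.7911, 0.7591, 0.7747),
    "Wang_BMA": (0.9265, 0.6029, 0.7304),
    "InferSentPPI_noweight_PGAA": (0.9371, 0.9595, 0.9481),
}

for method, (p, r, f1_reported) in reported.items():
    f1_computed = f1_score(p, r)
    # Inputs are rounded to four decimals, so expect agreement only to about 1e-3.
    print(f"{method}: computed F1 = {f1_computed:.4f}, reported F1 = {f1_reported:.4f}")
```

Because the table values are rounded, the recomputed F1 should match the reported one only to within a small rounding tolerance, not exactly.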
Performance comparison of six methods on the human dataset from HPRD.
| Method | Accuracy | Precision | Recall | F1 | AUC_ROC | AUC_PR |
|---|---|---|---|---|---|---|
| Resnik_BMA | 0.611 | | 0.222 | 0.3633 | 0.7661 | 0.815999 |
| Lin_BMA | 0.7129 | 0.6666 | 0.8518 | 0.7479 | 0.7918 | 0.785539 |
| Wang_BMA | 0.6851 | 0.7941 | 0.5 | 0.6136 | 0.7475 | 0.799019 |
| Pekar_BMA | 0.75 | 0.8859 | 0.574 | 0.6966 | 0.8024 | 0.791993 |
| InferSentPPI_noweight_PGAA | 0.8796 | 0.8727 | 0.8888 | 0.8806 | 0.9540 | 0.9544 |
| InferSentPPI_weight_PGAA | | 0.9166 | | | | |
FIGURE 5. ROC curves of five PPI prediction methods on the main dataset. (A) Yeast dataset from DIP. (B) Human dataset from HPRD.
AUC_ROC of three GO information-based methods on the yeast and human datasets from STRING.
| Method | AUC_ROC (STRING Yeast) | AUC_ROC (STRING Human) |
|---|---|---|
| Onto2Vec | 0.7660 | 0.7593 |
| GO2Vec_mhd_goa | 0.8154 | 0.8046 |
| InferSentPPI_weight_PGAA | | |
Performance comparison of three methods on the DIP yeast and HPRD human datasets.
| Data | Method | Accuracy | Precision | Recall | F1 | AUC_ROC | AUC_PR |
|---|---|---|---|---|---|---|---|
| Human | DeepFE_PPI | | | | | — | — |
| Human | InferSentPPI_noweight_PGAA | 0.8796 | 0.8727 | 0.8888 | 0.8806 | 0.9540 | 0.9544 |
| Human | InferSentPPI_weight_GOA | 0.9444 | 0.9166 | 0.9565 | 0.9361 | | |
| Yeast | DeepFE_PPI | 0.944 | | 0.9212 | 0.9426 | 0.9821 | 0.9854 |
| Yeast | InferSentPPI_noweight_PGAA | 0.9476 | 0.9371 | 0.9595 | 0.9481 | 0.9868 | 0.9884 |
| Yeast | InferSentPPI_weight_GOA | | 0.9346 | | | | |
Performance (accuracy) of InferSentPPI on different independent datasets.
| Dataset | Accuracy |
|---|---|
| Yeast | 0.9522 |
| | 0.95 |
| | 0.8974 |
| | 0.9073 |