Linh Tran, Tobias Hamp, Burkhard Rost.
Abstract
MOTIVATION: Protein-protein interactions (PPIs) play a key role in many cellular processes. Most annotations of PPIs mix experimental and computational data. The mix optimizes coverage, but obfuscates the annotation origin. Some resources excel at focusing on reliable experimental data. Here, we focused on new pairs of interacting proteins for several model organisms based solely on sequence-based prediction methods.
Year: 2018 PMID: 30020956 PMCID: PMC6051629 DOI: 10.1371/journal.pone.0199988
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Data sets extracted from BioGRID, DIP and IntAct.
Organism: Latin name for eight model organisms sorted alphabetically; NPPIs: number of distinct physical pairs of protein-protein interactions extracted by merging the entire BioGRID, DIP, and IntAct; NPPIs with strong evidence: subset of the previous column with reliable experimental evidence (according to [29]); NPPIs with strong evidence, redundancy reduced: subset of the previous column after removing sequence-similar pairs (HVAL > 20).
| Organism | NPPIs | NPPIs with strong evidence | NPPIs with strong evidence, redundancy reduced |
|---|---|---|---|
| | 38,258 | 8,459 | 814 |
| | 23,105 | 5,229 | 818 |
| | 79,291 | 19,033 | 1,680 |
| | 27,119 | 8,587 | 998 |
| | 30,070 | 6,262 | 734 |
| | 4,792 | 1,312 | 239 |
| | 13,478 | 4,396 | 410 |
| | 6,698 | 1,574 | 236 |
| Total | 222,811 | 54,852 | 5,929 |
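The NPPIs column counts distinct physical pairs after merging BioGRID, DIP and IntAct. A minimal sketch of such a merge is shown below. It assumes that an interaction is an unordered pair, so that (A, B) and (B, A) count once; the function name `merge_distinct_pairs` and the protein identifiers are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]


def merge_distinct_pairs(*sources: Iterable[Pair]) -> Set[Pair]:
    """Merge interaction lists and keep each unordered pair once.

    Assumption: (A, B) and (B, A) describe the same interaction, so each
    pair is stored in sorted order before de-duplication.
    """
    distinct: Set[Pair] = set()
    for pairs in sources:
        for a, b in pairs:
            distinct.add((a, b) if a <= b else (b, a))
    return distinct


# Toy inputs with made-up identifiers standing in for the three databases.
biogrid = [("P1", "P2"), ("P2", "P3")]
dip = [("P2", "P1")]                  # same pair as in biogrid, reversed
intact = [("P3", "P4"), ("P2", "P3")]

merged = merge_distinct_pairs(biogrid, dip, intact)
print(len(merged))
```

In practice, identifiers from the three databases would first have to be mapped to a common namespace; that step is outside this sketch.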
Whole interactome predictions.
For each organism investigated, we aggregated the data used for training and testing, trained a final model, and predicted the whole interactome of that organism. Organism: Latin name for eight model organisms sorted alphabetically; Nprot: number of proteins in the proteome (values taken from [28]); NpredPPI: subset of PPIs used for prediction in which both proteins are dissimilar to the proteins in the positive interactions of the training set; NprotPred: corresponding number of proteins for which NpredPPI interactions were predicted (see Eq 1 for the calculation); NpredPPI novel: number of predicted PPIs for which both proteins are dissimilar to any known positive interaction, including redundant and low-quality PPIs; NpredPPI ProfPPIdb: subset of the previous column with the strongest predictions, contained in our resource; Nprot ProfPPIdb: number of unique proteins in the PPIs published at https://rostlab.org/services/ppipair/, as well as https://figshare.com/collections/ProfPPI-DB/4141784.
| Organism | Nprot | NpredPPI | NpredPPI novel | NpredPPI ProfPPIdb | Nprot ProfPPIdb |
|---|---|---|---|---|---|
| | 27,064 | 206,441,040 | 71,251,953 | 250,000 | 7,023 |
| | 20,137 | 142,171,953 | 83,301,778 | 200,000 | 7,041 |
| | 13,707 | 31,916,055 | 5,410,405 | 100,000 | 7,664 |
| | 4,306 | 2,729,616 | 332,520 | 40,000 | 1,341 |
| | 22,136 | 131,325,321 | 58,790,746 | 200,000 | 4,144 |
| | 5,159 | 9,041,878 | 5,622,981 | 50,000 | 3,576 |
| | 5,121 | 8,349,741 | 2,630,071 | 50,000 | 3,946 |
| | 21,330 | 211,922,578 | 174,929,160 | 200,000 | 9,265 |
| Total | 118,960 | 743,898,182 | 402,269,614 | 1,090,000 | 44,000 |
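The legend refers to Eq 1 for the relation between NprotPred and NpredPPI, but the equation itself is not reproduced in this record. The sketch below therefore rests on an assumption: that NpredPPI counts all unordered pairs among NprotPred proteins, i.e. NpredPPI = NprotPred · (NprotPred − 1) / 2. The value 206,441,040 in the first data row is consistent with this reading. The function names are hypothetical.

```python
import math
from typing import Optional


def n_unordered_pairs(n_proteins: int) -> int:
    """Number of unordered pairs that can be formed from n_proteins."""
    return n_proteins * (n_proteins - 1) // 2


def proteins_for_pairs(n_pairs: int) -> Optional[int]:
    """Invert n_unordered_pairs; return None if n_pairs is not of that form.

    Solves n * (n - 1) / 2 = n_pairs for n using the integer square root.
    """
    n = (1 + math.isqrt(1 + 8 * n_pairs)) // 2
    return n if n_unordered_pairs(n) == n_pairs else None


# NpredPPI of the first data row; under the assumption stated above this
# corresponds to 20,320 proteins.
print(proteins_for_pairs(206_441_040))
```

If Eq 1 in the paper is defined differently, only `n_unordered_pairs` would need to change.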
Summary of experimental evidence found in STRING [39].
Organism: Latin name for eight model organisms sorted alphabetically; NpredPPIs: number of PPIs of 1% ranked predictions; NEvidence: number of PPIs for which experimental evidence was found in at least one of the three databases used for training; NcorrectEvidence: number of PPIs with experimental evidence that were correctly classified by our approach; Accuracy: fraction of correct predictions within the predictions with experimental evidence.
| Organism | NpredPPIs | NEvidence | NcorrectEvidence | Accuracy |
|---|---|---|---|---|
| | 250,000 | 1,138 | 1,138 | 100.00% |
| | 200,000 | 1,818 | 1,671 | 91.91% |
| | 100,000 | 2,049 | 1,763 | 86.04% |
| | 40,000 | 2,196 | 1,807 | 82.29% |
| | 200,000 | 0 | 0 | - |
| | 50,000 | 0 | 0 | - |
| | 50,000 | 1,447 | 1,088 | 75.19% |
| | 200,000 | 0 | 0 | - |
| Total | 1,090,000 | 8,648 | 7,467 | 86.34% |
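The Accuracy column follows directly from the legend: it is the number of correctly classified PPIs with experimental evidence divided by the number of PPIs with experimental evidence. The short sketch below reproduces two of the reported values, for example 1,671 correct among 1,818 with evidence, and 7,467 correct among 8,648 overall. The function name `accuracy` is hypothetical; returning `None` when no evidence exists mirrors the "-" entries in the table.

```python
from typing import Optional


def accuracy(n_correct: int, n_evidence: int) -> Optional[float]:
    """Fraction of evidence-backed predictions that were classified correctly.

    Returns None when no experimental evidence is available, since the
    fraction is undefined in that case.
    """
    if n_evidence == 0:
        return None
    return n_correct / n_evidence


# (n_correct, n_evidence) pairs taken from the table above.
examples = [(1671, 1818), (7467, 8648), (0, 0)]
for n_correct, n_evidence in examples:
    value = accuracy(n_correct, n_evidence)
    label = "-" if value is None else f"{100 * value:.2f}%"
    print(n_correct, n_evidence, label)
```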