Theodosios Theodosiou1, Nikolaos Papanikolaou1, Maria Savvaki1,2, Giulia Bonetto1, Stella Maxouri1,3, Eirini Fakoureli1, Aristides G Eliopoulos4, Nektarios Tavernarakis1,2, Grigoris D Amoutzias5, Georgios A Pavlopoulos6, Michalis Aivaliotis2,7,8, Vasiliki Nikoletopoulou2, Dimitris Tzamarias2, Domna Karagogeos1,2, Ioannis Iliopoulos1.
Abstract
The in-depth study of protein-protein interactions (PPIs) is of key importance for understanding how cells operate. In the past few years, many experimental and computational approaches have therefore been developed for the identification and discovery of such interactions. Here, we present UniReD, a user-friendly computational prediction tool that analyses biomedical literature in order to extract known protein associations and suggest undocumented ones. As a proof of concept, we demonstrate its usefulness by experimentally validating six predicted interactions and by benchmarking it against public databases of experimentally validated PPIs, achieving high coverage. We believe that UniReD can become an important and intuitive resource for experimental biologists searching for novel associations within a protein network, and a useful tool to complement experimental approaches (e.g. mass spectrometry) by producing sorted lists of candidate proteins for further experimental validation. UniReD is available at http://bioinformatics.med.uoc.gr/unired/.
Year: 2020 PMID: 33575553 PMCID: PMC7671407 DOI: 10.1093/nargab/lqaa005
Source DB: PubMed Journal: NAR Genom Bioinform ISSN: 2631-9268
Figure 1. Main steps of the UniReD methodology.
Sequences of primers for the yeast Gcn5p experiment
| ZWF1-S1 | GAAAGAGTAAATCCAATAGAATAGAAAACCACATAAGGCAAGcgtacgctgcaggtcgac |
| ZWF1-R3 | ATTTCAGTGACTTAGCCGATAAATGAATGTGCTTGCATTTTTCtcgatgaattcgagctcg |
| GCN5 (HATΔ) fw | CATCTTTCCATGGCTGTCATTAGGAAGCCATTGACTGTCGTAGGtttttgattccggtttctttg |
| GCN5 (HATΔ) rev | TTTAATATATCCCATCCATATACTTTTATCCAACGTGATTTCTTTagattcccgggtaataactg |
Figure 2. UniReD application. (A) UniReD’s landing page. The user may enter a mouse or human UniProt accession number and choose the granularity of the results (slide bar: MCL inflation value). (B) Cluster View: Each cluster contains the query protein and other associated proteins found in the literature. Interactions that are found in external databases (BioGRID, DIP, HitPredict, UniProt) are marked. A protein may appear in more than one cluster. Clusters are ranked according to GO similarity. (C) List View: All proteins predicted to interact with the query protein, along with an indication of whether each PPI is described in an external PPI database. Proteins are sorted according to the number of clusters they appear in. (D) KEGG pathways related to the query protein.
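The List View ordering described in the caption (proteins sorted by the number of clusters they appear in) can be illustrated with a minimal sketch. This is not UniReD's code: the cluster contents, the protein labels and the function name `rank_partners` are hypothetical, and breaking ties alphabetically is an assumption made only to keep the output deterministic.

```python
from collections import Counter

def rank_partners(query, clusters):
    """Rank candidate partners of `query` by how many clusters they share with it."""
    counts = Counter()
    for cluster in clusters:
        if query in cluster:
            # Count every other protein that co-occurs with the query in this cluster.
            counts.update(set(cluster) - {query})
    # Most clusters first; alphabetical order breaks ties (an assumption).
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical clusters; "Q" stands for the query protein.
clusters = [
    {"Q", "A", "B"},
    {"Q", "B", "C"},
    {"Q", "A", "B"},
    {"D", "E"},  # does not contain the query, so it is ignored
]

ranking = rank_partners("Q", clusters)
for protein, n_clusters in ranking:
    print(protein, n_clusters)
```

Because a protein may appear in more than one cluster, the count acts as a simple support score for each candidate.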
Figure 3. Co-immunoprecipitation analysis of HEK293 co-transfected cells and mouse embryonic and adult brain tissue. (A–F) Direct interactions of CNTN2 with Sema6A (A and B), Neurofascin155 (NF155) (C and D) and Neurofascin140 (NF140) (E and F) in HEK293 co-transfected cells. Immunoprecipitation was performed with the monoclonal anti-CNTN2 antibody 1c12. Western blot analysis of the lysates (Lys), G-beads used for the preclearance step (G) and immunoprecipitates (IP1c12) revealed the direct interaction of GFP-tagged CNTN2 with Sema6A-c-myc (B), NF155 (D) and NF140 (F). (G–L) Interaction of CNTN2 with MAP1B in mouse adult (G and H) and embryonic tissue (I and J), and with Reelin (K and L) in embryonic tissue. Immunoprecipitation was performed with the rabbit polyclonal antibody against CNTN2, TG2. Western blot analysis of the lysates (Lys), G-beads used for the preclearance step (G) and immunoprecipitates (IPTG2) revealed the interaction of CNTN2 with MAP1B (H and J) and Reelin (L).
Figure 4. Whole brain lysates from E18 mouse embryos were fractionated to isolate mitochondria. The mitochondrial fraction was then used for co-immunoprecipitation experiments with antibodies against Qcr-1, a complex III mitochondrial protein, and Necdin. Necdin physically interacts with Qcr-1, as shown.
Figure 5. Protein interaction network of ERCC4 (XPF-Q9QZD4) generated by STRING. This experimental study identified 306 putative PPIs that were reduced to 20 proteins after manual validation. UniReD was able to identify 18 out of 19 (94.5%) proteins selected by the researchers for experimental verification as interacting partners of XPF. One (Taf15-Q8BQ46) was excluded from the UniReD results, as UniReD requires that all proteins are reviewed and excludes large-scale experiment publications. The symbols in the inset box explain how proteins were found in UniReD. PF stands for protein family and indicates that UniReD detected proteins of the same family as the identified interacting partner. Human indicates that UniReD detected human homologs of the interacting partner.
UniReD evaluation using different databases
| Database (DB) | Organism | Coverage |
|---|---|---|
| | Human | 60.06% |
| | Mouse | 68.06% |
| | Human | 55.34% |
| | Mouse | 66.95% |
| | Human | 53.21% |
| | Mouse | 96.83% |
| | Human | 77.59% |
| | Mouse | 76.62% |
| | Human | 73.48% |
| | Human | 81.57% |
| | Human | 73.27% |
Two datasets were used: i) the Lit-BM dataset, which is a highly curated human interactome network, and ii) the PrePPI dataset (the Human High Confidence set, i.e. interactions supported by at least two publications prior to August 2010; https://honiglab.c2b2.columbia.edu/PrePPI/ref/data/human.db.hc.201008.intm). The last column represents the same evaluation against the STRING database, using its medium and high confidence text mining evidence.
Topological analysis and comparison between UniReD and STRING networks (human and mouse)
| Network | Inflation | Nodes | Edges | Centralization | Average #neighbors | Density | Heterogeneity | Clustering coefficient |
|---|---|---|---|---|---|---|---|---|
| MOUSE | | | | | | | | |
| UniReD | 2.0 | 13 608 | 7 208 519 | 0.59 | 1059 | 0.078 | 1.21 | 0.742 |
| UniReD | 2.2 | 13 565 | 5 238 233 | 0.53 | 772 | 0.057 | 1.33 | 0.724 |
| UniReD | 2.5 | 13 487 | 3 335 666 | 0.44 | 494 | 0.037 | 1.48 | 0.693 |
| UniReD | 2.8 | 13 413 | 2 397 365 | 0.38 | 357 | 0.027 | 1.59 | 0.667 |
| UniReD | 3.0 | 13 357 | 1 982 808 | 0.36 | 296 | 0.022 | 1.67 | 0.653 |
| BioGRID | | 7 332 | 26 060 | 0.23 | 6 | 0.001 | 5.32 | 0.130 |
| STRING | | 21 291 | 5 972 403 | 0.27 | 561 | 0.026 | 0.92 | 0.245 |
| HUMAN | | | | | | | | |
| UniReD | 2.0 | 16 318 | 6 711 598 | 0.56 | 822 | 0.050 | 1.21 | 0.669 |
| UniReD | 2.2 | 16 253 | 4 452 816 | 0.52 | 547 | 0.034 | 1.33 | 0.643 |
| UniReD | 2.5 | 16 137 | 2 593 089 | 0.45 | 321 | 0.026 | 1.52 | 0.612 |
| UniReD | 2.8 | 16 031 | 1 759 127 | 0.39 | 219 | 0.014 | 1.66 | 0.594 |
| UniReD | 3.0 | 15 952 | 1 423 064 | 0.35 | 178 | 0.011 | 1.73 | 0.584 |
| BioGRID | | 17 793 | 475 919 | 0.16 | 39 | 0.002 | 2.22 | 0.121 |
| STRING | | 19 354 | 5 879 727 | 0.36 | 607 | 0.031 | 0.87 | 0.205 |
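Two of the topological columns can be recomputed from the node and edge counts alone, which is a useful sanity check on the table. The sketch below assumes the standard definitions for an undirected simple graph (average number of neighbors = 2E/N, density = 2E/(N(N-1))); the source does not state which formulas were used, and the function names are illustrative.

```python
def average_neighbors(nodes: int, edges: int) -> float:
    """Mean degree of an undirected graph: each edge contributes to two nodes."""
    return 2 * edges / nodes

def density(nodes: int, edges: int) -> float:
    """Fraction of all possible undirected node pairs that are connected."""
    return 2 * edges / (nodes * (nodes - 1))

# Mouse UniReD network at inflation 2.0, taken from the table above.
nodes, edges = 13_608, 7_208_519

avg = average_neighbors(nodes, edges)
dens = density(nodes, edges)
print(f"average #neighbors: {avg:.0f}")
print(f"density: {dens:.3f}")
```

Under these assumptions the recomputed values round to the 1059 and 0.078 reported for that row. The remaining columns (centralization, heterogeneity, clustering coefficient) depend on the full degree distribution and cannot be derived from the counts alone.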
Comparison of STRING text-mining-based PPI predictions against the same set of human PPI databases used to evaluate UniReD
| STRING DB | Medium confidence (0.4) | High confidence (0.7) | Highest confidence (0.9) |
|---|---|---|---|
| | 31.34% (18264/58259) | 11.99% (6991/58259) | 2.74% (1600/58259) |
| | 63.38% (4118/6497) | 36.44% (2368/6497) | 11.95% (777/6497) |
| | 46.7% (4626/9904) | 22.49% (2228/9904) | 7.4% (733/9904) |
| | 32.83% (20149/61362) | 12.87% (7898/61362) | 3.07% (1886/61362) |
| | 58.49% (2583/4416) | 33.67% (1487/4416) | 11.91% (526/4416) |
| | 31.93% (2080/6513) | 18.68% (1217/6513) | 7.04% (459/6513) |
| | 47.46% (10007/21083) | 23.62% (4980/21083) | 6.59% (1390/21083) |
Medium, high and highest confidence correspond to a score higher than 0.4, 0.7 and 0.9, respectively. Each percentage is the coverage; the numbers in parentheses are the number of PPIs common to both DBs and the total number of PPIs in the reference DB.
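The coverage figures follow the pattern "common PPIs / total PPIs in the reference DB". A minimal sketch of that calculation is given below; the pair lists and the helper name `coverage` are hypothetical, and treating each interaction as an unordered pair (so that A-B and B-A count once) is an assumption about how pairs are matched.

```python
def coverage(predicted_pairs, reference_pairs):
    """Return (common, total, fraction) of reference PPIs recovered by the predictions."""
    # frozenset makes each pair order-independent: (A, B) equals (B, A).
    predicted = {frozenset(pair) for pair in predicted_pairs}
    reference = {frozenset(pair) for pair in reference_pairs}
    common = predicted & reference
    return len(common), len(reference), len(common) / len(reference)

# Hypothetical toy data, not taken from any of the databases in the table.
predicted = [("P1", "P2"), ("P3", "P2"), ("P4", "P5")]
reference = [("P2", "P1"), ("P2", "P3"), ("P6", "P7")]

common, total, fraction = coverage(predicted, reference)
print(f"{fraction:.2%} ({common}/{total})")
```

The printed string mirrors the cell layout used in the table: a percentage followed by the two counts in parentheses.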
Comparison of STRING text-mining-based PPI predictions against the same set of mouse PPI databases used to evaluate UniReD
| STRING DB | Medium confidence (0.4) | High confidence (0.7) |
|---|---|---|
| | 71.81% (158/220) | 48.63% (107/220) |
| | 35.66% (2363/6625) | 15.87% (1052/6625) |
| | 48.7% (581/1193) | 25.56% (305/1193) |
| | 34.35% (3635/10580) | 15.19% (1608/10580) |
Medium and high confidence correspond to a score higher than 0.4 and 0.7, respectively. Each percentage is the coverage; the numbers in parentheses are the number of PPIs common to both DBs and the total number of PPIs in the reference DB.
Comparison with Negatome 2.0
| Organism | Negatome pairs (combined) | UniReD pairs | UniReD pairs found in Negatome | UniReD pairs in Negatome but not in STRING | False positive ratio |
|---|---|---|---|---|---|
| Human | 1526 | 2593090 | 756 | 138 | 15.1% |
| Mouse | 411 | 3335667 | 239 | 52 | 23.2% |
FPR = (UN-UNS)/(N-UNS)
UN = Common pairs between UniReD & Negatome
UNS = Common pairs between UniReD & Negatome & STRING
N = Negatome pairs
This analysis resulted in a false positive rate of 15.1% for humans and 23.2% for mice.
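The FPR formula can be evaluated directly from the table columns. In the sketch below, UNS is derived as UN minus the "UniReD pairs in Negatome but not in STRING" column, which follows from the definitions above; the function and argument names are illustrative only.

```python
def false_positive_ratio(negatome_pairs, unired_in_negatome, unired_in_negatome_not_string):
    """FPR = (UN - UNS) / (N - UNS), using the symbols defined above."""
    n = negatome_pairs
    un = unired_in_negatome
    # Pairs shared by UniReD, Negatome and STRING (derived, not read from the table).
    uns = un - unired_in_negatome_not_string
    return (un - uns) / (n - uns)

# Values from the Negatome 2.0 comparison table.
human = false_positive_ratio(1526, 756, 138)
mouse = false_positive_ratio(411, 239, 52)
print(f"human: {human:.4f}")
print(f"mouse: {mouse:.4f}")
```

With these inputs the ratios reduce to 138/908 for human and 52/224 for mouse, in line with the roughly 15% and 23% reported.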