Kevin Dick, Bahram Samanfar, Bradley Barnes, Elroy R Cober, Benjamin Mimee, Le Hoa Tan, Stephen J Molnar, Kyle K Biggar, Ashkan Golshani, Frank Dehne, James R Green.
Abstract
The need for larger-scale and increasingly complex protein-protein interaction (PPI) prediction tasks demands that state-of-the-art predictors be highly efficient and adapted to inter- and cross-species predictions. Furthermore, the ability to generate comprehensive interactomes has enabled the appraisal of each PPI in the context of all predictions, leading to further improvements in classification performance in the face of extreme class imbalance using the Reciprocal Perspective (RP) framework. Here we describe the PIPE4 algorithm. Adaptation of the PIPE3/MP-PIPE sequence preprocessing step led to upwards of 50x speedup, and the new Similarity Weighted Score appropriately normalizes for window frequency when applied to any inter- or cross-species prediction schema. Comprehensive interactomes are generated for three prediction schemas: (1) cross-species predictions, where Arabidopsis thaliana is used as a proxy to predict the comprehensive Glycine max interactome, (2) inter-species predictions between Homo sapiens and HIV1, and (3) a combined schema involving both cross- and inter-species predictions, where both Arabidopsis thaliana and Caenorhabditis elegans are used as proxy species to predict the interactome between Glycine max (the soybean legume) and Heterodera glycines (the soybean cyst nematode). Comparing PIPE4 with the state of the art showed improved performance, indicating that it should be the method of choice for complex PPI prediction schemas.
Year: 2020 PMID: 31996697 PMCID: PMC6989690 DOI: 10.1038/s41598-019-56895-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Inter- and Cross-Species Prediction Schemas. (A) Cross-species schema where the intra-species PPIs of a well-studied organism can be used as a proxy to predict the comprehensive interactome of another under-studied, but evolutionarily close, organism. (B) Typical inter-species prediction schema where PPIs within and/or between two well-studied organisms are used to predict the comprehensive interactome between the two organisms to identify novel putative PPIs. (C) Combination of the cross- and inter-species prediction schemas where the intra-species PPIs of two well-studied organisms are used to train a model capable of predicting the comprehensive inter-species interactome of two under-studied organisms. N indicates the approximate size of the proteome.
Figure 2. Visual Representation of Mathematical Notation. The comparison of two windows between two arbitrary proteins yields two disjoint (inter-species) sets comprising proteins with similar subsequences to those windows. Grey proteins linked with arrows indicate known PPIs.
Figure 3. Estimated Evolutionary Divergence Timeline.
Intra-Species Benchmark Results on a Medium Sized Cluster.
| Benchmark Measure | H. sapiens (PIPE4) | H. sapiens (PIPE3) | A. thaliana (PIPE4) | A. thaliana (PIPE3) | S. cerevisiae (PIPE4) | S. cerevisiae (PIPE3) |
|---|---|---|---|---|---|---|
| Database Size (GB) | 18 | 1.6 | 5.6 | 0.7 | 1.9 | 0.2 |
| Database Processing (s) | 3191 | 3194 | 1358 | 1325 | 263 | 255 |
| Predicted Positive Pair (s) | 0.0155 | 0.7700 | 0.0061 | 0.1103 | 0.0113 | 0.1280 |
| Predicted Negative Pair (s) | 0.0084 | 0.4447 | 0.0049 | 0.0615 | 0.0054 | 0.0448 |
| All-to-All Prediction (h) | 3.3 | 175.9 | 1.4 | 17.6 | 0.2 | 2.0 |
| Landscape Generation (s) | 0.0056 | 0.4405 | 0.0022 | 0.0586 | 0.0029 | 0.0427 |
| Total Speedup (~x) | | | | | | |
| Landscape Generation (~x) | | | | | | |
| Proteome Size, n | 20,236 | 20,236 | 17,226 | 17,226 | 6,721 | 6,721 |
All experiments were run using 18 nodes with 8 threads per node.
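The two speedup rows of the table carry no values in this copy. As a rough illustration only, the sketch below derives PIPE3-to-PIPE4 ratios from the timings that are listed. It assumes that "Total Speedup" is the ratio of the All-to-All Prediction times and that the landscape speedup is the ratio of the Landscape Generation times; the source does not state how either figure was computed. The pair count assumes unordered pairs with self-pairs included, which is also an assumption. The names `benchmarks`, `speedup`, and `intra_species_pairs` are hypothetical.

```python
# Timings transcribed from the benchmark table, as (PIPE4, PIPE3) tuples.
benchmarks = {
    "H. sapiens": {"all_to_all_h": (3.3, 175.9), "landscape_s": (0.0056, 0.4405), "n": 20236},
    "A. thaliana": {"all_to_all_h": (1.4, 17.6), "landscape_s": (0.0022, 0.0586), "n": 17226},
    "S. cerevisiae": {"all_to_all_h": (0.2, 2.0), "landscape_s": (0.0029, 0.0427), "n": 6721},
}


def speedup(pipe4_time, pipe3_time):
    """Ratio of PIPE3 time to PIPE4 time (values above 1 mean PIPE4 is faster)."""
    return pipe3_time / pipe4_time


def intra_species_pairs(n):
    """Unordered protein pairs in a proteome of size n.

    Assumption: self-pairs are counted, giving n * (n + 1) / 2.
    """
    return n * (n + 1) // 2


for species, row in benchmarks.items():
    total = speedup(*row["all_to_all_h"])
    landscape = speedup(*row["landscape_s"])
    pairs = intra_species_pairs(row["n"])
    print(f"{species}: all-to-all ~{total:.1f}x, landscape ~{landscape:.1f}x, "
          f"{pairs:,} candidate pairs")
```

Under these assumptions the H. sapiens all-to-all ratio is a little over 50x, in line with the "upwards of 50x speedup" reported in the abstract.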
Figure 4. Example ROC and PR Curves of Many-to-One and One-to-Many Experiments. (A) depicts the ROC performance when using several organisms to evaluate Mouse PPIs. (B) depicts ROC performance using evolutionarily proximal Mouse to predict Human PPIs. (C) ranks PR performance when predicting Human PPIs when training on another organism.
Figure 5. Reciprocal Perspective Increase in AUPRC using One-to-Many Cross-Species Predictions on PIPE4.
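For readers unfamiliar with the metric, AUPRC (area under the precision-recall curve) can be estimated from ranked predictions as average precision. The sketch below is a generic, minimal estimator and is not the evaluation code or the RP procedure used by the authors. The function name `average_precision` is hypothetical, and tied scores are simply ranked in input order, which is a simplifying assumption.

```python
def average_precision(scores, labels):
    """Step-wise estimate of the area under the precision-recall curve.

    scores: predicted interaction scores (higher means more likely a PPI).
    labels: 1 for a known positive pair, 0 for a negative pair.
    Assumption: ties in score are not handled specially.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")

    # Rank pairs from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            true_positives += 1
            # Precision at the rank where this positive is recovered.
            precision_sum += true_positives / rank
    return precision_sum / total_positives


# Toy example: two positives among four scored pairs.
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

For a classifier that ranks pairs at random, the expected AUPRC is approximately the fraction of positive pairs, which is why this metric is informative under the extreme class imbalance mentioned in the abstract.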
Figure 6. Comparison of PR Curves between PIPE3, PIPE4, SPRINT, and SPPS on the H. sapiens-HIV1 Inter-Species Interactions.