Abstract
All protein-protein interaction (PPI) predictors require the determination of an operational decision threshold when differentiating positive PPIs from negatives. Historically, a single global threshold, typically optimized via cross-validation testing, is applied to all protein pairs. However, we here use data visualization techniques to show that no single decision threshold is suitable for all protein pairs, given the inherent diversity of protein interaction profiles. The recent development of high throughput PPI predictors has enabled the comprehensive scoring of all possible protein-protein pairs. This, in turn, has given rise to context, enabling us now to evaluate a PPI within the context of all possible predictions. Leveraging this context, we introduce a novel modeling framework called Reciprocal Perspective (RP), which estimates a localized threshold on a per-protein basis using several rank order metrics. By considering a putative PPI from the perspective of each of the proteins within the pair, RP rescores the predicted PPI and applies a cascaded Random Forest classifier leading to improvements in recall and precision. We here validate RP using two state-of-the-art PPI predictors, the Protein-protein Interaction Prediction Engine and the Scoring PRotein INTeractions methods, over five organisms: Homo sapiens, Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans, and Mus musculus. Results demonstrate the application of a post hoc RP rescoring layer significantly improves classification (p < 0.001) in all cases over all organisms and this new rescoring approach can apply to any PPI prediction method.
Year: 2018 PMID: 30076341 PMCID: PMC6076239 DOI: 10.1038/s41598-018-30044-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. One-to-All Score Curve over Seven Example Yeast Proteins. The rank order distribution of all predicted scores with a given protein (i.e. one-to-all) is compared against the selected decision threshold (Global Cutoff), depicted as the grey line (score = 83.84) and determined by leave-one-out cross-validation at a specificity of 99.95%. Despite being extremely conservative, this threshold is clearly inappropriate for certain example proteins, notably YGL122C (the nuclear polyadenylated RNA-binding protein, NAB2), for which ~5,800 of the 6,717 putative protein interactions are considered positives. Conversely, YLR013W, YKL102C, YDR521W, and YLL066W-A are all predicted to have no true interactions at this threshold. Only YNL255C and YPL178W appear well matched to the global threshold, with ~400 and 9 positively predicted interactions respectively, reflecting the diversity in the number of true interactions.
Figure 2. One-to-All Score Curve of YJL124C with fitted LOESS curve (A) and corresponding first (B) and second (C) derivative curves. Arrow indicates the knee of the curve.
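The knee detection described in the Figure 2 caption can be illustrated with a minimal sketch. It assumes a centered moving average as a simple stand-in for the LOESS fit and a discrete second difference as a stand-in for the second derivative. The function names `moving_average` and `knee_index` are hypothetical and are not taken from the paper.

```python
def moving_average(values, window=3):
    """Centered moving average; the window is truncated at the edges.

    Assumption: this replaces the LOESS fit named in the caption.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def knee_index(scores, window=3):
    """Return (index, score) of the knee of a one-to-all score curve.

    Scores are sorted in descending order (rank order), smoothed, and the
    knee is taken as the interior point with the largest second difference,
    i.e. where a steep decline flattens most sharply.
    """
    curve = sorted(scores, reverse=True)
    if len(curve) < 3:
        raise ValueError("need at least three scores to locate a knee")
    smooth = moving_average(curve, window)
    second = [
        smooth[i - 1] - 2 * smooth[i] + smooth[i + 1]
        for i in range(1, len(smooth) - 1)
    ]
    best = max(range(len(second)), key=second.__getitem__)
    return best + 1, curve[best + 1]  # +1: second differences start at index 1


# Toy curve: a steep decline followed by a flat baseline.
example_scores = [100, 80, 60, 40, 20, 5, 5, 5, 5, 5, 5, 5]
knee_rank, knee_score = knee_index(example_scores, window=1)
```

The score at the knee would then serve as a per-protein local cutoff, in contrast to the single global cutoff shown in Figure 1. A real one-to-all curve has thousands of points, so a wider window (or an actual LOESS fit) would be needed to suppress noise.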
Reciprocal Perspective Features for Protein-Protein Interaction Prediction.
| Feature Name | Notation | Feature Type | Description |
|---|---|---|---|
| Rank-XY | | Rank | The rank order of Protein Y among all of the predictions for Protein X |
| Rank-YX | | Rank | The rank order of Protein X among all of the predictions for Protein Y |
| Naïve Rank Order | | Rank | As defined in Reciprocal Perspective Notation |
| Normalized Rank Order (Proteome X) | | Rank | As defined in Reciprocal Perspective Notation |
| Normalized Rank Order (Proteome Y) | | Rank | As defined in Reciprocal Perspective Notation |
| Adjusted Rank Order | | Rank | As defined in Reciprocal Perspective Notation |
| Rank-Local-Cutoff-X | | Rank | Rank order of the protein nearest to the local cutoff value of Protein X |
| Score-Local-Cutoff-X | | Score | Score at the local cutoff value of Protein X |
| Rank-Local-Cutoff-Y | | Rank | Rank order of the protein nearest to the local cutoff value of Protein Y |
| Score-Local-Cutoff-Y | | Score | Score at the local cutoff value of Protein Y |
| Interaction-XY-Above-Local-X | | Rank | Binary variable indicating whether the interaction XY is above the local cutoff of protein X |
| Interaction-YX-Above-Local-Y | | Rank | Binary variable indicating whether the interaction YX is above the local cutoff of protein Y |
| Above-Global-Threshold | | Score | Binary variable indicating whether the Original Score is greater than the globally determined cutoff value |
| Fold-Difference-From-Local-X | | Fold | As defined in Reciprocal Perspective Notation |
| Fold-Difference-From-Local-Y | | Fold | As defined in Reciprocal Perspective Notation |
A simplified feature name is defined; the notation is kept consistent with that of section Reciprocal Perspective Notation.
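To make the reciprocal features concrete, here is a minimal sketch that computes Rank-XY, Rank-YX, and Above-Global-Threshold for one pair. It assumes scores are stored in a dictionary keyed by unordered protein pairs, that rank 1 is the highest score, and that tied scores share the better rank. The protein names, score values, and function names are hypothetical; only the cutoff of 83.84 comes from the Figure 1 caption.

```python
def one_to_all(scores, protein):
    """Collect every predicted score involving `protein`, keyed by partner."""
    partners = {}
    for (a, b), score in scores.items():
        if a == protein:
            partners[b] = score
        elif b == protein:
            partners[a] = score
    return partners


def rank_of(scores, x, y):
    """Rank order of protein y among all predictions for protein x.

    Assumption: rank 1 is the highest score and ties share the better rank.
    """
    partners = one_to_all(scores, x)
    target = partners[y]
    return 1 + sum(1 for s in partners.values() if s > target)


def reciprocal_features(scores, x, y, global_cutoff):
    """Three of the tabulated features for the pair (x, y)."""
    key = (x, y) if (x, y) in scores else (y, x)
    return {
        "rank_xy": rank_of(scores, x, y),  # Y as seen from X
        "rank_yx": rank_of(scores, y, x),  # X as seen from Y
        "above_global": scores[key] > global_cutoff,
    }


# Hypothetical all-to-all scores for four proteins.
toy_scores = {
    ("A", "B"): 90, ("A", "C"): 50, ("A", "D"): 10,
    ("B", "C"): 95, ("B", "D"): 20, ("C", "D"): 5,
}
features_ab = reciprocal_features(toy_scores, "A", "B", global_cutoff=83.84)
```

In this toy data, B is the top-ranked partner of A, but A is only the second-ranked partner of B. That asymmetry is the information the two perspectives contribute beyond the raw score.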
Summary of Training Data for the Five Organisms.
| Organism | Number of Proteins | Number of Training PPIs |
|---|---|---|
| | 20,160 | 13,938 |
| Saccharomyces cerevisiae | 6,717 | 74,608 |
| | 16,886 | 3,027 |
| | 6,443 | 7,923 |
| | 17,759 | 2,938 |
Figure 3. Precision-Recall Curves of Random Forest Results. Subsets of features were used to produce each curve, highlighting their contribution to the complete feature set.
Summary of PRC-AUC and ROC-AUC (μ ± SE) following 1,000 Bootstrap Iterations for each Method, Organism, and Feature Set.
| Method | Organism | Features | PRC-AUC | ROC-AUC |
|---|---|---|---|---|
| PIPE | | Original | 0.3915 ± 0.0002 | 0.8737 ± 0.0001 |
| | | RP-Enhanced | 0.4779 ± 0.0005 | 0.9510 ± 0.0001 |
| | | Original | 0.3155 ± 0.0001 | 0.8442 ± 0.0001 |
| | | RP-Enhanced | 0.3358 ± 0.0001 | 0.9044 ± 0.0001 |
| | | Original | 0.2475 ± 0.0005 | 0.8351 ± 0.0004 |
| | | RP-Enhanced | 0.5169 ± 0.0087 | 0.9815 ± 0.0001 |
| | | Original | 0.3141 ± 0.0005 | 0.8685 ± 0.0003 |
| | | RP-Enhanced | 0.4250 ± 0.0024 | 0.9400 ± 0.0002 |
| | | Original | 0.2871 ± 0.0006 | 0.8386 ± 0.0005 |
| | | RP-Enhanced | 0.4974 ± 0.0038 | 0.9806 ± 0.0001 |
| SPRINT | | Original | 0.3432 ± 0.0001 | 0.8375 ± 0.0001 |
| | | RP-Enhanced | 0.5001 ± 0.0005 | 0.9653 ± 0.0001 |
| | | Original | 0.2637 ± 0.0001 | 0.7732 ± 0.0001 |
| | | RP-Enhanced | 0.2935 ± 0.0001 | 0.8995 ± 0.0001 |
| | | Original | 0.2367 ± 0.0004 | 0.8204 ± 0.0004 |
| | | RP-Enhanced | 0.4430 ± 0.0088 | 0.9822 ± 0.0001 |
| | | Original | 0.3101 ± 0.0004 | 0.8680 ± 0.0003 |
| | | RP-Enhanced | 0.3700 ± 0.0023 | 0.9267 ± 0.0002 |
| | | Original | 0.2820 ± 0.0005 | 0.8346 ± 0.0004 |
| | | RP-Enhanced | 0.4909 ± 0.0053 | 0.9826 ± 0.0001 |
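
The table above reports each AUC as μ ± SE over 1,000 bootstrap iterations. A minimal sketch of how such a summary could be produced for ROC-AUC follows. It assumes resampling with replacement over labelled pairs and takes the spread of the replicates as the error estimate; how the SE in the table was actually computed is not stated here. The function names and toy data are hypothetical.

```python
import random
import statistics


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc(labels, scores, iterations=1000, seed=0):
    """Mean and standard deviation of ROC-AUC over bootstrap resamples.

    Resamples that contain only one class are redrawn, since AUC is
    undefined for them.
    """
    if len(set(labels)) < 2:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < iterations:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    return statistics.mean(aucs), statistics.stdev(aucs)


# Hypothetical labels (1 = known PPI) and classifier scores.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1, 0.7, 0.6, 0.05]
mean_auc, spread = bootstrap_auc(toy_labels, toy_scores, iterations=200)
```

The pairwise AUC computation is quadratic in the number of examples and is used here only for clarity. A rank-based formulation would be preferable at proteome scale.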