Katrisa M Ward, Brandon D Pickett, Mark T W Ebbert, John S K Kauwe, Justin B Miller.
Abstract
Protein-protein functional interactions arise from either transitory or permanent biomolecular associations and often lead to the coevolution of the interacting residues. Although mutual information has traditionally been used to identify coevolving residues within the same protein, its application to residues in different, coevolving proteins remains largely uncharacterized. Therefore, we developed the Protein Interactions Calculator (PIC) to efficiently identify coevolving residues between two protein sequences using mutual information. We verified the algorithm using 2102 known human protein interactions and 233 known bacterial protein interactions, with 1975 and 252 non-interacting protein controls, respectively. The average PIC score for known human protein interactions was 4.5 times higher than for non-interacting proteins (p = 1.03 × 10⁻¹⁰⁸), and 1.94 times higher in bacteria (p = 1.22 × 10⁻³⁵). We then used the PIC scores to determine the probability that two proteins interact. Using those probabilities, we paired 37 Alzheimer's disease-associated proteins with 8608 other proteins, determined the likelihood that each pair interacts, and report the results through a web interface. The PIC had significantly higher sensitivity than other algorithms and provides residue-specific resolution that they do not. Therefore, we propose that the PIC can be used to prioritize potential protein interactions, which can lead to a better understanding of biological processes and to additional therapeutic targets belonging to protein interaction groups.
Keywords: Alzheimer’s disease; coevolution; mutual information; protein interactions; protein interactions calculator; proteomics
Year: 2022 PMID: 36011253 PMCID: PMC9407263 DOI: 10.3390/genes13081346
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.141
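The abstract does not spell out how the PIC scores a residue pair, so the following is only a minimal Python sketch of the textbook quantity it builds on: mutual information (MI) between a column of protein A and a column of protein B. It assumes each protein has an alignment of orthologs and that row k of both alignments comes from the same species. The function names, that species pairing, and the absence of gap handling, pseudocounts, or background correction are all illustrative assumptions. A single column-pair MI cannot exceed log2 of the alphabet size, so the larger PIC scores in the tables below must come from aggregating or rescaling such values in a way this summary does not describe.

```python
from __future__ import annotations

import math
from collections import Counter


def column_mutual_information(col_a: str, col_b: str) -> float:
    """Mutual information (bits) between two paired residue columns.

    col_a[k] and col_b[k] are the residues that species k has at one
    position of protein A and one position of protein B (hypothetical pairing).
    """
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and the same length")
    n = len(col_a)
    p_a = Counter(col_a)            # marginal counts for protein A's position
    p_b = Counter(col_b)            # marginal counts for protein B's position
    p_ab = Counter(zip(col_a, col_b))  # joint counts of residue pairs
    mi = 0.0
    for (x, y), count in p_ab.items():
        p_xy = count / n
        # Sum of p(x,y) * log2( p(x,y) / (p(x) * p(y)) )
        mi += p_xy * math.log2(p_xy / ((p_a[x] / n) * (p_b[y] / n)))
    return mi


def inter_protein_mi(aln_a: list[str], aln_b: list[str]) -> list[tuple[int, int, float]]:
    """Score every (position in A, position in B) pair, highest MI first.

    Both alignments are hypothetical inputs with one row per shared species.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("both alignments need one row per shared species")
    cols_a = ["".join(col) for col in zip(*aln_a)]  # transpose rows to columns
    cols_b = ["".join(col) for col in zip(*aln_b)]
    scores = [
        (i, j, column_mutual_information(ca, cb))
        for i, ca in enumerate(cols_a)
        for j, cb in enumerate(cols_b)
    ]
    return sorted(scores, key=lambda t: t[2], reverse=True)
```

In this toy view, a position in A whose residue changes exactly when a position in B changes (for example `"AAKK"` against `"DDEE"`) gets 1 bit, while unrelated variation gets 0.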
Figure 1. Results of different statistical tests comparing protein pairs known to interact with pairs not proven to interact, illustrating trends in the vertebrate dataset. (A) Heatmap of the best precision obtainable for each filter combination when comparing known-to-interact and unproven-to-interact scores while maintaining a recall of at least 20%. (B) Heatmap of the area under the curve obtained by comparing known-to-interact and unproven-to-interact scores for each filter combination. (C) Heatmap of the negative log of the p-values from a two-sample t-test comparing known-to-interact and unproven-to-interact scores for each filter combination. Although every combination returned a significant p-value, some combinations were more significant than others. (D) All precision and recall values for the results using a min filter of 0.17 and a minimum-percent-above-random filter of 35%, which were determined to be the optimal filters because they yielded high precision. (E) Receiver operating characteristic (ROC) curve from comparing known-to-interact and unproven-to-interact scores for each filter combination.
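Panels A and B of Figure 1 rest on two standard ways of comparing score distributions. The sketch below assumes that higher scores mean "more likely to interact" and that a pair is called interacting when its score meets a cutoff. It computes the best precision reachable while recall stays at or above 20%, and the ROC AUC through its rank (Mann-Whitney) interpretation. The function names and toy scores are hypothetical, not data or code from the study.

```python
from __future__ import annotations


def best_precision_at_min_recall(
    pos: list[float], neg: list[float], min_recall: float = 0.20
) -> tuple[float, float]:
    """Best precision over all score cutoffs whose recall is >= min_recall.

    A pair counts as predicted-interacting when its score >= cutoff.
    Returns (precision, cutoff), or (0.0, inf) if no cutoff reaches min_recall.
    """
    best = (0.0, float("inf"))
    for cutoff in sorted(set(pos) | set(neg)):
        tp = sum(s >= cutoff for s in pos)  # known interactions recovered
        fp = sum(s >= cutoff for s in neg)  # unproven pairs called interacting
        if tp / len(pos) >= min_recall:
            precision = tp / (tp + fp)  # tp + fp > 0: cutoff is an observed score
            if precision > best[0]:
                best = (precision, cutoff)
    return best


def roc_auc(pos: list[float], neg: list[float]) -> float:
    """ROC AUC as P(random known pair outscores random unproven pair); ties count 1/2."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


known = [0.9, 0.8, 0.4, 0.3]       # toy scores for known-to-interact pairs
unproven = [0.7, 0.2, 0.1, 0.05]   # toy scores for unproven-to-interact pairs
print(best_precision_at_min_recall(known, unproven))  # (1.0, 0.8)
print(roc_auc(known, unproven))                       # 0.875
```

The nested loop in `roc_auc` is O(n·m); a sort-based rank computation gives the same value faster on large score sets.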
Probability thresholds for vertebrate mutual information scores.
| Mutual information score | % chance of interaction |
|---|---|
| 480.76 | 99.68 |
| 151.52 | 98.43 |
| 79.29 | 97.32 |
| 45.61 | 95.85 |
| 30.18 | 92.23 |
| 20.99 | 89.40 |
| 15.04 | 85.31 |
| 10.93 | 80.93 |
| 7.85 | 75.51 |
| 6.17 | 69.99 |
| 4.62 | 63.81 |
| 3.59 | 55.96 |
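One way to read the vertebrate threshold table is as a step function: a score at or above a threshold receives that threshold's probability. Under that assumption (this summary does not say whether the authors interpolate between rows), a lookup could look like the sketch below. `interaction_probability` and `VERTEBRATE_THRESHOLDS` are hypothetical names; the values are copied from the table.

```python
import bisect

# (score threshold, % chance of interaction), copied from the vertebrate table
VERTEBRATE_THRESHOLDS = [
    (480.76, 99.68), (151.52, 98.43), (79.29, 97.32), (45.61, 95.85),
    (30.18, 92.23), (20.99, 89.40), (15.04, 85.31), (10.93, 80.93),
    (7.85, 75.51), (6.17, 69.99), (4.62, 63.81), (3.59, 55.96),
]


def interaction_probability(score, table=VERTEBRATE_THRESHOLDS):
    """Return the % chance for the highest threshold that `score` meets, else None.

    Assumes a step-function reading of the table (no interpolation).
    """
    ascending = sorted(table)                # sort rows by threshold, low to high
    cutoffs = [t for t, _ in ascending]
    idx = bisect.bisect_right(cutoffs, score) - 1  # last threshold <= score
    if idx < 0:
        return None  # below the lowest tabulated threshold
    return ascending[idx][1]


print(interaction_probability(50.0))  # 95.85
print(interaction_probability(2.0))   # None
```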
Probability thresholds for bacterial mutual information scores.
| Mutual information score | % chance of interaction |
|---|---|
| 44.40 | 98.75 |
| 32.01 | 96.56 |
| 23.93 | 75.58 |
| 20.14 | 95.09 |
| 18.26 | 90.45 |
| 17.03 | 89.89 |
| 15.00 | 84.70 |
| 13.19 | 79.83 |
| 11.07 | 75.61 |
| 10.43 | 72.36 |
| 9.46 | 66.89 |
| 8.34 | 61.25 |
| 7.09 | 56.09 |
Runtimes (in minutes) of three calculators, including our own, on five randomly selected proteins. Although our calculator is designed for inter-protein interactions, it can easily be used for intra-protein interactions; the reverse is not true for the other calculators. Hence, this test was conducted on intra-protein relationships. Runtimes are approximate; for BIS2Analyzer and MISTIC2, they are based on email notifications of when jobs started and finished.
| Protein | BIS2Analyzer Runtime (Minutes) | MISTIC2 Runtime (Minutes) | PIC Runtime (Minutes) |
|---|---|---|---|
| LMNA | 3 | 20 | 0.43 |
| CDK2 | 0.5 | 3 | 0.12 |
| RB1 | 4 | 36 | 0.62 |
| AR | 2.5 | 30 | 0.65 |
| ELL | 13 | 6 | 0.13 |
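The runtime table can be summarized as fold speedups of the PIC over each tool. This short sketch only divides the tabulated, approximate minutes; it is arithmetic on the reported numbers, not a new benchmark, and `speedups` is a hypothetical helper name.

```python
# Runtimes in minutes copied from the runtime table: (BIS2Analyzer, MISTIC2, PIC)
RUNTIMES = {
    "LMNA": (3, 20, 0.43),
    "CDK2": (0.5, 3, 0.12),
    "RB1": (4, 36, 0.62),
    "AR": (2.5, 30, 0.65),
    "ELL": (13, 6, 0.13),
}


def speedups(runtimes):
    """Return {protein: (BIS2Analyzer / PIC, MISTIC2 / PIC)} fold speedups."""
    return {
        name: (bis2 / pic, mistic2 / pic)
        for name, (bis2, mistic2, pic) in runtimes.items()
    }


for name, (vs_bis2, vs_mistic2) in speedups(RUNTIMES).items():
    print(f"{name}: {vs_bis2:.1f}x vs BIS2Analyzer, {vs_mistic2:.1f}x vs MISTIC2")
```

On these numbers the speedup ranges from about 3.8x (AR versus BIS2Analyzer) to 100x (ELL versus BIS2Analyzer).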