Xichun Wang¹, Sergio Branciamore¹, Grigoriy Gogoshin¹, Andrei S Rodin².
Abstract
Recent developments in sequencing and the growth of bioinformatics resources provide us with vast repositories of protein network and single nucleotide polymorphism data. These allow us to re-examine, on a larger and more comprehensive scale, the relationship between protein–protein interactions on the one hand and protein variability and evolutionary rates on the other. This relationship has long remained far from unambiguously resolved, reflecting shifting analysis approaches in the literature and growing data availability. In this study, we utilized several public genomic databases to investigate this relationship in human, mouse, pig, chicken, and zebrafish. We observed strong non-linear relationship patterns (tending towards convex decreasing function shapes) between protein variability and the density of corresponding protein–protein interactions across all five species. To investigate further, we carried out stochastic simulations modeling the interplay between protein connectivity and variability. Our results indicate that a simple negative linear correlation model, often suggested (or tacitly assumed) in the literature as either a null or an alternative hypothesis, is not a good fit with the observed data. After considering different (but still relatively simple, and not overfitting) simulation models, we found that a convex decreasing protein variability–connectivity function (specifically, exponential decay) led to a much better fit with the real data. We conclude that simple correlation models might be inadequate for describing protein variability–connectivity interplay in vertebrates; they often tend towards false negatives (showing no more than marginal linear or rank correlation where there are in fact strong non-random patterns).
Keywords: PPI; Protein connectivity; Protein evolutionary rates; Protein variability; Protein–protein interactions; Stochastic computer simulations
Year: 2019 PMID: 31302723 PMCID: PMC6658588 DOI: 10.1007/s00239-019-09899-z
Source DB: PubMed Journal: J Mol Evol ISSN: 0022-2844 Impact factor: 2.395
Fig. 5. Scatter plots of observed human data from the STRING database (red dots) superimposed on the simulation results (green dots). a Intraspecific variability data (red dots); protein variability and connectivity are modeled independently (green dots). b Intraspecific variability data (red dots); protein variability and protein connectivity are linked via a negative linear function (green dots). c Intraspecific variability data (red dots); protein variability and connectivity are linked via an exponential decay function (green dots). d Human/chimpanzee ortholog protein dN values (red dots); protein variability and connectivity are linked via an exponential decay function (green dots). e Human/chimpanzee ortholog protein dS values (red dots); protein variability and connectivity are linked via an exponential decay function (green dots) (Color figure online)
Fig. 1. Density plots (left panes) and 3D surface plots (right panes) of human intraspecific protein variability versus protein connectivity. a STRING data. b Reactome “Direct Complex” data. c Reactome “Indirect Complex” data. d Reactome “Reaction” data. e Reactome “Neighboring Reaction” data. f APID data. Straight lines in the left panes depict fitted linear models. The Spearman (non-parametric, rank) correlation coefficient and its statistical significance (ρ and P value) are shown for each plot
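Fig. 1 reports Spearman's ρ alongside a fitted straight line. As a minimal, stdlib-only sketch (not the authors' code; the function names are hypothetical), Spearman's ρ is the Pearson correlation of tie-averaged ranks. The toy values in the usage section are invented and follow an exact exponential decay: the rank correlation is −1, while the linear correlation on raw values is much weaker, which is one way a convex decreasing pattern can be understated by a linear fit.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last index of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary (linear) Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Toy values invented for illustration: variability decays exponentially with connectivity.
    connectivity = [1, 2, 3, 5, 8, 40]
    variability = [math.exp(-c) for c in connectivity]
    print(spearman_rho(connectivity, variability))  # approximately -1.0
    print(pearson(connectivity, variability))       # about -0.43
```

Rank correlation captures any monotone trend, but, like a linear fit, it summarizes the relationship with a single number and says nothing about its shape.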
Fig. 3. Density plots (left panes) and 3D surface plots (right panes) of mouse, pig, chicken, and zebrafish intraspecific protein variability versus protein connectivity (STRING data)
Fig. 2. Density plots of human intraspecific protein variability versus protein connectivity, shown on a log–log scale to highlight the low variability–low connectivity areas. a STRING data. b Reactome “Direct Complex” data. c Reactome “Indirect Complex” data. d Reactome “Reaction” data. e Reactome “Neighboring Reaction” data. f APID data
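Fig. 2 replots the human data on log–log axes. A minimal sketch of log-spaced 2D binning shows why this view helps: every decade receives the same number of bins, so small values are spread over several bins instead of merging into one crowded corner. The function names, the four bins per decade, and the pseudocount of 1 (used to keep zero values finite under the logarithm) are all assumptions of this sketch, not documented choices of the study.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple


def log_log_bins(
    connectivity: Sequence[float],
    variability: Sequence[float],
    bins_per_decade: int = 4,
    pseudocount: float = 1.0,
) -> Dict[Tuple[int, int], int]:
    """Count (connectivity, variability) pairs in log10-spaced 2D bins.

    The pseudocount keeps zero values (e.g. a protein with no recorded
    variants) finite under the log; it is an assumption of this sketch.
    """
    counts: Counter = Counter()
    for c, v in zip(connectivity, variability):
        bx = math.floor(math.log10(c + pseudocount) * bins_per_decade)
        by = math.floor(math.log10(v + pseudocount) * bins_per_decade)
        counts[(bx, by)] += 1
    return dict(counts)


def bin_lower_edge(index: int, bins_per_decade: int = 4, pseudocount: float = 1.0) -> float:
    """Map a bin index back to the smallest raw value that falls in it."""
    return 10 ** (index / bins_per_decade) - pseudocount
```

With the defaults, raw values 0 and 1 fall in different bins (indices 0 and 1), while all integers from 562 to 998 share bin 11.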
Spearman correlation coefficient and statistical significance (ρ and P value) for protein variability–connectivity relationships shown in Figs. 1, 3 and Supplemental Figs. 2, 4, 6, 9

| Data source | ρ | P value |
|---|---|---|
| Human intraspecific protein variability versus protein connectivity (Fig. 1) | | |
| STRING | −0.0092 | 0.2472 |
| Reactome direct complex | −0.0706 | 0.0024 |
| Reactome indirect complex | −0.0489 | 0.0247 |
| Reactome reaction | −0.0141 | 0.4426 |
| Reactome neighboring reaction | 0.0694 | 8.8695e−05 |
| APID | 0.0383 | 1.6330e−06 |
| Human/chimpanzee ortholog protein dN/dS ratio versus protein connectivity (Supplemental Fig. 2) | | |
| STRING | −0.1058 | 7.5334e−39 |
| Reactome direct complex | −0.0226 | 0.3475 |
| Reactome indirect complex | −0.0527 | 0.0184 |
| Reactome reaction | −0.0139 | 0.4592 |
| Reactome neighboring reaction | 0.0076 | 0.6744 |
| APID | −0.0750 | 2.5481e−20 |
| Human/chimpanzee ortholog protein dN versus protein connectivity (Supplemental Fig. 4) | | |
| STRING | −0.0639 | 3.5349e−15 |
| Reactome direct complex | −0.0279 | 0.2464 |
| Reactome indirect complex | −0.0403 | 0.0714 |
| Reactome reaction | −0.0226 | 0.2298 |
| Reactome neighboring reaction | 0.0110 | 0.5407 |
| APID | −0.0388 | 1.8759e−06 |
| Human/chimpanzee ortholog protein dS versus protein connectivity (Supplemental Fig. 6) | | |
| STRING | −0.0168 | 0.0393 |
| Reactome direct complex | −0.0338 | 0.1606 |
| Reactome indirect complex | −0.0214 | 0.3386 |
| Reactome reaction | −0.0167 | 0.3765 |
| Reactome neighboring reaction | 0.0007 | 0.9677 |
| APID | −0.0127 | 0.1194 |
| Mouse, pig, chicken, and zebrafish intraspecific protein variability versus protein connectivity (Fig. 3) | | |
| Mouse | −0.0453 | 2.5181e−07 |
| Pig | 0.0041 | 0.8967 |
| Chicken | −0.0262 | 0.45907 |
| Zebrafish | −0.0308 | 0.1566 |
| Mouse/rat ortholog protein variability versus protein connectivity (Supplemental Fig. 9) | | |
| dN/dS ratio | −0.1638 | 3.7644e−92 |
| dN | −0.1590 | 7.3377e−87 |
| dS | −0.1036 | 1.2849e−37 |
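Several rows in the table above pair a very small |ρ| with a very small P value. Under the standard large-sample approximation, Spearman's ρ is roughly normal with standard deviation 1/√(n − 1) when there is no association, so significance grows with the number of proteins n even when the effect is negligible. The sketch below uses that approximation; the excerpt does not state how the reported P values were computed, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def spearman_p_normal(rho: float, n: int) -> float:
    """Two-sided P value for Spearman's rho under H0 (no association),
    using the large-sample approximation rho * sqrt(n - 1) ~ N(0, 1)."""
    z = abs(rho) * math.sqrt(n - 1)
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))


def min_n_for_significance(rho: float, alpha: float = 0.05) -> int:
    """Smallest n at which |rho| reaches two-sided significance alpha
    under the same normal approximation."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z_crit / abs(rho)) ** 2 + 1)


if __name__ == "__main__":
    rho = 0.0383  # value from the APID row of the human intraspecific block
    n = min_n_for_significance(rho)
    print(n, spearman_p_normal(rho, n), rho ** 2)
```

For ρ = 0.0383, a few thousand proteins already push P below 0.05 under this approximation, yet ρ² ≈ 0.0015, so the ranks of one variable account for well under 1% of the rank variance of the other.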
Fig. 4. Density plots (left panes) and 3D surface plots (right panes) of simulated protein variability versus protein connectivity. a Protein variability and connectivity are modeled independently (see “first modeling scenario” in Methods). b Protein variability and connectivity are linked via a negative linear function (second modeling scenario). c Protein variability and connectivity are linked via an exponential decay function (third modeling scenario)
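The three modeling scenarios can be sketched as a single simulator that differs only in a coupling function f(C) of connectivity C. This is a hypothetical illustration, not the authors' simulation: the Pareto-distributed connectivity, the exponential noise on variability, the multiplicative coupling, and the clipping of the linear function at zero are all assumptions. Only the parameter values a = −0.004, b = 1, and k = 1 come from the caption of the distance table. The excerpt does not state how connectivity is scaled in the study's simulations, so with raw values of C the magnitudes here are illustrative only.

```python
import math
import random
from typing import Callable, List, Tuple


def uncoupled(c: float) -> float:
    """Scenario 1: variability scale does not depend on connectivity."""
    return 1.0


def negative_linear(c: float, a: float = -0.004, b: float = 1.0) -> float:
    """Scenario 2: f(C) = a*C + b, clipped at 0 (the clipping is an assumption)."""
    return max(a * c + b, 0.0)


def exponential_decay(c: float, k: float = 1.0) -> float:
    """Scenario 3: f(C) = exp(-k*C), a convex decreasing function."""
    return math.exp(-k * c)


def simulate(
    f: Callable[[float], float],
    n_proteins: int = 1000,
    mean_variability: float = 50.0,
    pareto_alpha: float = 1.5,
    seed: int = 0,
) -> List[Tuple[float, float]]:
    """Draw hypothetical (connectivity, variability) pairs for one scenario."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_proteins):
        c = rng.paretovariate(pareto_alpha)  # heavy-tailed connectivity, always >= 1
        noise = rng.expovariate(1.0 / mean_variability)  # right-skewed baseline variability
        data.append((c, noise * f(c)))  # coupling scales the baseline multiplicatively
    return data


if __name__ == "__main__":
    for name, f in [("uncoupled", uncoupled),
                    ("negative linear", negative_linear),
                    ("exponential decay", exponential_decay)]:
        sample = simulate(f)
        mean_v = sum(v for _, v in sample) / len(sample)
        print(f"{name}: mean simulated variability = {mean_v:.2f}")
```

Keeping one simulator and swapping only f(C) means that any difference among the three simulated distributions comes from the coupling alone.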
Energy Distance (ED) and Earth Mover’s Distance (EMD) between the observed (human STRING data) and simulated distributions (a = −0.004, b = 1 for the negative linear variability–connectivity (V–C) function; k = 1 for the exponential decay V–C function), averaged over 100 simulation replications for ED and 5 simulation replications for EMD

| Distance / observed data | Exponential decay | Negative linear | V–C uncoupled |
|---|---|---|---|
| ED | | | |
| STRING intraspecific variability | 231.70 (Fig. 5c) | 584.07 (Fig. 5b) | 576.85 (Fig. 5a) |
| STRING dN | 221.77 (Fig. 5d) | 570.20 | 536.28 |
| STRING dS | 227.82 (Fig. 5e) | 597.37 | 562.61 |
| EMD | | | |
| STRING intraspecific variability | 116.11 (Fig. 5c) | 360.83 (Fig. 5b) | 336.99 (Fig. 5a) |
| STRING dN | 126.43 (Fig. 5d) | 507.96 | 473.25 |
| STRING dS | 132.03 (Fig. 5e) | 373.90 | 351.21 |
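Both distances compare whole 2D point clouds of (connectivity, variability) rather than a single correlation. A minimal sketch, assuming Euclidean distance between points and uniform weights (the excerpt does not give the study's scaling or normalization, so this will not reproduce the table's values; function names are hypothetical): the energy distance uses the V-statistic form 2E|X − Y| − E|X − X′| − E|Y − Y′|, and the EMD for two equal-size samples reduces to the cheapest one-to-one matching, found here by brute force.

```python
import itertools
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (connectivity, variability)


def mean_pairwise(xs: Sequence[Point], ys: Sequence[Point]) -> float:
    """Average Euclidean distance over all pairs (x, y)."""
    return sum(math.dist(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def energy_distance(xs: Sequence[Point], ys: Sequence[Point]) -> float:
    """Energy distance (V-statistic form): 2E|X-Y| - E|X-X'| - E|Y-Y'|."""
    return 2 * mean_pairwise(xs, ys) - mean_pairwise(xs, xs) - mean_pairwise(ys, ys)


def emd_equal_size(xs: Sequence[Point], ys: Sequence[Point]) -> float:
    """Earth mover's distance between equal-size, uniformly weighted point sets.

    For this case an optimal transport plan is a one-to-one matching; brute
    force over permutations is only feasible for a handful of points.
    """
    if len(xs) != len(ys):
        raise ValueError("this brute-force sketch needs equal sample sizes")
    best = min(
        sum(math.dist(x, ys[j]) for x, j in zip(xs, perm))
        for perm in itertools.permutations(range(len(ys)))
    )
    return best / len(xs)
```

Real datasets of thousands of proteins need a dedicated optimal-transport solver for EMD, which is generally far costlier than the pairwise averages behind ED.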