Gang Hu1, Zhonghua Wu2, Vladimir N Uversky3,4, Lukasz Kurgan5.
Abstract
Some of the intrinsically disordered proteins and protein regions are promiscuous interactors that are involved in one-to-many and many-to-one binding. Several studies have analyzed enrichment of intrinsic disorder among the promiscuous hub proteins. We extended these works by providing a detailed functional characterization of the disorder-enriched hub protein-protein interactions (PPIs), including both hubs and their interactors, and by analyzing their enrichment among disease-associated proteins. We focused on the human interactome, given its high degree of completeness and relevance to the analysis of the disease-linked proteins. We quantified and investigated numerous functional and structural characteristics of the disorder-enriched hub PPIs, including protein binding, structural stability, evolutionary conservation, several categories of functional sites, and presence of over twenty types of posttranslational modifications (PTMs). We showed that the disorder-enriched hub PPIs have a significantly enlarged number of disordered protein binding regions and long intrinsically disordered regions. They also include high numbers of targeting, catalytic, and many types of PTM sites. We empirically demonstrated that these hub PPIs are significantly enriched among 11 out of 18 considered classes of human diseases that are associated with at least 100 human proteins. Finally, we also illustrated how over a dozen specific human hubs utilize intrinsic disorder for their promiscuous PPIs.
Keywords: hub proteins; human proteome; intrinsic disorder; intrinsically disordered proteins; protein-protein interactions
Year: 2017 PMID: 29257115 PMCID: PMC5751360 DOI: 10.3390/ijms18122761
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Enrichment in intrinsic disorder of human hub proteins and their interactors. The x- and y-axis show the disorder content of the hubs and hub interactors, respectively. Each protein-protein interaction (PPI) is mapped into this two-dimensional plane and the density of these hub-interactor pairs is represented by green isolines. For instance, 40% of these pairs occupy the lower left corner where the disorder content of both hubs and interactors is below 0.25. The density was modelled with the Epanechnikov kernel function using Mathematica software. Next, we simulated a randomized PPI network that follows the same distribution of node density, i.e., we randomly assigned interactions between the human proteins to maintain the same density profile as in the true PPI network. Coloring of the inside of the two-dimensional plane reflects a relative ratio between the density of true (dn) and randomized (dr) interactions in the PPI networks calculated as [dn(x,y) − dr(x,y)]/dr(x,y). The color scale given on the right defines values of the ratio, e.g., orange corresponds to PPIs which are 0.5 times more frequent in the true PPI network compared to the random network.
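The relative ratio described in the Figure 1 caption can be illustrated with a minimal Python sketch. The caption states that the density was modelled with the Epanechnikov kernel in Mathematica; the product form of the two-dimensional kernel, the bandwidth `h`, and all function names below are assumptions made for illustration, not the authors' implementation.

```python
def epanechnikov(u):
    """One-dimensional Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], else 0."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kde2d(points, x, y, h):
    """Density estimate at (x, y) from (hub, interactor) disorder-content pairs.

    Assumption: a product of two 1-D Epanechnikov kernels with one shared
    bandwidth h. The source does not specify these details.
    """
    if not points:
        raise ValueError("at least one point is required")
    total = sum(
        epanechnikov((x - px) / h) * epanechnikov((y - py) / h)
        for px, py in points
    )
    return total / (len(points) * h * h)


def relative_ratio(true_pairs, random_pairs, x, y, h=0.1):
    """Relative ratio [dn(x,y) - dr(x,y)] / dr(x,y) from the caption.

    Returns None where the randomized density is zero (ratio undefined).
    """
    dn = kde2d(true_pairs, x, y, h)
    dr = kde2d(random_pairs, x, y, h)
    if dr == 0.0:
        return None
    return (dn - dr) / dr


# Toy data (hypothetical): the true network has more pairs near (0.1, 0.1).
true_pairs = [(0.1, 0.1)] * 3 + [(0.9, 0.9)]
random_pairs = [(0.1, 0.1)] * 2 + [(0.9, 0.9)] * 2
ratio = relative_ratio(true_pairs, random_pairs, 0.1, 0.1)
```

In this toy case the ratio at (0.1, 0.1) is about 0.5, i.e., such pairs are 0.5 times more frequent in the true data than in the randomized data, which matches how the caption reads the orange color.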
Table 1. Analysis of enrichment in functional and structural characteristics of proteins. We compare three protein sets (hubs, hub interactors, and hub interactors that exclude hubs) that are associated with the disorder-enriched hub protein-protein interactions (PPIs), i.e., those with disorder content higher than expected by over 50%, against the corresponding remaining hubs, hub interactors, and hub interactors that exclude hubs. To ensure that the results are statistically robust and represent diverse subpopulations of these proteins, we select 20% of proteins at random from each of the two protein sets, quantify a given characteristic for each set, and repeat this ten times. We report the average of the ten repetitions and the relative difference between the averages for the two protein sets. We also evaluate the significance of the differences between these measurements. We use the t-test if the data are normal (we test normality with the Anderson-Darling test at p-value = 0.05); otherwise we use the Wilcoxon test. Bold font indicates large differences that are statistically significant (p-value < 0.001 and |relative difference| ≥ 20%).
| Type of Functional/Structural Characteristic | Measure | Hubs: Avg for Disorder-Enriched Proteins | Hubs: Avg for Remaining Proteins | Hubs: p-Value | Hubs: Relative Difference | All Hub Interactors: Avg for Disorder-Enriched Proteins | All Hub Interactors: Avg for Remaining Proteins | All Hub Interactors: p-Value | All Hub Interactors: Relative Difference | Hub Interactors Excluding Hubs: Avg for Disorder-Enriched Proteins | Hub Interactors Excluding Hubs: Avg for Remaining Proteins | Hub Interactors Excluding Hubs: p-Value | Hub Interactors Excluding Hubs: Relative Difference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Structural properties | Disorder content | 0.542 | 0.224 | <0.001 | 141% | 0.540 | 0.227 | <0.001 | 138% | 0.527 | 0.213 | <0.001 | 148% |
| | Number of LIDPRs (per 1000 AAs) | 4.428 | 1.900 | <0.001 | 133% | 4.436 | 1.910 | <0.001 | 132% | 4.863 | 1.954 | <0.001 | 149% |
| | Number of disulfide links (per 1000 AAs) | 0.765 | 1.240 | <0.001 | −62% | 0.764 | 1.249 | <0.001 | −63% | 0.973 | 1.828 | <0.001 | −47% |
| Sequence | Average protein chain length | 736.8 | 637.6 | <0.001 | 16% | 734.5 | 653.2 | <0.001 | 12% | 791.5 | 654.2 | <0.001 | 21% |
| Evolutionary conservation | Average conservation per protein | 0.920 | 1.069 | <0.001 | −16% | 0.930 | 1.053 | <0.001 | −13% | 0.965 | 1.048 | <0.001 | −8% |
| | Average conservation of LIDPRs | 0.735 | 0.742 | 0.113 | −1% | 0.762 | 0.763 | 0.811 | 0% | 0.790 | 0.455 | <0.001 | 74% |
| Functional regions | MoRF content | 0.014 | 0.008 | <0.001 | 68% | 0.014 | 0.008 | <0.001 | 67% | 0.018 | 0.012 | <0.001 | 53% |
| | Number of MoRF regions (per 1000 AAs) | 1.843 | 1.090 | <0.001 | 69% | 1.834 | 1.096 | <0.001 | 67% | 2.398 | 1.577 | <0.001 | 52% |
| | Disordered protein binding (DPB) content | 0.230 | 0.092 | <0.001 | 149% | 0.231 | 0.093 | <0.001 | 148% | 0.201 | 0.074 | <0.001 | 171% |
| | Number of DPB regions (per 1000 AAs) | 8.764 | 3.665 | <0.001 | 139% | 8.784 | 3.663 | <0.001 | 140% | 8.402 | 3.270 | <0.001 | 157% |
| Functional motifs (ELMs) | Proteolytic cleavage sites (per 1000 AAs) | 0.023 | 0.033 | 0.017 | −43% | 0.022 | 0.032 | 0.039 | −46% | 0.000 | 0.003 | 0.044 | −89% |
| | Degradation sites (per 1000 AAs) | 0.014 | 0.019 | 0.051 | −31% | 0.014 | 0.018 | 0.016 | −32% | 0.003 | 0.004 | 0.342 | −40% |
| | Docking sites for catalysis (per 1000 AAs) | 0.137 | 0.074 | <0.001 | 85% | 0.137 | 0.074 | <0.001 | 85% | 0.010 | 0.005 | 0.031 | 88% |
| | Non-catalytic ligand binding sites (per 1000 AAs) | 0.278 | 0.240 | 0.026 | 15% | 0.257 | 0.234 | 0.061 | 10% | 0.119 | 0.043 | <0.001 | 174% |
| | PTM sites (per 1000 AAs) | 0.180 | 0.084 | <0.001 | 114% | 0.183 | 0.087 | <0.001 | 110% | 0.028 | 0.013 | 0.022 | 121% |
| | Targeting sites for localization (per 1000 AAs) | 0.149 | 0.053 | <0.001 | 182% | 0.146 | 0.058 | <0.001 | 151% | 0.000 | 0.008 | <0.001 | −100% |
| PTMs | Number of PTMs (per 1000 AAs) | 413.3 | 275.3 | <0.001 | 50% | 412.5 | 275.8 | <0.001 | 50% | 410.4 | 277.9 | <0.001 | 48% |
| | Number of residues with PTMs (per 1000 AAs) | 325.9 | 226.5 | <0.001 | 44% | 325.7 | 227.1 | <0.001 | 43% | 327.3 | 230.0 | <0.001 | 42% |
| | Acetylation sites (per 1000 AAs) | 9.603 | 7.954 | <0.001 | 21% | 9.481 | 7.919 | <0.001 | 20% | 8.951 | 7.759 | <0.001 | 15% |
| | ADP-ribosylation sites (per 1000 AAs) | 32.02 | 18.52 | <0.001 | 73% | 31.69 | 18.39 | <0.001 | 72% | 30.97 | 17.50 | <0.001 | 77% |
| | Amidation sites (per 1000 AAs) | 48.67 | 36.64 | <0.001 | 33% | 49.05 | 36.59 | <0.001 | 34% | 49.62 | 39.75 | <0.001 | 25% |
| | Carboxylation sites (per 1000 AAs) | 39.24 | 23.42 | <0.001 | 68% | 39.22 | 23.43 | <0.001 | 67% | 38.22 | 22.28 | <0.001 | 72% |
| | C-linked glycosylation sites (per 1000 AAs) | 0.200 | 0.261 | <0.001 | −31% | 0.212 | 0.287 | <0.001 | −35% | 0.262 | 0.331 | 0.004 | −21% |
| | Farnesylation sites (per 1000 AAs) | 0.007 | 0.008 | 0.303 | −21% | 0.006 | 0.008 | 0.059 | −30% | 0.008 | 0.021 | 0.083 | −61% |
| | Geranylgeranylation sites (per 1000 AAs) | 0.008 | 0.010 | 0.059 | −32% | 0.007 | 0.009 | 0.045 | −40% | 0.011 | 0.031 | 0.226 | −63% |
| | GPI anchor amidation sites (per 1000 AAs) | 1.695 | 1.621 | 0.050 | 5% | 1.689 | 1.626 | 0.056 | 4% | 1.917 | 1.865 | 0.500 | 3% |
| | Hydroxylation sites (per 1000 AAs) | 26.75 | 12.56 | <0.001 | 113% | 26.98 | 13.16 | <0.001 | 105% | 26.18 | 12.99 | <0.001 | 101% |
| | Methylation sites (per 1000 AAs) | 15.41 | 10.27 | <0.001 | 50% | 15.33 | 10.27 | <0.001 | 49% | 13.98 | 9.97 | <0.001 | 40% |
| | Myristoylation sites (per 1000 AAs) | 0.012 | 0.017 | 0.002 | −39% | 0.014 | 0.013 | 0.908 | 1% | 0.021 | 0.015 | 0.222 | 41% |
| | N-linked glycosylation sites (per 1000 AAs) | 3.754 | 3.772 | 0.577 | 0% | 3.748 | 3.781 | 0.356 | −1% | 3.474 | 3.660 | 0.020 | −5% |
| | N-terminal acetylation sites (per 1000 AAs) | 0.525 | 0.517 | 0.345 | 2% | 0.542 | 0.515 | 0.035 | 5% | 0.769 | 0.797 | 0.321 | −3% |
| | O-linked glycosylation sites (per 1000 AAs) | 14.22 | 8.18 | <0.001 | 74% | 14.71 | 8.13 | <0.001 | 81% | 14.21 | 7.81 | <0.001 | 82% |
| | Palmitoylation sites (per 1000 AAs) | 1.334 | 1.734 | <0.001 | −30% | 1.342 | 1.786 | <0.001 | −33% | 1.783 | 2.751 | <0.001 | −35% |
| | Phosphorylation sites (per 1000 AAs) | 44.52 | 21.63 | <0.001 | 106% | 43.75 | 21.71 | <0.001 | 102% | 42.60 | 18.45 | <0.001 | 131% |
| | PUPylation sites (per 1000 AAs) | 1.949 | 2.603 | <0.001 | −34% | 1.920 | 2.633 | <0.001 | −37% | 1.838 | 2.436 | <0.001 | −25% |
| | Pyrrolidone carboxylic acid sites (per 1000 AAs) | 4.803 | 4.357 | <0.001 | 10% | 4.813 | 4.366 | <0.001 | 10% | 4.998 | 4.883 | 0.197 | 2% |
| | Sulfation sites (per 1000 AAs) | 3.589 | 2.291 | <0.001 | 57% | 3.603 | 2.305 | <0.001 | 56% | 3.148 | 2.026 | <0.001 | 55% |
| | SUMOylation sites (per 1000 AAs) | 6.844 | 6.484 | 0.003 | 6% | 6.876 | 6.467 | <0.001 | 6% | 6.457 | 6.682 | 0.035 | −3% |
| | Ubiquitination sites (per 1000 AAs) | 9.377 | 6.924 | <0.001 | 35% | 9.378 | 6.984 | <0.001 | 34% | 9.194 | 6.931 | <0.001 | 33% |
Abbreviations: long intrinsically disordered protein region (LIDPR); amino acid (AA); disordered protein binding (DPB); eukaryotic linear motif (ELM); molecular recognition feature (MoRF; short DPB region); posttranslational modification (PTM).
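The resampling procedure described in the table caption (select 20% of proteins at random from each set, quantify a characteristic, repeat ten times, then average) can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, sampling without replacement is an assumption, and the choice of the smaller average as the denominator of the relative difference is inferred from the tabulated values (e.g., 0.765 vs. 1.240 giving −62%), not stated in the source. The significance tests (Anderson-Darling, t-test, Wilcoxon) are omitted here.

```python
import random
from statistics import mean


def relative_difference(avg_enriched, avg_remaining):
    """Relative difference between two averages.

    Assumption: the difference is divided by the smaller of the two averages,
    which appears consistent with the tabulated values but is not confirmed
    by the source.
    """
    smaller = min(avg_enriched, avg_remaining)
    if smaller == 0:
        raise ValueError("relative difference is undefined for a zero average")
    return (avg_enriched - avg_remaining) / smaller


def resampled_averages(values, fraction=0.2, repetitions=10, rng=None):
    """Average of a characteristic over random subsets, one value per repetition.

    Assumption: each subset is drawn without replacement.
    """
    rng = rng or random.Random(0)
    size = max(1, round(fraction * len(values)))
    return [mean(rng.sample(values, size)) for _ in range(repetitions)]


def compare_sets(enriched, remaining, fraction=0.2, repetitions=10, seed=0):
    """Return (avg for enriched set, avg for remaining set, relative difference)."""
    rng = random.Random(seed)
    a = resampled_averages(enriched, fraction, repetitions, rng)
    b = resampled_averages(remaining, fraction, repetitions, rng)
    avg_a, avg_b = mean(a), mean(b)
    return avg_a, avg_b, relative_difference(avg_a, avg_b)


# Toy data (hypothetical): a constant characteristic per set makes the
# outcome independent of the random draws.
result = compare_sets([0.5] * 50, [0.25] * 50)
```

The two lists of ten per-repetition measurements returned by `resampled_averages` are the values on which a normality check and the subsequent t-test or Wilcoxon test would operate.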
Figure 2. Number of disulfide bonds (panel A) and disorder content (panel B) in specific subcellular locations. We report the values for all human proteins (black bars), hubs (solid blue), hub interactors (solid red), and hub interactors that exclude hubs (solid green) that are associated with the disorder-enriched hub PPIs vs. the remaining hubs (blue horizontal stripes), hub interactors (red horizontal stripes), and hub interactors that exclude hubs (green horizontal stripes), respectively, for each location. We consider all locations that include at least 10 proteins for each of the seven protein sets. The locations in panels (A) and (B) are sorted in descending order by the number of disulfide bonds and by the disorder content, respectively, for all proteins (black bars).
Figure 3. Significance of the differences in the functional and structural characteristics between hubs (on the left), all hub interactors, and hub interactors that exclude hubs (on the right) that are associated with the disorder-enriched hub PPIs when compared to the remaining hubs (on the left) and the corresponding remaining interactors (on the right). Panel (A) summarizes the results concerning structural characteristics, functional regions and motifs, evolutionary conservation, and the overall abundance of PTMs. Panel (B) gives detailed results for specific types of PTMs. We report relative differences and their statistical significance. The characteristics are sorted in descending order by their relative differences for the hubs. The characteristics are color-coded as follows: green for large (relative difference > 20%) and statistically significant (p-value < 0.001) enrichment; red for large (relative difference < −20%) and statistically significant (p-value < 0.001) depletion; and blue for lack of large and significant differences (|relative difference| < 20% or p-value over 0.001). Abbreviations: eukaryotic linear motif (ELM); molecular recognition feature (MoRF; short disordered protein binding region); and posttranslational modification (PTM).
Analysis of enrichment in the disease-associated proteins. We compare the abundance of the disease-linked proteins among the hubs and interactors involved in the disorder-enriched hub PPIs vs. their abundance among PPIs for all human proteins. The abundance is measured as the number of the corresponding PPIs that include these proteins, computed per protein. The 4296 diseases that are associated with human proteins are grouped into disease classes using the MeSH hierarchy. The classes are sorted in descending order by the value of the relative difference between the abundance among the disorder-enriched hub PPIs and all PPIs. To ensure that the results are statistically robust and represent diverse subpopulations of these proteins, we select 50% of proteins at random from each of the two protein sets, quantify a given characteristic for each set, and repeat this ten times. We use a larger fraction to define the populations than for the results in Table 1 to accommodate the small number of proteins associated with some of the disease classes. We report the average of these repetitions and the relative difference between the averages for the two protein sets. We also evaluate the significance of the differences between these measurements. We use the t-test if the data are normal (we test normality with the Anderson-Darling test at p-value = 0.05); otherwise we use the Wilcoxon test.
| Disease Group | Disease Class (ID at the Second MeSH Level) | Average for Disorder-Enriched Proteins | Average for all Human Proteins | p-Value | Relative Difference |
|---|---|---|---|---|---|
| ALL Diseases | | 0.519 | 0.433 | <0.001 | 20% |
| Specific disease classes | Neoplasms, including cancers (C04) | 3.150 | 0.436 | <0.001 | 622% |
| | Stomatognathic Diseases (C07) | 1.833 | 0.437 | <0.001 | 320% |
| | Endocrine System Diseases (C19) | 1.594 | 0.432 | <0.001 | 269% |
| | Digestive System Diseases (C06) | 1.440 | 0.436 | <0.001 | 230% |
| | Respiratory Tract Diseases (C08) | 1.060 | 0.437 | <0.001 | 143% |
| | Female Urogenital Diseases and Pregnancy Complications (C13) | 0.855 | 0.437 | <0.001 | 96% |
| | Nervous System Diseases (C10) | 0.775 | 0.432 | <0.001 | 79% |
| | Musculoskeletal Diseases (C05) | 0.654 | 0.433 | <0.001 | 51% |
| | Hemic and Lymphatic Diseases (C15) | 0.543 | 0.434 | <0.001 | 25% |
| | Pathological Conditions, Signs and Symptoms (C23) | 0.467 | 0.434 | <0.001 | 8% |
| | Congenital, Hereditary, and Neonatal Diseases and Abnormalities (C16) | 0.456 | 0.433 | <0.001 | 5% |
| | Male Urogenital Diseases (C12) | 0.459 | 0.436 | 0.009 | 5% |
| | Immune System Diseases (C20) | 0.409 | 0.435 | 0.81 | −6% |
| | Eye Diseases (C11) | 0.373 | 0.437 | <0.001 | −17% |
| | Cardiovascular Diseases (C14) | 0.347 | 0.437 | <0.001 | −26% |
| | Nutritional and Metabolic Diseases (C18) | 0.248 | 0.433 | <0.001 | −75% |
| | Skin and Connective Tissue Diseases (C17) | 0.247 | 0.435 | <0.001 | −76% |
| | Otorhinolaryngologic Diseases (C09) | 0.162 | 0.431 | <0.001 | −165% |
Figure 4. Intrinsic disorder levels in the disordered human hubs characterized by the highest levels of disorder (A–E), highly disordered hubs characterized by the highest levels of interactability (F–J), and ordered hubs with the highest interactability levels (K–O). The disorder was annotated using the MobiDB platform [141,142,143]; disorder content is shown in red font. Each plot represents disorder tendencies in two forms: bar plots showing the location of IDPRs and area plots showing the sequence distribution of consensus disorder scores evaluated by the MobiDB lite disorder predictor [143]. (A) Thyroid hormone receptor-associated protein 3 (UniProt ID: Q9Y2W1). (B) Zinc finger CCCH domain-containing protein 18 (UniProt ID: Q86VM9). (C) Scaffold attachment factor B1 (UniProt ID: Q15424). (D) Intracellular hyaluronan-binding protein 4 (UniProt ID: Q5JVS0). (E) TATA-binding protein-associated factor 2N (UniProt ID: Q92804). (F) BAG family molecular chaperone regulator 3 (UniProt ID: O95817). (G) CREB-binding protein (UniProt ID: Q92793). (H) RNA-binding protein EWS (UniProt ID: Q01844). (I) Cyclin-dependent kinase inhibitor 1 (UniProt ID: P38936). (J) Mediator of DNA damage checkpoint protein 1 (UniProt ID: Q14676). (K) Ubiquitin (UniProt ID: P0CG48; 8548 interactors). (L) Growth factor receptor-bound protein 2 (UniProt ID: P62993; 804 interactors). (M) Actin (UniProt ID: P60709; 263 interactors). (N) Protection of telomeres protein 1 (UniProt ID: Q9NUX5; 200 interactors). (O) Protein mago nashi homolog (UniProt ID: P61326; 190 interactors).
Figure 5. Structural characterization of highly connected ordered hubs. (A) Solution NMR structure of human ubiquitin (PDB ID: 1XQQ) [207]. (B) Solution NMR structure of a complex between the N-terminal SH3 domain of GRB2 (residues 1–56, red ribbons) and a peptide from SOS (blue ribbons) (PDB ID: 1AZE) [210]. (C) Minimized mean solution NMR structure of a complex between the SH2 domain of human GRB2 (residues 49–168, red ribbon) and a KPFY*VNVEF peptide (blue ribbon) (PDB ID: 1BMB) [211]. (D) Minimized mean solution NMR structure of a complex between the C-terminal SH3 domain of human GRB2 (residues 159–215, red ribbon) and a ligand peptide (blue ribbon) (PDB ID: 1IO6). (E) Crystal structure of a telomeric shelterin complex between the POT1 C-terminal domain (POT1C, residues 330–634, red ribbon) and POT1-binding region (residues 254–336, blue ribbon) of the adrenocortical dysplasia protein homolog (PDB ID: 5JUN7) [212]. (F) Crystal structure of a core EJC complex containing the complex of MAGOH (full length, dark orange ribbon), Y14 (residues 66–174, light orange ribbon), eIF4AIII (full length, red ribbon), Btz (the SELOR domain, residues 137–286, two blue ribbons), a non-hydrolyzable ATP analog (AMPPNP, bound to eIF4AIII), and U15 RNA (yellow ribbon) (PDB ID: 2J0Q) [213].
Figure 6. Connectivity of proteins in the human PPI network. Panel (A) summarizes the fraction of proteins (nodes in the PPI network) with a given number of interactions (degree). Panel (B) gives the cumulative fraction of proteins having a degree less than the corresponding value on the x-axis. The circles show the results for the original PPI network collected from mentha, and the crosses show the results for the network where proteins were mapped to UniProt. The dashed vertical line in panel (B) shows the degree that demarcates hubs, which are defined as the 20% most connected nodes.
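The hub definition in the Figure 6 caption (the 20% most connected nodes) and the cumulative degree curve of panel (B) can be sketched in Python as follows. This is a minimal illustration with hypothetical function names. It makes several assumptions the source does not specify: the network is given as an undirected edge list, self-interactions are ignored, and every node tied with the cutoff degree counts as a hub.

```python
import math


def node_degrees(edges):
    """Degree of each protein in an undirected PPI edge list.

    Assumptions: duplicate edges are counted once and self-interactions
    are ignored.
    """
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {node: len(partners) for node, partners in neighbors.items()}


def hub_cutoff(degrees, top_fraction=0.2):
    """Smallest degree among the top `top_fraction` most connected nodes."""
    ranked = sorted(degrees.values(), reverse=True)
    k = max(1, math.ceil(top_fraction * len(ranked)))
    return ranked[k - 1]


def find_hubs(degrees, top_fraction=0.2):
    """Nodes whose degree reaches the cutoff (ties at the cutoff are kept)."""
    cutoff = hub_cutoff(degrees, top_fraction)
    return {node for node, d in degrees.items() if d >= cutoff}


def cumulative_fraction_below(degrees, value):
    """Fraction of nodes with degree strictly less than `value`."""
    return sum(1 for d in degrees.values() if d < value) / len(degrees)


# Toy network (hypothetical): protein A interacts with four partners.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "C")]
degrees = node_degrees(edges)
hubs = find_hubs(degrees)
```

In this toy network of five proteins, the top 20% corresponds to the single most connected node, so only protein A is labeled a hub.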