Qingzhen Hou, Raphaël Bourgeas, Fabrizio Pucci, Marianne Rooman.
Abstract
The solubility of globular proteins is a basic biophysical property that is usually a prerequisite for their functioning. In this study, we probed the solubility of globular proteins with the help of the statistical potential formalism, in view of objectifying the connection of solubility with structural and energetic properties and of the solubility-dependence of specific amino acid interactions. We started by setting up two independent datasets containing either soluble or aggregation-prone proteins with known structures. From these two datasets, we computed solubility-dependent distance potentials that are by construction biased towards the solubility of the proteins from which they are derived. Their analysis showed the clear preference of amino acid interactions such as Lys-containing salt bridges and aliphatic interactions to promote protein solubility, whereas others such as aromatic, His-π, cation-π, amino-π and anion-π interactions rather tend to reduce it. These results indicate that interactions involving delocalized π-electrons favor aggregation, unlike those involving no (or few) dispersion forces. Furthermore, using our potentials derived from either highly or weakly soluble proteins to compute protein folding free energies, we found that the difference between these two energies correlates better with solubility than other properties analyzed before such as protein length, isoelectric point and aliphatic index. This is, to the best of our knowledge, the first comprehensive in silico study of the impact of residue-residue interactions on protein solubility properties. The results of this analysis provide new insights that will facilitate future rational protein design applications aimed at modulating the solubility of targeted proteins.
Year: 2018 PMID: 30279585 PMCID: PMC6168528 DOI: 10.1038/s41598-018-32988-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The folding free energy contribution of the salt bridge interaction Asp-Lys differs according to whether the potentials are derived from soluble or weakly soluble proteins. The energies are in kcal/mol, the distance d (in Å) is computed between residue side chain centroids, and the residues are separated by at least 8 residues along the chain. Distance bins containing ten occurrences or fewer are not drawn (see Eq. (6)).
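As a minimal sketch of how a distance potential of this kind can be built, the Python snippet below applies a generic inverse-Boltzmann formula to binned pair counts and drops sparsely populated bins, as the caption describes. It is not the authors' Eq. (6), which is not reproduced here: the function name `distance_potential`, the bin width, the value of `KT`, and the exact form of the reference state are all assumptions made for illustration. The input is assumed to be already restricted to residue pairs separated by at least 8 positions along the chain, with distances measured between side chain centroids.

```python
import math
from collections import Counter

KT = 0.6  # kcal/mol, roughly RT at room temperature (assumed value)


def distance_potential(observations, pair, bin_width=0.5, min_count=10, kt=KT):
    """Generic inverse-Boltzmann distance potential for one residue pair.

    observations: iterable of (pair_label, distance_in_angstrom) tuples,
        assumed to be pre-filtered for sequence separation.
    pair: label of the residue pair of interest, e.g. "ASP-LYS".
    Returns {bin_index: energy}. Bins in which the pair occurs min_count
    times or fewer are omitted.
    """
    pair_bins = Counter()  # occurrences of the chosen pair per distance bin
    all_bins = Counter()   # occurrences of any pair per distance bin
    n_pair = 0
    n_total = 0
    for label, distance in observations:
        b = int(distance // bin_width)
        all_bins[b] += 1
        n_total += 1
        if label == pair:
            pair_bins[b] += 1
            n_pair += 1

    potential = {}
    for b, count in pair_bins.items():
        if count <= min_count:
            continue  # too few occurrences for a reliable estimate
        p_bin_given_pair = count / n_pair
        p_bin = all_bins[b] / n_total
        # W = -kT ln[ P(d | pair) / P(d) ]
        potential[b] = -kt * math.log(p_bin_given_pair / p_bin)
    return potential
```

Running this function once on observations from the soluble dataset and once on observations from the aggregation-prone dataset would give two curves for the same pair, which is the kind of comparison the figure shows.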
Insolubilizing residue-residue interactions that meet the strict significance criteria: both tabulated quantities must be higher than 95% of the equivalent quantities computed from randomly shuffled datasets (the two Sig columns).
| Interactions | Residue pairs | | Sig | | Sig |
|---|---|---|---|---|---|
| Aromatic | TRP-TRP | −0.412 | 0.99 | 0.181 | 0.99 |
| | TRP-PHE | −0.207 | 1 | 0.052 | 1 |
| | TYR-TRP | −0.177 | 1 | 0.038 | 0.99 |
| | TYR-PHE | −0.124 | 0.97 | 0.019 | 0.99 |
| His-π | HIS-TYR | −0.155 | 1 | 0.038 | 0.99 |
| | HIS-TRP | −0.191 | 0.99 | 0.063 | 1 |
| | HIS-PHE | −0.122 | 0.96 | 0.022 | 0.95 |
| Cation-π | ARG-TRP | −0.238 | 1 | 0.074 | 1 |
| | ARG-PHE | −0.120 | 0.99 | 0.017 | 0.99 |
| | ARG-TYR | −0.101 | 0.98 | 0.017 | 0.98 |
| | LYS-TRP | −0.162 | 0.97 | 0.068 | 0.98 |
| Amino-π | GLN-TRP | −0.359 | 1 | 0.135 | 1 |
| | GLN-PHE | −0.128 | 1 | 0.028 | 1 |
| | ASN-PHE | −0.140 | 1 | 0.024 | 0.99 |
| | ASN-TRP | −0.183 | 1 | 0.044 | 0.98 |
| | GLN-TYR | −0.141 | 0.99 | 0.024 | 0.95 |
| Anion-π | ASP-TRP | −0.211 | 1 | 0.049 | 1 |
| Aromatic-containing | TRP-SER | −0.294 | 1 | 0.104 | 1 |
| | PHE-CYS | −0.232 | 1 | 0.062 | 1 |
| | TRP-ALA | −0.206 | 1 | 0.048 | 1 |
| | TRP-PRO | −0.205 | 1 | 0.045 | 1 |
| | TYR-SER | −0.129 | 1 | 0.021 | 1 |
| | TRP-LEU | −0.192 | 1 | 0.037 | 1 |
| | TRP-GLY | −0.153 | 0.99 | 0.033 | 0.98 |
| | TRP-CYS | −0.267 | 0.99 | 0.076 | 0.97 |
| | TYR-GLY | −0.109 | 0.98 | 0.021 | 0.97 |
| | TRP-ILE | −0.114 | 0.97 | 0.024 | 0.98 |
| His-containing | HIS-ALA | −0.108 | 1 | 0.016 | 0.98 |
| | HIS-PRO | −0.124 | 0.99 | 0.021 | 0.97 |
| | HIS-LEU | −0.110 | 0.97 | 0.027 | 0.99 |
| Arg-containing | ARG-SER | −0.152 | 1 | 0.025 | 1 |
| | ARG-ARG | −0.184 | 0.99 | 0.036 | 0.99 |
| | ARG-PRO | −0.128 | 0.99 | 0.030 | 0.99 |
| | ARG-LEU | −0.084 | 0.99 | 0.008 | 0.96 |
| | ARG-CYS | −0.230 | 0.98 | 0.062 | 0.98 |
| | ARG-GLN | −0.166 | 1 | 0.033 | 1 |
| | ARG-ASN | −0.120 | 0.99 | 0.023 | 1 |
| Asn/Gln-containing | ASN-GLN | −0.158 | 0.99 | 0.032 | 0.99 |
| | GLN-CYS | −0.152 | 0.95 | 0.051 | 1 |
| Miscellaneous | LEU-CYS | −0.195 | 1 | 0.050 | 1 |
| | LEU-SER | −0.074 | 0.97 | 0.010 | 0.97 |
| | SER-SER | −0.109 | 0.96 | 0.019 | 0.95 |
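
The Sig values in the table above compare each observed quantity with the same quantity recomputed on randomly shuffled datasets. One plausible reading, sketched below in Python, is a label-permutation test: the soluble and aggregation-prone labels are shuffled across proteins, the statistic is recomputed, and the significance is the fraction of shuffles whose absolute value stays below the observed one. The shuffling scheme, the function names `shuffle_significance` and `mean_difference`, the label strings `"sol"` and `"agg"`, and the number of shuffles are assumptions for illustration, not the authors' documented procedure.

```python
import random


def mean_difference(labels, values):
    """Mean of the values labelled 'agg' minus mean of those labelled 'sol'."""
    agg = [v for lab, v in zip(labels, values) if lab == "agg"]
    sol = [v for lab, v in zip(labels, values) if lab == "sol"]
    return sum(agg) / len(agg) - sum(sol) / len(sol)


def shuffle_significance(labels, values, statistic, n_shuffles=1000, seed=0):
    """Fraction of label-shuffled datasets with a smaller |statistic|.

    A returned value above 0.95 corresponds to the strict criterion
    quoted in the table caption.
    """
    rng = random.Random(seed)
    observed = abs(statistic(labels, values))
    shuffled = list(labels)  # copy, so the caller's labels are untouched
    below = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # keeps the class sizes, breaks the pairing
        if abs(statistic(shuffled, values)) < observed:
            below += 1
    return below / n_shuffles
```

Because shuffling preserves the number of proteins in each class, the statistic is always defined on the shuffled data.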
Solubilizing residue-residue interactions that meet the strict significance criteria: both tabulated quantities must be higher than 95% of the equivalent quantities computed from randomly shuffled datasets (the two Sig columns).
| Interactions | Residue pairs | | Sig | | Sig |
|---|---|---|---|---|---|
| Lys-salt bridges | LYS-GLU | 0.115 | 1 | 0.017 | 1 |
| | LYS-ASP | 0.105 | 0.97 | 0.013 | 0.96 |
| Aliphatic-aliphatic | VAL-VAL | 0.156 | 1 | 0.025 | 1 |
| | ILE-ILE | 0.125 | 1 | 0.018 | 1 |
| | VAL-ILE | 0.096 | 1 | 0.010 | 1 |
| | GLY-VAL | 0.114 | 1 | 0.015 | 1 |
| | ILE-ALA | 0.072 | 1 | 0.006 | 0.97 |
| | LEU-ILE | 0.064 | 0.99 | 0.007 | 1 |
| | LEU-VAL | 0.058 | 0.99 | 0.004 | 0.96 |
| | GLY-GLY | 0.113 | 0.98 | 0.014 | 0.96 |
| Aliphatic-containing | ILE-LYS | 0.134 | 1 | 0.026 | 1 |
| | VAL-GLU | 0.120 | 1 | 0.017 | 1 |
| | VAL-THR | 0.086 | 1 | 0.010 | 0.99 |
| | GLY-ASP | 0.114 | 1 | 0.017 | 0.99 |
| | ILE-THR | 0.080 | 1 | 0.008 | 0.97 |
| | GLY-THR | 0.093 | 0.99 | 0.015 | 0.99 |
| | GLY-GLU | 0.105 | 0.99 | 0.012 | 0.96 |
| | ILE-GLU | 0.089 | 0.99 | 0.011 | 0.95 |
| | ALA-LYS | 0.095 | 0.98 | 0.013 | 0.97 |
| | VAL-PRO | 0.068 | 0.95 | 0.008 | 0.98 |
| | VAL-LYS | 0.097 | 0.95 | 0.014 | 0.98 |
| Miscellaneous | GLU-THR | 0.153 | 1 | 0.032 | 1 |
Figure 2. Residue-residue group potentials derived from datasets of soluble, aggregation-prone and all proteins. The energies are in kcal/mol, the distance d is computed between the residue side chain centroids of the smallest amino acids in the group, and the residue pairs are separated by at least 8 residues along the chain. Distance bins containing twenty occurrences or fewer are not drawn.
Correlation between experimental solubility, folding free energies and sequence-derived features.
| | Solubility | Length | Isoelectric point | Aliphatic Index |
|---|---|---|---|---|
| Solubility | — | −0.31 | −0.18 | 0.11 |
| | −0.33 | −0.11 | 0.37 | |
| | 0.20 | −0.65 | 0.12 | −0.30 |
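
For readers who want to reproduce the sequence-derived side of such a comparison, the sketch below computes the aliphatic index with the standard Ikai formula (mole percent of Ala, plus 2.9 times that of Val, plus 3.9 times that of Ile and Leu) and a plain Pearson correlation coefficient. Both choices are assumptions: the text shown here does not state which definition of the aliphatic index or which correlation coefficient was used, and the function names are hypothetical.

```python
import math


def aliphatic_index(sequence):
    """Ikai aliphatic index of a one-letter amino acid sequence."""
    n = len(sequence)

    def mole_percent(residue):
        return 100.0 * sequence.count(residue) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Given a list of sequences and a matching list of measured solubilities, `pearson([aliphatic_index(s) for s in sequences], solubilities)` would give the kind of entry found in the Aliphatic Index column.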