Hua Zhang1, Tao Jiang2, Guogen Shan3, Shiqi Xu4, Yujie Song4.
Abstract
The Gaussian network model (GNM), regarded as the simplest and most representative coarse-grained model, has been widely adopted to analyze protein dynamics and function. One way to improve residue flexibility modeling is to design a variant of the classical GNM by defining a new Kirchhoff matrix. We incorporated the local relative solvent accessibility (RSA) of residue pairs into the Kirchhoff matrix of the parameter-free GNM (pfGNM). The undetermined parameters in the new Kirchhoff matrix were estimated using particle swarm optimization (PSO). The use of RSA was motivated by our previous work, in which an RSA-based linear regression model predicted residue flexibility with higher quality than both the classical GNM and the parameter-free GNM. Computational experiments on one training dataset, two independent datasets and one additional small set derived from molecular dynamics simulations showed that the average correlation coefficients of the proposed RSA-based parameter-free GNM, called RpfGNM, were significantly higher than those of the parameter-free GNM. Our empirical results indicate that extending the classical GNM with other protein structural properties is an attractive way to improve the quality of flexibility modeling.
Year: 2017 PMID: 28790346 PMCID: PMC5548781 DOI: 10.1038/s41598-017-07677-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
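The abstract contrasts the classical GNM with the parameter-free GNM through their Kirchhoff matrices. The following is a minimal sketch of the two baseline models only, under the standard formulations: the classical GNM connects residue pairs within a distance cutoff with weight 1, the parameter-free GNM weights every pair by the inverse square of the distance, and predicted fluctuations are proportional to the diagonal of the pseudo-inverse of the Kirchhoff matrix. The RSA-dependent terms of RpfGNM are not given in this record and are not reproduced here. The function names, the tolerance `tol`, and the straight-line test chain are hypothetical and chosen for illustration.

```python
import numpy as np

def pairwise_distances(coords):
    """Euclidean distances between all pairs of C-alpha coordinates (N x 3)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def kirchhoff_gnm(coords, cutoff=8.0):
    """Classical GNM: off-diagonal entry is -1 for pairs within the cutoff (in Å)."""
    dist = pairwise_distances(coords)
    gamma = -(dist <= cutoff).astype(float)
    np.fill_diagonal(gamma, 0.0)
    # Diagonal entries make every row sum to zero.
    np.fill_diagonal(gamma, -gamma.sum(axis=1))
    return gamma

def kirchhoff_pfgnm(coords):
    """Parameter-free GNM: off-diagonal entry is -1/d^2 for every residue pair."""
    dist = pairwise_distances(coords)
    np.fill_diagonal(dist, np.inf)  # makes 1/d^2 vanish on the diagonal
    gamma = -1.0 / dist**2
    np.fill_diagonal(gamma, -gamma.sum(axis=1))
    return gamma

def predicted_fluctuations(gamma, tol=1e-8):
    """Diagonal of the pseudo-inverse, proportional to the predicted B-factors."""
    vals, vecs = np.linalg.eigh(gamma)
    keep = vals > tol * vals.max()  # drop the zero mode(s)
    inv = (vecs[:, keep] / vals[keep]) @ vecs[:, keep].T
    return np.diag(inv)

# Hypothetical input: ten residues on a straight line, 3.8 Å apart.
coords = np.column_stack([3.8 * np.arange(10), np.zeros(10), np.zeros(10)])
b_gnm = predicted_fluctuations(kirchhoff_gnm(coords, cutoff=8.0))
b_pfgnm = predicted_fluctuations(kirchhoff_pfgnm(coords))
```

In both models the chain termini of this toy structure come out more flexible than the central residues, which is the qualitative behavior the B-factor comparisons in the tables below rely on.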
Figure 1. ACC values as a function of the iteration number during the PSO-based parameter estimation. Panels (A), (B) and (C) show the cases with sliding window sizes of 1, 3 and 5, respectively.
The average correlation coefficients (ACCs) between the actual B’-factors and the predicted B’-factors computed by the GNM, pfGNM, RpfGNM, CN, WCN, RWCN, RSA and DsspRSA9 methods.
| Method type | Method | PDB365 | PDB607 | PDB3225 |
|---|---|---|---|---|
| GNM-type method | GNM | 0.536(±0.2035) | 0.568(±0.1613) | 0.581(±0.1702) |
| | pfGNM | 0.596(±0.1698) | 0.621(±0.1439) | 0.633(±0.1400) |
| | RpfGNM | | | |
| CN-type method | CN | 0.489(±0.1283) | 0.485(±0.0942) | 0.506(±0.1029) |
| | WCN | 0.586(±0.1408) | 0.609(±0.1271) | 0.616(±0.1196) |
| | RWCN | 0.607(±0.1299) | 0.626(±0.1176) | 0.631(±0.1112) |
| RSA-type method | RSA | 0.522(±0.10133) | 0.524(±0.0898) | 0.523(±0.0896) |
| | DsspRSA9 | | | |
Note: The computations were based on three datasets PDB365, PDB607 and PDB3225. The values in parentheses represent the standard deviations of ACC values.
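The ACC reported in these tables can be sketched as follows, assuming that the per-chain score is the Pearson correlation coefficient between actual and predicted profiles, that B'-factors are z-score normalized B-factors, and that the value in parentheses is the sample standard deviation over chains. None of these details is stated in this record, so the helper names and choices below are illustrative. Note that the Pearson correlation is unchanged by z-score normalization, so the same value results from raw B-factors.

```python
from math import sqrt
from statistics import mean, stdev

def zscore(values):
    """Normalize a B-factor profile to zero mean and unit (sample) standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def pearson(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def average_correlation(chains):
    """Mean and standard deviation of per-chain correlations.

    `chains` is a list of (actual, predicted) profile pairs, one pair per chain.
    """
    ccs = [pearson(actual, predicted) for actual, predicted in chains]
    return mean(ccs), stdev(ccs)

# Hypothetical two-chain dataset: one perfectly correlated, one anti-correlated.
toy_chains = [([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
              ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])]
acc_mean, acc_std = average_correlation(toy_chains)
```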
Figure 2. Comparison of the ACC values at the chain level between the pfGNM and the proposed RpfGNM based on blind tests on the PDB607 (panel A) and PDB3225 (panel B) datasets.
The ACC values of GNM, pfGNM and RpfGNM calculated on subsets of the PDB3225 dataset according to varying sequence lengths with step size of 100.
| Range of length (L) | No. of proteins | GNM | pfGNM | RpfGNM | Mean ratio of residues with zero RSA values | Mean ratio of residues with RSA value <=25% |
|---|---|---|---|---|---|---|
| L < 100 | 338 | 0.574(±0.2186) | 0.628(±0.1857) | | 8.18% | 42.21% |
| 100 <= L < 200 | 1010 | 0.574(±0.1828) | 0.630(±0.1461) | | 10.83% | 49.47% |
| 200 <= L < 300 | 819 | 0.580(±0.1573) | 0.631(±0.1264) | | 14.00% | 55.87% |
| 300 <= L < 400 | 592 | 0.582(±0.1466) | 0.631(±0.1299) | | 15.45% | 59.22% |
| 400 <= L < 500 | 261 | 0.588(±0.1545) | 0.643(±0.1210) | | 15.44% | 60.46% |
| 500 <= L < 600 | 109 | 0.615(±0.1407) | 0.666(±0.1234) | | 15.88% | 62.72% |
| L >= 600 | 96 | 0.612(±0.1393) | 0.653(±0.1138) | | 15.96% | 63.29% |
Note: The values in parentheses represent the standard deviations of ACC values over the corresponding protein subset. The mean ratios of buried residues based on RSA cutoffs of zero and 25% are also included.
Mean B’-factor values for the six tripeptide exposure patterns with RSA cutoff of 25% based on the PDB3225 dataset, where the corresponding standard deviations are also included in parentheses.
| Exposure of the central residue | Tripeptide exposure pattern | No. of residues | Mean B’-factor |
|---|---|---|---|
| Buried | bbb | 218956 | −0.621(±0.4589) |
| | bbe/ebb | 178245 | −0.239(±0.6553) |
| | ebe | 75006 | 0.119(±0.8629) |
| Exposed | beb | 74334 | −0.042(±0.7297) |
| | bee/eeb | 178292 | 0.379(±0.9878) |
| | eee | 97142 | 0.902(±1.3104) |
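The grouping in this table can be sketched as below. Each residue is labeled buried (`b`) or exposed (`e`) by its RSA, and the label of the central residue is read together with its two sequence neighbors. The sketch assumes that RSA is given as a fraction, that a residue with RSA at or below the 25% cutoff counts as buried, and that chain termini (which lack one neighbor) are skipped. The handling of termini and all function names are assumptions, not details from this record.

```python
from collections import defaultdict
from statistics import mean

# Mirror-image patterns are merged, as in the table.
CANONICAL = {
    "bbb": "bbb", "bbe": "bbe/ebb", "ebb": "bbe/ebb", "ebe": "ebe",
    "beb": "beb", "bee": "bee/eeb", "eeb": "bee/eeb", "eee": "eee",
}

def exposure_states(rsa, cutoff=0.25):
    """Label each residue 'b' (RSA <= cutoff) or 'e' (RSA > cutoff)."""
    return "".join("b" if value <= cutoff else "e" for value in rsa)

def tripeptide_patterns(rsa, cutoff=0.25):
    """Exposure pattern for every residue that has both sequence neighbors."""
    states = exposure_states(rsa, cutoff)
    return [CANONICAL[states[i - 1:i + 2]] for i in range(1, len(states) - 1)]

def mean_bfactor_by_pattern(rsa, bfactors, cutoff=0.25):
    """Mean B'-factor of the central residue, grouped by tripeptide pattern."""
    groups = defaultdict(list)
    patterns = tripeptide_patterns(rsa, cutoff)
    # patterns[k] belongs to the residue at index k + 1.
    for pattern, value in zip(patterns, bfactors[1:-1]):
        groups[pattern].append(value)
    return {pattern: mean(values) for pattern, values in groups.items()}

# Hypothetical seven-residue chain.
toy_rsa = [0.10, 0.30, 0.20, 0.10, 0.50, 0.60, 0.70]
toy_bprime = [0.0, 1.0, -0.5, -0.3, 0.2, 0.9, 1.1]
toy_patterns = tripeptide_patterns(toy_rsa)
toy_means = mean_bfactor_by_pattern(toy_rsa, toy_bprime)
```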
Mean values of the actual, pfGNM-predicted and RpfGNM-predicted B’-factors for buried and exposed residues defined using RSA cutoff of 25% based on the PDB3225 dataset, where the corresponding standard deviations are also included in parentheses.
| Exposure of residues | Mean actual B’-factor | Mean pfGNM-predicted B’-factor | Mean RpfGNM-predicted B’-factor |
|---|---|---|---|
| Buried | −0.358(±0.6734) | −0.500(±0.6473) | −0.484(±0.5883) |
| Exposed | 0.476(±1.1527) | 0.664(±0.9995) | 0.643(±1.0695) |
The ACCs between the cross-correlations of residue fluctuations computed by GNM, pfGNM and RpfGNM on the PDB3225 dataset.
| Method | GNM | pfGNM | RpfGNM |
|---|---|---|---|
| GNM | 1 | 0.599(±0.1134)/0.808(±0.1221)a | 0.603(±0.1143)/0.814(±0.1217)a |
| pfGNM | | 1 | 0.995(±0.0036) |
| RpfGNM | | | 1 |
a The left value corresponds to a GNM distance cutoff of 8 Å, and the right value to a GNM distance cutoff of 12 Å. The values in parentheses represent the standard deviations of ACC values over the corresponding dataset.
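For reference, the cross-correlation between the fluctuations of residues i and j in GNM-type models is commonly written in terms of the pseudo-inverse of the Kirchhoff matrix Γ. The expression below is the standard normalized form and is not quoted from this record. Reading each tabulated ACC as the per-chain Pearson correlation between the entries of two such maps, averaged over chains, is an assumption.

$$
C_{ij} = \frac{[\Gamma^{-1}]_{ij}}{\sqrt{[\Gamma^{-1}]_{ii}\,[\Gamma^{-1}]_{jj}}}
$$

With this normalization, C_ii = 1 and every C_ij lies between −1 and 1.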
The average correlation coefficients (ACCs) between the actual B’-factors (or MD-derived B’-factors) and the predicted B’-factors computed by the GNM, pfGNM, RpfGNM, CN, WCN, RWCN, RSA and DsspRSA9 methods based on the MoDEL136 dataset.
| Method type | Method | Actual B’-factor | MD-derived B’-factor |
|---|---|---|---|
| GNM-type method | GNM | 0.568(±0.2110) | 0.657(±0.1446) |
| | pfGNM | 0.611(±0.1785) | 0.664(±0.1291) |
| | RpfGNM | | |
| CN-type method | CN | 0.484(±0.1212) | 0.465(±0.0896) |
| | WCN | 0.587(±0.1432) | 0.575(±0.1043) |
| | RWCN | 0.602(±0.1345) | 0.573(±0.1062) |
| RSA-type method | RSA | 0.481(±0.1134) | 0.466(±0.0839) |
| | DsspRSA9 | 0.617(±0.1506) | 0.600(±0.0945) |
Note: The values in parentheses represent the standard deviations of ACC values over the corresponding dataset.
The CC values for cytochrome c3 between the actual/MD-derived B-factors and the predicted B-factors by GNM, pfGNM and RpfGNM.
| Method | CC against the actual B-factors | CC against the MD-derived B-factors |
|---|---|---|
| GNM | 0.460 | 0.440 |
| pfGNM | 0.484 | 0.676 |
| RpfGNM | 0.510 | 0.712 |
Note: The actual B-factors of cytochrome c3 are extracted from a PDB structure with PDB ID 1AQEA. The distance cutoff used in GNM is 8 Å.
Figure 3. Plots of the actual B’-factor profile (panel A) and the B’-factor profiles predicted with GNM (panel B), pfGNM (panel C), RpfGNM (panel D) and MD simulation (panel E) for cytochrome c3 (PDB: 1AQEA).
Figure 4. The maps of cross-correlations of residue fluctuations for cytochrome c3 (PDB: 1AQEA) computed with (A) GNM, (B) pfGNM, (C) RpfGNM, and (D) MD.