Yi-Fan Liou1, Hui-Ling Huang1,2, Shinn-Ying Ho3,4.
Abstract
BACKGROUND: Most hydrophilic residues are thought to be exposed in proteins, and most hydrophobic residues are thought to be buried. In contrast to the majority of existing studies, which investigate protein folding characteristics using protein structures, this study aimed to design predictors that estimate the relative solvent accessibility (RSA) of amino acid residues in order to discover protein folding characteristics from sequences.
Keywords: Hydrophobic spine; Knowledge discovery; Molecular dynamics simulation; Physicochemical properties; Protein folding; Solvent-accessible surface area; Support vector regression
Year: 2016 PMID: 28155647 PMCID: PMC5259910 DOI: 10.1186/s12859-016-1368-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Relevant studies on real-value RSA prediction
| Reference | Year | Regression method | Features |
|---|---|---|---|
| Ahmad | 2003 | NN | Amino acid proportions |
| Yuan | 2004 | SVR | Amino acid proportions |
| Adamczak | 2004 | NN | PSSM |
| Wang | 2005 | MLR | Amino acid proportions, PSSM, and sequence length |
| Garg | 2005 | NN | PSSM and secondary structure |
| Nguyen | 2006 | Two-stage SVR | PSSM |
| Chang | 2008 | Two-stage SVR | Enhanced PSSM and sequence length |
| Iqbal | 2015 | Basic exact regression | PSSM, PCPs, and disorder probability |
| Fan | 2015 | GBRT | PSSM, secondary structure, and native disorder |
| Zhang | 2015 | SVR | PSSM, PCPs, secondary structure, and disorder probability |
| SVR-RSA | 2016 | SVR | PSSM, PCPs, and sequence length |
Number of PCP features used by the RSA predictor for each amino acid residue, and the MAE of each predictor
| Residue | Feature number | MAE (%)a | Residue | Feature number | MAE (%)a |
|---|---|---|---|---|---|
| A | 30 | 18.93 | L | 22 | 11.74 |
| R | 23 | 18.87 | K | 32 | 16.82 |
| N | 31 | 23.28 | M | 31 | 12.24 |
| D | 21 | 22.71 | F | 29 | 11.95 |
| C | 19 | 8.59 | P | 29 | 19.99 |
| Q | 32 | 19.51 | S | 32 | 23.17 |
| E | 29 | 20.90 | T | 12 | 21.25 |
| G | 25 | 25.19 | W | 10 | 12.17 |
| H | 14 | 18.82 | Y | 30 | 14.20 |
| I | 30 | 10.86 | V | 31 | 12.83 |
a The MAE for 10-CV of the Sma dataset
The feature usage and a performance summary from other studies that used Barton502 as a dataset
| Features | Ours | Chang | Nguyen | Garg | Wang | Yuan | Ahmad |
|---|---|---|---|---|---|---|---|
| PSSM | Yes | Yes | Yes | Yes | Yes | No | No |
| AAindex (PCPs) | Yes | No | No | No | No | No | No |
| sequence length | Yes | Yes | No | No | Yes | No | No |
| amino acid composition | No | No | No | No | No | Yes | Yes |
| secondary structure | No | No | No | Yes | No | No | No |
| regression tool | one-stage SVR | two-stage SVR | two-stage SVR | NN | MLR | one-stage SVR | NN |
| MAE (%) | 14.11 | 14.80 | 15.70 | 15.90 | 16.20 | 18.50 | 18.80 |
| CC | 0.69 | 0.68 | 0.66 | 0.65 | 0.64 | 0.52 | 0.48 |
a MAE and CC are from the original paper
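The two metrics reported in these tables, mean absolute error (MAE) and the Pearson correlation coefficient (CC), can be sketched as follows. This is a minimal illustration using the standard definitions, not the authors' evaluation code; the function names and the sample RSA values are hypothetical.

```python
from math import sqrt


def mean_absolute_error(observed, predicted):
    """Mean absolute difference between observed and predicted RSA values (%)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def pearson_cc(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_o * var_p)


# Hypothetical RSA values (%), for illustration only.
observed_rsa = [10.0, 20.0, 30.0, 40.0]
predicted_rsa = [12.0, 18.0, 33.0, 37.0]

mae = mean_absolute_error(observed_rsa, predicted_rsa)
cc = pearson_cc(observed_rsa, predicted_rsa)
print(f"MAE = {mae:.2f}%, CC = {cc:.3f}")
```

A lower MAE and a higher CC both indicate closer agreement between predicted and observed RSA.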
Performance comparison among real-value RSA predictors
| Amino acid | Ours | Chang | SPINE X | SABLE | RVP-net | SARpred |
|---|---|---|---|---|---|---|
| A | | 13.30 | 12.52 | 46.98 | 18.93 | 16.10 |
| R | 16.81 | 17.00 | | 26.61 | 20.31 | 18.98 |
| N | | 19.60 | 18.63 | 32.00 | 24.70 | 22.05 |
| D | | 19.20 | 18.21 | 28.97 | 23.81 | 21.99 |
| C | 8.87 | 8.90 | | 52.33 | 8.90 | 11.97 |
| Q | | 17.20 | 16.34 | 27.07 | 22.29 | 19.66 |
| E | | 17.80 | 16.73 | 27.11 | 22.28 | 21.74 |
| G | | 19.50 | 18.53 | 35.76 | 24.48 | 21.23 |
| H | 15.87 | 15.10 | | 33.87 | 19.37 | 16.64 |
| I | | 8.70 | 8.51 | 61.34 | 10.56 | 12.47 |
| L | | 9.80 | 9.80 | 57.84 | 12.11 | 13.40 |
| K | 15.77 | 15.80 | | 22.11 | 18.31 | 18.39 |
| M | 11.32 | | 11.46 | 53.58 | 14.22 | 14.25 |
| F | 10.05 | 10.20 | | 55.35 | 11.72 | 13.12 |
| P | 16.69 | 17.40 | | 29.19 | 21.51 | 19.01 |
| S | | 18.30 | 16.78 | 35.19 | 23.05 | 19.78 |
| T | 15.87 | 16.00 | | 35.43 | 21.58 | 17.86 |
| W | 12.17 | | 12.31 | 52.21 | 13.43 | 14.97 |
| Y | | 13.00 | 12.06 | 47.67 | 14.42 | 14.07 |
| V | 9.89 | | 9.65 | 58.67 | 12.43 | 12.00 |
| win | | 3 | 7 | 0 | 0 | 0 |
| CC | | 0.68 | | 0.5 | 0.51 | 0.59 |
| MAE | | 14.8 | 14.89 | 39.22 | 19.45 | 18.07 |
*Bold values indicate the best results
Fig. 1 The exposed hydrophobic/hydrophilic neighbor ratios of exposed hydrophobic residues as a function of the exposure degree of an α-helix
Fig. 2 Illustrations of a hydrophobic spine. (a) The helical wheel of an α-helix. (b) The hydrophobic spine of an α-helix. The green ribbon represents the α-helix. The green sticks are the side chains of the residues constituting the α-helix. Black dots outline the sphere surface of the side chain atoms
Fig. 3 The structures of proteins 1MOF (a), 2WRP (b), 1MOF_I54D (c), and 2WRP_H15I (d). The yellow spheres denote the residues constituting the hydrophobic spine. The red spheres are the side chains of hydrophilic residues that interrupt the hydrophobic spine (resulting in an imperfect hydrophobic spine)
Fig. 4 The secondary structure components (shown in different colors) of proteins 1MOF, 1MOF_I54D, 2WRP, and 2WRP_H15I from 10-ns molecular dynamics simulations at temperatures of 300, 400, and 500 K
Average α-helix contents (%) from DSSP analysis for 1mof, 1mof-I54D, 2wrp, and 2wrp-H16I at different temperatures
| | 300 K | | 400 K | | 500 K | |
|---|---|---|---|---|---|---|
| whole protein | | | | | | |
| 1mof | 51.27 | | 22.46 | | 4.80 | |
| 1mof-I54D | 43.38 | | 21.81 | | 5.42 | |
| 2wrp | 65.44 | | 48.03 | | 4.62 | |
| 2wrp-H16I | 61.51 | | 42.41 | | 13.86 | |
| hydrophobic spine regions | | | | | | |
| 1mof | 30.85 | | 11.61 | | 3.64 | |
| 1mof-I54D | 29.95 | | 10.20 | | 1.61 | |
| 2wrp | 13.71 | | 11.52 | | 1.30 | 0.84 |
| 2wrp-H16I | 13.26 | | 12.29 | | 1.12 | |
Boldface indicates a significant difference after Bonferroni correction
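As a reminder of how a Bonferroni correction works in general, the sketch below tests each p-value against the significance level divided by the number of comparisons. It is a generic illustration and not the authors' analysis; the function name, the default significance level of 0.05, and the example p-values are all hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p-values remain significant after Bonferroni correction.

    Each p-value is compared with alpha / m, where m is the number of tests.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Hypothetical p-values from three comparisons, for illustration only.
example_p_values = [0.001, 0.02, 0.30]
flags = bonferroni_significant(example_p_values)
print(flags)
```

With three comparisons, the per-test threshold is 0.05 / 3, so a raw p-value of 0.02 no longer counts as significant.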