Kristoffer Illergård, Simone Callegari, Arne Elofsson.
Abstract
BACKGROUND: In water-soluble proteins it is energetically favorable to bury hydrophobic residues and to expose polar and charged residues. In contrast to water-soluble proteins, transmembrane proteins face three distinct environments: a hydrophobic lipid environment inside the membrane, a hydrophilic water environment outside the membrane, and an interface region rich in phospholipid head-groups. Therefore, it is energetically favorable for transmembrane proteins to expose different types of residues in the different regions.
Year: 2010 PMID: 20565847 PMCID: PMC2904353 DOI: 10.1186/1471-2105-11-333
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Amino acid distribution and solvent accessibility. The incidence of three different classes of amino acids as a function of distance from the membrane center. A) The subset of residues with accessibility lower than or equal to 25% (buried). B) The subset of residues with accessibility higher than 25% (exposed). The coloring differentiates between polar, intermediate and hydrophobic residues. The two lower panels show the log of the difference in substitution rates between the globular and membrane regions plotted against the biological hydrophobicity value for each amino acid type. C) shows the rate for buried sites and D) the rate for exposed sites.
Figure 2. Relative substitution rate and solvent accessibility. A) The relative substitution rate as a function of Z-coordinate. Evolutionarily conserved sites have low values and variable sites have high values, independently of their distance from the membrane center. Sites that are solvent accessible and inaccessible are colored differently. B) The relative substitution rate as a function of relative accessibility for all residues shows a linear relationship.
Benchmarking accessibility predictors
| Predictor | Mem: All | Mem: Z:0-10 | Mem: Z:10-22 | Mem: Z:>22 | Mem: Membr. | Mem: non-Membr. | W-S: All |
|---|---|---|---|---|---|---|---|
| MPRAP | 0.45 | 0.47 | 0.40 | 0.49 | 0.45 | 0.46 | 0.55 |
| ACCPRO | 0.41 | 0.19 | 0.34 | 0.53 | 0.20 | 0.47 | 0.63 |
| SABLE | 0.29 | 0.08 | 0.24 | 0.46 | 0.11 | 0.38 | 0.55 |
| TMX | 0.21 | 0.35 | 0.19 | 0.13 | 0.32 | 0.14 | - |
| ACCPRO+TMX | 0.39 | 0.33 | 0.30 | 0.53 | 0.31 | 0.47 | - |
| | 0.12 | 0.05 | 0.14 | 0.11 | 0.08 | 0.13 | - |
| | 0.40 | 0.26 | 0.41 | 0.47 | 0.27 | 0.44 | - |
| | 0.32 | 0.11 | 0.25 | 0.46 | 0.11 | 0.36 | - |
A comparison of the performance of different accessibility predictors using two-state predictions in water-soluble (W-S) proteins and membrane (Mem) proteins. The reported values are the Matthews correlation coefficients for identifying buried residues in a binary alphabet. Analysis was performed on the entire protein (All) or on regions of the membrane protein, divided either by Z-coordinate or by the membrane definitions from OPM. Due to computational limitations only three predictors were applied to the water-soluble dataset.
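The two-state evaluation described in this caption can be illustrated with a minimal sketch. It assumes that residues are labeled buried when their relative accessibility is at most 25% (the cutoff stated in the Figure 1 caption) and that the Matthews correlation coefficient is computed in its standard confusion-matrix form. The function names `binarize` and `mcc` are illustrative only and are not taken from the paper.

```python
import math

def binarize(rel_acc, cutoff=25.0):
    """Label a residue buried (True) if its relative accessibility (%) is <= cutoff."""
    return [a <= cutoff for a in rel_acc]

def mcc(true_buried, pred_buried):
    """Standard Matthews correlation coefficient, with 'buried' as the positive class."""
    pairs = list(zip(true_buried, pred_buried))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when a row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy data (invented for illustration, not from the benchmark).
observed = [10, 30, 5, 80, 25, 60]    # structure-derived relative accessibility (%)
predicted = [12, 20, 40, 90, 10, 70]  # predicted relative accessibility (%)
score = mcc(binarize(observed), binarize(predicted))
```

Restricting the input lists to residues in one Z-coordinate range before calling `mcc` would give a per-region value of the kind tabulated above.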
Figure 3. Predicting buried residues. Performance for predicting buried residues at different distances from the membrane center. A) A predictor for the membrane region, TMX, and a predictor for soluble proteins, ACCPRO. B) The novel predictor MPRAP is compared to the combination of TMX and ACCPRO.
Input parameters
| Parameters | Specificity | Sensitivity | Accuracy | MCC |
|---|---|---|---|---|
| AA | 0.57 | 0.89 | 0.59 | 0.19 |
| R4S | 0.70 | 0.71 | 0.68 | 0.37 |
| Zpred | 0.60 | 0.70 | 0.60 | 0.19 |
| Zcoord | 0.56 | 0.75 | 0.56 | 0.12 |
| | 0.72 | 0.70 | 0.69 | 0.39 |
| R4S + Zpred | 0.70 | 0.77 | 0.70 | 0.40 |
| AA + Zpred | 0.63 | 0.70 | 0.62 | 0.24 |
| AA + R4S | 0.70 | 0.78 | 0.71 | 0.41 |
| AA + R4S + Zpred | 0.71 | 0.77 | 0.71 | 0.43 |
| | 0.72 | 0.70 | 0.70 | 0.40 |
| | 0.73 | 0.75 | 0.72 | 0.43 |
| | 0.73 | 0.73 | 0.71 | 0.43 |
| | 0.74 | 0.74 | 0.73 | 0.44 |
| | 0.72 | 0.74 | 0.71 | 0.41 |
| | 0.73 | 0.74 | 0.72 | 0.44 |
| | 0.74 | 0.74 | 0.73 | 0.44 |
| | 0.74 | 0.74 | 0.73 | 0.44 |
| | 0.78 | 0.82 | 0.74 | 0.45 |
| | 0.76 | 0.78 | 0.74 | 0.46 |
Different combinations of input parameters used to train a Support Vector Machine to predict surface accessibility in a two-state alphabet. For each predictor the specificity, sensitivity, accuracy and Matthews correlation coefficient for predicting buried residues in a binary alphabet are reported. The first five lines contain the prediction results using a single type of information, where AA is the amino acid encoded using sparse encoding, R4S is the substitution rate calculated from rate4site scores, Zpred is the predicted distance from the membrane center, Zcoord is the real (not predicted) distance from the membrane center and PSI is the PSI-BLAST PSSM. The next group of predictors was obtained using combinations of these inputs. The next two lines contain the results for two predictors using the optimal combination of inputs but kernels other than the radial-basis kernel. The last line is the performance of the final version of MPRAP, i.e. the one trained to predict absolute accessibility.
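As an illustration of what a combined input such as AA + R4S + Zpred could look like, the sketch below builds one feature vector per residue. It assumes that "sparse encoding" means a 20-dimensional one-hot vector. The alphabet ordering, the absence of a sequence window, the lack of feature scaling and the names `sparse_encode` and `build_features` are all assumptions made for this example; the caption does not specify the actual input layout.

```python
# Assumed ordering of the 20 standard amino acids; any fixed ordering would do.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def sparse_encode(residue):
    """One-hot encode a residue; unknown residues map to the all-zero vector."""
    vec = [0.0] * len(AMINO_ACIDS)
    idx = AMINO_ACIDS.find(residue)
    if idx >= 0:
        vec[idx] = 1.0
    return vec

def build_features(residue, r4s, zpred):
    """Concatenate the sparse AA encoding with a rate4site score and a predicted Z value."""
    return sparse_encode(residue) + [float(r4s), float(zpred)]

# Toy values (invented): a leucine with a substitution rate of 0.8 predicted at Z = 4.5.
features = build_features("L", 0.8, 4.5)
```

Vectors of this form could then be passed to any Support Vector Machine implementation for training.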
Performance of predictors using absolute numbers
| Predictor | Cc | MCC | MAE |
|---|---|---|---|
| MPRAP | 0.58 | 0.45 | 18.4 |
| SABLE | 0.40 | 0.29 | 21.9 |
| | 0.18 | 0.12 | 24.3 |
| | 0.52 | 0.40 | 19.8 |
| | 0.41 | 0.32 | 21.6 |
Performance of the final version of MPRAP and other predictors that predict relative surface area.
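The real-valued measures in this table can be sketched as follows, under the assumption that Cc denotes the Pearson correlation coefficient and MAE the mean absolute error between predicted and observed relative accessibility. The table does not expand these abbreviations, so both readings, along with the function names, are assumptions of this example.

```python
import math

def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    # Convention: return 0.0 if either list is constant.
    return cov / denom if denom else 0.0

def mean_absolute_error(x, y):
    """Average absolute difference between paired values."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

# Toy data (invented): observed and predicted relative accessibility in percent.
observed = [10.0, 20.0, 30.0]
predicted = [20.0, 40.0, 60.0]
cc = pearson_cc(observed, predicted)
mae = mean_absolute_error(observed, predicted)
```

The toy data show why both measures are reported: a prediction can correlate perfectly with the observed values while still being far off in absolute terms.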
Assessing the quality of protein structures
| PDB | Protein | Resolution (Å) | Accuracy | MCC |
|---|---|---|---|---|
| | EmrE | 3.8 | 0.51 | -0.09 |
| | EmrE | 3.7 | 0.48 | -0.19 |
| | MsbA | 4.5 | 0.55 | -0.06 |
| | MsbA | 3.8 | 0.55 | 0.10 |
| | MsbA | 4.2 | 0.63 | 0.26 |
| | MsbA | 3.2 | 0.81 | 0.64 |
| | MsbA | 3.0 | 0.85 | 0.69 |
Agreement between predicted and structurally derived accessibility on six PDB structures. The four structures at the top are published structures that have been removed from the database due to discovered anomalies. The two structures at the bottom are recent structures of proteins from the same protein families that are present in PDB.
Identification of interface residues
| Residue class | Predicted B | Predicted E |
|---|---|---|
| Buried | 74% | 26% |
| Interface | 51% | 49% |
| Exposed | 21% | 79% |
Fraction of residues predicted by MPRAP to be buried (<25% accessibility), B, or exposed, E, among the buried, interface and exposed residues.
Figure 4. Identification of interface residues. Identification of interface residues among residues exposed in a single protein chain. For all these residues MPRAP was used to predict their accessibility. At a given MPRAP cutoff, the fraction of all interface residues predicted to have accessibility less than the cutoff is plotted against the fraction of non-interface residues above this cutoff.
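The curve described in this caption can be sketched as a simple cutoff sweep. The example assumes a list of predicted accessibilities and a parallel list of interface labels for residues that are exposed in the single chain. The function name `interface_curve`, the handling of values exactly at the cutoff (counted as "above") and the toy data are assumptions of this sketch, not details from the paper.

```python
def interface_curve(pred_acc, is_interface, cutoffs):
    """For each cutoff, return (cutoff, fraction of interface residues predicted
    below it, fraction of non-interface residues predicted at or above it)."""
    n_int = sum(1 for i in is_interface if i)
    n_non = len(is_interface) - n_int
    points = []
    for c in cutoffs:
        int_below = sum(1 for a, i in zip(pred_acc, is_interface) if i and a < c)
        non_above = sum(1 for a, i in zip(pred_acc, is_interface) if not i and a >= c)
        frac_int = int_below / n_int if n_int else 0.0
        frac_non = non_above / n_non if n_non else 0.0
        points.append((c, frac_int, frac_non))
    return points

# Toy data (invented): predicted accessibility (%) and interface labels.
pred = [10.0, 20.0, 30.0, 40.0]
iface = [True, True, False, False]
curve = interface_curve(pred, iface, [15.0, 25.0, 35.0])
```

Each tuple gives one point on the plot; a cutoff that separates the two groups well has both fractions close to 1.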