Tracey Bray, Pedro Chan, Salim Bougouffa, Richard Greaves, Andrew J Doig, Jim Warwicker.
Abstract
BACKGROUND: Protein structures are deposited in the Protein Data Bank faster than they can be characterised experimentally, so computational methods to analyse these structures have become increasingly important. Identifying the region of a protein most likely to be involved in function helps to gain information about its potential role. There are many approaches to predicting functional sites, but many are not available via a publicly accessible application.
Year: 2009 PMID: 19922660 PMCID: PMC2783165 DOI: 10.1186/1471-2105-10-379
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Screenshot showing the required user input fields. A user can either enter an existing PDB code and choose whether to use the asymmetric or biological unit structure, or upload their own PDB-style structure file. All fields are compulsory.
Figure 2. Screenshot of an example results output for SitesIdentify. The output for 1j2c (rat heme oxygenase-1) when submitted using the charge-based method and a 10 Å radius. The list of active site residues is truncated for display purposes.
Figure 3. An example of highlighted residues in an alternative predicted site. The biological unit structure for 2af4 (phosphotransacetylase) is a homodimer, and identical active sites are present on both chains. SitesIdentify identifies only one site (in red), but the annotation is transformed onto the other chain in order to identify the other active site (shown in purple).
Figure 4. An example of differential site prediction between asymmetric and biological unit structures. The active site predicted for the asymmetric unit of 1b6t (phosphopantetheine adenylyltransferase) is reasonably close to the bound ligand (part A). The biological unit is formed by a cyclical arrangement of the asymmetric unit, and when SitesIdentify is run on this structure it incorrectly identifies the central void as the enzyme active site (part B).
Functional site prediction tools included in the comparison analysis.
| Application | Method Category | Description | Reference |
|---|---|---|---|
| Uniform charge method | CF | A uniform charge weighting is applied to each Cα atom on the protein and the electrostatic potential (Finite Difference-Poisson-Boltzmann calculation with no dielectric boundary) is sampled at points on a 2 Å grid across the protein volume. The peak potential indicates the position of the predicted active site. | Bate and Warwicker (2004) |
| Conservation method | SC, CF | As for the above method, except that the charge weightings applied across the protein are replaced with conservation weights derived from normalised sequence profile scores reflecting the amino acid diversity, the stereochemical diversity and the gap occurrence. | Greaves and Warwicker (2005) |
| Consurf | SC | Consurf calculates the degree of evolutionary conservation for each residue in a structure and gives each an integer score from 1 to 9, with 9 marking the most conserved residues. A graphical representation of the structure is then coloured according to these residue conservation scores, which allows visual identification of highly conserved patches that are predicted to be functional sites. | Landau et al. (2005) |
| Crescendo | SC | Predicts active sites by identifying clusters of residues that have higher than usual evolutionary restraint. Evolutionary restraint was identified by three measures: 1) whether there was a higher degree of evolutionary conservation than expected at a position, 2) whether environment-specific substitution tables made weak predictions of the amino acid substitution patterns, and 3) whether residues have spatially conserved positions when structures of proteins within the same family are superimposed. | Chelliah et al. (2004) |
| FOD | HP | The active site residues are predicted to be those with the highest hydrophobic deficiency score. This is the difference between the expected hydrophobicity and the observed hydrophobicity value for each residue. The expected hydrophobicity of a residue is determined by the residue's position relative to the theoretically most hydrophobic point in the protein. The observed hydrophobicity is a combination of the hydrophobicity value of that residue and the effect on the residue's position of other sidechains around it. | Brylinski et al. (2007) |
| QSiteFinder | CF | Non-bonded interaction energies are calculated by placing a 3D grid over the whole protein and then evaluating the interaction energy between the protein and a methyl group at each point on the grid. The positions of the probes on the grid that gave the best interaction energies were then spatially clustered to identify groups of close probes. These clusters are then assigned a single interaction energy based on the energies of their member probes. The clusters are then ranked by their representative interaction energy and the highest ranked cluster is predicted as the active site. | Laurie and Jackson (2005) |
| PDBSiteScan | TM | PDBSiteScan takes 3D fragments of a protein structure and compares them to 3D structure fragments of known active sites. The known active site structures are held in a collection called PDBSite that is formed from annotation in the PDB SITE field and also REMARK 800 fields. Results were discounted if they matched annotation held for the test protein. | Ivanisenko et al. (2004) |
| PASS | CF | PASS (Putative Active Site Spheres) is essentially a geometric cleft-finding method. The shape, volume and depth of the cleft determine which clefts are predicted as active site clefts. | Brady and Stouten (2000) |
| Thematics | CP | Thematics identifies ionisable residues with unusually perturbed titration curves. Active sites are predicted where two or more of these ionisable residues form a cluster in 3D space. | Wei et al. (2007) |
The seven tools used in this analysis, with a brief description of each method. Method categories are as follows: CF = cleft-finding, SC = sequence conservation, HP = hydrophobicity, TM = structural template matching, CP = chemical properties.
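The grid-sampling idea behind the uniform charge method can be illustrated with a deliberately simplified sketch. It is not the SitesIdentify implementation: it assumes a plain 1/r sum in place of the Finite Difference-Poisson-Boltzmann calculation, takes the bounding box of the Cα atoms as the "protein volume", and clamps short distances to avoid a singularity. The function name `peak_potential_point` and the `min_dist` parameter are hypothetical.

```python
import math
from itertools import product


def peak_potential_point(ca_coords, spacing=2.0, min_dist=1.0):
    """Return (grid_point, value) where a summed 1/r potential peaks.

    ca_coords: list of (x, y, z) C-alpha positions in angstroms.
    Every atom carries the same unit weight (the "uniform charge").
    ASSUMPTION: a bare 1/r sum stands in for the FDPB calculation.
    """
    xs, ys, zs = zip(*ca_coords)

    def axis(lo, hi):
        # Grid coordinates from lo to hi (inclusive where it fits).
        n = int(math.floor((hi - lo) / spacing)) + 1
        return [lo + i * spacing for i in range(n)]

    best_point, best_value = None, -math.inf
    grid = product(axis(min(xs), max(xs)),
                   axis(min(ys), max(ys)),
                   axis(min(zs), max(zs)))
    for point in grid:
        # ASSUMPTION: distances below min_dist are clamped.
        value = sum(1.0 / max(math.dist(point, atom), min_dist)
                    for atom in ca_coords)
        if value > best_value:
            best_point, best_value = point, value
    return best_point, best_value
```

Because every atom has the same weight, the peak tends to fall where the most atoms lie within a short distance of a grid point. For the conservation method, the unit weight would be replaced by a per-residue conservation weight.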
Prediction accuracies achieved for each functional site prediction method.
| Method | Absolute Recall Rate | Relative Recall Rate | Average Distance between Predicted and Real Centroid (Å) |
|---|---|---|---|
| SitesIdentify | | | |
| Uniform charge method | 47.6% | 63.0% | 11.2 |
| Conservation method | 56.9% | 74.7% | 9.4 |
| Consurf | 58.6% | 78.2% | 8.2 |
| Crescendo | 46.9% | 63.8% | 10.3 |
| FOD | 39.7% | 56.1% | 10.6 |
| QSiteFinder | 40.1% | 53.0% | 13.0 |
| PDBSiteScan | 28.1% | 38.4% | 15.5 |
| PASS | 36.6% | 49.3% | 14.8 |
| Thematics | 35.8% | 48.9% | 13.5 |
The absolute and relative recall rates achieved, along with the average distance between real and predicted active site centroids, for each method.
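The distance column in the table above compares two centroids per protein. A minimal sketch of that measurement follows. It assumes each centroid is the unweighted mean of a set of coordinates, since the record does not state which atoms were used. The function names are hypothetical.

```python
import math


def centroid(coords):
    """Unweighted mean position of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def centroid_distance(predicted_coords, real_coords):
    """Distance in angstroms between predicted and real site centroids."""
    return math.dist(centroid(predicted_coords), centroid(real_coords))
```

Averaging `centroid_distance` over every protein in a test set would give a single figure per method, comparable in form to the last column of the table.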
Figure 5. Comparison of distances between the real centroid and the predicted centroid for each method. For each method, the cumulative percentage of the set with a real-to-predicted active site centroid distance at or below each distance is shown.
Figure 6. Comparison of distances between the real centroid and the predicted centroid for Consurf and SitesIdentify run on monomer structures. For both methods, the cumulative percentage of the set with a real-to-predicted active site centroid distance at or below each distance is shown.
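The cumulative curves described in these two captions can be sketched as follows, assuming one centroid distance per protein and a list of distance cut-offs chosen by the reader. The function name and the example values are illustrative and are not data from the paper.

```python
def cumulative_percentages(distances, thresholds):
    """Percentage of proteins whose centroid distance is <= each threshold."""
    total = len(distances)
    return [
        100.0 * sum(1 for d in distances if d <= t) / total
        for t in thresholds
    ]


# Illustrative values only: four proteins, cut-offs at 5, 10 and 15 angstroms.
example = cumulative_percentages([2.0, 5.0, 9.0, 14.0], [5, 10, 15])
```

A method whose curve rises faster places more of its predicted centroids close to the real site.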