Cristina Marino Buslje, Elin Teppa, Tomas Di Doménico, José María Delfino, Morten Nielsen.
Abstract
Identification of catalytic residues (CR) is essential for the characterization of enzyme function. CR are, in general, conserved and located in the functional site of a protein in order to attain their function. However, many non-catalytic residues are highly conserved and not all CR are conserved throughout a given protein family, making identification of CR a challenging task. Here, we put forward the hypothesis that CR carry a particular signature defined by networks of close proximity residues with high mutual information (MI), and that this signature can be applied to distinguish functional from other non-functional conserved residues. Using a data set of 434 Pfam families included in the catalytic site atlas (CSA) database, we tested this hypothesis and demonstrated that MI can complement amino acid conservation scores to detect CR. The Kullback-Leibler (KL) conservation measurement was shown to significantly outperform both the Shannon entropy and maximal frequency measurements. Residues in the proximity of catalytic sites were shown to be rich in shared MI. A structural proximity MI average score (termed pMI) was demonstrated to be a strong predictor for CR, thus confirming the proposed hypothesis. A structural proximity conservation average score (termed pC) was also calculated and demonstrated to carry distinct information from pMI. A catalytic likeliness score (Cls), combining the KL, pC and pMI measures, was shown to lead to significantly improved prediction accuracy. At a specificity of 0.90, the Cls method was found to have a sensitivity of 0.816. In summary, we demonstrate that networks of residues with high MI provide a distinct signature on CR and propose that such a signature should be present in other classes of functional residues where the requirement to maintain a particular function places limitations on the diversification of the structural environment along the course of evolution.
Year: 2010 PMID: 21079665 PMCID: PMC2973806 DOI: 10.1371/journal.pcbi.1000978
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Average performance in terms of the AUC and AUC01 values of the three methods: Max-Freq, Shannon, and Kullback-Leibler described to measure conservation.
| Conservation measure | Max-Freq AUC | Max-Freq AUC01 | Shannon AUC | Shannon AUC01 | Kullback-Leibler AUC | Kullback-Leibler AUC01 |
| Raw | 0.874 | 0.458 | 0.880 | 0.464 | | 0.485 |
| C | 0.870 | 0.461 | 0.876 | 0.465 | 0.890 | |
| L | 0.857 | 0.380 | 0.852 | 0.371 | 0.877 | 0.437 |
| Cl | 0.847 | 0.353 | 0.837 | 0.335 | 0.868 | 0.411 |
Each measurement is applied under four conditions: sequence weighting using clustering (c), pseudocount correction for low counts (l), the combination of the two (cl), and no correction (raw). The method with the highest performance for each performance measure is highlighted in bold.
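The three conservation measures compared above can be illustrated on a single alignment column. The following is a minimal sketch corresponding to the "raw" condition only (no sequence weighting, no pseudocounts). The function names are hypothetical, and the uniform background distribution used for the Kullback-Leibler score is an assumption made for illustration, since the record does not state which background was used.

```python
import math
from collections import Counter

# Assumption: a uniform background over the 20 standard amino acids.
# The source does not specify the background distribution.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def column_frequencies(column):
    """Unweighted amino acid frequencies of one MSA column; gaps and
    non-standard symbols are ignored."""
    residues = [aa for aa in column if aa in BACKGROUND]
    total = len(residues)
    if total == 0:
        return {}
    return {aa: n / total for aa, n in Counter(residues).items()}


def max_frequency(freqs):
    """Frequency of the most common amino acid (high = conserved)."""
    return max(freqs.values(), default=0.0)


def shannon_entropy(freqs):
    """Shannon entropy in bits (low = conserved)."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def kullback_leibler(freqs, background=BACKGROUND):
    """KL divergence in bits between column and background frequencies
    (high = conserved)."""
    return sum(p * math.log2(p / background[aa])
               for aa, p in freqs.items() if p > 0)


if __name__ == "__main__":
    for col in ("AAAA", "AACC", "A-CD"):
        f = column_frequencies(col)
        print(col, max_frequency(f), shannon_entropy(f), kullback_leibler(f))
```

With a uniform background the KL score is simply log2(20) minus the entropy, so the two measures would rank columns identically; any difference between them in practice depends on a non-uniform background, which this sketch does not model.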
Figure 1. Histogram over predictive performance of the raw KL scores as a function of the number of sequences in the MSA.
The number of Pfam entries in each sequence bin is 9, 9, 36, 66, and 314, respectively.
Figure 2. Identification of catalytic residues using four different prediction scores.
Plotted is the Cα representation of the PDB entry 1D4C, representing the Pfam entry PF00890. Catalytic residues are encircled in green. The four prediction scores shown are A) KL conservation, B) proximity conservation (pC), C) proximity MI (pMI) and D) catalytic likeliness score (Cls). Black circles highlight the predicted false positive residues: 47, 39, 15 and 4 for panels A to D, respectively. The prediction scores are represented on a blue-to-red scale (blue: lowest; red: highest). The molecular graphics image was produced with the UCSF Chimera package (University of California, San Francisco).
Optimal parameters and average predictive performance in terms of AUC and AUC01 for the two combined prediction methods including only one proximity measure.
| Method | KL+pMI | KL+pC |
| Parameters | wMI = 0.8±0.0 | wC = 0.6±0.0 |
| | DMI = 7.9±0.2 | DC = 8.0±0.0 |
| | Zthr = 5.5±0.32 | |
| AUC | 0.922 | 0.910 |
| AUC01 | 0.574 | 0.562 |
KL+pMI is the method combining KL conservation with the pMI mutual information measure. KL+pC is the method combining KL conservation with the pC conservation measure. wMI is the relative weight on pMI, DMI is the proximity distance threshold for the pMI measure, Zthr is the MI Z-score threshold, wC is the relative weight on pC, and DC is the proximity distance threshold for the pC measure. Parameters and standard deviations were identified using five-fold cross validation as described in Materials and Methods.
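To make the roles of the three KL+pMI parameters concrete, here is a minimal sketch under stated assumptions. The record does not give the exact formulas, so both functions are hypothetical: `proximity_mi` assumes that pMI for a residue is the average, over its structural neighbours within the distance threshold, of each neighbour's summed MI Z-scores above Zthr, and `combined_score` assumes a simple linear mix in which the weight applies to pMI and its complement to KL, with both inputs already on comparable scales. The default values are the optimal parameters reported in the table.

```python
import numpy as np


def proximity_mi(mi_z, dist, d_thr=7.9, z_thr=5.5):
    """Hypothetical pMI. mi_z is an (n, n) matrix of MI Z-scores between
    alignment positions; dist is an (n, n) matrix of inter-residue
    distances in the structure."""
    mi_z = np.asarray(mi_z, dtype=float)
    dist = np.asarray(dist, dtype=float)

    # Per-residue cumulative MI: sum of Z-scores above the threshold,
    # excluding the diagonal (a position paired with itself).
    masked = np.where(mi_z > z_thr, mi_z, 0.0)
    np.fill_diagonal(masked, 0.0)
    cumulative = masked.sum(axis=1)

    # Structural neighbours: closer than d_thr, excluding the residue itself.
    neighbours = (dist < d_thr).astype(float)
    np.fill_diagonal(neighbours, 0.0)
    n_neigh = neighbours.sum(axis=1)

    # Average the neighbours' cumulative MI; residues without neighbours get 0.
    total = neighbours @ cumulative
    return np.divide(total, n_neigh, out=np.zeros_like(total), where=n_neigh > 0)


def combined_score(kl, pmi, w_mi=0.8):
    """Assumed linear combination: w_mi weights pMI, (1 - w_mi) weights KL."""
    return (1.0 - w_mi) * np.asarray(kl, dtype=float) + w_mi * np.asarray(pmi, dtype=float)


if __name__ == "__main__":
    dist = [[0, 5, 20], [5, 0, 6], [20, 6, 0]]
    mi_z = [[0, 6, 1], [6, 0, 7], [1, 7, 0]]
    pmi = proximity_mi(mi_z, dist)
    print(pmi)
    print(combined_score([1.0, 2.0, 3.0], pmi))
```

The KL+pC variant would follow the same pattern with the neighbours' conservation scores in place of cumulative MI, a weight of 0.6 and a distance threshold of 8.0, again assuming a linear combination.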
Sensitivity of the catalytic residue identification methods at different specificity thresholds.
| Specificity | KL | pMI | pC | KL+pMI | KL+pC | Cls |
| 0.99 | 0.222 | 0.122 | 0.159 | 0.300 | 0.282 | |
| 0.95 | 0.544 | 0.375 | 0.423 | 0.646 | 0.637 | |
| 0.90 | 0.716 | 0.560 | 0.604 | 0.802 | 0.774 | 0.816 |
| 0.85 | 0.798 | 0.666 | 0.703 | 0.861 | 0.835 | |
KL is the Kullback-Leibler conservation score, pMI is the proximity-averaged mutual information score, pC is the proximity-averaged conservation score, KL+pMI is the combined score of KL and pMI, KL+pC is the combined score of KL and pC, and Cls is the catalytic likeliness score. The sensitivity is determined as an average over the 434 CSA families at the different specificity thresholds. The best-performing method at each specificity level is highlighted in bold.
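As an illustration of how a sensitivity at a fixed specificity can be read off per-residue scores, the sketch below handles a single family; the table values would then correspond to the mean of such per-family results. The function name and the tie-handling convention (a residue is predicted catalytic only if its score is strictly above the threshold) are assumptions, not details taken from the record.

```python
import math

import numpy as np


def sensitivity_at_specificity(scores, labels, specificity=0.90):
    """Fraction of catalytic residues scoring above a threshold chosen so
    that at least `specificity` of non-catalytic residues score at or
    below it. `labels` is True for catalytic residues."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    negatives = np.sort(scores[~labels])
    positives = scores[labels]
    if negatives.size == 0 or positives.size == 0:
        raise ValueError("need at least one positive and one negative residue")

    # Smallest k such that k / n_negatives >= specificity; the small epsilon
    # guards against floating-point noise in the product.
    k = max(1, math.ceil(specificity * negatives.size - 1e-9))
    threshold = negatives[k - 1]
    return float(np.mean(positives > threshold))


if __name__ == "__main__":
    # Toy family: 10 non-catalytic residues (scores 0..9) and 4 catalytic ones.
    scores = list(range(10)) + [8.5, 9.5, 3.0, 10.0]
    labels = [False] * 10 + [True] * 4
    print(sensitivity_at_specificity(scores, labels, specificity=0.90))
```

In the toy example the threshold lands on the ninth-lowest non-catalytic score, so exactly 90% of the non-catalytic residues are correctly rejected and three of the four catalytic residues are recovered.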