Abstract
MOTIVATION: Within a homologous protein family, proteins may be grouped into subtypes that share specific functions that are not common to the entire family. Often, the amino acids present in a small number of sequence positions determine each protein's particular functional specificity. Knowledge of these specificity determining positions (SDPs) aids in protein function prediction, drug design and experimental analysis. A number of sequence-based computational methods have been introduced for identifying SDPs; however, their further development and evaluation have been hindered by the limited number of known experimentally determined SDPs.
Year: 2008 PMID: 18450811 PMCID: PMC2718669 DOI: 10.1093/bioinformatics/btn214
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Alignment column filter behavior on five example columns
The five example columns contain two specificity groups. The empty set symbol, ∅, indicates that the first two columns do not pass any filters. The strictest filters that the third, fourth and fifth columns pass are (respectively) the low-overlap, one-group-conserved and all-groups-conserved filters.
Enrichment of column amino acid patterns near ligands
| Filter | ≤ 5Å from ligand | > 5Å from ligand | |
|---|---|---|---|
| Low-overlap (ℒ) | 0.066 (106) | 0.012 | |
| One-group-conserved (𝒪) | 0.125 (2196) | 5.577 | |
| All-groups-conserved (𝒜) | 0.034 (669) | 8.814 | |
Each row gives the fraction of positions ≤ 5Å and >5Å from ligands having the given pattern. The raw count of each pattern is given in parentheses. Conserved positions were removed prior to the enrichment analysis, and each position is counted only for the most specific filter it passes. P-values were calculated from the hypergeometric distribution. Positions passing the one-group-conserved and all-groups-conserved filters are significantly enriched near ligands. Significant enrichment is shown in bold.
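The enrichment test mentioned in the caption can be illustrated with a one-sided hypergeometric tail probability. The following is a minimal sketch, not the authors' code: the function name, the argument names and the toy counts are all hypothetical, and the source does not say whether the reported P-values were one-sided or corrected for multiple testing.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, successes, draws, observed):
    """Return P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    Hypothetical mapping onto the table (an assumption, not from the source):
      population = all non-conserved positions considered
      successes  = positions that pass a given filter
      draws      = positions within 5 Angstroms of a ligand
      observed   = positions that pass the filter AND lie near a ligand
    """
    upper = min(successes, draws)
    lower = max(observed, 0)
    total = comb(population, draws)
    # Sum the probability mass of every outcome at least as extreme as observed.
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(lower, upper + 1)
    )
    return tail / total


# Toy numbers for illustration only; they are not taken from the paper.
p_value = hypergeom_enrichment_pvalue(population=10, successes=4, draws=3, observed=2)
print(p_value)
```

A small value means that the pattern occurs near ligands more often than a random draw of the same size would be expected to produce.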
Comparison of amino acid distributions
| | Charged AA | Non-charged Polar AA | Other AA |
|---|---|---|---|
| All positions | 0.24 | 0.24 | 0.52 |
| Catalytic sites | 0.66 | 0.25 | 0.09 |
| Putative SDPs | 0.24 | 0.31 | 0.45 |
Putative SDPs are more likely to be non-charged polar residues than a residue chosen at random. Catalytic sites do not exhibit this bias; instead, they are strongly enriched in charged residues.
Comparison of secondary structure distributions
| | α-helix | β-sheet | Loop |
|---|---|---|---|
| All positions | 0.41 | 0.22 | 0.37 |
| Catalytic sites | 0.28 | 0.22 | 0.50 |
| Putative SDPs | 0.27 | 0.21 | 0.52 |
Putative SDPs are much more likely to be in loop regions than would be expected by chance. Catalytic sites show a similar secondary structure bias.
Average fraction of non-conservative (relative to each partition) amino acid differences between specificity groups by position type
| Amino acid partition | Different between groups (putative SDPs) | Different between groups (all positions) |
|---|---|---|
| Polarity | 0.656 | 0.418 |
| Size | 0.642 | 0.450 |
| Hydrophobicity | 0.376 | 0.279 |
| Charge | 0.369 | 0.274 |
Each row gives the fraction of all amino acid pairs between specificity groups that differ under the given amino acid property partition. All properties are significantly less conserved between specificity groups in putative SDPs than over all positions.
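The per-column quantity described in the caption can be sketched as follows. This is an illustrative sketch under stated assumptions: the charge partition below is a common textbook grouping and may differ from the partition used in the paper, and the names `CHARGE_PARTITION`, `class_of` and `fraction_different` are hypothetical.

```python
from itertools import product

# Assumed charge partition (not taken from the source). Histidine is placed
# in the positive class here, which is itself a judgment call.
CHARGE_PARTITION = {
    "positive": set("KRH"),
    "negative": set("DE"),
    "neutral": set("ACFGILMNPQSTVWY"),
}


def class_of(residue, partition):
    """Return the name of the partition class that contains the residue."""
    for name, members in partition.items():
        if residue in members:
            return name
    raise ValueError(f"residue {residue!r} is not covered by the partition")


def fraction_different(group_a, group_b, partition):
    """Fraction of cross-group residue pairs that fall in different classes.

    group_a and group_b hold the residues observed at one alignment column
    in two specificity groups (gaps are assumed to be removed beforehand).
    """
    pairs = list(product(group_a, group_b))
    if not pairs:
        raise ValueError("both groups must contain at least one residue")
    differing = sum(
        class_of(a, partition) != class_of(b, partition) for a, b in pairs
    )
    return differing / len(pairs)


# Toy column: one group is mostly basic, the other mostly acidic.
print(fraction_different("KKR", "DEA", CHARGE_PARTITION))
```

Averaging this value over a set of columns (for example, putative SDPs versus all positions) would give one figure per partition, comparable in form to the table rows.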
Fig. 1. Box plots for the SDP prediction methods on the putative SDPs in the EC–Pfam dataset, ordered by average minimum. Each box shows the average over all alignments of the five-number summary (the minimum, lower quartile, median, upper quartile and maximum) for a method. Lower averages indicate better performance. The simple GroupSim outperforms the previous methods in this evaluation, and GroupSim+ConsWin improves on it.
Fig. 2. Precision-recall (PR) curves for representative SDP prediction methods on the putative SDPs from the EC–Pfam dataset. The simple GroupSim is competitive with the other methods; SDPpred is the only method that substantially outperforms it. GroupSim+ConsWin outperforms all methods. All methods improve when the conservation window heuristic is applied (see Supplementary Material).
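For readers unfamiliar with PR curves, the points on such a curve can be computed from per-column scores and binary labels as in the sketch below. It assumes that a higher score means a stronger SDP prediction and it breaks ties by sort order; the function name and the toy data are hypothetical, and the paper's exact evaluation procedure may differ.

```python
def precision_recall_points(scores, labels):
    """Return (recall, precision) pairs, one per rank cutoff.

    scores: predicted SDP scores, one per alignment column (higher = more SDP-like).
    labels: 1 if the column is a putative SDP, otherwise 0.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")

    # Rank columns from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    points = []
    true_positives = 0
    for rank, (_, is_sdp) in enumerate(ranked, start=1):
        true_positives += is_sdp
        recall = true_positives / total_positives
        precision = true_positives / rank
        points.append((recall, precision))
    return points


# Toy data for illustration only; not taken from the paper.
for recall, precision in precision_recall_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]):
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

A method whose curve stays closer to the top-right corner ranks true SDPs ahead of other columns more consistently.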