Zoltán Simon, Margit Vigh-Smeller, Agnes Peragovics, Gábor Csukly, Gergely Zahoránszky-Kohalmi, Anna A Rauscher, Balázs Jelinek, Péter Hári, István Bitter, András Málnási-Csizmadia, Pál Czobor.
Abstract
BACKGROUND: Various pattern-based methods exist that use in vitro or in silico affinity profiles for classification and functional examination of proteins. Nevertheless, the connection between the protein affinity profiles and the structural characteristics of the binding sites is still unclear. Our aim was to investigate the association between virtual drug screening results (calculated binding free energy values) and the geometry of protein binding sites. Molecular Affinity Fingerprints (MAFs) were determined for 154 proteins based on their molecular docking energy results for 1,255 FDA-approved drugs. Protein binding site geometries were characterized by 420 PocketPicker descriptors. The basic underlying component structure of MAFs and binding site geometries, respectively, were examined by principal component analysis; association between principal components extracted from these two sets of variables was then investigated by canonical correlation and redundancy analyses.
Year: 2010 PMID: 20923553 PMCID: PMC2972294 DOI: 10.1186/1472-6807-10-32
Source DB: PubMed Journal: BMC Struct Biol ISSN: 1472-6807
List of PDB codes of the applied 154 proteins
| 13gs | 1dug | 1hvr | 1n5u | 1rwx | 1yb5 | 2axm | 2g5r |
| 1a3b | 1e51 | 1ig3 | 1nhz | 1s1d | 1ytv | 2axn | 2g72 |
| 1aj0 | 1ewf | 1j3j | 1nrg | 1s2c | 1z57 | 2az5 | 2gwh |
| 1aj6 | 1exa | 1j8u | 1of1 | 1s3v | 1zcm | 2b2u | 2h7j |
| 1apy | 1ezf | 1jmo | 1okc | 1sr7 | 1zd3 | 2bat | 2ipx |
| 1aq1 | 1f0x | 1k0e | 1opb | 1sz7 | 1zid | 2bka | 2iwz |
| 1auk | 1f5f | 1kfy | 1oq5 | 1t46 | 1zsq | 2bm2 | 2jis |
| 1b2y | 1fcy | 1ki0 | 1oth | 1t65 | 1zsx | 2bxs | 2oaz |
| 1b3d | 1fj4 | 1kpg | 1p0p | 1uae | 1zx0 | 2c67 | 2ozu |
| 1bj4 | 1fkd | 1ksp | 1p60 | 1uhl | 1zxm | 2cbz | 2p0a |
| 1bj5 | 1g3m | 1kvo | 1ph0 | 1uze | 1zy7 | 2cca | 2p54 |
| 1blc | 1g9v | 1l7z | 1qh5 | 1v97 | 2a1h | 2cjz | 2pk4 |
| 1bwc | 1gkc | 1lo6 | 1qkm | 1w6k | 2a3i | 2cmd | 3fap |
| 1bzm | 1hck | 1lpb | 1qon | 1x9d | 2a5d | 2cmw | 3nos |
| 1c5o | 1hcn | 1lpg | 1r1h | 1x9n | 2aax | 2d0t | |
| 1cjf | 1hrn | 1lxi | 1r5l | 1xap | 2aeb | 2f4j | |
| 1cjy | 1hso | 1mf8 | 1r9o | 1xkk | 2afw | 2f6q | |
| 1d3g | 1hsz | 1mp8 | 1rbp | 1xpc | 2ag4 | 2fbr | |
| 1dfv | 1ht0 | 1mzs | 1ro9 | 1xzx | 2aid | 2fvv | |
| 1dkf | 1hur | 1n52 | 1rsz | 1y6a | 2avd | 2fy3 | |
Explained variances of PCA Factors obtained from the MAF Matrix
| Factor | Explained variance | Cumulative variance |
|---|---|---|
| 1 | 0.1816 | 0.1816 |
| 2 | 0.0768 | 0.2584 |
| 3 | 0.0574 | 0.3158 |
| 4 | 0.0382 | 0.3539 |
| 5 | 0.0322 | 0.3861 |
| 6 | 0.0309 | 0.4171 |
| 7 | 0.0247 | 0.4417 |
| 8 | 0.0236 | 0.4653 |
| 9 | 0.0197 | 0.4850 |
| 10 | 0.0181 | 0.5032 |
| 11 | 0.0169 | 0.5200 |
| 12 | 0.0164 | 0.5364 |
| 13 | 0.0147 | 0.5511 |
| 14 | 0.0139 | 0.5650 |
| 15 | 0.0127 | 0.5777 |
| 16 | 0.0123 | 0.5900 |
| 17 | 0.0118 | 0.6018 |
| 18 | 0.0113 | 0.6131 |
| 19 | 0.0107 | 0.6239 |
| 20 | 0.0105 | 0.6344 |
| 21 | 0.0100 | 0.6443 |
| 22 | 0.0089 | 0.6533 |
| 23 | 0.0087 | 0.6619 |
| 24 | 0.0082 | 0.6702 |
| 25 | 0.0080 | 0.6781 |
| 26 | 0.0078 | 0.6860 |
| 27 | 0.0073 | 0.6933 |
| 28 | 0.0070 | 0.7003 |
| 29 | 0.0069 | 0.7072 |
| 30 | 0.0068 | 0.7139 |
| 31 | 0.0064 | 0.7203 |
| 32 | 0.0063 | 0.7266 |
| 33 | 0.0061 | 0.7327 |
| 34 | 0.0059 | 0.7386 |
| 35 | 0.0058 | 0.7444 |
| 36 | 0.0056 | 0.7500 |
| 37 | 0.0053 | 0.7553 |
| 38 | 0.0052 | 0.7605 |
| 39 | 0.0051 | 0.7656 |
| 40 | 0.0050 | 0.7706 |
The first 40 factors obtained from the factor analysis of the MAF profiles of 154 target proteins are displayed. 30 factors were retained in accordance with the average variance criterion (i.e., explaining individually more than 1/154 = 0.65% of the total variance). They explain cumulatively 71.4% of the total variance.
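The retention rule stated in this note can be checked against the tabulated values. The following is a minimal sketch, assuming the 40 explained-variance proportions listed in the table and the 1/154 threshold given in the text; the function name `retain_by_average_variance` is hypothetical and is not taken from the original analysis.

```python
import numpy as np

# Explained-variance proportions of the first 40 MAF factors (copied from the table).
maf_explained = np.array([
    0.1816, 0.0768, 0.0574, 0.0382, 0.0322, 0.0309, 0.0247, 0.0236, 0.0197, 0.0181,
    0.0169, 0.0164, 0.0147, 0.0139, 0.0127, 0.0123, 0.0118, 0.0113, 0.0107, 0.0105,
    0.0100, 0.0089, 0.0087, 0.0082, 0.0080, 0.0078, 0.0073, 0.0070, 0.0069, 0.0068,
    0.0064, 0.0063, 0.0061, 0.0059, 0.0058, 0.0056, 0.0053, 0.0052, 0.0051, 0.0050,
])

def retain_by_average_variance(explained, n_total):
    """Count factors whose explained variance exceeds the average share 1/n_total.

    Assumes `explained` is sorted in descending order, so the count equals
    the number of leading factors that pass the criterion.
    """
    threshold = 1.0 / n_total
    return int(np.sum(explained > threshold)), threshold

n_retained, threshold = retain_by_average_variance(maf_explained, 154)
cumulative = float(maf_explained[:n_retained].sum())
print(n_retained, round(threshold, 4), round(cumulative, 3))
```

Applied to the tabulated values, the criterion keeps 30 factors, matching the number reported in the note.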
Figure 1. Number of salient loadings across the 30 PCA factors of the MAF matrix. 30 factors were obtained from the matrix of the Molecular Affinity Fingerprints (MAFs) of target proteins by principal component analysis (PCA). The number of salient loadings (i.e., loadings with a value of ≥ 0.4 or ≤ -0.4) varied between 10 and 35 for the individual factors, indicating a simple factor structure since the number of variables in the original MAF matrix was 1,255.
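Counting salient loadings amounts to thresholding the absolute values of a loading matrix, factor by factor. The sketch below illustrates this with the 0.4 cutoff from the caption; the toy matrix and the function name `count_salient_loadings` are illustrative assumptions, not data or code from the study.

```python
import numpy as np

def count_salient_loadings(loadings, cutoff=0.4):
    """Count loadings with |value| >= cutoff in each column (one column per factor)."""
    return (np.abs(loadings) >= cutoff).sum(axis=0)

# Toy loading matrix: 5 variables (rows) x 2 factors (columns); values are made up.
toy_loadings = np.array([
    [0.85, 0.05],
    [-0.40, 0.10],
    [0.39, -0.72],
    [0.02, 0.41],
    [-0.10, -0.55],
])

counts = count_salient_loadings(toy_loadings)
print(counts.tolist())
```

In the study the same count would run over 1,255 drug variables per factor, so 10 to 35 salient loadings mark only a small fraction of the variables for each factor.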
Explained variances of PCA Factors obtained from the PocketPicker descriptor matrix
| Factor | Explained variance | Cumulative variance |
|---|---|---|
| 1 | 0.3847 | 0.3847 |
| 2 | 0.2359 | 0.6206 |
| 3 | 0.0818 | 0.7024 |
| 4 | 0.0544 | 0.7568 |
| 5 | 0.0524 | 0.8091 |
| 6 | 0.0377 | 0.8469 |
| 7 | 0.0257 | 0.8726 |
| 8 | 0.0180 | 0.8906 |
| 9 | 0.0136 | 0.9042 |
| 10 | 0.0120 | 0.9162 |
| 11 | 0.0100 | 0.9262 |
| 12 | 0.0078 | 0.9340 |
| 13 | 0.0074 | 0.9414 |
| 14 | 0.0057 | 0.9471 |
| 15 | 0.0049 | 0.9520 |
| 16 | 0.0041 | 0.9561 |
| 17 | 0.0038 | 0.9599 |
| 18 | 0.0029 | 0.9628 |
| 19 | 0.0029 | 0.9657 |
| 20 | 0.0028 | 0.9685 |
| 21 | 0.0025 | 0.9709 |
| 22 | 0.0022 | 0.9732 |
| 23 | 0.0020 | 0.9752 |
| 24 | 0.0019 | 0.9771 |
| 25 | 0.0017 | 0.9788 |
| 26 | 0.0016 | 0.9804 |
| 27 | 0.0015 | 0.9819 |
| 28 | 0.0012 | 0.9831 |
| 29 | 0.0012 | 0.9843 |
| 30 | 0.0011 | 0.9854 |
| 31 | 0.0009 | 0.9863 |
| 32 | 0.0009 | 0.9872 |
| 33 | 0.0008 | 0.9880 |
| 34 | 0.0007 | 0.9888 |
| 35 | 0.0007 | 0.9895 |
| 36 | 0.0006 | 0.9901 |
| 37 | 0.0006 | 0.9908 |
| 38 | 0.0006 | 0.9913 |
| 39 | 0.0005 | 0.9919 |
| 40 | 0.0005 | 0.9924 |
The first 40 factors obtained from the factor analysis of the geometric features of the binding sites of 154 target proteins are shown. 13 factors were retained in accordance with the average variance criterion (i.e., explaining > 1/154 = 0.65% of the total variance). Cumulatively, they explain 94.1% of the total variance.
Figure 2. Number of salient loadings across the 13 PCA factors of the PocketPicker descriptor matrix. 13 factors were obtained from the matrix of geometric features of the binding sites of target proteins by PCA. The number of salient loadings (i.e., loadings with a value of > 0.4 or ≤ -0.4) varied between 42 and 75 for the individual factors, which reflects a simple factor structure since the original PocketPicker descriptor matrix contained 405 variables.
Figure 3. Superimposed scree plots based on the MAF fingerprints and the PocketPicker descriptors. Cumulative variance explained by the PCA factors for the geometric descriptor matrix based on PocketPicker (red circle) saturates much faster than the cumulative variance for the MAF profiles (black square), suggesting that the MAF matrix has a more complex structure. The first 40 factors of both matrices are plotted.
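The difference in saturation speed can be quantified from the two cumulative-variance columns. The sketch below, which copies the tabulated cumulative values, reports how many leading factors each matrix needs to reach a given share of the total variance; the helper name `factors_needed` and the 50% and 70% targets are illustrative choices, not part of the original analysis.

```python
import numpy as np

# Cumulative explained variance of the first 40 factors (copied from the two tables).
maf_cumulative = np.array([
    0.1816, 0.2584, 0.3158, 0.3539, 0.3861, 0.4171, 0.4417, 0.4653, 0.4850, 0.5032,
    0.5200, 0.5364, 0.5511, 0.5650, 0.5777, 0.5900, 0.6018, 0.6131, 0.6239, 0.6344,
    0.6443, 0.6533, 0.6619, 0.6702, 0.6781, 0.6860, 0.6933, 0.7003, 0.7072, 0.7139,
    0.7203, 0.7266, 0.7327, 0.7386, 0.7444, 0.7500, 0.7553, 0.7605, 0.7656, 0.7706,
])
pocket_cumulative = np.array([
    0.3847, 0.6206, 0.7024, 0.7568, 0.8091, 0.8469, 0.8726, 0.8906, 0.9042, 0.9162,
    0.9262, 0.9340, 0.9414, 0.9471, 0.9520, 0.9561, 0.9599, 0.9628, 0.9657, 0.9685,
    0.9709, 0.9732, 0.9752, 0.9771, 0.9788, 0.9804, 0.9819, 0.9831, 0.9843, 0.9854,
    0.9863, 0.9872, 0.9880, 0.9888, 0.9895, 0.9901, 0.9908, 0.9913, 0.9919, 0.9924,
])

def factors_needed(cumulative, target):
    """Smallest number of leading factors whose cumulative variance reaches `target`."""
    idx = int(np.searchsorted(cumulative, target, side="left"))
    if idx >= len(cumulative):
        raise ValueError("target not reached within the listed factors")
    return idx + 1

for target in (0.50, 0.70):
    print(target, factors_needed(maf_cumulative, target),
          factors_needed(pocket_cumulative, target))
```

With the tabulated values, the PocketPicker matrix reaches 70% of the variance with 3 factors, whereas the MAF matrix needs 28.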
Canonical correlations and component structure for canonical factor pairs between the MAF and PocketPicker Matrices
| Canonical factor pair | Canonical R | F-statistic | p | Structure of canonical factor: MAF | Structure of canonical factor: PocketPicker |
|---|---|---|---|---|---|
| 1 | 0.87 | 2.17 | < 0.0001 | 6, 12, -19 | 5, 8, 9, 10, 11, 12 |
| 2 | 0.84 | 1.74 | < 0.0001 | -7, -15, -16, 28, -30 | 1, 2, -12 |
| 3 | 0.77 | 1.34 | 0.0004 | -8, 9, 18 | -1, 2, 5, -12 |
Canonical correlation analysis between the PCA factors of the MAF profiles of target proteins and the geometric characteristics of their respective binding sites indicated a statistically significant association for 3 pairs of canonical factors. PCA factors of the MAF and the PocketPicker matrices with salient canonical loading (> 0.25 or < -0.25) are shown for each of these canonical factor pairs. (Negative signs indicate negative loading.)
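For readers unfamiliar with the technique, canonical correlations between two variable sets can be computed from orthonormal bases of the two centered data matrices. The following is a minimal NumPy sketch on synthetic data, assuming full column rank in both sets; it is not the authors' statistical software, and the array sizes (5 and 3 columns instead of 30 and 13 PCA factors) are chosen only for illustration.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between column sets X and Y (rows are observations)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the two centered column spaces.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx^T Qy are the cosines of the principal angles
    # between the two subspaces, i.e. the canonical correlations (descending).
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

rng = np.random.default_rng(0)
n = 154  # matches the number of proteins; the data themselves are synthetic
X = rng.normal(size=(n, 5))
Y = rng.normal(size=(n, 3))
Y[:, 0] = X[:, 0] + 0.1 * rng.normal(size=n)  # plant one shared signal

r = canonical_correlations(X, Y)
print(np.round(r, 2))
```

Only the planted signal yields a canonical correlation close to 1 here; the remaining values reflect chance associations, which is why significance testing of each canonical pair, as in the table, is needed.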
Results of the canonical redundancy analysis
Variance of the MAF variables explained by their own and by the opposite canonical variables
| Canonical variable | Own canonical variables: proportion | Own canonical variables: cumulative | Canonical R-squared | Opposite canonical variables: proportion | Opposite canonical variables: cumulative |
|---|---|---|---|---|---|
| 1 | 0.0333 | 0.0333 | 0.7638 | 0.0255 | 0.0255 |
| 2 | 0.0333 | 0.0667 | 0.7122 | 0.0237 | 0.0492 |
| 3 | 0.0333 | 0.1000 | 0.5852 | 0.0195 | 0.0687 |
| 4 | 0.0333 | 0.1333 | 0.4275 | 0.0142 | 0.0830 |
| 5 | 0.0333 | 0.1667 | 0.3403 | 0.0113 | 0.0943 |
| 6 | 0.0333 | 0.2000 | 0.2952 | 0.0098 | 0.1041 |
| 7 | 0.0333 | 0.2333 | 0.2362 | 0.0079 | 0.1120 |
| 8 | 0.0333 | 0.2667 | 0.1811 | 0.0060 | 0.1181 |
| 9 | 0.0333 | 0.3000 | 0.1238 | 0.0041 | 0.1222 |
| 10 | 0.0333 | 0.3333 | 0.1168 | 0.0039 | 0.1261 |
| 11 | 0.0333 | 0.3667 | 0.0833 | 0.0028 | 0.1288 |
| 12 | 0.0333 | 0.4000 | 0.0180 | 0.0006 | 0.1294 |
| 13 | 0.0333 | 0.4333 | 0.0129 | 0.0004 | 0.1299 |
Variance of the PocketPicker variables explained by their own and by the opposite canonical variables
| Canonical variable | Own canonical variables: proportion | Own canonical variables: cumulative | Canonical R-squared | Opposite canonical variables: proportion | Opposite canonical variables: cumulative |
|---|---|---|---|---|---|
| 1 | 0.0769 | 0.0769 | 0.7638 | 0.0588 | 0.0588 |
| 2 | 0.0769 | 0.1538 | 0.7122 | 0.0548 | 0.1135 |
| 3 | 0.0769 | 0.2308 | 0.5852 | 0.0450 | 0.1586 |
| 4 | 0.0769 | 0.3077 | 0.4275 | 0.0329 | 0.1914 |
| 5 | 0.0769 | 0.3846 | 0.3403 | 0.0262 | 0.2176 |
| 6 | 0.0769 | 0.4615 | 0.2952 | 0.0227 | 0.2403 |
| 7 | 0.0769 | 0.5385 | 0.2362 | 0.0182 | 0.2585 |
| 8 | 0.0769 | 0.6154 | 0.1811 | 0.0139 | 0.2724 |
| 9 | 0.0769 | 0.6923 | 0.1238 | 0.0095 | 0.2819 |
| 10 | 0.0769 | 0.7692 | 0.1168 | 0.0090 | 0.2909 |
| 11 | 0.0769 | 0.8462 | 0.0833 | 0.0064 | 0.2973 |
| 12 | 0.0769 | 0.9231 | 0.0180 | 0.0014 | 0.2987 |
| 13 | 0.0769 | 1.0000 | 0.0129 | 0.0010 | 0.2997 |
Proportion of the variance of PCA factor sets (yielded by the MAF and the PocketPicker matrices, respectively) explained by the canonical variates obtained from the same and from the other matrix, respectively. According to the canonical correlation analysis, the first 3 canonical variables reached significance.
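The tabulated redundancy values follow from a simple product: the share of a set's variance explained by an opposite canonical variate equals the share explained by the set's own canonical variate multiplied by the squared canonical correlation. The sketch below reproduces the table's columns under the assumption, read off the table itself, that each canonical variate explains an equal share of its own set (1/30 for the MAF factors, 1/13 for the PocketPicker factors); the function name `redundancy` is hypothetical.

```python
import numpy as np

# Squared canonical correlations for the 13 canonical pairs (copied from the table).
canonical_r2 = np.array([
    0.7638, 0.7122, 0.5852, 0.4275, 0.3403, 0.2952, 0.2362,
    0.1811, 0.1238, 0.1168, 0.0833, 0.0180, 0.0129,
])

def redundancy(own_proportion, r_squared):
    """Per-variate and cumulative variance explained by the opposite canonical variates."""
    per_variate = own_proportion * r_squared
    return per_variate, np.cumsum(per_variate)

# Assumption: equal own-set share per canonical variate, as listed in the table.
maf_per, maf_cum = redundancy(1.0 / 30, canonical_r2)
pocket_per, pocket_cum = redundancy(1.0 / 13, canonical_r2)

print(round(float(maf_cum[-1]), 4), round(float(pocket_cum[-1]), 4))
```

Under this reading, all 13 PocketPicker-side canonical variates together account for about 13% of the variance of the MAF factors, while the MAF-side variates account for about 30% of the variance of the PocketPicker factors.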
Figure 4. Visual summary of the results of canonical correlation between the MAF and PocketPicker descriptor matrices. A. Three statistically significant canonical factor pairs were obtained, with correlation values of 0.87, 0.84 and 0.77, respectively. The canonical correlation (R value) for each factor pair is shown in the middle part. Representative molecules for the MAF factors are shown in the left panel (orange and blue backgrounds for positive and negative salients, respectively). The distribution of PocketPicker salients is shown in the right panel. The six buriedness levels are represented by the letters A-F, with F representing the highest level of buriedness, while distance parameters were collected into three groups (1-7 Å, 8-14 Å, 15-20 Å). Orange and blue colors stand for positive and negative salients, respectively. White blocks represent the absence of a given descriptor pair within a given distance. See text for details. Abbreviations: BZDs: benzodiazepines; Morph.: morphine derivatives; Barb.: barbiturates; PPIs: proton pump inhibitors; Phen: phenothiazines; TCAs: tricyclic antidepressants. B. Shapes of protein binding pockets with high scores among the first three canonical factor pairs. Positive and negative salients are represented by orange and blue boxes. Binding site shapes are represented by colored balls positioned on a 1 Å-spaced grid, with deeper blue representing a higher level of buriedness. Protein surfaces were removed for a better view of the binding pockets in most cases, except for flat surface sites (e.g., 2pk4). Proteins of the positive salients of factor III have narrow, deep binding pockets, while negative salients contain shallow, small pockets (1aj6, 1apy) and wide, extensive binding sites (2fvv, 3fap). Factor II proteins can be described as having binding sites of medium size and width. Based on the distribution of salient loadings of PocketPicker variables, factor I proteins do not form a coherent group: elongated (1d3g), branching (1zsx, 2p0a) and bulky binding sites (2cca) belong to this factor.