Helena Strömbergsson, Gerard J Kleywegt.
Abstract
BACKGROUND: Chemogenomics is an emerging interdisciplinary approach to drug discovery that combines traditional ligand-based approaches with biological information on drug targets and lies at the interface of chemistry, biology and informatics. The ultimate goal in chemogenomics is to understand molecular recognition between all possible ligands and all possible drug targets. Protein and ligand space have previously been studied as separate entities, but chemogenomics studies deal with large datasets that cover parts of the joint protein-ligand space. Since drug discovery has traditionally focused on ligand optimization, the chemical space has been studied extensively. The protein space has been studied to some extent, typically for the purpose of classification of proteins into functional and structural classes. Since chemogenomics deals not only with ligands but also with the macromolecules the ligands interact with, it is of interest to find means to explore, compare and visualize protein-ligand subspaces.
Year: 2009 PMID: 19534738 PMCID: PMC2697636 DOI: 10.1186/1471-2105-10-S6-S13
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Number of PDB chains bound to each ligand. The number of non-redundant PDB chains is plotted for ligands 10–1000 in the structural dataset. All ligands in complex with more than 100 chains (red dotted line) were checked manually.
Ligand descriptors
| Abbreviation | Description |
| MW | molecular weight |
| Sv | sum of atomic van der Waals volumes |
| Se | sum of atomic Sanderson electronegativities |
| Sp | sum of atomic polarizabilities |
| Mv | mean atomic van der Waals volume |
| Me | mean atomic Sanderson electronegativity |
| nAT | number of atoms |
| nSK | number of non-hydrogen atoms |
| nBT | number of bonds |
| nBO | number of non-hydrogen bonds |
| nBM | number of multiple bonds |
| ARR | aromatic ratio |
| nCIC | number of rings |
| RBN | number of rotatable bonds |
| RBF | rotatable bond fraction |
| nDB | number of double bonds |
| nAB | number of aromatic bonds |
| nC | number of carbon atoms |
| nN | number of nitrogen atoms |
| nO | number of oxygen atoms |
| nX | number of halogens |
| nBnz | number of benzene rings |
| nCar | number of aromatic carbon atoms |
| nRCONH2 | number of primary amides |
| nROH | number of aliphatic hydroxyl groups |
| nArOH | number of aromatic hydroxyl groups |
| nHDon | number of hydrogen bond donors |
| nHAcc | number of hydrogen bond acceptors |
| Ui | unsaturation index |
| Hy | hydrophilic factor |
| AMR | Ghose-Crippen molar refractivity |
| TPSA(Tot) | topological polar surface area |
| ALOGP | Ghose-Crippen octanol-water partition coefficient |
| LAI | Lipinski alert index |
Results from PCA models on the PDB and DrugBank dataset.
| Descriptors | #descriptors | R2X | Q2 | #components |
| Protein | 147 | 0.695 | 0.554 | 10 |
| Ligand | 35 | 0.798 | 0.729 | 4 |
| Ligand + Protein | 182 | 0.638 | 0.552 | 10 |
This table contains the results obtained from a principal component analysis on the dataset, described by protein or ligand descriptors in isolation, and by the combination of protein and ligand descriptors. The number of descriptors (#descriptors) is displayed along with the fraction of explained (R2X) and predicted (Q2) variation captured by the components (#components).
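As a rough illustration of what R2X measures, the sketch below autoscales a small descriptor matrix and reports the fraction of total variance captured by the first k principal components. It is a minimal stdlib-only sketch, not the software used in the study: the function names, the power-iteration eigenvalue routine, and the choice of autoscaling are all assumptions, and Q2 is not computed because it requires a cross-validation scheme that is not described here.

```python
import math
import random

def autoscale(rows):
    """Centre every column and scale it to unit variance (assumes no constant columns)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(m)]
    return [[(r[j] - means[j]) / sds[j] for j in range(m)] for r in rows]

def covariance(rows):
    """Covariance matrix of data that has already been centred."""
    n, m = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(m)]
            for i in range(m)]

def leading_eigenvalues(matrix, k, iterations=1000, seed=0):
    """Top-k eigenvalues of a symmetric PSD matrix via power iteration with deflation."""
    rng = random.Random(seed)
    m = len(matrix)
    a = [row[:] for row in matrix]
    eigenvalues = []
    for _component in range(k):
        # Random unit start vector, so it is not orthogonal to the leading direction.
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _step in range(iterations):
            w = [sum(a[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining variance is numerically zero
                break
            v = [x / norm for x in w]
        # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(m)) for i in range(m))
        eigenvalues.append(lam)
        # Deflate: remove the component just found before searching for the next one.
        for i in range(m):
            for j in range(m):
                a[i][j] -= lam * v[i] * v[j]
    return eigenvalues

def r2x(rows, k):
    """Fraction of total variance explained by the first k principal components."""
    cov = covariance(autoscale(rows))
    total = sum(cov[i][i] for i in range(len(cov)))
    return sum(leading_eigenvalues(cov, k)) / total
```

With two perfectly correlated descriptor columns, a single component explains essentially all of the variance, so `r2x(rows, 1)` is close to 1; adding weakly correlated descriptors lowers the value for the same number of components.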
Figure 2. PCA model projections. This figure shows scatter plots of the first three principal components (c1, c2, and c3) of the PDB (blue) and DrugBank (red) interaction datasets. For each PCA model, the goodness of fit (R2X) for the three principal components is shown. The plots are based on (A) protein descriptors, (B) ligand descriptors, and (C) protein and ligand descriptors.
Nearest-neighbor-based overlap between datasets.
| Protein | DrugBank | 20 | 80 |
| Protein | PDB | 94 | 6 |
| Ligand | DrugBank | 19 | 81 |
| Ligand | PDB | 93 | 7 |
| Protein-ligand | DrugBank | 39 | 61 |
| Protein-ligand | PDB | 86 | 14 |
This table contains the percentage of nearest neighbours (NN) in the three PCA models based on protein descriptors, ligand descriptors and protein-ligand descriptors. The NN overlap is reported for DrugBank vs. PDB and vice versa.
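A nearest-neighbour overlap of this kind could be computed along the following lines. This is a sketch under stated assumptions: the two datasets are pooled in PCA score space, Euclidean distance is used, and for every entry of one dataset we record which dataset its nearest neighbour (other than itself) belongs to. The exact procedure used for the table is not given in this excerpt, and the function name and data layout are hypothetical.

```python
import math

def nn_origin_percentages(points, dataset):
    """For every entry of `dataset`, find its nearest neighbour in the pooled
    set and return the percentage of neighbours coming from each dataset.

    `points` is a list of (dataset_label, coordinates) tuples, where the
    coordinates are PCA scores of equal length.
    """
    counts = {}
    members = [i for i, (label, _) in enumerate(points) if label == dataset]
    for i in members:
        best_label, best_dist = None, math.inf
        for j, (label, coords) in enumerate(points):
            if j == i:
                continue  # an entry is never its own neighbour
            d = math.dist(points[i][1], coords)
            if d < best_dist:
                best_label, best_dist = label, d
        counts[best_label] = counts.get(best_label, 0) + 1
    return {label: 100.0 * c / len(members) for label, c in counts.items()}

# Toy score vectors (made-up values, two components per entry).
toy = [
    ("PDB", (0.0, 0.0)), ("PDB", (0.0, 1.0)), ("PDB", (5.0, 5.0)),
    ("DrugBank", (5.0, 6.0)), ("DrugBank", (9.0, 9.0)), ("DrugBank", (9.0, 10.0)),
]
pdb_view = nn_origin_percentages(toy, "PDB")
drugbank_view = nn_origin_percentages(toy, "DrugBank")
```

In the toy data, two of the three PDB entries have a PDB entry as nearest neighbour and one has a DrugBank entry, so the result is asymmetric between datasets in the same way that the two rows per model in the table are.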
Figure 3. DrugBank cross-interaction study. The percentage of captured cross-interactions is plotted against the number of checked neighbours. The blue data series was computed from the protein-ligand PCA model and the red series was computed from the protein PCA model.
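A curve of this type can be thought of as a recall-at-k calculation. The sketch below assumes that each query complex comes with a list of neighbours ranked by distance in the PCA model and a set of known cross-interaction partners; both data structures and the function name are hypothetical, and the definition of a captured cross-interaction used for the figure may differ.

```python
def capture_curve(ranked_neighbours, known_partners, max_k):
    """Percentage of known cross-interactions found among the first k
    neighbours of each query, for k = 1..max_k.

    ranked_neighbours: dict mapping query id -> list of neighbour ids,
                       closest first.
    known_partners:    dict mapping query id -> set of ids that count as
                       cross-interactions for that query.
    """
    total = sum(len(partners) for partners in known_partners.values())
    curve = []
    for k in range(1, max_k + 1):
        hits = sum(
            len(set(ranked_neighbours[q][:k]) & partners)
            for q, partners in known_partners.items()
        )
        curve.append(100.0 * hits / total)
    return curve

# Toy example with made-up identifiers.
ranked = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"]}
known = {"q1": {"b"}, "q2": {"d", "f"}}
curve = capture_curve(ranked, known, 3)
```

Running the same function on rankings from two different models (for example protein-only and protein-ligand) gives two curves that can be compared point by point.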
Figure 4. A cross-interaction case study of P41594 in complex with acamprosate. The five nearest neighbours of the complex of human metabotropic glutamate receptor 5 (P41594) and acamprosate, according to our protein-ligand model. The protein name, percentage sequence identity to P41594, Tanimoto score between its ligand and acamprosate, and the nearest-neighbour distance between the two complexes are reported for each neighbour.
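For readers unfamiliar with the Tanimoto score, it is the number of features two ligands share divided by the number of features present in either. The sketch below works on plain Python sets of feature identifiers; the fingerprint type used in the study is not stated in this excerpt, so the feature sets here are hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen for this example.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint features."""
    a, b = set(features_a), set(features_b)
    union = a | b
    if not union:
        return 0.0  # convention for two empty fingerprints (an assumption)
    return len(a & b) / len(union)

# Hypothetical on-bit indices for two ligands.
ligand_1 = {1, 2, 3}
ligand_2 = {2, 3, 4}
score = tanimoto(ligand_1, ligand_2)  # 2 shared features out of 4 in total
```

A score of 1 means identical feature sets and 0 means no shared features, so a neighbour can be close in the protein-ligand model while still having a low ligand Tanimoto score.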