Praveen Anand, Deepesh Nagarajan, Sumanta Mukherjee, Nagasuma Chandra.
Abstract
Most biological processes are governed by specific protein-ligand interactions. Discerning the components that contribute to a favorable protein-ligand interaction could significantly improve our understanding of protein function, help rationalize drug design and yield design principles for protein engineering. The Protein Data Bank (PDB) currently hosts the structures of ∼68 000 protein-ligand complexes. Although several databases classify proteins according to sequence and structure, only a handful annotate and classify protein-ligand interactions and provide information on the different attributes of molecular recognition. In this study, an exhaustive comparison of all the biologically relevant ligand-binding sites (84 846 sites) was conducted using PocketMatch, a rapid, parallel, in-house algorithm. PocketMatch quantifies the similarity between binding sites based on structural descriptors and residue attributes. A similarity network was constructed from binding sites whose PocketMatch scores exceeded a high similarity threshold (0.80). The binding site similarity network was clustered into discrete sets of similar sites using the Markov clustering (MCL) algorithm. Furthermore, various computational tools were used to study different attributes of the interactions within the individual clusters. The attributes can be roughly divided into (i) binding site characteristics, including pocket shape, nature of residues and interaction profiles with different kinds of atomic probes; (ii) atomic contacts, consisting of various types of polar, hydrophobic and aromatic contacts along with binding site water molecules that could play crucial roles in protein-ligand interactions; and (iii) binding energetics of the interactions, derived from scoring functions developed for docking. For each ligand-binding site in each protein in the PDB, the site similarity information, the cluster it belongs to and a description of the site attributes are provided as a relational database, Protein-Ligand Interaction Clusters (PLIC). Database URL: http://proline.biochem.iisc.ernet.in/PLIC.
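The network construction and clustering steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`build_similarity_matrix`, `mcl`), the inflation value of 2.0, the added self-loops and the toy scores are all assumptions made for illustration, and whether a score exactly equal to the 0.80 threshold is retained is also assumed here. The sketch keeps only site pairs at or above the threshold and then alternates the expansion and inflation steps of Markov clustering until the matrix stops changing.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def build_similarity_matrix(n: int, scores: Dict[Tuple[int, int], float],
                            threshold: float = 0.8) -> Matrix:
    """Weighted adjacency matrix of the similarity network.

    Assumption: pairs scoring >= threshold are kept; self-loops of weight 1.0
    are added, as is common practice before running MCL.
    """
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 1.0
    for (i, j), score in scores.items():
        if score >= threshold:
            m[i][j] = score
            m[j][i] = score
    return m


def normalize_columns(m: Matrix) -> Matrix:
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        col_sum = sum(m[i][j] for i in range(n))
        if col_sum > 0:
            for i in range(n):
                out[i][j] = m[i][j] / col_sum
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product; adequate for a toy example only."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency: Matrix, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-9) -> List[List[int]]:
    """Markov clustering: alternate expansion and inflation until stable."""
    n = len(adjacency)
    m = normalize_columns(adjacency)
    for _ in range(max_iter):
        expanded = matmul(m, m)                          # expansion step
        inflated = normalize_columns(                    # inflation step
            [[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each attractor row lists the members of one cluster.
    clusters: List[List[int]] = []
    seen = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in seen:
            seen.add(members)
            clusters.append(sorted(members))
    return clusters


if __name__ == "__main__":
    # Hypothetical pairwise scores between five binding sites (indices 0-4).
    toy_scores = {(0, 1): 0.90, (0, 2): 0.85, (1, 2): 0.95,
                  (3, 4): 0.90, (2, 3): 0.50}
    print(mcl(build_similarity_matrix(5, toy_scores)))
```

The dense matrices used here would not scale to 84 846 sites; a real implementation would need sparse matrices and pruning of small entries, for example through the reference MCL program.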
Year: 2014 PMID: 24763918 PMCID: PMC3998096 DOI: 10.1093/database/bau029
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. PLIC database workflow. The flowchart illustrates the steps involved in the construction of the PLIC database. All protein–ligand complexes are downloaded from the PDB, and binding sites (comprising all residues within 4.5 Å of any ligand atom) are extracted. Only biologically relevant ligands are retained, resulting in 84 846 binding sites. An exhaustive all-versus-all comparison of these 84 846 binding sites is performed using PocketMatch, and a binding site similarity network is constructed at a PMAX cutoff of 0.8. Network-based clustering of the binding sites is performed using the MCL algorithm to obtain clusters of similar binding sites. The attributes calculated for the interactions within the clusters, along with the computational tools used to derive them, are listed in the box.
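The binding site definition given in the caption (all residues within 4.5 Å of any ligand atom) can be sketched as below. This is a hedged illustration rather than the code used to build PLIC: the `Atom` record and the `extract_binding_site` function are hypothetical names, coordinates are assumed to be already parsed from a PDB file, and a brute-force distance search stands in for whatever spatial indexing the original pipeline used.

```python
from math import dist
from typing import List, NamedTuple, Sequence, Tuple

Coord = Tuple[float, float, float]


class Atom(NamedTuple):
    """Hypothetical minimal record for one protein atom."""
    chain: str
    res_name: str
    res_seq: int
    coord: Coord


def extract_binding_site(protein_atoms: Sequence[Atom],
                         ligand_coords: Sequence[Coord],
                         cutoff: float = 4.5) -> List[Tuple[str, int, str]]:
    """Return the residues with at least one atom within `cutoff` angstroms
    of any ligand atom, as sorted (chain, residue number, residue name)."""
    site = set()
    for atom in protein_atoms:
        # A residue joins the site as soon as one of its atoms is close enough.
        if any(dist(atom.coord, lig) <= cutoff for lig in ligand_coords):
            site.add((atom.chain, atom.res_seq, atom.res_name))
    return sorted(site)


if __name__ == "__main__":
    # Toy coordinates, not taken from any real structure.
    protein = [
        Atom("A", "ALA", 1, (0.0, 0.0, 0.0)),   # 4.0 angstroms from the ligand atom
        Atom("A", "GLY", 2, (3.0, 0.0, 0.0)),   # 5.0 angstroms from the ligand atom
        Atom("A", "SER", 3, (10.0, 0.0, 0.0)),  # far from the ligand atom
    ]
    ligand = [(0.0, 0.0, 4.0)]
    print(extract_binding_site(protein, ligand))
```

For whole-PDB processing, a k-d tree or grid-based neighbor search would replace the nested loop, since the brute-force version scales with the product of protein and ligand atom counts.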
Attributes of interaction
| Attribute type | Attribute name | Computational tools used |
|---|---|---|
| Binding site descriptors | Pocket volume | fpocket, LPC |
| | Number of alpha spheres | |
| | Mean alpha sphere radius | |
| | Proportion of apolar alpha spheres | |
| | Mean local hydrophobic density | |
| | Hydrophobicity scores | |
| | Volume score | |
| | Charge score | |
| | Proportion of polar atoms | |
| | Alpha sphere density | |
| | Max. distance between COM and alpha sphere | |
| | Max. theoretical shape complementarity | |
| | Observed shape complementarity | |
| | Normalized shape complementarity | |
| Binding site environment and binding energetics | Autodock score | Autodock4.1, EasyMIFs, SiteHound |
| | Electrostatic score | |
| | Hydrogen bond score | |
| | van der Waals score | |
| | Desolvation score | |
| | Torsional score | |
| | Average methyl probe (CMET) interaction energy | |
| | Total CMET interaction energy | |
| | Total CMET interaction grids | |
| | Total CMET interaction clusters | |
| | Average phosphate probe (OP) interaction energy | |
| | Total OP interaction energy | |
| | Total OP interaction grids | |
| | Total OP interaction clusters | |
| | Average hydroxy probe (OA) interaction energy | |
| | Total OA interaction energy | |
| | Total OA interaction grids | |
| | Total OA interaction clusters | |
| | Average aromatic probe (CR1) interaction energy | |
| | Total CR1 interaction energy | |
| | Total CR1 interaction grids | |
| | Total CR1 interaction clusters | |
| Ligand–protein contacts | Hydrogen bonds | Ligplot+, LPC |
| | Aromatic | |
| | Hydrophobic | |
| | Destabilizing | |
| | Donor water molecules at the binding site | |
| | Acceptor water molecules at the binding site | |
The table lists all the attributes calculated for each interaction in the database, along with the computational tools used to derive them.
Figure 2. The EER of the PLIC database. The EER of the different data types in PLIC is shown. The database consists of 13 tables, and the relationships between these tables are depicted here. The logical partitions indicating the type of information are highlighted and labeled with different colors.
Figure 3. Database statistics. (A) The frequency of the different ligand-binding sites present in the database is shown as a histogram. The most frequent ligands are labeled along with their frequencies. (B) The number of interactions per CATH superfamily is shown as a histogram. The CATH superfamilies associated with the largest number of ligands are labeled. The pie charts depict the distribution of the different (C) enzyme classes and (D) SCOP classes present in the database.
Figure 4. PLIC database server. (A) Snapshot of the query page for the PLIC database. (B) The page displaying the query results in tabular form, with the name of the binding site, protein, ligand, UniprotID, EC number and CATH superfamily ID. (C) The results page displayed after a specific binding site name is clicked. It contains a Jmol plug-in for visualizing interactions, clusters indicating high-energy interaction zones for different probes, an alignment of the binding sites within the cluster, similar sites with their PocketMatch scores, cluster information and the various attributes associated with the interaction. (D) The cluster analysis page displays a bar plot of the distribution of residues within the binding site environment of the cluster and box plots of the variation observed in different interaction attributes within the cluster.