Ammar Ammar, Rachel Cavill, Chris Evelo, Egon Willighagen.
Abstract
A key concept in drug design is how natural variants, especially those occurring in the binding sites of drug targets, affect inter-individual drug response and efficacy by altering binding affinity. These effects have so far been studied only on small datasets, whereas a proper evaluation requires a large dataset of binding affinity changes caused by binding site single-nucleotide polymorphisms (SNPs). To the best of our knowledge, no such dataset exists. We therefore constructed a reference dataset of ligand binding affinities to proteins, covering all their reported binding site variants, using a molecular docking approach. A large database of protein-ligand complexes that covers a wide range of binding pocket mutations and a broad landscape of small molecules is valuable for several types of studies. For example, developing machine learning algorithms that predict protein-ligand affinity, or the effect of a SNP on it, requires an extensive amount of data. In this work, we present PSnpBind, a large database of 0.6 million mutated binding site protein-ligand complexes constructed using a multithreaded virtual screening workflow. It provides a web interface to explore and visualize the protein-ligand complexes and a REST API for programmatic access to the database contents. PSnpBind is open source and freely available at https://psnpbind.org.
Keywords: AutoDock Vina; Binding affinity; Binding pocket; Database; Mutation effect; REST API; SNP; Virtual screening
Year: 2022 PMID: 35227289 PMCID: PMC8886843 DOI: 10.1186/s13321-021-00573-5
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
List of databases related to SNP effect analysis and visualization
| Database | Year, availability | Description |
|---|---|---|
| MSV3d | 2016, not downloadable (web only) | Mutated structures built using MODELLER. The website also reports conservation and physicochemical changes. SwissVar and dbSNP are the main sources of SNPs. |
| PinSnps | 2013 | Exploring the impact of SNPs on protein domains and complexes. |
| LS-SNP/PDB | 2009, not available anymore | |
| G23D | 2016, not downloadable (web only) | Software used: SCCOMP and SCWRL for side chain modeling, JSmol for molecular graphics, I-Mutant and FoldX for thermostability prediction. |
| SNPs3D | 2008, not downloadable | SNP impact on protein structure and function. A Support Vector Machine (SVM) model was used to find the separation pattern between a set of disease-associated and non-deleterious SNPs. The resulting pattern was then validated using a different set of disease-associated and non-deleterious SNPs. |
| SAAPdb | No longer maintained | A newer project, SAAPdap/SAAPpred, is available: an analysis pipeline for examining the structural effects of mutations and predicting pathogenicity. |
| SNP2Structure | Not available anymore | |
| PhyreRisk | 2019, not downloadable (web only) | A dynamic web application that bridges genomics, proteomics and 3D structural data to guide the interpretation of human genetic variants. |
| topoSNP | 2019, databases are up to date, not downloadable (web only) | Topographic mapping of single-nucleotide polymorphisms. |
| coliSNP | Not available anymore | |
| StructMAn | 2016, not downloadable (web only) | Annotation of non-synonymous single-nucleotide polymorphisms (nsSNPs) in the context of the structural neighbourhood of the resulting amino acid variations in the protein. |
Fig. 1 Methodology workflow. Steps 1, 2 and 3 filter the data from the main sources and map them together. Steps 4 and 5 prepare the selected protein PDB structures and their mutated versions for docking. Step 6 prepares the ligands. Step 7 performs the docking.
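Example list of CASF human protein variants selected from the UniProt variants dataset
| UniProt ID | Variant | PDB ID | Wild-type residue | Mutant residue | Position | Chain |
|---|---|---|---|---|---|---|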
| O14757 | p.Leu92Phe | 3jvr | L | F | 92 | A |
| O14757 | p.Phe93Val | 3jvr | F | V | 93 | A |
| O14757 | p.Ile96Val | 3jvr | I | V | 96 | A |
| O14757 | p.Gly101Cys | 3jvr | G | C | 101 | A |
| O14965 | p.Gly140Ala | 3up2 | G | A | 140 | A |
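The second column of the table above uses HGVS-style protein notation (for example `p.Leu92Phe`), while the later columns carry the same information as one-letter residue codes and a position. As a minimal sketch, assuming only simple missense variants of this form, the relation between the two could be expressed in Python as follows. The function name `parse_missense` is illustrative and is not part of PSnpBind.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Matches simple missense notation such as "p.Leu92Phe" and nothing else.
_MISSENSE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_missense(variant):
    """Return (wild_type, mutant, position) for a simple missense variant."""
    match = _MISSENSE.match(variant)
    if match is None:
        raise ValueError(f"not a simple missense variant: {variant!r}")
    wild, position, mutant = match.groups()
    try:
        return THREE_TO_ONE[wild], THREE_TO_ONE[mutant], int(position)
    except KeyError as exc:
        raise ValueError(f"unknown residue code in {variant!r}") from exc


if __name__ == "__main__":
    # Variants taken from the table above.
    for variant in ("p.Leu92Phe", "p.Phe93Val", "p.Gly140Ala"):
        print(variant, parse_missense(variant))
```

Other variant types (deletions, insertions, stop-gained) are deliberately rejected here rather than guessed at.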
Fig. 2 UpSet plot showing the availability of mutation types across the selected PSnpBind proteins. X-axis shows the number of proteins having the corresponding intersection between the mutation types. Y-axis shows the number of proteins having each mutation type
Fig. 3 GROMACS energy minimization flowchart
Summary of the selected protein structures, their mutations, the ligands selected from ChEMBL, and the number of dockings, ordered by PDB ID
| UniProt ID | PDB ID | # Mutations | # Ligands | # Dockings |
|---|---|---|---|---|
| P00749 | 1owh | 38 | 1901 | 72225 |
| P11309 | 2c3i | 18 | 1240 | 22316 |
| P18031 | 2hb1 | 18 | 419 | 7531 |
| P03372 | 2pog | 13 | 7017 | 91214 |
| P00918 | 2weg | 22 | 1013 | 22281 |
| P00742 | 2y5h | 33 | 667 | 22010 |
| P07900 | 3b27 | 21 | 1954 | 41023 |
| P10275 | 3b5r | 83 | 466 | 38671 |
| P39086 | 3fv1 | 43 | 345 | 14782 |
| O14757 | 3jvr | 11 | 631 | 6933 |
| P24941 | 3pxf | 10 | 505 | 5044 |
| P37231 | 3u9q | 20 | 606 | 12114 |
| P56817 | 3udh | 5 | 2127 | 10635 |
| O14965 | 3up2 | 18 | 895 | 16109 |
| P00734 | 3utu | 27 | 1796 | 48492 |
| P03951 | 4crc | 50 | 690 | 34496 |
| Q16539 | 4dli | 9 | 1320 | 11878 |
| P23458 | 4e5w | 9 | 1090 | 9801 |
| P39900 | 4gr0 | 38 | 1090 | 41419 |
| Q9H2K2 | 4j21 | 17 | 3295 | 56001 |
| O60674 | 4jia | 7 | 848 | 5930 |
| Q08881 | 4m0y | 19 | 169 | 3197 |
| P00519 | 4twp | 26 | 795 | 20662 |
| O60885 | 4wiv | 3 | 917 | 2747 |
| P04637 | 5a7b | 160 | 113 | 17996 |
| Q9Y233 | 5c28 | 13 | 352 | 4567 |
| Total | | 731 | 32261 | 640074 |
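The totals in the last row can be checked directly against the per-protein rows. The Python sketch below copies the rows as listed above, sums the three numeric columns, and also lists any protein whose docking count exceeds the product of its mutation and ligand counts. Reading that product as an upper bound (every mutant docked against every selected ligand) is an assumption drawn from the numbers, not a statement made in the source.

```python
# (UniProt ID, PDB ID, mutations, ligands, dockings), copied from the table above.
ROWS = [
    ("P00749", "1owh", 38, 1901, 72225),
    ("P11309", "2c3i", 18, 1240, 22316),
    ("P18031", "2hb1", 18, 419, 7531),
    ("P03372", "2pog", 13, 7017, 91214),
    ("P00918", "2weg", 22, 1013, 22281),
    ("P00742", "2y5h", 33, 667, 22010),
    ("P07900", "3b27", 21, 1954, 41023),
    ("P10275", "3b5r", 83, 466, 38671),
    ("P39086", "3fv1", 43, 345, 14782),
    ("O14757", "3jvr", 11, 631, 6933),
    ("P24941", "3pxf", 10, 505, 5044),
    ("P37231", "3u9q", 20, 606, 12114),
    ("P56817", "3udh", 5, 2127, 10635),
    ("O14965", "3up2", 18, 895, 16109),
    ("P00734", "3utu", 27, 1796, 48492),
    ("P03951", "4crc", 50, 690, 34496),
    ("Q16539", "4dli", 9, 1320, 11878),
    ("P23458", "4e5w", 9, 1090, 9801),
    ("P39900", "4gr0", 38, 1090, 41419),
    ("Q9H2K2", "4j21", 17, 3295, 56001),
    ("O60674", "4jia", 7, 848, 5930),
    ("Q08881", "4m0y", 19, 169, 3197),
    ("P00519", "4twp", 26, 795, 20662),
    ("O60885", "4wiv", 3, 917, 2747),
    ("P04637", "5a7b", 160, 113, 17996),
    ("Q9Y233", "5c28", 13, 352, 4567),
]

# Column sums for mutations, ligands and dockings.
totals = tuple(sum(row[i] for row in ROWS) for i in (2, 3, 4))

# PDB IDs whose docking count exceeds mutations x ligands (assumed upper bound).
over_bound = [pdb for _, pdb, muts, ligs, docks in ROWS if docks > muts * ligs]

if __name__ == "__main__":
    print("totals (mutations, ligands, dockings):", totals)
    print("rows above the assumed bound:", over_bound)
```

With the rows as listed, the sums reproduce the Total row (731, 32261, 640074), and no row exceeds the assumed bound.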
Fig. 4 Docking performance: duration versus number of torsion angles
Fig. 5 Docking performance: CPU usage versus number of torsion angles
Fig. 6 PSnpBind web interface
Fig. 7 Ligand contacts visualization using Jmol. The figure shows the nearest contacts of the ligand. The disks indicate where the van der Waals radii of atoms overlap. The colors indicate how close the contact is: yellow = close, orange = touching, and red = overlapping
Fig. 8 JSON-LD markup example for a PSnpBind protein page. The schema.org and bioschemas.org vocabularies are used to describe the protein, providing information about its structure, sequence, taxon and IDs
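To make the idea behind Fig. 8 concrete, the Python sketch below assembles a small JSON-LD document for a protein using schema.org terms (`Protein`, `hasBioPolymerSequence`, `taxonomicRange`, `sameAs`). It is only an illustration: the choice of properties, the helper name `protein_jsonld`, and the placeholder name and sequence are assumptions and do not reproduce the actual markup served by PSnpBind.

```python
import json


def protein_jsonld(uniprot_id, pdb_id, name, sequence, ncbi_taxon_id):
    """Build a schema.org-style JSON-LD description of a protein.

    Hypothetical helper: the selected properties are an assumption,
    not the markup shown in the figure.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Protein",
        "identifier": uniprot_id,
        "name": name,
        "hasBioPolymerSequence": sequence,
        # Taxon expressed as an NCBI Taxonomy PURL.
        "taxonomicRange": f"http://purl.obolibrary.org/obo/NCBITaxon_{ncbi_taxon_id}",
        # Cross-references to the source records for sequence and structure.
        "sameAs": [
            f"http://purl.uniprot.org/uniprot/{uniprot_id}",
            f"https://www.rcsb.org/structure/{pdb_id.upper()}",
        ],
    }


if __name__ == "__main__":
    doc = protein_jsonld(
        uniprot_id="O14757",            # from the variants table
        pdb_id="3jvr",                  # from the variants table
        name="Example protein name",    # placeholder, not taken from the source
        sequence="PLACEHOLDER",         # placeholder, not a real sequence
        ncbi_taxon_id=9606,             # Homo sapiens
    )
    # On a web page this string would sit inside a
    # <script type="application/ld+json"> element.
    print(json.dumps(doc, indent=2))
```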
Summary of the FAIR principles and their implementation status in PSnpBind
| FAIR principle | Implemented | Implementation in PSnpBind |
|---|---|---|
| F1. (Meta)data are assigned a globally unique and persistent identifier | Yes | An internal UUID is generated for each instance of proteins, mutations, ligands and dockings. The database as a whole, the web application, and the libraries made for executing the steps of the workflow are all preserved through Zenodo, with a DOI assigned to each of them. |
| F2. Data are described with rich metadata | Yes | All instances are annotated and well described from the relevant sources (PDB, UniProt, NCBI Taxon and ChEMBL) |
| F3. Metadata clearly and explicitly include the identifier of the data they describe | Yes | |
| F4. (Meta)data are registered or indexed in a searchable resource | Yes | The dataset will be submitted to re3data.org and Google Dataset Search |
| A1. (Meta)data are retrievable by their identifier using a standardised communications protocol | Yes | HTTP(S) protocol is used with a REST API for all communications with the server |
| A2. Metadata are accessible, even when the data are no longer available | Yes | In progress |
| I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. | Yes | JSON-LD is used to describe main protein entities. The REST API adopts the OpenAPI specification v3 and it is described using Swagger. |
| I2. (Meta)data use vocabularies that follow FAIR principles | Yes | The structured markup (JSON-LD) uses the schema.org and bioschemas.org vocabularies. |
| R1. (Meta)data are richly described with a plurality of accurate and relevant attributes | Yes | License, usage and provenance info are all provided. |