ABS-Scan: In silico alanine scanning mutagenesis for binding site residues in protein-ligand complex.

Praveen Anand¹, Deepesh Nagarajan¹, Sumanta Mukherjee², Nagasuma Chandra¹.

Abstract

Most physiological processes in living systems are fundamentally regulated by protein-ligand interactions. Understanding the process of ligand recognition by proteins is a vital activity in molecular biology and biochemistry. It is well known that the residues present at the binding site of the protein form pockets that provide a conducive environment for recognition of specific ligands. In many cases, the boundaries of these sites are not well defined. Here, we provide a web-server to systematically evaluate important residues in the binding site of the protein that contribute towards the ligand recognition through in silico alanine-scanning mutagenesis experiments. Each of the residues present at the binding site is computationally mutated to alanine. The ligand interaction energy is computed for each mutant and the corresponding ΔΔG values are calculated by comparing it to the wild type protein, thus evaluating individual residue contributions towards ligand interaction. The server will thus provide a ranked list of residues to the user in order to obtain loss-of-function mutations. This web-tool can be freely accessed through the following address: http://proline.biochem.iisc.ernet.in/abscan/.

Year: 2014 PMID: 25685322 PMCID: PMC4319546 DOI: 10.12688/f1000research.5165.2

Source DB: PubMed Journal: F1000Res ISSN: 2046-1402

Introduction

Currently (as of April 3, 2014) [1], there exist more than 72,000 experimentally determined protein structures complexed with small-molecule ligands, providing an extensive data resource on protein binding sites. These binding sites range in size from six to thirty residues, depending upon the size and the nature of the ligand. In most cases, the contribution of the individual amino acids towards the binding of a given ligand is not well understood. A well-established method of demonstrating the importance of a residue at the site is to create point mutants through site-directed mutagenesis [2]. Efforts towards characterization of an entire functional site include tools such as alanine scanning mutagenesis (ASM) [3], where each residue is mutated to alanine and its effect on function is evaluated. ASM is a widely used technique in experimental biology and has been successfully applied to the problems of protein folding and stability [4], protein-protein [5, 6], and protein-ligand [7] interactions. The experimental success of this technique has resulted in further developments, including high-throughput and low-cost variants [8], greatly expanding its reach. Yet, given the time, cost and effort required for carrying out experimental biochemistry, a large majority of proteins are yet to be studied through this method. Owing to the availability of a variety of structural bioinformatics tools, it is now feasible to carry out alanine scanning mutagenesis computationally [9]. Spurred by the successes and widespread adoption of the ASM technique, various computational resources now exist for in silico alanine scanning. Prominent examples include Modeller [10] and the Rosetta software suite [11]. However, most packages are command-line oriented and are therefore out of reach for many experimental researchers.
Alanine scanning web-servers with intuitive user interfaces, such as the Robetta web-server [12], the Rosetta Design web-server [13], ROSIE [14], FOLDX [15], BeATMuSiC [16] and DrugScore PPI [17], exist for the problems of protein folding, protein stability and protein-protein interactions. Workflows to evaluate ligand-binding energetics do exist, but they rely on free-energy calculations involving the Molecular Mechanics/Generalized Born Surface Area method (MM-GBSA) [18-20] and require significant computational time and setup. There is, however, no intuitive web-tool available for analyzing alanine-scanning mutations of small-molecule binding site residues in real time. A common requirement for an experimental biochemist is to identify which amino acids to mutate in the protein to generate loss-of-function mutants. A web-tool that caters to that specific need will therefore be highly useful. The analysis will also provide insights into residues critical for interaction and into residue pairs or sets that, when mutated, will abolish ligand binding. It can further support lead refinement in the process of drug discovery and help in understanding drug resistance due to mutations. We present a computational workflow and web-server, Alanine Binding Site-Scan (ABS-Scan), for automated alanine-scanning mutagenesis of protein-ligand interface residues. The workflow combines the libraries of widely used software packages, including Modeller [10] for site-specific alanine mutagenesis and Autodock [21] for energetic evaluation of protein-ligand complexes.

Workflow

This workflow allows a user to submit a protein-ligand complex of interest (Figure 1). The user can select a distance cut-off that defines the binding site around a specific ligand, for which in silico alanine scanning mutagenesis is then carried out. Once the input parameters are obtained, the Modeller library is used to perform site-specific mutagenesis on all selected residues, coupled with steps of energy minimization [22]. This consists of initial steps of conjugate gradient (200 iterations with a minimum atom shift of 0.001 Å), followed by 200 steps of molecular dynamics simulation with steepest descent carried out at different temperatures. The initial restraints for the mutated model are derived from the wild-type protein structure. The analysis and results derived from alanine scanning mutagenesis rely on two assumptions: (a) the introduced point mutation does not drastically change the structure of the protein and (b) the mode of ligand interaction in the point mutant is the same as in the wild-type complex. Care is taken to ensure that there are no steric clashes between the protein and ligand atoms during the process of minimization. The quality of the protein structures generated is estimated through the Discrete Optimized Protein Energy (DOPE) score [23], a statistical potential score that is calculated for each mutant. This scoring scheme is based on an improved reference state consisting of non-interacting atom pairs in a homogeneous sphere whose radius depends on the sample native structure. The score therefore reflects the feasibility of interactions and the compactness of the modeled structure.

Figure 1.

ABS-Scan workflow.

Flowchart depicting various steps involved in ABS-Scan.

Each mutated structure is then scored using the Autodock 4.1 force field [21] to calculate the energetics of the protein-ligand complex. The force field is used here only to score the pose of the protein-ligand interaction; no docking is performed. By default, the ‘check_hydrogens’ flag is kept ‘on’ while preparing the receptor, and Gasteiger charges are used for the protein and the ligand. The contribution of a protein residue is determined by the difference between the interaction scores of the mutant and the wild-type protein (∆∆G value). These results are presented graphically to the user, along with a ranked list of residues in the given site that could be experimentally explored through site-directed mutagenesis. A Jmol applet displays protein-ligand interactions with residues colored according to the computed extent of their contribution towards interaction, while a table simultaneously displays inter-molecular energy scores. We also provide a help section explaining the results, along with selected examples.
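The ∆∆G ranking described above can be illustrated with a minimal Python sketch. It is not the ABS-Scan implementation: the function name `rank_by_ddg`, the sign convention (mutant score minus wild-type score, so that a positive value means weaker binding in the mutant) and all numerical scores are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def rank_by_ddg(wild_type_score: float,
                mutant_scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (mutation, ddG) pairs sorted from largest to smallest ddG.

    Assumed convention: ddG = score(mutant) - score(wild type).
    Interaction scores are negative for favourable binding, so a positive
    ddG means the alanine mutant binds the ligand less favourably.
    """
    ddg = {name: score - wild_type_score
           for name, score in mutant_scores.items()}
    return sorted(ddg.items(), key=lambda item: item[1], reverse=True)


# Hypothetical interaction scores (kcal/mol), not taken from the paper.
wild_type = -9.0
mutants = {"W227A": -7.5, "Y310A": -8.0, "L54A": -8.5, "S100A": -9.0}

ranking = rank_by_ddg(wild_type, mutants)
for name, value in ranking:
    print(f"{name}: ddG = {value:+.2f}")
```

Residues at the top of such a list would be the first candidates for experimental loss-of-function mutagenesis.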

Validation and case studies

We evaluate the significance of the ∆∆G score used to assess the contribution of individual residues at the binding site by systematically analyzing two different datasets. The first dataset was derived from the Community Structure-Activity Resource (CSAR; www.csardock.org/). Decoys in this dataset are artificial docked complexes of a protein with ligands that have chemical properties similar to the native ligands but are known not to interact with the protein. The protocol could be successfully applied to 288 of 343 protein-ligand native and decoy complexes. The distribution of average ∆∆G scores obtained through ABS-Scan analysis for binding site residues in the decoy dataset differs from that of the native protein-ligand complexes (Figure 2A & B). An average ∆∆G score of 0.395 was obtained for the native protein-ligand complexes. The second dataset used to obtain an estimate of the ∆∆G score is derived from the PDBbind database [24] and comprises 195 protein-ligand complexes (PDBbind core dataset). Around 135 of these protein-ligand complexes could be successfully processed using the ABS-Scan workflow. In this case, an average ∆∆G score of 0.387 was observed for each mutated residue at the binding site. Hence, to determine the sensitivity of ABS-Scan, a more stringent cut-off of 0.5 was chosen. ABS-Scan is seen to effectively discriminate between the decoy and the native complexes of the CSAR dataset (p-value ~0.004, calculated with Student’s t-test) in ~67% of the cases (∆∆G ≥ 0.5). This indicates that residues important for ligand interaction can be identified through this protocol (Figure 2C). The detailed ∆∆G scores obtained for each mutation at the binding site, for both datasets, can be accessed from the web-resource - http://proline.biochem.iisc.ernet.in/abscan/validation.
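As a rough sketch of how the 0.5 cut-off is applied per complex, the snippet below computes the percentage of binding-site residues whose ∆∆G reaches the threshold, for one native and one decoy complex. The function name `percent_above_cutoff` and all ∆∆G values are hypothetical; only the 0.5 cut-off comes from the text, and the statistical test reported above is not reproduced here.

```python
from typing import Sequence

DDG_CUTOFF = 0.5  # threshold stated in the text


def percent_above_cutoff(ddg_values: Sequence[float],
                         cutoff: float = DDG_CUTOFF) -> float:
    """Percentage of binding-site residues with ddG >= cutoff."""
    if not ddg_values:
        raise ValueError("no ddG values supplied")
    hits = sum(1 for value in ddg_values if value >= cutoff)
    return 100.0 * hits / len(ddg_values)


# Hypothetical per-residue ddG values for one native and one decoy complex.
native_ddg = [1.2, 0.7, 0.5, 0.1]
decoy_ddg = [0.4, 0.2, 0.6, 0.0]

native_pct = percent_above_cutoff(native_ddg)
decoy_pct = percent_above_cutoff(decoy_ddg)
print(f"native: {native_pct:.0f}%  decoy: {decoy_pct:.0f}%")
```

Comparing these percentages across many native/decoy pairs is the kind of summary that the Figure 2C boxplot presents.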

Figure 2.

ABS-Scan Sensitivity.

(A) The distribution of the average ∆∆G score per residue for the cognate and decoy protein-ligand complexes of the CSAR dataset. (B) Scatter plot displaying the average ∆∆G score for native and the corresponding decoy complexes from the CSAR dataset. (C) Boxplot showing the difference in the percentage of binding site residues of cognate and decoy complexes having a predicted ∆∆G score ≥ 0.5.

A suitable dataset for validation would be one that reports binding affinities for both wild-type and mutant proteins with the same ligand, measured in a uniform experimental environment, for a large number of proteins. Although such a dataset exists for protein-protein alanine scanning mutagenesis [12, 25], none has been reported for protein-ligand interactions. In order to compare the predictions of ABS-Scan with experimentally reported alanine-scanning mutations, a methodical search was carried out to mine the experimental results available in the literature on alanine-scanning mutagenesis of binding site residues. The advanced search option in PDB was used for this purpose. All the PubMed extracts were scanned for the term "alanine scanning". This search yielded 126 structure hits with 56 citations. The list of entries obtained was further pruned to remove biologically irrelevant ligands, metal ions and modified residues. Alanine scanning could be successfully undertaken for 54 of the 79 entities/binding sites that were finally obtained. On average, at least two residues per binding site were predicted to have a ∆∆G score ≥ 0.5. The details of the dataset and the ranked lists of residues, in the order of their contribution to ligand binding, for all the complexes are made available to the community - http://proline.biochem.iisc.ernet.in/abscan/validation.
Each of the above experiments involving alanine-scanning mutagenesis reports different mutant evaluation scores. The measures reported to test the fitness of the mutants include various attributes such as Kd, Ka, kcat/KM (for enzymes), specific substrate/product assays, etc. These measures cannot be normalized to derive values having uniform units for direct comparison. We describe three such examples here as case studies, each with different experimentally reported mutant evaluation scores and the corresponding predicted ∆∆G values, to highlight the heterogeneity associated with the data.

A study on the testosterone binding site of rat 3-alpha-hydroxysteroid dehydrogenase (PDBID: 1AFS) by Heredia et al. [26] reports that a binding site residue in direct contact with the ligand influences the rate-determining step of the enzymatic reaction. In this case, the alanine scanning experiments performed on the binding site residues that recognize progesterone and testosterone report Kd values. The ABS-Scan analysis performed on 3-alpha-hydroxysteroid dehydrogenase in complex with both testosterone and progesterone also predicted the residues W227 (∆∆G score = 1.43; Kd = 10.7±1.2), Y310 (∆∆G score = 1.31; Kd = 9.20±0.94) and L54 (∆∆G score = 0.5696; Kd = 7.24±0.79) to be important for ligand recognition. A good correlation was observed (0.829 for testosterone and 0.704 for progesterone) between the reported Kd values of the mutants and the corresponding predicted ∆∆G scores.

Two-dimensional alanine scanning mutagenesis was performed by Shimizu et al. [27] to understand the structure-function relationship between the vitamin-D receptor (PDBID: 1IE9) and vitamin-D analogs. Since no structural information was available for the analogs complexed to the vitamin-D receptor, four of the vitamin-D analogs were docked on the receptor at the native vitamin-D binding site using the Rosetta 3.4 docking protocol [28]. All the poses obtained were analyzed using ABS-Scan to determine the residues crucial for interaction with particular ligands. Since this is a nuclear receptor protein, a transcriptional activity assay was used in the original study to evaluate the effect of the mutants generated. The effect of each vitamin-D receptor mutant was measured by a downstream transactivation assay that quantifies luciferase activity under the influence of the VDR (vitamin-D receptor) promoter sequence. In this case, if the mutation affects the binding of the ligand, the expression of luciferase is reduced by a factor that can be quantified. A good negative correlation was also observed for all four analogs complexed to the vitamin-D receptor, and at least four residues important for interaction with all the analogs (L233, W286, R274 and H397) had a ∆∆G score > 0.5. L233 and W286, present in H3 (helix 3) and the β sheet, are reported to have hydrophobic interactions with the B and C rings of the ligand, whereas R274, present in H4 (helix 4), is observed to form a hydrogen bond with the 1α-OH group of the ligand.

A similar study was carried out on the human trimethylguanosine synthase enzyme (Tgs1), which converts m7G caps (7-methylguanosine caps) to 2,2,7-trimethylguanosine (TMG) caps. In the original study [29], around 37 point mutations were introduced into human Tgs1 (PDBID: 3GDH) to study the interaction profile with m7GTP (7-methylguanosine triphosphate) and AdoMet (S-adenosyl methionine). The fitness of the mutants generated in this case was evaluated using a methyltransferase assay that determines the percentage of methylation by quantifying the conversion of m7GDP to m2,7GDP. The residues R807 and K646, reported to be the most affected mutants, are also predicted by ABS-Scan to be essential, with the highest predicted ∆∆G scores of 3.63 and 3.39 respectively. These positively charged residues (R807 and K646) are observed to interact with the α and β phosphate groups of m7GTP. The π-cation stacking observed between W766 and the m7G was also predicted to be crucial (∆∆G score of 2.66), and correspondingly no methylated products were detected for this mutant in the methyltransferase assay.

The details of the case studies described above, along with the results of the analysis, can be accessed in the example section of the web-tool - http://proline.biochem.iisc.ernet.in/abscan/examples.
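The kind of correlation quoted for the 1AFS case study can be sketched with a plain Pearson coefficient. The snippet below uses only the three residues whose values are printed in the text (W227, Y310, L54), so it will not reproduce the reported coefficients of 0.829 and 0.704, which were presumably computed over more residues. The helper `pearson_r` is an illustrative assumption, and the paper does not state which correlation measure was used.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Values quoted in the text for W227, Y310 and L54 (1AFS case study).
predicted_ddg = [1.43, 1.31, 0.5696]
reported_kd = [10.7, 9.20, 7.24]

r = pearson_r(predicted_ddg, reported_kd)
print(f"Pearson r over three residues: {r:.3f}")
```

With only three points the coefficient is of little statistical value; the sketch is meant to show the calculation, not to validate the tool.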

Implementation

The web-server was implemented using PHP (Hypertext Preprocessor). Autodock, Modeller and Pymol libraries have been used for modeling the mutations and evaluating the energetics. Integration of these back-end libraries into a functional and intuitive user interface is accomplished using Shell, Python, Java, HTML and PHP scripts. The web-server is platform independent and can be used from any machine with internet access and a browser installed. For advanced users, a command-line interface in the form of a single Python script can be accessed from the GitHub repository (see Software availability). The script has been tested on an Intel 2.83 GHz quad-core system running a 32-bit Linux OS (Ubuntu 12.04) with Modeller [10], MGL AutodockTools [30] and Pymol (http://pymol.org) installed. On the web-server, the d3.js library is used for displaying the plots and a Jmol applet is used to visualize the protein-ligand interaction.

Input

The input required for the server is the structure of a protein-ligand complex in PDB format. Users can either provide the four-letter PDBID or upload the PDB structure file of the complex. An option is provided to define the cut-off distance and select the ligand, in order to obtain the binding site residues that will be mutated to alanine for evaluating the interaction energetics. A default distance cut-off of 4.5 Å is set to select all the residues that have atoms within this distance from any ligand atom. In some cases, metal ions [31] and water molecules are observed to play a crucial role in stabilizing the interactions [32]. A major problem in incorporating a ligand metal ion into the ABS-Scan workflow is fixing the charge parameter, which is important for evaluating energetics, as metal atoms can have different ionic states (e.g., Fe²⁺, Fe³⁺). Enumerating all important structural water molecules involved in the ligand interaction is also highly dependent on the resolution of the crystal structure. Hence, an advanced option is provided for uploading the ligand in PDBQT format, to account for cases where the ligand contains unusual atom types or metal ions, or uses bridge-water molecules for interaction. For practical purposes, the bridge-water molecules can be considered part of the ligand and incorporated into the PDBQT file of the ligand. As an example, ABS-Scan analysis was carried out on protein lysine methyltransferase (PDB: 3S7B), which interacts with S-adenosyl methionine [33] through four bridge-water molecules. These four bridge-water molecules can be incorporated into the ligand PDBQT file and uploaded using the advanced option provided on the server. The protocol correctly identified GLU135 and ASN182 as significant contributors to ligand binding through the formation of water bridges. The output can be accessed through the example section of the web-server.
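The distance-based definition of the binding site can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the server's code: it parses fixed-column ATOM/HETATM records, identifies the ligand by its three-letter residue name, and ignores alternate locations, insertion codes and multiple ligand copies. The names `binding_site_residues` and `make_atom_line`, the ligand code `TES` and the toy coordinates are all hypothetical; only the 4.5 Å default comes from the text.

```python
import math
from typing import List, Tuple

Residue = Tuple[str, str, int]  # (chain ID, residue name, residue number)


def make_atom_line(record: str, serial: int, name: str, res_name: str,
                   chain: str, res_seq: int,
                   x: float, y: float, z: float) -> str:
    """Build a minimal fixed-column PDB coordinate record (toy helper)."""
    return (f"{record:<6}{serial:>5} {name:<4} {res_name:>3} "
            f"{chain}{res_seq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}")


def binding_site_residues(pdb_text: str, ligand_name: str,
                          cutoff: float = 4.5) -> List[Residue]:
    """Protein residues with any atom within `cutoff` of any ligand atom."""
    protein_atoms = []
    ligand_atoms = []
    for line in pdb_text.splitlines():
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        res_name = line[17:20].strip()
        residue = (line[21], res_name, int(line[22:26]))
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        if record == "HETATM" and res_name == ligand_name:
            ligand_atoms.append(xyz)
        elif record == "ATOM":
            protein_atoms.append((residue, xyz))
    site = set()
    for residue, xyz in protein_atoms:
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_atoms):
            site.add(residue)
    return sorted(site, key=lambda r: (r[0], r[2]))


# Toy structure: one ligand atom, one nearby residue, one distant residue.
pdb_text = "\n".join([
    make_atom_line("ATOM", 1, "CA", "TRP", "A", 227, 0.0, 0.0, 0.0),
    make_atom_line("ATOM", 2, "CA", "LEU", "A", 54, 10.0, 0.0, 0.0),
    make_atom_line("HETATM", 3, "C1", "TES", "A", 401, 3.0, 0.0, 0.0),
])

site = binding_site_residues(pdb_text, "TES")
print(site)
```

Raising the cut-off enlarges the site and therefore the number of alanine mutants that have to be modeled and scored.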

Output

All the results produced by ABS-Scan can be visualized interactively on the web-server. A Jmol applet is used to visualize the contribution of residues towards ligand interaction (Figure 3).

Figure 3.

ABS-Scan interactive display.

Snapshot explaining the Jmol applet output on the ABS-Scan server. The individual residues are colored in a red-to-blue gradient depending upon their contribution towards the ligand interaction, as predicted by the ABS-Scan ∆∆G score. Options to visualize the different kinds of interaction (polar, hydrogen bonds, etc.) are also provided.

The d3.js library has been utilized to plot the predicted ∆∆G values and the subcomponents of the energetic scores reported by Autodock4 (Figure 4). An option is provided to download publication-quality images in SVG/PDF/PNG formats. The Twitter Bootstrap front-end framework is used for the layout of the web-server. An option is also provided to download the raw files containing the individual mutants in PDB format and the ∆∆G scores in raw CSV format, along with the Autodock energy scores.

Figure 4.

ABS-Scan energy plots.

(A) ∆∆G values reported for each alanine mutation performed on the residues present at the binding site. The residues are ordered according to their contribution/∆∆G values. (B) The different energy components of the Autodock interaction score plotted for each alanine mutant produced at the binding site.

Conclusions

ABS-Scan webserver can provide valuable insights on molecular recognition involving protein-ligand interactions. Experimentally determined protein-ligand structures can be studied to understand individual residue contributions towards ligand binding. Modeled complexes can also be submitted to infer the feasibility of the interaction. We believe that ABS-Scan would add one more dimension to the analysis of binding sites in proteins, comparison of various ligand interactions and be of importance to researchers performing ASM studies.

Software availability

Software access

http://proline.biochem.iisc.ernet.in/abscan/

Latest source code

https://github.com/praveeniisc/ABS-Scan

Source code as at the time of publication

https://github.com/F1000Research/ABS-Scan/releases/tag/V1.0

Archived source code as at the time of publication

http://dx.doi.org/10.5281/zenodo.12806 [34]

Software license

ABS-Scan is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. No more comments. This version of the manuscript has been improved on several lines and the reply by the authors are acceptable. The paper is approved. I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard. In their manuscript entitled " ABS–Scan: In silico alanine scanning mutagenesis for binding site residues in protein–ligand complex", the authors have discussed the development of an in silico workflow for the prediction of important residues for ligand recognition in a given protein-ligand complex. In our opinion, the ABS-Scan tool has been designed and validated satisfactorily. The online web-server is intuitive, fast and easy to use. In the present version of the paper and the software tool, the authors have incorporated the recommendations provided by the previous referees, many of which in our opinion are relevant. We have the following suggestions. The authors state that 288 of the total 343 protein-ligand complexes from the CSAR dataset, 135 of the 195 protein-ligand complexes from the PDBbind dataset and 54 of the 79 experimental datasets could be processes by ABS-Scan. Although, the authors mention in their responses to referee comments of version 1 of the manuscript, that ABS-Scan rejected some protein-ligand complexes due to “unusual atom types or missing protein/ligand atoms or unusual convention for ligand atoms”, it would probably be helpful for users to determine the kind of protein-ligand complexes that are suitable/unsuitable for prediction using ABS-Scan if authors could discuss this aspect in a sufficiently detailed manner. Was there anything in particular or common in the rejected complexes from the three datasets used for validation? 
It would be great if they could possibly provide the list of the rejected complexes and the reason for rejection, similar to what they have done for the protein-ligand complexes which were used for validation of the ABS-Scan tool. It is usually seen that short peptide ligands are present as a separate chain in the protein structure files. Particularly for these cases, there is no option in the web-interface of ABS-Scan to enter the other chain as ligand. Can you please elaborate how your software tool handles such protein-ligand complexes? While providing the ΔΔG values for the mutated residues, we think it would be a useful if the ABS-Scan server also provides some indication of evolutionary conservation of the residue. This would allow the users to translate ABS-Scan results from already known or docked protein-ligand complexes to other homologs of the same protein family. We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard. The authors have now incorporated all the suggestions I made in my 1st review. In fact the revised web application is able to include the solvent molecules. I think that the revised version is now ready to be indexed in PubMed. I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard. This paper reports an attempt to develop an original tool that simulates alanine scanning mutagenesis to probe residues involved in the process of ligand recognition in proteins. More precisely, the work describes the development of a work flow that implements known methodologies for homology modeling of alanine single-point mutants of a protein and for molecular docking. Even though, this can be viewed as a methodological paper. We have some serious concerns regarding this work. 
The authors claim that they performed a "validation" of their tool on a dataset that comprises "79 entries" carefully selected from PDB (also cf point 2 below). Their evaluation is based on finding a correlation between docking scores with experimentally determined binding affinities. In their paper, the authors provide evidence of this validation by providing results of "Experimental correlation" for only one example (Figure 2) which relates to binding of rat 3-alpha-hydroxysteroid dehydrogenase (PDB: 1AFS) to testosterone and progesterone. Since they must have it, clearly, the authors should provide their evaluation of this correlation on all "79 entries". I would expect at least that they provide a new Figure 2 that comprises all data points coming from these "79 entries" to sustain their claim and help readers to evaluate the global performance of their tool. They attempted to provide few additional results on their website ( http://proline.biochem.iisc.ernet.in/abscan/validation). It is more confusing because the results provided for the vitamin D receptor (PDB: 1IE9) is not about binding affinities but "translational activity". I'm here suggesting that detailed data for all mutations taken from all "79" entries are provided to the community in the form of a table or downloadable flat or excel-type file. The amount of independent PDB entries in their dataset is not 79. In fact, in some of PDB entries, multiple ligands were observed. Surprisingly, they consider these as separate entries. So their data is redundant with respect to the proteins. When generating homology models for protein variants, even if these are single point mutants, assessment of the quality of the models is a critical step. Selecting best models may not be that trivial. 
The authors need to clarify how they implement in their work flow the assessment of the quality of the models and consequently, what criteria they used for selecting the best models (and how many of them) that will be subjected to molecular docking. Regarding the alanine scanning procedure, there are issues regarding the treatment of alanine and proline. They should both be discarded from the alanine scanning protocol: alanine is already present in the structure while proline is not suitable for mutations because of the major protein backbone rearrangements that should be performed to properly mutate it. For such a tool, it is at stake to evaluate its performance using different homology modeling and molecular docking methods. The rational behind the choice of Modeler over other methods like Rosetta is not indicated. Likewise, the reason why Autodock and not Dock etc or even Autodock Vina is not explained. The efficiency of molecular docking using AutoDock is also dependent on the docking protocol used. In such an automated "screen", care should be taken about the preparation of the receptor, the ligand and the grid. For example, are the ligands kept flexible ? In the manuscript, there are no indications about how the authors dealt with this central issue. The authors are encouraged to describe precisely and discuss their docking protocol. According to the AutoDock 4.0 article, the median error range in energy estimation for any protein-ligand evaluation is 1.5-2.0 kcal/mol. In their study, the ∆∆G differences for ligand binding between mutant and native forms of the proteins are far below 2.0 kcal/mol. Thus, it is difficult to rank the mutants. Also, how the authors chose the 0.5 kcal/mol ∆∆G threshold is not clear. There is no discussion how this threshold compares with the intrinsic limits in precision of AutoDock. The definition of ligand in the tool is problematic. 
In case of oligo or polysaccharides, the carbohydrate residues are erroneously considered separately. For example, in the 1J84 entry from PDB, the carbohydrate-binding module (CBM) is bound to cellotretraose, a 1,4-β-D-glucan composed of four ß-D-glucose residues linked by ß-1,4 osidic linkages. When this PDB entry is submitted to ABS-Scan, it erroneously splits the oligomer into smaller entities that correspond to the chemical IDs of its constituents (BGC 401, 402, 403, 404). This is a serious flaw in their software. While it is common to see people to reuse available codes, the authors do not properly cite the source of their codes they posted on Github and used for providing a complete service to the community: at least 80% of the “alanine_scanning.py” code comes from either MODELLER examples ( http://salilab.org/MODELLER/wiki/Mutate_model) or AutoDock code ( http://mgltools.scripps.edu/api/AutoDockTools/AutoDockTools.Utilities24.compute_AutoDock41_score-pysrc.html). We have read this submission. We believe that we have an appropriate level of expertise to state that we do not consider it to be of an acceptable scientific standard, for reasons outlined above. We thank the reviewers for their time and effort. There were some useful suggestions, which we have incorporated but do not agree with all the points raised. A detailed point-by-point response is given below. The authors claim that they performed a "validation" of their tool on a dataset that comprises "79 entries" carefully selected from PDB (also cf point 2 below). Their evaluation is based on finding a correlation between docking scores with experimentally determined binding affinities. In their paper, the authors provide evidence of this validation by providing results of "Experimental correlation" for only one example (Figure 2)which relates to binding of rat 3-alpha-hydroxysteroid dehydrogenase (PDB: 1AFS) to testosterone and progesterone. 
Since they must have it, clearly, the authors should provide their evaluation of this correlation on all "79 entries". I would expect at least that they provide a new Figure 2 that comprises all data points coming from these "79 entries" to sustain their claim and help readers to evaluate the global performance of their tool. They attempted to provide few additional results on their website (http://proline.biochem.iisc.ernet.in/abscan/validation). It is more confusing because the results provided for the vitamin D receptor (PDB: 1IE9) is not about binding affinities but "translational activity". I'm here suggesting that detailed data for all mutations taken from all "79" entries are provided to the community in the form of a table or downloadable flat or excel-type file. A suitable dataset for validation would be one that reports binding affinities for both wild-type and mutant proteins with same ligand, performed in a uniform experimental environment, for large number of proteins. Although such a dataset exists for protein-protein alanine scanning mutagenesis for eg., Rosetta alanine scanning), there are none reported for protein-ligand interactions. Since no such dataset was available to us, we systematically extracted PDB entries of ligand bound complexes and the corresponding binding sites in them that contained information about experimental alanine-scanning mutagenesis. However, the manner in which the effects of mutagenesis are reported in these differ significantly. While differences in ligand binding strengths (K a or K d values) are reported for some, changes in catalytic efficiencies are reported for some others. For some others, reporter assays are given which indicate capability of the downstream process more qualitatively. Hence it is difficult to perform a systematic comparison from these with the ∆∆G values calculated from our tool in this study. 
Nevertheless, some examples were hand-picked from this dataset, the corresponding primary literature was read and the known residue importances were obtained; these were then compared with the ones predicted by our tool. In any case, ABS-Scan analysis has been successfully performed on 54 complexes (the remaining 25 cases were not processed by the default steps due to unusual atom types in the proteins/ligands), providing the extent of contribution to ligand binding of each residue in each site, in the form of a ranked list of residue-wise ∆∆G values. All this information has been made available to the community through our webserver (http://proline.biochem.iisc.ernet.in/abscan/validation).

Besides this, given the lack of systematic reports of experimental data, validation can only be performed to understand the significance of the ∆∆G scores calculated by our tool. For this, we have taken two large datasets: (a) a well-curated list of known protein-ligand complexes with precise binding site definitions, used for benchmarking docking algorithms, and (b) protein complexes with native ligands versus decoy ligands. In both of these, a ∆∆G score of about 0.5 was found to be significant.

a) A fresh dataset derived from the PDBbind core dataset consisting of 195 protein-ligand complexes, which has been developed for the purpose of benchmarking docking algorithms (Kim et al., 2004, Huang et al., 2008). Of the 195, 135 could be processed successfully for preparation of the protein-ligand complexes for analysis. (The others are likely to contain unusual atom names or types, missing protein/ligand atoms, or an unusual convention for ligand atoms, and hence could not be processed.)

b) A dataset of 343 protein-ligand complexes, each with a native and a decoy ligand. 288 structures out of 343 could be successfully evaluated. (Here again the others were omitted due to difficulties in automatic protein/ligand preparation.)
In the process, since ABS-Scan has been run for all these complexes, information about key contributing residues is generated for each of them. This has been made available through the webserver. Residue-wise contribution is obtained and presented in ranked order for each complex, thus providing a ready resource of important residues for ligand binding. The results can be accessed from the validation section on the webserver: http://proline.biochem.iisc.ernet.in/abscan/validation

The number of independent PDB entries in their dataset is not 79. In fact, in some PDB entries, multiple ligands were observed. Surprisingly, they consider these as separate entries. So their data is redundant with respect to the proteins.

These reflect independent binding sites (with bound ligands). As can be expected, some proteins have multiple sites with different ligands, making it necessary to consider them separately. Hence the 79 sites are unique and come from 46 PDB entries. In the original manuscript, the dataset of 79 was never meant to reflect ‘unique PDB entries’. In any case, we now refer to them as ‘binding site entities’ to reflect this more clearly.

When generating homology models for protein variants, even if these are single point mutants, assessment of the quality of the models is a critical step. Selecting the best models may not be trivial. The authors need to clarify how they implement the assessment of model quality in their workflow and, consequently, what criteria they used for selecting the best models (and how many of them) to be subjected to molecular docking.

Model quality has been considered as part of the modelling pipeline itself. Given the scale of the study, it is practical to generate one model for each mutant, but care is taken to ensure that it is optimal and free of errors in terms of bad contacts or atomic clashes.
The optimization protocol used consists of 200 iterations of conjugate gradient minimization, followed by molecular dynamics simulation with a 4 fs time step and simulated annealing with 200 iterations at different temperatures (this is the default protocol suggested in Model_mutate.py of Modeller: http://salilab.org/modeller/wiki/Mutate%20model). The initial restraints for generation of the model are derived from the wild-type structure itself. The assumptions necessary for modelling point mutations introduced through the alanine-scanning mutagenesis protocol at the binding sites are that (a) they are unlikely to change the overall structure of the protein drastically and (b) the ligand moiety roughly retains the same conformation as in the wild-type complex when interacting with the mutated structure. Since modelling protocols have been well established for a long time now, we did not see the need for adding this information explicitly in the original manuscript. In any case, based on the reviewers' suggestion, this information has been added to the revised version. Normalized DOPE scores are reported for both the native and mutant structures. DOPE refers to ‘Discrete Optimized Protein Energy’ and is a statistical potential that checks the feasibility of the observed interactions. Protein structures with lower DOPE scores (typically in the negative range of -1.5 to -2.5 for experimentally solved structures) can be considered to be of good quality (Shen and Sali).

Regarding the alanine scanning procedure, there are issues regarding the treatment of alanine and proline. They should both be discarded from the alanine scanning protocol: alanine is already present in the structure, while proline is not suitable for mutation because of the major protein backbone rearrangements that would be needed to mutate it properly.

This required the addition of simple screens to filter out these residues from consideration for alanine scanning, which has been done.
Changes have been made to both the source code and the web-tool now. Glycine mutations are also filtered out.

For such a tool, it is important to evaluate its performance using different homology modeling and molecular docking methods. The rationale behind the choice of Modeller over other methods like Rosetta is not indicated. Likewise, the reason for choosing AutoDock rather than DOCK or even AutoDock Vina is not explained.

The goal of our study is not to develop a modelling algorithm or a new parameter for building models. The most widely used tool for homology modelling, Modeller, which we have currently included in the workflow, has about 1500 citations. Currently there are more than 50 tools for homology modeling (http://en.wikipedia.org/wiki/List_of_protein_structure_prediction_software) and roughly the same number of tools for protein-ligand docking (http://en.wikipedia.org/wiki/Docking_%28molecular%29). We chose Modeller and AutoDock mainly because of our own experience in using these tools, along with the availability of extensive documentation and tutorials and their ease of implementation. Moreover, both these libraries have Python bindings available and hence could be merged into a single Python script. In future, we plan to develop a PyMOL plugin for the same. A simple bash script for processing the protein-ligand complex to determine the interaction energy using Rosetta force fields has also been included in the GitHub repository. This, again, is only for advanced users, and we might incorporate it in future versions of the pipeline.

The efficiency of molecular docking using AutoDock is also dependent on the docking protocol used. In such an automated "screen", care should be taken about the preparation of the receptor, the ligand and the grid. For example, are the ligands kept flexible? In the manuscript, there are no indications about how the authors dealt with this central issue.
The authors are encouraged to describe precisely and discuss their docking protocol.

We would like to clarify here that no docking is performed in the whole exercise. We only score the complex in the given conformation using the force field. By default, Gasteiger charges and polar hydrogens are added through prepare_receptor4.py and prepare_ligand4.py before the interaction energy is evaluated. This has been mentioned in the manuscript: “Each mutated structure, will then be scored by using Autodock 4.1 force field, to calculate the energetics of a protein-ligand complex. The contribution from the residue is then determined by calculating the difference in interaction score of the mutant and the wild-type protein (∆∆G value).”

According to the AutoDock 4.0 article, the median error range in energy estimation for any protein-ligand evaluation is 1.5-2.0 kcal/mol. In their study, the ∆∆G differences for ligand binding between mutant and native forms of the proteins are far below 2.0 kcal/mol. Thus, it is difficult to rank the mutants. Also, how the authors chose the 0.5 kcal/mol ∆∆G threshold is not clear. There is no discussion of how this threshold compares with the intrinsic limits in precision of AutoDock.

The median error range reported in the AutoDock 4.0 article refers to the difference between the experimental and predicted values of the total ∆G score, whereas here we consider individual residue contributions. The distribution of the ∆∆G values obtained for the decoy and cognate ligands from the CSAR dataset (http://www.csardock.org/) was used to define a cut-off of 0.5. This has also been validated on the PDBbind core dataset (http://www.pdbbind-cn.org/). Figures 3A and 3B have been added along with explanations in the manuscript.
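
The residue-wise ∆∆G calculation and the 0.5 kcal/mol cut-off discussed in this response can be illustrated with a minimal sketch. It is not the code of ABS-Scan. The function names, the residue labels and the sign convention (∆∆G taken as mutant score minus wild-type score, so that a positive value means weaker binding after mutation) are assumptions made for illustration, and the interaction scores are expected to come from an external scoring step such as the AutoDock force field.

```python
from typing import Dict, List, Tuple

# Residue types excluded from the scan, following the responses above
# (alanine, proline and glycine are filtered out).
SKIPPED_RESIDUES = {"ALA", "PRO", "GLY"}

# Cut-off quoted in the response, in kcal/mol.
DDG_CUTOFF = 0.5


def scannable(site_residues: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Keep only binding-site residues that are eligible for mutation to alanine."""
    return [res for res in site_residues if res[0] not in SKIPPED_RESIDUES]


def rank_by_ddg(
    wild_type_score: float,
    mutant_scores: Dict[str, float],
    cutoff: float = DDG_CUTOFF,
) -> List[Tuple[str, float, bool]]:
    """Return (residue, ddG, above_cutoff) rows sorted by decreasing ddG.

    Assumed convention: ddG = score(mutant) - score(wild type).
    """
    rows = []
    for residue, score in mutant_scores.items():
        ddg = score - wild_type_score
        rows.append((residue, round(ddg, 3), ddg >= cutoff))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    # Hypothetical scores in kcal/mol, for illustration only.
    for row in rank_by_ddg(-9.0, {"TYR55": -7.8, "HIS117": -8.9, "LEU54": -8.4}):
        print(row)
```

Because the wild type and every mutant are scored with the same function, only the differences enter the ranking; the sketch makes no claim about the absolute accuracy of the underlying scores.
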
We believe that the intrinsic limits on the precision of AutoDock scoring would not be a major concern, as both the wild type and the mutant are evaluated using the same scoring scheme and the cut-off has been chosen on the basis of native protein-ligand complexes in the CSAR and PDBbind datasets.

The definition of ligand in the tool is problematic. In the case of oligo- or polysaccharides, the carbohydrate residues are erroneously considered separately. For example, in PDB entry 1J84, the carbohydrate-binding module (CBM) is bound to cellotetraose, a 1,4-β-D-glucan composed of four β-D-glucose residues linked by β-1,4 glycosidic linkages. When this PDB entry is submitted to ABS-Scan, it erroneously splits the oligomer into smaller entities that correspond to the chemical IDs of its constituents (BGC 401, 402, 403, 404). This is a serious flaw in their software.

This is not really a 'problem'; it is an established work-around to avoid long computation and hence a long waiting time for the user. All it does is split peptides or oligosaccharides into individual moieties (typically, each amino acid of a peptide and each monosaccharide of an oligosaccharide is considered a moiety), as per the convention currently followed by PDB. How can this be a ‘serious flaw’? It does not, in any manner, influence the results. Many other tools for protein-ligand interaction analysis, such as LPC (Ligand-Protein Contacts), LigPlot+, Ligplus, etc., also track ligands through such residue identifiers. However, an advanced option has now been added to provide the range of ligand residue numbers to be considered as a single moiety during the entire protocol. For example, a residue range 401-404 can now be provided for 1J84, instead of a single residue number, to consider the whole oligomer as a single ligand. The script on GitHub has also been modified accordingly.
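
As an illustration of the residue-range option described above, the following minimal sketch collects all HETATM records within a residue-number range so that they can be handled as one ligand. It is not taken from the ABS-Scan source: the function names are hypothetical, and the only thing relied upon is the fixed-column layout of PDB coordinate records (residue name in columns 18-20, chain identifier in column 22, residue number in columns 23-26).

```python
from typing import Iterable, List, Tuple


def select_ligand_atoms(
    pdb_lines: Iterable[str], chain: str, first_resnum: int, last_resnum: int
) -> List[str]:
    """Return HETATM lines of one chain whose residue number lies in the range."""
    selected = []
    for line in pdb_lines:
        # Skip anything that is not a complete HETATM record.
        if not line.startswith("HETATM") or len(line) < 26:
            continue
        if line[21] != chain:
            continue
        try:
            resnum = int(line[22:26])
        except ValueError:
            continue
        if first_resnum <= resnum <= last_resnum:
            selected.append(line)
    return selected


def ligand_residue_ids(atom_lines: Iterable[str]) -> List[Tuple[str, int]]:
    """List the distinct (residue name, residue number) pairs in the selection."""
    ids = {(line[17:20].strip(), int(line[22:26])) for line in atom_lines}
    return sorted(ids, key=lambda pair: pair[1])
```

For an entry such as 1J84, calling `select_ligand_atoms(lines, chain, 401, 404)` with the appropriate chain identifier would gather the four BGC residues into one selection, which could then be written out and prepared as a single ligand.
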
While it is common to see people reuse available code, the authors do not properly cite the source of the code they posted on GitHub and used for providing a complete service to the community: at least 80% of the “alanine_scanning.py” code comes from either MODELLER examples (http://salilab.org/MODELLER/wiki/Mutate_model) or AutoDock code (http://mgltools.scripps.edu/api/AutoDockTools/AutoDockTools.Utilities24.compute_AutoDock41_score-pysrc.html).

We have indeed already cited in the manuscript all the tools to which the source code is linked. In any case, these references are now also highlighted in the source code. Both AutoDock and Modeller are released under the GNU public license, making their source code freely usable by all interested parties. Moreover, these are the primary sources, and the code is not extracted from any third-party tools. The purpose of putting it on GitHub is to be completely open about the details of the protocol and to make our work fully accessible to anyone interested. We would again like to remind the reviewer that the source code is used only by advanced users. The reviewer may be aware of the time and effort involved in producing a web-application interface with embedded visualization features. This has been done in the belief that it will save precious time for researchers who do not have the expertise or the interest to install and handle command-line interfaces for such tools. We initially proposed this as a web-tool, but since that section is no longer available in F1000Research, we submitted it as a software tool.

The manuscript 'ABS–Scan: In silico alanine scanning mutagenesis for binding site residues in protein–ligand complex' reports the development of a web server for performing in silico alanine scanning mutations for studying protein-small molecule interactions.
It further validates the tool using a list of already published alanine scanning data along with the X-ray crystallographic structures of the relevant protein-ligand complexes. ABS-Scan provides a user-friendly web interface and will be very useful for experimentalists to assess the outcome of mutations designed to study protein-ligand (small molecule) interactions. Overall, the web tool is well explained in the manuscript.

My main concern is the method for energy calculations that the authors used to predict the individual contribution upon mutation. It does not include waters and metals. It is well established that many protein-ligand interactions are water mediated, and water therefore plays a very important role in determining specificity as well as affinity. The authors could use an energy function that includes waters and metals as well; otherwise, the current version can only be used for protein-ligand complexes in which water/metal atoms have not been shown to play any role in stabilizing the ligand in the binding pocket. Thus, in my opinion, it could be indexed with the inclusion of an updated energy function. Alternatively, its sole applicability to protein-ligand interactions without involvement of solvent molecules should be mentioned in the conclusion.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

We thank the reviewer for going through our manuscript and finding the work useful. These are indeed valid points and have been addressed in the revised version. Bridging water molecules do play an important role in protein-ligand interactions. One has to take into account the resolution of the protein structure to determine the confidence in the placed water molecules.
Hence an advanced option is provided wherein these water molecules, when present at the site, can be considered to be a part of the corresponding ligand moiety. The user can upload his/her own pdbqt file for the ligand with the appropriate water molecules added to it. An example of a protein lysine methyltransferase complexed with S-adenosyl methionine has been described in the manuscript. The corresponding pdbqt file of the ligand can be downloaded from the example section. The results can also be accessed from the example section of the web-server.
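
One way a user might pick candidate water molecules to add to the ligand before preparing the pdbqt file is sketched below. This is an illustrative assumption and not part of ABS-Scan: the function names, the 3.5 Å distance cut-off and the use of the residue name `SAM` for S-adenosyl methionine in the usage note are hypothetical choices. The sketch only checks proximity to the ligand; it does not test whether a water also contacts the protein, and it does not perform the pdbqt conversion itself.

```python
import math
from typing import Iterable, List, Tuple


def _xyz(line: str) -> Tuple[float, float, float]:
    """Read the orthogonal coordinates from a PDB ATOM/HETATM record."""
    return (float(line[30:38]), float(line[38:46]), float(line[46:54]))


def waters_near_ligand(
    pdb_lines: Iterable[str], ligand_resname: str, cutoff: float = 3.5
) -> List[str]:
    """Return HOH records lying within `cutoff` angstroms of any ligand atom.

    The default cut-off is an assumed value, chosen only for illustration.
    """
    hetatm = [l for l in pdb_lines if l.startswith("HETATM") and len(l) >= 54]
    ligand_xyz = [_xyz(l) for l in hetatm if l[17:20].strip() == ligand_resname]
    selected = []
    for line in hetatm:
        if line[17:20] != "HOH":
            continue
        position = _xyz(line)
        if any(math.dist(position, atom) <= cutoff for atom in ligand_xyz):
            selected.append(line)
    return selected
```

For example, `waters_near_ligand(lines, "SAM")` would return the water records close to the cofactor, which could then be appended to the ligand coordinates, subject to a check of the structure's resolution as noted in the response.
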

33 in total

Review 1. Combinatorial alanine-scanning.

Authors: K L Morrison; G A Weiss
Journal: Curr Opin Chem Biol Date: 2001-06 Impact factor: 8.822

2. Protein structure modeling with MODELLER.

Authors: Narayanan Eswar; David Eramian; Ben Webb; Min-Yi Shen; Andrej Sali
Journal: Methods Mol Biol Date: 2008

3. A rapid, efficient, and economical inverse polymerase chain reaction-based method for generating a site saturation mutant library.

Authors: Pankaj C Jain; Raghavan Varadarajan
Journal: Anal Biochem Date: 2013-12-09 Impact factor: 3.365

4. g_mmpbsa--a GROMACS tool for high-throughput MM-PBSA calculations.

Authors: Rashmi Kumari; Rajendra Kumar; Andrew Lynn
Journal: J Chem Inf Model Date: 2014-06-19 Impact factor: 4.956

5. Truncation and alanine-scanning mutants of type I adenylyl cyclase.

Authors: W J Tang; M Stanzel; A G Gilman
Journal: Biochemistry Date: 1995-11-07 Impact factor: 3.162

6. DrugScorePPI webserver: fast and accurate in silico alanine scanning for scoring protein-protein interactions.

Authors: Dennis M Krüger; Holger Gohlke
Journal: Nucleic Acids Res Date: 2010-05-28 Impact factor: 16.971

7. Comprehensive in silico mutagenesis highlights functionally important residues in proteins.

Authors: Yana Bromberg; Burkhard Rost
Journal: Bioinformatics Date: 2008-08-15 Impact factor: 6.937

8. Two-dimensional alanine scanning mutational analysis of the interaction between the vitamin D receptor and its ligands: studies of A-ring modified 19-norvitamin D analogs.

Authors: Masato Shimizu; Keiko Yamamoto; Mika Mihori; Yukiko Iwasaki; Daisuke Morizono; Sachiko Yamada
Journal: J Steroid Biochem Mol Biol Date: 2004-05 Impact factor: 4.292

9. Serverification of molecular modeling applications: the Rosetta Online Server that Includes Everyone (ROSIE).

Authors: Sergey Lyskov; Fang-Chieh Chou; Shane Ó Conchúir; Bryan S Der; Kevin Drew; Daisuke Kuroda; Jianqing Xu; Brian D Weitzner; P Douglas Renfrew; Parin Sripakdeevong; Benjamin Borgo; James J Havranek; Brian Kuhlman; Tanja Kortemme; Richard Bonneau; Jeffrey J Gray; Rhiju Das
Journal: PLoS One Date: 2013-05-22 Impact factor: 3.240

10. BeAtMuSiC: Prediction of changes in protein-protein binding affinity on mutations.

Authors: Yves Dehouck; Jean Marc Kwasigroch; Marianne Rooman; Dimitri Gilis
Journal: Nucleic Acids Res Date: 2013-05-30 Impact factor: 16.971
