A protocol for CABS-dock protein-peptide docking driven by side-chain contact information.

Mateusz Kurcinski¹, Maciej Blaszczyk¹, Maciej Pawel Ciemny¹,², Andrzej Kolinski¹, Sebastian Kmiecik³.

Abstract

BACKGROUND: The characterization of protein-peptide interactions is a challenge for computational molecular docking. Protein-peptide docking tools face at least two major difficulties: (1) efficient sampling of large-scale conformational changes induced by binding and (2) selection of the best models from a large set of predicted structures. In this paper, we merge an efficient sampling technique with external information about side-chain contacts to sample and select the best possible models.
METHODS: In this paper we test a new protocol that uses information about side-chain contacts in CABS-dock protein-peptide docking. As shown in our recent studies, CABS-dock enables efficient modeling of large-scale conformational changes without knowledge about the binding site. However, the resulting set of binding sites and poses is in many cases highly diverse and difficult to score.
RESULTS: As we demonstrate here, information about a single side-chain contact can significantly improve the prediction accuracy. Importantly, the imposed constraints for side-chain contacts are quite soft. Therefore, the developed protocol does not require precise contact information and ensures large-scale peptide flexibility in the broad contact area.
CONCLUSIONS: The demonstrated protocol provides an extension of the CABS-dock method that can be used in practice for the structure prediction of protein-peptide complexes guided by knowledge of the binding interface.

Keywords: Flexible docking; Molecular docking; Protein–peptide complexes; Protein–peptide interactions

Substances:
Peptides
Proteins

Year: 2017 PMID: 28830545 PMCID: PMC5568604 DOI: 10.1186/s12938-017-0363-6

Source DB: PubMed Journal: Biomed Eng Online ISSN: 1475-925X Impact factor: 2.819

Background

The prediction of protein–peptide complexes is a demanding modeling challenge, particularly when significant conformational changes occur in the binding process. The modeling of large-scale dynamics during binding cannot be effectively performed with standard all-atom simulation tools. A significant speed-up in flexible docking simulations can be achieved using coarse-grained protein models [1]. CABS-dock is a method based on a coarse-grained model and is one of the most effective approaches to simulating large conformational changes during protein binding [1-3]. It is available as a web server [4-6]. In its default mode, the method does not use any knowledge about the peptide structure or the peptide binding site. Additional information on the protein–peptide interaction interface (obtained from experiments or theoretical predictions) may significantly improve the docking accuracy [7]. For example, the majority of state-of-the-art protein–peptide docking tools, like Rosetta FlexPepDock [8] or HADDOCK [9], follow the data-driven docking paradigm. The Rosetta FlexPepDock method enables selection of an “anchoring residue”, a residue that is constrained to a given anchoring position during the simulation. The HADDOCK approach, on the other hand, uses so-called “ambiguous interaction restraints” that label receptor residues as “active” or “passive” in peptide binding. In the CABS-dock method, the most intuitive way to introduce information about protein–peptide contact(s) is to apply distance restraint(s) to a chosen residue pair during the simulation. The side-chain contact information may be derived either directly from structural experiments or with bioinformatics tools. The possible approaches include binding site prediction [10], similarity-based docking [11] and analysis of protein sequence co-evolution [12]. In this work, we present a strategy for incorporating information on protein–peptide side-chain interactions into the CABS-dock procedure. The developed protocol for docking driven by side-chain contact information leads to a significant improvement in modeling accuracy compared with CABS-dock docking in the default mode.

Methods

CABS model

CABS-dock uses the CABS coarse-grained protein model for flexible docking simulations. The main features of the CABS model (described in detail elsewhere [13] and in a recent review [1]) are summarized below:
(1) Coarse-grained representation of molecules: each amino acid residue is represented by three pseudo-atoms: the alpha carbon (Cα), the beta carbon (Cβ) and the side chain. To mimic the peptide bond, a fourth center of interactions is defined in the geometrical center of the virtual Cα–Cα bond. Positions of the Cα atoms are restricted to a cubic lattice, whereas the other pseudo-atoms are placed off the lattice.
(2) Statistical force field: the energy of the complex models is related to the frequency of interactions observed in already solved structures available in the PDB [14].
(3) Sampling: the configurational space is sampled with a Replica Exchange Monte Carlo scheme.
Such a design of the CABS model leads to a significant simulation speed-up, by three to four orders of magnitude relative to all-atom molecular dynamics. At the same time, a reasonable resolution of the modeled structures is preserved, as coarse-grained models may be easily rebuilt to a realistic all-atom representation. The CABS model has been successfully applied to a variety of modeling tasks, including protein structure prediction [15, 16], simulations of folding mechanisms [17-20], flexibility of globular proteins [21, 22] and modeling of protein–protein and protein–peptide complexes [23-28].

CABS-dock docking procedure

The pipeline of the CABS-dock method for protein–peptide docking [4-6] is presented in Fig. 1. The modeling procedure consists of four steps: (1) initial setup, (2) coarse-grained simulation, (3) model selection and (4) model refinement.

Fig. 1

CABS-dock pipeline. The pipeline shows CABS-dock in the default mode (without any contact information) and the additional input information used in the contact-driven mode (marked in orange)

Initial setup

In the initial setup the receptor structure is translated into coarse-grained representation. Subsequently, ten copies of the peptide in random conformations are generated for the replica exchange method and also transformed into coarse-grained representation. As in the default docking mode, random peptide conformations are randomly scattered around the receptor at distances up to 20 Å from the receptor molecular surface. The information about the side-chain contacts between the receptor and the peptide is transformed into soft distance restraints imposed on the modeled molecules.

Simulation

The coarse-grained simulation of the system is carried out in ten copies at different temperatures, with the exchange of coordinates between copies every given number of simulation cycles. The peptide molecule is fully flexible during the docking simulation. In the contact-driven mode of the CABS-dock method, we introduced a simple contact potential described by the following formula:

$$E(d) = \begin{cases} 0, & d \le D \\ s\,(d - D), & d > D \end{cases} \qquad (1)$$

where d is the observed distance between pseudoatoms representing side chains, D is the distance below which the potential vanishes and s is the slope of the potential line. This potential is also depicted in Fig. 2. Its role is to draw the ligand molecule to the binding site, but not to contribute to the final conformational energy of the complex. As in the default CABS-dock modeling mode [4, 5], the receptor molecule is also flexible, at both the side-chain and the backbone level, but is kept in a near-native conformation by distance restraints.
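The soft contact restraint described above can be illustrated with a minimal sketch. It assumes a flat-bottom linear form: the penalty is zero up to the threshold distance and grows with a constant slope beyond it. The function name and the symbol d for the observed distance are illustrative and not taken from the CABS-dock code; the defaults D = 5.0 Å and s = 1.0 are the values reported in the Results.

```python
def contact_restraint_energy(d, D=5.0, s=1.0):
    """Flat-bottom linear restraint (assumed form).

    d: observed distance between the two side-chain pseudoatoms (in Å)
    D: distance below which the penalty vanishes (in Å)
    s: slope of the linear part of the potential
    """
    if d < 0:
        raise ValueError("distance must be non-negative")
    # No penalty inside the contact radius, linear penalty outside it.
    return 0.0 if d <= D else s * (d - D)


# The penalty keeps growing with distance, so a peptide that starts
# far from the receptor is still pulled toward the selected contact.
for d in (3.0, 5.0, 8.0, 20.0):
    print(d, contact_restraint_energy(d))
```

Because the penalty is exactly zero everywhere inside the threshold, a restraint of this kind does not favor any particular pose within the contact area, which is consistent with the "soft" character of the restraints described in the text.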

Fig. 2

A simple attractive potential for side-chain contact. The potential introduces an energetic penalty (E) that is dependent on the distance (D) between pseudoatoms representing selected side chain contact (see also Eq. 1)

Model selection

CABS-dock simulation provides 10,000 alternative models of the complex. From this set the 1000 top scored complexes (with the lowest CABS interaction energy) are selected for the next step. Final selection is done by clustering the 1000 models using the k-medoid procedure with k = 10 and ligand RMSD (root mean square deviation of peptide coordinates after superposition of receptor molecules) as the measure of model similarity. The medoids from each cluster are selected for the next step as 10 top ranked models. The ranking from 1st to 10th is based on cluster density values (number of cluster models divided by their average difference within a cluster). Figure 3 shows consecutive stages of model selection.
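The clustering and ranking step can be sketched as follows, assuming a precomputed matrix of pairwise ligand RMSD values between models. This is not the CABS-dock implementation: the function names, the farthest-first initialization, the simple alternating k-medoids update and the treatment of single-member clusters are all illustrative assumptions, and the "average difference within a cluster" is interpreted here as the mean pairwise ligand RMSD between cluster members.

```python
import numpy as np


def k_medoids(dist, k, n_iter=100):
    """Basic alternating k-medoids on a precomputed distance matrix."""
    # Farthest-first initialization (illustrative, deterministic choice).
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        nearest = dist[:, medoids].min(axis=1)
        medoids.append(int(np.argmax(nearest)))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        # Assign every model to its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue
            # New medoid: the member with the smallest total distance to the rest.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels


def rank_by_density(dist, medoids, labels):
    """Return (density, medoid index) pairs, densest cluster first."""
    ranked = []
    for c, m in enumerate(medoids):
        members = np.flatnonzero(labels == c)
        size = members.size
        if size < 2:
            density = 0.0  # assumption: single-member clusters rank last
        else:
            sub = dist[np.ix_(members, members)]
            mean_d = sub[np.triu_indices(size, k=1)].mean()
            density = size / mean_d if mean_d > 0 else float("inf")
        ranked.append((density, int(m)))
    ranked.sort(reverse=True)
    return ranked


# Toy example: one tight cluster of four "models" and one loose pair.
pts = np.array([0.0, 0.1, 0.2, 0.3, 10.0, 12.0])
toy_dist = np.abs(pts[:, None] - pts[None, :])
toy_medoids, toy_labels = k_medoids(toy_dist, k=2)
toy_ranked = rank_by_density(toy_dist, toy_medoids, toy_labels)
print(toy_ranked)
```

In the protocol described above, the distance matrix would hold ligand RMSD values for the 1000 top-scored models and k would be 10, so the ranked medoids would correspond to the 10 top-ranked models.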

Fig. 3

CABS-dock predictions for the 3d1e complex. The image shows CABS-dock 3d1e predictions in two docking modes: default mode (left column, without using any information about the binding interface) and contact information mode (right column, using information about a single side-chain contact). The upper panels show sets of 10,000 models. The middle panels show sets of 1000 top scored models. The lower panel shows sets of 10 top scored models obtained in both docking modes. Peptide models obtained in default and contact information mode are colored in orange and cyan, respectively. The peptide model with the lowest RMSD (from the contact information mode) is shown in green, the peptide from the experimental complex in magenta, and the receptor residue belonging to the side-chain contact used in the docking is marked in red. Ligand-RMSD between this model and the experimental peptide structure is 1.76 Å

Refinement

Finally, the 10 top-ranked models are reconstructed to all-atom representation. For this task, the CABS-dock method uses an automated Modeller procedure [29].

Results and discussion

We tested the developed protocol (for driving CABS-dock docking with side-chain contact information) on several protein–peptide complexes from previous CABS-dock tests (performed without contact information, with default docking settings) [4, 5]. The results, together with a comparison (default docking vs. docking with contact information), are presented in Table 1. In each case, a single protein–peptide contact for driving the docking was chosen randomly (see Table 2). The parameters of the attractive potential for side-chain contacts (see Eq. 1) were set as: D = 5.0 Å, s = 1.0. As in our previous CABS-dock tests [4, 5], preferred secondary structures of the peptides were taken from the native structures of the complexes.

Table 1

Comparison of CABS-dock docking performance without (default) vs. with information about a randomly selected contact

	Docking without contact information (default CABS-dock settings)				Docking driven by random contact information
PDB	RMSD10k	RMSD1k	RMSD100	RMSD10	RMSD10k	RMSD1k	RMSD100	RMSD10
2v3s	2.42	2.42	3.48	8.89	1.30	1.37	1.65	1.77
2vj0	2.09	2.96	4.12	3.91	2.71	3.00	3.00	3.40
2zjd	2.35	2.69	2.79	4.60	1.77	2.03	2.03	3.06
3bfq	10.20	11.53	14.22	13.48	1.47	1.47	2.34	2.89
3bu3	6.87	7.06	7.71	8.86	3.62	4.45	5.33	5.47
3bwa	2.42	2.75	3.17	3.94	2.00	2.32	2.36	3.55
3cvp	4.52	4.67	8.47	10.14	2.37	2.98	3.91	4.29
3d1e	4.39	6.59	8.18	18.82	1.76	1.76	2.01	1.76
3d9t	3.56	4.34	7.25	10.06	1.96	2.69	3.42	3.72

The table reports the lowest RMSD (in Å) found among: all 10,000 CABS-dock models (RMSD10k), the 1000 top-scored models (RMSD1k), the 100 top-scored models (RMSD100) and the 10 top-scored models (RMSD10)
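To make the comparison in Table 1 easier to read, the short script below tallies how often the contact-driven mode gives a lower RMSD than the default mode and computes the mean values. The numbers are copied from the RMSD10k and RMSD10 columns of Table 1; the variable and function names are illustrative.

```python
# RMSD values (Å) copied from Table 1 as (RMSD10k, RMSD10) for each docking mode.
TABLE1 = {
    "2v3s": {"default": (2.42, 8.89), "contact": (1.30, 1.77)},
    "2vj0": {"default": (2.09, 3.91), "contact": (2.71, 3.40)},
    "2zjd": {"default": (2.35, 4.60), "contact": (1.77, 3.06)},
    "3bfq": {"default": (10.20, 13.48), "contact": (1.47, 2.89)},
    "3bu3": {"default": (6.87, 8.86), "contact": (3.62, 5.47)},
    "3bwa": {"default": (2.42, 3.94), "contact": (2.00, 3.55)},
    "3cvp": {"default": (4.52, 10.14), "contact": (2.37, 4.29)},
    "3d1e": {"default": (4.39, 18.82), "contact": (1.76, 1.76)},
    "3d9t": {"default": (3.56, 10.06), "contact": (1.96, 3.72)},
}


def summarize(column):
    """column: 0 for RMSD10k, 1 for RMSD10."""
    default = [row["default"][column] for row in TABLE1.values()]
    contact = [row["contact"][column] for row in TABLE1.values()]
    # Count complexes where the contact-driven value is strictly lower.
    improved = sum(c < d for c, d in zip(contact, default))
    return improved, sum(default) / len(default), sum(contact) / len(contact)


for name, col in (("RMSD10k", 0), ("RMSD10", 1)):
    improved, mean_default, mean_contact = summarize(col)
    print(f"{name}: improved in {improved}/{len(TABLE1)} complexes, "
          f"mean {mean_default:.2f} -> {mean_contact:.2f} Å")
```

With these values, RMSD10 improves for all nine complexes, while RMSD10k improves for eight of them (2vj0 is the exception).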

Table 2

Input data for CABS-dock protein–peptide docking using information about side-chain contacts

PDB	Receptor chain	Peptide sequence	Contact residue ID (receptor)	Contact residue ID (peptide)
2v3s	B	GRFQVT	449	4
2vj0	A	PKGWVTFE	782	3
2zjd	A	GGDDDWTHLS	35	9
3bfq	G	ADSTITIRGYVRDNR	117	5
3bu3	A	YNPYPEDYGDIEIG	1181	11
3bwa	A	FPTKDVAL	66	1
3cvp	A	NRASKL	557	4
3d1e	A	GQLGLF	364	1
3d9t	B	ATPFQE	307	4

Single contacts were randomly selected from native contacts (defined using 5 Å distance cut-off based on positions of heavy atoms)
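The contact selection described in this footnote can be sketched as follows. The data layout (a dictionary mapping residue IDs to arrays of heavy-atom coordinates) and the function names are assumptions made for illustration; parsing the PDB file is left out.

```python
import random

import numpy as np


def native_contacts(receptor, peptide, cutoff=5.0):
    """List (receptor residue, peptide residue) pairs in native contact.

    receptor, peptide: dict mapping residue ID -> (n_atoms, 3) array of
    heavy-atom coordinates in Å (hypothetical input layout).
    A pair is in contact if any two heavy atoms are within `cutoff` Å.
    """
    contacts = []
    for rid, rxyz in receptor.items():
        for pid, pxyz in peptide.items():
            # All pairwise heavy-atom distances between the two residues.
            diff = rxyz[:, None, :] - pxyz[None, :, :]
            if np.sqrt((diff ** 2).sum(axis=-1)).min() <= cutoff:
                contacts.append((rid, pid))
    return contacts


def pick_random_contact(contacts, seed=None):
    """Choose a single contact at random to drive the docking."""
    return random.Random(seed).choice(contacts)


# Toy coordinates (not taken from any real structure).
toy_receptor = {
    364: np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]),
    365: np.array([[20.0, 0.0, 0.0]]),
}
toy_peptide = {
    1: np.array([[3.0, 0.0, 0.0]]),
    2: np.array([[30.0, 0.0, 0.0]]),
}
toy_contacts = native_contacts(toy_receptor, toy_peptide)
print(pick_random_contact(toy_contacts, seed=0))
```

The selected pair would then be passed to the docking as the single side-chain contact, in the same form as the receptor and peptide residue IDs listed in Table 2.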

For most of the docking cases, we noted significant improvement (see Table 1). One of the cases (PDB ID: 3d1e) is shown in Fig. 3. For this test case, in the default CABS-dock mode (without any contact information), the accuracy of predictions in the set of ten top-scored models was very low (RMSD10 was 18.82 Å, the peptides are shown in orange). Side-chain contact information restrains the conformational sampling of the peptide to the broad neighborhood of the contact. This resulted in the selection of 10 top-scored peptides that were much closer to the binding site than in the default docking mode. Another docking example, the 3bfq complex, is presented in Fig. 4. In this case, a much longer peptide (15 residues) was docked. Compared with docking in the default mode, the use of contact information significantly improved the docking accuracy; however, there is still room for improvement. Namely, the lowest-RMSD model out of the 10,000 models is much more accurate than the best of the 10 top-scored models, which is also the case for other modeled complexes (see Table 1). CABS-dock top-scored predictions for all test cases are presented in Fig. 5.

Fig. 4

CABS-dock predictions for the 3bfq complex. The image shows comparison of the experimental peptide pose (in magenta, taken from the 3bfq complex) with CABS-dock models using contact information: the best from ten top-scored models (in green, RMSD10 = 2.89 Å) and the best from 10,000 models (in cyan, RMSD10k = 1.47 Å). Additionally, the best model from 10 top-scored models without contact information (default mode) is shown (in orange, RMSD10 = 10.02 Å). The receptor residue belonging to the side-chain contact used in the docking is marked in red

Fig. 5

CABS-dock top-scored predictions. Peptide models with the lowest RMSD among 10 top-scored models are shown in green, peptides from the experimental complex in magenta, and the receptor residue belonging to the side-chain contact used in the docking is marked in red

Conclusions

The accurate characterization of protein–peptide interfaces is important for understanding the molecular basis of life and for the rational design of peptide therapeutics [30]. The lessons learnt from protein–peptide molecular docking can also be extremely valuable in addressing important questions regarding the modeling of protein–protein interactions [31, 32]. In this work we demonstrated how very sparse and easily accessible data may improve structure prediction of protein–peptide complexes with the CABS-dock method. We introduced a simple protocol that transforms information about expected protein–peptide contacts into soft restraints, which enable extensive sampling of the peptide conformational space in a large area around the defined contact. Further development of the protocol, incorporated into the publicly available CABS-dock server, will provide a promising tool for high-throughput studies. Our protocol can be easily combined with bioinformatics tools for contact prediction or with experimental data [7]. The former is an especially promising approach, as there are already numerous methods that could be incorporated into such a pipeline (for example, binding site prediction tools [7, 33, 34]). Additional improvements can be achieved using better scoring and selection procedures that would be able to pick out the most accurate peptide models from the large set of CABS-dock predictions. This can be done in various ways, for example, using external force fields (e.g. all-atom molecular dynamics [5]) or machine learning approaches [35].

34 in total

1. Folding pathway of the b1 domain of protein G explored by multiscale modeling.

Authors: Sebastian Kmiecik; Andrzej Kolinski
Journal: Biophys J Date: 2007-09-21 Impact factor: 4.033

2. Information-driven modeling of protein-peptide complexes.

Authors: Mikael Trellet; Adrien S J Melquiond; Alexandre M J J Bonvin
Journal: Methods Mol Biol Date: 2015

3. Modeling of protein-peptide interactions using the CABS-dock web server for binding site search and flexible docking.

Authors: Maciej Blaszczyk; Mateusz Kurcinski; Maksim Kouza; Lukasz Wieteska; Aleksander Debinski; Andrzej Kolinski; Sebastian Kmiecik
Journal: Methods Date: 2015-07-10 Impact factor: 3.608

4. Sequence co-evolution gives 3D contacts and structures of protein complexes.

Authors: Thomas A Hopf; Charlotta P I Schärfe; João P G L M Rodrigues; Anna G Green; Oliver Kohlbacher; Chris Sander; Alexandre M J J Bonvin; Debora S Marks
Journal: Elife Date: 2014-09-25 Impact factor: 8.140

5. Theoretical study of molecular mechanism of binding TRAP220 coactivator to Retinoid X Receptor alpha, activated by 9-cis retinoic acid.

Authors: Mateusz Kurcinski; Andrzej Kolinski
Journal: J Steroid Biochem Mol Biol Date: 2010-04-14 Impact factor: 4.292

6. Can self-inhibitory peptides be derived from the interfaces of globular protein-protein interactions?

Authors: Nir London; Barak Raveh; Dana Movshovitz-Attias; Ora Schueler-Furman
Journal: Proteins Date: 2010-11-15

7. Mechanism of Folding and Binding of an Intrinsically Disordered Protein As Revealed by ab Initio Simulations.

Authors: Mateusz Kurcinski; Andrzej Kolinski; Sebastian Kmiecik
Journal: J Chem Theory Comput Date: 2014-06-10 Impact factor: 6.006

8. CABS-fold: Server for the de novo and consensus-based prediction of protein structure.

Authors: Maciej Blaszczyk; Michal Jamroz; Sebastian Kmiecik; Andrzej Kolinski
Journal: Nucleic Acids Res Date: 2013-06-08 Impact factor: 16.971

9. CABS-dock web server for the flexible docking of peptides to proteins without prior knowledge of the binding site.

Authors: Mateusz Kurcinski; Michal Jamroz; Maciej Blaszczyk; Andrzej Kolinski; Sebastian Kmiecik
Journal: Nucleic Acids Res Date: 2015-05-05 Impact factor: 16.971

10. A unified conformational selection and induced fit approach to protein-peptide docking.

Authors: Mikael Trellet; Adrien S J Melquiond; Alexandre M J J Bonvin
Journal: PLoS One Date: 2013-03-13 Impact factor: 3.240
