Literature DB >> 24896119

Structure prediction of the second extracellular loop in G-protein-coupled receptors.

Sebastian Kmiecik¹, Michal Jamroz¹, Michal Kolinski².

Abstract

G-protein-coupled receptors (GPCRs) play key roles in living organisms. Therefore, it is important to determine their functional structures. The second extracellular loop (ECL2) is a functionally important region of GPCRs, which poses significant challenge for computational structure prediction methods. In this work, we evaluated CABS, a well-established protein modeling tool for predicting ECL2 structure in 13 GPCRs. The ECL2s (with between 13 and 34 residues) are predicted in an environment of other extracellular loops being fully flexible and the transmembrane domain fixed in its x-ray conformation. The modeling procedure used theoretical predictions of ECL2 secondary structure and experimental constraints on disulfide bridges. Our approach yielded ensembles of low-energy conformers and the most populated conformers that contained models close to the available x-ray structures. The level of similarity between the predicted models and x-ray structures is comparable to that of other state-of-the-art computational methods. Our results extend other studies by including newly crystallized GPCRs.

Entities: CellLine Chemical Disease Gene Species

Mesh：

Substances：

Year: 2014 PMID： 24896119 PMCID： PMC4052351 DOI： 10.1016/j.bpj.2014.04.022

Source DB: PubMed Journal: Biophys J ISSN： 0006-3495 Impact factor: 4.033

Introduction

G-protein-coupled receptors (GPCRs) constitute the largest and the most versatile family of membrane-bound receptors. They interact with very diverse sets of ligands including neurotransmitters, hormones, amino acids, lipids, odorants, ions, fatty acids, and peptides. In response to stimuli, the receptor undergoes a series of structural rearrangements (1) allowing signal transduction across the plasma membrane and its further propagation inside the cell. Because GPCRs play key roles in a variety of signaling cascades that control many cellular processes and are related to numerous diseases (2), they are very important targets for pharmacological intervention (3). It is estimated that ∼40% of drugs currently in clinical use target these receptor proteins (4,5). Significant effort is devoted to determine human GPCR structures and function (6), which may lead to the discovery of new potent drugs with higher receptor subtype selectivity (and thus fewer side effects). Thanks to the recent progress in crystallization techniques, structural coverage of GPCRs has experienced an exponential growth trend (6). However, the gap between the number of experimentally derived crystal structures and all known GPCR sequences (potential new drug targets) remains large (sequences of >800 GPCRs are now identified (7)). This makes computational methods a reasonable and promising alternative for the determination of receptor atomic structures. All GPCRs share a common architecture of a seven-helix bundle spanning the cell membrane. This region shows the highest sequential conservation among all members of the GPCR family. The seven transmembrane helices (TMHs) are linked by intra- and extracellular loop regions. The loop regions present significant structural diversity even between closely related receptor subtypes (8). The most interesting GPCR region for structure-based drug design is the ligand interaction and recognition site located in the cavity created by surrounding TMHs and extracellular loops (ECLs). Over the last decade, the ECLs have gained increasing interest due to their important functional roles in ligand binding, activation, and regulation of GPCRs (9). The accurate prediction of ECLs is critical for the construction of models applicable in drug design efforts (8,9). The low sequence similarity and lack of suitable templates makes homology modeling methods inappropriate for this purpose. Different computational protocols have been applied to the prediction of ECL structures in different GPCRs (10–14). Most of them showed that short ECLs (5–7 residues) can be predicted with very good accuracy (with root mean-square deviations (RMSDs) lower than 1 Å when compared to the crystal structures). In contrast, the prediction of long (or so called super-long ECLs, having over 15 residues) presents a challenging task for contemporary modeling tools. The second ECL (ECL2) that connects TMH4 and TMH5 is the longest and the most divergent of the three ECLs. The functional importance of ECL2 has been demonstrated in many studies. For instance, ECL2 has been shown to play an important role in binding both allosteric and orthosteric ligands (8), receptor function and signaling (15,16). Mutagenesis studies also confirmed that the ECL2 region is responsible for the receptor subtype selectivity of signaling molecules (17,18) and its alteration may transform an antagonist to act as an agonist (19). Moreover, a particular ECL2 conformation is probably required for preserving proper receptor-ligand interaction, e.g., disruption of the disulfide bond stabilizing the short α-helix present in ECL2 of the adrenergic receptor decreased ligand affinity 1000-fold (20). In addition, long scale molecular dynamics (MD) simulation of the adrenergic receptor suggested that the ECL2 region is responsible for preliminary interaction with small molecules entering the binding site (21). The importance of ECL2 for receptor activation was also highlighted by the identification of point mutations conferring constitutive activity of the C5a receptor (22) and the thrombin receptor (23). In this work, we present results of ECL2 structure prediction for 13 subtypes of GPCRs (representing all receptor subtypes with available crystal structures at a time when this study was initiated). The following receptors were selected for ECL2 restoration: Adenosine receptor A2a (A2AR), Beta-1 adrenergic receptor (β1AR), Beta-2 adrenergic receptor (β2AR), C-X-C chemokine receptor type 4 (CXCR4), Dopamine D3 receptor (D3R), Delta-type opioid receptor (DOR), Muscarinic acetylcholine receptor M2 (M2R), Muscarinic acetylcholine receptor M3 (M3R), Mu-type opioid receptor (MOR), Nociceptin receptor (NOP), Neurotensin receptor type 1 (NTR1), Rhodopsin (RHO), and Sphingosine 1-phosphate receptor 1 (S1PR). For each receptor, we chose one crystal structure from the Protein Data Bank (PDB) database showing the highest resolution and complete representation of extracellular loops (see Table 1 for receptor details). Of importance, in our modeling we used no information about the crystal structure of any extracellular element (including ECL1, ECL2, and ECL3), except constraints on disulfide bridges.

Table 1

Description of 13 GPCR receptor structures modeled in this study

Receptor name	PDB ID	Receptor description	Species
A2AR	4EIY	Adenosine receptor A2a	Homo sapiens
β1AR	2Y00	Beta-1 adrenergic receptor	Meleagris gallopavo
β2AR	2RH1	Beta-2 adrenergic receptor	Homo sapiens
M2R	3UON	Muscarinic acetylcholine receptor M2	Homo sapiens
M3R	4DAJ	Muscarinic acetylcholine receptor M3	Rattus norvegicus
CXCR4	3ODU	C-X-C chemokine receptor type 4	Homo sapiens
D3R	3PBL	Dopamine D3 receptor	Homo sapiens
NTR1	4GRV	Neurotensin receptor type 1	Rattus norvegicus
DOR	4EJ4	Delta-type opioid receptor	Mus musculus
NOP	4EA3	Nociceptin receptor	Homo sapiens
MOR	4DKL	Mu-type opioid receptor	Mus musculus
RHO	1U19	Rhodopsin	Bos taurus
S1PR	3V2W	Sphingosine 1-phosphate receptor 1	Homo sapiens

Methods

In Fig. 1, we present a pipeline of the loop modeling procedure employed in this work. The procedure consists of three major modeling steps: 1), exploring the conformational space by the CABS model; 2), reconstruction to all-atom representation; and 3), selection of resulting model(s).

Figure 1

Pipeline of the loop modeling procedure.

CABS model

CABS (C-Alpha, Beta, and Side chain) is a versatile protein modeling tool based on coarse-grained structure representations and the Monte Carlo dynamics sampling scheme. CABS has been extensively tested in numerous structure prediction exercises, including successful participation in CASP experiments (CASP, Critical Assessment of protein Structure Prediction, a community-wide blind test of structure prediction approaches). In the CASP6 edition the Kolinski-Bujnicki group, employing the CABS-based modeling strategy, scored as the best, or the second best, depending on the evaluation method (24,25). The CABS modeling tool was also productive in the ab initio prediction of protein loops (26) or missing fragments (27), high-resolution structure prediction (28), modeling of protein-protein complexes (29,30), or large biomolecular systems (31,32). Taken together, those tests demonstrated that the CABS approach is competitive, or even superior, to other state-of-the-art structure prediction tools especially in difficult modeling cases (typically when large protein fragments need to be predicted with little or no support from evolutionary or experimental data). Recently, the CABS approach for the ab initio and consensus-based prediction of protein structure has been made available as a CABS-fold web server (33). The CABS model is described in detail elsewhere (34). Here, we give only a brief summary. The major components of the CABS model (protein representation, force field, and sampling scheme) have been designed for the efficient simulation of real proteins. The CABS protein representation has been reduced to up to four atoms per residue: alpha carbon, beta carbon, and two pseudoatoms: center of mass of the side chain and center of the virtual alpha carbon-alpha carbon bond. The CABS force field employs knowledge-based potentials derived from statistical analysis of known protein structures (deposited in the PDB) and a model of main-chain hydrogen bonds. Solvent effects are accounted for in an implicit way through the knowledge-based potentials. The CABS dynamics is simulated by a random series of local micromodifications controlled by the asymmetric Metropolis scheme of the Monte Carlo method. Of importance, the long series of such micromodifications describes well near-native dynamics (35,36) or entire protein folding mechanisms (37–39). Detailed analysis of CABS dynamics, together with its comparison to MD simulation and other computational tools, is provided in the work (36). The resolution of CABS predictions enables fast reconstruction to realistic all-atom models. Thus, the CABS model can be easily merged with all-atom modeling tools into multiscale modeling procedures benefiting from coarse-grained efficiency and atomic-level accuracy (40,41).

CABS setup and modifications for the present study

The required CABS input files were prepared using the Bioshell package (42). CABS simulations started from random conformations of EC loops. TM fragments of receptor structures were restrained to x-ray structure (using distance restraints on alpha carbons). For each receptor, two independent CABS runs were conducted, each generating 2000 models. Therefore, in total, 4000 CABS-generated models for each receptor were used in the next modeling steps. The CABS model performs very well for a large fraction of globular proteins (24), but the statistical potential needs some corrections for specific systems. In the generic force field the CYS-CYS side chain contact potential reflects the statistical average for bonded and unbonded pairs (34). For the systems studied in this work we assume knowledge of bonded CYS pairs, the CYS-CYS statistical potential has been assumed to be 0, whereas on the bonded pairs we imposed strong distance restrains. This way possible artificial energy biases toward the more than binary CYS contacts have been eliminated. In the original force field of CABS the interaction distance of side chains was derived for single domain globular proteins. In this application we slightly reduced the effective width and stiffness of amino acids from loop-forming sequences (d1 = 0.5 and d2 = 1.5, see a detailed description of the original force field in (34).). This way, we perhaps slightly decreased the accuracy of the discrete representation of low energy folded structures, enabling, however, efficient transitions between various local minima.

Reconstruction to all-atom representation and selection of model(s)

In general, the reconstruction to all-atom representation involved a three step procedure: i), reconstruction of the backbone chain based on the alpha-carbon trace; ii), reconstruction of side-chain positions based on the backbone chain; iii), short optimization and refinement protocol. In more detail, in the first step CABS-generated trajectories (in the C-alpha format) were reconstructed to backbone representation using the BBQ tool (43). The prepared loop conformations were inserted into the native crystal structure, and loop side chains were reconstructed with SCWRL3 (44). In the next step, each model was optimized with the DOPE force field (45) using MODELER by a comparative modeling procedure (using previous step models as templates). Loop side chains were again optimized with SCWRL3 (44). The constructed models were subjected to energy calculation and structural clustering. All-atom energy was evaluated with GROMACS software (46) using single point energy computation. Structural clustering was performed with the K-means algorithm (using ClusCo software (47)). RMSDs of loops were calculated using CSB (48) on loop fragments, after superimposition of the whole model onto the native/reference structure (excluding loop atoms).

Selection of ECLs

The ECL fragment boundaries were selected based on examination of the secondary structure of TM domains in receptor crystal structures (x-ray structures are listed in Table 1). The first and the last amino acid of the ECLs were considered as the one not involved in the TM helices hydrogen bond network. Table 2 lists sequences of the ECLs restored in this study for 13 GPCRs.

Table 2

ECLs restored in this study for 13 GPCRs

Receptor name	PDB ID	Loop	Loop sequence	Loop length	Residue numbering
A2AR	4EIY	ECL1	FCA	3	70–72
		ECL2	PMLGWNNCGQPKEGKNHSQGCGEGQVACLFEDVV	34	139–172
		ECL3	PDCSHA	6	260–265
β1AR	2Y00	ECL1	TWLW	4	105–110
		ECL2	WWRDEDPQALKCYQDPGCCDFVT	23	181–203
		ECL3	RDLV	4	317–320
β2AR	2RH1	ECL1	MWTF	4	98–101
		ECL2	WYRATHQEAINCYANETCCDFFT	23	173–195
		ECL3	DNLI	4	300–303
M2R	3UON	ECL1	YWPL	4	88–91
		ECL2	VRTVEDGECYIQFFS	15	168–182
		ECL3	APCI	4	414–417
M3R	4DAJ	ECL1	RWAL	4	132–135
		ECL2	KRTVPPGECFIQFLS	15	212–226
		ECL3	DSCI	4	517–520
CXCR4	3ODU	ECL1	NWYF	4	101–104
		ECL2	NVSEADDRYICDRFYP	16	176–191
		ECL3	IIKQ	4	269–272
D3R	3PBL	ECL1	GGVWNF	6	93–98
		ECL2	FNTTGDPTVCSIS	13	172–184
		ECL3	QTCHV	5	356–360
NTR1	4GRV	ECL1	HPWAF	5	133–137
		ECL2	GLQNRSGDGTHPGGLVCTPIV	21	209–229
		ECL3	DEQW	4	336–339
DOR	4EJ4	ECL1	TWPF	4	103–106
		ECL2	VTQPRDGAVVCMLQFPS	17	188–204
		ECL3	DINRR	5	288–292
MOR	4DKL	ECL1	TWPF	4	132–135
		ECL2	TTKYRQGSIDCTLTFSH	17	207–223
		ECL3	TIPE	4	307–310
NOP	4EA3	ECL1	FWPF	4	115–118
		ECL2	SAQVEDEEIECLVEIPT	17	190–206
		ECL3	VQPS	4	290–293
RHO	1U19	ECL1	YFVF	4	102–105
		ECL2	WSRYIPEGMQCSCGIDYYTPHEET	24	175–198
		ECL3	GSDF	4	280–283
S1PR	3V2W	ECL1	GATTYKL	7	106–112
		ECL2	WNCISALSSCSTVLPLY	17	182–198
		ECL3	KVKTCDILFR	10	283–292

Secondary structure prediction

The CABS modeling procedure can be supported with additional information about the expected types of secondary structures. CABS uses different sets of statistical potentials for protein fragments with assigned secondary structure (three predefined potential types are available: H for α-helical conformations, E for extended conformations, and C for coil-like conformations). These different sets of potentials are mainly responsible for controlling distances between respective alpha carbons (Cαn - Cαn+2 and Cαn - Cαn+4 pairs, for a detailed description of the CABS force field see (34)). Therefore, to enhance the accuracy of final predictions, we enriched the input data by theoretical predictions of ECL2 secondary structure. The input predictions were obtained as a consensus from three web server tools (predicting secondary structure from sequence): PSIPRED (49), Jpred 3 (50), and PSSpred (51) (see Table S2 in the Supporting Material for consensus secondary structure prediction). In our experience, a correct secondary structure input can significantly improve prediction results, although input mistakes can have serious consequences on the final outcome (input overpredictions of the regular secondary structure are more dangerous for the quality of the results than underpredictions (52,53)).

Results

Comparison of modeling results with experimental crystal structures

In Fig. 2, we present a summary of structure prediction results of ECL2 loops (for details of the modeling procedure, see Methods). The figure shows the lowest RMSD values obtained by the CABS model (red bars) and RMSDs of CABS-generated models selected according to all-atom energy values (blue shadowed bars), and structural clustering (green shadowed bars). The results of the selection are presented for a single top-scored model (the lowest energy one, LE; or representing the largest cluster, LC), but also for the lowest RMSD models observed within a set of top-scored models (10 or 100). According to Nikiforovich et al. (12,54), the lowest RMSD out of a set of top-scored models may be a more adequate measure of prediction accuracy than RMSD of a single top-scored model (12,54). This is because crystal structures capture single conformation only of highly mobile ECL loops, but not necessarily the most biologically relevant one. Therefore, in Fig. 2 we report the lowest RMSD values observed within the sets of 10 or 100 of the lowest energy models (LE10 or LE100) and the sets of 10 or 100 representatives of the largest clusters (LC10 or LC100).

Figure 2

Results of predictions of the second extracellular loop (ECL2). GPCR receptors (with ECL2 residue length in brackets) are marked on the horizontal axis. For each receptor, the color bars show RMSD (in Å from crystal structures) of models selected according to different criteria. Red bars correspond to the lowest RMSD model generated by CABS. Blue shadow bars correspond to models selected based on energy scores: model with the lowest energy (LE), model showing the lowest RMSD from the 10 lowest energy models (LE10) and model showing the lowest RMSD from the 100 lowest energy models (LE100). Green shadow bars correspond to models selected based on structural clustering: model representing the largest cluster (LC), model showing the lowest RMSD from the representatives of 10 largest clusters (LC10), model showing the lowest RMSD from the representatives of 100 largest clusters (LC100). The detailed values are given in Table S3 and Table S4. To see this figure in color, go online.

As presented in Fig. 2, the best RMSD models obtained by CABS are within an RMSD range of 1.9–4.7 Å from their crystal structures (depending on GPCR). These models represent the lowest RMSD value (RMSDBEST) observed in a trajectory of 4000 snapshots generated by CABS for each GPCR target. As already mentioned previously, we attempted to reduce the number of alternative predictions (from 4000 to 1 or 10 or 100) using well-tested selection procedures: all-atom energy scoring after short minimization in the all-atom force-field (28) and structural clustering (24,26) (see Methods for details). We applied an energy scoring and minimization procedure similar to the one that proved efficient in the discrimination of medium-accuracy homology models (RMSD range of 2–3 Å from the native) from low-accuracy homology models of globular proteins (see Fig. 4 in (28).). Analysis of RMSDs of the lowest energy models (RMSDLE) shows that in most GPCR cases RMSDLE values are disappointingly higher than corresponding RMSDBEST values. Taken together this indicates that the energy evaluation of GPCR loops is a more demanding task than that of homology models of globular proteins in (28). On the other hand, the values of the lowest RMSDs from the set of 10 or 100 selected models (see RMSDLE10 and RMSDLE100 in Fig. 2) are in most receptor cases close to the RMSDBEST values.

Figure 4

Models showing large-scale movements of the second extracellular loop. The lowest energy models of DOR (a) and β2AR (b) are presented in blue and superimposed on crystal structures shown in gray. Crystal structure fragments of transmembrane helices (TMH3 and TMH4) are also visualized. To see this figure in color, go online.

In addition to energy scoring, we applied structural clustering as an alternative approach to model selection. Using a clustering method, which proved to be useful in previous structure prediction tasks (24,26), we attempted to select a single representative model and sets of models (10 or 100, similarly as by energy scoring). As shown in Fig. 2, in most GPCR cases representative models of the largest cluster have substantially higher RMSD values (RMSDLC) than RMSDBEST. However, in two GPCR cases (CXCR4 and RHO) the representatives of the largest cluster have the lowest RMSD among the representatives of the 10 largest clusters (for CXCR4 RMSDLC = RMSDLC10 = 3.56 Å, and for RHO RMSDLC = RMSDLC10 = 5.11 Å, see Fig. 2). In summary, results of the selection of models using structural clustering were on average comparable (slightly inferior) to those of energy scoring. Namely the average RMSD values (for the entire GPCR set) were the following: RMSDBEST = 3.15 Å, RMSDLE = 5.84 Å, RMSDLE10 = 4.3 Å, RMSDLE100 = 3.73 Å, RMSDLC = 6.47 Å, RMSDLC10 = 4.41 Å and RMSDLC100 = 3.63 Å (for the description of RMSD superscripts see Fig. 2, legend). All the predicted models are available for download from http://biocomp.chem.uw.edu.pl/GPCR-loop-modeling/). The model evaluation presented previously was based on comparison with the highest resolution x-ray structure of each receptor subtype (see Table 1). Furthermore, we extended the comparison to all additional crystal structures of each GPCR subtype when available (from the GPCRSD database (55), see their list in Table S5). Calculated RMSD values showed no qualitative differences from those reported previously (see Table S6). In addition, we estimated the conservation of ECL2 structure among all available x-ray structures using previously chosen highest resolution structures as reference structures (see Table S5). The highest RMSD value = 2.5 Å was observed for M2R with a bound agonist, whereas most of the analyzed structures showed very low RMSD values < 1 Å. Furthermore, visual inspection of superimposed x-ray structures indicated very small differences in ECL2 conformation among the same receptor subtypes.

Analysis of example models

One of the most accurate predictions of ECL2 was obtained for two opioid receptors (DOR and MOR) and the CXCR4 receptor (see Fig. 2). For these receptors, all ECL2s formed two β-strands connected with a tight β-turn. Resulting loops resembled native-like conformations with high accuracy (see ECL2 prediction for the CXCR4 receptor, Fig. 3 a).

Figure 3

Example predictions superimposed on crystal structures. Predicted models of the second extracellular loop are shown in red, together with the first and third extracellular loops colored in green, and the reference crystal structure shown in gray. The following models are presented: (a) CXCR4 model, representative of the largest cluster (RMSDLC = 3.56 Å), (b) β1AR model, from the set of 10 lowest energy models (RMSDLE10=3.57 Å), (c) D3R model, from the set of 10 lowest energy models (RMSDLE10=2.88 Å), (d) A2AR model, from the set of 10 lowest energy models (RMSDLE10=5.88 Å). Visualizations of models for all receptor cases are shown in Fig. S1 and Fig. S2. To see this figure in color, go online.

Our calculations reproduced the structure of ECL2 for two receptors (M2R and M3R) with good accuracy (RMSDLC10 = 3.70 Å and 3.89 Å, respectively). The lowest energy structure for ECL2 in the NTR1 receptor highly resembled its crystal structure; however, the entire loop fragment was tilted toward TMH4, resulting in high RMSD (8.82 Å). NTR1 was crystallized with bound peptide NTS interacting with ECL2 and ECL3. Ligand-receptor interaction may alter the structure and orientation of ECLs (all ligand-ECLs interactions present in the 13 receptor crystal structures used in this study are listed in Table S7). Note that ligand-receptor interactions were not taken into account during the modeling procedure, which may result in a different ECL2 orientation in the lowest energy models when compared to x-ray structures. The best NTR1 loop structure observed in the trajectory yielded low RMSD (2.99 Å). In the case of two adrenergic receptors (β1AR and β2AR) resulting loops also adopted native-like conformation and a short α-helix was formed as seen in crystal structures. Nevertheless, the position of the short α-helix deviated among the resulting models when compared to the crystal structures. The differences in the localization of the short α-helix were probably related to the high mobility of this long receptor loop (see Fig. 3 b). The shortest predicted loop (13 residues) of the D3R receptor showed no secondary structure elements, resembling coil-like conformation. The predicted structure yielded RMSDLE10 = 2.88 Å and was in good agreement when compared to its crystal-derived form (see Fig. 3 c). For S1PR, NOP, and RHO receptors our predictions were much less accurate and scoring methods (energy evaluation and structural clustering) were not able to point toward loop structures sufficiently resembling conformations of the crystal structures. The lowest RMSD values observed in the trajectory were equal to 4.4 Å, 4.47 Å, and 4.2 Å, respectively. In the case of RHO, more accurate prediction may require simulating the presence of the N-terminal domain, which was not accounted for in our calculations. The N-terminus provides additional stabilization for the ECL2 conformation as seen in crystal structure. Note that rhodopsin is a photoreceptor protein with a covalently bound ligand (retinal) buried deep in the binding site, whereas other GPCRs interact with diffusible ligands. When analyzing loop conformations we have to keep in mind a different role of ECL2 of rhodopsin, which forms a stable hydrophobic lid covering the binding site. The longest predicted ECL2 (34 AA residues) for the A2AR receptor yielded RMSDLE10 = 5.88 Å. The predicted loop fragment (PRO:139 to ALA:165) differed from its native conformation by the absence of a short two-turn α-helix. The presence of the short helix was also not indicated in the input of secondary structure prediction (a coil type of secondary structure was assigned for the helix fragment, see Table S2). Therefore, in the A2AR case, a more accurate input of secondary structure may be helpful to generate models closer to the crystal structure. The remaining part of predicted ECL2 in A2AR (CYS:166 to VAL:172), in the vicinity of the ligand binding site, was in good agreement when compared to its crystal structure (one helical turn was created, see Fig. 3 d).

Discussion

Comparison with other structure prediction studies

In Table 3, we present comparison of our results to others in the literature. The comparison is based on two studies carried out by Goldfeld et al. (11) and Nikiforovich et al. (12). These studies, to the best of our knowledge, represent the most extensive and up-to-date reports concerning the restoration of ECL2 loops (performed for four GPCRs, as these were all crystallographically available GPCRs in 2009/2010 when those studies were carried out). We do not compare our results with ECL2 prediction made during homology modeling because the prediction of loops in homology models is a more difficult task than its restoration in crystal structures (56).

Table 3

Comparison of our results to others in the literature

Receptor name (loop length)	Our data		Data of other authors
Receptor name (loop length)	RMSD^BEST (Å)	RMSD^LE (Å)	RMSD^BEST (Å)	RMSD^LE (Å)	Reference, table, comments
A2AR (34)	4.7	5.9	5.9a	10.2a	(12), Table VI
			4.8a	4.8a	(12), Table VI, with inserted SS bonds
			4.8a	4.4a	(11), Table 1
β1AR (23)	3.3	5.8	4.3	6.4	(12), Table VI
β1AR (23)	3.3	5.8	4.3	1.6	(11), Table 1
β2AR (23)	3.4	5.5	3.8	7.4	(12), Table VI
β2AR (23)	3.4	5.5	3.8	2.2	(11), Table 1
RHO (24)	4.2	8.2	4.7	8.4	(12), Table VI
RHO (24)	4.2	8.2	4.7	3.4	(11), Table 1

RMSDs (root mean-square deviation in Å to the crystal structure) for the second extracellular loop are listed: RMSDBEST – representing the best model obtained and RMSDLE – representing the lowest energy model.

Our results are comparable to those of Nikiforovich et al. (12) and to those of Goldfeld et al. (11) in the case of A2Ar.

Note that for the A2AR case we used crystal loop structure (PDB ID: 4EIY) of 34 residues for computing RMSD values, whereas the other authors used crystal structure (PDB ID: 3EML) of 27 residues in which 7 residues were missing.

As shown in Table 3, our results are comparable to the other authors, except the lowest energy predictions (RMSDLE) of Goldfeld et al. (11) for β1AR, β2AR, and RHO receptors that matched the corresponding crystal structures with excellent RMSD values. It is worth emphasizing that our results and also those by Nikiforovich et al. (12) were obtained using a much less sophisticated modeling procedure (i.e., coarse-grained sampling combined with energy scoring that does not incorporate water or the lipid membrane). In our study, a single prediction took no longer than 0.5 h of single CPU time. More sophisticated methodologies (relying on a more precise system representation, like in the Goldfeld et al. (11) study) are computationally much more demanding. For instance, the prediction of the A2AR loop in the Goldfeld study took 145 days of single CPU time. A direct comparison of the performance of GPCR loop modeling procedures is hampered by differences in the experimental data used in the calculations (see the discussion on the comparison of Goldfeld et al. (11) and Nikiforovich et al. (12) results in PNAS letters (54) and (57)). In the paragraphs below, we outline important details of our modeling procedure and differences between our calculations and others. First, the definition of loop regions differs between studies. For cases presented in Table 3, we defined slightly shorter (typically by three residues) or slightly longer loop lengths (by two residues in the case of A2AR than Goldfeld et al. (11)). Similar differences in loop lengths exist between the Goldfeld et al. (11) and Nikiforovich et al. (12) studies. Because the differences are not large (compare Table 2 vs. Table 2 in (11) vs. Table VI in (12)), we believe they should not have any significant impact on prediction accuracy. Second, for the A2AR case, in the Goldfeld et al. (11) and Nikiforovich et al. (12) studies, the calculations of RMSD values did not involve the ECL2 fragment between residues 149 and 155 (which is missing in the 3EML crystal structure used in the calculations); thus, only 27 residues were involved. On the contrary, we used a complete 34 residue fragment (from the 4EIY crystal structure); therefore, the RMSD comparison is not straightforward. Third, our modeling procedure involved simulation of all EC loops (EC1, EC2, EC3) at the same time, using no knowledge of x-ray loop structure (except constraints on disulfide bridges). In contrast, in Goldfeld et al. (11), each single individual loop was obtained with the other loops fixed in their x-ray conformations. Fourth, our modeling procedure used experimental distance restraints on disulfide bridges (DBs) (see also the CABS setup in the Methods section). In all receptor cases, we used knowledge about a well-conserved DB between TM3 and EC2 loops (being the only DB in five receptors) and about DBs within EC loops (a single one in seven receptors, and three DBs in A2AR, see the list of DBs in Table S1). In turn, Nikiforovich et al. (12) used in their modeling information about the conserved DB between TM3 and EC2 only (allowing DBs within EC loops to be predicted). However, they also repeated the calculations with inserted DBs in EC loops. The insertion did not result in significant changes in β1AR and β2AR and helped to improve prediction accuracy in A2AR (which was predicted with a similar RMSDBEST value as in our calculations, see Table 3). In contrast, Goldfeld et al. (11) did not enforce experimental DBs (as explained in (57).); however, they used experimental atom-atom contact information within or between loops, derived from x-ray crystallography (Table S1 in (11)).

Loop dynamics

In our modeling procedure, loop models are generated by the CABS model through a series of small local moves controlled by the Monte Carlo method. The long series of such moves was shown to accurately describe the realistic dynamics of globular proteins. Namely, CABS predictions of protein dynamics were shown to be consistent with experimental data (for the characterization of protein folding pathway dynamics (37–39)) and MD simulation data (for the characterization of near-native dynamics (35,36)). This work provides an ensemble view of ECL2 structures (in sets of 10 or 100 cluster representatives or the lowest energy models); however, it is only validated by comparison with x-ray structures frozen in a single conformational option. Analysis of the predicted ensembles suggests that, at least for some of the modeled receptors, ECL2s may be subjected to large molecular movements. For instance, the lowest energy models of β2AR and DOR have ECL2 structures very similar to those observed in crystal structures but significantly tilted. Namely, the short α-helix of β2AR is directed toward TMH4 and the β-sheet of DOR ECL2 is tilted toward TMH3 (see Fig. 4). To our knowledge, such large-scale loop rearrangements in GPCRs were found only by extremely long MD simulations (58) and by coarse-grained modeling (12). Considering the high flexibility of ECL2s (suggested but not precisely characterized by experiment) and its functional importance (8,15–23), future theoretical studies should aim at the characterization of an ensemble view of EC loops and its validation through experimental approaches.

Conclusions

Previous reports showed that the CABS protein model offers state-of-the-art modeling capabilities, especially in difficult modeling cases (e.g., ab initio prediction of long protein fragments (24,26,27)). In this work, our goal was to test the ability of the CABS modeling approach to restore long loops (ECL2s) of 13 GPCRs (for all GPCRs with available crystal structures when this study was initiated). Based on the outcome of initial simulation runs, we introduced small modifications of the CABS algorithm that improved final performance. It should be noted that we used a low-cost computational procedure (coarse-grained CABS modeling that involves no membrane lipids, combined with a simple version of all-atom scoring and optimization). Despite the simplifications, our modeling approach yielded loop models of comparable accuracy as those obtained by other authors. Of importance, the results of our study provide benchmark data of newly crystallized GPCRs (other authors’ data were limited to the loop restoration of 4 or 5 GPCRs (11,12)) that enable researchers to compare their algorithms (our models are available from http://biocomp.chem.uw.edu.pl/GPCR-loop-modeling/). Our modeling method provides a framework for the development of more sophisticated procedures. Future developments may include: incorporation of more accurate scoring (model quality assessment) methods, introduction of the lipid bilayer in CABS simulation (which may limit loop movements), use of sparse data from experiment (e.g., from GPCRRD database (59)) or theoretical predictions (e.g., residue-residue contact predictions), introduction of ligand presence, use of x-ray interpretations on flexibility of TM end positions, inclusion of more accurate secondary structure prediction tools, or extension of the method to use GPCRs homology models. Finally, the CABS-based approach offers promising perspectives for the simulation of long timescale conformational dynamics of ECLs in GPCRs.

56 in total

1. The PSIPRED protein structure prediction server.

Authors: L J McGuffin; K Bryson; D T Jones
Journal: Bioinformatics Date: 2000-04 Impact factor: 6.937

2. Modeling flexible loops in the dark-adapted and activated states of rhodopsin, a prototypical G-protein-coupled receptor.

Authors: Gregory V Nikiforovich; Garland R Marshall
Journal: Biophys J Date: 2005-09-30 Impact factor: 4.033

3. BioShell--a package of tools for structural biology computations.

Authors: Dominik Gront; Andrzej Kolinski
Journal: Bioinformatics Date: 2006-01-10 Impact factor: 6.937

Review 4. Lifting the lid on GPCRs: the role of extracellular loops.

Authors: M Wheatley; D Wootten; M T Conner; J Simms; R Kendrick; R T Logan; D R Poyner; J Barwell
Journal: Br J Pharmacol Date: 2012-03 Impact factor: 8.739

5. CSB: a Python framework for structural bioinformatics.

Authors: Ivan Kalev; Martin Mechelke; Klaus O Kopec; Thomas Holder; Simeon Carstens; Michael Habeck
Journal: Bioinformatics Date: 2012-08-31 Impact factor: 6.937

6. Theoretical study of molecular mechanism of binding TRAP220 coactivator to Retinoid X Receptor alpha, activated by 9-cis retinoic acid.

Authors: Mateusz Kurcinski; Andrzej Kolinski
Journal: J Steroid Biochem Mol Biol Date: 2010-04-14 Impact factor: 4.292

7. Two mutations in extracellular loop 2 of the human GnRH receptor convert an antagonist to an agonist.

Authors: Thomas R Ott; Brigitte E Troskie; Roger W Roeske; Nicola Illing; Colleen A Flanagan; Robert P Millar
Journal: Mol Endocrinol Date: 2002-05

8. Loop prediction for a GPCR homology model: algorithms and results.

Authors: Dahlia A Goldfeld; Kai Zhu; Thijs Beuming; Richard A Friesner
Journal: Proteins Date: 2012-09-29

9. The G-protein-coupled receptors in the human genome form five main families. Phylogenetic analysis, paralogon groups, and fingerprints.

Authors: Robert Fredriksson; Malin C Lagerström; Lars-Gustav Lundin; Helgi B Schiöth
Journal: Mol Pharmacol Date: 2003-06 Impact factor: 4.436

Review 10. Proteomic applications of automated GPCR classification.

Authors: Matthew N Davies; David E Gloriam; Andrew Secker; Alex A Freitas; Miguel Mendao; Jon Timmis; Darren R Flower
Journal: Proteomics Date: 2007-08 Impact factor: 3.984

13 in total

1. Identification of avian vasotocin receptor subtype-specific antagonists involved in the stress response of the chicken, Gallus gallus.

Authors: Seong W Kang; Srinivas Jayanthi; Gurueswar Nagarajan; Thallapuranam Krishnaswamy Suresh Kumar; Wayne J Kuenzel
Journal: J Biomol Struct Dyn Date: 2018-05-17

2. A benchmark study of loop modeling methods applied to G protein-coupled receptors.

Authors: Lee H Wink; Daniel L Baker; Judith A Cole; Abby L Parrill
Journal: J Comput Aided Mol Des Date: 2019-05-23 Impact factor: 3.686

3. Flexible docking of peptides to proteins using CABS-dock.

Authors: Mateusz Kurcinski; Aleksandra Badaczewska-Dawid; Michal Kolinski; Andrzej Kolinski; Sebastian Kmiecik
Journal: Protein Sci Date: 2019-11-11 Impact factor: 6.725

4. Optical Control of Dopamine Receptors Using a Photoswitchable Tethered Inverse Agonist.

Authors: Prashant C Donthamsetti; Nils Winter; Matthias Schönberger; Joshua Levitz; Cherise Stanley; Jonathan A Javitch; Ehud Y Isacoff; Dirk Trauner
Journal: J Am Chem Soc Date: 2017-12-13 Impact factor: 15.419

5. CABS-dock web server for the flexible docking of peptides to proteins without prior knowledge of the binding site.

Authors: Mateusz Kurcinski; Michal Jamroz; Maciej Blaszczyk; Andrzej Kolinski; Sebastian Kmiecik
Journal: Nucleic Acids Res Date: 2015-05-05 Impact factor: 16.971

6. Protein-peptide molecular docking with large-scale conformational changes: the p53-MDM2 interaction.

Authors: Maciej Pawel Ciemny; Aleksander Debinski; Marta Paczkowska; Andrzej Kolinski; Mateusz Kurcinski; Sebastian Kmiecik
Journal: Sci Rep Date: 2016-12-01 Impact factor: 4.379

7. A protocol for CABS-dock protein-peptide docking driven by side-chain contact information.

Authors: Mateusz Kurcinski; Maciej Blaszczyk; Maciej Pawel Ciemny; Andrzej Kolinski; Sebastian Kmiecik
Journal: Biomed Eng Online Date: 2017-08-18 Impact factor: 2.819

8. Modeling EphB4-EphrinB2 protein-protein interaction using flexible docking of a short linear motif.

Authors: Maciej Pawel Ciemny; Mateusz Kurcinski; Maciej Blaszczyk; Andrzej Kolinski; Sebastian Kmiecik
Journal: Biomed Eng Online Date: 2017-08-18 Impact factor: 2.819

Review 9. Modeling of Protein Structural Flexibility and Large-Scale Dynamics: Coarse-Grained Simulations and Elastic Network Models.

Authors: Sebastian Kmiecik; Maksim Kouza; Aleksandra E Badaczewska-Dawid; Andrzej Kloczkowski; Andrzej Kolinski
Journal: Int J Mol Sci Date: 2018-11-06 Impact factor: 5.923

10. Molecular insights into the interaction of hemorphin and its targets.

Authors: Amanat Ali; Bincy Baby; Soja Saghar Soman; Ranjit Vijayan
Journal: Sci Rep Date: 2019-10-14 Impact factor: 4.379