Sara Terzoli1, Guido Tiana1. 1. Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, Milano 20133, Italy.
Abstract
Studying the conformations involved in the dimerization of cadherins is highly relevant to understanding the development of tissues and its failure, which is associated with tumors and metastases. Experimental techniques, like X-ray crystallography, can usually report only the most stable conformations, missing minority states that could nonetheless be important for the recognition mechanism. Computer simulations could be a valid complement to the experimental approach. However, standard all-atom protein models in explicit solvent are computationally too demanding to search thoroughly the conformational space of multiple chains composed of several hundred amino acids. To reach this goal, we resorted to a coarse-grained model in implicit solvent. The standard problem with this kind of model is to find a realistic potential to describe its interactions. We used coevolutionary information from cadherin alignments, corrected by a statistical potential, to build an interaction potential that is agnostic about the experimental conformations of the protein. Using this model, we explored the conformational space of multichain systems and validated the results by comparing them with experimental data. We identified dimeric conformations that are sequence specific and that can be useful to rationalize the mechanism of recognition between cadherins.
Cadherins are surface
proteins responsible for cell–cell
recognition and adhesion.[1] They are involved
in different stages of tumor progression, like in angiogenesis[2] and metastasis,[3] and
several germline mutations are found in solid tumors.[4] For this reason, they are important potential targets of
antitumoral molecules.

A large number of members of the cadherin
superfamily have been
discovered. Particularly important for their relationship with cancer
are the so-called “classical” cadherins of types I and
II, which are present only in vertebrates and classified according
to the tissue where they were first identified. The human genome encodes
114 cadherins; for example, E-cadherin was found in epithelial tissues,
N-cadherin in neurons, and P-cadherin in placenta.[5]

Classical cadherins display five extracellular (EC)
domains, which
are structurally similar and display a significant sequence similarity,
both comparing domains of the same protein type and across different
types (see Figure S1 in the Supporting
Information). A calcium ion is bound at the interface between
consecutive domains; in the absence of calcium, the EC domains are
more flexible[6] and the protein loses its
adhesive function.[7]

The adhesion
between two cells is stabilized by the trans dimerization
of the most distal EC1 domains; crystallization experiments
indicate that trans dimerization occurs through the
swap of their N-termini.[8] Crystal structures
of EC1-EC2 domains, mutated at the N-termini to prevent domain swapping,
show that another, X-shaped, trans dimeric conformation
is possible. Destabilization of the X-dimer by mutating a residue
at the junction between EC1 and EC2 slows down the domain-swapping
event, qualifying the X-dimer as an on-pathway intermediate.[9]

In many in vivo conditions, classical cadherins
are homophilic,
in the sense that cells expressing the same cadherin associate, while
those expressing different cadherins segregate.[10] This property is at the basis of cellular binding specificity
in tissues and is critical in the correct development of organisms,
as ectopic expression of cadherins leads to morphological defects.[11] However, the homophilic effect in vivo does
not seem to be a straightforward consequence of the affinity between
cadherins of the same type. In fact, analytical-ultracentrifugation
and surface-plasmon-resonance experiments of purified EC1-EC2 domains
do not provide dissociation constants that reflect the homophilic
relations observed in vivo.[12,13] Although some physical models have been proposed to explain homophilic
interactions in systems composed of two cellular types expressing
different cadherins,[14,15] they cannot explain all aspects
of cellular sorting[16,17] and cannot be easily extended
to the case of many cell types. Thus, the molecular binding code remains
poorly understood and requires further investigation.[18]

Computational methods could in principle
complement the available
experimental data giving an atomic-level description, analogously
to what crystal structures do but also describing the conformational
changes and the fluctuations among multiple states associated with
the molecular recognition between cadherins. The main problem in this
respect is that the system one wishes to simulate, for example that
composed of two pairs of EC1-EC2 domains, has a molecular weight of
∼50 kDa, and thus, it is huge from the point of view of standard
atomistic simulations in explicit solvent.

Coarse-grained models
based on experimental data can be useful
in this context. By describing the protein system in implicit solvent
and giving a united-atom representation of some atomic groups, they
allow computers to sample reasonably fast the conformational space
of the system. For example, a Cα model interacting
with a structure-based potential was used to predict the dimerization
constants both for membrane-bound and freely diffusing cadherins.[19] Similarly, with a simple coarse-grained model,
it was possible to simulate the cooperativity between cis and trans
interactions.[20]

Defining an interaction
potential based on experimental data, within
the framework of the principle of maximum entropy, guarantees
the realism of the model and minimizes the risk of introducing subjective
bias in the description of the system.[21]

In the present work, we employed a coevolutionary interaction
potential,[22] calculated from the set of
homologous sequences
of the cadherin superfamily. In brief, a coevolutionary potential
describes the interaction between amino acids in a protein in such
a way as to predict the correct correlations between mutations in the
alignment of homologs, as obtained from the Pfam database.[23] This kind of modelling has proven efficient
in predicting the native conformation of protein monomers[22] and dimers[24] and their
conformational fluctuations,[25,26] and in studying protein aggregation[27,28] and the effect of mutations on protein stability.[29,30] Importantly, we used the coevolutionary potential as it is, without
filtering it in any way and without including any knowledge of the native
structure of the proteins.

We first showed that the model is
able to reproduce several experimental
data observed for cadherins of different types. Then, we sampled the
conformational space of pairs of cadherins of the same kind and different
kinds to identify the sequence-dependent dimeric conformations that
could be relevant for the mechanism of molecular recognition.
Methods
Protein chains were modelled with a united-atom representation
in an implicit solvent, similar to others commonly employed in the
literature.[31,32] Each amino acid is described[33] by the positions of its N, CA, and C atoms and
by that of another bead, which represents the whole side chain and
is set in the position of its center of mass (see Figure S2 in the Supporting Information). Bond lengths and
angles are kept fixed at the values defined by the initial conformation.

The interaction energy of the system is defined as

[equation not recovered]

where i and j run over all atoms, the function
θ(x) is a step contact function, which takes
the value 1 if x ≥ 0 and 0 if x < 0, RHC is the hard-core radius, the
hard-core energy JHC → +∞, i_s and j_s run over the side-chain atoms, J is the interaction matrix,
and R is the interaction range. We set R = 8.5 Å and RHC = 2 Å for all atoms. The chains are put in
a cubic box with hard walls of volume V.

The
interaction matrix was obtained from a coevolutionary model
corrected by a statistical potential. The coevolutionary model takes
the alignment of homologs and returns a tensor J(σ, τ)
of interaction energies between any residue of type σ at position I and any residue of type τ at position K. This is calculated within the pseudolikelihood approximation[34] using the code described in ref (35) (see also section S1 in the Supporting Information). Specifically,
we obtained the alignment (code PF00028) from the Pfam database,[23] projecting the alignment onto the sequence of
each pdb structure (i.e., removing the sites that correspond to gaps
in the sequence of the pdb protein) and discarding those with similarity
larger than 90%. In this way, we constructed an alignment of 8 ×
10³ sequences.

In the standard procedure,[36] the pseudolikelihood
is maximized under the constraint of an l2 regularizer of the kind α∑(J(σ, τ))², meant to correct finite-size
effects. An important ingredient of the present model is the use of
a different type of l2 regularizer, namely,

[equation not recovered]

where ϵ(σ, τ)
is a statistical potential[37] obtained from
the frequency of contacts in known protein structures. This is a system-independent
potential calculated as

[equation not recovered]

where f(σ,
τ) is the frequency of contacts between the side chains of the
nonredundant set of proteins of the pdb (with a nonredundancy threshold
of 10⁻⁷). The goal of this regularizer is to provide
a priori knowledge of the interactions in the system, especially useful
in the case of amino acids that appear with poor statistics or that
coevolve under the effect of biological constraints other than the
stability of the protein, an effect that would produce spuriously strong
interacting pairs.[38] The regularizer also
avoids the problem of choosing a gauge for the maximum-pseudolikelihood
problem.

After calculating the tensor J(σ, τ)
of interaction
energies between any possible pair of residues σ and τ that
can appear at sites i and j of the
protein, we projected it onto the sequence of interest {σ}, obtaining the matrix J(σ, σ). This
is then normalized by [expression not recovered] to set
the scale of two-body interactions
to 1. At variance with the approach of refs (25) and (27), no filtering is applied
to the energy elements: any pair of residues in contact interacts with the
corresponding matrix element.

The trans interaction between identical proteins
is set using the same parameters as the corresponding cis interaction.

To simulate the rigidifying effect of the Ca²⁺ ion at
the interface between the EC1 and EC2 domains of cadherins, we implemented
an infinite energy well between residues D67 and E101, between E70
and D134, and between R68 and I139.

The last term in the potential
of eq depends on the
dihedrals of the backbone of the protein,
in the form

[equation not recovered]

where ϕ are the Ramachandran dihedrals (i.e., alternatively
φ and ψ), wα and wβ are the sequence-dependent propensities of being in
α and β structures, respectively, as predicted by PsiPred,[39] ϕ0^α and ϕ0^β are the typical dihedrals associated with α (−63°
for φ and −44° for ψ) and β (−105°
for φ and −140° for ψ) structures, and we
set σα = 30° and σβ = 40°.

Thus, the potential
depends on three energy (meta)parameters, namely,
α, ε0, and ε. We explored the space of parameters in the case of small proteins
(see section S2 in the Supporting Information)
and found the optimal values α = 10⁻⁵, ε0 = −1, and ε =
90.

Simulations were performed with a Metropolis Monte Carlo
(MC) algorithm,
using as elementary moves multiple flips, pivots, and roto-translations
of the center of mass of connected systems of chains.[40] To generate the initial conformations for the simulations,
we started from the pdb structure, mapped onto the coarse-grained
model (i.e., removing O, N, and H atoms and placing the side-chain bead
at the center of mass of the side chain); in simulations of the dimer,
we placed the two chains randomly in the box. Then a low-temperature
simulation (T = 10⁻³) was carried
out, mainly to remove the steric clashes associated with the potential
defined by eq . Parallel-tempering
simulations[41] were performed by trying
an exchange between replicas at adjacent temperatures every 1000 MC
steps.

Since protein domains tend to attract each other quite
strongly,
we performed an umbrella sampling[42] to
equilibrate the system more efficiently. The interaction between different
chains is rescaled by a factor k < 1. The correct equilibrium probabilities p(r) of the conformations of the system are then recovered
a posteriori from the simulated probabilities p_k(r) as

[equation not recovered]

where Etrans(r) is the rescaled interaction between
the chains, β = 1/T is the inverse temperature
(we set Boltzmann’s constant to 1, expressing temperatures
in energy units), and the angular brackets with the subscript k in the denominator indicate the average obtained from the
simulation. In this way, we could make the simulation faster, having
a larger k between
the two chains, and recover the correct equilibrium properties a posteriori.

From the simulations of the dimer with rescaled trans interactions,
we calculated the dissociation constants as

[equation not recovered]

where f is
the fraction of dimeric conformations, obtained from the simulation
after rescaling with eq , Na is Avogadro’s number, and V is the volume of the box containing the two molecules. A conformation
is defined as dimeric if there is any attractive trans
contact between any two atoms of the two chains.
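As an illustration of the dimer criterion just stated, the following Python sketch flags a conformation as dimeric and accumulates the fraction f over a set of frames. It is a minimal sketch under stated assumptions, not the code used in this work: the function names, the array layout, and the reading of "attractive" as a negative interaction-matrix element are hypothetical choices; only the interaction range R = 8.5 Å is taken from the text.

```python
import numpy as np

R_CONTACT = 8.5  # interaction range in angstrom, as stated in the Methods


def is_dimeric(beads_a, beads_b, j_trans, r_contact=R_CONTACT):
    """Return True if any bead pair across the two chains is within
    r_contact and has an attractive (negative) trans interaction.

    beads_a: (Na, 3) coordinates of chain A (hypothetical layout)
    beads_b: (Nb, 3) coordinates of chain B
    j_trans: (Na, Nb) trans interaction energies between the beads
    """
    beads_a = np.asarray(beads_a, dtype=float)
    beads_b = np.asarray(beads_b, dtype=float)
    j_trans = np.asarray(j_trans, dtype=float)
    # All pairwise distances between beads of the two chains.
    diff = beads_a[:, None, :] - beads_b[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return bool(np.any((dist <= r_contact) & (j_trans < 0.0)))


def dimer_fraction(frames_a, frames_b, j_trans, weights=None):
    """Fraction of dimeric frames; optional per-frame weights allow
    reweighted (e.g., umbrella-corrected) averages."""
    flags = np.array(
        [is_dimeric(a, b, j_trans) for a, b in zip(frames_a, frames_b)],
        dtype=float,
    )
    return float(np.average(flags, weights=weights))
```

For trajectories generated with rescaled trans interactions, the per-frame weights produced by the a posteriori reweighting would be passed through the optional `weights` argument, so that f refers to the unscaled potential.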
Results
Coevolutionary Potentials Reproduce the Native State of Monomeric EC1
As a first step, we simulated the dynamics of EC1 of
N-, E-, and P-cadherins (residues 1–99) in conditions of infinite
dilution where the protein is monomeric. We used as putative monomeric
reference conformations the crystallographic structures 2qvi (for
N-cadherin), 2o72 (for E-cadherin), and 4zmz (for P-cadherin). At
low temperature, the average RMSD calculated from parallel-tempering
simulations with respect to the crystallographic conformations is
≈0.5 nm for N- and E-cadherins and ≈0.7 nm for P-cadherin
(see Figure a). This
is comparable with that of the proteins previously studied (cf. section S2 in the Supporting Information). The
calculated contact-probability maps are also native-like (cf. Figure f). These two facts
suggest that the minimum requirement for the model to be useful, namely
having the experimental conformations as low-temperature equilibrium
states, is met. It is important to stress that, unlike structure-based
models,[43] here, the model is agnostic of
the native conformation of the proteins. Moreover, the use of the
regularizer given by eq based on a system-independent statistical potential seems important
to obtain a realistic potential since not using it increases markedly
the RMSD of the protein (cf. Figure S3).
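For reference, an RMSD of the kind reported above can be computed as in the following Python sketch. The text does not specify whether the simulated conformations are superposed on the crystal structure before the RMSD is evaluated, nor which atoms are used; the sketch assumes an optimal rigid-body superposition (Kabsch algorithm) of two equally sized coordinate sets, and the function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(coords, ref):
    """RMSD between two (N, 3) coordinate sets after optimal
    rigid-body superposition (translation plus proper rotation)."""
    coords = np.asarray(coords, dtype=float)
    ref = np.asarray(ref, dtype=float)
    # Remove the translation by centering both sets.
    p = coords - coords.mean(axis=0)
    q = ref - ref.mean(axis=0)
    # Kabsch: SVD of the 3x3 covariance matrix P^T Q.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Guard against improper rotations (reflections).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0.0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_rot = p @ rot.T
    return float(np.sqrt(np.mean(np.sum((p_rot - q) ** 2, axis=1))))
```

The result carries the units of the input coordinates, so coordinates in nm give an RMSD directly comparable with the ≈0.5 nm and ≈0.7 nm values quoted in the text.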
Figure 1
Simulations
of monomeric EC1 domains of N-, E-, and P-cadherins.
(a) The average RMSD to the crystallographic structures, (b) the average
energy, and (c) the specific heat with respect to simulation temperature
(in energy units) are displayed for the three monomers. (d) The conformational
fluctuations of the simulated proteins calculated at T = 3.6, compared to the B-factors (dashed line) of the monomeric
N-cadherin 1NCJ. (e) The equilibrium structures obtained from the
simulations in which the thickness of the cartoon reflects the fluctuations
of the corresponding monomers (left side), compared to the crystallographic
structures (right side). (f) The contact probabilities obtained from
the simulations at T = 3.6, compared to the crystallographic
contact maps.
At varying temperature, all three
proteins display a main transition
at temperature T ≈ 5 (in energy units, see Figure c) between a native
and denatured state. The transition, as described by the model, appears
poorly cooperative; although we are not aware of calorimetric studies
of the EC1 domains of cadherins, it is likely that, as in most implicit-solvent
models,[44] this is an artifact associated
with the use of reduced degrees of freedom.

The thermal fluctuations
of the residues display similar patterns
in the three monomeric cadherins (see Figure d), but their relative widths are protein-dependent
(see Figure e): E-cadherin
displays larger fluctuations in the proximal region (i.e., that linked
to EC2 in the full complex), while P-cadherin fluctuates more in the
distal region, and N-cadherin behaves in an intermediate way. The fluctuations
of the residues of N-cadherin display a significant correlation (Pearson’s r = 0.47, p-value < 10⁻⁵) with the B-factors
of its crystallographic structure (see dashed line in Figure d; note that 1NCJ is the only
available structure of a monomeric EC1 domain of a classical cadherin).
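The comparison between simulated fluctuations and crystallographic B-factors can be sketched as follows. This is an illustrative Python example, not the analysis script of this work: the function names and the trajectory layout are hypothetical, the frames are assumed to be already superposed, and it is an assumption that the correlation is taken between per-residue fluctuations and B-factors as plain arrays. The conversion uses the standard isotropic relation B = (8π²/3)⟨Δr²⟩.

```python
import numpy as np


def rmsf(traj):
    """Per-residue root-mean-square fluctuation.

    traj: (n_frames, n_residues, 3) array of superposed coordinates.
    """
    traj = np.asarray(traj, dtype=float)
    mean_pos = traj.mean(axis=0)
    sq_dev = ((traj - mean_pos) ** 2).sum(axis=-1)  # (frames, residues)
    return np.sqrt(sq_dev.mean(axis=0))


def b_factor_from_msf(msf):
    """Isotropic B-factor from a mean-square fluctuation (same length unit squared)."""
    return (8.0 * np.pi ** 2 / 3.0) * np.asarray(msf, dtype=float)


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))
```

A typical use would be `pearson_r(rmsf(traj), b_exp)`, where `b_exp` is a hypothetical array of experimental B-factors with one value per residue, in the same residue order as the trajectory.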
Model Correctly Predicts the Dimeric Structures of EC1 of N-Cadherin
The next step was to simulate two EC1 domains, which is the system
studied in the original work of Shapiro and coworkers.[8] The two chains are put in a spherical box of volume V ≈ 3.2 × 10⁴ nm³, corresponding
to a concentration of ≈100 μM. The fraction fB of conformations displaying interchain contacts is
displayed in Figure f. The experimental kD obtained at room
temperature from analytical size-exclusion chromatography on EC1 alone
is 166 μM,[45] corresponding to fB = 0.3 in the simulation volume.
This allowed us to set the simulation temperature T = 3.6 (in energy units) as that corresponding to room temperature
(see red arrow in Figure f).
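The arithmetic behind this calibration can be checked with a short Python sketch. The conversion from box volume to molar concentration is plain unit bookkeeping (1 nm³ = 10⁻²⁴ L); the interpolation helper and the sample f_B(T) values in the usage note are hypothetical and only illustrate how a target bound fraction could be mapped onto a simulation temperature. The relation between kD and f_B itself depends on the dissociation-constant formula of the Methods and is not reproduced here.

```python
import numpy as np

AVOGADRO = 6.02214076e23   # 1/mol
NM3_TO_LITRE = 1.0e-24     # 1 nm^3 expressed in litres


def molar_concentration(n_molecules, volume_nm3):
    """Molar concentration (mol/L) of n_molecules in a box of volume_nm3."""
    return n_molecules / (AVOGADRO * volume_nm3 * NM3_TO_LITRE)


def temperature_at_fraction(temps, fractions, target):
    """Linearly interpolate the temperature at which the bound fraction
    equals target, assuming the fraction decreases monotonically with T."""
    temps = np.asarray(temps, dtype=float)
    fractions = np.asarray(fractions, dtype=float)
    # np.interp needs increasing x values, so reverse the decreasing curve.
    return float(np.interp(target, fractions[::-1], temps[::-1]))
```

With two chains in V = 3.2 × 10⁴ nm³, `molar_concentration(2, 3.2e4)` gives about 1.04 × 10⁻⁴ M, consistent with the ≈100 μM quoted above. A call such as `temperature_at_fraction([3.0, 4.0, 5.0], [0.5, 0.2, 0.1], 0.3)` uses made-up curve values purely to show the interpolation step.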
Figure 2
Results of simulations of two EC1 domains of N-cadherin at T = 3.6. (a) The calculated average contact map compared
with that of crystal structures 1NCI and 1NCH. The colored boxes highlight
the interchain contacts present in the crystal structures (1NCI in
green and 1NCH in red). Some representative structures obtained from
a clustering of the trajectories are (b) the two disjoint chains,
(c) a conformation resembling the adhesion dimer, (d) a conformation
bound at the ends, and (e) a conformation resembling the domain-swapped
strand dimer. The associated percentages indicate the fraction of
the trajectory in each cluster. (f) The fraction fB of bound monomers as a function of temperature. The
red arrow indicates the experimental value.
The average contact map simulated at T = 3.6 is
displayed in Figure a, together with the contact maps of the two alternative crystallographic
structures found for EC1[8] describing the
domain-swapped “strand dimer” (pdb code 1NCI) and the “adhesion
dimer” (pdb code 1NCH). The binding of the two monomers does not perturb
their internal structure: the intrachain contacts remain the same
as the crystallographic ones, and the average RMSD remains ≈0.5 nm,
as in the monomeric case (cf. Figure a).

The simulations also display several interchain
contacts of varying
stabilities. Two sets of contacts, marked with green and red boxes
in Figure a, correspond
to the contacts of the strand and adhesion dimers, respectively. Another
set of contacts, formed with nonnegligible probability, cannot be
explained by the available crystallographic structures.

The
simulated conformations were clustered based on their mutual
similarity, and the most representative conformations are shown in Figure b–e, together
with the associated probabilities. In approximately one third of the
conformations, the two chains are disjoint (Figure b); in another third, they display a conformation
similar to that of the adhesion dimer (Figure c), making the interchain contacts marked
with red boxes in Figure a. In 13% of the sampled conformations, the tryptophan W2
of each chain is in contact with the hydrophobic pocket of the other
chain (see Figure e and the green-boxed contacts in Figure a). About 11% of the conformations populate
a dimeric conformation (Figure d), which is not similar to any available crystallographic
structure, while the remaining 17% of the conformations populate
dimeric structures that cannot be easily clustered into well-defined
groups.
Simulated EC1-2 Fluctuate among Different Conformations, Including X- and Domain-Swapped Dimers
Simulations of two copies of
the chain composed of the EC1-2 domains were carried out for 10⁹ MC steps for N-, E-, and P-cadherins, rescaling the interchain
interactions as described in the Methods section.
All proteins display a very heterogeneous set of dimeric conformations.
A clustering analysis, whose results are reported in Figure , shows that the three proteins
can assemble in many possible conformations, and among them, there
are conformations resembling the swap-dimer and X-dimer, although with
a probability lower than expected. In several conformations, the two
chains are side-by-side or display interactions between their proximal
ends.
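Because these simulations use rescaled interchain interactions, populations such as the cluster percentages have to be reweighted before they are interpreted. The following Python sketch shows a generic importance reweighting of this kind. It is an assumption-laden illustration rather than the expression used in this work: it takes the sampled potential to be E_intra + k·E_trans and the target potential to be E_intra + E_trans, with `e_trans` denoting the unscaled trans energy of each frame; the function names are hypothetical.

```python
import numpy as np


def reweighting_factors(e_trans, k, temperature):
    """Normalized per-frame weights that convert averages sampled with
    the trans energy scaled by k into averages for the unscaled energy.

    e_trans: unscaled trans energy of each sampled frame
    k: scaling factor used in the simulation (0 < k <= 1)
    temperature: simulation temperature in energy units (k_B = 1)
    """
    e_trans = np.asarray(e_trans, dtype=float)
    beta = 1.0 / temperature
    # Energy missing from the sampled potential is (1 - k) * e_trans.
    log_w = -beta * (1.0 - k) * e_trans
    log_w -= log_w.max()  # avoid overflow in the exponential
    w = np.exp(log_w)
    return w / w.sum()


def reweighted_average(observable, weights):
    """Weighted average of a per-frame observable."""
    return float(np.dot(weights, np.asarray(observable, dtype=float)))
```

Passing a 0/1 cluster-membership array as `observable` gives the reweighted population of that cluster; with k = 1 the weights are uniform and the plain trajectory average is recovered.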
Figure 3
Representative conformations of clusters obtained from the simulation
of the EC1-2 dimer of N-, E-, and P-cadherins. For each representative,
the percentage of conformations populating that cluster is indicated.
The red monomer is oriented to display its distal end upward.
In the case of E-cadherin,[46] the average
contact map is displayed in Figure a. Of the 45 clusters of contacts with probability
larger than 0.1, which are apparent in the contact map, 7 are those
that stabilize the strand dimer (marked with green boxes, cf. also Figure b) and 15 are those
that stabilize the X-dimer (purple boxes, cf. also Figure c). The remaining contacts
cannot be explained by the crystallographic structures but result
from the other conformations displayed in Figure . However, 18 of these unexplained contacts
involve all residues that are known to be associated with mutations
observed in tumoral cells that decrease cell–cell adhesion
and induce metastasis.[4]
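An average contact map of the kind analyzed here can be obtained from a trajectory as in the following Python sketch. It is a minimal illustration under stated assumptions: the trajectory layout and function names are hypothetical, the cutoff is taken equal to the interaction range R = 8.5 Å of the Methods (the text does not state which cutoff defines the plotted contacts), and no reweighting or clustering of contacts is performed.

```python
import numpy as np


def contact_probability(traj, cutoff=8.5):
    """Fraction of frames in which each bead pair is within cutoff.

    traj: (n_frames, n_beads, 3) coordinates of both chains concatenated.
    Returns an (n_beads, n_beads) symmetric matrix.
    """
    traj = np.asarray(traj, dtype=float)
    diff = traj[:, :, None, :] - traj[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)  # (frames, beads, beads)
    return (dist <= cutoff).mean(axis=0)


def interchain_block(prob, n_first):
    """Off-diagonal quadrant holding contacts between the two chains,
    where the first n_first beads belong to the first chain."""
    return prob[:n_first, n_first:]
```

Thresholding the interchain block, for example `interchain_block(prob, n_first) > 0.1`, selects the contacts with probability larger than 0.1 discussed above; grouping them into clusters would require an additional criterion that is not specified in the text.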
Figure 4
(a) Simulated contact
map of E-cadherin; the lower-left and upper-right
quadrants display intrachain contacts, the upper-left and lower-right
quadrants display contacts between the two chains. (b, c) As a reference,
we display the contact maps of the crystal structures of the strand
dimer and X-dimer; their contacts (with probability larger than
0.1) are marked with green and purple boxes, respectively, in the
simulated map. The blue boxes indicate contacts between residues whose
mutation is observed in tumoral tissues. (d) Comparison between the
distribution of distances between residues 73–75 and residues
114–116 calculated from the simulation (in blue) and measured
by DEER.
The contact maps simulated for
P- and N-cadherins display overall
fewer clusters of contacts (cf. Figure S6 in the Supporting Information). N-cadherin has 26 clusters: 6 of
them are associated with the strand dimer, 11 with the X-dimer, and
10 with other conformations that are displayed in Figure . P-cadherin[47] has 21 clusters: 5 of them are associated with the strand
dimer, 11 with the X-dimer, and 6 with other conformations.

For E-cadherin, one can also compare the results of simulations
with those of double electron–electron resonance (DEER), which
is able to measure the distribution of distances between labelled
side chains within 6 nm. E-cadherin labelled at residues 73–75
and 114–116 displays a double peak between 4.0 and 4.5 nm, interpreted
as arising from the strand dimer.[13] Our
simulations display a similar double peak (cf. Figure d), which results from the contributions of all the
conformations displayed in Figure .

On the other hand, the simulated
distances between residues 135
of the two chains match poorly with the results obtained by DEER (cf. Figure S7 in the Supporting Information), most
likely because the angle between the axes of the two chains is strongly
affected by the coarse graining of the model.
Simulations of Heterophilic Complexes
Further simulations
were carried out with pairs of different EC1-2 domains to simulate
heterophilic interactions. The average contact maps of the hybrid
systems are displayed in Figure a–c. The E-N system populates in a detectable
way the swapped dimer and the X-dimer, and the P-N system only the X-dimer,
while E-P populates neither of the two. However, all systems populate multiple
dimeric conformations, most of which are system dependent.
Figure 5
Average contact
maps obtained simulating the EC1-2 domains of hybrid
systems composed of (a) E- and N-cadherins, (b) E- and P-cadherins,
and (c) N- and P-cadherins. The colored boxes indicate the contacts
of the swapped dimer (in green), X-dimer (in purple), and other dimeric
conformations sampled in the simulation (in orange). (d) A comparison
of the sequences of the three cadherins in which identical residues
are marked with a star, and chemically similar residues are marked
with a colon. (e) The percentage of residues associated with the dimeric
structures that are similar in the three proteins. (f) The experimental
and simulated dissociation constants (cf. also Table S1) for the various systems under study.
The residues participating in the dimeric contacts are highlighted
in the alignment displayed in Figure d. It is apparent that residues participating in the
swapped dimer are more conserved than those participating in the X-dimer,
and those participating in the other types of dimers are even less
conserved (cf. also Figure e). As a consequence, the contacts that stabilise these other
types of dimers are more specific than those in swapped and X-dimers.

The dissociation constants obtained from the simulations for the
homophilic and heterophilic cases respect, in most cases, the order
of those obtained experimentally from analytical ultracentrifugation
and surface plasmon resonance,[13] see Figure f. The major difference
is that the N-E complex is very weak (kD > 100 μM) in the simulations, while it should be
of
the same order of magnitude as the P-E complex (∼50 μM).
Discussion
The atomic-scale picture that we have of the
encounter mechanism
between cadherins is essentially based on the crystal structures of
the wild-type and of mutant proteins.[8,9,13,46] The behavior of cadherins
in solution, and even more in vivo, could be more complex than the
static and homogeneous situation observed in crystals. Unfortunately,
for such a large system as an assembly of cadherins, there are few
experimental techniques that can report indirect conformational data
in solution,[13,48−51] leaving open the problem of
turning these data into a structural understanding of their recognition
mechanism.

Simulating the mutual search and binding of multiple
cadherins
with computational techniques can be a way to obtain details that
can complement experimental data and describe all the conformations
involved in the mechanism of molecular recognition at the atomic level.
The main problem in pursuing this approach with standard molecular-dynamics
simulations is that one has to deal with a system that, on the scale
of computer calculations, is large (∼50 kDa for the EC1-2 dimer)
and takes a long time to bind (∼1 s for E-cadherin[48]).

Coarse-grained models, combined with
advanced sampling algorithms,
can be useful to study this kind of system. Using a united-atom representation
in which each amino acid is represented by 4 atoms and the solvent
is treated implicitly, we could sample at equilibrium the conformational
space of two copies of the EC1-2 domains with advanced Monte Carlo
algorithms in a few days of computational time. With this model,
even the simulation of the whole EC1-5 system and of more than two chains,
necessary to account for the cooperativity associated with cadherin
clustering,[20] does not appear to be computationally
unreachable.

The main problem with computational models of biomolecules,
and coarse-grained models in particular, is to build a realistic interaction
potential. A strategy that is gaining popularity is to build potentials
based on available experimental data in the framework of the principle
of the maximum entropy and then to validate the model with independent
data.[52,53]

A particularly abundant set of data available for proteins, and
cadherins in particular, is sequence data in the form of alignments
of homologous protein sequences. Coevolutionary analysis is a way
to extract a contact potential from these data, finding the most likely
potential that could have produced the available alignment as the
result of natural evolution. There are several implementations of
this idea,[26,34] all of them giving comparable
results.[38] Coevolutionary potentials proved
to be useful in predicting the native state of single-domain proteins,[26,34] their energy profile,[25] and protein–protein
interactions,[54] and in studying protein aggregation[27,28] and the thermodynamic effect of point mutations.[29,30]

In the present work, we applied a coevolutionary potential to the
problem of molecular recognition between cadherins. The coevolutionary
potential was corrected with a system-independent statistical potential,
derived from the contact probabilities observed in the whole PDB.
This appears to be an important step because it corrects those terms
of the coevolutionary potential that are associated with poor statistics in
the cadherin alignments and would thus otherwise be affected by large noise.

We validated the model in several ways, also estimating
its limitations. First, we verified that the monomeric system displays
at low temperature a unique native conformation compatible with the
crystallographic one. This was tested for the EC1 domain of three
different cadherins and for two other small proteins used as independent
control. The positive result that we obtained is not straightforward
because, unlike structure-based models widely used to study conformational
changes in proteins,[55] we never used any
information about the native conformation of the system during the
construction of the potential. The accuracy with which we could simulate
the native state of monomers is somewhat worse than the experimental
resolution of X-ray structures, being quantified by an RMSD of the
order of 0.5–0.7 nm. This is due to the united-atom modelling
of amino acids, which is required to speed up the simulation but
does not lead to a perfect packing of side chains.

Moreover, we compared the simulated trajectories with the experimental
B-factors, with the results of analytical size-exclusion chromatography,
with double electron–electron resonance experiments and with
the dissociation constants obtained by analytical ultracentrifugation
and plasmon resonance analysis. Also, the dimeric structures generated
by the simulations were compared to the crystallographic structures
available for the N-, E-, and P-cadherins. Interestingly, in all
three simulations, we find structures similar to the swap-dimer and X-dimer
that were identified in crystals. Since wild-type cadherins crystallize
into swap-dimers, one would have expected this structure to display
the largest population fraction in the simulation. One reason why
this is not the case could be that the swap-dimer is easier to crystallize,
and thus, the experiments select only one of the possible conformers.
Of course, the two-body terms in the potential are the results of
several approximations, and the coarse-graining of the model also
affects the entropy of the system. These are quantities on which
the probability depends exponentially, and they consequently affect considerably
the statistical weight of the sampled conformations.

A way to improve the results, matching better the experimental
knowledge we have of the system, is to include it in the simulation
as an energy term in the framework of the principle of the maximum
entropy,[52,56] as done, for example, in ref (28). A drawback of this strategy
in the case of cadherins is the scarcity of data at the conformational
level and thus the impossibility of validating the results with independent
data.

One can thus wonder whether the heterogeneous set of binding modes
of the EC1-2 dimers observed in the simulations is realistic or just
an artifact of the model. An element pointing toward their realism
is that all pathogenic mutations observed in E-cadherin[4] affect the interface of these conformations,
leading in vivo to a diminished adhesion and increased migration propensity
of the cells. This fact suggests that the dimeric conformers we found
may play a role in the overall binding mechanism.

This fact becomes more evident when simulating heterophilic interactions
in hybrid systems composed of different types of cadherins. Also in
this case, we could observe conformations resembling the swapped and
the X-dimer, in addition to a complex set of other dimeric conformations.
Interestingly, while the residues that stabilize swapped and X-dimers
are quite independent of the kind of cadherin and thus weakly specific,
the other dimers interact through contacts that are much more system
dependent. One can then speculate that the multiplicity of dimeric
conformations different from the swapped and X-dimers plays a role
in the selective molecular recognition between cadherins.
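The sensitivity of the conformer populations to the model energies, noted earlier in this Discussion, can be illustrated with a minimal sketch of Boltzmann weights. The three energies, the conformer labels used as dictionary keys, the degeneracy factors (a crude stand-in for entropy), and the thermal energy of 0.6 kcal/mol are hypothetical numbers chosen only for illustration; they are not outputs of the model.

```python
import math

def populations(energies, kT, degeneracies=None):
    """Boltzmann populations p_i proportional to g_i * exp(-E_i / kT)."""
    if degeneracies is None:
        degeneracies = [1.0] * len(energies)
    e_min = min(energies)  # shift energies for numerical stability
    weights = [g * math.exp(-(e - e_min) / kT)
               for e, g in zip(energies, degeneracies)]
    z = sum(weights)  # partition function over the listed conformers
    return [w / z for w in weights]

kT = 0.6  # kcal/mol, roughly room temperature (assumed)
labels = ["swap-dimer", "X-dimer", "other"]
reference = [-6.0, -6.0, -6.0]  # hypothetical energies, kcal/mol
perturbed = [-5.0, -6.0, -6.0]  # swap-dimer energy off by 1 kcal/mol

for name, energies in [("reference", reference), ("perturbed", perturbed)]:
    p = populations(energies, kT)
    print(name, {lab: round(x, 3) for lab, x in zip(labels, p)})
```

In this toy example, an error of 1 kcal/mol on a single conformer lowers its population from one third to less than 10%, which shows how modest inaccuracies in the potential, or in the entropy, can reorder the relative populations while leaving the set of sampled structures unchanged.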
Conclusions
We proposed a coarse-grained model interacting through a potential
based on the coevolutionary analysis of homologous proteins to study
the molecular recognition between molecules that are too large to
be studied with standard molecular-dynamics simulations. The model
relies on sequence data and is agnostic of the structural properties
of the protein. It was validated by comparing the results of Monte Carlo
simulations with experimental data of various types, giving good agreement
except for the relative populations of the different types of dimeric
structures, which depend exponentially on the energies that define the
model and are thus quite sensitive to them.

A thorough sampling of the conformational space of dimers composed
of pairs of EC1-2 domains of E-, P-, and N-cadherins shows that, besides
the known swapped and X-dimers, the systems populate multiple other
dimeric conformations, which are more sequence-dependent and thus
could play an important role in the selectivity of molecular recognition
between cadherins.