Julian E Fuchs1, Bernd Wellenzohn2, Nils Weskamp2, Klaus R Liedl1. 1. Theoretical Chemistry, Faculty of Chemistry and Pharmacy, University of Innsbruck, Innrain 82, 6020 Innsbruck, Austria. 2. Research Germany/Lead Identification and Optimization Support, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany.
Abstract
Biopharmaceuticals hold great promise for the future of drug discovery. Nevertheless, rational drug design strategies are mainly focused on the discovery of small synthetic molecules. Herein we present matched peptides, an innovative analysis technique for biological data related to peptide and protein sequences. It represents an extension of matched molecular pair analysis toward macromolecular sequence data and allows quantitative predictions of the effect of single amino acid substitutions on the basis of statistical data on known transformations. We demonstrate the application of matched peptides to a data set of major histocompatibility complex class II peptide ligands and discuss the trends captured with respect to classical quantitative structure-activity relationship approaches as well as structural aspects of the investigated protein-peptide interface. We expect our novel readily interpretable tool at the interface of cheminformatics and bioinformatics to support the rational design of biopharmaceuticals and give directions for further development of the presented methodology.
Biopharmaceuticals
are defined as pharmaceutical products consisting
of (glyco)proteins and/or nucleic acids.[1] Therefore, this class of drugs mainly comprises peptide hormones,
recombinant proteins, monoclonal antibodies, and therapeutic antibodies.
Biopharmaceuticals allow access to new target classes and are therefore
considered more innovative than small-molecule drugs.[2] Accordingly, a record number of 11 new biopharmaceuticals
were approved by the FDA in 2014.[3] Therefore,
biopharmaceuticals hold promise to claim a larger share of the drug
market in the future.[4] Additionally, biosimilars
are increasingly entering the market after patent expiry of original
biopharmaceutical products.[5]

Biopharmaceuticals
generally pose new challenges for the drug discovery
process, which has historically been focused on small molecules. This
includes their analytical characterization,[6] delivery and formulation[7,8] after optimization of
the biotechnological production process,[9,10] and their
molecular properties.[11] Computational modeling
techniques hold great promise to handle the complexity of the generated
data and, for example, to guide affinity optimization of therapeutic
proteins[12] or peptides.[13,14] Peptide drugs are often considered as the border between small-molecule
drugs and biopharmaceuticals, as their synthesis is mainly chemistry-driven.[15]

Traditionally, quantitative structure–activity
relationship
(QSAR) and quantitative structure–property relationship (QSPR)
modeling approaches neglect the three-dimensional (3D) structure of
the peptides and proteins and are thus 2D-based. Nevertheless, approaches
using 3D interaction fields[16] or comparative
modeling techniques have been described.[17] These 3D techniques have to cover the bioactive conformation of
the usually highly flexible peptide ligands, which poses additional
challenges for modeling.[18] In a pioneering
study, Sneath derived the first molecular descriptors for the 20 natural
amino acids and applied them in QSAR modeling.[19] Later, these 2D descriptors were refined to capture chemically
intuitive information via the Z-scale model[20] or the isotropic surface area/electronic charge
index (ISA/ECI) model.[21] In contrast to
substitution matrices frequently applied in bioinformatics (e.g.,
PAM,[22] BLOSUM[23]), these descriptors are designed to reflect chemical in contrast
to evolutionary similarity. Amino acid descriptors have typically
been used to derive QSAR equations by linear regression techniques.[24]

Over the past decade, the innovative cheminformatic
concept of
“matched molecular pair analysis”[25] has been gaining increasing attention. In this approach, pairs of
molecules with a single difference in chemical structure are analyzed
with respect to changes in a physicochemical or biological property.[26] Data mining in large databases (e.g., bioactivities
stored in ChEMBL[27] or in-house data sets[28]) allows trends from matched molecular pairs
or matched molecular series to be applied subsequently for prediction
of substitution effects in new molecules.[29] A key advantage of matched molecular pair analysis is the direct
chemical interpretability of predictions (“white box”)
based on local SAR rules.[30] Recently, efforts
have been made to put purely ligand-based matched molecular pairs
into structural context and thereby identify the structural background
of observed bioactivity trends.[31,32]

Herein we expand
the scope of matched molecular pairs to the analysis
of macromolecular data from proteins and peptides and introduce matched
peptides, a concept we expect to hold great promise for the development
of biopharmaceuticals. As an example application, we investigate peptide
binding to the major histocompatibility complex class II (MHC II),
a surface receptor crucial for T-cell activation in immune response.[33,34] A crystal structure of the receptor shows that the peptide is bound
to a hydrophobic surface groove that is flanked by two α-helices.[35] Despite the availability of structural information,
most modeling approaches aiming at the prediction of peptide binding
to MHC molecules employ machine learning techniques,[36] e.g., MULTIPRED.[37] Large peptide
data sets have been compiled and used for the optimization of consensus
approaches based on machine learning methods.[38] Application of these techniques allows for the optimization of peptides
with desired immunological properties.[39] Quantitative modeling techniques are rarely applied toward MHC binding
but include classical amino acid-descriptor-based QSAR methods[40] as well as molecular dynamics simulation approaches.[41] Here we apply the novel matched peptides strategy
to the prediction of MHC II binding affinities and demonstrate the
direct interpretability of the predictions in a structural context.
Methods
Matched Molecular Pairs and Matched Peptides
Identification
of matched molecular pairs involves an exhaustive pairwise matching
of molecular graphs. To simplify this task, molecules are usually
fragmented to aid the search for corresponding substructures.[42] Older implementations additionally required
a definition of the allowed transformations within matched pairs,[43] thus prohibiting the identification of unknown
chemical modifications associated with a change in the molecular property
under investigation.

In the context of peptide and protein data,
a molecular transformation corresponds to a point mutation. Therefore,
sequences differing by a single character correspond to matched peptides.
For identification of these single substitutions, a pairwise sequence
alignment that can be performed by standard bioinformatics methodologies
is required. Sequence alignment for matched peptides is trivial since
they consistently differ by a single amino acid. This sequence alignment
step simplifies the graph matching problem for small molecules described
above, since linear peptides have the advantage of having defined
C- and N-termini as well as identical chemical backbones (see Figure 1). Furthermore, insertions
and deletions between sequence pairs can be considered as trivial
additions to the exchange of single amino acids and thus may also
be involved as additional transformations in matched peptides. Matched
peptides therefore represent a special case of matched molecular pairs
that are easy to detect on the sequence level.
Figure 1
Matched molecular pairs
and matched peptides. The correspondence
of matched molecular pairs to matched peptides is exemplified by the
chemical transformation of benzene to toluene by exchange of a hydrogen
for a methyl group (orange). The same transformation is involved when
an alanine-glycine-alanine tripeptide is exchanged with alanine-alanine-alanine.
The latter transformation may be easily encoded when using standard
one-letter amino acid codes. This representation allows for fast processing
of large databases connecting sequences with respective molecular
properties.
Code implementation was
performed using standard Python tools in
combination with a custom node in KNIME[44] aiming to identify all sequence pairs differing in a single sequence
position and thus forming matched peptide pairs. Since all of the
sequences analyzed in the current study were point mutations relative
to a consensus sequence and had a constant length, no gaps occurred,
and the presented analysis is therefore independent of gap penalties
in the alignment step.

Affinity differences observed for matched
peptides were aggregated
over all peptide positions, assuming that the effects of amino acid
exchanges are independent of their position and thus reflect an average
of all binding-site environments. We will demonstrate in the Results and Discussion section that this assumption
is in general valid for MHC II binders and also discuss ways to cover
position-specific aspects in matched peptide analysis.
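The pairing and pooling steps described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the KNIME node used in the study: the function names, the alphabetical canonical direction, and the toy sequences with their pIC50 values are all hypothetical. Signed differences are pooled per residue exchange over all positions, and the exchange is then oriented so that its mean is an affinity gain.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def single_substitution(seq_a, seq_b):
    """Return (index, residue_a, residue_b) if two equal-length sequences
    differ at exactly one position, otherwise None."""
    if len(seq_a) != len(seq_b):
        return None  # insertions/deletions are out of scope in this sketch
    diffs = [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]
    return diffs[0] if len(diffs) == 1 else None


def aggregate(affinities):
    """affinities: dict mapping sequence -> pIC50 (hypothetical input format).
    Returns {(from_residue, to_residue): (n_pairs, mean_gain, sem)}."""
    pooled = defaultdict(list)
    for (seq_a, p_a), (seq_b, p_b) in combinations(affinities.items(), 2):
        hit = single_substitution(seq_a, seq_b)
        if hit is None:
            continue
        _, res_a, res_b = hit
        delta = p_b - p_a                  # effect of res_a -> res_b
        if res_a > res_b:                  # canonical (alphabetical) direction
            res_a, res_b, delta = res_b, res_a, -delta
        pooled[(res_a, res_b)].append(delta)

    summary = {}
    for (res_a, res_b), deltas in pooled.items():
        avg = mean(deltas)
        sem = stdev(deltas) / sqrt(len(deltas)) if len(deltas) > 1 else float("nan")
        if avg < 0:                        # orient so the mean reflects a gain
            res_a, res_b, avg = res_b, res_a, -avg
        summary[(res_a, res_b)] = (len(deltas), avg, sem)
    return summary


# Toy data: invented sequences and pIC50 values, for illustration only.
toy = {
    "AAYAAAA": 7.0,
    "AAYPAAA": 6.4,
    "AAYCAAA": 7.2,
    "AAYAPAA": 6.5,
    "AAYACAA": 7.1,
}
summary = aggregate(toy)
```

Pooling the signed differences before orienting the transformation keeps near-zero averages possible; orienting each pair individually would turn the mean into an average of absolute values.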
Analyzed Data Set
We analyzed experimental binding
affinities for a panel of 198 peptides toward MHC II molecules from
Marshall et al.[45] Fluorescence-based assays
were applied to obtain IC50 values in the nano- and subnanomolar
range for all of the peptides using a 12-point inhibition curve for
binding with three replicates each. Peptide sequences were grouped
around the template peptide sequence AAYAAAAAAAAAA, where the central
11 amino acids of the peptide with length 13 were varied. For positions
2 and 4 to 12, all 20 natural amino acids were tested, while only
seven apolar amino acids (F, I, L, M, V, W, Y) were tested for position
3. All of the sequences represent single-point mutations around the
template sequence. Additionally, the length of the peptide (13 amino
acids) was kept constant, and thus, no insertions or deletions were
present among the matched peptide sequences. All of the presented
analyses are based on negative decadic logarithms of reported IC50 values and their ratios based on molar units.
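As a brief illustration of the units used throughout, the following sketch converts molar IC50 values to negative decadic logarithms and translates a difference in log units back into a fold change. The two IC50 values are invented for illustration and the helper names are hypothetical.

```python
from math import log10


def pic50(ic50_molar):
    """Negative decadic logarithm of an IC50 given in mol/L."""
    return -log10(ic50_molar)


def fold_change(delta_log_units):
    """Linear-scale affinity ratio corresponding to a pIC50 difference."""
    return 10 ** delta_log_units


# Hypothetical IC50 values in mol/L (not taken from the data set).
ic50_weak = 2.0e-9
ic50_strong = 4.0e-10

# The pIC50 difference equals the log of the IC50 ratio.
delta = pic50(ic50_strong) - pic50(ic50_weak)
ratio = fold_change(delta)
```

With this convention, a difference of 0.715 log units corresponds to a little over fivefold stronger binding, and 0.6 log units to roughly fourfold.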
Statistical Framework
On the basis of statistical analysis
of bioactivity data from ChEMBL, the expected standard deviation of
the IC50 data from a homogeneous source is 0.2 log units.[46] Consequently, an average effect size of at least
0.20 log units establishes statistical significance versus the null
hypothesis (no change in activity) at the p = 0.05
level with at least 10 matched molecular pairs or matched peptides.[47] The term “significant transformations”
used later on therefore refers explicitly to those amino acid substitutions
associated with an effect on the binding affinity that is statistically
significantly different from zero.
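The text does not spell out how the 0.20 log unit threshold follows from a standard deviation of 0.2 log units. One plausible reading, sketched below, assumes independent measurement errors for the two peptides of a pair and a two-sided one-sample t-test on the mean difference; the critical value 2.262 is the tabulated 5% two-sided value of Student's t distribution with 9 degrees of freedom. Both the test choice and the variable names are assumptions.

```python
from math import sqrt

SIGMA_SINGLE = 0.2   # SD of one log-scale IC50 measurement (from the text)
N_PAIRS = 10         # minimum number of matched pairs (from the text)
T_CRIT_DF9 = 2.262   # tabulated two-sided 5% critical value, 9 d.o.f. (assumed test)

# A difference of two independent measurements has SD sqrt(2) * sigma.
sigma_delta = sqrt(2) * SIGMA_SINGLE

# Standard error of the mean difference over N_PAIRS pairs.
sem = sigma_delta / sqrt(N_PAIRS)

# Smallest mean difference that is significant under these assumptions.
threshold = T_CRIT_DF9 * sem
```

Under these assumptions the threshold comes out at about 0.20 log units, which is consistent with the value quoted above.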
Correlation to Amino Acid Descriptors
We correlated
trends in bioactivities (affinity shifts) observed from analysis of
matched peptides to differences in amino acid properties. To this end,
we employed three descriptors from the Z-scale approach
describing hydrophilicity (z1), steric
bulk (z2), and electronic properties (z3).[20] Furthermore,
we analyzed correlations to differences in the isotropic surface area
(ISA) and the electronic charge index (ECI).[21] Correlations between activity differences and property differences
were assessed via calculation of Pearson’s linear correlation
coefficient r and Spearman’s rank correlation
coefficient ρ to capture both quantitative and qualitative dependences.
Statistical analyses were performed using R.[48]
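For readers without R at hand, the two coefficients can be reproduced with a few lines of plain Python. This is a swapped-in reimplementation for illustration, not the R code used in the study; the descriptor differences and affinity differences below are invented, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's linear correlation coefficient r (inputs must not be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented example: descriptor differences vs. affinity differences.
descriptor_diff = [1.0, 2.0, 3.0, 4.0, 5.0]
affinity_diff = [-0.1, -0.2, -0.2, -0.5, -1.5]

r = pearson(descriptor_diff, affinity_diff)
rho = spearman(descriptor_diff, affinity_diff)
```

Reporting both values is useful because r is sensitive to the size of outlying effects, whereas ρ only reflects the ordering.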
Structural Interpretation and Correlation to Peptide Specificity
To interpret the bioactivity data in structural context, we compared
the observed trends to a cocrystal structure of an MHC II in complex
with an antigenic peptide (PDB entry 1AQD[49]). We visualized
the structure in PyMOL[50] and extracted
polar contacts as well as electrostatic properties of the binding-site
region using default settings.

The specificities of the respective
MHC II binding-site regions were assessed on the basis of binding
affinity distributions for single amino acids. To this end, we converted
the affinity ratios to decadic log units and analyzed the distribution
of binding affinities for each single site. In the case of a highly
specific region, major differences in binding affinity are expected,
corresponding to a narrow peak in the distribution. On the contrary,
a completely nonspecific position shows an equal distribution of binding
affinities. Such experimental distributions can be converted to single
values depicting local specificity via an information-entropy-based
approach, as demonstrated earlier for amino acid distributions in
protease substrates.[51] Thereby, an entropy
of 0 corresponds to the highest specificity, whereas a value of 1
corresponds to maximum binding promiscuity with constant binding affinities.
All of the peptide residues except for position 3, where only seven
of the 20 amino acids were tested, were examined individually.
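A normalized Shannon entropy with the stated limits (0 for maximum specificity, 1 for constant binding) can be sketched as follows. How the measured affinities are turned into non-negative per-residue weights is not specified in this section, so the weights below are an assumption, as is the function name; the logarithm base is set to the number of residues so that the maximum is 1.

```python
from math import log


def positional_entropy(weights):
    """Normalized Shannon entropy of non-negative per-residue weights.

    Returns 0 when one residue carries all the weight (highest specificity)
    and 1 when all residues carry the same weight (maximum promiscuity).
    """
    n = len(weights)
    total = sum(weights)
    if n < 2 or total <= 0 or any(w < 0 for w in weights):
        raise ValueError("need at least two non-negative weights with a positive sum")
    probabilities = [w / total for w in weights]
    # Terms with p = 0 contribute nothing (limit of p * log p is 0).
    return -sum(p * log(p, n) for p in probabilities if p > 0)


# Hypothetical weights for the 20 natural amino acids at one position.
uniform = [1.0] * 20              # identical binding for every residue
selective = [1.0] + [0.0] * 19    # only one residue binds at all
mixed = [4.0] * 5 + [1.0] * 15    # five residues bind more strongly

entropy_uniform = positional_entropy(uniform)
entropy_selective = positional_entropy(selective)
entropy_mixed = positional_entropy(mixed)
```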
Results and Discussion
Trends in Binding Affinity from Matched Peptide Analysis
Applying matched peptides, we extracted information
on quantitative
changes in MHC II binding affinity induced by single-point mutations.
On the basis of 198 peptide sequences and their respective experimental
binding affinities, we extracted trends on how amino acid substitutions
increase and decrease molecular interactions via matched peptide analysis.
In total we extracted 2117 matched peptides that formed the basis
of the statistical evaluation. The order of identified matched pairs
was normalized to consistently reflect gains in affinity.

The
amino acid substitutions with the strongest effects on the observed
binding affinities are summarized in Table 1. Overall, for 88 of 190 transformations
(46%) we observed a significant change in binding affinity. The strongest
effect was achieved by a replacement of proline by cysteine, leading
to a gain of 0.715 log units, which corresponds to 5 times stronger
binding. The standard error of the mean (SEM) observed over 10 examples
for this transformation was 0.291 log units, which is much smaller
than the average effect size. This indicates that replacement of proline
by cysteine indeed leads to an increase in binding affinity largely
independent of the peptide position where the transition occurs.
Table 1. Transformations with Major Effects on the MHC II Binding Affinity (a)

transformation   pairs   mean affinity difference [log units]   SEM [log units]
P → C            10      0.715                                  0.291
P → Y            10      0.672                                  0.329
P → L            10      0.624                                  0.354
P → M            10      0.620                                  0.320
D → C            10      0.606                                  0.147
P → S            10      0.606                                  0.332
P → N            10      0.596                                  0.366
P → V            10      0.594                                  0.339
P → A            19      0.587                                  0.215
K → C            11      0.585                                  0.250

(a) Matched peptides were used to
extract the 10 substitutions leading to the largest changes in affinity.
These transformations were normalized to reflect affinity increases
and sorted according to decreasing effect size; the standard error
of the mean (SEM) is indicated as a measure of statistical uncertainty.
Removal of proline residues increases the binding strength to MHC
II molecules, as does the removal of charged residues. Likewise,
inclusion of cysteine residues increases the binding affinity to the
receptor.
The presence of several additional
substitutions of proline among the transformations
with the strongest effects on binding affinity indicates that this
residue is in general detrimental to MHC II binding. Replacement by
smaller residues is favored, and especially the inclusion of cysteine
residues leads to major gains in binding affinity. Additionally, replacement
of charged residues is associated with gains in binding affinity.
Within the top 10 transformations we found the substitution of aspartate
and lysine by cysteine, both of which led to an affinity gain of approximately
0.6 log units, corresponding to a factor of 4 on a linear scale. The
aspartate to cysteine transformation is associated with a particularly
small SEM of 0.147, indicating particularly conserved effects over
the whole binding-site region.

In addition to 88 transformations
with significant effects on the
binding strength, we characterized 102 transformations associated
with only minor changes in MHC II binding. Here we observed the absence
of amino acids described to be associated with particularly weak or
strong binding. Therefore, the frequency of cysteine residues and
charged residues within these transformations was reduced or those
residues were completely missing. We found several transformations
involving small amino acids that appear to be readily interchangeable
within MHC II binding peptides (see Table 2). This behavior is illustrated by the substitution
of valine by isoleucine, which is associated with a mean affinity
difference smaller than 0.001 log units as well as a small SEM of
0.059 log units over 11 peptide pairs. This indicates that the subtle
transformation involving an addition of a methylene group in the side
chain does not alter the binding constant independent of the position
of the exchanged amino acid. Conversely, some substitutions involving
major chemical changes do not affect the MHC II binding affinities
significantly. This includes, for example, the substitution of a small
glycine residue by an aromatic histidine residue, which points to
the minor importance of residue size in MHC II binding. On average,
this transformation is associated with an affinity gain of less than
0.01 log units. The larger SEM of 0.154 log units over 10 pairs indicates
some dependence on the position of the transformation in this case.
Table 2. Transformations with Little Effect on Binding Affinity (a)

transformation   pairs   mean affinity difference [log units]   SEM [log units]
V → I            11      <0.001                                 0.059
T → H            10      0.001                                  0.129
V → N            10      0.002                                  0.107
I → W            11      0.003                                  0.114
L → F            11      0.003                                  0.173
V → W            11      0.003                                  0.128
G → T            10      0.008                                  0.124
A → V            19      0.008                                  0.073
G → H            10      0.009                                  0.154
N → S            10      0.010                                  0.133

(a) Matched peptides
were used to
search for amino acid substitutions with the smallest changes in experimentally
measured binding affinity to MHC II. The top 10 transformations were
sorted according to increasing effect on the binding affinity. Statistical
uncertainty is shown by the SEM. The complete absence of proline residues
and charged amino acids indicates their major impact on the observed
binding affinities to MHC II. The smallest effect is observed for
the replacement of a valine by an isoleucine, corresponding to the
addition of a methylene group.
On the basis of the same matched peptide analysis, we aimed to
identify the most favored and unfavored residues in MHC II binding
peptides. Since the peptide transformations have been arranged to
reflect gains in binding affinity, we counted the occurrence of all
20 amino acids on the left side of the transformation (N_left, lower affinity) and on the right side (N_right, higher affinity) among all 88 transformations showing
a significant change in binding affinity. The differences in these
occurrences (N_right − N_left), which are given in Figure 2A, enable qualitative identification of favorable
and unfavorable amino acids. We found six amino acids to be mainly
disfavored in MHC II binding peptides: proline, all four charged amino
acids (aspartate, glutamate, lysine, and arginine), and the polar
amino acid glutamine. Glutamine shows a smaller negative effect (a
total of five pairs with decreased affinity) than the other five amino
acids, all of which exhibit very similar disruptive effects (a total
of 14 or 15 pairs with decreased affinity). On the other end of the
amino acid ranking, tyrosine was found to enhance the MHC II binding
affinity in a total of nine significant peptide pairs, followed by
cysteine, which was identified as a favorable replacement in a total
of eight cases. Mostly small amino acids follow in the ranking, including
alanine, methionine, asparagine, and serine. The difference in size
might also explain the marked difference observed in comparisons of
peptides containing asparagine versus glutamine. The smaller asparagine
appears to be favorable for MHC II binding (+6 pairs), whereas glutamine
is unfavorable (−5 pairs).
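The counting scheme behind this ranking is simple enough to sketch directly. The example below uses only the ten transformations listed in Table 1 as input, not the full set of 88 significant transformations, so the resulting numbers illustrate the bookkeeping rather than reproduce the figure; the function name is hypothetical.

```python
from collections import Counter

# (left, right) residue pairs, oriented so that the right side binds more
# strongly. These are the ten rows of Table 1 only.
table1_transformations = [
    ("P", "C"), ("P", "Y"), ("P", "L"), ("P", "M"), ("D", "C"),
    ("P", "S"), ("P", "N"), ("P", "V"), ("P", "A"), ("K", "C"),
]


def net_preference(transformations):
    """Return {residue: N_right - N_left} over the given transformations."""
    n_left = Counter(left for left, _ in transformations)
    n_right = Counter(right for _, right in transformations)
    residues = set(n_left) | set(n_right)
    return {aa: n_right[aa] - n_left[aa] for aa in residues}


balance = net_preference(table1_transformations)

# Residues sorted from most disfavored to most favored.
ranking = sorted(balance, key=lambda aa: (balance[aa], aa))
```

Negative values mark residues whose removal tends to increase affinity, positive values mark residues whose introduction does.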
Figure 2
Amino acids favored and disfavored in
MHC II binding. On the basis
of matched peptides, we identified amino acids frequently associated
with a loss in binding affinity. (A) Differences in the number of
significant matched peptides leading to a gain versus a loss of affinity
in MHC II binding. (B) Absolute average differences in bioactivity
when exchanging an amino acid with any other natural amino acid. Proline
residues as well as charged amino acids are strongly disfavored in
MHC II binding. On the other end of the spectrum, cysteine and tyrosine
residues enhance peptide–MHC II interactions.
Quantitative Effects of Amino Acid Exchanges
A similar
analysis can be performed on the basis of the average effect size
rather than the occurrence of amino acids on each side of the transformation.
Here, all of the transformations, including those with insignificant
effects on the binding affinity, were analyzed to yield a quantitative
ranking of amino acid contributions to MHC II binding affinities (see Figure 2B). Consistent with
the other presented analyses, proline is associated with a major decrease
in MHC II binding affinity representing the strongest effect observed
within the data set. On average, 0.474 log units can be gained by
replacement of a proline with any other natural amino acid. Replacement
of any of the charged amino acids leads to a gain of between 0.31 and 0.36
log units, thus roughly doubling the binding affinity. At the other end of
the spectrum, the introduction of cysteine and tyrosine residues is
favored and leads to a gain in affinity of 0.26 to 0.28 log units.
Several mainly small and hydrophobic residues are slightly favored
and show affinity increases of around 0.1 log units on average. Hydrophobic
residues have been described to drive association of protein–protein
interactions in general.[52] As shown by
statistical analysis of crystal structure data, aliphatic amino acids
and tyrosine residues predominantly form cores of protein–protein
interaction areas.[53]

To allow a comparison
with a conventional peptide analysis method, we divided our data set
of 198 peptides into two halves according to higher and lower binding
affinity to MHC II and analyzed enriched amino acids among both sets.
We report residues with an enrichment factor larger than 1.5 in decreasing
order of their enrichment. We found that hydrophobic cysteine, methionine,
isoleucine, valine, and phenylalanine are enriched among high-activity
binders. Low-affinity binders on the other hand show a disproportionately
high content of aspartate, glutamate, proline, arginine, and tryptophan.
The observed trends for favored and disfavored residues are similar
to those from the pairwise comparisons conducted for matched peptide
analysis. Nevertheless, the results from matched peptide analysis
provide deeper insights since experimental uncertainty can be handled
directly and no classification into affinity classes via an arbitrary
cutoff is required.
Correlation to Amino Acid Descriptors
As we observed
clear correlations between affinity differences and chemical properties
of amino acids, we tested the performance of classical QSAR amino
acid descriptors in reproducing those. Therefore, we calculated property
differences for all of the transformations in five different dimensions
based on z-scales (three descriptors) as well as
the ISA/ECI scheme (two descriptors). Then we correlated the property
differences to the experimentally measured affinity differences identified
via matched peptide analysis and analyzed them using Pearson’s
linear correlation coefficient and Spearman’s rank correlation
coefficient.

We observed weak correlations between individual
descriptor differences of substituted amino acids and the associated
bioactivity changes of 190 matched peptide pairs (see Figure S1 for correlation plots). The most pronounced
correlation was identified for the z2 axis
representing residue size. Here we found an inverse correlation (r = −0.33, ρ = −0.32), indicating that
a reduction in residue size is associated with an increase in binding
affinity. This index is closely followed by the ECI, which designates
both positively and negatively charged residues with large values
and therefore indicates polarity (r = −0.31,
ρ = −0.31). The observed inverse correlation indicates
that a reduction in polarity is favored in MHC II binding. The third
axis contributing to binding affinity is the z1 descriptor that reflects hydrophilicity. We again observed
an inverse correlation (r = −0.29, ρ
= −0.27) and conclude in agreement with the ECI results that
a reduction in polarity favors MHC II binding. The other two descriptors
(z3 and ISA) show correlation coefficients
with magnitudes of at most |r| = 0.1. Thus, electronic
properties and residue surface area were found to be less important
in MHC II recognition. The lack of correlation to the surface area
appears surprising since a reduction in residue size was identified
as favorable. We attribute this seeming contradiction to the inclusion
of solvation factors in the ISA calculation, which leads to the lowest
ISA values for asparagine and aspartate even though these amino acids
have a larger molecular weight than, for example, glycine.

As
these analyses included proline residues in the data set, we
wondered whether removal of this residue with a different backbone,
and thus a refined depiction of the side-chain properties, would increase
the observed correlations. Therefore, we repeated the correlation
analysis covering only the 171 transformations not involving proline. We found
that the magnitudes of all of the correlation coefficients consistently
increased, pointing toward the special status of this amino acid.
The correlation between the affinity change and the difference in
ECI was strengthened from r = −0.31 to r = −0.49 upon removal of the uncharged but still
disfavored proline residue. Similarly, the correlation of z1 differences to the binding affinity changes
strengthened from r = −0.29 to r = −0.40 when proline was excluded. This points to the special importance
of peptide backbone hydrogen bonding in MHC II interactions.
Structural Analysis of the MHC II Receptor–Peptide Complex
To
interpret these SAR trends in a structural context, we investigated
a cocrystal structure of the MHC II receptor with a cognate peptide
of length 15 amino acids.[49] The peptide
adopts an extended conformation and is tightly bound to the receptor
via 14 hydrogen bonds of the backbone carbonyls and amides (see Figure 3A). This large number
of interactions indicates the importance of backbone hydrogen bonding
in receptor binding and therefore explains the observed trend that
removal of prolines, which lack the backbone NH capable of hydrogen-bond
formation, leads to stronger binding. In contrast, amino acid side
chains are involved in fewer interactions with the receptor than the
peptide backbone. Most of the side chains of the bound peptide are
bound to shallow pockets rather than to pronounced cavities and show
limited contact area. This explains the lack of particular amino acids
being favored as a result of the absence of polar interactions with
the receptor. This is further underlined by the absence of charged
regions in the binding site (see Figure 3B). The binding-site region for side chains
is mostly a flat solvent-exposed surface patch that therefore can
bind several amino acids. The hydrophobicity of a large part of the
binding site additionally leads to a preference for apolar amino acids
due to energy gains by desolvation.[54] Since
such hydrophobic and solvent-exposed interfaces are expected to be
multispecific,[55] the MHC II may perform
its key function in immunology to recognize and present diverse peptides
from pathogens.[56]
Figure 3
Structural interpretation
of MHC II–peptide interactions.
On the basis of the cocrystal structure of MHC II and a high-affinity
peptide ligand,[49] we analyzed molecular
interactions. (A) The peptide ligand (sticks in elemental colors with
carbon in gray) is bound to a broad surface groove of the MHC II (green
cartoon and lines with semitransparent surface). Hydrogen bonding
(yellow dots) is mainly observed via the peptide backbone. (B) An
electrostatic map of the MHC II binding site is shown (blue, positively
charged; white, neutral; red, negatively charged). The peptide is
bound to a predominantly neutral region of the binding site. Additionally,
most of the amino acid side chains are bound to solvent-exposed regions
on the surface. These binding-site properties explain the promiscuity
of MHC II, which is crucial for its immunological function.
Promiscuity in MHC II Receptor–Peptide Interactions
This promiscuous binding of peptides is also
reflected in our data
set, where the range of measured binding affinities is approximately
3 log units from the most tightly bound peptide to the weakest binder
(see Figure S2 for affinity distributions).
Many amino acid exchanges show negligible effects on the observed
binding constant, in agreement with promiscuous binding. With the
assumption of a conserved position of the peptide termini, which might
be provided via the central binding register of approximately nine
residues in MHC II,[57] the specificity for
each of the respective peptide positions can be quantified as information
entropy from the distribution of binding affinities for single-point
mutants. An entropy of 0 depicts the highest specificity and thus
affinity for only a single amino acid, while an entropy of 1 corresponds
to completely unspecific binding with constant affinities for all
binding partners.[51]We found all
of the peptide positions to be predominantly unspecific, with information
entropies ranging from 0.83 for position 11 to 0.97 for position 10.
At the most specific peptide position 11, seven amino acids (D, E,
F, I, K, R, and Y) show binding affinities differing from the most
favorable amino acid, cysteine, by at least 1 log unit. On the contrary,
for the almost completely unspecific position 10 all of the binding
affinities lie within 0.8 log units, with tyrosine as the most favored
residue. The latter case is typical of MHC
II peptide recognition, where seven of the 10 investigated pockets
show an information entropy larger than 0.9. On the other hand, more
specific positions coincide with classical MHC II specificity sites
4, 6, and 9,[58] which show information entropies
of 0.90, 0.88, and 0.91, respectively. The overall promiscuity is
also reflected by a comparison of activity rankings in respective
binding pockets, an approach similar to a matched series. We found
only weak correlations between subpocket profiles, with Spearman rank
correlation coefficients between −0.32 and +0.57. In fact,
32% of the affinity profiles appear anticorrelated if the inherent experimental
error is neglected, which we expect to limit the applicability of
this approach given the narrow affinity ranges within the data set.
The overall promiscuity shows the applicability of the matched
pair approach, which treats all amino acid exchanges equally and does
not include the specific position. This inherent limitation of the
approach could be overcome by the use of context-specific matched
peptides in analogy to context-specific matched pairs, where the chemical
environment of the transformation is included.[59] In the standard implementation of matched pairs and peptides,
transformations showing context-specific effects exhibit
higher standard errors. This can be seen in our data set for the proline
to arginine transformation, which shows a weak average gain in binding
affinity of 0.158 log units in 10 pairs. The standard error of the
mean for this transformation is 0.400 log units, showing that the
same transformation is favorable for binding in four cases and unfavorable
in six. Here the position of the transformation and therefore the
chemical context are crucial for the observed effect, although MHC
II is overall highly unspecific. The strongest effect for the proline
to arginine transformation is observed at position 4, where the binding
affinity is increased by more than 3 orders of magnitude. This represents
the most dramatic effect of a single substitution; at this position
proline appears to be especially disfavored compared with
all other amino acids (see Figure 4). Position 6 is the second peptide position where
the presence of proline is strongly disfavored. Here replacement by
asparagine yields a gain in affinity of 2.6 log units. In terms of
context-specific matched peptides, positions 4 and 6 show the strongest
affinity differences in the analyzed data set. We expect that context-specific
matched pairs are especially required in order to reliably predict
changes in binding affinities to specific receptors. Physico-chemical
properties like solubility, on the contrary, are expected to be more
independent of the chemical surroundings and thus easier to capture
using standard matched peptides.
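The position-specific information entropy used above can be illustrated with a minimal sketch. It assumes, as one possible reading of the procedure, that the affinities of all single-point mutants at a position are given as non-negative linear-scale values (e.g., association constants), that they are normalized to sum to 1, and that the Shannon entropy is divided by the logarithm of the number of tested residues so that it ranges from 0 to 1. The function name and the normalization are illustrative assumptions and not necessarily the exact protocol of ref 51.

```python
import math


def normalized_entropy(affinities):
    """Shannon entropy of an affinity profile, scaled to the range [0, 1].

    `affinities` holds one non-negative, linear-scale affinity per residue
    tested at a given peptide position (assumed input format).
    """
    n = len(affinities)
    if n < 2:
        raise ValueError("at least two residues are required")
    if any(a < 0 for a in affinities):
        raise ValueError("affinities must be non-negative")
    total = sum(affinities)
    if total <= 0:
        raise ValueError("at least one affinity must be positive")
    # Convert affinities into a probability-like distribution.
    probs = [a / total for a in affinities]
    # Terms with p == 0 contribute nothing to the entropy.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    # Dividing by log(n) maps the maximum (uniform profile) to 1.
    return entropy / math.log(n)


# Hypothetical profiles, not taken from the data set:
unspecific = [1.0] * 20            # identical affinity for all residues
specific = [1.0] + [0.0] * 19      # only one residue binds
print(normalized_entropy(unspecific), normalized_entropy(specific))
```

With this convention, a profile in which all residues bind equally well yields an entropy of 1, and a profile in which only one residue binds yields 0, matching the two limiting cases described in the text.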
Figure 4
Context specificity of matched peptides:
Negative decadic logarithms
of MHC-II binding affinity ratios vs the template peptide AAYAAAAAAAAAA
are shown as a heat map, where blue boxes represent a gain in affinity,
white boxes indicate no change, and red boxes indicate affinity losses.
Black boxes represent missing data at position 3, where only seven
amino acids were tested experimentally. Substitutions at positions
4 and 6 lead to major differences in binding affinity, and proline
is particularly disfavored at these positions. Context-specific analysis
of affinity data rather than averaging over all peptide positions
is expected to capture these effects in more detail.
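The difference between standard and context-specific matched peptides can be sketched as follows. The example pools all observed affinity changes for a transformation (standard analysis) and, alternatively, groups them by peptide position (context-specific analysis), reporting the mean and the standard error of the mean in each case. The input format, the function names, and the numerical values are hypothetical and serve only as an illustration.

```python
import math
from collections import defaultdict


def mean_and_sem(deltas):
    """Return (mean, standard error of the mean) for a list of changes."""
    n = len(deltas)
    mean = sum(deltas) / n
    if n < 2:
        return mean, float("nan")  # SEM is undefined for a single pair
    variance = sum((d - mean) ** 2 for d in deltas) / (n - 1)
    return mean, math.sqrt(variance / n)


def summarize(pairs):
    """Summarize affinity changes per transformation.

    `pairs` is an iterable of (position, from_residue, to_residue, delta),
    where delta is the change in log affinity (assumed input format).
    Returns (pooled, by_position) dictionaries of (mean, SEM) tuples.
    """
    pooled = defaultdict(list)
    by_position = defaultdict(list)
    for position, res_from, res_to, delta in pairs:
        pooled[(res_from, res_to)].append(delta)
        by_position[(res_from, res_to, position)].append(delta)
    return (
        {key: mean_and_sem(values) for key, values in pooled.items()},
        {key: mean_and_sem(values) for key, values in by_position.items()},
    )


# Hypothetical proline-to-arginine pairs (values are made up):
example = [(4, "P", "R", 3.0), (6, "P", "R", 1.0), (10, "P", "R", -1.0)]
pooled, by_position = summarize(example)
print(pooled[("P", "R")])
print(by_position[("P", "R", 4)])
```

A large standard error in the pooled statistics, combined with consistent effects within individual positions, would indicate that the transformation is context-specific.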
Future Potential of Matched Peptide Analysis
In summary,
we have introduced a new quantitative modeling tool at the interface
of cheminformatics and bioinformatics: matched peptides. By means
of this extension of matched molecular pair analysis, protein, peptide,
and nucleic acid sequence-related properties such as bioactivity,
solubility, aggregation, pI, bioavailability, stability, and expression
yield may be predicted. The inclusion of nonstandard residues and
modified amino acids or nucleotides and cross-links is straightforward
given sufficient training data. The coverage of cyclic sequences,
which are of high interest because of the additional stability of
cyclic peptides and proteins,[60] requires
the use of special software tailored for alignment of cyclic sequences
because of undefined terminal residues.[61] When matched peptide analysis is extended beyond the analysis of
single-point mutations, cooperative effects in structure–activity
relationships may be identified using nonadditivity cycles, as recently
demonstrated for small molecules via matched molecular pair analysis.[62]
In comparison with the wealth of bioactivity
data readily available for small-molecule ligands (e.g., via ChEMBL[27]), the data basis for peptide binding is
much sparser. To date there exist only a few databases listing peptide
affinities to general targets (e.g., JenPep[63]) that could be used for matched peptide analysis using public domain
data. Most of the open data are centered around immunology and MHC
binding (e.g., IEDB[64]), hindering broad
application of the technique in the academic environment. As a result
of recent biotechnological advances in the development of protein,
peptide, and nucleic acid microarrays, a plethora of qualitative and
quantitative binding data are available.[65] Similar data can be obtained, e.g., from proteomics techniques,[66] protein-fragment complementation assays,[67] or phage display.[68] Additionally, synthetic access to peptides can be automated using
solid-phase synthesis.[69] The decision of
which peptide to synthesize next may be supported by using the presented
matched peptide approach, which allows existing bioactivity data to
be captured in an intuitive way and provides a qualitative and quantitative
ranking of favored residues. Therefore, the matched peptide strategy
shows synergy and complementarity with classical QSAR techniques.
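The nonadditivity cycles mentioned earlier in this section can be illustrated with a minimal sketch for a double-mutant cycle. It assumes that logarithmic affinities are available for a parent sequence, two single-point mutants, and the corresponding double mutant; the function name and the example values are hypothetical.

```python
def nonadditivity(p_parent, p_mutant_a, p_mutant_b, p_double):
    """Nonadditivity of a double-mutant cycle in log units.

    All arguments are logarithmic affinities (e.g., pIC50 values).
    A value of 0 means that the effects of the two substitutions are
    additive; deviations from 0 point to cooperative effects.
    """
    effect_a = p_mutant_a - p_parent   # effect of substitution A alone
    effect_b = p_mutant_b - p_parent   # effect of substitution B alone
    effect_ab = p_double - p_parent    # effect of both substitutions
    return effect_ab - (effect_a + effect_b)


# Hypothetical values: an additive and a cooperative cycle.
print(nonadditivity(6.0, 6.5, 7.0, 7.5))
print(nonadditivity(6.0, 6.5, 7.0, 8.5))
```

Since four measurements enter each cycle, the experimental error of the resulting value is larger than that of a single measurement, which should be considered before a cycle is interpreted as cooperative.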
Conclusion
We have presented matched peptides as an extension
of standard
matched molecular pair analysis that pushes the technology to the
interface of cheminformatics and bioinformatics. Matched peptides
correspond to single-point mutants, which are easily identified via
pairwise sequence alignments, simplifying the data analysis. Differences
in molecular properties may be identified via the matched peptide
strategy and can subsequently be applied to identify SAR trends and
predict properties of new sequences. Herein we have presented a statistical
analysis of MHC II peptide binding data and discussed observed trends
with respect to classical QSAR as well as structural data of the complex.
We expect our presented methodology, which is readily applicable to
peptides, proteins, and nucleic acids, to be of high relevance for
the rational design of novel biopharmaceuticals.
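The identification of matched peptides as single-point mutants can be sketched as follows. As a simplification of the pairwise sequence alignment described above, the example compares only sequences of equal length position by position, which is sufficient for substitution-only data sets but does not handle insertions or deletions. The function names and the affinity values are hypothetical; the sequences are derived from the template peptide of Figure 4 for illustration.

```python
from itertools import combinations


def single_point_difference(seq_a, seq_b):
    """Return (position, residue_a, residue_b) if the two sequences differ
    at exactly one position, otherwise None. Positions are 1-based."""
    if len(seq_a) != len(seq_b):
        return None  # simplification: no alignment of unequal lengths
    differences = [
        (index + 1, res_a, res_b)
        for index, (res_a, res_b) in enumerate(zip(seq_a, seq_b))
        if res_a != res_b
    ]
    return differences[0] if len(differences) == 1 else None


def matched_peptides(data):
    """Enumerate matched peptides in `data`, a dict mapping each sequence
    to a property value (e.g., a logarithmic affinity).

    Returns a list of (seq_a, seq_b, position, res_a, res_b, delta),
    where delta is the property change on going from seq_a to seq_b.
    """
    pairs = []
    for (seq_a, val_a), (seq_b, val_b) in combinations(sorted(data.items()), 2):
        difference = single_point_difference(seq_a, seq_b)
        if difference is not None:
            position, res_a, res_b = difference
            pairs.append((seq_a, seq_b, position, res_a, res_b, val_b - val_a))
    return pairs


# Hypothetical affinities (values are made up):
data = {
    "AAYAAAAAAAAAA": 6.0,
    "AAYPAAAAAAAAA": 4.0,
    "AAYPAPAAAAAAA": 3.5,
}
for pair in matched_peptides(data):
    print(pair)
```

The resulting list of transformations and property changes is the input required for the statistical analysis of substitution effects.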
Authors: George Papadatos; Muhammad Alkarouri; Valerie J Gillet; Peter Willett; Visakan Kadirkamanathan; Christopher N Luscombe; Gianpaolo Bravi; Nicola J Richmond; Stephen D Pickett; Jameed Hussain; John M Pritchard; Anthony W J Cooper; Simon J F Macdonald Journal: J Chem Inf Model Date: 2010-10-25 Impact factor: 4.956
Authors: Chia-En A Chang; William A McLaughlin; Riccardo Baron; Wei Wang; J Andrew McCammon Journal: Proc Natl Acad Sci U S A Date: 2008-05-21 Impact factor: 11.205
Authors: Julian E Fuchs; Susanne von Grafenstein; Roland G Huber; Michael A Margreiter; Gudrun M Spitzer; Hannes G Wallnoefer; Klaus R Liedl Journal: PLoS Comput Biol Date: 2013-04-18 Impact factor: 4.475