
Matched Peptides: Tuning Matched Molecular Pair Analysis for Biopharmaceutical Applications.

Julian E Fuchs¹, Bernd Wellenzohn², Nils Weskamp², Klaus R Liedl¹.

Abstract

Biopharmaceuticals hold great promise for the future of drug discovery. Nevertheless, rational drug design strategies are mainly focused on the discovery of small synthetic molecules. Herein we present matched peptides, an innovative analysis technique for biological data related to peptide and protein sequences. It represents an extension of matched molecular pair analysis toward macromolecular sequence data and allows quantitative predictions of the effect of single amino acid substitutions on the basis of statistical data on known transformations. We demonstrate the application of matched peptides to a data set of major histocompatibility complex class II peptide ligands and discuss the trends captured with respect to classical quantitative structure-activity relationship approaches as well as structural aspects of the investigated protein-peptide interface. We expect our novel readily interpretable tool at the interface of cheminformatics and bioinformatics to support the rational design of biopharmaceuticals and give directions for further development of the presented methodology.

Year: 2015 PMID: 26501781 PMCID: PMC4658635 DOI: 10.1021/acs.jcim.5b00476

Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956

Introduction

Biopharmaceuticals are defined as pharmaceutical products consisting of (glyco)proteins and/or nucleic acids.[1] Therefore, this class of drugs mainly comprises peptide hormones, recombinant proteins, monoclonal antibodies, and therapeutic antibodies. Biopharmaceuticals allow access to new target classes and are therefore considered more innovative than small-molecule drugs.[2] Accordingly, a record number of 11 new biopharmaceuticals were approved by the FDA in 2014.[3] Therefore, biopharmaceuticals hold promise to claim a larger share of the drug market in the future.[4] Additionally, biosimilars are increasingly entering the market after patent expiry of original biopharmaceutical products.[5] Biopharmaceuticals generally pose new challenges for the drug discovery process, which has historically been focused on small molecules. This includes their analytical characterization,[6] delivery and formulation[7,8] after optimization of the biotechnological production process,[9,10] and their molecular properties.[11] Computational modeling techniques hold great promise to handle the complexity of the generated data and, for example, to guide affinity optimization of therapeutic proteins[12] or peptides.[13,14] Peptide drugs are often considered as the border between small-molecule drugs and biopharmaceuticals, as their synthesis is mainly chemistry-driven.[15] Traditionally, quantitative structure–activity relationship (QSAR) and quantitative structure–property relationship (QSPR) modeling approaches neglect the three-dimensional (3D) structure of the peptides and proteins and are thus 2D-based. 
Nevertheless, approaches using 3D interaction fields[16] or comparative modeling techniques have been described.[17] These 3D techniques have to cover the bioactive conformation of the usually highly flexible peptide ligands, which poses additional challenges for modeling.[18] In a pioneering study, Sneath derived the first molecular descriptors for the 20 natural amino acids and applied them in QSAR modeling.[19] Later, these 2D descriptors were refined to capture chemically intuitive information via the Z-scale model[20] or the isotropic surface area/electronic charge index (ISA/ECI) model.[21] In contrast to substitution matrices frequently applied in bioinformatics (e.g., PAM,[22] BLOSUM[23]), these descriptors are designed to reflect chemical rather than evolutionary similarity. Amino acid descriptors have typically been used to derive QSAR equations by linear regression techniques.[24] Over the past decade, the innovative cheminformatic concept of “matched molecular pair analysis”[25] has been gaining increasing attention. 
Herein, pairs of molecules with a single difference in chemical structure are analyzed with respect to changes in a physicochemical or biological property.[26] Data mining in large databases (e.g., bioactivities stored in ChEMBL[27] or in-house data sets[28]) allows trends from matched molecular pairs or matched molecular series to be applied subsequently for prediction of substitution effects in new molecules.[29] A key advantage of matched molecular pair analysis is the direct chemical interpretability of predictions (“white box”) based on local SAR rules.[30] Recently, efforts have been made to put purely ligand-based matched molecular pairs into structural context and thereby identify the structural background of observed bioactivity trends.[31,32] Herein we expand the scope of matched molecular pairs to the analysis of macromolecular data from proteins and peptides and introduce matched peptides, a concept we expect to hold great promise for the development of biopharmaceuticals. As an example application, we investigate peptide binding to the major histocompatibility complex class II (MHC II), a surface receptor crucial for T-cell activation in immune response.[33,34] A crystal structure of the receptor shows that the peptide is bound to a hydrophobic surface groove that is flanked by two α-helices.[35] Despite the availability of structural information, most modeling approaches aiming at the prediction of peptide binding to MHC molecules employ machine learning techniques,[36] e.g., MULTIPRED.[37] Large peptide data sets have been compiled and used for the optimization of consensus approaches based on machine learning methods.[38] Application of these techniques allows for the optimization of peptides with desired immunological properties.[39] Quantitative modeling techniques are rarely applied toward MHC binding but include classical amino acid-descriptor-based QSAR methods[40] as well as molecular dynamics simulation approaches.[41] Here we apply the novel matched peptides strategy to the prediction of MHC II binding affinities and demonstrate the direct interpretability of the predictions in a structural context.

Methods

Matched Molecular Pairs and Matched Peptides

Identification of matched molecular pairs involves an exhaustive pairwise matching of molecular graphs. To simplify this task, molecules are usually fragmented to aid the search for corresponding substructures.[42] Older implementations additionally required a definition of the allowed transformations within matched pairs,[43] thus prohibiting the identification of unknown chemical modifications associated with a change in the molecular property under investigation. In the context of peptide and protein data, a molecular transformation corresponds to a point mutation. Therefore, sequences differing by a single character correspond to matched peptides. For identification of these single substitutions, a pairwise sequence alignment that can be performed by standard bioinformatics methodologies is required. Sequence alignment for matched peptides is trivial since they consistently differ by a single amino acid. This sequence alignment step simplifies the graph matching problem for small molecules described above, since linear peptides have the advantage of having defined C- and N-termini as well as identical chemical backbones (see Figure 1). Furthermore, insertions and deletions between sequence pairs can be considered as trivial additions to the exchange of single amino acids and thus may also be included as additional transformations in matched peptides. Matched peptides therefore represent a special case of matched molecular pairs that are easy to detect on the sequence level.

Figure 1

Matched molecular pairs and matched peptides. The correspondence of matched molecular pairs to matched peptides is exemplified by the chemical transformation of benzene to toluene by exchange of a hydrogen for a methyl group (orange). The same transformation is involved when an alanine-glycine-alanine tripeptide is exchanged with alanine-alanine-alanine. The latter transformation may be easily encoded when using standard one-letter amino acid codes. This representation allows for fast processing of large databases connecting sequences with respective molecular properties.

Code implementation was performed using standard Python tools in combination with a custom node in KNIME[44] aiming to identify all sequence pairs differing in a single sequence position and thus forming matched peptide pairs. Since all of the sequences analyzed in the current study were point mutations relative to a consensus sequence and had a constant length, no gaps occurred, and the presented analysis is therefore independent of gap penalties in the alignment step. Affinity differences observed for matched peptides were aggregated over all peptide positions, assuming that the effects of amino acid exchanges are independent of their position and thus reflect an average of all binding-site environments. We will demonstrate in the Results and Discussion section that this assumption is in general valid for MHC II binders and also discuss ways to cover position-specific aspects in matched peptide analysis.
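The pair detection and position-independent aggregation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' KNIME node: the function names, the input layout (a dictionary mapping sequences to log-scale affinities), and the toy affinity values are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def find_matched_peptides(affinities):
    """Return (seq_a, seq_b, position, res_a, res_b) for every pair of
    equal-length sequences differing at exactly one position (1-based)."""
    pairs = []
    for seq_a, seq_b in combinations(sorted(affinities), 2):
        if len(seq_a) != len(seq_b):
            continue  # gaps (insertions/deletions) are not handled in this sketch
        diffs = [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]
        if len(diffs) == 1:
            i = diffs[0]
            pairs.append((seq_a, seq_b, i + 1, seq_a[i], seq_b[i]))
    return pairs

def aggregate_transformations(affinities, pairs):
    """Pool affinity differences per substitution over all positions.
    Each pair is stored under one alphabetically ordered key (left, right)."""
    pooled = defaultdict(list)
    for seq_a, seq_b, _pos, res_a, res_b in pairs:
        delta = affinities[seq_b] - affinities[seq_a]
        if res_a > res_b:
            res_a, res_b, delta = res_b, res_a, -delta
        pooled[(res_a, res_b)].append(delta)
    return dict(pooled)

def orient_to_gain(pooled):
    """Flip each transformation so that its summed effect is an affinity gain."""
    oriented = {}
    for (left, right), deltas in pooled.items():
        if sum(deltas) < 0:
            left, right, deltas = right, left, [-d for d in deltas]
        oriented[(left, right)] = deltas
    return oriented

# Hypothetical log-scale affinities, invented for illustration only
toy = {
    "AAYAAAAAAAAAA": 7.0,
    "APYAAAAAAAAAA": 6.2,
    "ACYAAAAAAAAAA": 7.1,
    "AAYAPAAAAAAAA": 6.5,
}
toy_pairs = find_matched_peptides(toy)
toy_oriented = orient_to_gain(aggregate_transformations(toy, toy_pairs))
```

The all-against-all comparison is quadratic in the number of sequences, which is unproblematic for a few hundred peptides; larger collections would call for an indexing scheme that this sketch leaves out.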

Analyzed Data Set

We analyzed experimental binding affinities for a panel of 198 peptides toward MHC II molecules from Marshall et al.[45] Fluorescence-based assays were applied to obtain IC50 values in the nano- and subnanomolar range for all of the peptides using a 12-point inhibition curve for binding with three replicates each. Peptide sequences were grouped around the template peptide sequence AAYAAAAAAAAAA, where the central 11 amino acids of the peptide with length 13 were varied. For positions 2 and 4 to 12, all 20 natural amino acids were tested, while only seven apolar amino acids (F, I, L, M, V, W, Y) were tested for position 3. All of the sequences represent single-point mutations around the template sequence. Additionally, the length of the peptide (13 amino acids) was kept constant, and thus, no insertions or deletions were present among the matched peptide sequences. All of the presented analyses are based on negative decadic logarithms of reported IC50 values and their ratios based on molar units.
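As a small worked illustration of the log-scale convention used throughout, the conversion between molar IC50 values, log units, and linear affinity ratios might look as follows. The function names and the two example IC50 values are hypothetical and not taken from the data set.

```python
import math

def pic50(ic50_molar):
    """Negative decadic logarithm of an IC50 value given in mol/L."""
    return -math.log10(ic50_molar)

def fold_change(delta_log_units):
    """Convert a difference in log units into a linear affinity ratio."""
    return 10 ** delta_log_units

# Hypothetical IC50 values: 50 nM (weaker binder) and 10 nM (stronger binder)
weak = pic50(50e-9)
strong = pic50(10e-9)
gain = strong - weak  # identical to log10(50e-9 / 10e-9)
```

A difference of 1 log unit therefore corresponds to a 10-fold change in IC50, and a difference of about 0.7 log units to roughly a 5-fold change.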

Statistical Framework

On the basis of statistical analysis of bioactivity data from ChEMBL, the expected standard deviation of the IC50 data from a homogeneous source is 0.2 log units.[46] Accordingly, an average effect size of at least 0.20 log units establishes statistical significance versus the null hypothesis (no change in activity) at the p = 0.05 level with at least 10 matched molecular pairs or matched peptides.[47] The term “significant transformations” used later on therefore refers explicitly to those amino acid substitutions associated with an effect on the binding affinity that is statistically significantly different from zero.
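The criterion stated above (at least 10 pairs and a mean effect of at least 0.20 log units) can be expressed as a short summary function. This is a sketch of that rule only; the function name, the returned dictionary layout, and the example differences are assumptions made for illustration, and no formal hypothesis test is implemented here.

```python
import math
import statistics

MIN_EFFECT = 0.20  # log units, threshold stated in the text
MIN_PAIRS = 10     # minimum number of matched pairs stated in the text

def summarize(deltas):
    """Mean, standard error of the mean, and significance flag for one
    transformation, given its list of affinity differences in log units."""
    n = len(deltas)
    mean = statistics.fmean(deltas)
    # Sample standard deviation needs at least two values
    sem = statistics.stdev(deltas) / math.sqrt(n) if n > 1 else float("nan")
    significant = n >= MIN_PAIRS and abs(mean) >= MIN_EFFECT
    return {"n": n, "mean": mean, "sem": sem, "significant": significant}

# Hypothetical affinity differences, invented for illustration only
strong_effect = summarize([0.5] * 5 + [0.7] * 5)
weak_effect = summarize([0.1] * 12)
too_few_pairs = summarize([0.9] * 3)
```

Reporting the SEM next to the mean, as in the tables of this study, additionally shows how consistent a transformation is across peptide positions.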

Correlation to Amino Acid Descriptors

We correlated trends in bioactivities (affinity shifts) observed from analysis of matched peptides to differences in amino acid properties. Therefore, we employed three descriptors from the Z-scale approach describing hydrophilicity (z1), steric bulk (z2), and electronic properties (z3).[20] Furthermore, we analyzed correlations to differences in the isotropic surface area (ISA) and the electronic charge index (ECI).[21] Correlations between activity differences and property differences were assessed via calculation of Pearson’s linear correlation coefficient r and Spearman’s rank correlation coefficient ρ to capture both quantitative and qualitative dependences. Statistical analyses were performed using R.[48]
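The analyses in this study were run in R; purely for illustration, both correlation coefficients can be written out in plain Python as below. The descriptor and affinity differences in the example are hypothetical placeholders, not Z-scale or ISA/ECI values from the literature.

```python
import math

def pearson(x, y):
    """Pearson's linear correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Average ranks (1-based), so that tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical descriptor differences and affinity differences per transformation
descriptor_diff = [-2.0, -1.0, 0.5, 1.5, 3.0]
affinity_diff = [0.6, 0.3, 0.1, -0.2, -0.25]
r = pearson(descriptor_diff, affinity_diff)
rho = spearman(descriptor_diff, affinity_diff)
```

In this toy example the relationship is strictly monotonic but not linear, so the rank correlation reaches its limit while the linear coefficient does not, which is the reason for reporting both.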

Structural Interpretation and Correlation to Peptide Specificity

To interpret the bioactivity data in structural context, we compared the observed trends to a cocrystal structure of an MHC II in complex with an antigenic peptide (PDB entry 1AQD[49]). We visualized the structure in PyMOL[50] and extracted polar contacts as well as electrostatic properties of the binding-site region using default settings. The specificities of respective MHC II binding-site regions were assessed on the basis of binding affinity distributions for single amino acids. Therefore, we converted the affinity ratios to decadic log units and analyzed the distribution of binding affinities for each single site. In the case of a highly specific region, major differences in binding affinity are expected, corresponding to a narrow peak in the distribution. On the contrary, a completely nonspecific position shows an equal distribution of binding affinities. Such experimental distributions can be converted to single values depicting local specificity via an information-entropy-based approach, as demonstrated earlier for amino acid distributions in protease substrates.[51] Thereby, an entropy of 0 corresponds to the highest specificity, whereas a value of 1 corresponds to maximum binding promiscuity with constant binding affinities. All of the peptide residues except for position 3, where only seven of the 20 amino acids were tested, were examined individually.
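A normalized information entropy with the two limiting cases described above (0 for a single accepted residue, 1 for identical affinities) can be sketched as follows. How the measured affinities are turned into nonnegative weights is not specified here, so the input format is an assumption: one nonnegative weight per tested amino acid, for example a linear-scale affinity ratio.

```python
import math

def specificity_entropy(weights):
    """Shannon entropy of the normalized weights, scaled by log(N) so that
    0 means a single preferred residue and 1 means identical weights."""
    if len(weights) < 2:
        raise ValueError("need at least two residues to define an entropy")
    total = sum(weights)
    # Zero weights contribute nothing to the entropy (limit p*log p -> 0)
    probs = [w / total for w in weights if w > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(weights))

# Limiting cases for a position probed with all 20 natural amino acids
promiscuous = specificity_entropy([1.0] * 20)          # identical affinities
specific = specificity_entropy([1.0] + [0.0] * 19)     # one residue accepted
```

Dividing by the logarithm of the number of tested residues keeps positions comparable only if the same number of amino acids was tested at each of them.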

Results and Discussion

Trends in Binding Affinity from Matched Peptide Analysis

Applying matched peptides, we extracted information on quantitative changes in MHC II binding affinity induced by single-point mutations. On the basis of 198 peptide sequences and their respective experimental binding affinities, we extracted trends on how amino acid substitutions increase and decrease molecular interactions via matched peptide analysis. In total we extracted 2117 matched peptides that formed the basis of the statistical evaluation. The order of identified matched pairs was normalized to consistently reflect gains in affinity. The amino acid substitutions with the strongest effects on the observed binding affinities are summarized in Table 1. Overall, for 88 of 190 transformations (46%) we observed a significant change in binding affinity. The strongest effect was achieved by a replacement of proline by cysteine, leading to a gain of 0.715 log units, which corresponds to 5 times stronger binding. The standard error of the mean (SEM) observed over 10 examples for this transformation was 0.291 log units, which is much smaller than the average effect size. This indicates that replacement of proline by cysteine indeed leads to an increase in binding affinity largely independent of the peptide position where the transformation occurs.

Table 1

Transformations with Major Effects on the MHC II Binding Affinityᵃ

transformation	pairs	mean affinity difference [log units]	SEM [log units]
P → C	10	0.715	0.291
P → Y	10	0.672	0.329
P → L	10	0.624	0.354
P → M	10	0.620	0.320
D → C	10	0.606	0.147
P → S	10	0.606	0.332
P → N	10	0.596	0.366
P → V	10	0.594	0.339
P → A	19	0.587	0.215
K → C	11	0.585	0.250

ᵃ Matched peptides were used to extract the 10 substitutions leading to the largest changes in affinity. These transformations were normalized to reflect affinity increases and sorted according to decreasing effect size; the standard error of the mean (SEM) is indicated as a measure of statistical uncertainty. Removal of proline residues increases the binding strength to MHC II molecules, as does the removal of charged residues. Conversely, inclusion of cysteine residues increases the binding affinity to the receptor.

Several additional substitutions of proline among the transformations with the strongest effects on binding affinity indicate that this residue is in general detrimental to MHC II binding. Replacement by smaller residues is favored, and especially the inclusion of cysteine residues leads to major gains in binding affinity. Additionally, replacement of charged residues is associated with gains in binding affinity. Within the top 10 transformations we found the substitution of aspartate and lysine by cysteine, both of which led to an affinity gain of approximately 0.6 log units, corresponding to a factor of 4 on a linear scale. The aspartate to cysteine transformation is associated with a particularly small SEM of 0.147, indicating particularly conserved effects over the whole binding-site region. In addition to 88 transformations with significant effects on the binding strength, we characterized 102 transformations associated with only minor changes in MHC II binding. Here we observed the absence of amino acids described to be associated with particularly weak or strong binding. Therefore, the frequency of cysteine residues and charged residues within these transformations was reduced or those residues were completely missing. We found several transformations involving small amino acids that appear to be readily interchangeable within MHC II binding peptides (see Table 2). 
This behavior is illustrated by the substitution of valine by isoleucine, which is associated with a mean affinity difference smaller than 0.001 log units as well as a small SEM of 0.059 log units over 11 peptide pairs. This indicates that the subtle transformation involving an addition of a methylene group in the side chain does not alter the binding constant independent of the position of the exchanged amino acid. On the contrary, some substitutions involving major chemical changes do not affect the MHC II binding affinities significantly. This includes, for example, the substitution of a small glycine residue by an aromatic histidine residue, which points to the minor importance of residue size in MHC II binding. On average, this transformation is associated with an affinity gain of less than 0.01 log units. The larger SEM of 0.154 log units over 10 pairs indicates some dependence on the position of the transformation in this case.

Table 2

Transformations with Little Effect on Binding Affinityᵃ

transformation	pairs	mean affinity difference [log units]	SEM [log units]
V → I	11	<0.001	0.059
T → H	10	0.001	0.129
V → N	10	0.002	0.107
I → W	11	0.003	0.114
L → F	11	0.003	0.173
V → W	11	0.003	0.128
G → T	10	0.008	0.124
A → V	19	0.008	0.073
G → H	10	0.009	0.154
N → S	10	0.010	0.133

ᵃ Matched peptides were used to search for amino acid substitutions with the smallest changes in experimentally measured binding affinity to MHC II. The top 10 transformations were sorted according to increasing effect on the binding affinity. Statistical uncertainty is shown by the SEM. The complete absence of proline residues and charged amino acids indicates their major impact on the observed binding affinities to MHC II. The smallest effect is observed for the replacement of a valine by an isoleucine, corresponding to the addition of a methylene group.

On the basis of the same matched peptide analysis, we aimed to identify the most favored and unfavored residues in MHC II binding peptides. Since the peptide transformations have been arranged to reflect gains in binding affinity, we counted the occurrence of all 20 amino acids on the left side of the transformation (Nleft, smaller activity) and on the right side (Nright, higher activity) among all 88 pairs showing a significant change in binding affinity. The differences in these occurrences (Nright – Nleft), which are given in Figure 2A, enable qualitative identification of favorable and unfavorable amino acids. We found six amino acids to be mainly disfavored in MHC II binding peptides: proline, all four charged amino acids (aspartate, glutamate, lysine, and arginine), and the polar amino acid glutamine. Glutamine shows a smaller negative effect (a total of five pairs with decreased affinity) than the other five amino acids, all of which exhibit very similar disruptive effects (a total of 14 or 15 pairs with decreased affinity). On the other end of the amino acid ranking, tyrosine was found to enhance the MHC II binding affinity in a total of nine significant peptide pairs, followed by cysteine, which was identified as a favorable replacement in a total of eight cases. Mostly small amino acids follow in the ranking, including alanine, methionine, asparagine, and serine. 
The difference in size might also explain the marked difference observed in comparisons of peptides containing asparagine versus glutamine. The smaller asparagine appears to be favorable for MHC II binding (+6 pairs), whereas glutamine is unfavorable (−5 pairs).
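The counting scheme behind Nright – Nleft is simple enough to write down directly. The sketch below applies it only to the ten transformations listed in Table 1, so the resulting numbers illustrate the bookkeeping and are not the values over all 88 significant transformations; the function name is hypothetical.

```python
from collections import Counter

def net_occurrence(transformations):
    """Nright - Nleft per amino acid for transformations oriented toward
    affinity gain, given as (left, right) one-letter code tuples."""
    net = Counter()
    for left, right in transformations:
        net[right] += 1  # residue present in the stronger binder
        net[left] -= 1   # residue present in the weaker binder
    return net

# The ten transformations from Table 1, already oriented toward affinity gain
table1 = [("P", "C"), ("P", "Y"), ("P", "L"), ("P", "M"), ("D", "C"),
          ("P", "S"), ("P", "N"), ("P", "V"), ("P", "A"), ("K", "C")]
net = net_occurrence(table1)
```

Within these ten rows proline appears eight times on the losing side and cysteine three times on the winning side, mirroring the qualitative picture described above.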

Figure 2

Amino acids favored and disfavored in MHC II binding. On the basis of matched peptides, we identified amino acids frequently associated with a loss in binding affinity. (A) Differences in the number of significant matched peptides leading to a gain versus a loss of affinity in MHC II binding. (B) Absolute average differences in bioactivity when exchanging an amino acid with any other natural amino acid. Proline residues as well as charged amino acids are strongly disfavored in MHC II binding. On the other end of the spectrum, cysteine and tyrosine residues enhance peptide–MHC II interactions.

Quantitative Effects of Amino Acid Exchanges

A similar analysis can be performed on the basis of the average effect size rather than the occurrence of amino acids on each side of the transformation. Here, all of the transformations, including those with insignificant effects on the binding affinity, were analyzed to yield a quantitative ranking of amino acid contributions to MHC II binding affinities (see Figure 2B). Consistent with the other presented analyses, proline is associated with a major decrease in MHC II binding affinity representing the strongest effect observed within the data set. On average, 0.474 log units can be gained by replacement of a proline with any other natural amino acid. A replacement of any charged amino acid leads to a gain of between 0.31 and 0.36 log units, thus doubling the binding affinity. On the other end of the spectrum, the introduction of cysteine and tyrosine residues is favored and leads to a gain in affinity of 0.26 to 0.28 log units. Several mainly small and hydrophobic residues are slightly favored and show affinity increases of around 0.1 log units on average. Hydrophobic residues have been described to drive association of protein–protein interactions in general.[52] As shown by statistical analysis of crystal structure data, aliphatic amino acids and tyrosine residues predominantly form cores of protein–protein interaction areas.[53] To allow a comparison with a conventional peptide analysis method, we divided our data set of 198 peptides into two halves according to higher and lower binding affinity to MHC II and analyzed enriched amino acids among both sets. We report residues with an enrichment factor larger than 1.5 in decreasing order of their enrichment. We found that hydrophobic cysteine, methionine, isoleucine, valine, and phenylalanine are enriched among high-activity binders. Low-affinity binders on the other hand show a disproportionately high content of aspartate, glutamate, proline, arginine, and tryptophan. 
The observed trends for favored and disfavored residues are similar to those from the pairwise comparisons conducted for matched peptide analysis. Nevertheless, the results from matched peptide analysis provide deeper insights since experimental uncertainty can be handled directly and no classification into affinity classes via an arbitrary cutoff is required.

Correlation to Amino Acid Descriptors

As we observed clear correlations between affinity differences and chemical properties of amino acids, we tested the performance of classical QSAR amino acid descriptors in reproducing those. Therefore, we calculated property differences for all of the transformations in five different dimensions based on z-scales (three descriptors) as well as the ISA/ECI scheme (two descriptors). Then we correlated the property differences to the experimentally measured affinity differences identified via matched peptide analysis and analyzed them using Pearson’s linear correlation coefficient and Spearman’s rank correlation coefficient. We observed weak correlations between individual descriptor differences of substituted amino acids and the associated bioactivity changes of 190 matched peptide pairs (see Figure S1 for correlation plots). The most pronounced correlation was identified for the z2 axis representing residue size. Here we found an inverse correlation (r = −0.33, ρ = −0.32), indicating that a reduction in residue size is associated with an increase in binding affinity. This index is closely followed by the ECI, which designates both positively and negatively charged residues with large values and therefore indicates polarity (r = −0.31, ρ = −0.31). The observed inverse correlation indicates that a reduction in polarity is favored in MHC II binding. The third axis contributing to binding affinity is the z1 descriptor that reflects hydrophilicity. We again observed an inverse correlation (r = −0.29, ρ = −0.27) and conclude in agreement with the ECI results that a reduction in polarity favors MHC II binding. The other two descriptors (z3 and ISA) show correlation coefficients with magnitudes smaller than or equal to 0.1. Thus, electronic properties and residue surface area were found to be less important in MHC II recognition. The lack of correlation to the surface area appears surprising since a reduction in residue size was identified as favorable. 
We attribute this seeming contradiction to the inclusion of solvation factors in the ISA calculation, which leads to the lowest ISA values for asparagine and aspartate even though these amino acids have a larger molecular weight than, for example, glycine. As these analyses included proline residues in the data set, we wondered whether removal of this residue with a different backbone, and thus a refined depiction of the side-chain properties, would increase the observed correlations. Therefore, we repeated the correlation analysis covering only the 171 pairs not affecting proline. We found that the magnitudes of all of the correlation coefficients consistently increased, pointing toward the special status of this amino acid. The correlation between the affinity change and the difference in ECI was strengthened from r = −0.31 to r = −0.49 upon removal of the uncharged but still disfavored proline residue. Similarly, the correlation of z1 differences to the binding affinity changes jumped from r = −0.29 to r = −0.40 when proline was excluded. This points to the special importance of peptide backbone hydrogen bonding in MHC II interactions.

Structural Analysis of the MHC II Receptor–Peptide Complex

To interpret these SAR trends in a structural context, we investigated a cocrystal structure of the MHC II receptor with a cognate peptide of length 15 amino acids.[49] The peptide adopts an extended conformation and is tightly bound to the receptor via 14 hydrogen bonds of the backbone carbonyls and amides (see Figure 3A). This large number of interactions indicates the importance of backbone hydrogen bonding in receptor binding and therefore explains the observed trend that removal of prolines, which lack the backbone NH capable of hydrogen-bond formation, leads to stronger binding. In contrast, amino acid side chains are involved in fewer interactions with the receptor than the peptide backbone. Most of the side chains of the bound peptide are bound to shallow pockets rather than to pronounced cavities and show limited contact area. This explains the lack of particular amino acids being favored as a result of the absence of polar interactions with the receptor. This is further underlined by the absence of charged regions in the binding site (see Figure 3B). The binding-site region for side chains is mostly a flat solvent-exposed surface patch that therefore can bind several amino acids. The hydrophobicity of a large part of the binding site additionally leads to a preference for apolar amino acids due to energy gains by desolvation.[54] Since such hydrophobic and solvent-exposed interfaces are expected to be multispecific,[55] the MHC II may perform its key function in immunology to recognize and present diverse peptides from pathogens.[56]

Figure 3

Structural interpretation of MHC II–peptide interactions. On the basis of the cocrystal structure of MHC II and a high-affinity peptide ligand,[49] we analyzed molecular interactions. (A) The peptide ligand (sticks in elemental colors with carbon in gray) is bound to a broad surface groove of the MHC II (green cartoon and lines with semitransparent surface). Hydrogen bonding (yellow dots) is mainly observed via the peptide backbone. (B) An electrostatic map of the MHC II binding site is shown (blue, positively charged; white, neutral; red, negatively charged). The peptide is bound to a predominantly neutral region of the binding site. Additionally, most of the amino acid side chains are bound to solvent-exposed regions on the surface. These binding-site properties explain the promiscuity of MHC II, which is crucial for its immunological function.

Promiscuity in MHC II Receptor–Peptide Interactions

This promiscuous binding of peptides is also reflected in our data set, where the range of measured binding affinities is approximately 3 log units from the most tightly bound peptide to the weakest binder (see Figure S2 for affinity distributions). Many amino acid exchanges show negligible effects on the observed binding constant, in agreement with promiscuous binding. With the assumption of a conserved position of the peptide termini, which might be provided via the central binding register of approximately nine residues in MHC II,[57] the specificity for each of the respective peptide positions can be quantified as information entropy from the distribution of binding affinities for single-point mutants. An entropy of 0 depicts the highest specificity and thus affinity for only a single amino acid, while an entropy of 1 corresponds to completely unspecific binding with constant affinities for all binding partners.[51] We found all of the peptide positions to be predominantly unspecific, with information entropies ranging from 0.83 for position 11 to 0.97 for position 10. At the most specific peptide position 11, seven amino acids (D, E, F, I, K, R, and Y) show binding affinities differing from the most favorable amino acid, cysteine, by at least 1 log unit. On the contrary, for the almost completely unspecific position 10 all of the binding affinities lie within 0.8 log units, with tyrosine as the most favored residue. This latter situation reflects the typical situation in MHC II peptide recognition, where seven of the 10 investigated pockets show an information entropy larger than 0.9. On the other hand, more specific positions coincide with classical MHC II specificity sites 4, 6, and 9,[58] which show information entropies of 0.90, 0.88, and 0.91, respectively. The overall promiscuity is also reflected by a comparison of activity rankings in respective binding pockets, an approach similar to a matched series. 
We found only weak correlations between subpocket profiles, with Spearman rank correlation coefficients between −0.32 and +0.57. In fact, 32% of the affinity profiles appear anticorrelated if the inherent experimental error is neglected, which we expect to limit the applicability of this approach given the narrow affinity ranges within the data set. The overall promiscuity shows the applicability of the matched pair approach, which treats all amino acid exchanges equally and does not include the specific position. This inherent limitation of the approach could be overcome by the use of context-specific matched peptides in analogy to context-specific matched pairs, where the chemical environment of the transformation is included.[59] In the standard implementation of matched pairs and peptides, transformations showing context-specific effects are attributed with higher standard errors. This can be seen in our data set for the proline to arginine transformation, which shows a weak average gain in binding affinity of 0.158 log units in 10 pairs. The standard error of the mean for this transformation is 0.400 log units, showing that the same transformation is favorable for binding in four cases and unfavorable in six. Here the position of the transformation and therefore the chemical context are crucial for the observed effect, although MHC II is overall highly unspecific. The strongest effect for the proline to arginine transformation is observed at position 4, where the binding affinity is increased by more than 3 orders of magnitude. This represents the most dramatic effect of a single substitution, as proline appears to be especially disfavored at this position compared with all other amino acids (see Figure 4). Position 6 is the second peptide position where the presence of proline is strongly disfavored. Here replacement by asparagine yields a gain in affinity of 2.6 log units. 
In terms of context-specific matched peptides, positions 4 and 6 show the strongest affinity differences in the analyzed data set. We expect that context-specific matched pairs are especially required in order to reliably predict changes in binding affinities to specific receptors. Physico-chemical properties like solubility, on the contrary, are expected to be more independent of the chemical surroundings and thus easier to capture using standard matched peptides.
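The difference between standard (pooled) and context-specific statistics for a transformation can be sketched as follows. This is an illustrative outline only, assuming that each matched peptide is available as a tuple of position, original residue, new residue, and change in log affinity; the function names and all numerical values are hypothetical and are not taken from the analyzed data set.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def summarize(deltas):
    """Return (mean, standard error of the mean, count) for affinity changes."""
    n = len(deltas)
    if n == 0:
        raise ValueError("no pairs found for this transformation")
    sem = stdev(deltas) / sqrt(n) if n > 1 else float("nan")
    return mean(deltas), sem, n


def pooled_and_by_position(pairs, transformation):
    """Compare pooled statistics with statistics split by peptide position.

    pairs: iterable of (position, from_residue, to_residue, delta_log_affinity).
    transformation: (from_residue, to_residue), e.g. ("P", "R").
    """
    pooled = []
    by_position = defaultdict(list)
    for position, src, dst, delta in pairs:
        if (src, dst) == transformation:
            pooled.append(delta)
            by_position[position].append(delta)
    per_position = {pos: summarize(values) for pos, values in by_position.items()}
    return summarize(pooled), per_position


# Hypothetical example values (log units), NOT data from this study.
example_pairs = [
    (4, "P", "R", 3.0),
    (4, "P", "R", 2.5),
    (10, "P", "R", -0.5),
    (10, "P", "R", -1.0),
    (6, "P", "N", 2.0),
]
pooled_stats, position_stats = pooled_and_by_position(example_pairs, ("P", "R"))
```

In such a setup, a large pooled standard error combined with small per-position errors would indicate a context-dependent transformation, at the cost of fewer pairs per estimate.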

Figure 4

Context specificity of matched peptides: Negative decadic logarithms of MHC-II binding affinity ratios vs the template peptide AAYAAAAAAAAAA are shown as a heat map, where blue boxes represent a gain in affinity, white boxes indicate no change, and red boxes indicate affinity losses. Black boxes represent missing data at position 3, where only seven amino acids were tested experimentally. Substitutions at positions 4 and 6 lead to major differences in binding affinity, and proline is particularly disfavored at these positions. Context-specific analysis of affinity data rather than averaging over all peptide positions is expected to capture these effects in more detail.

Future Potential of Matched Peptide Analysis

In summary, we have introduced a new quantitative modeling tool at the interface of cheminformatics and bioinformatics: matched peptides. By means of this extension of matched molecular pair analysis, protein, peptide, and nucleic acid sequence-related properties such as bioactivity, solubility, aggregation, pI, bioavailability, stability, and expression yield may be predicted. The inclusion of nonstandard residues and modified amino acids or nucleotides and cross-links is straightforward given sufficient training data. The coverage of cyclic sequences, which are of high interest because of the additional stability of cyclic peptides and proteins,[60] requires the use of special software tailored for alignment of cyclic sequences because of undefined terminal residues.[61] When matched peptide analysis is extended beyond the analysis of single-point mutations, cooperative effects in structure–activity relationships may be identified using nonadditivity cycles, as recently demonstrated for small molecules via matched molecular pair analysis.[62] In comparison with the wealth of bioactivity data readily available for small-molecule ligands (e.g., via ChEMBL[27]), the data basis for peptide binding data is much sparser. To date there exist only a few databases listing peptide affinities to general targets (e.g., JenPep[63]) that could be used for matched peptide analysis using public domain data. Most of the open data are centered around immunology and MHC binding (e.g., IEDB[64]), hindering broad application of the technique in the academic environment. 
As a result of recent biotechnological advances in the development of protein, peptide, and nucleic acid microarrays, a plethora of qualitative and quantitative binding data are available.[65] Similar data can be obtained, e.g., from proteomics techniques,[66] protein-fragment complementation assays,[67] or phage display.[68] Additionally, synthetic access to peptides can be automated using solid-phase synthesis.[69] The decision of which peptide to synthesize next may be supported by using the presented matched peptide approach, which allows existing bioactivity data to be captured in an intuitive way and provides a qualitative and quantitative ranking of favored residues. Therefore, the matched peptide strategy shows synergy and complementarity with classical QSAR techniques.

Conclusion

We have presented matched peptides as an extension of standard matched molecular pair analysis that pushes the technology to the interface of cheminformatics and bioinformatics. Matched peptides correspond to single-point mutants, which are easily identified via pairwise sequence alignments, simplifying the data analysis. Differences in molecular properties may be identified via the matched peptide strategy and can subsequently be applied to identify SAR trends and predict properties of new sequences. Herein we have presented a statistical analysis of MHC II peptide binding data and discussed observed trends with respect to classical QSAR as well as structural data of the complex. We expect our presented methodology, which is readily applicable to peptides, proteins, and nucleic acids, to be of high relevance for the rational design of novel biopharmaceuticals.
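The identification of matched peptides as single-point mutants can be illustrated with a short sketch. As a simplifying assumption it replaces the pairwise sequence alignment by a position-wise comparison of equal-length, ungapped sequences; the helper names and the affinity values are hypothetical, and only the template sequence AAYAAAAAAAAAA is taken from the text.

```python
from itertools import combinations


def single_point_difference(seq_a, seq_b):
    """Return (position, residue_a, residue_b) if exactly one position differs.

    Positions are 1-based. Sequences of unequal length return None, because
    this sketch assumes ungapped sequences instead of a real alignment.
    """
    if len(seq_a) != len(seq_b):
        return None
    diffs = [(i + 1, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]
    return diffs[0] if len(diffs) == 1 else None


def matched_peptides(affinities):
    """Yield (position, from_residue, to_residue, delta) for each matched pair.

    affinities: mapping sequence -> log-scale affinity (larger = tighter,
                an assumption of this sketch). Each pair is reported once,
                in the order in which the sequences appear in the mapping.
    """
    for (seq_a, val_a), (seq_b, val_b) in combinations(affinities.items(), 2):
        diff = single_point_difference(seq_a, seq_b)
        if diff is not None:
            position, res_a, res_b = diff
            yield position, res_a, res_b, val_b - val_a


def mutate(sequence, position, residue):
    """Return a copy of sequence with the residue at a 1-based position replaced."""
    return sequence[: position - 1] + residue + sequence[position:]


template = "AAYAAAAAAAAAA"
# Hypothetical log affinities, NOT measured values.
example_data = {
    template: 6.0,
    mutate(template, 4, "P"): 4.0,
    mutate(template, 4, "R"): 7.0,
}
example_pairs = list(matched_peptides(example_data))
```

The resulting tuples can be grouped by transformation to obtain the statistics on amino acid exchanges; sequences of different lengths would require an actual alignment step, which this sketch omits.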

61 in total

1. Lead optimization using matched molecular pairs: inclusion of contextual information for enhanced prediction of HERG inhibition, solubility, and lipophilicity.

Authors: George Papadatos; Muhammad Alkarouri; Valerie J Gillet; Peter Willett; Visakan Kadirkamanathan; Christopher N Luscombe; Gianpaolo Bravi; Nicola J Richmond; Stephen D Pickett; Jameed Hussain; John M Pritchard; Anthony W J Cooper; Simon J F Macdonald
Journal: J Chem Inf Model Date: 2010-10-25 Impact factor: 4.956

Review 2. Recombinant protein production by large-scale transient gene expression in mammalian cells: state of the art and future perspectives.

Authors: Lucia Baldi; David L Hacker; Myriam Adam; Florian M Wurm
Journal: Biotechnol Lett Date: 2007-01-19 Impact factor: 2.461

3. Regulatory watch: Innovation in biologic new molecular entities: 1986-2014.

Authors: Kathleen L Miller; Michael Lanthier
Journal: Nat Rev Drug Discov Date: 2015-02 Impact factor: 84.694

4. Matched molecular pair analysis: significance and the impact of experimental uncertainty.

Authors: Christian Kramer; Julian E Fuchs; Steven Whitebread; Peter Gedeck; Klaus R Liedl
Journal: J Med Chem Date: 2014-04-16 Impact factor: 7.446

5. Amino acid side chain descriptors for quantitative structure-activity relationship studies of peptide analogues.

Authors: E R Collantes; W J Dunn
Journal: J Med Chem Date: 1995-07-07 Impact factor: 7.446

6. Entropic contributions and the influence of the hydrophobic environment in promiscuous protein-protein association.

Authors: Chia-En A Chang; William A McLaughlin; Riccardo Baron; Wei Wang; J Andrew McCammon
Journal: Proc Natl Acad Sci U S A Date: 2008-05-21 Impact factor: 11.205

7. Using matched molecular series as a predictive tool to optimize biological activity.

Authors: Noel M O'Boyle; Jonas Boström; Roger A Sayle; Adrian Gill
Journal: J Med Chem Date: 2014-03-14 Impact factor: 7.446

8. Large scale characterization of the LC13 TCR and HLA-B8 structural landscape in reaction to 172 altered peptide ligands: a molecular dynamics simulation study.

Authors: Bernhard Knapp; James Dunbar; Charlotte M Deane
Journal: PLoS Comput Biol Date: 2014-08-07 Impact factor: 4.475

9. Cleavage entropy as quantitative measure of protease specificity.

10. A systematic assessment of MHC class II peptide binding predictions and evaluation of a consensus approach.
