Accurate determination of epitope for antibodies with unknown 3D structures.

Shifa Tahir^1, Thomas Bourquard^1,2, Astrid Musnier^1,2, Yann Jullian^2,3, Yannick Corde^2, Zakaria Omahdi^2, Laetitia Mathias^1, Eric Reiter^1,4,5, Pascale Crépieux^1,4,5, Gilles Bruneau^1, Anne Poupon^1,2,4,5.

Abstract

MAbTope is a docking-based method for the determination of epitopes. It has been used to successfully determine the epitopes of antibodies with known 3D structures. However, during the antibody discovery process, this structural information is rarely available. Although we already have evidence that homology models of antibodies could be used instead of their 3D structure, the choice of the template, the methodology for homology modeling and the resulting performance still have to be clarified. Here, we show that MAbTope has the same performance when working with homology models of the antibodies as compared to crystallographic structures. Moreover, we show that even low-quality models can be used. We applied MAbTope to determine the epitope of dupilumab, an anti-interleukin 4 receptor alpha subunit therapeutic antibody of unknown 3D structure, and validated this epitope experimentally. Finally, we show how the MAbTope-determined epitopes for a series of antibodies targeting the same protein can be used to predict competitions, and demonstrate the accuracy with an experimentally validated example.

Abbreviations: 3D: three-dimensional; RMSD: root mean square deviation; CDR: complementarity-determining region; CPU: central processing unit; VH: heavy chain variable region; VL: light chain variable region; scFv: single-chain variable fragment; VHH: single-chain antibody variable region; IL4Rα: interleukin 4 receptor alpha chain; SPR: surface plasmon resonance; PDB: Protein Data Bank; HEK293: human embryonic kidney 293 cells; EDTA: ethylenediaminetetraacetic acid; FBS: fetal bovine serum; ANOVA: analysis of variance; EGFR: epidermal growth factor receptor; PE: phycoerythrin; APC: allophycocyanin; FSC: forward scatter; SSC: side scatter; WT: wild type.

Keywords: MAbTope, Epitope Mapping, Molecular docking, Antibody modeling, Antibody-antigen docking.

Keywords: Antibody; Epitope mapping; artificial intelligence; bioinformatics; dupilumab; epitope binning

Year: 2021 PMID: 34432559 PMCID: PMC8405158 DOI: 10.1080/19420862.2021.1961349

Source DB: PubMed Journal: MAbs ISSN: 1942-0862 Impact factor: 5.857

Introduction

Antibodies play an essential role in the immune system of living organisms. Their ability to bind their target with high affinity and specificity allows the detection of foreign molecules, which are then dealt with by other components of the immune system. The specific region on the surface of the foreign molecule to which the antibody binds is known as the epitope. When developing antibodies, either for therapeutic or diagnostic use, knowledge of the epitope is important because it gives clues about the antibody's biological effect. Knowledge of the epitopes of a series of antibodies targeting the same protein also allows prediction of the competitions between them. Finally, the epitope is an important element to be precisely determined for patent protection. In the case of protein antigens, epitopes can be of two types: linear (continuous) or conformational (discontinuous).[1] A linear epitope is composed of 6 to 10 adjacent amino acids in the primary structure, whereas a conformational epitope is composed of 10 to 20 amino acids that are in close proximity in the 3D structure, but scattered in the linear sequence.[2-4] This second category represents 90% of the known epitopes.[5,6] Because antibodies are used as therapeutic agents in a growing number of diseases, including cancers, rheumatoid arthritis, and infections, it is very important to know how and where they bind to their targets. Conventional methods for determining the epitope are experimental, e.g., peptide arrays and hydrogen-deuterium exchange, the gold standard being the determination of the 3D structure of the antibody-antigen complex.[7] However, these methods require many experimental manipulations, specialized equipment and skills, and consequently are time-consuming and costly. Moreover, non-structural methods are error-prone, and structural methods cannot always be applied, especially when the target is a membrane protein.
Facing such issues, substantial efforts have been expended on developing efficient in silico methods. These methods can be divided into two main types: antigen specific and antibody-antigen specific. Antigen-specific approaches, such as DiscoTope,[6] BePro,[8] ElliPro,[9] SEPPA,[10] FRODOCK,[11] and PPiPP,[12] only consider antigen information, whereas antibody-antigen-specific approaches, such as ASEP,[13] Cluspro,[14] and EPiPred,[15] also integrate antibody information.[7] Comparison of their respective performances showed that the antibody-antigen-specific methods clearly outperform antigen-specific approaches.[7] Our team developed PRIOR,[16] a protein-protein docking method that had very good accuracy. One of the conclusions of this work was that using scoring functions optimized for different categories of protein-protein complexes (enzymes, antibody-antigen and others) clearly increased the performance. Another important conclusion was that, although it was sometimes difficult to efficiently select one docking pose with low RMSD with the crystallographic complex, the highest ranked poses shared, in most cases, the same interaction regions, which were always correct. Thus, we slightly modified the objective function of the machine-learning procedure to optimize the selection of the correct interaction region, rather than the RMSD with the crystallographic solution. Indeed, knowledge of the precise arrangement of the two partners within a complex is a critical source of functional information. However, optimizing the procedure on this criterion sometimes led to the correct interaction region being missed completely. This led to the development of the antibody-dedicated method MAbTope.[17] On a benchmark of 129 antibody-antigen complexes of known 3D structure, for which the isolated 3D structures of the antibody and of the antigen were known, the epitope was correctly determined in all cases.
MAbTope also allowed us to determine the epitopes of numerous different antibodies whose structure was previously unknown, including golimumab and certolizumab, which target tumor necrosis factor,[17] eculizumab, which targets the complement protein C5,[18] 5B9, which targets platelet factor 4 complexed with heparin,[19] 4C3, which targets proteinase 3,[20] and C6 and D5, which target the HER4 JMa/CYT1 isoform.[21] In all these studies, the 3D structures of the antibodies were unknown, as is usually the case when working on new antibodies. Therefore, these studies, although successful since experimentally validated, raise the question of antibody modeling, and of the performance of the method when relying on such models. Here, we sought to evaluate the performance of MAbTope when working with homology models of antibody 3D structure, depending on the method chosen for homology modeling. Accurate homology modeling of antibody 3D structures remains difficult, essentially because of the high variability in both sequence and structure of the complementarity-determining regions (CDRs), especially CDR H3. Different studies[22-24] have shown that state-of-the-art methods, such as BIOVIA tools,[25] RosettaAntibody[26,27] and SAbPred,[28] lead to high-quality models. However, compared to more basic tools such as Modeler,[29] these methods consume considerable central processing unit (CPU) time, and cannot be used at high throughput. Another question was the choice of the template for homology modeling. Indeed, the known 3D structure displaying the highest sequence identity with the protein to be modeled is often preferred. However, in the particular case of antibodies, this choice might not be the wisest, since sequence identity relies mostly on the frameworks, whereas antibody specificity is mostly due to CDRs.

Results

Selecting templates with identical CDR length

Since antibodies bind their cognate antigen mainly through the CDRs, for modeling a given variable domain (query) we first chose a template fulfilling the following criteria: 1) query and template have CDRs of identical length; and 2) the template is an antibody of known 3D structure which has the highest sequence identity with the query. Models of the 3D structures of each antibody were built using Modeler[29] and different templates for VH and VL (see Materials and Methods). For 291 of the 292 complexes in the test set, the epitope was correctly predicted by MAbTope (Table 1, Figure 1, Table S1), meaning that at least one of the top 4 ranked regions contains at least one residue belonging to the crystallographic epitope. An example is given in Figure 2, which illustrates the relationship between the ES score and the designed interfering peptides, and compares the predicted and crystallographic epitopes.
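The two template criteria above can be sketched in a few lines. This is only an illustration: the `Template` record, its field names and the candidate values are hypothetical, and the identities are assumed to have been computed beforehand (for example from a sequence search).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    pdb_id: str          # hypothetical identifier, for illustration only
    cdr_lengths: tuple   # lengths of CDR1, CDR2, CDR3 of the variable domain
    identity: float      # percent sequence identity with the query domain


def select_template(query_cdr_lengths, candidates):
    """Return the highest-identity candidate whose CDR lengths all match
    those of the query, or None when no candidate has matching CDR lengths."""
    wanted = tuple(query_cdr_lengths)
    matching = [t for t in candidates if tuple(t.cdr_lengths) == wanted]
    return max(matching, key=lambda t: t.identity, default=None)


# Hypothetical candidates: the closest sequence overall has a longer CDR3,
# so it is skipped in favor of the best candidate with identical CDR lengths.
candidates = [
    Template("AAAA", (8, 8, 12), 92.0),
    Template("BBBB", (8, 8, 11), 85.0),
    Template("CCCC", (8, 8, 11), 78.5),
]
chosen = select_template((8, 8, 11), candidates)
```

Filtering on CDR length before ranking by identity reflects the order of the criteria in the text: identity is only compared among templates that already satisfy the length constraint.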

Table 1.

Epitope prediction accuracy of different subsets. All: complete test dataset; VH+VL: Fab and scFv; VHH: single-chain antibodies; In learn: complexes that belong to the learning dataset of MAbTope; Not in learn: complexes that are not part of the MAbTope learning dataset; NC: negative control. Epi Res: average number of residues in the crystallographic epitope; P1 to P4: number of residues of the crystallographic epitope belonging to regions ranked 1 to 4; Top-4: number of epitope residues present in one of the top-4 regions; Overall: Top-4/Epi Res, i.e., proportion of epitope residues predicted in one of the top-4 regions. a and b indicate statistically different values (Student test).

Subset	Epi Res	P1	P2	P3	P4	Top-4	Overall (%)
All	15.73	5.91	3.48	2.12	1.32	12.80	82.68^a
VH+VL	16.06	5.96	3.40	2.25	1.39	12.96	81.67^a
VHH	14.54	5.71	3.79	1.65	1.06	12.22	86.38^a
In learn	15.09	5.76	3.30	2.16	1.31	12.45	83.55^a
Not in learn	16.31	6.05	3.65	2.08	1.33	13.11	81.90^a
NC	15.86	3.06	1.27	1.28	1.01	6.62	41.74^b

Figure 1.

Percentage of correctly predicted epitope residues within the 292 complexes of the dataset. Bar chart illustrating the percentage of epitopic residues correctly predicted.

Figure 2.

Prediction of the epitope of the therapeutic antibody IMC-11F8 on epidermal growth factor receptor (PDB: 3B2V). (a): crystal structure of the antibody-EGFR complex. (b): predicted epitope (violet: ES>20, blue: ES in 15–20, cyan: ES in 10–15, light cyan: ES in 5–10). (c): top 30 docking poses. (d): crystallographic epitope. (e): regions involved in the epitope (dark red to yellow for regions ranked 1 to 4). (f): overlap between predicted and crystallographic epitope.

It should be noted that for 2 of the correctly predicted complexes (4I9W[30] and 4HG4[31]), the four best-ranked regions contain less than 30% of the epitope, which could be considered as low. However, the prediction still gives the approximate position of the epitope on these rather large targets (309 and 501 residues, respectively, Figure S1). On the complete dataset, on average 83% of epitope residues are predicted, meaning that they belong to one of the top 4 regions (Figure 1). For 199 complexes, more than 80% of the amino acids belonging to the crystallographic epitope belong to one of the top 4 regions.
It is easier to predict the correct epitope for small targets than for large ones, since the surface ratio covered by the top 4 peptides is larger for small proteins. However, for the 50 smallest proteins of our dataset, the predicted epitope covers only 30% of the total accessible area. This ratio falls to 6% for the 50 largest targets of the benchmark. The total accessible surface area of targets and that of the predicted epitope are given in Table S1. Very comparable results are obtained on the different tested subsets (Table 1): antigen-binding fragments (Fabs) or single-chain variable fragments (scFvs), VHHs, and complexes that do or do not belong to the learning dataset (see Materials and Methods). Finally, when using the negative controls dataset, on average only 42% of the crystallographic epitope is correctly predicted, which is significantly different from the results obtained on the test datasets. Supplementary Figure S2 shows the same average values as indicated in Table 1, together with the standard deviations and the results of pairwise statistical tests. All the differences between series and negative controls are significant for peptides 1 and 2. Surprisingly, for peptide 3, only the “in learn” series is not significantly different from the controls. For peptide 4, only “All” and “VH+VL” are significantly different from control. This is less surprising because these are the two largest categories. Since MAbTope is a docking-based method, we also wanted to evaluate its pure docking performance. For 259 of the complexes of the test set, one of the top 30-ranked poses satisfies the CAPRI criteria, and can thus be considered as near-native.[32]
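The two statistics used throughout this section, namely whether a prediction counts as correct and which proportion of the crystallographic epitope falls in the top 4 regions, can be written down as a minimal sketch. The representation of regions as collections of residue numbers and the example values are assumptions made for illustration; they are not MAbTope's own data structures.

```python
def top_region_coverage(epitope, ranked_regions, top_n=4):
    """Fraction of crystallographic epitope residues that belong to at
    least one of the top_n best-ranked predicted regions."""
    epitope = set(epitope)
    if not epitope:
        raise ValueError("the epitope must contain at least one residue")
    predicted = set()
    for region in ranked_regions[:top_n]:
        predicted.update(region)
    return len(epitope & predicted) / len(epitope)


def is_correct_prediction(epitope, ranked_regions, top_n=4):
    """A prediction is counted as correct when at least one epitope
    residue lies in one of the top_n regions."""
    return top_region_coverage(epitope, ranked_regions, top_n) > 0


# Hypothetical residue numbers: three of five epitope residues are covered
# by the first four regions; the fifth region is ignored.
epitope = {10, 11, 12, 40, 41}
regions = [range(5, 12), range(38, 41), range(100, 110), range(200, 210), range(12, 13)]
coverage = top_region_coverage(epitope, regions)
```

Note that the correctness criterion is deliberately permissive (a single shared residue suffices), which is why the coverage fraction is reported alongside it.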

Impact of sequence identity

To evaluate the effect of sequence identity between query and template sequences, for a given query variable domain, we built models using as templates antibodies with known 3D structure, having the same CDR lengths, and decreasing overall sequence identities. We built a dataset of 20 randomly selected complexes of our dataset for which the epitope was correctly predicted in our first experiment, and for which we were able to find more than one possible template. It should be noted that even though the different templates have different sequences (as compared to the query and between each other), the query always has the same sequence. The number of models generated for each antibody varied from 2 to 107, accounting for a total of 767 models (Table S2). The lowest overall sequence identity between query and template is 60%, which is very low for variable domains of antibodies. Indeed, CDRs represent only around 22% of the sequence (25 residues of 115), and are more variable than the frameworks. Thus, 50.8% sequence identity corresponds to antibodies that have similar frameworks, but completely different CDRs (Figure S3). Nevertheless, MAbTope was able to predict the correct epitope in all cases, and a near-native conformation was present in the top 30 ranked docking poses for 752 of the 767 test cases.
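To make the identity figures above concrete, here is a small sketch of a percent-identity calculation on a pre-aligned pair of sequences, together with the CDR share quoted in the text (25 residues of 115). The convention used here (identical residues divided by the number of ungapped aligned columns) is an assumption; the source does not state which identity definition was applied.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent identity over the alignment columns where neither
    sequence has a gap (one of several common conventions)."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (q, t)
        for q, t in zip(aligned_query, aligned_template)
        if q != gap and t != gap
    ]
    if not columns:
        raise ValueError("no ungapped aligned columns")
    identical = sum(1 for q, t in columns if q == t)
    return 100.0 * identical / len(columns)


# Share of a 115-residue variable domain made up by 25 CDR residues.
cdr_share_percent = 100.0 * 25 / 115

# Toy aligned fragments (not real antibody sequences).
toy_identity = percent_identity("ACDEFG", "ACDQFG")
```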

Templates with insertions or deletions

In some cases, it is not possible to find a template having identical CDR lengths for homology modeling. Thus, we evaluated the impact of using templates having different CDR lengths. To this aim, we randomly selected 26 complexes in our test set, for which we were able to correctly determine the epitope in the first experiment. For each antibody VH domain, we searched for templates having 1 to 5 insertions or deletions in the CDRs, as compared to the CDRs of the query. In other words, the template was considered to have 1 insertion or deletion if one of its CDRs was, respectively, one residue longer or shorter than the corresponding CDR of the query, the other CDRs having the same length. The templates we selected had the insertions mostly in CDRH3, as this is the most variable CDR. As above, even though the templates have different sequences and different CDR lengths (as compared to the query and between each other), the resulting 3D models of the query antibody always have the same sequence and CDR lengths (sequence and CDR lengths of the query). It must be noted that increasing the number of insertions or deletions in the template decreases the quality of the 3D model of the query. Indeed, if the template is longer than the query, the residues missing from the query are removed during the modeling process, and the flanking residues must be brought closer to reconstitute the protein skeleton. On the contrary, if the template is shorter than the query, the skeleton has to be cut to accommodate the residues present in the query and absent from the template. The flanking residues have to be moved apart and the inserted residues modeled from scratch. In both cases, this type of modeling involves deformations of the template skeleton, which is a well-known source of imprecision. Despite this, we found that, on average, 91% of the residues of the crystallographic epitope belong to one of the top 4 regions.
Moreover, this percentage is not significantly correlated to the number of insertions or deletions, and is not significantly different from the percentage obtained when using templates with CDRs having the same length as the query antibody (Figure 3, Figure S2, Table S3, Table S4).
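The way templates are labeled as having insertions or deletions can be expressed as a short sketch. Following the definition given above, the template is compared with the query CDR by CDR; the function names and the handling of the case where several CDRs differ are assumptions added for illustration.

```python
def cdr_length_differences(query_lengths, template_lengths):
    """Per-CDR length difference, template minus query."""
    if len(query_lengths) != len(template_lengths):
        raise ValueError("both domains must list the same number of CDRs")
    return tuple(t - q for q, t in zip(query_lengths, template_lengths))


def classify_template(query_lengths, template_lengths):
    """Label a template as ('identical', 0), ('insertion', n) or
    ('deletion', n) when a single CDR differs by n residues.
    Templates differing in several CDRs are labeled ('mixed', total)."""
    diffs = [d for d in cdr_length_differences(query_lengths, template_lengths) if d]
    if not diffs:
        return ("identical", 0)
    if len(diffs) > 1:
        return ("mixed", sum(abs(d) for d in diffs))
    d = diffs[0]
    return ("insertion", d) if d > 0 else ("deletion", -d)


# Hypothetical CDR lengths: the template CDR3 is two residues longer.
label = classify_template((8, 8, 12), (8, 8, 14))
```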

Figure 3.

Impact of insertions or deletions in the template. Each dot represents the ratio of correctly predicted epitope residues when using templates having 1 to 5 deletions or 1 to 5 insertions as compared to the query sequence. The dots at abscissa 0 are the predictions obtained when using templates having the same CDR lengths as compared to query sequence, and variable overall sequence identities. The line shows the averages. The differences between averages were never significant

Determining the epitope of dupilumab

To further demonstrate the ability of MAbTope to determine the epitope of modeled antibodies, we chose to work on dupilumab, which targets an unknown epitope on the interleukin 4 receptor α subunit (IL4Rα). Marketed as DUPIXENT®, dupilumab is approved for therapeutic use in eczema, atopic dermatitis and several forms of severe asthma. Dupilumab’s structure was modeled using templates 6AZM for the heavy chain and 5BK5[33] for the light chain, which target circumsporozoite protein NANP 5-mer and circumsporozoite protein 663, respectively. For IL4Rα, we used the 3D structure given in PDB entry 1IAR.[34] Dupilumab’s epitope was determined using MAbTope (Figure 4(a) and 4(b)). This epitope appears to be scattered among 4 regions of IL4Rα: P1 from T56 to P86, P2 from H87 to Q107, P3 from L103 to P123, and P4 from S193 to S213. In order to validate the determined epitope, 4 mutant IL4Rα were designed, P1m, P2m, P3m and P4m, each containing from 4 to 7 alanine mutations in one of the 4 predicted regions (Figure 4(c) and 4(d)). These mutations were chosen so as to have as little impact as possible on the global folding of the protein (see Materials and Methods). Since the folding and subsequent stability of membrane proteins are correlated with their expression at the plasma membrane,[35] the surface expression of wild-type (WT) IL4Rα was compared with that of each of the mutants (Figure S4). Receptor expression was assessed in the phycoerythrin (PE) channel and the percentage of PE+ cells was measured by cytometry. Except for IL4Rα_P2m, which still reaches the membrane but shows a decrease in the percentage of PE+ cells, the mutants are expressed at the membrane as much as the WT, showing their stability.
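Designing such alanine mutants amounts to applying a list of point substitutions to the target sequence. The sketch below applies mutations written in the usual notation (for example `R69A`: wild-type residue, position, new residue) and checks that the wild-type residue matches before substituting. The helper name, the `offset` parameter and the toy sequence are hypothetical; the toy sequence is not IL4Rα.

```python
import re


def apply_mutations(sequence, mutations, offset=1):
    """Return a copy of `sequence` carrying the given point mutations.
    `offset` is the residue number of the first character of `sequence`."""
    residues = list(sequence)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"cannot parse mutation {mutation!r}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        index = position - offset
        if not 0 <= index < len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[index] != wild_type:
            raise ValueError(
                f"{mutation}: expected {wild_type} at {position}, found {residues[index]}"
            )
        residues[index] = new
    return "".join(residues)


# Toy example on a made-up 7-residue peptide numbered from 1.
mutant = apply_mutations("MKRWTLN", ["R3A", "W4A"])
```

Checking the wild-type residue guards against numbering mismatches between the construct and the reference structure, a common source of error when mutants are designed from PDB numbering.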

Figure 4.

MAbTope-based determination of dupilumab’s epitope. (a) Top-30 ranked docking poses. (b) Predicted epitope of dupilumab on IL4Rα (purple: ES ≥ 20, blue: 20 > ES ≥ 15, cyan: 15 > ES ≥ 10, light cyan: 10 > ES ≥ 5). (c to f) Experimental validation. (c-d) Mutations introduced in the 4 mutants of IL4Rα (IL4R_P1m, IL4R_P2m, IL4R_P3m, and IL4R_P4m, see complete sequences in Figure S5). The actual epitope residues from the crystallographic structure (PDB:6WGL), solved after the submission of this study, are shown by stars. (e) HEK293 cells were transiently transfected with the IL4Rα WT or mutated. IL4Rα expression at the membrane was monitored with a PE-coupled anti-Flag antibody (y-axis) and dupilumab binding was measured in APC (x-axis). An isotype IgG was used as a control (Figure S6). (f) APC+PE+ double positive cell percentages were collected from 3 independent experiments and normalized by the total PE+ cells. Results are shown as mean ± sem. Different letters indicate statistical difference at p ≤ 0.05.

The binding of dupilumab to each of the constructs was assessed by cytometry in the allophycocyanin (APC) channel (Figure 4(e)). An IgG4 isotype was used as a control (Figure S4). The percentages of APC+PE+ double positive cells in each cell line were collected and normalized over the total PE+ cell subset. A significant decrease in dupilumab binding was observed for all 4 mutants when compared to the WT IL4Rα (Figure 4(e-f)). Interestingly, the amplitude of the binding inhibition varied from one set of mutations to the other, being larger for P1m than for P4m, for example. This suggests that the proportion of actual epitope residues mutated varies from one mutant IL4Rα to the other. This is also corroborated by the scattering of mutated residues over the structure of IL4Rα as shown in Figure 4(d).
It is hence tempting to speculate that dupilumab's core epitope is located at the intersection of the 4 regions described, which is visible in the middle panel of Figure 4(d). In an attempt to define the epitope of dupilumab more precisely, slightly less mutated constructs were generated (Figure S5). One mutation was removed from IL4R_P1m (R69A), 2 mutations from IL4R_P3m (H120A and K122A), and 4 mutations from IL4R_P4m (R202A, W204A, N209A and W212A). IL4R_P2m was unchanged. The surface expression levels of the 3 new constructs (still evaluated as the percentage of PE+ cells) were slightly decreased when compared to the WT, and similar to the expression of IL4Rα_P2m shown before (and reported on the graph in Figure S6), indicating a slight instability even though they reach the membrane. The 3 new mutants showed a decrease in dupilumab binding, but the inhibition was lower than with the first series of mutants, indicating that epitope residues remain present in all the regions and confirming that the epitope is scattered across the 4 regions. To conclude, here it is shown that: 1) dupilumab’s epitope is conformational since it is located over 4 discontinuous regions of IL4Rα; and 2) MAbTope efficiently predicted epitope residues of a modeled antibody. As the structure of the complex between dupilumab and IL4Rα (PDB:6WGL[36]) was published after we performed our prediction and validation, we compared the crystallographic epitope with the one we predicted. Peptides 1 and 2 indeed constitute the core of the crystallographic epitope (Figure 4(c)). They are also those that most impair binding of the antibody when mutated.
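The binding readout used in this section, the percentage of APC+PE+ double positive cells normalized by the total PE+ cells and summarized as mean ± sem over independent experiments, can be sketched as follows. The function names and all numerical values are hypothetical and serve only to show the arithmetic; they are not measurements from the study.

```python
from math import sqrt
from statistics import mean, stdev


def normalized_binding(double_positive_pct, pe_positive_pct):
    """Percentage of receptor-expressing (PE+) cells that also bind the
    antibody (APC+PE+), i.e. double positives normalized by PE+ cells."""
    if pe_positive_pct <= 0:
        raise ValueError("PE+ percentage must be positive")
    return 100.0 * double_positive_pct / pe_positive_pct


def mean_and_sem(values):
    """Mean and standard error of the mean of replicate measurements."""
    values = list(values)
    if len(values) < 2:
        raise ValueError("at least two replicates are needed for the sem")
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical (APC+PE+ %, PE+ %) pairs from three experiments.
replicates = [normalized_binding(dp, pe) for dp, pe in [(24.0, 60.0), (30.0, 60.0), (36.0, 60.0)]]
average, sem = mean_and_sem(replicates)
```

Normalizing by the PE+ population separates loss of antibody binding from loss of receptor expression, which matters here because some mutants are expressed slightly less than the WT.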

Application to epitope binning

Epitope binning consists in determining, for a series of antibodies binding to the same target, those that are in competition, and those which can bind the target simultaneously.[16,37,38] The conventional methods for epitope binning consist in measuring the pairwise competition between the antibodies, for example by ELISA or surface plasmon resonance (SPR). However, these methods require the availability of both the antibodies and their targets. Moreover, depending on the number of antibody pairs to test, the process can require a substantial amount of time. Since MAbTope epitope predictions are very accurate and can be obtained in minutes, we hypothesized that they could also be used for epitope binning. To validate this hypothesis, we predicted the epitope binning of 20 anti-hen egg lysozyme antibodies, based on the epitope determination through MAbTope. To evaluate the accuracy of the predicted competitions, we compared them with crystallographic structures of the antibody-antigen complexes available in the PDB, and with results from Sivasubramanian et al,[39] who measured the competition between 7 different anti-lysozyme antibodies using high-throughput SPR. As shown in Figure 5(a) and 5(b), there is a very good concordance between the crystallographic overlap Xo and the value of the RawS score, which measures the overlap between the predicted epitopes. Over the 21 possible pairs among these 7 antibodies, the average error, defined as the absolute difference between Xo and RawS, is only 0.14. There is only one case, for the pair (D1.3, HyHEL-10), where this error leads to the incorrect prediction of an overlap (Figure 5(c) and 5(d)). Similarly, on the 20-antibody dataset, the average error between Xo and RawS is 0.12 (Figure S7). There are only 31 pairs of antibodies out of 190 for which the error is greater than 0.25, and among these cases, only 2 are greater than 0.5. Thus, the RawS score is a good predictor of epitope overlap.
Good prediction of epitope overlap also enables the prediction of competition. If RawS values greater than 0.25 are considered predictive of a competition, almost all competitions within the 7 antibodies were correctly predicted by MAbTope. It should be noted that even an overlap of one residue between the epitopes of two antibodies leads to a competition between them. However, MAbTope always predicts an epitope that is larger than the actual one. Thus, two antibodies with an overlap of 25% between their predicted epitopes might not have any common amino acids in their actual epitopes.
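The evaluation described above, the mean absolute difference between Xo and RawS over all antibody pairs and a competition call at RawS > 0.25, can be sketched as follows. How RawS itself is computed is not given in this section, so the scores are taken as inputs; the function name, the dictionary layout and the example values are hypothetical.

```python
from itertools import combinations
from math import comb


def binning_summary(names, xo, raws, threshold=0.25):
    """Compare crystallographic overlaps (xo) with predicted overlap
    scores (raws). Both map frozenset({a, b}) to a value in [0, 1].
    Returns the mean absolute error and the set of pairs predicted
    to compete (score strictly above the threshold)."""
    pairs = [frozenset(pair) for pair in combinations(names, 2)]
    errors = [abs(xo[pair] - raws[pair]) for pair in pairs]
    predicted_competitions = {pair for pair in pairs if raws[pair] > threshold}
    return sum(errors) / len(errors), predicted_competitions


# Number of unordered pairs for the two datasets mentioned in the text.
pairs_among_7 = comb(7, 2)    # 21
pairs_among_20 = comb(20, 2)  # 190

# Hypothetical scores for three antibodies A, B and C.
key = lambda a, b: frozenset((a, b))
xo = {key("A", "B"): 0.8, key("A", "C"): 0.0, key("B", "C"): 0.1}
raws = {key("A", "B"): 0.7, key("A", "C"): 0.1, key("B", "C"): 0.3}
mean_error, competing = binning_summary(["A", "B", "C"], xo, raws)
```

In this toy case the pair (B, C) is called as competing although its crystallographic overlap is small, which mirrors the caveat in the text that predicted epitopes are larger than the actual ones.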

Figure 5.

Prediction of epitope overlap and competition of 7 anti-lysozyme antibodies. (a): overlap of crystallographic epitopes. (b): RawS scores. In (a) and (b), values between 0.75 and 1 are colored dark violet, values between 0.5 and 0.75 medium violet and values between 0.25 and 0.5 light violet. (c): competitions measured by SPR by Sivasubramanian et al.[39] are colored in green. (d): RawS scores, values above 0.25 are colored in green. (e-j): focus on two competing antibodies, D1.3 and D11.15. (e): cartoon and surface view of the predicted epitope of antibody D1.3 on lysozyme (purple: ES ≥ 20, blue: 20 > ES ≥ 15, cyan: 15 > ES ≥ 10, light cyan: 10 > ES ≥ 5). (f): crystallographic epitope of antibody D1.3. (g): cartoon and surface view of the predicted epitope of antibody D11.15. (h): crystallographic epitope of antibody D11.15. (i): MAbTope-predicted overlap of epitopes of antibodies D1.3 and D11.15. The residues are colored as a function of their contribution to the RawS score (dark red: >400, red: 300–400, orange: 200–300, yellow: <200). (j): overlap of crystallographic epitopes.

Discussion

Antibody structure modeling has been a major challenge in bioinformatics for many years. Despite decisive improvements, it is still a challenging task, mainly because of the very high diversity in sequence and structure of the CDRs, especially CDR H3, which is the most variable, and also the most important for antibody binding.[40] However, since our MAbTope method is based on a coarse-grained description of protein 3D structures, we hypothesized that high-quality models of antibody 3D structures would not be necessary for accurate epitope prediction. Indeed, in the first part of this work we demonstrated that, for a given query antibody, models obtained using MODELER[29] with, as template, the 3D structure of an antibody having CDRs of the same length were sufficient to accurately determine the epitope. We also show that the sequence identity of the template is not a crucial parameter, and that the epitope is correctly determined even when this identity is low, as long as the CDRs of query and template have the same length. In our examples, the epitope was still correctly predicted even when sequence identity went down to 60%. To go further, we also demonstrated that despite a few insertions or deletions in the template as compared to the query, the epitope could still be correctly determined. This is very important since it is not always possible to find a template with CDRs of the same length as those of the query, especially when working with camelid single-chain antibodies. Indeed, these antibodies have very long CDRH3 and there are not yet as many structures available as for conventional antibodies. There again, MAbTope is able to correctly predict the epitope, even when there are as many as 5 insertions or deletions in the modeled antibody as compared to the closest available template.
There was only one complex for which we could not correctly predict the epitope, i.e., a complex between a nanobody and the Bloom’s syndrome helicase (4CDG[41]). When comparing the DNA helicase’s 3D structure between the complex and the isolated protein (4CGZ[42]), on which we docked the nanobody, we observed that the interaction with DNA provokes large movements of one sub-domain of the protein, completely burying the epitope. When using MAbTope with the isolated sub-domain to which the nanobody binds, the epitope was correctly predicted (Figure S8). However, such conformational changes cannot be entirely anticipated when the structure of the complex is unknown, even though some knowledge can be acquired by predicting the epitope using different 3D structures of the target when available. We also show that MAbTope can accurately predict epitope binning. Indeed, the RawS score is a good predictor of epitope overlap, and can consequently be used to predict which antibody pairs are in competition for a common target. This method presents great advantages over experimental binning methods, since, being a purely in silico method, it is applicable to large numbers of antibodies. Moreover, it can be done as soon as the sequences of the antibodies are available, for example just after bio-panning. This can be essential for the choice of leads to be further characterized. It can lead to increasing the epitope coverage of the target, by choosing a set of antibodies having different epitopes. Conversely, for a given epitope, this methodology allows selection of a larger number of unrelated antibodies. Finally, we provide a proof-of-concept case, with the MAbTope determination of the epitope of the therapeutic antibody dupilumab, which was unknown when we started the work, and its experimental validation.
As shown in this example, MAbTope not only provides accurate prediction of the epitope, but also facilitates the design of mutants of the target that can then be used for experimental proof. Unsurprisingly, the determined epitope overlaps with the interaction region of IL4Rα with its cognate ligand, IL4, thereby explaining the functional effect of dupilumab. In a recent study, Kim et al.[43] identified V93 and D97 as being part of dupilumab’s epitope; both residues belong to region 2 of our predictions. Thus, our results are in agreement with theirs, but MAbTope allows the complete epitope to be defined, whereas only two residues were defined in the work of Kim et al. Indeed, as shown by the experimental results obtained, the four regions defined by MAbTope are part of the epitope. In conclusion, MAbTope is able to predict conformational epitopes even from homology models of antibody 3D structures. Moreover, MAbTope can accommodate moderate-quality models resulting either from low sequence identity templates or from templates having insertions or deletions in CDRs as compared to the query antibody, and does not require the use of CPU-consuming modeling methods. As a consequence of this work, epitopes can now be reliably predicted from nucleotide sequences of antibody variable regions, which are easy to obtain very early in the discovery process. This robustness allows use of MAbTope for high-throughput epitope binning of antibodies. A remaining limitation is access to a structure (or a good-quality model) of the antigen. Despite the high accuracy, it is still crucial that the predictions be experimentally validated. This experimental validation is itself notably facilitated by the prediction of the involved regions using MAbTope.

Materials and methods

Test datasets

Antibody-antigen complexes of known structure were extracted from the PDB (www.rcsb.org; version of January 2017). A test dataset was built by gathering the antibody-antigen complexes fulfilling the following criteria: 1) the unbound 3D structure of the antigen is available; 2) the unbound 3D structure of the antibody is not available; 3) no residues are missing in the paratope and epitope of the 3D structure. The third criterion is very important: from different tests, we concluded that missing residues are an important cause of prediction failure (data not shown). The dataset used for this study contained 292 complexes, of which 245 were Fabs or scFv and 47 were VHH (Table S1). Importantly, some of the complexes of this test dataset are also present in the learning dataset, as shown in the last column of Table S1. Results obtained in the study presented here are given for the complete dataset, as well as separately for the complexes of the test dataset that are or are not present in the learning dataset.

Negative controls

Our previous dataset[17] was used. Within this dataset, we selected only one complex for each target, so that all the remaining complexes have different antigens. For negative controls, we docked each of the 129 antibodies on the 128 targets of the other selected complexes, and compared the epitope predicted with the non-cognate antibody with the crystallographic epitope of the cognate antibody.
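The cross-docking design described above can be sketched as follows. The complex identifiers are placeholders, not the actual entries of the dataset, and the function name is illustrative only.

```python
from itertools import permutations


def negative_control_pairs(complex_ids):
    """Pair the antibody of each complex with the target of every other complex."""
    # permutations(..., 2) yields every ordered pair (antibody, target)
    # in which the two members come from different complexes.
    return list(permutations(complex_ids, 2))


# Placeholder names standing in for the 129 selected complexes.
complexes = [f"complex_{k:03d}" for k in range(129)]
pairs = negative_control_pairs(complexes)
print(len(pairs))  # 129 antibodies x 128 non-cognate targets
```

Each pair corresponds to one non-cognate docking run whose predicted epitope is compared with the crystallographic epitope of the cognate antibody of that target.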

Template Selection

In order to find templates for homology modeling, we used Blastp[28] with all parameters kept at their default values, except the database, which was set to PDB. Templates for modeling of the VH and VL were treated separately.

Model Generation

Homology models were generated using MODELER.[29] For each variable domain, five different models were generated and the best model was selected using DOPE score, which is a pairwise atomistic statistical potential used to assess quality of structure model.[44] VH and VL models were computed separately, and then assembled using the relative orientations of VH and VL domains in the template used for VH modeling. The resulting H-L orientations were in accordance with ABangle methodology.[45]
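The selection step (five candidate models per variable domain, best one chosen by DOPE score) can be illustrated with a minimal sketch. The list-of-dictionaries layout and the score values below are assumptions made for the example; they are not output of the actual modeling runs. Lower DOPE scores indicate more favourable models.

```python
def select_best_model(models):
    """Return the successfully built model with the lowest DOPE score."""
    built = [m for m in models if m.get("failure") is None]
    if not built:
        raise ValueError("no model was built successfully")
    return min(built, key=lambda m: m["DOPE score"])


# Hypothetical results for five candidate models of one VH domain.
candidates = [
    {"name": "VH.model1.pdb", "failure": None, "DOPE score": -11850.2},
    {"name": "VH.model2.pdb", "failure": None, "DOPE score": -11790.7},
    {"name": "VH.model3.pdb", "failure": None, "DOPE score": -11912.4},
    {"name": "VH.model4.pdb", "failure": "optimizer error", "DOPE score": None},
    {"name": "VH.model5.pdb", "failure": None, "DOPE score": -11633.9},
]

best = select_best_model(candidates)
print(best["name"])
```

The same selection would be applied independently to the VL candidates before the two domains are assembled.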

Epitope Prediction

Epitope determination was made using MAbTope. Briefly, MAbTope consists of three consecutive stages: docking, scoring and epitope identification. In the docking step, the top-500 docking poses generated by Hex are retained.[46-51] Different functions are then used to re-rank Hex’s top-500 docking poses,[16] and the 30 best-ranked ones are retained. Based on the top-30 ranked docking poses, an epitope score ES is computed for each residue i of the target:

$$ES(i) = \sum_{p=1}^{30} \delta(i, p)$$

where $\delta(i, p)$ is equal to 1 if amino acid i belongs to the epitope in pose p, and 0 otherwise. In other words, ES(i) is the number of poses within the top 30 in which amino acid i belongs to the epitope. For each possible region j consisting of 15 consecutive amino acids of the target, a region score PS(j) is then computed. All the regions are then ranked according to this score, and regions that overlap by more than 5 residues with a better-ranked region are removed. As previously described,[17] the choice of 15-mers results from empirical observations we have made during the development of this method: shorter peptides tend to give a poor signal and longer peptides tend to span more than one loop, making the results more difficult to interpret. A corrected epitope score CES(i) can then be computed for each amino acid i. CES(i) is equal to ES(i) if i belongs to one of the top 4 ranked peptides, and 0 if not. These scores depend on the antibody considered, since two antibodies binding to the same target might have different epitopes: for an antibody A, they will be noted ES(i,A), PS(j,A) and CES(i,A).

The prediction of the epitope is considered successful when at least one of the top 4 regions contains at least one residue belonging to the crystallographic epitope, which is defined as the set of amino acids of the target that have at least one atom less than 4 Å from an atom of the antibody. It is essential to note here that the test dataset was extracted from the 2017 version of the PDB, whereas MAbTope was trained on a set extracted from the January 2015 version of the PDB. Consequently, 153 complexes of the test set are not part of the learning set, whereas 139 belong to the learning set.
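The scoring steps described in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the MAbTope implementation: the region score PS(j) is assumed here to be the sum of ES over the 15 residues of the window, the overlap filter is read as a greedy comparison against the regions already kept, and the toy input uses 3 poses instead of 30. All function names are hypothetical.

```python
def epitope_scores(poses, n_residues):
    """ES(i): number of retained poses whose epitope contains residue i."""
    es = [0] * n_residues
    for pose_epitope in poses:      # one set of residue indices per pose
        for i in pose_epitope:
            es[i] += 1
    return es


def region_scores(es, width=15):
    """PS(j) for every window of `width` consecutive residues starting at j.

    ASSUMPTION: PS(j) is taken as the sum of ES over the window.
    """
    return {j: sum(es[j:j + width]) for j in range(len(es) - width + 1)}


def select_regions(ps, width=15, max_overlap=5, top=4):
    """Rank windows by PS and keep the best non-redundant ones.

    ASSUMPTION: a window is dropped when it shares more than `max_overlap`
    residues with a better-ranked window that was itself kept.
    """
    kept = []
    for j in sorted(ps, key=lambda start: (-ps[start], start)):
        # Two windows of equal width share max(0, width - |j - k|) residues.
        if all(width - abs(j - k) <= max_overlap for k in kept):
            kept.append(j)
        if len(kept) == top:
            break
    return kept


def corrected_scores(es, regions, width=15):
    """CES(i): ES(i) inside a selected region, 0 elsewhere."""
    inside = {i for j in regions for i in range(j, j + width)}
    return [score if i in inside else 0 for i, score in enumerate(es)]


def is_success(regions, crystal_epitope, width=15):
    """True if any selected region contains a crystallographic epitope residue."""
    return any(j <= i < j + width for j in regions for i in crystal_epitope)


# Toy target of 60 residues with 3 poses (the method uses the top 30).
poses = [{10, 11, 12, 40}, {11, 12, 13}, {12, 41, 42}]
es = epitope_scores(poses, n_residues=60)
regions = select_regions(region_scores(es))
ces = corrected_scores(es, regions)
print(regions, is_success(regions, {12, 13}))
```

Ties between windows with equal scores are broken here by start position, which is an arbitrary choice made for the example.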

Epitope overlap

The predicted overlap of the epitopes recognized by two different antibodies A and B on the same target T is estimated by the score RawS(A,B). If no overlap is predicted, RawS(A,B) = 0; if the overlap is perfect, RawS(A,B) = 1. The crystallographic overlap is defined similarly, using min(A,B), the smaller of the numbers of residues in the epitopes of antibodies A and B, and epi(i, A), which is equal to 1 if residue i belongs to the crystallographic epitope of antibody A.
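One reading of the crystallographic overlap that is consistent with the definitions of min(A,B) and epi(i, A) given above is the overlap coefficient: the number of residues shared by the two epitopes, divided by the size of the smaller epitope. The sketch below implements that reading as an assumption; the formula for RawS is not reconstructed here, and the function name is illustrative.

```python
def crystallographic_overlap(epitope_a, epitope_b):
    """Overlap coefficient of two epitopes given as collections of residue indices.

    ASSUMPTION: overlap = sum_i epi(i, A) * epi(i, B) / min(|A|, |B|),
    i.e. shared residues divided by the size of the smaller epitope.
    """
    a, b = set(epitope_a), set(epitope_b)
    if not a or not b:
        raise ValueError("both epitopes must contain at least one residue")
    return len(a & b) / min(len(a), len(b))


print(crystallographic_overlap({1, 2, 3, 4}, {3, 4, 5, 6, 7}))  # 2 shared / 4
print(crystallographic_overlap({1, 2, 3}, {7, 8, 9}))           # disjoint
print(crystallographic_overlap({1, 2}, {1, 2, 3, 4}))           # nested
```

With this normalization, disjoint epitopes give 0 and an epitope fully contained in another gives 1, matching the bounds stated for RawS.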

Design of IL4Rα and mutants

The 3D structure of dupilumab was modeled using MODELER with the aforementioned criteria. Using MAbTope, dupilumab’s epitope was then predicted to be located over 4 regions of the interleukin 4 receptor α subunit (IL4Rα) extracellular domain, named P1 to P4. Within each of these regions, amino acids having solvent-exposed side-chains were mutated to alanines in the full-length IL4Rα, yielding one mutated IL4Rα construct for each predicted region (full-length sequences in Figure S5). In order to avoid affecting IL4Rα’s correct folding, solvent-exposed amino acids whose side-chains make interactions with other amino acids of the target were not mutated. We also avoided mutating prolines, as those are usually essential for correct folding. The constructs were named IL4R_WT, IL4R_P1m, IL4R_P2m, IL4R_P3m and IL4R_P4m. The constructs were Flag-tagged at their N-termini in order to monitor antigen expression in cells. Gene synthesis and cloning into pcDNA3.1+ were performed by Twist Bioscience (San Francisco, CA, USA).
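The mutation rules above can be expressed as a short sketch. The sequence, region boundaries, exposure set and contact set are invented for the example (they are not IL4Rα data), and how exposure and intramolecular contacts were computed is not specified in this section, so both are simply passed in as inputs.

```python
def design_alanine_mutant(sequence, region, exposed, interacting):
    """Return `sequence` with the eligible residues of `region` mutated to Ala.

    region      -- (start, end) 0-based indices, end exclusive
    exposed     -- indices of residues with solvent-exposed side-chains
    interacting -- indices of residues whose side-chains contact other
                   residues of the target (left untouched)
    """
    start, end = region
    mutated = list(sequence)
    for i in range(start, end):
        # Prolines are kept, as they are usually essential for folding.
        if i in exposed and i not in interacting and sequence[i] != "P":
            mutated[i] = "A"
    return "".join(mutated)


# Toy 10-residue sequence with a hypothetical predicted region at 2..7.
wild_type = "MKPLEDTWRS"
mutant = design_alanine_mutant(
    wild_type, region=(2, 8), exposed={2, 4, 5, 7}, interacting={5}
)
print(wild_type)
print(mutant)
```

In this toy case the exposed proline (index 2) and the exposed but interacting aspartate (index 5) are preserved, while the two remaining exposed residues become alanines.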

Flow cytometry analysis of wild-type and mutated IL4R-transfected HEK293 cells

HEK293 cells were seeded into 10 cm dishes in medium containing 10% fetal bovine serum (FBS) and incubated for 24 h at 37°C. Cells were then transiently transfected with 5 μg of each construct using Metafectene (Biontex Lab.; München, Germany) according to the manufacturer’s instructions. Twenty-four hours after transfection, cells were collected and processed for flow cytometry. Five µg of dupilumab, or of a human IgG4 control isotype (BioLegend, ref BLE 403702), were incubated with 2 × 10⁵ cells/tube in 100 μl of phosphate-buffered saline (PBS) supplemented with 2 mM EDTA and 1% FBS (PBS-EDTA-FBS) for 1 h at 4°C. Cells were washed in 2 ml of the same buffer. The cell pellets were suspended in 100 μl of PBS-EDTA-FBS containing 0.03 µg of PE-coupled rat anti-Flag antibody (BioLegend, BLE637310) and 0.15 µg of biotinylated mouse anti-human IgG4 antibody (Invitrogen, MH1542), and left for 45 min at 4°C. After a wash in PBS-EDTA-FBS, cells were incubated with 0.03 µg of APC-coupled streptavidin (BioLegend, BLE405207) for 15 min. Cells were washed first with 2 ml PBS-EDTA-FBS, and then with 2 ml PBS-EDTA, before final suspension in 150 μl PBS-EDTA. Samples were run on a MACSQuant Analyzer 10 (Miltenyi Biotec). Cytometry data from 4 independent experiments were analyzed using FlowJo.[35] The number of APC+PE+ double-positive cells was collected from every sample and normalized to the total number of PE+ cells within the sample. Histograms and ANOVA statistical analyses were produced with Prism software (GraphPad Software, San Diego, CA).
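The normalization used for the cytometry read-out amounts to a simple ratio, sketched below. The cell counts are invented for illustration and are not experimental data; the function name is hypothetical.

```python
def binding_fraction(double_positive, pe_positive):
    """APC+PE+ cells divided by all PE+ (Flag-expressing) cells in a sample."""
    if pe_positive <= 0:
        raise ValueError("sample contains no PE+ cells")
    return double_positive / pe_positive


# Invented (APC+PE+, PE+) counts for two constructs.
counts = {"IL4R_WT": (4200, 5000), "IL4R_P1m": (900, 4500)}
fractions = {
    name: binding_fraction(dp, pe) for name, (dp, pe) in counts.items()
}
print(fractions)
```

Normalizing to PE+ cells restricts the comparison to cells that express the Flag-tagged receptor, so that differences in transfection efficiency between constructs do not distort the binding signal.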
