Dimitris Nikoloudis, Jim E Pitts, José W Saldanha.
Abstract
The accurate prediction of the conformation of Complementarity-Determining Regions (CDRs) is important in modelling antibodies for protein engineering applications. Specifically, the Canonical paradigm has proved successful in predicting the CDR conformation in antibody variable regions. It relies on canonical templates which detail allowed residues at key positions in the variable region framework or in the CDR itself for 5 of the 6 CDRs. No templates have as yet been defined for the hypervariable CDR-H3; instead, reliable sequence rules have been devised for predicting the base of the CDR-H3 loop. Here a new method termed Disjoint Combinations Profiling (DCP) is presented, which contributes a considerable advance in the prediction of CDR conformations. This novel method is explained and compared with canonical templates and sequence rules in a 3-way blind prediction. DCP achieved 93% accuracy over 951 blind predictions and showed an improvement in cumulative accuracy compared to predictions with canonical templates or sequence rules. In addition to its overall improvement in prediction accuracy, it is suggested that DCP is open to better implementations in the future and that it can improve as more antibody structures are deposited in the databank. In contrast, it is argued that canonical templates and sequence rules may have reached their peak.
Keywords: Antibody engineering; Blind test; CDR conformation; CDR-H3 sequence rules; Canonical templates; Conformational prediction; DCP; Humanisation; Prediction from sequence
Year: 2014 PMID: 25071985 PMCID: PMC4103075 DOI: 10.7717/peerj.455
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Table 1: DCP terms and definitions.
A list of terms that were used for the formulation of the DCP method and their definitions.
| Term | Definition |
|---|---|
| Interaction Frame (IF) | A list of Fv positions that are found in the neighbourhood of the |
| IF sequence | A sequence of residues derived from an antibody’s Fv that |
| Query IF sequence set | A group of non-redundant IF sequences from all members of |
| Target IF sequence set | A group of non-redundant IF sequences from the members of all |
| IF fragment | A singlet or a non-necessarily consecutive combination of IF positions |
| IF fragment sequence | The corresponding sequence of residues in an IF fragment |
| Query/Target | IF fragment sequences from the Query/Target IF sequence |
| Signature signal | An IF fragment that presents disjoint IF fragment sequences |
| DCP signature | The complete set of signature signals that are consequently |
Figure 1: Preparatory steps for DCP.
Here, an Interaction Frame (IF) is selected for CDR-L1 and the corresponding IF sequences are synthesised for each one of the four clusters of the given length. For computational reasons the same IF is defined for all lengths of any given CDR (here, CDR-L1 for illustration purposes). Therefore, observed gaps in IF sequences correspond to insertions populated in longer lengths than the one shown in the illustrated example—gaps are filled accordingly in those lengths’ IF sequences. Spare gaps, on the other hand, may correspond to IF positions pointing to unpopulated insertions from other CDRs or deletions in the Fv sequence. Also, gaps are present if there is no Light or Heavy chain in that particular structure. Positions at the end of the IF, marked as ‘n−x’, refer to CDR-H3 positions at a sequential distance x from the last residue n (H102; see text).
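To make the handling of gaps concrete, the following minimal sketch assembles an IF sequence from a numbered Fv. It assumes the Fv is available as a mapping from position labels to one-letter residues; the function name `build_if_sequence`, the gap symbol `-` and the toy residues are illustrative assumptions, not part of the published method.

```python
from typing import Mapping, Sequence

GAP = "-"  # assumed gap symbol for IF positions that are not populated


def build_if_sequence(fv: Mapping[str, str], interaction_frame: Sequence[str]) -> str:
    """Read the residue at each IF position, writing a gap where the position is absent."""
    return "".join(fv.get(pos, GAP) for pos in interaction_frame)


# Toy example (hypothetical residues): a short CDR-L1 in which the insertion
# positions L30a and L30b are not populated, so they appear as gaps.
frame = ["L2", "L4", "L29", "L30", "L30a", "L30b", "L33"]
fv = {"L2": "I", "L4": "M", "L29": "I", "L30": "S", "L33": "L"}
if_sequence = build_if_sequence(fv, frame)
print(if_sequence)  # IMIS--L
```

Because the same frame is applied to every length of a CDR, all IF sequences built this way have the same number of characters, which keeps position-wise comparisons straightforward.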
Figure 2: The training procedure using Disjoint Combinations Profiling.
Definition of Query and Target IF sequence sets, extraction of all available IF fragment sequences and comparison between corresponding sets of fragments for disjointness, leading to signature signals.
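The disjointness test described above can be sketched as follows. This is a simplified illustration under stated assumptions: IF sequences are equal-length strings, fragments are index combinations up to an assumed maximum size (`max_size`, not a parameter taken from the paper), and a fragment is kept as a signature signal when none of its Query fragment sequences occurs among the Target fragment sequences. All names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Sequence, Tuple

Fragment = Tuple[int, ...]  # indices into the IF: a singlet or a combination


def fragment_sequences(if_seqs: Sequence[str], fragment: Fragment) -> FrozenSet[str]:
    """Collect the residues seen at the fragment's positions across a set of IF sequences."""
    return frozenset("".join(seq[i] for i in fragment) for seq in if_seqs)


def signature_signals(
    query: Sequence[str], target: Sequence[str], max_size: int = 2
) -> Dict[Fragment, FrozenSet[str]]:
    """Return the fragments whose Query and Target fragment sequences are disjoint."""
    n_positions = len(query[0])
    signals: Dict[Fragment, FrozenSet[str]] = {}
    for size in range(1, max_size + 1):
        for fragment in combinations(range(n_positions), size):
            q = fragment_sequences(query, fragment)
            t = fragment_sequences(target, fragment)
            if q.isdisjoint(t):
                signals[fragment] = q
    return signals


# Toy data: no single position separates the two sets, but the
# combination of positions 0 and 1 does.
query_set = ["AGS", "AGT"]
target_set = ["AKS", "VGT"]
signals = signature_signals(query_set, target_set)
print(signals)
```

In this toy case only the pair of positions (0, 1) is returned, which illustrates why combinations of positions can carry information that singlets miss.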
Figure 3: Representation of the workflow for CDR conformation prediction by DCP signatures.
New Fv sequences are referred to as “Query” sequences, as they become the profiled object, and therefore IF fragment sequences from the new Fv sequences become ‘Query IF fragment sequences’ for the purposes of prediction.
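One possible way to use signatures at prediction time is sketched below. The scoring scheme (fraction of a cluster's signature signals matched by the new sequence) is an assumption made for illustration; the source does not specify how matches are weighted or how ties and uncertain cases are resolved. The names `match_fraction` and `predict_cluster` and the toy signatures are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

Fragment = Tuple[int, ...]
Signature = Dict[Fragment, FrozenSet[str]]  # fragment -> fragment sequences of one cluster


def match_fraction(if_seq: str, signature: Signature) -> float:
    """Fraction of signature signals whose fragment sequence set contains the query's."""
    if not signature:
        return 0.0
    hits = sum(
        "".join(if_seq[i] for i in fragment) in seqs
        for fragment, seqs in signature.items()
    )
    return hits / len(signature)


def predict_cluster(if_seq: str, signatures: Dict[str, Signature]) -> str:
    """Pick the cluster whose signature is matched best (assumed scoring rule)."""
    return max(signatures, key=lambda name: match_fraction(if_seq, signatures[name]))


# Hypothetical signatures for two clusters of the same CDR length.
signatures = {
    "cluster-I": {(0, 1): frozenset({"AG"}), (2,): frozenset({"S"})},
    "cluster-II": {(0, 1): frozenset({"VK"}), (2,): frozenset({"T"})},
}
prediction = predict_cluster("AGS", signatures)
print(prediction)  # cluster-I
```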
Table 2: Canonical positions.
Table showing the canonical positions per CDR/length, used for CDR conformation prediction by canonical templates.
| CDR/Length | Canonical positions |
|---|---|
| CDR-L1/11 | L2 L4 L25 L26 L28 L29 L30 L33 L34 L36 L46 L49 L51 L71 L90 L93 |
| CDR-L1/12 | L2 L4 L25 L29 L33 L71 L90 L91 L93 |
| CDR-L1/13 | L4 L25 L29 L30 L33 L66 L71 |
| CDR-L1/14 | L4 L25 L29 L30 L31 L33 L66 L71 L90 |
| CDR-L1/15 | L2 L4 L24 L25 L26 L28 L29 L30 L30c L33 L34 L51 L71 L90 L92 L93 |
| CDR-L1/16 | L2 L4 L25 L26 L27 L29 L30a L30b L30c L30d L32 L33 L34 L51 L71 L90 L92 L93 |
| CDR-L3/8 | L36 L89 L90 L91 L94 L95 L97 L98 |
| CDR-L3/9 | L2 L3 L4 L28 L30 L31 L32 L33 L89 L90 L91 L92 L93 L94 L95 L96 L97 L98 H47 |
| CDR-L3/10 | L4 L32 L36 L89 L90 L91 L92 L95a L96 L97 L98 H47 |
| CDR-H1/13 | H2 H4 H20 H24 H26 H29 H32 H33 H34 H35 H48 H51 H69 H78 H80 H90 H94 H102 |
| CDR-H1/15 | H20 H24 H26 H28 H29 H34 H48 H53 H78 H80 H94 |
| CDR-H2/9 | H47 H51 H55 H59 H69 H71 |
| CDR-H2/10 | H33 H47 H50 H51 H52 H53 H54 H55 H56 H58 H59 H69 H71 H78 |
| CDR-H2/12 | L94 H47 H50 H51 H54 H55 H59 H69 H71 H78 |
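As a rough illustration of how a canonical template is checked, the sketch below tests a numbered Fv against a set of allowed residues per canonical position. The positions are those listed above for CDR-H2/9; the allowed residue sets and the example Fv are made-up placeholders, since the table gives positions only, and the function name `template_mismatches` is hypothetical.

```python
from typing import Dict, List, Mapping, Set

Template = Dict[str, Set[str]]  # canonical position -> allowed residues


def template_mismatches(fv: Mapping[str, str], template: Template) -> List[str]:
    """List canonical positions whose residue is missing or not allowed by the template."""
    return [pos for pos, allowed in template.items() if fv.get(pos) not in allowed]


# Positions from the CDR-H2/9 row; the residue sets are placeholders, not real template data.
template = {
    "H47": {"W", "Y"},
    "H51": {"I"},
    "H55": {"G", "D"},
    "H59": {"Y"},
    "H69": {"I", "L"},
    "H71": {"R", "K"},
}
fv = {"H47": "W", "H51": "I", "H55": "G", "H59": "Y", "H69": "M", "H71": "R"}
mismatched = template_mismatches(fv, template)
print(mismatched)  # ['H69']
```

An empty result would mean the sequence satisfies every position of the template; how partial matches are treated is a design decision that this sketch leaves open.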
Table 3: Dataset naming and usage.
Summary of experiments performed, explaining the usage of datasets in each phase.
| Dataset | Subset | Phase-1 | Phase-2 |
|---|---|---|---|
| Clustering set | | - DCP training | - DCP training |
| Blind set | Subset 1 | - DCP validation | - DCP training |
| Blind set | Subset 2 | - DCP testing | - DCP testing |
Table 4: Interaction Frames that resulted in the construction of the most accurate DCP signatures, and their respective CDR neighbourhood radius.
The notations ‘E’, ‘K’ and ‘K+’ at the end of the CDR-H3-base Interaction Frame refer to the β-hairpin type that is favoured at the CDR-H3 apex, depending on whether the base is Extended (E), Kinked (K) or Kinked with a double-bulged base (K+).
| CDR | Interaction Frames | CDR Neighbourhood radius (Å) |
|---|---|---|
| CDR-L1 | L2 L3 L4 L5 L22 L24 L25 L26 L27 L28 L29 L30 L30a L30b L30c L30d | 6 |
| CDR-L2 | L30 L30a L30b L30c L30d L30e L30f L31 L32 L33 L34 L46 L47 L48 L49 | 4 |
| CDR-L3 | L1 L2 L3 L4 L27 L28 L29 L30 L30a L30b L30c L30d L30e L30f L31 L32 | 4 |
| CDR-H1 | L91 L92 L93 L96 H1 H2 H3 H4 H5 H6 H7 H20 H23 H24 H25 H26 H27 | 8 |
| CDR-H2 | H24 H28 H29 H30 H31 H31a H31b H31c H31d H31e H31f H31g H31h | 6 |
| CDR-H3-base | L34 L36 L43 L44 L45 L46 L49 L55 L87 L89 L91 L96 L98 H4 H27 H35 | 4 |
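The neighbourhood radius can be read as a distance cutoff around the CDR. The sketch below selects Fv positions that have at least one atom within a given radius of any CDR atom. The exact criterion (any atom pair, inclusive cutoff) is an assumption for illustration, and the coordinates are toy values rather than data from a real structure.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def positions_within_radius(
    cdr_atoms: List[Coord], fv_atoms: Dict[str, List[Coord]], radius: float
) -> List[str]:
    """Return Fv positions with at least one atom within `radius` angstroms of a CDR atom."""
    return [
        pos
        for pos, atoms in fv_atoms.items()
        if any(math.dist(a, c) <= radius for a in atoms for c in cdr_atoms)
    ]


# Toy coordinates (hypothetical): one CDR atom at the origin and three framework positions.
cdr_atoms = [(0.0, 0.0, 0.0)]
fv_atoms = {
    "L2": [(3.0, 0.0, 0.0)],
    "L4": [(0.0, 5.0, 0.0)],
    "L71": [(0.0, 0.0, 9.0)],
}
frame_4 = positions_within_radius(cdr_atoms, fv_atoms, radius=4.0)
frame_6 = positions_within_radius(cdr_atoms, fv_atoms, radius=6.0)
print(frame_4, frame_6)  # ['L2'] ['L2', 'L4']
```

A larger radius admits more positions into the frame, which is consistent with the table listing a different radius per CDR.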
Table 5: Individual accuracy percentages per experiment in CDR-L1 and -L3, excluding non-predictable (novel) conformations.
The previously acquired clustering set was used for initial DCP training and for updating the canonical templates. The newly downloaded blind dataset was divided into two subsets: for DCP, subset 1 was used for parameter validation (“validation set”), while subset 2 was used for evaluation (“test set”). Both subsets were used for evaluation of canonical templates, as no parameterisation was necessary; however, the terms “validation” and “test” were retained for the two subsets for disambiguation and to allow direct comparisons. In post-evaluation Phase-2, the validation set was merged with the clustering set for DCP re-training and for re-updating the canonical templates. The updated methods were then evaluated on the test set, which remained blind, and were also applied for retro-prediction on the validation set.
| CDR | Evaluation set | Phase-1 Initial DCP signatures | Phase-2 Updated DCP signatures | Phase-1 Initial canonical templates | Phase-2 Updated canonical templates |
|---|---|---|---|---|---|
| CDR-L1 | Validation set | 99% (86/87) | 100% (87/87) | 92% (80/87) | 98% (85/87) |
| CDR-L1 | Test set | | 98% (76/78) | | 96% (75/78) |
| CDR-L1 | Cumulative (validation + test sets) | 99% (163/165) | 99% (163/165) | 94% (155/165) | 97% (160/165) |
| CDR-L3 | Validation set | 95% (84/88) | 100% (88/88) | 95% (69/73) | 100% (73/73) |
| CDR-L3 | Test set | | 91% (72/79) | | 89% (63/71) |
| CDR-L3 | Cumulative (validation + test sets) | 92% (154/167) | 96% (160/167) | 91% (131/144) | 94% (136/144) |
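The cumulative figures are pooled counts rather than averages of percentages. As a small worked check, the sketch below adds the two per-subset counts reported for the Phase-2 updated DCP signatures in the first block above (87/87 and 76/78) and reproduces the cumulative 163/165. The function name `cumulative_accuracy` is a hypothetical helper.

```python
from typing import Tuple

Count = Tuple[int, int]  # (correct predictions, total predictions)


def cumulative_accuracy(first: Count, second: Count) -> Tuple[int, int, int]:
    """Pool two (correct, total) counts and return (correct, total, rounded percentage)."""
    correct = first[0] + second[0]
    total = first[1] + second[1]
    return correct, total, round(100 * correct / total)


# Phase-2 updated DCP signatures, first block of the table: 87/87 and 76/78.
result = cumulative_accuracy((87, 87), (76, 78))
print(result)  # (163, 165, 99)
```

Pooling weights each subset by its size, so a cumulative percentage can differ slightly from the mean of the two individual percentages.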
Table 6: Individual accuracy percentages per experiment in CDR-H1, -H2 and -H3, excluding non-predictable (novel) conformations.
Also see notes in Table 5.
| CDR | Evaluation set | Phase-1 Initial DCP signatures | Phase-2 Updated DCP signatures | Phase-1 Initial canonical templates | Phase-2 Updated canonical templates |
|---|---|---|---|---|---|
| CDR-H1 | Validation set | 96% (92/96) | 100% (96/96) | 79% (76/96) | 83% (80/96) |
| CDR-H1 | Test set | | 96% (92/96) | | 85% (82/96) |
| CDR-H1 | Cumulative (validation + test sets) | 95% (183/192) | 98% (188/192) | 81% (156/192) | 84% (162/192) |
| CDR-H2 | Validation set | 93% (98/105) | 100% (105/105) | 58% (61/105) | 64% (67/105) |
| CDR-H2 | Test set | | 81% (87/108) | | 56% (61/108) |
| CDR-H2 | Cumulative (validation + test sets) | 91% (193/213) | 90% (192/213) | 58% (123/213) | 60% (128/213) |

| CDR | Evaluation set | Phase-1 Initial DCP signatures | Phase-2 Updated DCP signatures | 1999 sequence rules | 2007 sequence rules |
|---|---|---|---|---|---|
| CDR-H3 | Validation set | 89% (93/104) | 100% (104/104) | 83% (86/104) | 86% (89/104) |
| CDR-H3 | Test set | | 88% (99/112) | | |
| CDR-H3 | Cumulative (validation + test sets) | 90% (195/216) | 94% (203/216) | 85% (183/216) | 85% (183/216) |
Table 7: Summary table of Phase-1 prediction results over all test data belonging to non-single cluster lengths, for CDR-L1 and -L3.
Percentages are rounded to the closest unit. Totals for canonical templates in CDR-L3 are marked in italics because they do not include predictions for a length of 11 residues (no template available). For a direct comparison, total accurate predictions for DCP signatures for 8-, 9- and 10-residue CDR-L3 were 133/153 (87%). Totals include novel conformations.
| CDR/Length | DCP signatures: Accurately | DCP signatures: Uncertain | DCP signatures: False | Canonical templates: Accurately | Canonical templates: Uncertain | Canonical templates: False | New CDR | Unique CDR |
|---|---|---|---|---|---|---|---|---|
| CDR-L1-11 | 97/107 (91%) | 0/107 | 0/107 | 93/107 (87%) | 4/107 (4%) | 0/107 | 68/107 (64%) | 177 |
| CDR-L1-12 | 11/14 (79%) | 0/14 | 2/14 (14%) | 8/14 (57%) | 4/14 (29%) | 1/14 (7%) | 11/14 (79%) | 25 |
| CDR-L1-13 | 16/17 (94%) | 0/17 | 0/17 | 16/17 (94%) | 0/17 | 0/17 | 15/17 (88%) | 26 |
| CDR-L1-14 | 10/10 (100%) | 0/10 | 0/10 | 10/10 (100%) | 0/10 | 0/10 | 6/10 (60%) | 26 |
| CDR-L1-15 | 11/11 (100%) | 0/11 | 0/11 | 10/11 (91%) | 1/11 (9%) | 0/11 | 9/11 (82%) | 16 |
| CDR-L1-16 | 18/18 (100%) | 0/18 | 0/18 | 18/18 (100%) | 0/18 | 0/18 | 10/18 (56%) | 71 |
| Total | 163/177 (92%) | 0/177 | 2/177 (1%) | 155/177 (88%) | 9/177 (5%) | 1/177 (0.5%) | 119/177 (67%) | 341 |
| CDR-L3-8 | 18/19 (95%) | 0/19 | 1/19 (5%) | 17/19 (89%) | 1/19 (5%) | 1/19 (5%) | 12/19 (63%) | 44 |
| CDR-L3-9 | 111/119 (93%) | 1/119 (1%) | 6/119 (5%) | 110/119 (92%) | 4/119 (3%) | 4/119 (3%) | 84/119 (71%) | 359 |
| CDR-L3-10 | 4/15 (27%) | 0/15 | 4/15 (27%) | 4/15 (27%) | 2/15 (13%) | 2/15 (13%) | 14/15 (93%) | 26 |
| CDR-L3-11 | 19/25 (76%) | 0/25 | 1/25 (4%) | N/A | N/A | N/A | 23/25 (92%) | 36 |
| Total | 152/178 (85%) | 1/178 (1%) | 12/178 (7%) | | | | 133/178 (75%) | 465 |
Table 8: Extended performance measures for major cluster (class-I) predictions in each CDR-L1 and -L3 length (Phase-1).
No canonical templates were available for CDR-L3/11-residues. The asterisk indicates that the clusters in CDR-L3/10-residues are all small; cluster CDR-L3-10-I was nevertheless considered here for consistency with all other major clusters.
| Method | CDR/Length | True positives | True negatives | False positives | False negatives | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|---|---|---|---|---|
| DCP signatures | CDR-L1-11 | 82 | 21 | 4 | 0 | 0.96 | 0.95 | 1.00 | 0.98 |
| DCP signatures | CDR-L1-12 | 7 | 5 | 1 | 1 | 0.86 | 0.88 | 0.88 | 0.88 |
| DCP signatures | CDR-L1-13 | 18 | 0 | 1 | 0 | 0.95 | 0.95 | 1.00 | 0.97 |
| DCP signatures | CDR-L1-14 | 8 | 2 | 0 | 0 | 1.00 | 1.00 | 1.00 | 1.00 |
| DCP signatures | CDR-L1-15 | 11 | 0 | 0 | 0 | 1.00 | 1.00 | 1.00 | 1.00 |
| DCP signatures | CDR-L1-16 | 18 | 0 | 0 | 0 | 1.00 | 1.00 | 1.00 | 1.00 |
| Canonical templates | CDR-L1-11 | 82 | 21 | 4 | 0 | 0.96 | 0.95 | 1.00 | 0.98 |
| Canonical templates | CDR-L1-12 | 4 | 5 | 0 | 5 | 0.64 | 1.00 | 0.44 | 0.62 |
| Canonical templates | CDR-L1-13 | 18 | 1 | 0 | 0 | 1.00 | 1.00 | 1.00 | 1.00 |
| Canonical templates | CDR-L1-14 | 8 | 2 | 0 | 0 | 1.00 | 1.00 | 1.00 | 1.00 |
| Canonical templates | CDR-L1-15 | 10 | 0 | 0 | 1 | 0.91 | 1.00 | 0.91 | 0.95 |
| Canonical templates | CDR-L1-16 | 18 | 0 | 0 | 0 | 1.00 | 1.00 | 1.00 | 1.00 |
| DCP signatures | CDR-L3-8 | 14 | 4 | 1 | 0 | 0.95 | 0.93 | 1.00 | 0.97 |
| DCP signatures | CDR-L3-9 | 107 | 4 | 8 | 0 | 0.93 | 0.93 | 1.00 | 0.96 |
| DCP signatures | CDR-L3-10* | 1 | 11 | 0 | 3 | 0.80 | 1.00 | 0.25 | 0.40 |
| DCP signatures | CDR-L3-11 | 19 | 1 | 5 | 0 | 0.80 | 0.79 | 1.00 | 0.88 |
| Canonical templates | CDR-L3-8 | 13 | 4 | 1 | 1 | 0.89 | 0.93 | 0.93 | 0.93 |
| Canonical templates | CDR-L3-9 | 105 | 8 | 4 | 2 | 0.95 | 0.96 | 0.98 | 0.97 |
| Canonical templates | CDR-L3-10* | 1 | 11 | 0 | 3 | 0.80 | 1.00 | 0.25 | 0.40 |
| Canonical templates | CDR-L3-11 | – | – | – | – | – | – | – | – |
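The statistics columns follow from the four counts in each row. The sketch below uses the standard definitions of accuracy, precision and recall, and the values in the final column of the table are consistent with the F1 score computed this way. The function name `class_metrics` is hypothetical, and returning 0.0 for undefined ratios is an assumption of this sketch.

```python
from typing import Dict


def class_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 from confusion counts, rounded to 2 places."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return {
        "accuracy": round(accuracy, 2),
        "precision": round(precision, 2),
        "recall": round(recall, 2),
        "f1": round(f1, 2),
    }


# Counts taken from the first CDR-L1-11 row: TP=82, TN=21, FP=4, FN=0.
l1_11 = class_metrics(82, 21, 4, 0)
print(l1_11)  # {'accuracy': 0.96, 'precision': 0.95, 'recall': 1.0, 'f1': 0.98}
```

Computing F1 directly from the counts avoids the small discrepancies that arise when it is derived from already-rounded precision and recall.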
Table 9: Summary table of Phase-1 prediction results over all test data belonging to non-unique-cluster lengths, for CDR-H1 and -H2.
Totals include novel conformations.
| CDR/Length | DCP signatures: Accurately | DCP signatures: Uncertain | DCP signatures: False | Canonical templates: Accurately | Canonical templates: Uncertain | Canonical templates: False | New CDR | Unique CDR |
|---|---|---|---|---|---|---|---|---|
| CDR-H1-13 | 178/201 (89%) | 0/201 | 8/201 (4%) | 153/201 (76%) | 24/201 (12%) | 8/201 (4%) | 135/201 (67%) | 419 |
| CDR-H1-15 | 5/9 (56%) | 0/9 | 2/9 (22%) | 3/9 (33%) | 2/9 (22%) | 2/9 (22%) | 8/9 (89%) | 27 |
| Total | 183/210 (87%) | 0/210 | 10/210 (5%) | 156/210 (74%) | 26/210 (12%) | 10/210 (5%) | 143/210 (68%) | 446 |
| CDR-H2-9 | 41/41 (100%) | 0/41 | 0/41 | 27/41 (66%) | 14/41 (34%) | 0/41 | 30/41 (73%) | 117 |
| CDR-H2-10 | 145/168 (86%) | 6/168 (4%) | 14/168 (8%) | 89/168 (53%) | 60/168 (36%) | 16/168 (10%) | 128/168 (76%) | 350 |
| CDR-H2-12 | 7/8 (88%) | 0/8 | 0/8 | 7/8 (88%) | 0/8 | 0/8 | 4/8 (50%) | 39 |
| Total | 193/217 (89%) | 6/217 (3%) | 14/217 (6%) | 123/217 (57%) | 74/217 (34%) | 16/217 (7%) | 162/217 (75%) | 506 |
Table 10: Extended performance measures for major cluster (class-I) predictions in each CDR-H1 and -H2 length (Phase 1).
| Method | CDR/Length | True positives | True negatives | False positives | False negatives | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|---|---|---|---|---|
| DCP signatures | CDR-H1-13 | 173 | 6 | 21 | 1 | 0.89 | 0.89 | 0.99 | 0.94 |
| DCP signatures | CDR-H1-15 | 4 | 1 | 4 | 0 | 0.56 | 0.50 | 1.00 | 0.67 |
| Canonical templates | CDR-H1-13 | 153 | 7 | 20 | 21 | 0.80 | 0.88 | 0.88 | 0.88 |
| Canonical templates | CDR-H1-15 | 3 | 2 | 3 | 1 | 0.56 | 0.50 | 0.75 | 0.60 |
| DCP signatures | CDR-H2-9 | 41 | 0 | 0 | 0 | 1.00 | 1.00 | 1.00 | 1.00 |
| DCP signatures | CDR-H2-10 | 103 | 45 | 4 | 16 | 0.88 | 0.96 | 0.87 | 0.91 |
| DCP signatures | CDR-H2-12 | 7 | 0 | 1 | 0 | 0.88 | 0.88 | 1.00 | 0.93 |
| Canonical templates | CDR-H2-9 | 27 | 0 | 0 | 14 | 0.66 | 1.00 | 0.66 | 0.79 |
| Canonical templates | CDR-H2-10 | 74 | 43 | 6 | 45 | 0.70 | 0.93 | 0.62 | 0.74 |
| Canonical templates | CDR-H2-12 | 7 | 0 | 1 | 0 | 0.88 | 0.88 | 1.00 | 0.93 |
Table 11: Summary table of Phase-1 prediction results for the CDR-H3-base conformation over all test data.
| | DCP signatures: Accurately | DCP signatures: False | H3-rules, 1999 edition: Accurately | H3-rules, 1999 edition: False | H3-rules, 2007 edition: Accurately | H3-rules, 2007 edition: False |
|---|---|---|---|---|---|---|
| CDR-H3-base | 195/216 (90%) | 21/216 (10%) | 183/216 (85%) | 33/216 (15%) | 183/216 (85%) | 33/216 (15%) |
Table 12: Extended performance measures for Kinked base predictions in CDR-H3 (Phase 1).
| Prediction | Method | True positives | True negatives | False positives | False negatives | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|---|---|---|---|---|
| CDR-H3, kinked | DCP signatures | 191 | 4 | 13 | 8 | 0.90 | 0.94 | 0.96 | 0.95 |
| CDR-H3, kinked | H3-rules, 1999 edition | 182 | 1 | 15 | 18 | 0.85 | 0.92 | 0.91 | 0.92 |
| CDR-H3, kinked | H3-rules, 2007 edition | 178 | 5 | 13 | 20 | 0.85 | 0.93 | 0.90 | 0.92 |