Alejandro Clavero-Álvarez, Tomas Di Mambro, Sergio Perez-Gaviro, Mauro Magnani, Pierpaolo Bruscolini.
Abstract
Antibody humanization is a key step in the preclinical development of therapeutic antibodies that were originally developed and tested in non-human models (most typically, in mouse). The standard technique of grafting Complementarity-Determining Regions (CDRs) onto human Framework Regions of germline sequences has important drawbacks, in that the resulting sequences often need further back-mutations to ensure functionality and/or stability. Here we propose a new method to characterize the statistical distribution of the sequences of the variable regions of human antibodies, which takes into account phenotypical correlations between pairs of residues, both within and between chains. We define a "humanness score" of a sequence and compare its performance in distinguishing human from murine sequences with that of some alternative scores in the literature. We also compare the score with the experimental immunogenicity of clinically used antibodies. Finally, we use the humanness score as an optimization function and perform a search in the sequence space, starting from different murine sequences and keeping the CDR regions unchanged. Our results show that our humanness score outperforms other methods in sequence classification, and that the optimization protocol is able to generate humanized sequences that are recognized as human by standard homology modelling tools.
Year: 2018 PMID: 30287940 PMCID: PMC6172228 DOI: 10.1038/s41598-018-32986-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. ROC curves for the different models. ROC curves obtained upon classification of the test human and murine databases, using the human learning database to learn the MG distribution, and to calculate the average distance for the “T” and “T” methods. Panel (a): ROC curves obtained using the full VHVL chain for classification, with or without CDR regions. Panel (b): ROC curves obtained using the VH or VL chain separately for classification. “Corr. off” indicates that correlations between residues have been removed in the MG model; “FR” refers to the curve obtained by removing the CDRs and keeping just the framework regions; “h-m” indicates that classification is performed using both human and murine learning datasets as reference (see Methods).
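As an illustration of how a ROC curve of this kind can be built from two sets of scores, the following is a minimal sketch, not the authors' code. The function names `roc_points` and `auc` are hypothetical, and the sketch assumes that a higher score means "more human", as for the MG score; for a distance-like score where lower means human, the scores could be negated first.

```python
def roc_points(human_scores, murine_scores):
    """Return (FPR, TPR) points obtained by sweeping a threshold over all scores.

    Assumption: a sequence is called human when its score is >= threshold.
    """
    thresholds = sorted(set(human_scores) | set(murine_scores), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing is called human
    for t in thresholds:
        tpr = sum(s >= t for s in human_scores) / len(human_scores)
        fpr = sum(s >= t for s in murine_scores) / len(murine_scores)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    # Toy scores, invented for illustration only.
    pts = roc_points([3, 4], [1, 2])
    print(pts, auc(pts))
```

With perfectly separated toy scores, as in the usage example, the curve passes through the top-left corner and the area is 1.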
Figure 2. Boxplot of three score distributions for the different datasets. Panel (a): MG score; Panel (b): T score; Panel (c): T20 score. Pharmaceutical antibodies are indicated according to the suffix in their International Nonproprietary Name: “umab” are fully human antibodies; “zumab” are humanized antibodies, usually containing murine CDRs grafted on top of human framework variable regions; “ximab” are chimeric antibodies, obtained by assembling the whole murine variable region on top of a human constant part; “omab” are murine antibodies. Since we deal with just the antibodies’ variable regions, “ximab” and “omab” are indistinguishable. The horizontal lines mark the threshold score above (for the MG case) or below (for the other cases) which the sequence is classified as human. The threshold values are , , .
Fraction of correct predictions.
| Score | Human | Murine | umab | zumab | ximab | omab |
|---|---|---|---|---|---|---|
| MG | 1289/1388 | 1324/1379 | 11/11 | 19/20 | 6/6 | 9/9 |
| | 1225/1388 | 1296/1379 | 11/11 | 19/20 | 6/6 | 9/9 |
| | 1195/1388 | 1274/1379 | 11/11 | 19/20 | 6/6 | 9/9 |
| | 935/1388 | 1193/1379 | 11/11 | 19/20 | 5/6 | 8/9 |
Fraction of correct predictions for the test and therapeutic databases, using the threshold obtained as specified in Methods, to distinguish between human and murine sequences.
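The counting behind such fractions can be sketched as follows. This is a minimal illustration under stated assumptions: `fraction_correct` and its parameters are hypothetical names, the threshold is taken as a given input (the source obtains it as specified in Methods), and the direction of the comparison follows the Figure 2 caption (above the threshold for the MG score, below it for the other scores).

```python
def fraction_correct(scores, is_human, threshold, higher_is_human=True):
    """Return (correct, total) for a simple threshold classifier.

    scores          -- one score per sequence
    is_human        -- True/False label per sequence (True = human)
    threshold       -- decision threshold, assumed to be supplied by the caller
    higher_is_human -- True for an MG-like score, False for a distance-like score
    """
    correct = 0
    for score, label in zip(scores, is_human):
        if higher_is_human:
            predicted_human = score > threshold
        else:
            predicted_human = score < threshold
        if predicted_human == label:
            correct += 1
    return correct, len(scores)


if __name__ == "__main__":
    # Toy values, invented for illustration only.
    print(fraction_correct([9000, 4500, 7000], [True, False, False], 6000))
```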
Figure 3. Scatter plot of the experimental immunogenicity and the MG score. The immunogenicity (% of patients that develop antibodies against the therapeutic antibody) is plotted versus the MG score.
Figure 4. Scatter plot of MG score versus Hamming distance for Steepest Descent (Panel (a)) and SAMC (Panel (b)), for all our targets. Here HD is the Hamming distance between our proposed sequence and the original murine one. The SAMC trajectories start at the murine sequences, on the left (HD = 0, MG score around 4500), then jump immediately to the bottom-right region of high HD and low score (highly non-human sequences that are nevertheless very different from the original murine ones). Notice that all trajectories roughly overlap in this region: there is no memory of their different, and fixed, CDR sequences, and we witness an essentially free exploration of the sequence space. Then, when the temperature falls below a certain threshold, the trajectories move to the top-left region of highly human sequences, with score and HD depending on the fixed CDR regions of the original sequence.
Comparison between original, simulated and experimentally humanized sequences.
| tgt | MG (o) | MG (h) | MG (p) | HD (p, o) | HD (h, o) | TP | FP | TN | FN | FPR | TPR | YJS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 4963 | 7571 | 9000 | 49 | 37 | 31 | 18 | 246 | 3 | 0.07 | 0.84 | 0.78 |
| 2 | 4905 | 7184 | 9328 | 61 | 48 | 43 | 18 | 234 | 3 | 0.07 | 0.90 | 0.83 |
| 3 | 4976 | 6780 | 7817 | 48 | 37 | 24 | 24 | 241 | 9 | 0.10 | 0.65 | 0.57 |
| 4 | 4734 | 5689 | 9362 | 61 | 38 | 22 | 39 | 228 | 9 | 0.15 | 0.58 | 0.46 |
| 5 | 4978 | 7202 | 8131 | 53 | 39 | 37 | 16 | 243 | 2 | 0.06 | 0.95 | 0.89 |
| 6 | 4350 | 5193 | 8159 | 62 | 29 | 18 | 44 | 232 | 4 | 0.16 | 0.62 | 0.48 |
| 7 | 4481 | 5468 | 8379 | 63 | 35 | 24 | 39 | 229 | 6 | 0.15 | 0.69 | 0.56 |
Columns 2, 3 and 4 report the MG score for the original murine sequence “o”, the experimentally humanized sequence “h”, and the sequence predicted by SD “p”, for each target (column 1). Columns 5 and 6 report the number of mutations between pairs of sequences (HD). We define as “positive” (P) the mutations of the predicted sequence with respect to the murine one, P = HD (p, o), and as “negative” (N) the residues that are identical in the predicted and murine sequences. Accordingly, True Positive (TP) is the number of mutations with respect to “o” that are shared by “p” and “h”; False Positive (FP) indicates that the predicted sequence has a mutation with respect to the murine one, but that mutation is not present in “h” (or is not the same mutation); True Negative (TN) implies that neither the predicted nor the humanized sequence has a mutation; and False Negative (FN) indicates that the humanized sequence presents a mutation with respect to the murine one, but the predicted sequence does not. Schematically, with A, B, C denoting possible amino acids for the triplet (murine, humanized, predicted), we have: (A, A, A) → TN; (A, A, B), (A, B, C) → FP; (A, B, A) → FN; (A, B, B) → TP; thus, HD (p, h) = FP + FN. The True and False Positive Rates are defined as TPR = TP/(TP + FN) and FPR = FP/(TN + FP). The last column is Youden’s index: see Methods.
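The triplet scheme above can be written out as a short sketch. It is an illustration rather than the authors' implementation: the function names are hypothetical, the three sequences are assumed to be already aligned to equal length, and Youden’s index is assumed to take its usual form TPR − FPR, since the source defers its definition to Methods.

```python
def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def compare_mutations(murine, humanized, predicted):
    """Classify each position of the (murine, humanized, predicted) triplet.

    (A, A, A) -> TN; (A, A, B), (A, B, C) -> FP; (A, B, A) -> FN; (A, B, B) -> TP
    """
    if not (len(murine) == len(humanized) == len(predicted)):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for o, h, p in zip(murine, humanized, predicted):
        if p == o:
            # No predicted mutation: negative, true only if "h" also kept "o".
            counts["TN" if h == o else "FN"] += 1
        elif p == h:
            counts["TP"] += 1  # same mutation as in the humanized sequence
        else:
            counts["FP"] += 1  # mutation absent from "h", or a different one
    tp, fp, tn, fn = (counts[k] for k in ("TP", "FP", "TN", "FN"))
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (tn + fp) if tn + fp else 0.0
    # Assumption: Youden's index in its standard form, TPR - FPR.
    counts.update(TPR=tpr, FPR=fpr, YJS=tpr - fpr)
    return counts


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    print(compare_mutations("AAAAA", "AABBB", "ABCAB"))
```

On any aligned triplet, the identities stated in the legend can be checked directly: `hamming(predicted, murine)` equals TP + FP, and `hamming(predicted, humanized)` equals FP + FN.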