Pier Paolo Olimpieri, Anna Chailyan, Anna Tramontano, Paolo Marcatili.
Abstract
MOTIVATION: Antibodies or immunoglobulins are proteins of paramount importance in the immune system. They are extremely relevant as diagnostic, biotechnological and therapeutic tools. Their modular structure makes it easy to re-engineer them for specific purposes. Short of undergoing a trial-and-error process, these experiments, as well as others, need to rely on an understanding of the specific determinants of the antibody binding mode.
Year: 2013 PMID: 23803466 PMCID: PMC3753563 DOI: 10.1093/bioinformatics/btt369
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Table 1. The eleven classes used to encode amino acids in the reduced 12-letter alphabet adopted for models B and C
| Cluster | Amino acid |
|---|---|
| Aliphatic | Ala, Val, Ile, Leu |
| Sulfur | Cys, Met |
| Hydroxyl | Ser, Thr |
| Acidic | Asp, Glu |
| Basic | His, Lys, Arg |
| Amide | Asn, Gln |
| Phenylalanine | Phe |
| Tryptophan | Trp |
| Tyrosine | Tyr |
| Glycine | Gly |
| Proline | Pro |
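The grouping above can be expressed as a lookup table. The following is a minimal sketch, not the authors' implementation: the names `REDUCED_CLASSES`, `GAP` and `encode_reduced` are hypothetical, and placing Gln in the Amide class alongside Asn is an assumption made here on chemical grounds so that all 20 standard residues are covered.

```python
# Residues grouped into the eleven classes of the reduced alphabet.
# Assumption: Gln is assigned to the Amide class together with Asn.
REDUCED_CLASSES = {
    "Aliphatic": ["Ala", "Val", "Ile", "Leu"],
    "Sulfur": ["Cys", "Met"],
    "Hydroxyl": ["Ser", "Thr"],
    "Acidic": ["Asp", "Glu"],
    "Basic": ["His", "Lys", "Arg"],
    "Amide": ["Asn", "Gln"],
    "Phenylalanine": ["Phe"],
    "Tryptophan": ["Trp"],
    "Tyrosine": ["Tyr"],
    "Glycine": ["Gly"],
    "Proline": ["Pro"],
}

# The twelfth letter of the alphabet is the alignment gap (symbol is hypothetical).
GAP = "-"

# Invert the grouping: three-letter residue code -> class name.
RESIDUE_TO_CLASS = {
    residue: cls
    for cls, members in REDUCED_CLASSES.items()
    for residue in members
}


def encode_reduced(residues):
    """Map three-letter residue codes (or GAP) to reduced-alphabet classes."""
    return [GAP if r == GAP else RESIDUE_TO_CLASS[r] for r in residues]
```

For example, `encode_reduced(["Ala", "Gly", "-", "Trp"])` maps each residue to its class and leaves the gap symbol unchanged.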
Random forest models
| Model | Sequence | Antigen Volume | CDRs Lengths | Germline | Position |
|---|---|---|---|---|---|
| A | 20 + gap | Yes | Yes | Yes | 20 + gap |
| B | 11 + gap | Yes | Yes | Yes | 20 + gap |
| C | 11 + gap | No | Yes | Yes | 20 + gap |
| Naive | No | No | No | No | 20 + gap |
Note: The different sets of variables were adopted to train models A, B, C and the naïve predictor. All predictors use the complete amino acid alphabet to encode the residue at the specific position for which the interaction is being predicted (‘Position’ column). The complete alphabet is used in model A to encode the whole sequence, while the reduced 12-letter alphabet described in Table 1 is adopted in models B and C (‘Sequence’ column). Models A, B and C share the same sequence-derived features (canonical structures, HV loop length and germline family). The ‘Antigen Volume’ binary variable, which labels antigens with a volume larger or smaller than 1538 Å³, is used only in models A and B.
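The variable sets in the table and note above can be written down as a small configuration. This is an illustrative sketch only: the dictionary layout, the feature names and the functions `is_large_antigen` and `enabled_features` are hypothetical, and treating a volume exactly equal to the threshold as "not large" is an assumption, since the note does not say how ties are handled.

```python
# Threshold for the binary 'Antigen Volume' variable, in cubic angstroms.
ANTIGEN_VOLUME_THRESHOLD_A3 = 1538.0

# One entry per predictor; "sequence" is the alphabet used for the whole
# sequence (None means the sequence is not used at all).
MODEL_SETTINGS = {
    "A": {"sequence": "full", "antigen_volume": True, "cdr_lengths": True, "germline": True},
    "B": {"sequence": "reduced", "antigen_volume": True, "cdr_lengths": True, "germline": True},
    "C": {"sequence": "reduced", "antigen_volume": False, "cdr_lengths": True, "germline": True},
    "Naive": {"sequence": None, "antigen_volume": False, "cdr_lengths": False, "germline": False},
}


def is_large_antigen(volume_a3):
    """Binary antigen-volume label (assumption: ties count as 'not large')."""
    return volume_a3 > ANTIGEN_VOLUME_THRESHOLD_A3


def enabled_features(model):
    """List the variable groups a given predictor is trained on."""
    settings = MODEL_SETTINGS[model]
    # Every predictor encodes the target residue with the complete alphabet.
    features = ["position"]
    if settings["sequence"] is not None:
        features.append("sequence_" + settings["sequence"])
    for name in ("antigen_volume", "cdr_lengths", "germline"):
        if settings[name]:
            features.append(name)
    return features
```

Under this layout, the naive predictor is left with only the `position` feature, and model C differs from model B only by the absence of `antigen_volume`.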
Fig. 1. Non-bonded contact prediction ROC curves for models A, B and C and the naive predictor
Matthews correlation coefficient and area under the curve values for each classifier and each type of interaction
| Metric | Model | Non-bonded contacts, all (%) | Non-bonded contacts, side (%) | Non-bonded contacts, main (%) | Hydrogen bonds, all (%) | Hydrogen bonds, side (%) | Hydrogen bonds, main (%) | Hydrophobic interactions, all (%) | Hydrophobic interactions, side (%) | Hydrophobic interactions, main (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| AUC | A | 84.8 | 83.7 | 82.2 | 73.8 | 73.3 | 75.9 | 79.4 | 79.6 | 70.8 |
| AUC | B | 85.1 | 85.0 | 82.8 | 76.3 | 76.6 | 76.1 | 80.7 | 80.5 | 72.2 |
| AUC | C | 84.7 | 84.5 | 82.6 | 75.9 | 75.9 | 75.2 | 80.1 | 80.4 | 71.3 |
| AUC | Naive | 77.7 | 78.0 | 69.8 | 64.8 | 64.7 | 58.8 | 72.0 | 73.6 | 59.9 |
| MCC | A | 51.9 | 48.2 | 41.5 | 25.5 | 26.0 | 19.8 | 36.2 | 36.6 | 11.2 |
| MCC | B | 52.2 | 51.0 | 40.2 | 26.9 | 27.0 | 22.0 | 38.5 | 38.4 | 14.1 |
| MCC | C | 51.2 | 49.8 | 40.4 | 26.9 | 25.3 | 20.8 | 37.5 | 38.0 | 14.2 |
| MCC | Naive | 41.4 | 41.1 | 25.4 | 18.5 | 20.2 | 12.6 | 30.0 | 33.5 | 0.0 |
The top 20 variables ordered according to their overall importance
| Heavy chain variable | Importance | Light chain variable | Importance |
|---|---|---|---|
| Germline family VL | 208.56 | Germline family VH | 111.16 |
| Position | 140.90 | Germline family VL | 90.48 |
| Germline family VH | 107.98 | Position | 82.35 |
| H:95 + 1 | 97.65 | L:96 | 70.88 |
| H:95 + 2 | 93.78 | H3 Length | 68.93 |
| H:101 − 3 | 91.44 | L:92 | 51.37 |
| H:95 | 87.91 | L:50 | 51.37 |
| L1 Canonical structure | 87.62 | L:94 | 51.28 |
| H:95 + 3 | 86.54 | L:91 | 49.73 |
| H:101 − 4 | 84.89 | L:30 | 41.40 |
| H:101 − 2 | 82.69 | L:93 | 39.71 |
| H:50 | 75.04 | L:55 | 39.52 |
| H:95 + 4 | 67.92 | L:32 | 38.75 |
| H:33 | 66.77 | L:34 | 37.53 |
| H:52 | 62.25 | L1 Canonical structure | 37.26 |
| H:53 | 58.77 | L3 Canonical structure | 30.07 |
| H:56 | 58.33 | H2 Canonical structure | 28.34 |
| H:101 − 1 | 52.05 | L:89 | 26.85 |
| Antigen volume | 50.83 | Antigen volume | 25.70 |
| H:58 | 49.05 | L:30 | 24.96 |
Note: The variable importance has been calculated by summing the mean decrease Gini value of the variable for each position. H3 residues are numbered according to their relative position with respect to H:95 and H:101 (i.e. H:95, H:95 + 1, H:95 + 2, … , H:101-2, H:101-1, H:101).
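The aggregation described in the note, summing a variable's mean decrease Gini over all positions and then ranking, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `overall_importance` and the input layout (one dictionary of per-variable values for each predicted position) are hypothetical, and the numbers in the example are made up and unrelated to the table.

```python
from collections import defaultdict


def overall_importance(per_position_gini):
    """Sum each variable's mean decrease Gini across positions and rank.

    per_position_gini maps a predicted position to a dict of
    {variable name: mean decrease Gini in that position's forest}.
    Returns (variable, total) pairs, most important first.
    """
    totals = defaultdict(float)
    for gini_by_variable in per_position_gini.values():
        for variable, value in gini_by_variable.items():
            totals[variable] += value
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up values for two positions, purely for illustration.
example = {
    "H:33": {"Position": 1.5, "Germline family VH": 0.5},
    "H:50": {"Position": 2.0, "Germline family VH": 2.5},
}
ranking = overall_importance(example)
```

A variable that is only moderately informative at each position can still rank highly overall if it contributes at many positions, which is one way to read the prominence of the germline family variables in the table.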
Comparison between proABC and Paratome
| Index | proABC | Paratome |
|---|---|---|
| True positives | 624 | 778 |
| False positives | 286 | 1386 |
| True negatives | 1264 | 164 |
| False negatives | 156 | 2 |
| Recall | 80% | 100% |
| Precision | 69% | 36% |
| Specificity | 82% | 11% |
| MCC | 60% | 19% |
Note: Comparison of proABC and Paratome in terms of true positives, true negatives, false positives, false negatives, MCC, recall, precision and specificity of the two methods.
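The derived indices in the table follow from the four confusion-matrix counts through the standard definitions of recall, precision, specificity and the Matthews correlation coefficient. The sketch below recomputes them from the counts reported above; the function name `confusion_metrics` is hypothetical and this is not code from the proABC or Paratome projects.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Standard binary-classification indices from confusion-matrix counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "mcc": mcc,
    }


# Counts taken from the comparison table above.
proabc = confusion_metrics(tp=624, fp=286, tn=1264, fn=156)
paratome = confusion_metrics(tp=778, fp=1386, tn=164, fn=2)

# Express each index as a whole percentage, as in the table.
proabc_percent = {name: round(100 * value) for name, value in proabc.items()}
paratome_percent = {name: round(100 * value) for name, value in paratome.items()}
```

Rounded to whole percentages, the results agree with the table. Paratome's recall is 778 of 780, which rounds to 100%, while its low specificity reflects the 1386 false positives.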
Fig. 2. Three-dimensional model generated by the proABC server for Gevokizumab. Residues are colored according to their predicted contact probabilities (light gray to blue gradient for the light chain, dark gray to purple gradient for the heavy chain)
Fig. 3. The results of the proABC server on the sequence of the humanized antibody Gevokizumab are compared with the experimentally observed interactions computed using Ligplot on the solved structure of the antibody in complex with its cognate antigen (PDB code: 4G6M). The plots report the non-bonded contact probability for each residue and, separately, for its side chain and its main chain. The same information is also reported in tabular form. False positive predictions were made for H:100, H:95, H:50 (all and side) and H:96 (all and main); false negative predictions were made for H:32, L:50, L:53, L:28, L:27 (all, side), H:57 (all, main) and H:58 (main). The other predictions shown in Fig. 3 are all true positives. Each residue for which proABC returns an interaction probability higher than 0.5 is considered a predicted contact
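The decision rule in the caption, where a residue counts as a predicted contact when its probability is higher than 0.5, and the resulting split into true positives, false positives and false negatives can be sketched as below. The function name `compare_with_observed`, the residue labels and the probabilities are all hypothetical and made up for illustration; they are not proABC output for Gevokizumab.

```python
CONTACT_THRESHOLD = 0.5  # probability above which a residue is a predicted contact


def compare_with_observed(probabilities, observed_contacts, threshold=CONTACT_THRESHOLD):
    """Split residues into true positives, false positives and false negatives."""
    # Strictly greater than the threshold, matching "higher than 0.5".
    predicted = {res for res, p in probabilities.items() if p > threshold}
    observed = set(observed_contacts)
    return {
        "true_positives": sorted(predicted & observed),
        "false_positives": sorted(predicted - observed),
        "false_negatives": sorted(observed - predicted),
    }


# Made-up residue labels and probabilities, purely for illustration.
example_probabilities = {"res_a": 0.91, "res_b": 0.62, "res_c": 0.50, "res_d": 0.12}
example_observed = {"res_a", "res_c"}
result = compare_with_observed(example_probabilities, example_observed)
```

In this toy case `res_c` sits exactly at 0.5 and is therefore not a predicted contact, so it is counted as a false negative.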