Rob van der Kant1,2, Joschka Bauer3, Anne R Karow-Zwick3, Sebastian Kube3, Patrick Garidel3, Michaela Blech3, Frederic Rousseau1,2, Joost Schymkowitz1,2.
Abstract
Monoclonal antibodies bind with high specificity to a wide range of diverse antigens, primarily mediated by their hypervariable complementarity determining regions (CDRs). The defined antigen binding loops are supported by the structurally conserved β-sandwich framework of the light chain (LC) and heavy chain (HC) variable regions. The LC genes are encoded by two separate loci, subdividing the entirety of antibodies into kappa (LCκ) and lambda (LCλ) isotypes that exhibit distinct sequence and conformational preferences. In this work, a diverse set of techniques was employed, including machine learning, force field analysis, statistical coupling analysis and mutual information analysis of a non-redundant antibody structure collection. Thereby, it was revealed how subtle changes between the structures of LCκ and LCλ isotypes increase the diversity of antibodies, extending the predetermined restrictions of the general antibody fold and expanding the diversity of antigen binding. Interestingly, it was found that the characteristic framework scaffolds of κ and λ are stabilized by diverse amino acid clusters that determine the interplay between the respective fold and the embedded CDR loops. In conclusion, this work reveals how antibodies use the remarkable plasticity of the β-sandwich Ig fold to incorporate a large diversity of CDR loops.
Keywords: antibody; architecture; isotype
Year: 2019 PMID: 31535139 PMCID: PMC6908821 DOI: 10.1093/protein/gzz012
Source DB: PubMed Journal: Protein Eng Des Sel ISSN: 1741-0126 Impact factor: 1.650
Fig. 1 Global sequence-structure analysis of antibody VL domains splitting into kappa and lambda isotypes. (A) Schematic representation of the surface of a full-length antibody that comprises two identical heavy (HC, blue) and light chains (LC, yellow), jointly forming the antigen (Ag, green) binding fragment (Fab, highlighted by red box). (B) The Fab fragment consists of the HC’s constant CH1 (cyan) and variable VH domain (red), as well as the LC’s constant CL (green) and variable VL domain (blue). The Fab’s Ag affinity is cooperatively mediated by six complementarity-determining regions (CDRs) that are equally distributed over VH and VL (blue box). (C) VL contributes to Ag binding via three CDR loops (CDR L1 in magenta, CDR L2 in cyan and CDR L3 in red, Chothia definition) that are structurally oriented by four neighboring framework regions (LFR1 green, LFR2 orange, LFR3 yellow, and LFR4 blue). (D) Scheme of the VL structure, which comprises three CDR loops embedded in four β-strand-rich (arrows) LFRs. N, amino-terminal end; C, carboxy-terminal end. (E) A set of 333 non-redundant VL sequences was assigned to the lambda (λ, light gray) and kappa (κ, dark gray) subtype on the basis of their identity to the λ and κ consensus sequence. (F-H) Statistical analysis of the energy contributions to VL and CL structures. The boxplots show the dataset’s entire value distribution (whiskers), the 25th (lower limit of the box) and 75th percentile (upper limit), the mean (red dot) and the median (central line) with a 95% confidence interval (notches). Statistical significances obtained by a Wilcoxon rank-sum test are indicated by ns (P > 0.05), * (P ≤ 0.05), ** (P ≤ 0.01), *** (P ≤ 0.001) and **** (P ≤ 0.0001). (F) FoldX evaluations of the κ (dark gray) and λ (light gray) dataset provided the average residue contributions to the free energy of the folding stability of the entire LC (VL + CL). (G) The average free energies of the VL and CL fold are illustrated separately for the κ (dark gray) and λ (light gray) dataset. (H) The LFRs, and especially LFR2, contribute to the folding stability of VLκ (dark gray) and VLλ (light gray), while the CDRs (L1, L2, L3) destabilize both structures. (I & J) The Cytoscape software (version 3.7.1 (Shannon )) generated 2D visualizations of the residue-residue interaction network that stabilizes the VLλ (I) and VLκ (J) folding. Each residue is shown as a node, and each edge represents a Van der Waals contact that was determined by FoldX. Representative protein structures of the VLκ and VLλ isotypes (PDB IDs 1L7I (Vajdos ) and 6axk (Oyen )) were used, consistent with Fig. 1E. In Cytoscape, the network layout was set to prefuse force directed in order to ensure that residues buried in the structure are centered in the 2D representation. The color code corresponds to the regions of the domain as defined in Fig. 1D. Large nodes indicate residues that differentially stabilize the complementary isotypes, as identified in Fig. 5B, and square-node representations of the corresponding residues in the contrary isotype simplify the cross-comparison. Residues that interact with the VH domain are shown in gray.
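The star notation used for the Wilcoxon rank-sum results in the legend can be written down as a small lookup. The following is a minimal sketch; the function name `significance_label` is hypothetical and not taken from the original analysis.

```python
def significance_label(p_value: float) -> str:
    """Map a P-value to the star notation listed in the figure legend."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    # Check the strictest cutoff first so each P-value gets its highest label.
    if p_value <= 0.0001:
        return "****"
    if p_value <= 0.001:
        return "***"
    if p_value <= 0.01:
        return "**"
    if p_value <= 0.05:
        return "*"
    return "ns"
```

Because the cutoffs are inclusive, a P-value of exactly 0.05 is labeled `*`, and anything above it is labeled `ns`.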
Fig. 5 Key residues contribute significantly differently to the folding stability of VLκ and VLλ structures. (A) Key residues that differed most clearly in their energy contribution to the folding stability of either VLκ or VLλ were identified. The Wilcoxon rank-sum test yielded a statistical significance of P ≤ 0.0001 (****) for key residues. The amino acid conservation in the ‘sequence logo’ format was extracted from Fig. 2A and B. In accordance with the scheme in Fig. 1C, the color coding of the lower bar indicates the CDR L2 and LFR3. (B) The volcano plot visualizes the per-residue preference for stabilizing either the VLλ (negative energy values) or the VLκ fold (positive energy values) by plotting the difference in mean energy contribution to both structure types against the -log10 P-value (negative base-10 logarithm of the P-value). Highly significant data points correspond to low P-values and appear at the top of the plot. The mean energy difference threshold was set at > 0.5 kcal/mol (corresponding to the accepted uncertainty of FoldX (Schymkowitz )) and the P-value threshold at P < 0.05 (corresponding to a -log10 P-value of > 1.30). The dots are color coded according to Fig. 1C and D. (C) The isotype-specific folds are stabilized by particular residue positions (Fig. 5B) that are highlighted on representative VLκ (red clusters, PDBid: 5ifa (Huang )) and VLλ structures (blue clusters, PDBid: 3ujj (Guan )). (D-E) Distinct clusters of stabilizing amino acids near the antigen binding site are shown for (D) VLκ (PDBid: 5ifa (Huang )) and (E) VLλ (PDBid: 3ujj (Guan )).
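The selection logic behind the volcano plot in panel B can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `classify_position` and the example inputs are hypothetical, and only the two thresholds (0.5 kcal/mol and P < 0.05) and the sign convention come from the legend.

```python
import math

ENERGY_THRESHOLD = 0.5  # kcal/mol, threshold quoted in the legend
P_THRESHOLD = 0.05      # P-value threshold quoted in the legend


def classify_position(mean_energy_diff: float, p_value: float) -> str:
    """Decide which fold a position preferentially stabilizes, if any.

    Sign convention from the legend: negative differences favor VL-lambda,
    positive differences favor VL-kappa.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    # y-axis of the volcano plot; -log10(0.05) is about 1.30
    neg_log_p = -math.log10(p_value)
    significant = neg_log_p > -math.log10(P_THRESHOLD)
    if not significant or abs(mean_energy_diff) <= ENERGY_THRESHOLD:
        return "not selected"
    return "kappa" if mean_energy_diff > 0 else "lambda"
```

For example, a hypothetical position with a mean difference of -0.8 kcal/mol and P = 0.001 would be reported as `lambda`, while a difference of 0.3 kcal/mol is not selected regardless of its P-value.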
Descriptions of FoldX variables.
| Variable | Definition | Unit |
|---|---|---|
| total.energy | The predicted overall stability | kcal/mol |
| Backbone.Hbond | The contribution of backbone H-bonds | kcal/mol |
| Sidechain.Hbond | The contribution of sidechain H-bonds | kcal/mol |
| Van.der.Waals | The contribution of Van der Waals forces | kcal/mol |
| Electrostatics | The contribution of electrostatic interactions | kcal/mol |
| Solvation.Polar | The penalty for burying polar residues | kcal/mol |
| Solvation.Hydrophobic | The contribution of hydrophobic groups | kcal/mol |
| Van.der.Waals.clashes | The penalty for Van der Waals clashes (interresidue) | kcal/mol |
| entropy.sidechain | The entropy cost of fixing the sidechain | kcal/mol |
| entropy.mainchain | The entropy cost of fixing the mainchain/backbone | kcal/mol |
| cis_bond | The penalty for having a cis peptide bond | kcal/mol |
| torsional.clash | The penalty for Van der Waals torsional clashes (intraresidue) | kcal/mol |
| backbone.clash | The penalty for Van der Waals backbone-backbone clashes | kcal/mol |
| helix.dipole | The contribution of the helix dipole (electrostatic) | kcal/mol |
| disulfide | The contribution of disulfide bonds | kcal/mol |
| electrostatic.kon | The electrostatic interaction between molecules in the precomplex | kcal/mol |
| energy.Ionisation | The contribution of ionization energy | kcal/mol |
| sidechain.burial | Burial of the sidechain | fraction |
| mainchain.burial | Burial of the backbone | fraction |
| sidechain.Occ | Occupancy of the sidechain | fraction |
| mainchain.Occ | Occupancy of the backbone | fraction |
Fig. 2 Per-residue analysis of the isotype-specific sequence-structure relationship of VL. (A, B) ‘Sequence logo’ (Wagih, 2017a) representation of the amino acid occurrence per Chothia position in the LFRs and CDRs (red boxes) of VLκ (A) and VLλ (B) sequences. The stack height indicates the conservation of the position, whilst the character height reports the relative conservation of distinct amino acids. The theoretical maximum value for the sequence entropy of a protein alignment over the 20 standard amino acids is log2(20) ≈ 4.32 bits. 221 VLκ and 112 VLλ sequences were used for the alignment, well above the critical quantity of 40 sequences needed for accurate computation (Crooks ). The percentage frequency of contribution to the interaction with antigen (C) and VH domain (D) was calculated per position for VLκ (red) and VLλ (blue). Residues were considered to be relevantly involved in binding if the FoldX force field calculated an interaction energy of at least −0.5 kcal/mol. CDRs are marked by red boxes, and significance levels of the differences > 0.2 kcal/mol between VLκ and VLλ are indicated (* for P ≤ 0.05, ** for P ≤ 0.01, *** for P ≤ 0.001, **** for P ≤ 0.0001).
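How stack and character heights in a sequence logo follow from the amino acid frequencies at one alignment position can be sketched as below. This is a simplified illustration that assumes a 20-letter amino acid alphabet and omits the small-sample correction that logo tools usually apply; the function name `column_information` and the example columns are hypothetical.

```python
import math
from collections import Counter

MAX_BITS = math.log2(20)  # upper bound for a 20-letter amino acid alphabet


def column_information(column: str):
    """Return (stack_height, letter_heights) in bits for one alignment column.

    `column` holds the residues observed at one position, one letter per
    sequence. No small-sample correction is applied (simplifying assumption).
    """
    if not column:
        raise ValueError("column must contain at least one residue")
    counts = Counter(column)
    total = sum(counts.values())
    freqs = {aa: n / total for aa, n in counts.items()}
    # Shannon entropy of the observed residue distribution
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    stack_height = MAX_BITS - entropy
    # Each letter receives its frequency-weighted share of the stack
    letter_heights = {aa: p * stack_height for aa, p in freqs.items()}
    return stack_height, letter_heights
```

A fully conserved column such as `"CCCC"` reaches the maximum stack height, whereas a column split evenly between two residues loses exactly one bit.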
Similarities and differences in the conservation of VLκ and VLλ as determined by the alignments.
| Definition | Chothia numbers |
|---|---|
| Conserved in both VLκ and VLλ, identical amino acids | L5, L6, L16, L23, L35, L36, L37, L38, L40, L41, L44, L46, L48, L49, L57, L61, L62, L63, L64, L67, L68, L73, L75, L82, L86, L87, L88, L98, L99, L101, L102, L103 |
| Conserved in both VLκ and VLλ, different amino acids | L7, L11, L71, L105, L106, L109 and L111 |
| Highly variable in both VLκ and VLλ | L3, L13, L17, L19, L22, L27, L28, L29, L30, L30B, L30C, L31, L32, L33, L34, L42, L43, L45, L50, L51, L53, L58, L60, L76, L77, L78, L79, L80, L89, L91, L92, L93, L94, L95A, L95B, L96, L97, L104 |
| Conserved in VLκ while variable in VLλ | L1, L2, L8, L18, L20, L24, L26, L39, L47, L52, L59, L66, L69, L70, L72, L81, L90, L95, L107, L108 |
| Conserved in VLλ while variable in VLκ | L4, L9, L12, L15, L21, L25, L30A, L54, L55, L56, L83, L92, L100 |
Fig. 3 Statistical Coupling Analysis (SCA) of VL structures identified three conserved clusters of statistically coupled amino acids. Ranganathan’s SCA method determined three statistically coupled networks of amino acids (1, red; 2, blue; 3, magenta) in the VL domain. The clusters are shown on a representative VLκ structure (left, PDB: 1l7i (Vajdos )) and marked through bars on a topological scheme. The color-coding scheme for CDR and LFR is maintained from Fig. 1C and D.
Ranganathan clusters
| Cluster # | Chothia # | Region |
|---|---|---|
| 1 | L7 | LFR1 |
| | L25 | CDRL1 |
| | L62 | LFR3 |
| | L67 | LFR3 |
| | L98 | LFR4 |
| | L99 | LFR4 |
| | L101 | LFR4 |
| | L102 | LFR4 |
| | L104 | LFR4 |
| 2 | L5 | LFR1 |
| | L6 | LFR1 |
| | L22 | LFR1 |
| | L31 | CDRL1 |
| | L57 | LFR3 |
| | L58 | LFR3 |
| | L78 | LFR3 |
| | L82 | LFR3 |
| | L84 | LFR3 |
| | L94 | CDRL3 |
| | L95 | CDRL3 |
| | L95B | CDRL3 |
| 3 | L12 | LFR1 |
| | L18 | LFR1 |
| | L19 | LFR1 |
| | L20 | LFR1 |
| | L21 | LFR1 |
| | L36 | LFR2 |
| | L37 | LFR2 |
| | L38 | LFR2 |
| | L41 | LFR2 |
| | L45 | LFR2 |
| | L54 | CDRL2 |
| | L74 | LFR3 |
| | L75 | LFR3 |
Fig. 4 Structural differences between VLκ and VLλ were determined by RandomForest. (A) The most reliable classifiers for assigning the dataset of full light chain structures to the κ and λ isotypes are identified by the mean decrease Gini (variables defined in ). (B) Unknown sequences can be reliably assigned to the κ and λ isotype according to the length of LFR4. (C) An overlay of representative Cα-traces displays how the exclusion (VLκ, green, 5ifa (Huang )) or incorporation (VLλ, red, 3ujj (Guan )) of L106A in LFR4 affects the structure. (D) VLλ structures exhibited a significantly lower energy penalty for peptide bonds in the cis-conformation than VLκ structures. (E) The number of residues in LFR1 makes it possible to distinguish between VLκ and VLλ sequences. (F) Representative structures of VLλ (3ujj (Gorny ), red) and VLκ (5ifa (Jardine ), green) illustrate structural differences in LFR1. VLκ structures frequently contain a cis-proline at position L8, stabilizing a fold that differs dramatically from VLλ structures, which favor trans-prolines at positions L7 and L8. (G) ‘Sequence logo’ representation of the amino acid identities in LFR1 of VLκ (top) and VLλ (bottom). (H, I) Representative structures of VLκ (H, green, PDBs: 3drq (Julien ), 4ygv (Schiele ), 4jm4 (Kong ) and 4zyk (Gilman )) and VLλ (I, red, PDBs: 4h8w (Acharya ), 1aqk (Faber ), 5d7s (Eylenstein ), 5cck (Lee )) clearly differ in the conserved environment of residue L95. Yellow cylinders, hydrogen bonds; yellow and orange structure, heavy chain; cyan residue, Gln L90; magenta, Pro L95; orange, Thr L97; blue, Phe L98; green, Val L97. (J) ‘Sequence logo’ representation of the amino acid occurrence at positions spatially neighboring residue L95 in VLκ (top) and VLλ (bottom).
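The mean decrease Gini in panel A summarizes how much splits on a variable reduce class impurity across the trees of a random forest. The contribution of a single split can be sketched as below. This is an illustration only: the helper names `gini` and `gini_decrease` are hypothetical, the LFR4 lengths in the usage note are made-up toy values, and the sketch does not reproduce the authors' RandomForest run.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity decrease obtained by splitting at `values <= threshold`."""
    if len(values) != len(labels) or not labels:
        raise ValueError("values and labels must be non-empty and equal in length")
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    # Child impurities are weighted by the fraction of samples they receive
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted
```

With toy LFR4 lengths `[10, 10, 11, 11]` labeled `["kappa", "kappa", "lambda", "lambda"]`, a split at length 10 separates the classes perfectly and removes the full parent impurity of 0.5. A variable whose splits do this consistently across trees would receive a high mean decrease Gini.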
Counts of canonical structures for Dunbrack and Deane definitions.
| CDR | Dunbrack canonical | Dunbrack κ count | Dunbrack λ count | Deane canonical | Deane κ count | Deane λ count |
|---|---|---|---|---|---|---|
| L1 | L1-8-* | 0 | 2 | L1-10,11,12-A | 52 | 0 |
| | L1-9-* | 4 | 0 | L1-11-A | 0 | 10 |
| | L1-10-1 | 4 | 0 | L1-11-B | 0 | 4 |
| | L1-11-1 | 103 | 0 | L1-12-A | 2 | 0 |
| | L1-11-2 | 11 | 1 | L1-12-B | 3 | 0 |
| | L1-11-3 | 2 | 36 | L1-12-D | 1 | 0 |
| | L1-12-1 | 13 | 0 | L1-13-A | 0 | 1 |
| | L1-12-2 | 10 | 0 | L1-13,14-A | 0 | 26 |
| | L1-12-3 | 0 | 1 | L1-17-A | 1 | 0 |
| | L1-13-1 | 0 | 28 | Unclustered | 4 | 5 |
| | L1-13-2 | 1 | 3 | | | |
| | L1-14-2 | 0 | 24 | | | |
| | L1-14-cis9-* | 0 | 1 | | | |
| | L1-15-1 | 9 | 0 | | | |
| | L1-15-2 | 1 | 0 | | | |
| | L1-16-1 | 24 | 0 | | | |
| | L1-16-cis9-* | 1 | 0 | | | |
| | L1-17-1 | 9 | 0 | | | |
| | <NA> | 29 | 16 | <NA> | 158 | 66 |
| L2 | L2-6-* | 0 | 1 | L2-7-A | 63 | 44 |
| | L2-8-1 | 182 | 77 | L2-7-B | 2 | 0 |
| | L2-8-2 | 5 | 10 | | | |
| | L2-8-3 | 2 | 0 | | | |
| | L2-8-4 | 1 | 5 | | | |
| | L2-8-5 | 1 | 1 | | | |
| | L2-8-cis3-* | 2 | 1 | | | |
| | L2-12-2 | 0 | 1 | | | |
| | <NA> | 28 | 16 | <NA> | 156 | 68 |
| L3 | L3-5-* | 10 | 1 | L3-5-A | 4 | 1 |
| | L3-6-cis4-* | 0 | 1 | L3-7-A | 1 | 0 |
| | L3-7-1 | 1 | 0 | L3-8-A | 2 | 0 |
| | L3-8-1 | 9 | 0 | L3-9-A | 0 | 3 |
| | L3-8-2 | 4 | 0 | L3-9-B | 0 | 1 |
| | L3-8-cis6-1 | 1 | 0 | L3-9,10-A | 51 | 0 |
| | L3-9-1 | 4 | 12 | L3-10-A | 0 | 3 |
| | L3-9-2 | 12 | 2 | L3-10-C | 1 | 2 |
| | L3-9-cis5,7-* | 1 | 0 | L3-10-D | 0 | 1 |
| | L3-9-cis7-1 | 116 | 0 | L3-10,11-A | 0 | 15 |
| | L3-9-cis7-2 | 1 | 0 | L3-12-A | 0 | 1 |
| | L3-9-cis7-3 | 4 | 0 | L3-13-A | 0 | 2 |
| | L3-10-1 | 5 | 27 | Unclustered | 10 | 15 |
| | L3-10-cis5,8-* | 1 | 0 | | | |
| | L3-10-cis6-* | 0 | 2 | | | |
| | L3-10-cis7-* | 3 | 0 | | | |
| | L3-10-cis7,8-1 | 6 | 0 | | | |
| | L3-10-cis8-1 | 1 | 0 | | | |
| | L3-11-1 | 8 | 37 | | | |
| | L3-11-cis7-1 | 5 | 0 | | | |
| | L3-11-cis8-* | 1 | 1 | | | |
| | L3-12-1 | 1 | 11 | | | |
| | L3-12-cis8-* | 0 | 1 | | | |
| | L3-13-1 | 0 | 2 | | | |
| | <NA> | 27 | 15 | <NA> | 152 | 68 |
Fig. 6 Chord diagrams of mutual information between canonical structures and framework residues. (A, B, C) Mutual information between Dunbrack’s canonical structures (North ) and framework residues of (A) VL, (B) VLλ and (C) VLκ was calculated, and values > 0.5 are shown. The color of the chords corresponds to the CDR of the canonical group (salmon: CDR L3, purple: CDR L1), and chord width correlates with the extent of mutual information (in bits). The coloring scheme of Chothia positions reflects their topological region, similar to that used in Figs 1–3.
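The mutual information shown in the chord diagrams measures how much knowing the residue at a framework position reduces uncertainty about the canonical class of a CDR. A minimal sketch of the plug-in estimate in bits is given below; the function name `mutual_information` and the toy inputs are hypothetical, and any corrections or normalization applied in the original analysis are not reproduced here.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of the mutual information (bits) of two label lists.

    `xs` could hold the canonical class per structure and `ys` the residue
    observed at one Chothia position in the same structures.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and equal in length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi
```

In a toy set where every `L1-11-3` structure carries Asp and every `L1-13-1` structure carries Asn at one position, the two variables determine each other and the estimate is 1 bit. For statistically independent labels it is 0 bits, so only the first case would pass the 0.5 cutoff used in the figure.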
Fig. 7 Networks of interacting amino acids stabilize distinct canonical structures of CDR L1 in VLλ. (A) Dunbrack’s CDR cluster L1-11-3 is favored by a hydrogen bond network between Asp L51 of CDR L2 and Asn L66 of LFR3, cooperatively coordinating the C-terminal end of CDR L1 (L33) and the center of CDR L1 (L29). PDB 4m5y was used as a representative structure for visualization (Hong ). (B) Canonical class L1-13-1 is stabilized by hydrogen bonding between Asn L51 of CDR L2 and Val L33 of CDR L1, and between Lys L66 of LFR3 and Ile L29 of CDR L1. The structure of PDB 3n9g was used as an example for visualization (Kaufmann ). (C) The canonical structure L1-14-2 is favored by a hydrogen bond between Val L29 of CDR L1 and Lys L66 of LFR3, whilst L33 and L51 do not interact, as shown for PDB 3kdm (Niemi ). (D) An overlay of (A-C) highlights the structural differences between the canonical classes L1-11-3, L1-13-1 and L1-14-2. The consistent hydrogen bonds between the sidechain of L66 and the central part of CDR L1, and between the sidechain of L51 of CDR L2 and L33 at the C-terminus of CDR L1, define the Dunbrack canonical classes of CDR L1 in VLλ.