Simon P Kelow, Jared Adolf-Bryfogle, Roland L Dunbrack.
Abstract
Antibody variable domains contain "complementarity-determining regions" (CDRs), the loops that form the antigen binding site. CDRs 1-3 are recognized as the canonical CDRs. However, a fourth loop sits adjacent to CDR1 and CDR2 and joins the D and E strands on the antibody v-type fold. This "DE loop" is usually treated as a framework region, even though mutations in the loop affect the conformation of the CDRs and residues in the DE loop occasionally contact antigen. We analyzed the length, structure, and sequence features of all DE loops in the Protein Data Bank (PDB), as well as millions of sequences from HIV-1 infected and naïve patients. We refer to the DE loop as H4 and L4 in the heavy and light chains, respectively. Clustering the backbone conformations of the most common length of L4 (6 residues) reveals four conformations: two κ-only clusters, one λ-only cluster, and one mixed κ/λ cluster. Most H4 loops are length-8 and exist primarily in one conformation; a secondary conformation represents a small fraction of H4-8 structures. H4 sequence variability exceeds that of the antibody framework in naïve human high-throughput sequences, and both L4 and H4 sequence variability from λ and heavy germline sequences exceed that of germline framework regions. Finally, we identified dozens of structures in the PDB with insertions in the DE loop, all related to broadly neutralizing HIV-1 antibodies (bNAbs), as well as antibody sequences from high-throughput sequencing studies of HIV-infected individuals, illuminating a possible role in humoral immunity to HIV-1.
Keywords: Antibody therapeutics; antibody complementarity determining regions; antibody structure; structural bioinformatics
Year: 2020 PMID: 33180672 PMCID: PMC7671036 DOI: 10.1080/19420862.2020.1840005
Source DB: PubMed Journal: MAbs ISSN: 1942-0862 Impact factor: 5.857
Figure 1. Position of the DE loop in antibody structures
Figure 2. Ramachandran plots for part of the D strand, the DE loop, and part of the E strand (IMGT residues 77–90) for the most common DE loop lengths
Map between various numbering schemes within H4 and L4 loops
| IMGT | AHo | Position in H4-6 | Position in H4-7 | Position in H4-8 | Chothia/Kabat (heavy) | Position in L4-6 | Position in L4-8 | Chothia/Kabat (light) |
|---|---|---|---|---|---|---|---|---|
| 80 | 82 | 1 | 1 | 1 | 71 | 1 | 1 | 66 |
| 81 | 83 | 2 | 2 | 2 | 72 | 2 | 2 | 67 |
| 82 | 84 | 3 | 3 | 3 | 73 | 3 | 3 | 68 |
| 83 | 85 | | 4 | 4 | 74 | | 4 | 68A |
| 84 | 86 | | | 5 | 75 | | 5 | 68B |
| 85 | 87 | 4 | 5 | 6 | 76 | 4 | 6 | 69 |
| 86 | 88 | 5 | 6 | 7 | 77 | 5 | 7 | 70 |
| 87 | 89 | 6 | 7 | 8 | 78 | 6 | 8 | 71 |
Mapping from various antibody numbering schemes to the numbering scheme used here (1 to N, where N is the length of the DE loop considered). AHo indicates the Honegger-Plückthun numbering scheme.
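The mapping in the table can be expressed as a small lookup. The sketch below is illustrative only: the names `imgt_to_loop_position` and `GAPS_BY_LENGTH` are hypothetical, and the gap placement (IMGT 83 and 84 unoccupied in length-6 loops, IMGT 84 unoccupied in length-7 loops) is an assumption read from the table rows.

```python
from typing import Optional

# IMGT positions spanned by the DE loop, as listed in the table.
IMGT_DE_POSITIONS = [80, 81, 82, 83, 84, 85, 86, 87]

# Assumption: IMGT positions left unoccupied for each loop length,
# inferred from the blank cells in the table.
GAPS_BY_LENGTH = {8: set(), 7: {84}, 6: {83, 84}}


def imgt_to_loop_position(imgt: int, loop_length: int) -> Optional[int]:
    """Return the 1-based position within the DE loop, or None for a gap."""
    if imgt not in IMGT_DE_POSITIONS:
        raise ValueError(f"IMGT {imgt} is outside the DE loop (80-87)")
    if loop_length not in GAPS_BY_LENGTH:
        raise ValueError(f"unsupported DE loop length: {loop_length}")
    gaps = GAPS_BY_LENGTH[loop_length]
    if imgt in gaps:
        return None
    # Positions are counted over the occupied IMGT slots only.
    occupied = [p for p in IMGT_DE_POSITIONS if p not in gaps]
    return occupied.index(imgt) + 1


# IMGT 85 is position 4 in a length-6 loop and position 6 in a length-8 loop.
print(imgt_to_loop_position(85, 6), imgt_to_loop_position(85, 8))
```

Keeping the gaps in one dictionary means a different loop length can be supported by adding a single entry, provided its gap placement is known.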
DE loop canonical families
| Gene | Cluster | Rama. | Common sequence | # PDB chains (all) | # PDB chains (≥0.75 EDIA) | % chains (all) | % chains (≥0.75 EDIA) | Unique seqs (all) | Unique seqs (≥0.75 EDIA) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| κ | L4-6-1 | EBEABB | GSGTDF | 3,854 | 1,393 | 64.5 | 66.6 | 77 | 48 | 120, 170 | −164, 158 | 73, −100 | −118, −11 | −119, 122 | −132, 153 | ||
| λ/κ* | L4-6-2 | BBEABB | KSGTTA | 1,377 | 562 | 23.0 | 26.9 | 81 | 67 | −141, 136 | −144, 114 | 64, −120 | −100, 10 | −120, 136 | −115, 148 | ||
| κ | L4-6-3 | BBAABB | GSGTDF | 95 | 48 | 1.6 | 2.3 | 7 | 4 | 163, 168 | −90, −144 | −88, −23 | −128, −13 | −125, 125 | −131, 151 | ||
| λ | L4-6-4 | BBLLBB | LIGGKA | 31 | 14 | 0.5 | 0.7 | 6 | 5 | −110, 130 | −138, 123 | 51, 48 | 70, 12 | −124, 156 | −86, 145 | ||
| λ/κ | noise | - | - | 622 | 76 | 10.4 | 3.5 | 74 | 29 | - | - | - | - | - | - | ||
| λ5/λ6 | L4-8-1 | BBAAALBB | IDSSSNSA | 90 | 40 | 100.0 | 100.0 | 8 | 6 | −128, 135 | −120, 100 | −65, −31 | −66, −34 | −101, 2 | 54, 44 | −134, 156 | −111, 151 |
| H | H4-6-1 | BBAABB | RTSTTV | 35 | 21 | 76.1 | 100.0 | 4 | 3 | −134, 152 | −119, −172 | −64, −34 | −124, 0 | −138, 158 | −127, 133 | ||
| H | noise | - | - | 11 | 0 | 23.9 | 0.0 | 5 | 0 | - | - | - | - | - | - | - | - |
| H | H4-7-1 | BABAABB | | 37 | 17 | 100.0 | 100.0 | 7 | 4 | −94, 112 | −90, −22 | −160, −176 | −71, −15 | −125, 5 | −135, 137 | −124, 140 | |
| H | H4-8-1 | BBAAALBB | RDNSKNTA | 6,269 | 1,953 | 94.0 | 96.8 | 646 | 333 | −143, 149 | −122, 110 | −65, −30 | −67, −34 | −102, 2 | 53, 47 | −129, 141 | −113, 144 |
| H | H4-8-2 | BBAALABB | RDNSKSTA | 60 | 19 | 0.9 | 1.1 | 26 | 12 | −136, 155 | −80, 169 | −60, −39 | −72, −12 | 64, 28 | −104, −11 | −135, 137 | −115, 145 |
| H | noise | - | - | 347 | 37 | 5.1 | 2.1 | 121 | 13 | - | - | - | - | - | - | - | - |
Properties and frequencies of L4 and H4 structural clusters and noise structures for each length of light chain and heavy chain DE loop. The clustering was performed on the entire PDB and on a subset of structures that pass an electron density cutoff (EDIA≥0.75). ϕ,ψ values (in degrees) are given for each residue in each cluster of L4-6, L4-8, H4-6, H4-7, and H4-8 DE loop length families. Ramachandran map regions are: A = alpha-helix region; B = beta sheet region; E = epsilon region (lower right of Ramachandran map); L = alpha-left region.
* Cluster L4-6-2 is composed of 75% λ chains and 25% κ chains
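The Ramachandran strings in the table can be reproduced from the listed ϕ,ψ centroids with a simple region classifier. The sketch below is a minimal illustration, not the authors' procedure: the region boundaries in `rama_region` are assumptions chosen so that the tabulated centroids map to the tabulated letters, and the function names are hypothetical.

```python
def rama_region(phi: float, psi: float) -> str:
    """Assign a Ramachandran region letter (A, B, L, or E) to one residue.

    The boundaries below are assumed for illustration only.
    """
    # Assumption: phi at or above 150 degrees wraps around to the left side
    # of the map, so it is treated like negative phi.
    if phi < 0 or phi >= 150:
        return "A" if -100 <= psi <= 50 else "B"
    # Positive phi: alpha-left near the middle, epsilon otherwise.
    return "L" if -50 <= psi <= 100 else "E"


def rama_string(dihedrals) -> str:
    """Concatenate region letters for a list of (phi, psi) pairs."""
    return "".join(rama_region(phi, psi) for phi, psi in dihedrals)


# Centroid dihedrals copied from the table rows for two clusters.
l4_6_1 = [(120, 170), (-164, 158), (73, -100), (-118, -11), (-119, 122), (-132, 153)]
h4_8_2 = [(-136, 155), (-80, 169), (-60, -39), (-72, -12), (64, 28), (-104, -11),
          (-135, 137), (-115, 145)]

print(rama_string(l4_6_1))  # EBEABB
print(rama_string(h4_8_2))  # BBAALABB
```

Other published region definitions draw these boundaries differently, so the thresholds here should be replaced with the definitions used in a given analysis.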
Figure 3. Canonical conformations of L4 and H4
Figure 4. Structures of all DE loop germline-length clusters
Figure 5. Comparison of H4 and L4 clusters with structural homology
Co-occurrence of L1/L4 pairs from structures in the PDB
| L1 cluster | Gene | # chains | L4-6-1 | L4-6-2 | L4-6-3 | L4-6-4 | L4-8-1 |
|---|---|---|---|---|---|---|---|
| L1-10-1 | κ | 185 | 99.4 | 0.6 | - | - | - |
| L1-10-2 | κ | 93 | 100.0 | - | - | - | - |
| L1-11-1 | κ | 1559 | 90.0 | 9.9 | 0.1 | - | - |
| L1-11-2 | κ | 508 | 91.6 | 8.2 | 0.2 | - | - |
| L1-12-1 | κ | 175 | 98.1 | 1.9 | - | - | - |
| L1-12-2 | κ | 110 | 98.1 | 1.9 | - | - | - |
| L1-15-1 | κ | 274 | 98.8 | - | 1.2 | - | - |
| L1-16-1 | κ | 597 | 84.1 | 1.5 | 14.4 | - | - |
| L1-17-1 | κ | 277 | 99.0 | 0.3 | 0.7 | - | - |
| L1-11-3 | λ | 149 | - | 100.0 | - | - | - |
| L1-12-3 | λ | 32 | - | 100.0 | - | - | - |
| L1-13-1 | λ | 248 | - | 100.0 | - | - | - |
| L1-13-2 | λ | 71 | - | - | - | - | 100.0 |
| L1-14-1 | λ | 193 | - | 85.0 | - | 15.0 | - |
| L1-14-2 | λ | 176 | - | 100.0 | - | - | - |

| H1 cluster | Gene | # chains | H4-8-1 | H4-8-2 | H4-6-1 | H4-7-1 |
|---|---|---|---|---|---|---|
| H1-13-1 | H | 4285 | 99.1 | 0.1 | 0.5 | 0.3 |
| H1-13-2 | H | 64 | 96.0 | 4.0 | - | - |
| H1-13-3 | H | 101 | 93.0 | 7.0 | - | - |
| H1-13-4 | H | 183 | 99.4 | 0.6 | - | - |
| H1-13-5 | H | 86 | 93.8 | 1.2 | 5.0 | - |
| H1-13-6 | H | 28 | 100.0 | - | - | - |
| H1-13-7 | H | 58 | 94.6 | 5.4 | - | - |
| H1-13-10 | H | 16 | 81.3 | 18.7 | - | - |
| H1-14-1 | H | 102 | 100.0 | - | - | - |
| H1-15-1 | H | 173 | 97.1 | 2.9 | - | - |
For each L1 or H1 cluster, the distribution among the L4 or H4 clusters is provided in percent (excluding the noise cluster).
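A row-normalized co-occurrence table of this kind can be computed from per-chain cluster labels. The following is a minimal sketch: the function name `cooccurrence_percent`, the `"noise"` label, and the toy input are all hypothetical, and the toy counts are made up (chosen only so that the split comes out as 85/15); they are not the counts behind the table.

```python
from collections import Counter, defaultdict


def cooccurrence_percent(pairs, noise_label="noise"):
    """Percent of each CDR1 cluster's chains found in each DE loop cluster.

    `pairs` is an iterable of (cdr1_cluster, de_loop_cluster) labels, one per
    chain. Chains whose DE loop falls in the noise cluster are excluded.
    """
    counts = defaultdict(Counter)
    for cdr1, cdr4 in pairs:
        if cdr4 == noise_label:
            continue
        counts[cdr1][cdr4] += 1

    table = {}
    for cdr1, row in counts.items():
        total = sum(row.values())
        table[cdr1] = {cdr4: round(100.0 * n / total, 1) for cdr4, n in row.items()}
    return table


# Toy input (hypothetical counts): 17 + 3 clustered chains and 2 noise chains.
toy_pairs = (
    [("L1-14-1", "L4-6-2")] * 17
    + [("L1-14-1", "L4-6-4")] * 3
    + [("L1-14-1", "noise")] * 2
)
print(cooccurrence_percent(toy_pairs))
```

Because noise chains are dropped before normalizing, each row sums to 100% over the named clusters only.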
Figure 6. Various characteristic hydrogen bonds between the DE loop and CDR1. Hydrogen bonds are labeled CDR4.resnumAtom/CDRn-cluster.resnumAtom (e.g. H4.6O/H1-13-1.2N). If a hydrogen bond is specific to a particular cluster, that is included in the nomenclature.
Figure 7. Sequence entropy in naïve human antibodies and human germlines
Average sequence entropies for CDR and framework regions
| germline | CDR1 | CDR2 | FR1 | FR2 | FR3 | CDR4 |
|---|---|---|---|---|---|---|
| IGHV1-18*04 | 0.38 | 0.44 | 0.04 | 0.17 | 0.21 | 0.27 |
| IGHV3-23*01 | 0.42 | 0.74 | 0.03 | 0.15 | 0.21 | 0.26 |
| IGHV4-34*01 | 0.14 | 0.38 | 0.05 | 0.14 | 0.17 | 0.29 |
| IGHV4-39*07 | 0.21 | 0.36 | 0.08 | 0.13 | 0.15 | 0.20 |
| IGKV1-39*01 | 0.29 | 0.22 | 0.20 | 0.12 | 0.13 | 0.11 |
| IGKV3-11*01 | 0.25 | 0.22 | 0.21 | 0.10 | 0.10 | 0.11 |
| IGKV3-20*01 | 0.30 | 0.21 | 0.23 | 0.10 | 0.10 | 0.07 |
| IGKV4-1*01 | 0.30 | 0.15 | 0.24 | 0.09 | 0.08 | 0.07 |
| IGLV1-40*01 | 0.25 | 0.29 | 0.22 | 0.12 | 0.05 | 0.07 |
| IGLV1-44*01 | 0.27 | 0.26 | 0.21 | 0.12 | 0.06 | 0.07 |
| IGLV2-14*01 | 0.24 | 0.30 | 0.17 | 0.13 | 0.06 | 0.08 |
| IGLV3-1*01 | 0.28 | 0.26 | 0.20 | 0.10 | 0.06 | 0.13 |

| gene | CDR1 | CDR2 | FR1 | FR2 | FR3 | CDR4 |
|---|---|---|---|---|---|---|
| IGH | 0.78 | 1.50 | 0.53 | 0.61 | 0.53 | 0.86 |
| IGK | 0.57 | 0.71 | 0.46 | 0.24 | 0.20 | 0.19 |
| IGL | 0.80 | 0.63 | 0.43 | 0.33 | 0.28 | 0.52 |
Average sequence entropies partitioned by CDR or framework region, excluding CDR3. Bolded values are those where the CDR4 average sequence entropy either is comparable to that of CDR1/CDR2 or exceeds the values for FR1, FR2, and FR3.
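A region-averaged sequence entropy can be sketched as the mean per-column Shannon entropy over the aligned positions of a region. The code below is an illustration under stated assumptions: the logarithm base (base 2 here), the treatment of gaps (none), and the function names are hypothetical choices, since this excerpt does not specify how the entropies were computed.

```python
import math
from collections import Counter


def column_entropy(column) -> float:
    """Shannon entropy (base 2, an assumption) of one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mean_region_entropy(aligned_seqs, start: int, end: int) -> float:
    """Average column entropy over alignment columns [start, end)."""
    # zip(*...) transposes the sliced sequences into per-position columns.
    columns = zip(*(seq[start:end] for seq in aligned_seqs))
    entropies = [column_entropy(col) for col in columns]
    return sum(entropies) / len(entropies)


# Toy alignment (made up): column 0 is invariant, columns 1 and 2 are 50/50.
toy_alignment = ["AAG", "AAT", "ACG", "ACT"]
print(mean_region_entropy(toy_alignment, 0, 3))
```

An invariant column contributes zero entropy, so a region dominated by conserved positions averages close to zero, which is the pattern expected of framework regions.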
HIV-1 bNAbs with insertions in L4 and/or H4
| Gene | PDB Chains | Length | DE loop sequence | Antibody/Germline |
|---|---|---|---|---|
| Heavy | 4toyH 4tvpD* 5cezD* 5fyjD* 5fykD* 5fylD* 5t3sD* 5u7oD* 5u7mD* 5um8D* 5utfD* 5utyD* 5v7jD* 5w6dD* 5wduH,M,U* 6ce0D* 6ch7D 6ch8D 6ch9D 6ck9D* 6de7D* 6ieqD 6mcoD 6mtjD 6mdtD 6mtnD 6mu6D 6mu7D 6mu8D 6mufD 6mugD 6nm6E 6nnfD 6nnjD | 16 | TDTEVPVTSFTSTGAA | 35O22 |
| Heavy | 4jb9H | 15 | RLFSQDLYYPDRGTA | VRC06 |
| Heavy | 3se8H 4cc8F,H,I 5jxaH 6cde5,Q,q* 6cdi8,Q,q* 6cue7,Q,q 6cuf8,Q,q 6e5pI,O,V 6mpg8,Q,q 6mphQ,f,g 6n1vQ,f,g 6n1w8,Q,q 6nf2C,N,V 6osy8,F,P 6ot1J,S,q 6v8zC,I,O | 15 | RQLSQDPDDPDWGVA | VRC03 |
| Heavy | 4s1qH | 15 | RQLSQDPDDPDWGIA | VRC01.H03 + 06.D-001739 |
| Heavy | 6nm6U | 15 | RQLSQDPDDPDWGIA | N6 FR3-03 |
| Heavy | 6nnfU | 15 | RQLSQDPDDPDWGTA | VRC01 FR3-03 |
| Heavy | 4xnzB,E,H | 15 | RQLSQDPDDPDWGVA | VRC06B |
| Heavy | 4p9hH 4p9mH 5a7xN,P,R 5a8hF,L,R 5c7kE 5cjxA,D,H 5js9E 5jsaE 5thrP,R,T 5viyK,M,I 5vj6M,O,Q 6cm3P,R,T 6eduP,R,T 6nqdC,G,K | 12 | AVDLTGSSPPIS | 8ANC195 |
| Heavy | 4jpvH 4lsvH 5v8lG,H,I 5v8mH,R,S | 12 | RHASWDFDTYSF | 3BNC117 |
| Heavy | 3rpiA,H 4gw4A,H | 12 | RQASWDFDTYSF | 3BNC60 |
| Lambda | 4jy6A,C | 9 | PDFRPGTTA | PGT123 |
| Lambda | 4fq2L 6ccbE,L 6ck9L* 5ceyA,C 5cezL* 5t3xL 5t3zL 5v7jL* 5w6dL* 6mtjL* 6mtnL* 6mu6L* 6mu7L* 6mu8L* 6mufL* 6mugL* 6nm6L* 6nnfL* 6nnjL* | 9 | PDINFGTRA | 10-1074 |
| Lambda | 4r26L 4r2gC,I,M 5t3sL* 5um8L* 6ce0L* 6ieqL* 6mcoL* 6mdtL* | 9 | PDINFGTTA | PGT124 |
| Lambda | 5cexC | 9 | PDSNFGTTA | 32H + 109L |
| Lambda | 4fq1L 4fqcL 4jy4A | 9 | PDSPFGTTA | PGT121 |
| Lambda | 3jcbB 3jccB 4jy5L 4ncoG,K,C 4tvpL* 5d9qL,E,M 5fyjL* 5fykL* 5fylL* 5i8hJ,L 5u7mL* 5u7oL* 5utfL* 5utyL* 5wduB,K,S* 6b0nL 6cdeN,8,n* 6cdiN,6,n* 6cue6,n,N* 6cuf6,n,N* 6de7L* 6nf2F,T,K* 6osyD,N,6* 6ot1R,I,n* | 9 | PGSTFGTTA | PGT122 |
The table lists all antibody structures in the PDB with bNAb-related insertions in L4 or H4, along with their DE loop sequences and the antibody or germline of origin. Entries with an asterisk (*) represent structures with insertions in both the light and heavy chains. There is only one other antibody with an inserted DE loop in the PDB: the engineered nanobody against the HigB2 toxin of cholera (PDB ID 5mje, DE sequence RDSAEDSAKNTV). It is not listed in the table. The amino acid sequences of the inserted DE loop in the PDB and the germline were aligned by locating the insertion in their DNA sequences.
Figure 8. Alignment of a subset of gp120 binding HIV-1 bNAbs representing all unique DE loop sequences
Figure 9. Buried surface area for each CDR at the antibody-antigen interface of HIV-1 bNAbs that bind to gp120 in the PDB
Figure 10. DE loop and DE loop adjacent insertions from a large antibody sequencing data set from HIV-infected individuals