U Ravn, F Gueneau, L Baerlocher, M Osteras, M Desmurs, P Malinge, G Magistrelli, L Farinelli, M H Kosco-Vilbois, N Fischer.
Abstract
In recent years, the unprecedented DNA sequencing capacity provided by next generation sequencing (NGS) has revolutionized genomic research. By combining the Illumina sequencing platform with an scFv library designed to confine diversity to both CDR3, >1.9 × 10⁷ sequences were generated. This approach allowed for in-depth analysis of the library's diversity and provided sequence information on virtually all scFv during selection for binding to two targets, as well as a global view of these enrichment processes. Using the most frequent heavy chain CDR3 sequences, primers were designed to rescue scFv from the third selection round. Identification based on sequence frequency retrieved the most potent scFv as well as valuable candidates that were missed using classical in vitro screening. Thus, by combining NGS with display technologies, laborious and time-consuming upfront screening can be bypassed or complemented, and valuable insights into the selection process can be obtained to improve library design and the understanding of antibody repertoires.
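The frequency-based identification described in the abstract can be illustrated with a minimal sketch. It assumes the VHCDR3 sequences have already been extracted from the reads as plain strings; the function name `top_cdr3` and the toy sequences are hypothetical and are not taken from the study.

```python
from collections import Counter

def top_cdr3(reads, n=10):
    """Return the n most frequent CDR3 sequences as (sequence, count, fraction of reads)."""
    counts = Counter(reads)
    total = sum(counts.values())
    return [(seq, c, c / total) for seq, c in counts.most_common(n)]

# Made-up CDR3 strings standing in for reads from one selection round.
reads = ["ARDYW"] * 5 + ["ARGGW"] * 3 + ["AKSSW"] + ["ARTTW"]
ranking = top_cdr3(reads, n=2)
for seq, count, fraction in ranking:
    print(seq, count, round(fraction, 2))
```

Running the same ranking on each selection round and comparing the fractions would give a simple view of enrichment; how the authors actually processed their reads is not specified here.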
Year: 2010 PMID: 20846958 PMCID: PMC2995085 DOI: 10.1093/nar/gkq789
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Schematic representation of immunoglobulin heavy and light chain variable regions and CDR3 diversification strategy. (A) Framework regions (FR1 to FR4) and CDR regions (H1 to H3 and L1 to L3) are indicated. Stars indicate the location of Type IIS restriction sites in the stuffer fragment located between FR3 and FR4 and in the flanking regions of the diversified VHCDR3 and L3. The sizes of the designed VHCDR3 and L3 are 9–15 and 8–11 amino acids, respectively. The arrows indicate the location of the primers used for next generation sequencing and the dashed arrows indicate the region covered by the 76 bp reads. The heavy chain primer, which is located in the FR4 region, permits partial sequencing of the FR3 region (VHFR4-CDR3-FR3). In contrast, as the light chain primer is located downstream of the FR4 region, the sequencing read does not cover the full CDR3 sequence (VLCDR3-FR4). (B) ScFv rescue strategy based on VHCDR3 sequences. Arrows indicate complementary primers corresponding to a VHCDR3 sequence as well as primers flanking the scFv coding region. These primers were used to amplify and assemble the scFv sequence from the pool of clones following the third selection round.
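The complementary primer pair in panel (B) can be sketched as follows. This is only an illustration of deriving two complementary oligonucleotides from a VHCDR3 nucleotide sequence; the example sequence is made up, and the actual primer lengths, flanking primers and assembly conditions used in the study are not given here.

```python
# Translation table for the Watson-Crick complement of DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def cdr3_primer_pair(cdr3_nt):
    """Return (forward, reverse) primers that are complementary to each other.

    The forward primer is the CDR3 sequence itself; the reverse primer is its
    reverse complement, so each can prime one half of the scFv.
    """
    forward = cdr3_nt.upper()
    return forward, reverse_complement(forward)

# Hypothetical CDR3 nucleotide fragment, not a sequence from the study.
forward, reverse = cdr3_primer_pair("gcgagagattac")
print(forward)
print(reverse)
```

In such a scheme, each CDR3 primer would be paired with one of the primers flanking the scFv coding region, and the two resulting fragments would overlap at the CDR3.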
Summary of NGS results for the heavy and light chains
| | AE1 library (%) | Round 1 (%) | Round 2 (%) | Round 3 (%) |
|---|---|---|---|---|
| Selection on 5E3 | | | | |
| VH(FR3-CDR3-FR4) | | | | |
| Total number of sequences | 5 078 705 | 352 778 | 561 296 | 642 878 |
| Unique sequences | 5 007 022 (99) | 124 909 ( | 130 382 ( | 105 011 ( |
| Single occurrence sequences | 4 938 237 (99) | 89 523 (72) | 103 009 (79) | 82 525 (79) |
| Repeated sequences | 68 785 ( | 35 386 ( | 27 373 ( | 22 486 ( |
| Highest frequency | 42 (0) | 1439 (0.4) | 9870 ( | 37 017 ( |
| Identified VH family | 4 680 882 (92) | 332 201 (94) | 538 706 (96) | 619 707 (96) |
| In frame inserts | 4 237 321 (91) | 322 273 (97) | 527 932 (98) | 612 130 (99) |
| VL(CDR3-FR4) | | | | |
| Total number of sequences | 4 412 636 | 1 531 261 | 1 302 154 | 1 649 977 |
| Unique sequences | 3 612 120 (82) | 733 282 (48) | 493 490 ( | 576 062 ( |
| Identified VL group | 4 051 197 (92) | 1 470 871 (96) | 1 269 674 (98) | 1 597 046 (97) |
| Selection on IFNγ | | | | |
| VH(FR3-CDR3-FR4) | | | | |
| Total number of sequences | | 1 176 998 | 1 484 395 | 1 247 375 |
| Unique sequences | | 1 075 049 (91) | 167 900 ( | 67 936 ( |
| Single occurrence sequences | | 1 008 532 (91) | 109 445 (65) | 50 841 (75) |
| Repeated sequences | | 66 517 ( | 58 455 ( | 17 095 ( |
| Highest frequency | | 13 139 ( | 19 571 ( | 112 452 ( |
| Identified VH family | | 1 133 917 (96) | 1 426 351 (96) | 1 234 160 (99) |
| In frame inserts | | 1 039 842 (92) | 1 314 621 (92) | 1 233 561 (100) |
(a) Single occurrence sequences: number of sequences occurring a single time in the set of unique sequences.
(b) Repeated sequences: number of sequences occurring multiple times in the set of unique sequences.
(c) Identified VH family: number of single occurrence sequences for which FR4 information allowed unambiguous VH family identification.
(d) In frame inserts: number of sequences with an identified VH family that contain an in-frame insertion.
(e) Identified VL group: number of sequences that could be unambiguously identified as κ or λ based on the FR4 sequence.
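The count-based rows of the summary table follow directly from the footnote definitions. The sketch below computes them for a list of reads; the function name `summarize_reads` and the toy reads are hypothetical, and the choice of denominators (total reads for unique sequences and highest frequency, unique sequences for single occurrence and repeated sequences) is an assumption inferred from the printed percentages.

```python
from collections import Counter

def summarize_reads(reads):
    """Compute total, unique, single-occurrence and repeated counts for a read set."""
    counts = Counter(reads)
    total = sum(counts.values())
    unique = len(counts)
    single = sum(1 for c in counts.values() if c == 1)
    return {
        "total": total,
        "unique": unique,
        "single_occurrence": single,        # seen exactly once
        "repeated": unique - single,        # seen more than once
        "highest_frequency": max(counts.values(), default=0),
    }

def percent(part, whole):
    """Percentage of part in whole, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0

# Made-up reads: A twice, B once, C three times, D once.
stats = summarize_reads(["A", "A", "B", "C", "C", "C", "D"])
print(stats)
print(round(percent(stats["single_occurrence"], stats["unique"])))
```

As a check of the assumed denominators, the round 1 row for 5E3 lists 89 523 single occurrence sequences among 124 909 unique sequences, and 89 523 / 124 909 ≈ 0.72 agrees with the printed 72%.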
Figure 2. Germline gene family analysis. (A) Frequency of heavy chain variable gene families identified and (B) proportion of Vκ and Vλ light chains in the AE1 library and after each selection round (R1–R3) against the target, 5E3. (C) Frequency of heavy chain variable gene families identified in the AE1 library and after each selection round (R1–R3) against the target, hIFNγ. Sequences were considered as undetermined if they did not match exactly the signature sequence used for family assignment.
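The exact-match rule for family assignment can be sketched as below. The family labels and signature strings are placeholders invented for illustration; the real signature sequences are not given here. Treating a read that matches more than one signature as undetermined is also an assumption.

```python
from collections import Counter

def assign_family(read, signatures):
    """Return the family whose signature occurs exactly in the read, else 'undetermined'."""
    matches = [family for family, sig in signatures.items() if sig in read]
    return matches[0] if len(matches) == 1 else "undetermined"

def family_frequencies(reads, signatures):
    """Fraction of reads assigned to each family (including 'undetermined')."""
    counts = Counter(assign_family(r, signatures) for r in reads)
    total = sum(counts.values())
    return {family: c / total for family, c in counts.items()} if total else {}

# Placeholder signatures, NOT real germline sequences.
signatures = {"famA": "GGTACC", "famB": "CTGCAG"}
reads = ["TTGGTACCAA", "AACTGCAGTT", "TTGGTACGAA", "CCGGTACCTT"]
freqs = family_frequencies(reads, signatures)
print(freqs)
```

The third toy read differs from the `famA` signature by one base and therefore falls into the undetermined class, which mirrors how sequencing or synthesis errors would show up in such an analysis.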
Figure 3. Frequency of VHCDR3 lengths and distribution within the three VH families included in the AE1 library and after each selection round against different targets. VHCDR3 lengths are expressed as amino acids. VHCDR3 lengths of 9–15 amino acids were included in the library design. Bars appearing between these defined lengths correspond to non-functional VHCDR3 that are out of frame due to errors in oligonucleotide synthesis or cloning artifacts. (A) library, (B–D) selection rounds against 5E3, (E–G) selection rounds against hIFNγ.
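The relationship between insert length and reading frame in this figure can be illustrated with a short sketch. It assumes the VHCDR3 inserts are available as nucleotide strings; the function names and the toy inserts are hypothetical, and only the 9–15 amino acid design range comes from the caption.

```python
from collections import Counter

DESIGNED_AA_LENGTHS = range(9, 16)  # 9 to 15 amino acids, per the library design

def classify_insert(cdr3_nt):
    """Classify a CDR3 nucleotide insert by reading frame and designed length."""
    n = len(cdr3_nt)
    if n % 3 != 0:
        return "out_of_frame"          # fractional amino acid length
    return "designed" if n // 3 in DESIGNED_AA_LENGTHS else "in_frame_other"

def length_histogram(inserts):
    """Count inserts by length in amino acids (fractional values are out of frame)."""
    return Counter(len(s) / 3 for s in inserts)

# Toy inserts of 27, 27, 28 and 45 nucleotides.
inserts = ["A" * 27, "C" * 27, "G" * 28, "T" * 45]
hist = length_histogram(inserts)
labels = [classify_insert(s) for s in inserts]
print(hist)
print(labels)
```

An insert of 28 nucleotides corresponds to 9.33 amino acids, which is the kind of bar that would appear between the designed lengths.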
Figure 4. Frequency and evolution of top 10 sequences. Frequency of the 10 VHCDR3 sequences that were the most abundant after the third selection round against 5E3 (A) and against hIFNγ (B). Amino acid sequence, length of the VHCDR3 according to the IMGT nomenclature and VH family are shown. Sequences that were also identified during screening by ELISA as well as clones that were rescued based on their VHCDR3 sequence and frequency are indicated.
Figure 5. Frequency and evolution of CDR3 sequences identified by classical binding screening against the target 5E3. Frequency evolution of VHFR4-CDR3-FR3 (A) and VLCDR3-FR4 (B) sequences corresponding to six scFv binding specifically to the monoclonal 5E3. The EC50 for binding of the soluble scFv to the target in ELISA and the frequency ranking after the third round of selection are indicated below each clone.
Binding experiments for scFv displayed on the surface of phage or expressed in different soluble formats
| scFv | Phage | scFv supernatant | scFv periplasmic fraction | Purified scFv EC50 (nM) | scFv yield (mg/l) |
|---|---|---|---|---|---|
| 5E3R-1 | + | + | + | 1.9 | 5 |
| 5E3R-2 | + | − | − | − | 0.25 |
| 5E3R-3 | + | + | + | 2 | 2.2 |
| 5E3R-4 | + | − | − | 2.3 | 0.34 |
| 5E3R-5 | + | − | − | 140 | 0.21 |
| 5E3R-6 | + | + | + | 3.6 | 0.4 |
| IFNR-1 | + | − | − | 37.9 | 5 |
| IFNR-2 | + | − | − | 41.0 | 13.4 |
| IFNR-3 | + | + | + | 1.3 | 1.7 |
| IFNR-4 | + | + | + | 0.16 | 10.6 |
| IFNR-5 | + | + | + | 1.4 | 0.2 |
| IFNR-6 | + | + | + | 0.75 | 6.3 |
| IFNR-7 | + | + | + | 0.36 | 2.9 |
| IFNR-8 | + | − | − | 139.3 | 16 |
| IFNR-9 | + | − | − | 4.7 | 7.5 |
| IFNR-10 | + | + | + | 0.1 | 23.8 |