Audrey Hemadou1, Véronique Giudicelli2, Melissa Laird Smith3, Marie-Paule Lefranc2, Patrice Duroux2, Sofia Kossida2, Cheryl Heiner3, N Lance Hepler3, John Kuijpers3, Alexis Groppi4, Jonas Korlach3, Philippe Mondon5, Florence Ottones1, Marie-Josée Jacobin-Valat1, Jeanny Laroche-Traineau1, Gisèle Clofent-Sanchez1.
Abstract
Phage-display selection of immunoglobulin (IG) or antibody single chain Fragment variable (scFv) from combinatorial libraries is widely used for identifying new antibodies for novel targets. Next-generation sequencing (NGS) has recently emerged as a new method for the high throughput characterization of IG and T cell receptor (TR) immune repertoires both in vivo and in vitro. However, challenges remain for the NGS of scFv from combinatorial libraries owing to the scFv length (>800 bp) and the presence of two variable domains [variable heavy (VH) and variable light (VL) for IG] associated by a peptide linker in a single chain. Here, we show that single-molecule real-time (SMRT) sequencing with the Pacific Biosciences RS II platform allows for the generation of full-length scFv reads obtained from an in vivo selection of scFv-phages in an animal model of atherosclerosis. We first amplified the DNA of the phagemid inserts from scFv-phages eluted from an aortic section at the third round of the in vivo selection. From this amplified DNA, 450,558 reads were obtained from 15 SMRT cells. Highly accurate circular consensus sequences from these reads were generated, filtered by quality and then analyzed by IMGT/HighV-QUEST with the functionality for scFv. Full-length scFv were identified and characterized in 348,659 reads. Full-length scFv sequencing is an absolute requirement for analyzing the associated VH and VL domains enriched during the in vivo panning rounds. In order to further validate the ability of SMRT sequencing to provide high quality, full-length scFv sequences, we tracked the reads of an scFv-phage clone P3 previously identified by biological assays and Sanger sequencing. Sixty P3 reads showed 100% identity with the full-length scFv of 767 bp, 53 of them covering the whole insert of 977 bp, which encompassed the primer sequences. The remaining seven reads were identical over a shortened length of 939 bp that excludes the vicinity of primers at both ends. Interestingly, these reads were obtained from each of the 15 SMRT cells. Thus, the SMRT sequencing method and the IMGT/HighV-QUEST functionality for scFv provide a straightforward protocol for characterization of full-length scFv from combinatorial phage libraries.
Keywords: IMGT/HighV-QUEST; Pacific Biosciences sequencing; human antibody; immunoglobulin; immunoinformatics; next-generation sequencing; phage combinatorial library; single chain fragment variable
Year: 2017 PMID: 29326697 PMCID: PMC5742356 DOI: 10.3389/fimmu.2017.01796
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 7.561
Figure 1. Human single chain fragment variable-phagemid combinatorial library construction (A) and in vivo phage display selection (B).
Figure 2. Obtaining recombinant single chain fragment variable-phagemid bacteria from the AAR3 fraction, storage for next-generation sequencing, quantification, and picking of individual clones.
Figure 3. Primer design (A) and quality control presequencing (B) and postsequencing (C). (A) Primers designed on the phagemid vector and used for single chain fragment variable (scFv) PCR amplification. The scFv (VH-LINKER-VL) length range is between ~720 and ~800 bp [variable heavy (VH) between ~350 and ~400 bp and variable light (VL) between ~320 and ~350 bp]. The linker is 53 bp including the EcoRI and XbaI sites. The PCR products are expected to be ~1,000 bp on average, including the 5′ and 3′ region and the primers. (B) Agarose gel electrophoresis of PCR products. The DNA was amplified from the AAR3 fraction and PCR products were analyzed on 1.2% (w/v) agarose gel. The band at ~1,000 bp corresponds to the expected size for scFv amplicons. S1, S2, S3, and S4 correspond to the samples 1, 2, 3, and 4, respectively. The Bioanalyzer trace of the four samples shows the purity of amplicons with a high-quality single peak. (C) Pacific Biosciences RS II CCS2 read length distribution using P6-C4 chemistry for 1 SMRT cell (similar results were obtained for the 15 SMRT cells). Data are based on a 1-kb size-selected scFv library using a 6 h movie.
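The length figures quoted in the Figure 3 caption and in the abstract can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch: the variable names are hypothetical, and the approximate ("~") bounds from the caption are treated as exact integers for the purpose of the check.

```python
# Approximate component lengths (bp) quoted in the Figure 3 caption,
# treated here as exact bounds (an assumption made for illustration).
VH_RANGE = (350, 400)
VL_RANGE = (320, 350)
LINKER_LEN = 53


def scfv_length_range(vh, vl, linker):
    """Return (shortest, longest) VH-LINKER-VL length in bp."""
    return (vh[0] + linker + vl[0], vh[1] + linker + vl[1])


low, high = scfv_length_range(VH_RANGE, VL_RANGE, LINKER_LEN)
print(low, high)  # 723 803, consistent with the stated ~720 to ~800 bp

# Clone P3 figures from the abstract: 767 bp scFv inside a 977 bp insert.
P3_INSERT_LEN = 977
P3_SCFV_LEN = 767
p3_flanking = P3_INSERT_LEN - P3_SCFV_LEN

# Primer positions 1-23 and 958-977 (1-based, inclusive) as given in the table notes.
primer_total = (23 - 1 + 1) + (977 - 958 + 1)
print(p3_flanking, primer_total)  # 210 43
```

Under these assumptions, the 210 bp that flank the P3 scFv include 43 bp of primer sequence, which is in line with an amplicon of about 1,000 bp.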
IMGT/HighV-QUEST analysis of the scFv Pacific Biosciences (PacBio) reads.
| PCR sample no. | Number of reads with 99.9% predicted accuracy | Mean number of passes | Number of movies | SMRT cell no. | Number of PacBio CCS2 analyzed reads | Number of scFv candidates in analyzed reads | % of scFv candidates in analyzed reads, i.e., coverage | Number of filtered-in reads | % of scFv in filtered-in reads, i.e., coverage | % of scFv in analyzed reads, i.e., coverage |
|---|---|---|---|---|---|---|---|---|---|---|
| s1 | 91,828 | 24 | 3 | 1 | 29,224 | 25,419 | 86.98 | 22,906 | 90.11 | 78.38 |
| | | | | 2 | 32,240 | 28,120 | 87.22 | 25,228 | 89.72 | 78.25 |
| | | | | 3 | 30,364 | 26,496 | 87.26 | 23,799 | 89.82 | 78.38 |
| s2 | 129,640 | 23 | 4 | 4 | 34,082 | 29,729 | 87.23 | 26,657 | 89.67 | 78.21 |
| | | | | 5 | 33,510 | 29,213 | 87.18 | 26,407 | 90.39 | 78.80 |
| | | | | 6 | 31,980 | 27,874 | 87.16 | 25,032 | 89.80 | 78.27 |
| | | | | 7 | 30,068 | 26,289 | 87.43 | 23,695 | 90.13 | 78.80 |
| s3 | 115,446 | 24 | 4 | 8 | 34,890 | 30,183 | 86.51 | 26,990 | 89.42 | 77.36 |
| | | | | 9 | 29,373 | 25,314 | 86.18 | 22,468 | 88.76 | 76.49 |
| | | | | 10 | 26,465 | 22,776 | 86.06 | 20,044 | 88.00 | 75.74 |
| | | | | 11 | 24,718 | 21,358 | 86.41 | 18,741 | 87.75 | 75.82 |
| s4 | 113,644 | 24 | 4 | 12 | 25,128 | 21,881 | 87.08 | 19,120 | 87.38 | 76.09 |
| | | | | 13 | 23,693 | 20,515 | 86.59 | 17,756 | 86.55 | 74.94 |
| | | | | 14 | 32,293 | 28,093 | 86.99 | 24,762 | 88.14 | 76.68 |
| | | | | 15 | 32,530 | 28,395 | 87.29 | 25,054 | 88.23 | 77.02 |
| Total | 450,558 | | | | 450,558 | 391,655 | 86.93 | 348,659 | 89.02 | 77.38 |
Each CCS read counts as “1× coverage” over the scFv molecules of interest. Coverage is given as a percentage for each of three ratios: scFv candidates in analyzed reads, scFv in filtered-in reads, and scFv in analyzed reads.
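The three coverage percentages in the Total row can be reproduced from the read counts. The sketch below is illustrative only: the function and variable names are hypothetical, and the choice of denominators is an assumption inferred from the fact that these ratios match the tabulated values.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Counts taken from the Total row of the table.
analyzed_reads = 450_558    # PacBio CCS2 analyzed reads
scfv_candidates = 391_655   # scFv candidates in analyzed reads
filtered_in = 348_659       # filtered-in reads

candidates_in_analyzed = pct(scfv_candidates, analyzed_reads)
scfv_in_filtered_in = pct(filtered_in, scfv_candidates)
scfv_in_analyzed = pct(filtered_in, analyzed_reads)

print(candidates_in_analyzed, scfv_in_filtered_in, scfv_in_analyzed)
# 86.93 89.02 77.38
```

The same function applied to any single SMRT cell row (for example 25,419 candidates out of 29,224 analyzed reads for cell 1) gives the per-cell values.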
Pacific Biosciences (PacBio) reads 100% identical to the aligned P3 Sanger sequence and 100% identical between them on 977 bp (53 reads) or 939 bp (7 reads).
| PCR sample no. | Number of P3 PacBio reads per PCR sample | SMRT cell no. | Number of P3 PacBio reads per SMRT cell | 100% on 977 bp (53 reads) | 100% on 939 bp (7 reads) | GenBank/ENA/DDBJ accession number |
|---|---|---|---|---|---|---|
| s1 | 15 | 1 | 6 | 1, 2, 3, 4, 6, 8 | | MG272208 |
| | | 2 | 5 | 10, 30 | 11, 12, 13 | |
| | | 3 | 4 | 16, 17, 19 | 15 | |
| s2 | 14 | 4 | 2 | 32, 33 | | |
| | | 5 | 2 | 36 | 35 | |
| | | 6 | 3 | 37, 38, 39 | | |
| | | 7 | 7 | 20, 41, 44, 45, 46, 47, 48 | | |
| s3 | 15 | 8 | 6 | 49, 50, 51, 52, 53, 54 | | |
| | | 9 | 4 | 56, 57, 59, 60 | | |
| | | 10 | 4 | 63, 65, 66 | 64 | |
| | | 11 | 1 | 68 | | |
| s4 | 16 | 12 | 4 | 70, 72, 73 | 71 | |
| | | 13 | 2 | 75, 76 | | |
| | | 14 | 3 | 21, 80, 83 | | |
| | | 15 | 7 | 23, 24, 25, 26, 27, 28, 29 | | |
| Total | 60 | | 60 | 53 | 7 | |
Mutations observed at the 5′ and 3′ end of the seven Pacific Biosciences (PacBio) reads with 100% identity on 939 bp (positions 3–941).
| PacBio read no. (assigned in the list 1–85) | PCR sample no. | SMRT cell no. | Mutation description | Mutation localization | GenBank/ENA/DDBJ accession number |
|---|---|---|---|---|---|
| 13 | s1 | 2 | One 1 nt-deletion (g2 > del) | 5′ end of the 5′ primer | MG272209 |
| 11 | s1 | 2 | Two 1 nt-deletion (t975 > del, a977 > del) | 3′ end of the 3′ primer | MG272210 |
| 15 | s1 | 3 | One 1 nt-substitution (c956 > t) | Vicinity of the 3′ primer | MG272211 |
| 12 | s1 | 2 | One 1 nt-deletion (a942 > del) | Vicinity of the 3′ primer | MG272212 |
| 35 | s2 | 5 | | | |
| 71 | s4 | 12 | | | |
| 64 | s3 | 10 | Two 1 nt-deletion (a942 > del), | Vicinity of the 3′ primer, | MG272213 |
Positions of the primers are 1–23 and 958–977.
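The distinction between reads identical over the whole 977 bp insert and reads identical only over the 939 bp window (positions 3–941) can be sketched as a simple position-wise comparison. This is a minimal illustration, not the authors' pipeline: it assumes each read has already been aligned to the Sanger reference so that both strings have the same length, with `-` marking a deleted base. The reference used here is a made-up placeholder sequence, not the real P3 sequence, and all function names are hypothetical.

```python
FULL_SPAN = (1, 977)     # whole insert, 1-based inclusive
TRIMMED_SPAN = (3, 941)  # 939 bp window excluding the vicinity of the primers


def identical_over(ref, read, span):
    """True if the aligned read matches the reference at every position of `span`."""
    start, end = span
    return ref[start - 1:end] == read[start - 1:end]


def classify(ref, read):
    """Assign an aligned read to one of the three groups described in the text."""
    if len(ref) != len(read):
        raise ValueError("read must be aligned to the reference (equal length)")
    if identical_over(ref, read, FULL_SPAN):
        return "identical on 977 bp"
    if identical_over(ref, read, TRIMMED_SPAN):
        return "identical on 939 bp"
    return "P3-related"


def with_change(seq, position, char):
    """Return a copy of `seq` with the 1-based `position` replaced by `char`."""
    return seq[:position - 1] + char + seq[position:]


# Placeholder 977 nt reference (hypothetical, for illustration only).
reference = ("acgt" * 245)[:977]

deleted_at_2 = with_change(reference, 2, "-")        # like g2 > del
substituted_at_956 = with_change(reference, 956, "c")  # near the 3' primer
substituted_at_500 = with_change(reference, 500, "g")  # inside the window

print(classify(reference, reference))           # identical on 977 bp
print(classify(reference, deleted_at_2))        # identical on 939 bp
print(classify(reference, substituted_at_956))  # identical on 939 bp
print(classify(reference, substituted_at_500))  # P3-related
```

The window length follows from the positions: 941 − 3 + 1 = 939 bp.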
Mutation heterogeneity observed in 25 P3-related PacBio reads, in contrast with the 60 P3 identical PacBio reads with 100% identity on the complete scFv.
| Read categories | Pacific Biosciences (PacBio) read no. (assigned in the list 1–85) | PCR sample no. | SMRT cell no. | Number of reads/mutation type | Mutation type | Mutation description | GenBank/ENA/DDBJ accession number |
|---|---|---|---|---|---|---|---|
| A (15 reads) | 7, 18, 43, 67, 69, 79 | 1, 2, 3, 4 | 1, 3, 7, 11, 12, 14 | 6 | Four 1 nt-substitution | a545 > g (VL), g686 > a (VL), a757 > g (VL), c838 > g (VL) | MG272218 |
| | 58 | 3 | 9 | 1 | Four 1 nt-substitution with, in 3′, a large deletion | a545 > g (VL), g686 > a (VL), a757 > g (VL), c838 > g (VL), a886-a977 > del (92 nt) | MG272219 |
| | 61 | 3 | 9 | 2 | Four 1 nt-substitution | c741 > t (VL), g837 > a (VL), c838 > g (VL), g843 > t (VL) | MG272220 |
| | 74 | 4 | 13 | | Four 1 nt-substitution | c741 > t (VL), g837 > a (VL), c838 > g (VL), g843 > t (VL), | MG272221 |
| | 5, 14, 31, 34, 42, 82 | 1, 2, 4 | 1, 2, 4, 7, 14 | 6 | Two 1 nt-substitution | c720 > t (VL), t744 > c (VL) | MG272223 |
| B (10 reads) | 85 | 2 | 4 | 2 | One 1 nt-deletion | c242 > del (VH) | MG272216 |
| | 40 | 2 | 6 | | One 1 nt-deletion | g600 > del (VL) | MG272217 |
| | 77 | 4 | 13 | 6 | One 1 nt-substitution | g495 > a (linker) | MG272227 |
| | 22 | 4 | 14 | | One 1 nt-substitution | t624 > c (VL) | MG272224 |
| | 55 | 3 | 9 | | One 1 nt-substitution | a627 > g (VL) | MG272225 |
| | 62 | 3 | 10 | | One 1 nt-substitution | g736 > a (VL) | MG272226 |
| | 78 | 4 | 13 | | One 1 nt-substitution | t599 > g (VL), | MG272228 |
| | 81 | 4 | 14 | | One 1 nt-substitution | | MG272222 |
| | 84 | 2 | 5 | 1 | Two 1 nt-insertion | 209^210 > ins^a (VH), 762^763 > ins^t (VL) | MG272214 |
| | 9 | 1 | 2 | 1 | One 1 nt-substitution in VH, one 2 nt-insertion + two 1 nt-substitution in VL | c322 > t (VH), 658^659 > ins^cc (VL), c659 > t (VL), t660 > a (VL) | MG272215 |
| Total: 25 | | | | Total: 25 | | | |
Positions of the primers are 1–23 and 958–977. Category A: 15 P3-related reads of potential biological interest (mutations due to the VL diversity originating from the combinatorial library). Pink, green, and blue colors highlight groups of reads that share identical substitution mutations. Category B: 10 P3-related reads with undefined origin of the mutations.
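Grouping reads that carry the same set of substitutions, as done for Category A, can be sketched by parsing each mutation description into a tuple and using the sorted tuples as a grouping key. This is only an illustration with hypothetical helper names. It handles the substitution notation used in the table (for example `a545 > g (VL)`) and not deletions or insertions, and it uses a small subset of the Category A reads.

```python
import re
from collections import defaultdict

# Matches substitutions written as in the table, e.g. "a545 > g (VL)".
SUBSTITUTION = re.compile(r"^([acgt])(\d+) > ([acgt]) \((VH|VL|linker)\)$")


def parse_substitution(text):
    """Parse one substitution into (position, reference nt, new nt, region)."""
    match = SUBSTITUTION.match(text.strip())
    if match is None:
        raise ValueError(f"not a substitution: {text!r}")
    ref_nt, position, new_nt, region = match.groups()
    return (int(position), ref_nt, new_nt, region)


def group_by_signature(descriptions):
    """Map each distinct set of substitutions to the read numbers carrying it."""
    groups = defaultdict(list)
    for read_no, description in descriptions.items():
        signature = tuple(sorted(parse_substitution(part)
                                 for part in description.split(", ")))
        groups[signature].append(read_no)
    return dict(groups)


# Subset of Category A reads, with descriptions copied from the table.
four_subs = "a545 > g (VL), g686 > a (VL), a757 > g (VL), c838 > g (VL)"
two_subs = "c720 > t (VL), t744 > c (VL)"
reads = {
    7: four_subs,
    18: four_subs,
    5: two_subs,
    14: two_subs,
    61: "c741 > t (VL), g837 > a (VL), c838 > g (VL), g843 > t (VL)",
}

groups = group_by_signature(reads)
for signature, read_numbers in groups.items():
    print([position for position, *_ in signature], read_numbers)
```

Sorting the parsed tuples makes the key independent of the order in which the substitutions are listed, so two reads fall into the same group only when their substitution sets are identical.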