
Next-Generation Sequencing of Antibody Display Repertoires.

Romain Rouet¹, Katherine J L Jackson¹, David B Langley¹, Daniel Christ¹,².

Abstract

In vitro selection technology has transformed the development of therapeutic monoclonal antibodies. Using methods such as phage, ribosome, and yeast display, high affinity binders can be selected from diverse repertoires. Here, we review strategies for the next-generation sequencing (NGS) of phage- and other antibody-display libraries, as well as NGS platforms and analysis tools. Moreover, we discuss recent examples relating to the use of NGS to assess library diversity, clonal enrichment, and affinity maturation.


Keywords: antibody display technology; antibody libraries; antibody therapeutics; in vitro selection; next-generation sequencing; phage display

Substances: Antibodies

Year: 2018; PMID: 29472918; PMCID: PMC5810246; DOI: 10.3389/fimmu.2018.00118

Source DB: PubMed; Journal: Front Immunol; ISSN: 1664-3224; Impact factor: 7.561

Introduction

The development of antibody display technologies such as phage (1), ribosome (2), yeast (3), and mammalian display (4) has enabled the rapid selection of binders from diverse libraries. These technologies bypass the use of animals and allow for the enrichment of binders within days to weeks. The power of in vitro selection technologies relies on a direct physical link between phenotype (displayed antibody construct) and genotype (antibody variable domain genes), allowing for the identification of binders through sequencing of their encoding genes. Multiple rounds of selection are generally required to identify antigen-specific binders, either by binding to a solid support or through cellular sorting (5). In many cases, later rounds of selection tend to be dominated by a handful of clones, which are then further characterized for affinity. Such clonal dominance can reflect genuine selection for high antigen affinity but might also reflect other properties such as superior expression or display. Consequently, clones with superior affinity may be present at low frequency and may not be readily detectable using traditional screening methods such as ELISA (6). Advances in DNA sequencing technologies and computing power over the last decade have led to a dramatic reduction in the cost of sequencing and have simplified data analyses (7). Although initially developed for genomics applications, such as whole-genome sequencing, transcriptome sequencing, and epigenetics, next-generation sequencing (NGS) technology is now increasingly being applied to other fields, including basic and applied immunology. This includes the sequencing of the paired human heavy and light chain repertoire from isolated naïve (8, 9) and antigen-specific B-cells (10, 11), as well as T-cell receptor (12) and antibody display repertoires (13). While most NGS platforms were originally designed for short reads, technology is evolving rapidly, extending both read length and depth.
Here, we review recent advances in NGS technology and key applications to phage display and other in vitro selection technologies.

Strategies for NGS of Antibody Repertoires

Traditionally, antibody display libraries are analyzed by isolation of 10²–10³ clones in combination with Sanger sequencing (5). Although this approach is sufficient to identify dominant clones after selection, or to broadly validate design objectives, the data obtained represent only a limited snapshot of actual library diversity. By contrast, NGS approaches allow for far greater insights into library diversity by providing up to 10⁷ sequences (approximately 10,000-fold more sequences than Sanger sequencing). One of the main challenges in the use of NGS for the analysis of antibody selection systems relates to the size of the encoded genes: the smallest antibody fragments (variable domains) range between 300 and 400 bp in length, while the commonly used scFv and Fab antibody fragment formats are 700–800 bp and over 1,500 bp in length, respectively. While NGS technologies are particularly well suited for high throughput sequencing of short reads (less than 100 bp), many platforms can nevertheless sequence up to 300–400 bp with reasonable throughput. In particular, Illumina MiSeq and HiSeq, 454 GS FLX (instrument discontinued), and Ion Torrent PGM are suited for this task (8, 14, 15); in addition, PacBio sequencing generates particularly long reads at the cost of reduced read numbers (Table 1) (16). Long sequences can also be generated by using paired-end reads: this method is particularly useful for scFv formats, enabling the sequencing of multiple CDR regions of VH and VL domains. In addition to analysis of longer antibody fragment sequences, some studies have focused on sequencing only the relatively short VH CDR3 repertoire (23), which forms the center of the antigen binding site and is a major determinant of antigen binding (24).
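The read-length constraint described above can be illustrated with a small Python sketch that checks whether an amplicon can be spanned by a single read or by an overlapping read pair. This is only a back-of-the-envelope aid: the function name, the assumed minimum overlap of 20 bp for merging a read pair, and the representative fragment lengths are illustrative assumptions and are not taken from the cited studies.

```python
def covers_amplicon(amplicon_bp, read_bp, paired_end=False, min_overlap_bp=20):
    """Return True if the reads can span the whole amplicon.

    min_overlap_bp is an assumed overlap needed to merge a read pair;
    it is not a platform specification.
    """
    if not paired_end:
        return read_bp >= amplicon_bp
    # Two reads started from opposite ends must meet and overlap in the middle.
    return 2 * read_bp - min_overlap_bp >= amplicon_bp


# Representative fragment sizes based on the ranges quoted in the text.
fragments = {"variable domain": 400, "scFv": 800, "Fab": 1500}

for name, length in fragments.items():
    single = covers_amplicon(length, 400)                   # one 400 bp read
    paired = covers_amplicon(length, 300, paired_end=True)  # 2 x 300 bp reads
    print(f"{name}: single 400 bp read = {single}, 2 x 300 bp pair = {paired}")
```

Under these assumptions a 2 × 300 bp read pair spans a single variable domain but not a full-length scFv, in which case the two reads would cover regions at either end of the construct rather than one merged sequence.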

Table 1

Next-generation sequencing platforms for the analysis of display libraries.

Platform	Read length	Max. depth	Error type (percentage)	Reference
Illumina MiSeq	300 bp PE	40 × 10⁶ reads	Substitutions (~0.1)	(15, 17)
Illumina HiSeq 2500	250 bp PE	600 × 10⁶ reads	Substitutions (~0.1)	(18, 19)
Ion Torrent PGM	400 bp	5.5 × 10⁶ reads	Indels (~1)	(14, 20)
454 GS FLX	Up to 1 kb	1 × 10⁶ reads	Indels (~1)	(13, 21)
PacBio	250 bp–40 kb	0.4 × 10⁶ reads	Indels (~1)	(22)

The use of NGS requires that particular attention be paid to sequencing errors (25). DNA amplification inevitably results in polymerase errors, which can be context dependent. Although the error rates of polymerases are generally low (10⁻⁵–10⁻⁶ per base), errors will inevitably be present in large NGS datasets that encompass billions of bases. In addition, the NGS technologies themselves can be susceptible to the introduction of errors, such as cluster misamplification and base misincorporation, with frequencies ranging from 10⁻² (PacBio, Ion Torrent) to 10⁻³ (Illumina). To help identify PCR and sequencing errors, unique molecular identifiers (UMIs; stretches of 8–10 degenerate DNA bases) can be added to primers during the first two cycles of PCR amplification. Reads that share the same UMI have a high probability of being derived from the same original template. Such reads can then be grouped after sequencing and used for error correction (26).
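The grouping of reads by UMI can be sketched in a few lines of Python. The example below is a minimal illustration and not the method of any published tool: it assumes that each read is laid out as an 8-base UMI followed directly by the insert, takes a per-position majority vote within each UMI group, and ignores both indels and errors within the UMI itself. The function name and parameters are hypothetical.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, umi_len=8, min_reads=2):
    """Group reads by their leading UMI and return one consensus per UMI.

    reads: iterable of strings laid out as UMI + insert (an assumed layout).
    Only inserts of the most common length within a group are used, so this
    simple version does not handle indels.
    """
    groups = defaultdict(list)
    for read in reads:
        if len(read) <= umi_len:
            continue  # too short to contain both a UMI and an insert
        groups[read[:umi_len]].append(read[umi_len:])

    consensus = {}
    for umi, inserts in groups.items():
        if len(inserts) < min_reads:
            continue  # a single read cannot be error-corrected
        # Keep inserts of the modal length so that columns line up.
        modal_len = Counter(len(s) for s in inserts).most_common(1)[0][0]
        same_len = [s for s in inserts if len(s) == modal_len]
        # Majority vote at each position.
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*same_len)
        )
    return consensus


reads = [
    "AACCGGTT" + "GCTAGCTA",
    "AACCGGTT" + "GCTAGCTA",
    "AACCGGTT" + "GCTTGCTA",  # one substitution error
    "TTGGCCAA" + "GATTACAG",  # singleton UMI, dropped
]
print(umi_consensus(reads))
```

In this toy input the substitution in the third read is outvoted by the other two reads carrying the same UMI, and the singleton UMI is discarded because no consensus can be built from one read.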

Bioinformatic Tools to Analyze NGS Data

While the analysis of the limited number of clones obtained by Sanger sequencing can be carried out manually, the larger sample size of NGS approaches necessitates the use of bioinformatics tools. Following confirmation of the quality of the NGS read data by a tool such as FastQC (27), the data are further processed to clean up reads before analysis of antibody or antibody fragment sequences. The steps undertaken will be highly dependent on the NGS platform utilized and the format of the amplicons but will generally focus on: removal of adapter sequences [e.g., PRINSEQ (28)]; de-multiplexing (if barcodes were used); UMI identification, consensus building, and error correction (26); read quality trimming and filtering [e.g., Trimmomatic (29)]; and, if paired-end sequencing was performed, merging of the read pairs with a program such as PEAR (30). Separate analysis of heavy and light chains may be required for antibody formats such as scFv, where the presence of synthetic linkers can complicate analyses. Programs such as IMPre (31), IgBLAST (32), IMGT/High V-QUEST (33, 34), and ImmunediveRsity (35), which were originally developed for the analysis of B and T cell receptor repertoires, identify VH and VL germlines as well as VH and VL CDRs. The selection of a tool will depend on the number of NGS reads being analyzed and the computational skill level of the researcher. IgBLAST and IMGT/High V-QUEST are both available as web-based submission systems, with IMGT/High V-QUEST permitting a larger number of reads to be analyzed per submission. IMGT/High V-QUEST returns an output format compatible with programs such as Microsoft Excel or OpenOffice, whereas IgBLAST output is text based. The tools use different alignment algorithms, BLAST (IgBLAST) and modified Smith–Waterman (V-QUEST), but both restrict the germline gene repertoires to those defined by the tool’s creators.
A stand-alone version of IgBLAST is available; it places no restriction on the number of input reads, permits user-defined germline gene databases, provides additional output formats, and can be parallelized on a cluster for processing of large datasets. However, its use does require basic command line skills. Postprocessing of the output of tools such as IgBLAST and IMGT/High V-QUEST is required to generate information about the clone structure within the dataset, and to pair VH and VL domain sequences. Clone structures can be inferred by applying sequence clustering tools, such as CD-HIT (36) or UCLUST (37), to CDR3s alone, at either the amino acid or nucleotide sequence level, or to the full-length sequence, to group closely related sequences into “clonal” groups. The choice of parameters will depend on the diversity of the library. Finally, scripts can be used to analyze and summarize the diversity and other compositional characteristics of the library. A custom pipeline as described above requires a level of informatics skill not always available to researchers; therefore, specialized pipelines for the analysis of recombinant antibody libraries, either naïve or in vitro selected against particular antigens, have been developed. The AbMining Toolbox is particularly suited for identifying VH CDR3, which is determined by using a hidden Markov model (HMM) that captures the conserved sequences upstream and downstream of the CDR3 (38). N²GSAb can rapidly identify germline and VH CDR3 and provides a tool for clustering unique sequences (39). VDJFasta uses an HMM to accurately predict all VH and VL CDRs, as well as the GS linker sequence for scFv fragments, and can generate library diversity plots (13). The ImmuneDB package aligns sequences based on a query sequence, such as a framework region, to delineate CDR regions (40).
ImmuneDB also performs mutational and statistical analysis on the sequence library and can construct lineage trees to aid in the interpretation of antigen-selected libraries. More recently, DEAL was developed to better predict library diversity by identifying and correcting sequencing errors (17). In the published example, the library was not generated by PCR but rather by ligation of adapters to avoid any amplification bias and focus on sequencing errors. Reads are clustered using seed sequences of 10–20 bp and analyzed by binary comparison. The clusters are then compared with each read, taking into account the Phred quality score for each base and the error rate of the Phi-X control to identify sequencing errors. A list of software is outlined in Table 2.
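The idea of grouping closely related CDR3 sequences into "clonal" groups can be illustrated with a deliberately simplified Python sketch. It is not the algorithm used by CD-HIT or UCLUST: it is a greedy, centroid-based clustering that compares only sequences of equal length position by position, and the 90% identity threshold, the function names, and the example sequences are assumptions chosen for illustration.

```python
from collections import Counter


def identity(a, b):
    """Fraction of identical positions; sequences of unequal length score 0."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_cdr3(cdr3s, min_identity=0.9):
    """Greedy clustering: each sequence joins the first centroid it matches.

    Sequences are processed from most to least abundant, so abundant
    sequences become centroids. Returns {centroid: total read count}.
    """
    counts = Counter(cdr3s)
    clusters = {}
    for seq, n in counts.most_common():
        for centroid in clusters:
            if identity(seq, centroid) >= min_identity:
                clusters[centroid] += n
                break
        else:
            clusters[seq] = n  # no centroid matched: start a new cluster
    return clusters


# Hypothetical CDR3 amino acid sequences, for illustration only.
cdr3s = ["ARDYYGSSYFDY"] * 5 + ["ARDYYGSSYFDV"] + ["AKGGWELLRY"] * 2
print(cluster_cdr3(cdr3s))
```

Here the single-residue variant is absorbed into the cluster of its abundant neighbor, while the unrelated sequence forms its own cluster. As noted in the text, a suitable threshold depends on the diversity of the library, so the value used here should not be read as a recommendation.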

Table 2

Software for next-generation sequencing analysis.

Software package	Strengths	Reference
IMGT/High V-QUEST	Fast germline identification, CDR determination, and batch submission	(34)
ImmunediveRsity	Quality filtering and noise correction	(35)
VDJFasta	Hidden Markov model to determine all CDRs and frequency analysis, very rapid analysis	(13)
N²GSAb	Rapid germline and VH CDR3 determination and sequence clustering	(39)
ImmuneDB	Alignment based on sequence query to determine CDRs and frequency	(40)
DEAL	Sequencing error correction before analysis	(17)

It is also possible to outsource the sequencing and/or analysis of antibody libraries to commercial suppliers. Examples include (but are not limited to) CD Genomics and Molecular Cloning Laboratories. Such companies offer a range of options from basic consulting on designing primers for multiplexing and sequencing, to complete analysis from purified DNA or phages.

Application to Design Validation and the Analyses of Naïve Antibody Libraries

When generating antibody display repertoires, either synthetic or derived from immunized animals, it is important to assess the clonal diversity of the library before selection. Several studies have demonstrated the use of NGS to measure diversity to validate the design of displayed libraries. In an early example, Novimmune designed scFv libraries with both synthetic diversity, using degenerate oligonucleotides, and semi-synthetic diversity, from human or rabbit donors (39). Sequencing of VH CDR3 using the Illumina platform revealed that the synthetic libraries had many more unique clones than the donor-derived libraries, with 1–16% and 31–69% clonal redundancy, respectively (39). Intriguingly, the extent of clonal redundancy in the donor-derived libraries suggested an upper limit of human VH diversity of around 2–3 × 10⁶ unique clones. This figure correlates with other NGS studies aimed at determining human B-cell diversity (3–9 × 10⁶) (41). In addition, the Novimmune sequencing results also validated the VH CDR3 length distribution in human antibodies, which closely matched that of the IMGT repertoire (39). A second example relates to the Ylanthia synthetic antibody library developed by MorphoSys (21). The library was designed to encode a range of VH CDR3 lengths to closely match the natural human antibody repertoire and was analyzed using the Roche 454 sequencing platform (21). The library was found to be composed of about 95% unique clones, and there was no indication of amplification biases during antibody library construction. In addition, the authors used NGS data to validate VH CDR3 diversity and length, as well as VH and VL germline frequencies. High throughput sequencing approaches are not limited to human sequences, with a recent study assessing the diversity of rabbit (VH and VL) Fab libraries by NGS (Ion PGM) (20).
Surprisingly, and unlike human libraries derived from donors, this study detected very low levels of redundancy within the rabbit libraries, with over 98% of VH clones being unique (~3 × 10⁹ sequence reads were analyzed). Next-generation sequencing has also been used to accurately determine library size. A recent study generated a donor-derived VH library for this purpose, which was then sequenced using Illumina adapter ligation (circumventing the need for PCR amplification) (17). Sequencing depth for the VH library exceeded the library size by three-fold, suggesting that the diversity was well represented in the NGS output. The authors estimated the minimal functional diversity to be 1.2 × 10⁶ unique clones, representing just one-fifth of the original number of bacterial clones.
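The diversity measures discussed above (fraction of unique clones, clonal redundancy, and sequencing depth relative to library size) can be computed from a list of clone identifiers with a short Python sketch. The definition of redundancy used here, the share of reads that repeat a clone already observed, is an assumption made for illustration; the cited studies may define it differently. The function name and the toy input are hypothetical.

```python
from collections import Counter


def diversity_summary(clones, library_size=None):
    """Summarize clonal diversity of a list of clone identifiers (e.g. CDR3s)."""
    total = len(clones)
    if total == 0:
        raise ValueError("no clones supplied")
    unique = len(Counter(clones))
    summary = {
        "reads": total,
        "unique_clones": unique,
        "percent_unique": 100.0 * unique / total,
        # Assumed definition: reads that repeat an already observed clone.
        "percent_redundant": 100.0 * (total - unique) / total,
    }
    if library_size:
        # Fold coverage of the library by the sequencing output.
        summary["depth_fold"] = total / library_size
    return summary


# Toy input: four reads covering three distinct clones from a library of two.
print(diversity_summary(["c1", "c1", "c2", "c3"], library_size=2))
```

Note that the observed fraction of unique clones depends on sequencing depth, so values from datasets of different sizes are not directly comparable without subsampling to a common depth.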

Application to Affinity Maturation and Epitope Mapping

Next-generation sequencing can also be used to guide selection toward high affinity clones. For example, one seminal study employed NGS to guide maturation of an scFv fragment directed against ErbB2 to a final affinity of 25 pM (resulting in a 158-fold improvement over wild type) (18). Guided by structure-based design, individual CDR regions (excluding VL CDR2) were randomized, selected against ErbB2 antigen, and analyzed by NGS before and after panning. This revealed enrichment of novel sequence motifs at diversified CDR positions, with the exception of VH CDR3, which was enriched toward the wild-type motif (suggesting an already optimal sequence). Next, the most frequent CDR substitutions were combined to generate a secondary library (VH CDR3 being reverted to wild type), which was selected against the target. This resulted in improved affinities of between 300 and 25 pM, compared with the wild-type affinity of 4 nM, highlighting the power of this stepwise approach for affinity maturation. In a further study, deep mutational scanning analysis using NGS was performed on a humanized version of the anti-EGFR monoclonal cetuximab (42). More specifically, independent VH and VL libraries (encoding over 1,000 single amino acid substitutions at 59 different positions—32 in VH and 27 in VL) were selected by mammalian cell display and flow cytometry. Gated populations were analyzed by NGS to identify permissive mutations and to generate a heat map of the antigen binding site. Overall, this strategy identified 67 substitutions that increased affinity, including one mutation with a five-fold KD improvement. Similar strategies can also be used to map epitope surfaces, as exemplified by the interaction of S. aureus toxin with neutralizing antibodies (43).
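Comparing sequence frequencies before and after panning, as in the studies above, amounts to computing an enrichment ratio for each variant. The Python sketch below shows one common way to do this, using a log2 ratio of frequencies with a pseudocount; it is an illustrative calculation and not the analysis pipeline of the cited studies, and the function name, pseudocount, and variant labels are assumptions.

```python
import math
from collections import Counter


def enrichment(before, after, pseudocount=1.0):
    """Return {sequence: log2 enrichment} of frequencies after vs before panning."""
    b, a = Counter(before), Counter(after)
    seqs = set(b) | set(a)
    # Pseudocounts avoid division by zero for sequences seen in one pool only.
    b_total = sum(b.values()) + pseudocount * len(seqs)
    a_total = sum(a.values()) + pseudocount * len(seqs)
    return {
        s: math.log2(
            ((a[s] + pseudocount) / a_total) / ((b[s] + pseudocount) / b_total)
        )
        for s in seqs
    }


# Hypothetical read counts for a wild-type sequence and two variants.
before = ["WT"] * 8 + ["M1"] * 1 + ["M2"] * 1
after = ["WT"] * 2 + ["M1"] * 7 + ["M2"] * 1

scores = enrichment(before, after)
for seq, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{seq}: log2 enrichment = {score:+.2f}")
```

Positive scores indicate variants that became more frequent after selection and negative scores indicate depletion. As the Introduction notes, enrichment can also reflect properties such as superior expression or display, so a high score alone does not establish improved affinity.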

Conclusion

Next-generation sequencing holds great promise for the development of therapeutic monoclonal antibodies, by allowing unprecedented insights into library diversity and clonal enrichment. Although current NGS platforms were not designed with antibody libraries in mind, the technologies are now at a stage where unique sequence insights into all stages of the selection process can be obtained. Moreover, with ongoing advances in sequencing technology, depth and read length are improving continuously: for instance, the PacBio Sequel system generates approximately seven times more sequences than the previous RS II system but maintains its long-read capability (Pacific Biosciences), while nanopore systems such as the MinION (Oxford Nanopore) offer the promise of real-time DNA sequencing in combination with ultra-long reads. We conclude that, with NGS technology evolving at a rapid pace, its importance in the sequence analyses of phage- and other antibody-display libraries is likely to continue to increase.

Author Contributions

RR wrote the manuscript. KJ, DL, and DC edited the manuscript.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

42 in total

1. Construction of a rationally designed antibody platform for sequencing-assisted selection.

Authors: H Benjamin Larman; George Jing Xu; Natalya N Pavlova; Stephen J Elledge
Journal: Proc Natl Acad Sci U S A Date: 2012-10-11 Impact factor: 11.205

2. A fully synthetic human Fab antibody library based on fixed VH/VL framework pairings with favorable biophysical properties.

Authors: Thomas Tiller; Ingrid Schuster; Dorothée Deppe; Katja Siegers; Ralf Strohner; Tanja Herrmann; Marion Berenguer; Dominique Poujol; Jennifer Stehle; Yvonne Stark; Martin Heßling; Daniela Daubert; Karin Felderer; Stefan Kaden; Johanna Kölln; Markus Enzelberger; Stefanie Urlinger
Journal: MAbs Date: 2013-04-09 Impact factor: 5.857

Review 3. Coming of age: ten years of next-generation sequencing technologies.

Authors: Sara Goodwin; John D McPherson; W Richard McCombie
Journal: Nat Rev Genet Date: 2016-05-17 Impact factor: 53.242

4. Mining Naïve Rabbit Antibody Repertoires by Phage Display for Monoclonal Antibodies of Therapeutic Utility.

Authors: Haiyong Peng; Thomas Nerreter; Jing Chang; Junpeng Qi; Xiuling Li; Pabalu Karunadharma; Gustavo J Martinez; Mohammad Fallahi; Jo Soden; Jim Freeth; Roger R Beerli; Ulf Grawunder; Michael Hudecek; Christoph Rader
Journal: J Mol Biol Date: 2017-08-14 Impact factor: 5.469

5. High-throughput sequencing of the zebrafish antibody repertoire.

Authors: Joshua A Weinstein; Ning Jiang; Richard A White; Daniel S Fisher; Stephen R Quake
Journal: Science Date: 2009-05-08 Impact factor: 47.728

6. Developmental pathway for potent V1V2-directed HIV-neutralizing antibodies.

Authors: Nicole A Doria-Rose; Chaim A Schramm; Jason Gorman; Penny L Moore; Jinal N Bhiman; Brandon J DeKosky; Michael J Ernandes; Ivelin S Georgiev; Helen J Kim; Marie Pancera; Ryan P Staupe; Han R Altae-Tran; Robert T Bailer; Ema T Crooks; Albert Cupo; Aliaksandr Druz; Nigel J Garrett; Kam H Hoi; Rui Kong; Mark K Louder; Nancy S Longo; Krisha McKee; Molati Nonyane; Sijy O'Dell; Ryan S Roark; Rebecca S Rudicell; Stephen D Schmidt; Daniel J Sheward; Cinque Soto; Constantinos Kurt Wibmer; Yongping Yang; Zhenhai Zhang; James C Mullikin; James M Binley; Rogier W Sanders; Ian A Wilson; John P Moore; Andrew B Ward; George Georgiou; Carolyn Williamson; Salim S Abdool Karim; Lynn Morris; Peter D Kwong; Lawrence Shapiro; John R Mascola
Journal: Nature Date: 2014-03-02 Impact factor: 49.962

7. Precise and efficient antibody epitope determination through library design, yeast display and next-generation sequencing.

Authors: Thomas Van Blarcom; Andrea Rossi; Davide Foletti; Purnima Sundar; Steven Pitts; Christine Bee; Jody Melton Witt; Zea Melton; Adela Hasa-Moreno; Lee Shaughnessy; Dilduz Telman; Lora Zhao; Wai Ling Cheung; Jan Berka; Wenwu Zhai; Pavel Strop; Javier Chaparro-Riggers; David L Shelton; Jaume Pons; Arvind Rajpal
Journal: J Mol Biol Date: 2014-10-02 Impact factor: 5.469

8. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy.

Authors: Tom Smith; Andreas Heger; Ian Sudbery
Journal: Genome Res Date: 2017-01-18 Impact factor: 9.043

9. Deep mutational scanning of an antibody against epidermal growth factor receptor using mammalian cell display and massively parallel pyrosequencing.

Authors: Charles M Forsyth; Veronica Juan; Yoshiko Akamatsu; Robert B DuBridge; Minhtam Doan; Alexander V Ivanov; Zhiyuan Ma; Dixie Polakoff; Jennifer Razo; Keith Wilson; David B Powers
Journal: MAbs Date: 2013-05-29 Impact factor: 5.857

10. PEAR: a fast and accurate Illumina Paired-End reAd mergeR.

Authors: Jiajie Zhang; Kassian Kobert; Tomáš Flouri; Alexandros Stamatakis
Journal: Bioinformatics Date: 2013-10-18 Impact factor: 6.937
