Jorge Moura-Sampaio, André F Faustino, Remi Boeuf, Miguel A Antunes, Stefan Ewert, Ana P Batista.
Abstract
Next-generation sequencing (NGS) is an indispensable tool in antibody discovery projects. However, NGS read-length limits make it difficult to reconstruct full antibody sequences from sequencing runs, especially when all six CDRs are randomized. To overcome this, we took advantage of Illumina's cluster mapping capabilities to pair non-overlapping reads and reconstruct full Fab sequences with accurate VL:VH pairings. The method relies on in silico cluster coordinate information rather than on extensive in vitro manipulation, which makes the protocol easy to deploy and less prone to PCR-derived errors. It maintains both the throughput required for antibody discovery campaigns and a high degree of fidelity, which benefits not only phage-display and synthetic library-based discovery methods but also the NGS-driven analysis of naïve and immune libraries.
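Coordinate-based pairing of non-overlapping reads can be sketched as a join on the read identifier. This minimal Python sketch assumes standard Illumina (Casava 1.8+) FASTQ headers, in which the identifier before the first space encodes the instrument, run, flow cell, lane, tile and x/y cluster coordinates and is identical for the R1 and R2 reads of one cluster. The helper names (`read_fastq`, `cluster_id`, `pair_by_cluster`) and the toy records are hypothetical; this illustrates the idea and is not the authors' pipeline.

```python
from typing import Dict, Iterator, List, Tuple


def read_fastq(lines: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) from consecutive four-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        yield lines[i].strip(), lines[i + 1].strip()


def cluster_id(header: str) -> str:
    # Hypothetical helper: drop the leading '@' and keep the text before the
    # first space, which (assumed Casava 1.8+ format) is shared by both mates.
    return header.lstrip("@").split(" ", 1)[0]


def pair_by_cluster(r1_lines: List[str], r2_lines: List[str]) -> Dict[str, Tuple[str, str]]:
    """Join R1 and R2 sequences that originate from the same flow-cell cluster."""
    r1 = {cluster_id(h): s for h, s in read_fastq(r1_lines)}
    pairs = {}
    for h, s in read_fastq(r2_lines):
        cid = cluster_id(h)
        if cid in r1:  # keep only clusters present in both files
            pairs[cid] = (r1[cid], s)
    return pairs


# Toy records (invented header and sequences, for illustration only).
r1_fastq = [
    "@M00123:45:000000000-ABCDE:1:1101:15589:1331 1:N:0:1",
    "GACATCCAGATG",
    "+",
    "IIIIIIIIIIII",
]
r2_fastq = [
    "@M00123:45:000000000-ABCDE:1:1101:15589:1331 2:N:0:1",
    "TGAGGAGACGGT",
    "+",
    "IIIIIIIIIIII",
]
# pairs maps the shared cluster ID to (R1 sequence, R2 sequence)
pairs = pair_by_cluster(r1_fastq, r2_fastq)
```

Because the join uses header information only, no sequence overlap between R1 and R2 is needed, and clusters present in only one file are dropped rather than guessed.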
Keywords: CDR; Diversity; Fab; Next-generation Sequencing; Phage-display; Randomization; Synthetic Libraries
Year: 2022 PMID: 35832623 PMCID: PMC9168528 DOI: 10.1016/j.csbj.2022.05.054
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155
Fig. 1. Schematic representation of paired-end reading and sequence coordinate matching to retrieve correlated VL and VH sequences. Read Generation: Fab amplicons are long, comprising both VL and VH with CL in between; for scFv sequences, the total amplicon length decreases to about 900 bp. The R1 and R2 reads add up to a maximum of 600 bp (using Illumina's MiSeq system), with R1 shorter than R2 in this case because the HCDR2 and HCDR3 loops are longer than the LCDRs. Read Matching: During a sequencing run, clusters are identified on the MiSeq flow cell images, and their coordinates are stored in the header (first) line of each record in the raw FASTQ files, as follows:
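The cluster coordinates in a FASTQ header line can be recovered by splitting it into named fields. This sketch assumes the Casava 1.8+ layout `@<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>`; older Illumina header formats differ and would need a different parser. The names `ClusterHeader`, `parse_illumina_header` and `same_cluster` are hypothetical, and the example header in the comments is invented.

```python
from typing import NamedTuple


class ClusterHeader(NamedTuple):
    instrument: str
    run: int
    flowcell: str
    lane: int
    tile: int
    x: int      # x coordinate of the cluster within the tile
    y: int      # y coordinate of the cluster within the tile
    read: int   # 1 for R1, 2 for R2, 0 if the comment field is absent


def parse_illumina_header(header: str) -> ClusterHeader:
    """Parse an assumed Casava 1.8+ header such as
    '@M00123:45:000000000-ABCDE:1:1101:15589:1331 1:N:0:1'."""
    read_id, _, comment = header.strip().lstrip("@").partition(" ")
    # Take the first seven fields; some pipelines append extra fields (e.g. UMIs).
    instrument, run, flowcell, lane, tile, x, y = read_id.split(":")[:7]
    read = int(comment.split(":")[0]) if comment else 0
    return ClusterHeader(instrument, int(run), flowcell, int(lane), int(tile), int(x), int(y), read)


def same_cluster(h1: ClusterHeader, h2: ClusterHeader) -> bool:
    """Mates come from one cluster when flow cell, lane, tile and x/y all agree."""
    key1 = (h1.flowcell, h1.lane, h1.tile, h1.x, h1.y)
    key2 = (h2.flowcell, h2.lane, h2.tile, h2.x, h2.y)
    return key1 == key2
```

Comparing parsed numeric fields, rather than raw strings, makes it explicit that only the positional part of the header (lane, tile and x/y) identifies the cluster, while the read number distinguishes R1 from R2.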
Fig. 2. Intersecting sequences between concatenated and frequency-inferred datasets. For each of the 23 affinity maturation projects, VL and VH reads were matched both by the sequence coordinate matching method and by the frequency-based method. In the frequency-based method, independently analyzed R1 and R2 reads were combined into VL:VH pairs based on the frequency of each clone (inferred dataset, grey circles). The concatenated and frequency-inferred datasets were ranked by the number of occurrences of each sequence and compared to identify identical sequences in the Top 10, 25, 50 and 100 of each affinity maturation project (see also Tables S2 and S3).
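The Top-N comparison amounts to ranking each dataset by occurrence count and intersecting the N highest-ranked pairs. This sketch assumes each dataset is a mapping from a VL:VH pair label to its number of occurrences; ties at the same count keep their first-seen order, as `Counter.most_common` does, which is an arbitrary choice here and not necessarily the one used in the study. The function names and toy counts are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Mapping, Sequence


def ranked(counts: Mapping[str, int]) -> List[str]:
    """Order pair labels by descending occurrence count (ties keep input order)."""
    return [pair for pair, _ in Counter(counts).most_common()]


def top_n_overlap(concatenated: Mapping[str, int],
                  inferred: Mapping[str, int],
                  cutoffs: Sequence[int] = (10, 25, 50, 100)) -> Dict[int, int]:
    """Number of identical VL:VH pairs shared by the Top-N of both datasets."""
    a, b = ranked(concatenated), ranked(inferred)
    return {n: len(set(a[:n]) & set(b[:n])) for n in cutoffs}


# Toy counts for one hypothetical project.
concatenated = {"VL1:VH1": 5, "VL2:VH2": 3, "VL3:VH3": 1}
inferred = {"VL1:VH1": 4, "VL3:VH4": 2, "VL2:VH2": 1}
overlap = top_n_overlap(concatenated, inferred, cutoffs=(1, 2, 3))
```

Here `overlap` is `{1: 1, 2: 1, 3: 2}`: the pair `VL3:VH4`, present only in the inferred dataset, reduces agreement at N = 2 and N = 3. For the 23 projects, the same function would simply be called once per project with the default cutoffs.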