Literature DB >> 34805683

uPIC-M: Efficient and Scalable Preparation of Clonal Single Mutant Libraries for High-Throughput Protein Biochemistry.

Mason J Appel1, Scott A Longwell2, Maurizio Morri3, Norma Neff3, Daniel Herschlag1, Polly M Fordyce2,3,4,5.   

Abstract

New high-throughput biochemistry techniques complement selection-based approaches and provide quantitative kinetic and thermodynamic data for thousands of protein variants in parallel. With these advances, library generation rather than data collection has become rate-limiting. Unlike pooled selection approaches, high-throughput biochemistry requires mutant libraries in which individual sequences are rationally designed, efficiently recovered, sequence-validated, and separated from one another, but current strategies are unable to produce these libraries at the needed scale and specificity at reasonable cost. Here, we present a scalable, rapid, and inexpensive approach for creating User-designed Physically Isolated Clonal-Mutant (uPIC-M) libraries that utilizes recent advances in oligo synthesis, high-throughput sample preparation, and next-generation sequencing. To demonstrate uPIC-M, we created a scanning mutant library of SpAP, a 541 amino acid alkaline phosphatase, and recovered 94% of desired mutants in a single iteration. uPIC-M uses commonly available equipment and freely downloadable custom software and can produce a 5000 mutant library at 1/3 the cost and 1/5 the time of traditional techniques.
© 2021 The Authors. Published by American Chemical Society.

Entities:  

Year:  2021        PMID: 34805683      PMCID: PMC8600632          DOI: 10.1021/acsomega.1c04180

Source DB:  PubMed          Journal:  ACS Omega        ISSN: 2470-1343


Introduction

Recent technological advances enable the biochemical interrogation of many protein variants in parallel with the precision and versatility needed to dissect mechanisms of function. These techniques, termed broadly here as high-throughput biochemistry (HTB), report quantitative kinetic and thermodynamic measurements for thousands of individual protein sequences. This advance is made possible by developments in programmable automated liquid handling that increase the scale of plate-based assays,[1,2] and recently, by a miniaturized microfluidic platform that allows parallel measurement of thousands of variants on one microscope slide.[3−5] For basic enzymology and biophysics studies, HTB approaches using mutational scanning libraries allow identification of the effects of all residues on folding, stability, binding, and catalysis. To advance precision medicine, libraries comprising human allelic variants[6] can be assayed for folding and function and “variants of uncertain significance” can be classified by their biophysical propensity to drive disease or respond to therapeutics.[4,7] Finally, within evolutionary biology, measurements of many extant orthologs and ancestral reconstructions can elucidate the molecular underpinnings of evolutionary adaptation (Figure A).[8−11] Each of these applications requires moving beyond simply identifying mutants with desired properties from large-scale screens to directly linking each of many sequence perturbations with its functional effects.
Figure 1

Overview of the uPIC–M pipeline to generate user-defined clonal mutant libraries. (A) Examples of clonal libraries from uPIC–M and potential high-throughput biochemistry applications. Applications are listed along with examples of the types of variants involved. (B) Comparison of cost (including materials and labor) of conventional mutagenesis vs uPIC–M for libraries of 50–20,000 mutants. A uPIC–M clone sampling rate of 384 per 50 desired mutants (7.68-fold excess) was used for these calculations. uPIC–M (modified) represents a lower cost version of uPIC–M with the addition of pipet tip washing for plate liquid transfer steps. See Table S1 for full time and cost calculations. (C) Workflow for generating uPIC–M libraries in three phases: (1) Mutagenic oligos are synthesized for ∼50 residue windows on a pooled array and selective PCR amplification of each window generates a primer pool used for QuikChange; (2) pooled QuikChange reactions are transformed and plated, with each plate containing a mixture of ∼50 possible single mutants, facilitating colony picking into multiwell plates to isolate clonal libraries of unidentified variants; (3) clonal libraries are prepared and sequenced by NGS to reveal the genotype and location of each variant.

Overview of the uPIC–M pipeline to generate user-defined clonal mutant libraries. (A) Examples of clonal libraries from uPIC–M and potential high-throughput biochemistry applications. Applications are listed along with examples of the types of variants involved. (B) Comparison of cost (including materials and labor) of conventional mutagenesis vs uPIC–M for libraries of 50–20,000 mutants. A uPIC–M clone sampling rate of 384 per 50 desired mutants (7.68-fold excess) was used for these calculations. uPIC–M (modified) represents a lower cost version of uPIC–M with the addition of pipet tip washing for plate liquid transfer steps. See Table S1 for full time and cost calculations. (C) Workflow for generating uPIC–M libraries in three phases: (1) Mutagenic oligos are synthesized for ∼50 residue windows on a pooled array and selective PCR amplification of each window generates a primer pool used for QuikChange; (2) pooled QuikChange reactions are transformed and plated, with each plate containing a mixture of ∼50 possible single mutants, facilitating colony picking into multiwell plates to isolate clonal libraries of unidentified variants; (3) clonal libraries are prepared and sequenced by NGS to reveal the genotype and location of each variant. As high-throughput biochemistry tools increase the throughput of quantitative protein measurements by 102–103-fold, generating the requisite variant libraries has emerged as the new bottleneck.[1−5] For HTB to provide measurements for rationally chosen protein variants, input libraries must be user-defined clonal mutant libraries in which individual mutants are sequence-validated and physically isolated from one another for downstream assays. Conventional site-directed mutagenesis generates user-defined, isolated variants by performing each mutagenesis reaction, plasmid isolation step, and downstream sequencing within physically separated reactions. This approach results in high control (the ability to create only mutants of interest), but is prohibitively costly and labor-intensive for applications requiring >100 variants (Figure B, Table S1). Conversely, existing techniques for generating mutant libraries, while powerful, are typically not suited for generating large-scale user-defined clonal mutant libraries.[12] For example, error-prone PCR[13−15] and mutagenic oligos containing degenerate codons[16,17] allow generation of extremely large mutant libraries (107–109) at relatively low cost; such libraries are ideal for selecting constructs with desired characteristics, but these mutagenesis strategies do not allow generation of a desired set of defined sequences. Here, we introduce uPIC–M (User-designed Physically Isolated Clonal–Mutant) libraries, a method to prepare the needed mutant libraries that dramatically reduces the time and cost of conventional mutagenesis to empower high-throughput biochemistry (Figure C). uPIC–M can create 102–104 mutants at a material and labor cost of ∼$11 USD/mutant in 40 days for 5000 mutants, compared to an estimated $26 USD/mutant in 200 days for conventional mutagenesis (Figure B, Table S1). The uPIC–M pipeline includes three stages of library production: (i) user-directed pooled mutagenesis, using commercially available oligo arrays; (ii) isolation of mutant clones with widely available robotic pickers; and (iii) next-generation sequencing (NGS) to identify clone sequences and their locations, leveraging recent automation developments from single-cell sequencing.[18] uPIC–M uses the robust and accessible Illumina sequencing platform and a combination of existing open-source and custom analyses available on public software repositories to rapidly identify and evaluate library variants. To develop and test uPIC–M, we set out to produce a scanning mutant library encoding single substitutions for every position in a 541 amino acid enzyme. Guided by stochastic sampling simulations, we picked a total of 4992 colonies to yield 3530 fully sequenced clones containing 507 desired single alanine and valine mutants, representing a library coverage of 94%. The efficiency and speed of this platform will accelerate the adoption and expand the scope of HTB.

Results and Discussion

Overview of uPIC–M

The uPIC–M library generation pipeline consists of three stages (Figure C, 1–3) over approximately 8 days (Figure S1). During stage 1 (“generate mutant plasmids”), E. coli are transformed with pooled libraries of mutant plasmids generated via QuikChange-HT mutagenesis using user-defined, array-synthesized mutagenic oligonucleotides to create the specified variants. During stage 2 (“isolate mutant clones”), transformed E. coli are plated to isolate individual mutant colonies, which are then picked and used to inoculate liquid cultures within multiwell plates. During stage 3 (“sequence and identify clones”), mutant DNA is amplified and “barcoded” with well-specific primer sequences (“barcodes”) prior to pooling for NGS. For amplicons longer than 600 nucleotides (the maximum read length of typical paired end Illumina sequencing reads), amplified sequences can be fragmented using Tn5 transposase prior to barcoding to ensure the ability to acquire and associate reads spanning the complete amplicon. This barcoding strategy allows parallel sequencing while (i) preserving the plate-well origin of each read and (ii) providing a means to group reads for reconstructing the full-length sequence of each clone. After sequencing, NGS reads are first demultiplexed according to the library barcode (here: 4992 barcodes); reads are then grouped by the barcodes specifying each well and aligned to the WT “reference” amplicon sequence, and variants are “called” from these aligned sequences. uPIC–M thus reports the full-length ORF sequence, physical well location, and quality information of clonal library variants, allowing users to select thousands or more single mutant clones of interest to create curated libraries for downstream high-throughput biochemistry applications.

Design of Tiling ORF Windows Allows Selective Mutagenesis from Oligo Arrays

We used QuikChange-HT mutagenesis, an oligo array-based strategy that provides rationally chosen mutants and offers the following advantages: (1) a simple experimental procedure, thus increasing throughput; (2) the ability to selectively amplify distinct mutagenic oligo subsets from the same array, permitting the use of the same source array for different experiments and targets; and (3) the ability to implement a design strategy that disfavors the production of double and higher-order mutants during pooled mutagenesis reactions, reducing otherwise-costly downstream sampling of clones to identify the desired single mutants.[19] Other previously reported methods can generate large libraries of rationally chosen mutants from oligo arrays but lack these time- and cost-saving features.[20,21] QuikChange-HT generates mutants by a straightforward approach, the same as conventional PCR mutagenesis, but uses a unique mutagenic oligo design strategy that meets our needs. Coding regions are first divided into ∼200–300 nucleotide “windows” (with the exact length dependent on maximum oligonucleotide synthesis length and cost/nt). The 5′ and 3′ termini of each window (∼25 nt each) act as universal primer sites for amplification of that window from the pooled arrays and the ∼150 nt intervening sequence carries user-defined codon substitutions across ∼50 residues that will be introduced by QuikChange (Figure A). Overlapping adjacent windows by ∼20–30 bp makes it possible to uniquely amplify all mutagenic oligonucleotides within single windows with a corresponding primer pair (Figure S2). This strategy allows mutagenic oligos for many uPIC–M targets to be encoded by the same parent array, greatly reducing the cost (per oligo) and makes it possible to continue to generate mutagenic oligos from the array via PCR. Downstream mutagenesis reactions use the amplified mutagenic oligos as primers to produce pooled mutant sublibraries and proceed by iteratively denaturing double-stranded plasmid DNA, annealing the oligonucleotide that encodes the desired mutation and extending via a high-fidelity polymerase. After rounds of annealing and extension, parental methylated and hemi-methylated strands are digested via DpnI prior to transformation. The window approach reduces the likelihood of double and higher-order mutants by dividing sublibraries into separate reactions, which contain only mixtures of mutagenic primers that share the same termini sequences. As such, pooled mutagenic primers bind competitively to the same sequence of template DNA, reducing the likelihood of double and higher-order mutants arising at this step.
Figure 2

Tiling window strategy for uPIC–M mutagenic oligo array design. (A) Tiling window strategy (see Figure C) divides the ORF from the protein of interest into mutagenic sublibrary regions, with sublibrary oligo length constrained by DNA synthesis limits. Each window contains unique forward and reverse priming sites (dark shading, here ∼25 nt each) at the 5′- and 3′-termini surrounding a mutational region (light shading, here ∼150 nt). For a scanning library, each codon along the length of a sublibrary mutational region is substituted via an individual mutagenic oligo. (B) Selective amplification of oligos from a single window (sublibrary 11). Forward and reverse primers specific to a single sublibrary are used to amplify oligos from the resuspended array material, yielding an oligo pool containing ∼50 codon substitutions from the same mutagenic window.

Tiling window strategy for uPIC–M mutagenic oligo array design. (A) Tiling window strategy (see Figure C) divides the ORF from the protein of interest into mutagenic sublibrary regions, with sublibrary oligo length constrained by DNA synthesis limits. Each window contains unique forward and reverse priming sites (dark shading, here ∼25 nt each) at the 5′- and 3′-termini surrounding a mutational region (light shading, here ∼150 nt). For a scanning library, each codon along the length of a sublibrary mutational region is substituted via an individual mutagenic oligo. (B) Selective amplification of oligos from a single window (sublibrary 11). Forward and reverse primers specific to a single sublibrary are used to amplify oligos from the resuspended array material, yielding an oligo pool containing ∼50 codon substitutions from the same mutagenic window. To develop and demonstrate the capabilities of uPIC–M, we designed a library of mutagenic primers to mutate each residue of the 541 amino acid alkaline phosphatase SpAP to Ala or Val. This enzyme, from the organism Sphingomonas. sp. Strain BSAR-1, was selected for its compatibility with a high-throughput assay platform previously reported by our groups.[5,22] For this assay format, SpAP is fused to a C-terminal eGFP reporter, which was not targeted for mutagenesis. The design process generated 13 mutational windows to efficiently encode the selected valine or alanine substitution at each position (Table S2).

QuikChange-HT Mutagenesis

Subsets of mutants are created in sublibrary pools, with one mutagenesis reaction carried out per sublibrary. To generate the material for each of 13 mutagenesis reactions for SpAP, we first amplified mutagenic primers for a given “window” from the total oligonucleotide pool via PCR and window-specific primers (Figure B, Table S2). Following spin-column purification, these amplified primers were used directly as QuikChange-HT mutagenic primers. Agilent-designed (see Materials and Methods) primers resulted in clean amplification of sublibrary mutagenic primer pools (Figure S2) from an array containing scans for SpAP as well as four additional genes (see data repository for full array sequence) with purified yields of ∼14–50 nM each (Table S3). We then performed mutagenesis reactions for each sublibrary following standard QuikChange protocols (linear PCR amplification of WT template followed by DpnI digestion).

Simulated Mutant Sampling to Predict Screening Requirements

For randomly sampled clones from a pool of variants, one needs more than the number of desired mutants to obtain complete or near-complete sampling, as the probability of obtaining a novel variant (one that has not already been sampled) decreases with increased sampling (similar to “the birthday problem” or the related “coupon collector’s problem” in probability theory). To estimate the number of clones that must be sampled to recover a given fraction of mutants from a specified variant population, we simulated stochastic sampling experiments in which we sampled clones N times from a pool of M variants without replacement. For libraries of 50, 500, and 5000 variants, 110, 1150, and 11,600 draws were required to recover ≥90% of desired clones, respectively (Table S4). To consider how the presence of WT clones or unwanted variants (e.g., undesired single mutants and/or higher-order mutants) affect recovery rates, an additional term was added specifying the probability that any given draw returns a single mutant (Figure A). As expected, lower rates of single mutant recovery led to a requirement for more clone sampling to obtain equivalent library coverage (Table S4).
Figure 3

Simulated sampling of pooled single mutant libraries. (A, B) Simulation of the number of unique mutants obtained as a function of the number of clones sampled for pooled libraries containing 50 (A) or 541 (B) unique single mutants with single mutant frequencies from 0.1 to 1.0. The remaining fraction of each pool represents all other variants (e.g., WT, indels, double, and higher-order mutants). Each curve represents the average of 103 simulations; shaded bands represent the 95% confidence interval; horizontal dashed lines (A, B) indicate the total possible number of unique mutants; vertical line (B) indicates the number of colonies picked for the SpAP library constructed herein (for legend, see A). (C–E) Simulated picking results for a sublibrary containing 50 single mutants at equal relative abundances sampled 384 times with a single mutant frequency of 0.5. (C) Simulated positional frequencies of single mutants; the results of five sampling simulations were chosen at random. (D) Histogram of expected mutant yields and (E) histogram of expected yields per sublibrary position (from 103 sampling events).

Simulated sampling of pooled single mutant libraries. (A, B) Simulation of the number of unique mutants obtained as a function of the number of clones sampled for pooled libraries containing 50 (A) or 541 (B) unique single mutants with single mutant frequencies from 0.1 to 1.0. The remaining fraction of each pool represents all other variants (e.g., WT, indels, double, and higher-order mutants). Each curve represents the average of 103 simulations; shaded bands represent the 95% confidence interval; horizontal dashed lines (A, B) indicate the total possible number of unique mutants; vertical line (B) indicates the number of colonies picked for the SpAP library constructed herein (for legend, see A). (C–E) Simulated picking results for a sublibrary containing 50 single mutants at equal relative abundances sampled 384 times with a single mutant frequency of 0.5. (C) Simulated positional frequencies of single mutants; the results of five sampling simulations were chosen at random. (D) Histogram of expected mutant yields and (E) histogram of expected yields per sublibrary position (from 103 sampling events). We used these stochastic simulations to estimate the number of clones required to recover ≥90% of desired mutants within the 541 amino acid scanning mutagenesis library (V and A substitutions) for the SpAP construct (Figure B). To estimate the rates at which QuikChange-HT mutagenesis returns desired single mutants, we performed a preliminary pooled mutagenesis reaction, plated transformed E. coli, and Sanger sequenced 96 isolated clones. This preliminary sampling experiment returned 11 WT, 60 single mutant, and 5 double, triple, and greater mutant constructs, and 20 additional clones with indels and/or sequencing errors, suggesting an approximate single mutant rate of 63% (Table S5). We elected to oversample each sublibrary, with up to 384 possible clones for each set of up to 50 desired mutants. Simulating this sampling ratio with a single mutant rate of 50% for 50 possible mutants predicts a 92–100% yield (46–50 mutants) (Figure C,D). The distribution of the expected number of mutants per position obtained from random sampling revealed expected distributions of 0–11 mutants recovered at each position with a median of 4 (95% confidence interval of 0–8) (Figure E). The SpAP sublibraries encoded variable numbers of single mutants (range of 25–48 possible mutants each, Table S2). Sampling at an approximately 384:50 clone to mutant ratio is a compromise as the increase in time is negligible (e.g., compared to sampling half as many clones) and still results in substantial savings in costs compared to conventional mutagenesis (Figure B).

Clonal Mutant Isolation from Plasmid Libraries by a Pick-and-Grow Step

Clones must be physically isolated, both as a requirement of downstream high-throughput biochemistry assays and to permit sequence identification and validation (Figure C, (2)). To facilitate separation via robotic colony picking, we transformed chemically competent E. coli with pooled mutagenesis reactions for each sublibrary mutational window and then plated transformations on LB agar plates (150 mm) supplemented with antibiotic for outgrowth overnight at 37 °C. These reactions produced a range of colonies (25–440 colonies/plate, Table S6) despite the use of identical concentrations of WT template and sublibrary primer concentrations in each (15 nM stock concentrations). As robotic colony selection by imaging requires colonies within a narrow range of size, shape, and density (∼300–500 evenly spaced colonies per 150 mm plate), we re-plated sublibrary transformations that were outside of this range at higher or lower density (6 of 13); for reactions that still yielded insufficient densities (3 of 13), we successfully repeated QuikChange reactions at the highest stock sublibrary primer concentrations (Table S6). Guided by our stochastic sampling simulations, we selected ∼384 colonies for each sublibrary, with a throughput of 8–10 384 well plates/day, for a total of ∼1.5 days for the 13 SpAP sublibrary plates. The robotic colony picker occasionally picked at the interface of multiple colonies, likely leading to mixtures of multiple variants within some wells. For significantly larger mutant libraries, alternative robotic systems that allow automated agar source and multiwell destination plate handling or single microbe[23] or single droplet-based[24] methods for cell sorting could significantly enhance throughput.

Preparation of Mutant DNA Amplicons

DNA derived from mutant plasmids in E. coli clones must be amplified and enriched prior to NGS library preparation (Figure C, (3)). We generated amplicons by PCR (instead of isolating plasmids), as PCR requires minimal sample handling and produces linearized products that are directly compatible with downstream steps (sequencing library preparation and cell-free expression for HTB assays; Figure A,C). We amplified a 2525 bp region from each clone using universal primers complementary to the 5′- and 3′-UTR regions surrounding the SpAP-eGFP coding sequence (see Materials and Methods). To reduce contamination from E. coli genomic DNA in the final library, we systematically diluted liquid culture templates and measured contamination by qPCR (Figure S3). A 1:1000 dilution of six sample mutant cultures into H2O (corresponding to a final dilution of 1:5000) reduced E. coli DNA contamination to the limit of detection. To generate uniform amounts of PCR product from variable amounts of DNA templates within culture plates, we performed 25 cycles of PCR using a high-fidelity polymerase (Figure S4). We selected these conditions for amplification of mutant DNA from SpAP sublibrary plates. After dilution and amplification, DNA concentrations measured for half of the wells within each 384 well plate varied by ∼5-fold across all samples, and with median concentrations of ∼20–60 ng/μL (Table S7, Figure S5).
Figure 4

Schematic of uPIC–M sequencing library preparation. Preparation of sequencing libraries takes place in multiwell plate format (96 or 384) via the following steps: (i) ORF regions of target plasmids are amplified from each clone using universal primers to obtain enriched amplicon DNA (A); (iia) For amplicons ≤600 bp, universal Illumina adapters may be ligated directed to amplicons or added by amplification in a second PCR step; (iib) for amplicons >600 bp, DNA is fragmented and tagged using adapter-loaded Tn5 transposase, i.e., tagmented; (iii) amplicons or fragments are further amplified with Nextera primers that incorporate dual-unique i7 and i5 index barcodes; (iv–vii) amplified and barcoded clonal libraries are pooled for NGS, purified, sequenced, and barcodes are used to report the plate-well location and genotype of each variant (B). Mutant amplicons generated at (i) can be used directly for high-throughput biochemistry applications (shown here: cell-free expression and fluorogenic assay of an enzyme library using a microfluidic platform to obtain kinetic parameters) (C).

Schematic of uPIC–M sequencing library preparation. Preparation of sequencing libraries takes place in multiwell plate format (96 or 384) via the following steps: (i) ORF regions of target plasmids are amplified from each clone using universal primers to obtain enriched amplicon DNA (A); (iia) For amplicons ≤600 bp, universal Illumina adapters may be ligated directed to amplicons or added by amplification in a second PCR step; (iib) for amplicons >600 bp, DNA is fragmented and tagged using adapter-loaded Tn5 transposase, i.e., tagmented; (iii) amplicons or fragments are further amplified with Nextera primers that incorporate dual-unique i7 and i5 index barcodes; (iv–vii) amplified and barcoded clonal libraries are pooled for NGS, purified, sequenced, and barcodes are used to report the plate-well location and genotype of each variant (B). Mutant amplicons generated at (i) can be used directly for high-throughput biochemistry applications (shown here: cell-free expression and fluorogenic assay of an enzyme library using a microfluidic platform to obtain kinetic parameters) (C).

Tagmentation and Barcoding of Mutant Amplicons

Mutational regions spanning <600 nucleotides can simply be barcoded, and the entire region can be sequenced using 2 × 300 paired end Illumina reads (Figure B, iia).[25] Longer ORFs, as are common and is the case for our example, require an alternative step to enzymatically fragment DNA and associate well-specific barcodes with each fragment, as used here (Figure B, iib). Critically, both strategies install universal adapter sequences to the DNA within each sample well, providing priming sites for barcodes that are specific to each well in a subsequent amplification step. Following Tn5 tagmentation of the 2525 bp SpAP-eGFP amplicons, we used the universal adapter sequences attached to fragment ends as priming sites to amplify DNA and add sequences required for Illumina sequencing, including (1) sequences required to bind amplicons to sequencing flow cells (p5/p7), (2) plate/well-specific index 1 and index 2 barcodes (i7/i5), and (3) complementary sites for sequencing primers (R1 and R2) (Figure B, iii).[26,27] All barcoded samples can then be pooled and sequenced in a single run via NGS (Figure B, iv–vi). We used portions of an available 7680 member (20 × 384) dual unique indexed i5/i7 Nextera barcode library. However, barcoding oligo costs can be significantly reduced using a combinatorial indexing strategy.[28] Tn5-based library preparation workflows (e.g., for single-cell libraries) often involve a bead-based cleanup and enrichment step of DNA templates prior to quantification, normalization, and tagmentation. This cleanup step is required to remove residual reagents and buffer components from dilute cDNA libraries[18] but adds significant time and cost. We reasoned that the concentrated mutant amplicons (Figure A) used in our workflow could be diluted to reduce residual PCR components to avoid this step for uPIC–M while still affording adequate amounts of DNA templates for tagmentation (typically performed at a template concentration of ∼0.1–1 ng/μL). Initial tests confirmed that for templates at identical concentrations, the yield after tagmentation and subsequent library amplification was comparable for purified and diluted samples (Figure S6A) and that the 0.1–0.5 ng/μL template prior to tagmentation resulted in quantifiable libraries with similar size distributions (Figure S6B). We diluted all SpAP sublibrary plates 1:100 in H2O prior to tagmentation, instead of performing the time-consuming normalization of individual wells, yielding final concentrations of ∼0.1–0.5 ng/μL prior to tagmentation (for stock concentrations, see Table S7). In the case of greater variation (>5-fold) in sample well DNA concentrations, multiple dilutions of the same plate could easily be performed and processed for sequencing with only modest increases in time and cost.

Automated Plate Processing

The above steps are labor-intensive if performed manually, but automated liquid handling techniques that are now used widely for single-cell sequencing applications can readily process >20 × 384 sample plates (or 7680 clones) for sequencing per day (see Materials and Methods). This approach allowed tagmentation and barcoding amplification of the 13 × 384 plate SpAP library in less than 1 day and used Mosquito and Mantis liquid handling systems. However, these liquid handling steps can be performed using a wide variety of other liquid-handling instruments.[29] Moreover, most of the liquid handling steps in the pipeline are associated with the tagmentation reactions required for sequencing ORFs ≥600 bp (see Materials and Methods). For libraries that do not require tagmentation (ORFs <600 bp), many of these liquid handling steps are not necessary, making manual pipetting a feasible option. Even for libraries requiring tagmentation, producing smaller libraries containing <1000 clones via manual pipetting increases library production time ∼2 to 3-fold but still provides a cost advantage compared to conventional mutagenesis.

Sequencing Library QC

The final step prior to pooled NGS involves evaluating library quality by quantifying the final concentration and distribution of fragment sizes. We pooled barcoded and amplified single clone libraries from each plate (i.e., one pooled sample per plate), purified and enriched them via a magnetic bead cleanup with size selection (see Materials and Methods), and then estimated fragment sizes and concentrations by microelectrophoresis (Figure ). The library quality and concentration varied by sample plate (Figure S7, Table S7); 7/13 sublibraries contained clear fragment peaks at 400–500 bp and all samples contained measurable fragments between 400 and 1000 bp without detectable contamination from low-molecular-weight sequencing adapters. This library quality allowed recovery of 65% of barcodes at high read depth across sublibrary plates (see below).
Figure 5

Sequencing library quality control results. (A) Plot of fluorescence (arbitrary units) vs fragment length for sublibary 1 following tagmentation and barcoding amplification. See Figure S7 for analogous data for the other sublibraries. (B) Electropherograms of sublibraries 1–13 (see Table S6 for integrated peak concentrations).

Sequencing library quality control results. (A) Plot of fluorescence (arbitrary units) vs fragment length for sublibary 1 following tagmentation and barcoding amplification. See Figure S7 for analogous data for the other sublibraries. (B) Electropherograms of sublibraries 1–13 (see Table S6 for integrated peak concentrations).

Sequencing Library Analysis

Analyzing results from NGS sequencing requires (1) grouping reads by barcode, (2) eliminating barcodes with low coverage, (3) removing poor quality bases and residual adapter sequences from reads, (4) aligning reads to the “reference” ORF (here, the SpAP-eGFP amplicon sequence), and (5) identifying and evaluating sequence variants (Figure A,B).
Figure 6

NGS data processing and read mapping pipeline and results for the SpAP scanning library. (A) Data processing steps and observed statistics. Raw FASTQ files (demultiplexed and unpaired) are filtered for barcodes containing 1 or more reads followed by adapter sequence trimming and pairing with read mates (if both reads are present and meet length/quality thresholds). Sequence-redundant readthrough read pairs are flagged at this stage and redundant read mates are discarded. (B) Trimmed and paired reads are mapped to the SpAP-eGFP amplicon, E. coli, and full plasmid genomes. (C) Histogram of total reads per barcode across all sublibraries following read trimming and pairing (n = 4645). (D) Barcode counts for each sublibrary plate at several read depth thresholds for the SpAP-eGFP ORF (>0 represents barcodes containing any mapped reads and remaining thresholds represent the minimum number of mapped reads at all positions; only barcodes containing at least one mapped read are included). The horizontal dashed line at 384 barcodes represents the maximum possible number of barcodes.

NGS data processing and read mapping pipeline and results for the SpAP scanning library. (A) Data processing steps and observed statistics. Raw FASTQ files (demultiplexed and unpaired) are filtered for barcodes containing 1 or more reads followed by adapter sequence trimming and pairing with read mates (if both reads are present and meet length/quality thresholds). Sequence-redundant readthrough read pairs are flagged at this stage and redundant read mates are discarded. (B) Trimmed and paired reads are mapped to the SpAP-eGFP amplicon, E. coli, and full plasmid genomes. (C) Histogram of total reads per barcode across all sublibraries following read trimming and pairing (n = 4645). (D) Barcode counts for each sublibrary plate at several read depth thresholds for the SpAP-eGFP ORF (>0 represents barcodes containing any mapped reads and remaining thresholds represent the minimum number of mapped reads at all positions; only barcodes containing at least one mapped read are included). The horizontal dashed line at 384 barcodes represents the maximum possible number of barcodes. From a 25 million capacity MiSeq v3 (2 × 300 bp) run, we obtained 4.5 × 107 total reads (read 1 and read 2) that were demultiplexed by the instrument using supplied i7/i5 barcodes (4992 barcodes total). We then discarded FASTQ files for barcodes with 0 reads (4646 retained barcodes, Figure A), trimmed off the universal Illumina adapter sequences, filtered reads based on quality scores and length using standard criteria,[30] and paired reads with their mate if present. If paired reads were fully redundant, i.e., with readthrough to an adapter sequence on the opposing terminus, one mate was discarded (typically R2).[30] We recovered 2.9 × 107 reads after the trimming and associated filtering step (Figure A), with a median of 6 × 103 reads per barcode (Figure C, Table ). Even for sublibraries with relatively poor tagmentation yields (4, 7, 9, 11–13; Table S7), median reads per barcode were comparable (within <3-fold) to efficiently tagmented sublibraries (Table ). This consistency was likely aided by sample normalization prior to sequencing, which accounted for differences in library concentrations.
Table 1

SpAP Mutational Sublibrary Sequencing Statistics

sublibrarytotal reads (×106)barcodesamedian readsSpAP readsbE. coli readsbplasmid readsbunmapped readsb
all28.6464559740.9600.020.02
12.736977940.9700.020.02
22.537275610.9600.020.02
32.734477710.9400.030.02
41.438334210.9400.020.03
52.335663390.9600.020.02
61.538235600.9600.020.02
72.337767820.9700.010.02
81.73804690.0800.020.89
8c1.618494180.9500.020.03
91.735944710.9300.030.04
103.838497870.9600.020.02
113.437790110.9700.020.01
121.425361710.9600.010.02
131.230937550.9600.020.02

Number of barcodes (out of a possible 4992 total or 384 per sublibrary) with >0 reads (mapped or unmapped).

Reported as the median value across barcodes with >0 reads.

This entry contains only sublibrary 8 barcodes with ≥500 reads.

Number of barcodes (out of a possible 4992 total or 384 per sublibrary) with >0 reads (mapped or unmapped). Reported as the median value across barcodes with >0 reads. This entry contains only sublibrary 8 barcodes with ≥500 reads. Next, we mapped the reads to multiple reference genomes using the Burrows-Wheeler Aligner.[31] For the SpAP mutagenesis library presented here, 95.3% of these reads mapped to the SpAP amplicon reference sequence (this sequence includes the 5′-UTR, eGFP fusion, and the 3′-UTR) and an additional 0.3% mapped to the E. coli genome. The remaining reads were mapped to plasmid-derived sequences outside of the amplicon region (2.1%) or were unmapped (2.4%), likely representing either low quality reads or contamination from human or other sources (Figure B). The ratio of SpAP-eGFP to E. coli reads was highly consistent across sublibrary plates (Table ). The read depth per barcode (calculated as the median read depth for all nucleotide positions of the SpAP-eGFP ORF within each barcode) varied from 0 to 1700 reads, with a median of 433 reads. Across the entire library, 65% of recovered barcodes were sequenced to a depth of ≥100 reads at all positions (Figure D, Table ).
Table 2

SpAP Read Depth Statistics

sublibrarybarcodesadepth ≥1bdepth ≥10bdepth ≥100bdepth ≥1000bkeptc
all457136033386292603530
13693253012620318
23672772602460274
33423002892440298
43672512341940247
53553272992340315
63782762501920264
73712522302090241
83761511391310146
93452632472020260
103843523423310352
113563293203020329
122522182091800216
133092822661990270

Number of barcodes (out of a possible 4992 total or 384 per sublibrary) with ≥1 SpAP reads.

Number of barcodes with at least this read depth at all positions in the SpAP-eGFP genome.

Number of barcodes meeting the depth threshold of at least 1 read at all positions, and ≥10 reads at ≥95% of positions. Only barcodes meeting this depth threshold were carried forward for subsequent variant analyses.

Number of barcodes (out of a possible 4992 total or 384 per sublibrary) with ≥1 SpAP reads. Number of barcodes with at least this read depth at all positions in the SpAP-eGFP genome. Number of barcodes meeting the depth threshold of at least 1 read at all positions, and ≥10 reads at ≥95% of positions. Only barcodes meeting this depth threshold were carried forward for subsequent variant analyses.

Quantifying the Yield of Single Mutants

The next stage of assembling a ready-to-use library for high-throughput biochemistry assays is to identify the clones containing single mutations and map each mutant to plate-well locations. We selected barcodes meeting the following depth threshold for further analysis: ≥1 read at all reference sequence positions and ≥10 reads at ≥95% of all reference sequence positions. Of 4645 barcodes with ≥1 mapped read, 3530 (76%) met this threshold across the entire library. To detect variants associated with each barcode in batch, we applied a SAMtools module[32,33] to process all mapped reads and generate output files (variant call files, .vcf) for each barcode containing SpAP-eGFP variants (single nucleotide substitutions, indels, or null if WT). WT and indel-containing barcodes were discarded (Figure A). For barcodes containing single nucleotide substitutions, we determined the corresponding codon and amino acid changes, assessed whether observed substitutions were intended (correct mutant identity and sublibrary) or unintended, and evaluated whether these barcodes contained single, double, or triple and greater numbers of amino acid substitutions. We also stored variant quality statistics at this stage, including the number of forward and reverse reads containing variant vs WT nucleotide sequence. Among barcodes containing single mutants, most observed mutations were intended (97%) (Figure A).
Figure 7

Characterization of the SpAP alkaline phosphatase scanning mutant library created with uPIC–M. (A) Overview of variant detection analyses and calculated yields (red) for the SpAP mutant library. (B) Overall distribution of single mutants, WT, double mutants, triple and greater mutants, and indels across all mutational sublibraries (indel count reflects variants containing one or more indels). (C) Location and frequency of intended single mutants across the entire SpAP-eGFP ORF. (D) Scatter plot and histograms of variant reads vs WT reads for all intended single mutants. (E) Comparison of simulated and observed single mutant frequency distributions for three sublibraries. The legend specifies the observed yield of unique single mutants and simulated 95% confidence interval from 1000 events; “n” indicates the total number of observed intended single mutants. Results for all sublibraries are shown in Figure S9.

Characterization of the SpAP alkaline phosphatase scanning mutant library created with uPIC–M. (A) Overview of variant detection analyses and calculated yields (red) for the SpAP mutant library. (B) Overall distribution of single mutants, WT, double mutants, triple and greater mutants, and indels across all mutational sublibraries (indel count reflects variants containing one or more indels). (C) Location and frequency of intended single mutants across the entire SpAP-eGFP ORF. (D) Scatter plot and histograms of variant reads vs WT reads for all intended single mutants. (E) Comparison of simulated and observed single mutant frequency distributions for three sublibraries. The legend specifies the observed yield of unique single mutants and simulated 95% confidence interval from 1000 events; “n” indicates the total number of observed intended single mutants. Results for all sublibraries are shown in Figure S9. Across all sublibraries, single mutants comprised 57% of the clones, ranging from 52–68% within each sublibrary (Figure B; Table ), similar to the results of small-scale testing (60%, Table S5) and within the range of mutant picking simulations (10–100%; see “Simulated mutant sampling to predict screening requirements”) (Table ). Double and triple and greater mutants comprised 28% of the sequenced clones (Figure B), higher than the 5% observed during small-scale testing. This higher percentage may arise in part from cross-contamination between single mutant clones during plate handling steps prior to barcode introduction, which was absent during small scale testing. Variant:WT read ratios across single, double, and triple and greater mutants are consistent with this model (Figure S8).
Table 3

Variant Content of the SpAP Scanning Library

sublibraryresiduestotal positionsbarcodesasingle totalsingle intendedsingle unintendeddoubletriple+indelsWT
all2–54254135302056199660761212174327
12–4140318178175361182041
242–894827415414866617829
390–13748298168161765201431
4138–18548247135131443272418
5186–23247315175169685112123
6233–27947264141138360232317
7280–3264724114113475912524
8327–3563014694913215224
9357–40246260148142650111635
10403–4484635221020468124730
11449–4914332923022466015519
12492–51726216120120052201212
13518–5422527016215935891724

Number of barcodes meeting the depth threshold of at least 1 read at all positions, and ≥10 reads at ≥95% of positions.

Number of barcodes meeting the depth threshold of at least 1 read at all positions, and ≥10 reads at ≥95% of positions. Next, we examined the identity and location of single mutant variants and found that mutants were evenly distributed with minimal positional bias (Figure C). Overall, we recovered 507 of 541 desired single mutants (94% coverage) from 3530 colonies, with coverage ranging from 87 to 98% within sublibraries (Table S8). These data demonstrate that uPIC–M is capable of producing user-defined single mutant libraries at high coverage.

Assessment of Single Mutant Purity

High-throughput biochemistry demands different levels of mutant purity depending on the application. Quantitative measurements of variant stabilities or ligand affinities can tolerate low amounts of contamination, as this contamination leads to accordingly low errors on thermodynamic parameters. By contrast, measurements of enzyme turnover are highly sensitive to contamination as a small (∼1%) fraction of WT enzyme could dominate apparent rates when attempting to measure a catalytically impaired mutant with activity that is ≪1% of WT. Above, we classified all barcodes containing only one amino acid mutation as single mutants without considering the fraction of mutant reads at that position. Here, we assess mutant purity by quantifying the number of forward and reverse reads containing either the mutant or WT nucleotide at the mutated position. Across this library, single mutants contained a wide range (10–1000) of variant reads and relatively few but detectable WT reads (Figure D). To calculate mutant purities, we devised a quality threshold representing the minimum number of variant reads and the minimum ratio of variant:WT reads (Table S8); as many barcodes contained 0 WT reads, this threshold represents a lower limit on single mutant purities. Library yield dropped from 507 mutants (94%) to 498 (92%) mutants and 484 (89%) mutants when applying thresholds of 10 and 100, respectively.

Evaluation of Method Performance

To measure the success of uPIC–M compared to picking simulations, we calculated expected unique mutant yields per SpAP sublibrary. We repeated simulations, this time substituting values for experimental variables that were assumed using only generic estimates in the initial simulations described above. These parameters were (1) the number of sequenced barcodes per sublibrary, which is then used as the number of simulated draws, and for each sublibrary was fewer than the 384 possible, (2) observed single mutant frequency, as this defines the chance of drawing a single mutant among a pool containing multiple categories of variants, and (3) the total number of possible unique mutants, which varied by sublibrary (Table S2) and is proportional to the number of draws necessary to achieve a desired level of library coverage. For each sublibrary, we ran 104 picking simulations to estimate the expected median number of unique mutants (and 95% confidence interval) and distribution of mutants per position (Figure E, selection of 3 sublibraries; Figure S9, all sublibraries). Observed unique single mutant yields matched expectations for 10 of 13 sublibraries (2–8, 11–13) (Table S9). The remaining 3 plates (1, 9, 10) yielded one fewer single mutant than expected at the 95% confidence interval, suggesting that an unequal abundance of some single mutants slightly decreased library coverage. Together, these results establish that the picking simulations presented here can accurately guide experimental pipelines for generating uPIC–M libraries.

Conclusions

uPIC–M delivers user-designed, clonal, single mutant libraries at a significant savings of time and cost compared to conventional mutagenesis by combining commercially available oligonucleotide arrays with commonly used automated liquid handling platforms and barcoded Illumina sequencing strategies. Here, we used this method to rapidly generate a single mutant scanning library of a 541 amino acid enzyme and achieved a >90% yield of desired variants. This method is immediately applicable to new protein targets, and we provide a detailed workflow for rigorous characterization of library sequences using a collection of open-source and custom data analysis tools. In future work, uPIC–M can readily be extended to a variety of applications beyond generating single mutant libraries. The relatively long mutagenic primers used in QuikChange-HT mutagenesis hybridize efficiently and specifically even in the presence of several nucleotide substitutions, making it possible to introduce multiple mutations within the same mutagenic window.[34,35] uPIC–M’s window design strategy can also be adapted to create deletion or insertion libraries using assembly-based strategies in pool.[36,37] A software tool under development by the Fordyce group will facilitate the design of single mutant and other variant libraries.[38] Finally, uPIC–M can be readily adapted for sequencing with other Illumina instruments or long-read sequencing approaches.[39] High-throughput biochemistry reveals biophysical insights on an unprecedented scale, but constructing variant libraries has become the new rate-limiting step.[4] The ability of uPIC–M to generate the required variant libraries rapidly and efficiently will expand HTB to the study of new protein targets, systems, and questions.

Materials and Methods

Description of Plasmid

The plasmid mutagenized here encoded the SpAP alkaline phosphatase family monoesterase[22,40] (Uniprot KB – A1YYW7) fused to a C-terminal eGFP via a 10 amino acid ser-gly linker. SpAP residues 1–540 (full-length sans signal peptide, original numbering with signal sequence = 20–559) were subcloned into the manufacturer-supplied plasmid from the PURExpress In Vitro Protein Synthesis Kit (New England Biolabs, Ipswich, MA, USA) using the Gibson assembly method with synthetic E. coli codon-optimized SpAP DNA (IDT, Coralville, IA, USA). The full plasmid map (Figure S10), DNA sequence (Figure S11), and the SpAP-eGFP fusion protein sequence (Figure S12) are provided in the Supplemental Information.

Design of Mutagenic Oligo Arrays

Oligo arrays were designed using the Agilent Technologies (Santa Clara, CA, USA) eArray web program (https://earray.chem.agilent.com/earray/). The full sequence of the PURExpress-SpAP-eGFP was provided as input to the eArray software (Figure S11), and mutational regions were manually adjusted until primer sequences for all sublibrary mutational windows (13 total) passed thermodynamic thresholds calculated by this software. Oligos were selected to mutate all non-valine residues from positions 2–326 to valine; all valine residues from positions 2–326 to alanine; all non-alanine residues from positions 327–542 to alanine; and all alanine residues from positions 327–542 to synonymous alanine codons, corresponding to 541 total mutants (Table S2). Each mutagenic oligo was synthesized in duplicate within a 7500 oligo capacity high-fidelity array (Agilent Technologies, see Table S10 for array sizes and sample pricing). The forward and reverse primer sequences (Table S2) required to amplify each mutational window from the pooled array were also obtained from the eArray design output and were purchased from IDT. The full array sequence is provided in an accompanying data repository (https://osf.io/k3rjy/).

Preparation of Sublibrary Mutagenic Primer Pools and PCR Mutagenesis

Lyophilized oligo arrays were resuspended in 200 μL of 10 mM Tris–HCl, pH 8.5 (EB) and then further diluted 1:100 with EB. Sublibrary oligo pools were amplified individually using window-specific primer pairs and KAPA HiFi HotStart ReadyMix (Roche, Indianapolis, IN, USA) with a final template concentration of 1:2000 resuspended oligo array and annealing temperatures of either 60 or 65 °C (depending on performance of individual primer pairs) for 25 PCR cycles. Resulting PCR products (“sublibrary mutagenic primers”) were purified using the StrataPrep PCR Purification Kit (Agilent Technologies) and analyzed for quality and concentration using TapeStation electrophoresis with HSD1000 ScreenTapes (Agilent Technologies). For initial mutagenesis reactions, primer stocks were normalized to a uniform concentration, measured by UV absorbance, of 15 nM, that of the lowest concentration sublibrary pool (Table S3). PCR mutagenesis was performed using the QuikChange Lightning enzyme (Agilent Technologies) at an annealing temperature of 60 °C for 18 cycles with the following components: 2.5 μL of 10× manufacturer-supplied buffer, 1 μL of supplied dNTP mix, 0.75 μL of QuikSolution additive, 1 μL of 25 ng/μL PURExpress-SpAP-eGFP template plasmid, 15 μL of 15 nM sublibrary pool, 1 μL of QuikChange Lightning enzyme, and 3.75 μL of H2O. Following PCR, the template WT plasmid was digested by the addition of 1 μL of DpnI (Agilent Technologies) for 5 min at 37 °C. For sublibraries that provided an insufficient number of transformants, mutagenesis was repeated using the undiluted purified sublibrary primer pools.

Transformation, Plating, Colony Picking & Growth

DpnI-digested PCR mutagenesis reactions were transformed into chemically competent NEB 5-alpha E. coli. Transformations were plated on 15 cm LB agar plates containing 100 μg/mL ampicillin and grown overnight at 37 °C. Transformation ratios of 1:20–1:3.33 PCR product:cells were used to obtain a desired yield of ∼400–500 colonies per 15 cm plate. Colonies were picked manually (sublibraries 1 and 5 only) or using a PIXL robotic colony picker (Singer Instrument Company, Somerset, UK) at the Stanford University School of Medicine Genome Technology Center (Palo Alto, CA, USA). Single colonies were picked from source LB agar plates into 384 well (120 μL) destination microwell plates containing 60 μL of LB containing ampicillin. At least 384 colonies were picked and grown from each sublibrary window (Figure ). Microwell plates were sealed with gas-permeable AeraSeal film (MilliporeSigma, Burlington, MA, USA) and grown to saturation with shaking at 37 °C.

qPCR Detection of E. Coli Genomic DNA

E. coli cultures from clonal mutants were pooled and diluted from 101 to 104-fold to assay for genomic DNA concentration using qPCR with the commercial NEB Luna 2X MasterMix. A previously reported primer set to the rodA gene was used (forward: 5’-GCAAACCACCTTTGGTCG-3′; reverse: 5’-CTGTGGGTGTGGATTGACAT-3′).[41] Library samples were quantified using a standard curve of purified E. coli O157:H7 genomic DNA (Zeptometrix, Buffalo, NY, USA) at concentrations of 0.0001–1 ng/μL.

Preparation of Enzyme ORF Amplicons

Saturated clonal E. coli cultures in 384 well plates were diluted 1:1000 with H2O, by serial dilution using a 96-well Rainin Liquidator (Mettler-Toledo, Columbus, OH, USA). Dilutions and additional amplicon preparation steps were performed in 384 well Bio-Rad HSP3801 plates (Bio-Rad, Hercules, CA, USA). Primers were designed to amplify a 2525 bp region including the SpAP-eGFP ORF and 5′- and 3’-UTR segments (Figure S10). Forward (5′- gatctacactctttccctacacgacgctcttccgatctCCCGCGAAATTAATACGACTCACTATAGG-3′) and reverse (5’-gtctcgtgggctcggagatgtgtataagagacagGCACCACCTTAATTAAAGGCCTCC-3′) primers also contained Illumina Read 1 and Read 2 overhangs, respectively, shown in lowercase. Note: the underlined nucleotides in the Read 1 overhang were included inadvertently but did not affect downstream steps. Diluted cultures were used as PCR templates and amplified with KAPA HiFi HotStart polymerase at a scale of 4 μL:0.8 μL 1:1000 dilute culture template, 2 μL of KAPA HiFi HotStart ReadyMix (Roche), 0.96 μL of H2O, 0.12 μL of 10 μM forward primer, 0.12 μL of 10 μM reverse primer (see sequences above). Thermal cycling conditions were as follows: 95 °C, 5 min; 25 × [98 °C, 20 s; 60 °C, 15 s; 72 °C, 2 min]; 72 °C, 2 min.

Fluorescence Quantification of Amplicon DNA

Amplicon DNA concentrations after PCR were determined using the Quant-iT PicoGreen dsDNA assay (Thermo-Fisher Scientific, Waltham, MA, USA). In brief, a master mix containing 1:200 PicoGreen reagent (from stock concentration as supplied) in 10 mM Tris–HCl, pH 8.5 was prepared immediately before use and kept from light. Amplicon samples were prepared by the addition of 1.5 μL of 1:5 PCR reaction:H2O to 34 μL of the master mix. Standards were prepared by the addition of 1.5 μL of λ phage DNA ranging in concentration from 0 to 100 ng/μL to 34 μL of the master mix. Standard curves were included, in duplicate, on each sample plate. Samples were incubated ∼5 min at room temperature and read by fluorescence (λex = 480 nm, λem = 520 nm) using a Synergy H1 plate reader (BioTek, Winooski, VT, USA).

Tn5 Tagmentation

A library preparation procedure adapted from Picelli et al.,[18,26] with modifications, was used to generate barcoded, sequencing-ready libraries. Commercial pA-Tn5 (protein A-Tn5) was purchased pre-loaded with sequence adapters from Diagenode (Denville, NJ, USA) and diluted to a working concentration of 1:50 with dilution buffer (40 mM Tris–HCl, pH 7.5, 40 mM MgCl2). Amplicons in 384 well plates were diluted 1:100 in H2O to provide template concentrations suitable for tagmentation. For Tn5 reactions, 1.2 μL of master mix (0.2 μL 1:50 pA-Tn5; 1 μL 1.6× buffer containing 16 mM Tris–HCl, pH 8.0, 8 mM MgCl2, and 16% (v/v) dimethylformamide, sparged with N2 and added immediately prior to use) was added to each well of a new 384 well plate using a Mantis microfluidics liquid handler robot (Formulatrix, Bedford, MA, USA). Amplicon templates were added to Tn5 reaction mixtures (0.4 μL 1:100 template each) using a Mosquito LV pipetting robot (SPT Labtech, Boston, MA, USA). Reaction plates were sealed, briefly vortexed, collected by centrifugation, and incubated for 7 min at 55 °C. Tn5 reactions were stopped by the addition of 0.4 μL of 0.1% (w/v) sodium dodecyl sulfate, using the Mantis liquid handler.

i7/i5 Barcoding PCR and Library Cleanup

Tagmented libraries were barcoded and amplified using KAPA HiFI polymerase and a collection of Nextera XT 12mer dual unique index sequencing primers (purchased from IDT and supplied by CZ-Biohub). First, 1.2 μL of a master mix containing 0.08 μL of KAPA HiFi (1 U/μL), 0.8 μL of 5× buffer (manufacturer supplied), 0.12 μL of 10 mM dNTP mix (2.5 mM each), and 0.2 μL of H2O was added to each 2 μL SDS-halted Tn5 reaction using the Mantis liquid handler. Next, 0.8 μL of unique i5/i7 primer mix (2.5 μM each) was transferred from source plates to sample using the Mosquito instrument. Reaction plates were sealed, briefly vortexed, collected by centrifugation, and amplified with the following thermal cycler conditions: 72 °C, 3 min; 95 °C, 30 s; 12 × [98 °C, 10 s; 55 °C, 15 s; 72 °C, 1 min]; 72 °C, 5 min. Resulting libraries were pooled (with the Mosquito) and treated with AMPure XP magnetic beads at a ratio of 0.8:1 beads:sample volume to purify DNA (Beckman Coulter, Brea, CA, USA). Library yield and quality were determined by TapeStation electrophoresis with an HSD1000 ScreenTapes (Agilent Technologies).

Next-Generation Sequencing

Sequencing was performed by SeqMatic (Fremont, CA, USA). Libraries were sequenced using Miseq v3 2x300 bp, with the addition of 1% PhiX control DNA. Samples were submitted with i7/i5 barcodes corresponding to each tagmented mutant and demultiplexed by the instrument.

Picking Simulations

Pools of mutants were simulated as numeric lists containing unique elements equal in number to desired simulated mutant pool. The random module (Python 3[42]) was used to sample from this pool pseudo-randomly, thereby simulating a large (compared to sampling events) mutant pool with identical distributions of each unique mutant (https://github.com/FordyceLab/uPICM). Simulation of sampling from a variant pool containing additional non-single mutants was accomplished by the above strategy with the addition of a preceding step. This step introduced a pseudo-random draw from a pool containing specified fractions of single mutants (0.1–1.0) and non-single mutants (0–0.9), with only draws picking among the single mutant fraction carried forward.

NGS Data Processing, and Analysis

NGS data were processed using open source software tools, executed with the Snakemake workflow tool.[43] First, sequences of Illumina adapters were trimmed and redundant (read-through) read mates were disposed from demultiplexed fastq read files using the Trimmomatic package.[30] Trimmed reads were aligned to the PURExpress-SpAP-eGFP plasmid, and separately, the full E. coli genome (NCBI Reference Sequence: NC_000913.3) using BWA-MEM,[31] with the output mapped, sorted, and indexed with SAMtools.[32] Variant base calls were identified against the PURExpress-SpAP-eGFP plasmid genome using the BCFtools utility of the SAMtools package.[33] Alignments were visualized using Integrative Genomics Viewer.[44] Subsequent analyses were performed with custom code using Python 3, available at (https://github.com/FordyceLab/uPICM). Raw sequencing files are provided in our data repository (https://osf.io/k3rjy/).
  39 in total

1.  Whole-protein alanine-scanning mutagenesis of allostery: A large percentage of a protein can contribute to mechanism.

Authors:  Qingling Tang; Aron W Fenton
Journal:  Hum Mutat       Date:  2017-06-16       Impact factor: 4.878

Review 2.  Best Practices for Illumina Library Preparation.

Authors:  Iraad F Bronner; Michael A Quail
Journal:  Curr Protoc Hum Genet       Date:  2019-06

3.  Protein stability engineering insights revealed by domain-wide comprehensive mutagenesis.

Authors:  Alex Nisthal; Connie Y Wang; Marie L Ary; Stephen L Mayo
Journal:  Proc Natl Acad Sci U S A       Date:  2019-08-01       Impact factor: 11.205

4.  Construction and analysis of randomized protein-encoding libraries using error-prone PCR.

Authors:  Paulina Hanson-Manful; Wayne M Patrick
Journal:  Methods Mol Biol       Date:  2013

5.  An improved PCR-mutagenesis strategy for two-site mutagenesis or sequence swapping between related genes.

Authors:  R D Kirsch; E Joly
Journal:  Nucleic Acids Res       Date:  1998-04-01       Impact factor: 16.971

6.  Cloning and overexpression of alkaline phosphatase PhoK from Sphingomonas sp. strain BSAR-1 for bioprecipitation of uranium from alkaline solutions.

Authors:  Kayzad S Nilgiriwala; Anuradha Alahari; Amara Sambasiva Rao; Shree Kumar Apte
Journal:  Appl Environ Microbiol       Date:  2008-07-18       Impact factor: 4.792

Review 7.  Methods for enzyme library creation: Which one will you choose?: A guide for novices and experts to introduce genetic diversity.

Authors:  Lorea Alejaldre; Joelle N Pelletier; Daniela Quaglia
Journal:  Bioessays       Date:  2021-07-15       Impact factor: 4.345

8.  X-ray structure reveals a new class and provides insight into evolution of alkaline phosphatases.

Authors:  Subhash C Bihani; Amit Das; Kayzad S Nilgiriwala; Vishal Prashar; Michel Pirocchi; Shree Kumar Apte; Jean-Luc Ferrer; Madhusoodan V Hosur
Journal:  PLoS One       Date:  2011-07-28       Impact factor: 3.240

9.  Trimmomatic: a flexible trimmer for Illumina sequence data.

Authors:  Anthony M Bolger; Marc Lohse; Bjoern Usadel
Journal:  Bioinformatics       Date:  2014-04-01       Impact factor: 6.937

10.  Ancestral sequence reconstruction produces thermally stable enzymes with mesophilic enzyme-like catalytic properties.

Authors:  Ryutaro Furukawa; Wakako Toma; Koji Yamazaki; Satoshi Akanuma
Journal:  Sci Rep       Date:  2020-09-23       Impact factor: 4.379

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.