
Rapid, accurate mapping of transgene integration in viable rhesus macaque embryos using enhanced-specificity tagmentation-assisted PCR.

Junghyun Ryu^1, William Chan^1, Jochen M Wettengel^2,3, Carol B Hanna^1, Benjamin J Burwitz^3,4, Jon D Hennebold^1,5, Benjamin N Bimber^4,6.

Abstract

Genome engineering is a powerful tool for in vitro research and the creation of novel model organisms and has growing clinical applications. Randomly integrating vectors, such as lentivirus- or transposase-based methods, are simple and easy to use but carry risks arising from insertional mutagenesis. Here we present enhanced-specificity tagmentation-assisted PCR (esTag-PCR), a rapid and accurate method for mapping transgene integration and copy number. Using stably transfected HepG2 cells, we demonstrate that esTag-PCR has higher integration site detection accuracy and efficiency than alternative tagmentation-based methods. Next, we performed esTag-PCR on rhesus macaque embryos derived from zygotes injected with piggyBac transposase and transposon/transgene plasmid. Using low-input trophectoderm biopsies, we demonstrate that esTag-PCR accurately maps integration events while preserving blastocyst viability. We used these high-resolution data to evaluate the performance of piggyBac-mediated editing of rhesus macaque embryos, demonstrating that increased concentration of transposon/transgene plasmid can increase the fraction of embryos with stable integration; however, the number of integrations per embryo also increases, which may be problematic for some applications. Collectively, esTag-PCR represents an important improvement to the detection of transgene integration, provides a method to validate and screen edited embryos before implantation, and represents an important advance in the creation of transgenic animal models.

Entities: Chemical

Keywords: gene editing; integration site mapping; lentiviral transduction; piggyBac transposase; transgenic embryos

Year: 2022 PMID: 35211637 PMCID: PMC8829455 DOI: 10.1016/j.omtm.2022.01.009

Source DB: PubMed Journal: Mol Ther Methods Clin Dev ISSN: 2329-0501 Impact factor: 6.698

Introduction

Randomly integrating gene delivery vectors, such as lentivirus- or transposase-based methods (e.g., piggyBac or Sleeping Beauty), are powerful and simple systems for genome engineering.1, 2, 3, 4 They are commonly used for in vitro or in vivo integration of genetic material into the host genome. The integration of an exogenous gene can disrupt endogenous genes, causing unwanted effects. In some settings, such as delivery of a reporter construct, controlling transgene copy number is important. The relative importance of insertional mutagenesis and copy number varies by application. For in vitro experiments with immortalized cells, the implications of unintended disruption of endogenous genes may be minimal; however, controlling unwanted effects may be critical when generating the progenitor for a transgenic animal model. When generating complex transgenic organisms, such as non-human primates, the time to sexual maturity, number of oocytes retrieved, and efficiency of successful pregnancy following implantation are all rate-limiting steps. Thus, it is critical to validate proper transgene delivery or genomic editing of the embryo prior to transplantation and to minimize the risk of deleterious off-target mutations. There are many methods for detecting integration events, each with advantages and disadvantages. Most are forms of PCR, including inverse PCR (iPCR), ligation-mediated PCR, and linear amplification PCR (LAM-PCR).9, 10, 11, 12, 13, 14 Each of these methods uses restriction enzymes to digest the input DNA, which is then ligated to terminal adapter sequences. Therefore, these methods are limited to detecting integration events proximal to the chosen restriction enzyme site(s). A newer version of the method, termed nonrestrictive LAM-PCR (nrLAM-PCR), does not require the use of restriction enzymes and was reported to provide comprehensive mapping of integration events; however, this is a time-consuming protocol.
More recently, an alternative protocol was published that uses Tn5 transposase to randomly fragment the DNA (a process termed tagmentation) and add adapters in one step. This protocol, termed tagmentation-assisted PCR (Tag-PCR), is streamlined relative to LAM-PCR; however, the amplification strategy of Tag-PCR can result in the amplification and sequencing of non-target genomic DNA fragments, reducing efficiency and lowering sensitivity. Other tagmentation-based methods have been published that rely on non-commercial transposomes, loaded with custom adapters.18, 19, 20 Here we present an alternative method for tagmentation-assisted PCR, which we term enhanced-specificity tagmentation-assisted PCR (esTag-PCR). By redesigning the PCR enrichment strategy, we demonstrate considerably lower off-target amplification and higher efficiency than alternative tagmentation-based protocols. We further show that this method can be applied to low-input DNA from trophectoderm (TE) biopsies of edited rhesus macaque embryos, providing a practical method to screen embryos prior to implantation.

Results

Design of esTag-PCR

A schematic of the design of esTag-PCR and comparison with two previously described protocols for tagmentation-based transgene detection are shown in Figure 1. Tagmentation refers to the process by which Tn5 transposase is loaded with short adapter sequences, then used to randomly fragment DNA and ligate terminal adapters. Commercially available transposases, such as Illumina Nextera (also sold as TDE1), are generally loaded with two distinct adapters. In the case of Nextera/TDE1, these adapters partially match the Illumina Read 1 and Read 2 sequencing primers. As Tn5 can insert in either orientation, this results in DNA fragments with R1/R1, R1/R2, R2/R1, or R2/R2 terminal adapters (illustrated in red and purple in Figure 1).

Figure 1

Overview of Tag-PCR and esTag-PCR designs

(A) In both Tag-PCR and esTag-PCR, DNA is incubated with adapter-loaded transposomes, which fragment the DNA and add terminal adapters. Because the transposome can insert in either orientation, the resulting fragments can contain either Nextera-R1 + Nextera-R2, Nextera-R1 + Nextera-R1, or Nextera-R2 + Nextera-R2 adapters. In the original Tag-PCR protocol, PCR is performed using a transgene-specific primer (blue) and a primer complementary to one of the Nextera adapters (red). The transgene-specific primer is designed to add the other Nextera adapter (purple). This will specifically amplify fragments containing the transgene, creating molecules with Nextera-R1 and Nextera-R2 terminal adapters. However, the original pool of tagmentation products will contain non-specific fragments that also have Nextera-R1 and Nextera-R2 adapters from the initial fragmentation. A second round of PCR is performed using primers that bind the Nextera-R1 and Nextera-R2 sequences, adding P5 and P7 adapters. These primers can amplify both transgene-containing fragments and original tagmentation products, creating the potential for sequencing of non-specific fragments. The esTag-PCR protocol was designed to introduce several layers of specificity. The primary PCR is conducted using a transgene-specific primer, but no additional adapter is added. A second PCR is performed using a nested transgene-specific primer, which adds the P5 adapter and TruSeq + P7 adapters (a distinct Illumina-compatible sequencing adapter). (B) Alternative forms of tagmentation-assisted PCR are reported that use custom Tn5, loaded with two copies of a single terminal adapter. Similar to Tag-PCR, PCR is performed using a gene-specific primer and a primer targeting the Nextera-R1 adapter. A second round of PCR is performed using primers that bind the Nextera-R1 and Nextera-R2 sequences, adding P5 and P7 adapters.

Tagmentation-assisted PCR first performs PCR on the tagmented DNA using a primer that targets one of the terminal tagmentation adapters and a second transgene-specific primer. The transgene-specific primer re-adds the second tagmentation adapter sequence. A second indexing PCR is then performed using primers targeting the two original tagmentation adapter sequences. These secondary primers add the Illumina P5 and P7 sequences. Although this will enrich for transgene-containing fragments, the indexing PCR will also amplify non-specific fragments left over from tagmentation, which do not contain the transgene. One possible mechanism to circumvent this unwanted amplification is to use custom transposomes, loaded with two copies of the same adapter (typically Nextera Read 1), as has been published in multiple studies.18, 19, 20 Because DNA tagmented with these transposomes will contain only the R1 adapter, these methods will eliminate the potential for unwanted amplification of tagmented molecules that contain both R1 and R2 adapters. Although we are not aware of commercially available transposomes pre-loaded with single adapters, adapter-free Tn5 has recently become commercially available, such as Lucigen EZ-Tn5, which could be loaded with custom adapters. As a comparison for this study, we performed a version of Tag-PCR using single-adapter Tn5, which we term single-adapter Tag-PCR (saTag-PCR), that is most similar to TagMap or UDiTaS. We designed esTag-PCR as an alternative approach to increase amplification specificity, while relying on commercially available reagents. In the esTag-PCR protocol, tagmentation is performed using commercial dual-adapter Tn5 transposase, followed by first-round PCR using a primer targeting one of the tagmentation adapters and a transgene-specific primer; however, this PCR does not re-add the original adapter sequence.
A second nested PCR is performed using a separate internal transgene-specific primer and a primer targeting the tagmentation adapter. The transgene-specific primer adds a TruSeq adapter (an alternate Illumina-compatible sequencing primer). Because both PCR steps require transgene-specific priming, this scheme should considerably reduce the possibility of off-target amplification. In this study, all three methods are performed using a cocktail of gene-specific primers targeting both the 5′ and 3′ ends of the transgene, which allows detection and mapping of both ends of the integrated transgene.
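The reasoning behind the three designs can be sketched as a toy rule set. This is only an illustration of the logic described above, not a model of PCR kinetics: the function names, the choice of R1 as the adapter targeted by the adapter primer, and the simplification that a fragment either can or cannot reach the final library are all assumptions made for the example.

```python
from itertools import product


def reaches_library(scheme, ends, has_transgene, target="R1"):
    """Toy rule: can a tagmented fragment become a sequenceable product?

    ends          -- pair of tagmentation adapters on the fragment
    has_transgene -- True if a transgene-specific primer can bind
    target        -- adapter bound by the adapter primer (assumed R1 here)
    """
    primed = has_transgene and target in ends
    if scheme == "Tag-PCR":
        # The indexing PCR primes on both Nextera adapters, so R1/R2 fragments
        # left over from tagmentation can amplify without any transgene.
        leftover = set(ends) == {"R1", "R2"}
        return primed or leftover
    if scheme in ("saTag-PCR", "esTag-PCR"):
        # saTag-PCR: the second adapter arrives only via the transgene primer.
        # esTag-PCR: both PCR rounds require transgene-specific priming.
        return primed
    raise ValueError(f"unknown scheme: {scheme}")


def fragment_types(scheme):
    """Adapter combinations produced by tagmentation under each scheme."""
    if scheme == "saTag-PCR":
        return [("R1", "R1")]  # custom Tn5 loaded with a single adapter
    return list(product(("R1", "R2"), repeat=2))  # commercial dual-adapter Tn5


def background_types(scheme):
    """Transgene-free fragment types that would still reach the library."""
    return [
        ends
        for ends in fragment_types(scheme)
        if reaches_library(scheme, ends, has_transgene=False)
    ]


for scheme in ("Tag-PCR", "saTag-PCR", "esTag-PCR"):
    print(scheme, background_types(scheme))
```

Under these assumptions only Tag-PCR lets transgene-free fragments through. The sketch does not capture the efficiency gain from the nested second round in esTag-PCR, which is a quantitative rather than a yes/no effect.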

Data analysis and novel software

The detection of transgene integration into the genome is not a standard type of sequence analysis. Accurate mapping of transgene integration requires determining both the genomic location and orientation of the transgene, because both lentivirus and piggyBac transposase can integrate the transgene in either genomic orientation. Although in certain cases simply creating a table of integration sites is adequate, there are instances when independent validation of specific integration sites is needed. Our analysis scheme, described in detail in Materials and Methods, is shown in Figure 2. There are two general phases to the analysis: read filtering and alignment, followed by identification and scoring of predicted integration sites. For these analyses, we created two novel tools that are specific for the identification of transgene integration events. As part of pre-alignment processing, we created a utility to filter reads or read pairs on the basis of target sequences, using fuzzy matching (i.e., allowing a limited number of mismatches). For these analyses, we filter the input FASTQ data to retain only reads containing the terminal 15-mer from either end of the transgene, allowing up to 2 mismatches (the read pair is retained if either read contains the sequence). The passing reads, which span the genome-transgene junction (or represent non-integrated vector), are then aligned to the reference genome. We next created another novel tool, IntegrationSiteMapper, which encapsulates many of the steps needed for accurate mapping and validation of integration events. Using a customizable transgene definition (i.e., the sequence and orientation of the transgene terminal regions), the tool inspects the alignments to determine the location, orientation, and number of reads associated with each integration event. The tool also scans and reports reads matching the backbone of the transgene delivery vector, which is customizable, on the basis of the transgene definition.
The latter can be useful for plasmid-based delivery systems, as non-integrated plasmid will be detected by esTag-PCR. The tool can optionally reconstruct and output the genome-transgene junctions as an annotated GenBank file and optionally design amplification primers to span each junction. Because this tool accepts a customizable transgene definition, it should be readily adaptable to any delivery system, as well as for mapping of other genomic elements such as transposons.
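The pre-alignment filter described above can be illustrated with a minimal sketch. It is not the authors' utility: the function names and the two 15-mer sequences are hypothetical placeholders, mismatches are counted as substitutions only (no indels), and strand handling is omitted for brevity.

```python
def within_mismatches(window, target, max_mismatches):
    """True if two equal-length strings differ at no more than max_mismatches positions."""
    mismatches = 0
    for a, b in zip(window, target):
        if a != b:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def contains_fuzzy(read, target, max_mismatches=2):
    """Slide the target along the read and accept any window within the mismatch limit."""
    k = len(target)
    return any(
        within_mismatches(read[i:i + k], target, max_mismatches)
        for i in range(len(read) - k + 1)
    )


def keep_pair(read1, read2, targets, max_mismatches=2):
    """Retain the pair if either read contains any of the terminal sequences."""
    return any(
        contains_fuzzy(read, target, max_mismatches)
        for read in (read1, read2)
        for target in targets
    )


# Hypothetical terminal 15-mers; real values come from the transgene definition.
TERMINAL_15MERS = ["ACGTTGCAAGGCTTA", "TTGACCGATCAGGCA"]

two_mismatches = "GGGG" + "CCGTTGCAAGGCTTG" + "TTTT"    # 2 substitutions vs. first 15-mer
three_mismatches = "GGGG" + "CCGTTGGAAGGCTTG" + "TTTT"  # 3 substitutions vs. first 15-mer

print(keep_pair(two_mismatches, "", TERMINAL_15MERS))
print(keep_pair(three_mismatches, "", TERMINAL_15MERS))
```

A sliding Hamming comparison is quadratic in the worst case but is adequate for 15-mers against 150 bp reads. A production filter would stream FASTQ records rather than hold strings in memory.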

Figure 2

Overview of sequence analysis and integration site mapping

Data analysis is divided into two main phases: pre-processing and DNA alignment, followed by integration site detection. In the first phase, raw sequence reads are trimmed using quality scores and filtered to retain only reads containing a sequence matching the transgene terminal end(s). The resulting reads are aligned to the host genome using BWA-Mem, creating a BAM file. In the second phase, IntegrationSiteMapper scans the aligned reads for the transgene terminal sequence(s) and determines integration orientation and genomic position of each event. A summary table of unique integration events is created. Additionally, this tool can reconstruct and output the sequence around each genome-transgene junction in the proper orientation. The tool can optionally use these sequences to design amplification primers to span each junction by running Primer3Plus and BLAST.


Efficiency of esTag-PCR

We first sought to validate the esTag-PCR protocol and compare it with other published tagmentation-based protocols, including the previous-generation Tag-PCR protocol that relies on commercial dual-adapter Tn5 and a related protocol that uses custom Tn5 transposase loaded with a single adapter, termed saTag-PCR in this paper. We transfected HepG2 cells with a piggyBac transposase expression plasmid and a piggyBac-compatible transposon/transgene plasmid expressing moxGFP-P2A-NanoLuc luciferase. A panel of clonally derived cell lines that stably express the transgene was established. Because piggyBac-mediated integration is random, each clonal line is expected to have a unique set of integration events. Clones were selected to encompass a range of moxGFP/NanoLuc expression levels. To compare efficiency and accuracy, we performed Tag-PCR, saTag-PCR, and esTag-PCR on these clones and the parent HepG2 cells (non-transfected). Figure 3 shows a summary of the resulting sequence reads. Reads are first categorized as to whether they contain the transgene-terminal sequence. Reads lacking this sequence likely represent non-specific background amplification. For all samples, esTag-PCR produced considerably less background than other methods, which is expected given the increased amount of gene-specific PCR amplification used in esTag-PCR (Figure 1). For each cell line, esTag-PCR produced 8–13 times more on-target reads (spanning the transgene-genome junction) than Tag-PCR or saTag-PCR. Collectively, this demonstrates superior transgene-specific amplification from esTag-PCR.

Figure 3

Efficiency of Tag-PCR versus esTag-PCR

The piggyBac transposon system was used to generate HepG2 cells that stably express a moxGFP/luciferase expression construct. A panel of clonal cell lines was generated. Tag-PCR, saTag-PCR, and esTag-PCR were performed on these clonal lines, along with the parent HepG2 cells. The graph illustrates the fraction of sequence reads for each sample that fall into each of the following categories, indicated by the color legend: (1) reads lacking the terminal transgene sequence (likely PCR background), (2) reads containing this sequence but producing low-quality alignments, and (3) reads that span the transgene-genome junction. Relative to the other methods, esTag-PCR had a considerably higher fraction of on-target reads.


Accuracy of esTag-PCR

The integration events predicted by Tag-PCR, saTag-PCR, and esTag-PCR were then evaluated in the same clonal HepG2 lines. Each method returns a set of putative integration events (genomic location and orientation) and the number of associated reads. Figure 4 illustrates the results from each method, showing the proportion of reads matching each putative integration site (relative to total junction-spanning reads). A site is reported if at least four reads are detected for this location, which is a relatively permissive threshold. Clone-6, which contained four predicted sites, had zero background in the esTag-PCR and saTag-PCR data. In contrast, while Tag-PCR also identified the same four sites at somewhat high frequency, it predicted 150 additional sites, most of which are present at low levels. Although the additional sites predicted by Tag-PCR could be removed by stricter filtering, the lowest true integration (10 reads) was only slightly above this threshold, and raising the filter threshold would increase the potential of filtering out true integration sites. For any given integration site, it is possible to detect the 5′ terminal junction, 3′ terminal junction, or both (Figure 4, triangles versus circles). Sites at which both junctions are detected are presumably of higher confidence, and this might be suitable as a filtering strategy. For the majority of sites predicted by esTag-PCR and saTag-PCR, both 3′ and 5′ junctions were detected (Figure 4, triangles). In contrast, Tag-PCR detected both junctions in only three of four integration sites for Clone-6 (a single transgene end was detected for chromosome X position 130,165,194), which suggests that detection of both junctions would not be a reliable filter criterion for Tag-PCR data.
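The reporting rule described above (at least four supporting reads, frequency relative to all junction-spanning reads, and whether both junctions were seen) can be sketched as follows. The record layout and names are assumptions for illustration, the coordinates are made up, and the sketch assumes the 5′ and 3′ junction reads for one event have already been normalized to a single site coordinate.

```python
from collections import Counter, defaultdict

MIN_READS = 4  # permissive threshold stated in the text


def call_sites(junction_reads, min_reads=MIN_READS):
    """Summarize junction-spanning reads into putative integration sites.

    junction_reads -- iterable of (chrom, position, orientation, transgene_end),
                      where transgene_end is "5p" or "3p"
    """
    junction_reads = list(junction_reads)
    total = len(junction_reads)
    counts = Counter()
    ends_seen = defaultdict(set)
    for chrom, pos, orientation, end in junction_reads:
        site = (chrom, pos, orientation)
        counts[site] += 1
        ends_seen[site].add(end)

    sites = [
        {
            "site": site,
            "reads": n,
            "fraction": n / total,  # relative to all junction-spanning reads
            "both_junctions": ends_seen[site] == {"5p", "3p"},
        }
        for site, n in counts.items()
        if n >= min_reads
    ]
    return sorted(sites, key=lambda s: s["reads"], reverse=True)


# Made-up reads: one site with both junctions, one with a single junction,
# and one below the reporting threshold.
example_reads = (
    [("chr1", 100, "+", "5p")] * 6
    + [("chr1", 100, "+", "3p")] * 4
    + [("chrX", 500, "-", "5p")] * 7
    + [("chr2", 42, "+", "3p")] * 3
)

for site in call_sites(example_reads):
    print(site)
```

Requiring `both_junctions` as an additional filter is straightforward to add, but as the text notes it would only be appropriate where both junctions are reliably recovered.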

Figure 4

Accuracy of Tag-PCR versus esTag-PCR

Tag-PCR, saTag-PCR, and esTag-PCR were performed on a panel of HepG2 clonally derived cell lines, where each clone contains a unique set of piggyBac-mediated integration events. (A–C) Each graph illustrates the predicted integration events for one clone, displaying the results from Tag-PCR (top), saTag-PCR (middle), and esTag-PCR (bottom). The y-axis indicates the fraction of reads for each predicted site. Sites are colored on the basis of several categories. If the site is predicted in a given sample by two methods, it is categorized as “shared.” For a subset of sites, primers were designed to span both the predicted 5′ and 3′ junction sites, followed by PCR amplification of the transgene-genome junction and Sanger sequencing. All of these sites were confirmed to be true integrations (red dots). Sites detected in esTag-PCR alone and validated by PCR and Sanger sequencing are colored orange. Any remaining shared sites (for which sequence validation was not performed) are colored blue. Sites were further categorized as to whether a site was predicted at the identical genomic position in multiple samples, which is unlikely for a randomly integrating vector (green dots). Remaining sites, which are only predicted in one sample and by one method, are colored gray. The shape of the glyph indicates whether both 5′ and 3′ transgene-genome junctions were detected (triangles) or whether only a single junction was detected (circles).

Similar patterns were seen in Clone-13 and Clone-16, although each of these clones contained considerably more integration events. The concordance among esTag-PCR, saTag-PCR, and Tag-PCR was strong, with sites predicted by esTag-PCR also detected in saTag-PCR and Tag-PCR (Figure 4, blue icons); however, the Tag-PCR results contained significant low-frequency background, as before.
Even though the set of predicted sites was nearly identical between esTag-PCR and saTag-PCR, esTag-PCR achieves this with considerably higher rates of transgene-containing reads, resulting in more efficient sequencing (Figure 3). As a result, the detection limit of esTag-PCR for a given sequencing depth should be greater. In fact, we identified a small number of sites in Clone-13 and Clone-16 predicted by esTag-PCR, but not detected using saTag-PCR (Figure 4, orange and gray dots in esTag-PCR panels). Although it is likely that with sufficient read depth, saTag-PCR would also detect these sites, this highlights the benefit of increased gene-specific amplification. To validate predicted integration events, 26 sites that spanned a range of frequencies were selected for independent validation (Table S2). For each site, we designed PCR primers to amplify both 3′ and 5′ transgene-genome junctions. We performed PCR, Sanger-sequenced the amplicons, and used this sequence to validate the sequence of each genome-transgene junction. In every case tested, the site predicted by esTag-PCR was confirmed to match the predicted genomic location (Figure 4, red icons). Notably, esTag-PCR and saTag-PCR, but not Tag-PCR, predicted an integration in Clone-13 at chromosome 10, position 27,036,080, which was detected in 0.2% of reads. Despite the small number of supporting reads, this site was validated by Sanger sequencing, emphasizing the accuracy and lack of background achievable with either of these methods. Many integration sites predicted by Tag-PCR, but not esTag-PCR, were identical across multiple clones (Figure 4, green dots). True integration at the identical position in multiple samples is unlikely for a randomly integrating vector; however, if the technique produces experimental artifacts, such as amplification of fragments by mis-priming, then specific genomic sites might be more prone to amplification. 
In fact, we found that 23% of sites predicted by Tag-PCR were detected in multiple clonal cell lines at the identical position, which is highly implausible for random integration. These sites included a cluster of relatively high-frequency predicted integration events in mitochondrial DNA (MT; positions 880 and 2,922). In all three clones, these predicted integration sites were present at a higher frequency than many true integrations, which underscores that filtering integrations on the basis of frequency is not adequate to differentiate true integration from noise with the original Tag-PCR protocol. Collectively, these data demonstrate high accuracy of integration detection using esTag-PCR or saTag-PCR, with minimal background. Although the set of predicted integration events was nearly identical between esTag-PCR and saTag-PCR, the on-target efficiency of esTag-PCR was considerably higher. This might be expected because esTag-PCR contains two rounds of gene-specific nested PCR, in contrast to one round of gene-specific PCR in saTag-PCR. The higher efficiency should reduce the sequencing requirements for esTag-PCR and should increase the sensitivity of esTag-PCR over saTag-PCR for a given amount of raw sequence reads. Although it is possible that saTag-PCR efficiency could be improved with optimization, such as switching to two rounds of gene-specific amplification as with esTag-PCR, the fact that esTag-PCR can be performed using commercially available reagents represents an advantage.
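The cross-clone check described above can be expressed compactly. This is a sketch under assumptions: the names and toy coordinates are hypothetical, and the denominator used here (distinct sites across all samples) is one plausible definition that may differ from how the reported percentage was calculated.

```python
from collections import Counter


def recurrent_sites(sites_by_sample):
    """Sites predicted at the identical position in more than one sample."""
    seen = Counter()
    for sites in sites_by_sample.values():
        seen.update(set(sites))  # count each site once per sample
    return {site for site, n_samples in seen.items() if n_samples > 1}


def fraction_recurrent(sites_by_sample):
    """Recurrent sites as a fraction of all distinct predicted sites."""
    all_sites = set().union(*sites_by_sample.values())
    if not all_sites:
        return 0.0
    return len(recurrent_sites(sites_by_sample)) / len(all_sites)


# Toy predictions: one position shared by all three clones, three unique sites.
toy_predictions = {
    "clone_a": {("chrA", 10, "+"), ("chrB", 100, "+")},
    "clone_b": {("chrA", 10, "+"), ("chrC", 200, "-")},
    "clone_c": {("chrA", 10, "+"), ("chrD", 300, "+")},
}

print(recurrent_sites(toy_predictions))
print(fraction_recurrent(toy_predictions))
```

Because independent clones from a randomly integrating vector should rarely share an exact position, flagging such sites is a frequency-independent way to separate likely artifacts from true events.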

Single-end versus paired-end sequencing for esTag-PCR

For these studies, esTag-PCR was performed using single-end 150 bp Illumina sequencing, which was selected over paired-end sequencing because Read 1 contains the complete transgene/genome junction, and single-end sequencing offers reduced cost and data size. Because paired-end sequencing should generate more sequence coverage over the flanking genomic region, it could improve the ability of the DNA aligner to uniquely place some reads within the genome and might increase detection or accuracy. To test this, we sequenced Clone-6, Clone-13, and Clone-16 using paired-end sequencing and then performed analysis using either Read 1 alone or both reads. The results were virtually identical (data not shown), which suggests that at least in these samples, the genomic sequence provided by single-end 150 bp reads was adequate for placement within the genome. We cannot rule out the possibility that integrations in certain genomic loci, such as highly repetitive or duplicated regions, would benefit from the additional information provided by paired-end data. Furthermore, as paired-end data would provide information about the length of the original tagmented fragment, on the basis of the alignment position of the paired end, paired data can also provide information about the number of unique input molecules for each predicted integration site. This extra information could be especially useful for the evaluation of rare events or those with limited sequence read support.
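The idea of using paired-end data to estimate the number of unique input molecules can be sketched as below. The assumption is that each read pair has been reduced to its integration site plus the alignment position of the far end of the fragment (the tagmentation breakpoint); the names and values are illustrative only.

```python
from collections import defaultdict


def unique_molecules(pairs):
    """Count distinct tagmentation breakpoints supporting each integration site.

    pairs -- iterable of (site, mate_position); mate_position marks where Tn5
             cut the original fragment, so PCR duplicates share the same value.
    """
    breakpoints = defaultdict(set)
    for site, mate_position in pairs:
        breakpoints[site].add(mate_position)
    return {site: len(positions) for site, positions in breakpoints.items()}


# Made-up read pairs: siteA is supported by two distinct fragments
# (one of them sequenced twice), siteB by a single fragment.
example_pairs = [
    ("siteA", 1200),
    ("siteA", 1200),
    ("siteA", 1350),
    ("siteB", 900),
]

print(unique_molecules(example_pairs))
```

The count is best read as a lower bound, since two independent molecules can share a breakpoint by chance.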

Sensitivity of integration site detection

To evaluate the sensitivity of esTag-PCR, we performed a serial dilution experiment. DNA from Clone-6, which has four integration sites, was serially diluted using DNA from either non-transduced HepG2 cells or Clone-13, which has a large number of integrations. Each tagmentation reaction used a total of 100 ng input DNA, with the amount of Clone-6 DNA at each dilution step ranging from 100 to 0.2 ng. We performed esTag-PCR using each dilution step, then quantified the fraction of reads or alignments that matched each of the four Clone-6 integration sites (Figure 5). When the dilution was performed using HepG2 DNA, which lacks transgene, the fraction of total reads harboring the Clone-6 integration sites decreased with the decreased amount of Clone-6 DNA, although all four sites were detected at all dilution steps except the lowest (0.2 ng Clone-6 DNA into 100 ng total DNA). This mirrors Figure 3 and indicates that if there are not sufficient on-target molecules, only background sequences will be obtained. Nonetheless, even with very low input DNA from Clone-6, the fraction of aligned reads from each of the four integration sites was high and remained fairly stable across dilution steps. This indicates that a rare integration event in a background of wild-type cells, such as a cell population with low-efficiency transduction, might be detected. When Clone-6 DNA was diluted with Clone-13 DNA, a sample with many integration events that will compete with the Clone-6 integration events, detection of the Clone-6 events was reduced; however, we could detect reads from all four junctions at all dilution steps except the lowest. Unlike the dilution using HepG2 DNA, the fraction of alignments matching Clone-6 decreased substantially with each step, which is due to competition from the Clone-13 transgene-containing reads. 
This result is intuitive and suggests that if a given integration event is rare within a mixed population of transgene-containing cells, it will be more difficult to detect.
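The difference between the two measures used in this experiment (fraction of total reads versus fraction of alignments) can be made concrete with a short calculation. All read counts below are invented for illustration and are not taken from the study; only the 100 ng total input and the 0.2 ng lowest dilution come from the text.

```python
def input_fraction(clone_ng, total_ng=100.0):
    """Mass fraction of the clone of interest in the tagmentation input."""
    return clone_ng / total_ng


def site_fractions(site_reads, total_reads, aligned_reads):
    """Express reads for one site relative to all reads and to aligned junction reads."""
    return {
        "of_total_reads": site_reads / total_reads if total_reads else 0.0,
        "of_alignments": site_reads / aligned_reads if aligned_reads else 0.0,
    }


# Lowest dilution step described in the text: 0.2 ng in 100 ng total input.
print(input_fraction(0.2))

# Invented counts. Diluting into transgene-free DNA leaves few alignments,
# nearly all from the clone of interest, so its share of alignments stays high.
wild_type_background = site_fractions(site_reads=50, total_reads=100_000, aligned_reads=200)

# Diluting into a clone with many integrations adds competing alignments,
# so the same 50 reads are a much smaller share of alignments.
multi_copy_background = site_fractions(site_reads=50, total_reads=100_000, aligned_reads=20_000)

print(wild_type_background)
print(multi_copy_background)
```

In this toy case the share of total reads is identical in both backgrounds, while the share of alignments differs by two orders of magnitude.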

Figure 5

Sensitivity of esTag-PCR

To test the sensitivity of esTag-PCR, DNA from Clone-6, which has 4 verified integration events, was serially diluted using DNA from either untransduced parent HepG2 cells or DNA from Clone-13, which has many integration events. Each tagmentation reaction used 100 ng input DNA, with the amount of Clone-6 DNA at each dilution step ranging from 100 to 0.2 ng. (A) The graphs display the fraction of total reads matching each of the four integration sites from Clone-6 (y-axis), relative to the amount of Clone-6 DNA (x-axis). (B) Similar graph as (A), except that the y-axis displays the fraction of alignments matching each of the four integration sites from Clone-6.


Transgene expression relative to copy number

Because our panel of clonal HepG2 lines was generated by random integration, each clone contains a variable transgene copy number, with randomly spaced integration events. The transgene encodes both moxGFP and a secreted form of NanoLuc luciferase (Figure S1). This provides the opportunity to compare transgene expression level with copy number. Using a panel of eight clones, we measured luciferase expression relative to transgene copy number (Figure 6). Although there is a clear trend between copy number and luciferase levels, as might be expected, this pattern is absent for clones with a smaller transgene copy number (<5). This discrepancy is most likely due to unequal expression across integration events, based on the genomic context of each site. It is also possible that some integration events are partially silenced or otherwise not capable of expressing the reporter protein. There was no clear correlation between the locations of integration events relative to the nearest gene and expression level (data not shown). Irrespective of the cause, this demonstrates that reporter gene expression is not strictly correlated with copy number, at least with lower transgene copy numbers. Reporter gene expression by itself may also not be a reliable screening tool for integration, particularly if transgene copy number is a concern.

Figure 6

Transgene expression in clonal cell lines

A panel of eight clonal HepG2 cell lines was generated using piggyBac, each stably expressing a moxGFP/luciferase expression construct. (A) The graph displays relative luminescence for each clone. Values are background-subtracted relative to the parent HepG2 cells. (B) The graph shows the number of transgene integration events for each clone. RLU, relative luminescence unit.


Adaptation of esTag-PCR for lentiviral transduction

The esTag-PCR protocol can easily be adapted for any gene delivery system, provided that transgene-specific primers can be designed. To demonstrate applicability to lentiviral systems, we transduced HepG2, HEK293, and primary normal human dermal fibroblast (NHDF) cells with a lentiviral vector encoding the identical moxGFP-P2A-NanoLuc luciferase cassette as the prior piggyBac experiments (Figure S1). The sequences of the transgene-specific primers are listed in Table S1. Although lentiviral transduction efficiency was low in all cases, on the basis of direct visualization using fluorescent microscopy, moxGFP-expressing cells were detected. Bulk cells were collected, which represent a mosaic population with primarily non-transduced cells, and assessed using esTag-PCR. These data returned a high rate of on-target (genome-transgene-spanning) reads. Our analysis software was easily adapted to the structure of the lentiviral transgene, and we successfully mapped integration sites in all cell types (Figure 7).

Figure 7

Adaptation of esTag-PCR for lentiviral transduced cells

Primary normal human dermal fibroblasts (NHDF), HepG2 cells, and HEK293 cells were each transduced with a lentiviral vector. We performed esTag-PCR on each heterogeneous population using primers specific to the lentiviral LTRs. (A–C) Graphs denote the integration sites detected in each cell population, indicated on the graph, with the y-axis indicating the fraction of sequence reads detected per site.


Integration site mapping in transgenic macaque embryos

Because of the time, effort, and cost associated with generation and manipulation of transgenic embryos, accurate characterization of transgene integration sites, while preserving embryo viability, is critical. To demonstrate the suitability of esTag-PCR for screening transgenic embryos, we generated a panel of rhesus macaque blastocysts from zygotes that were injected with piggyBac transposase mRNA and a transgene/transposon plasmid (Figure 8A). Presumptive zygotes were injected with piggyBac mRNA (30 ng/μL) and transgene-encoding vector (30 ng/μL). TE biopsies were obtained from embryos that developed into blastocysts, with the biopsies being used for whole-genome amplification (WGA) and esTag-PCR. The results for two representative samples are shown in Figures 8B and 8C. Because esTag-PCR returns a high fraction of genome-transgene-spanning reads (in contrast to Tag-PCR or saTag-PCR), these data are presented as the fraction of reads per integration site relative to total sequence reads, as opposed to only junction-spanning reads. We further increased filtering stringency to require at least 10 reads from a given site. For E1118, two sites were predicted by esTag-PCR. Both of these were verified by PCR and Sanger sequencing of the vector-genome junctions (primers listed in Table S2). Importantly, these data demonstrate the ability of esTag-PCR to accurately detect integrations using small DNA input, while preserving the viability of the embryo itself.
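The per-site filtering and normalization described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code; the function name, the site labels, and the read counts in the example are hypothetical.

```python
from typing import Dict

MIN_READS_PER_SITE = 10  # threshold stated in the text


def summarize_sites(reads_per_site: Dict[str, int], total_reads: int,
                    min_reads: int = MIN_READS_PER_SITE) -> Dict[str, float]:
    """Keep sites with at least `min_reads` reads and report each site's
    reads as a fraction of all sequence reads (not only junction reads)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {site: n / total_reads
            for site, n in reads_per_site.items() if n >= min_reads}


if __name__ == "__main__":
    # Hypothetical counts for illustration only; these are not data from the study.
    example = {"chr1:1000:+": 450, "chr7:52000:-": 300, "chr12:900:+": 4}
    print(summarize_sites(example, total_reads=1000))
```

Normalizing to total reads rather than junction-spanning reads is only informative when the on-target rate is high, which is the rationale given in the text.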

Figure 8

esTag-PCR using rhesus macaque embryos

Rhesus macaque embryos were injected with a piggyBac-compatible plasmid and piggyBac mRNA. (A) Schematic of transgene delivery and integration site mapping in rhesus macaque embryos. (B and C) Representative plots are shown for each of two embryos, illustrating the genomic location of transgene integration events, detected using esTag-PCR. (D) esTag-PCR was performed on TE biopsies from embryos injected with transposon/transgene plasmid at either 30 or 100 ng/μL. The graph illustrates the number of integration events obtained for each plasmid concentration, demonstrating that the number of integration events is reduced with lower plasmid concentration. (E) A boxplot is shown comparing the number of integrations per cell for embryos that arrested prior to reaching the blastocyst stage (n = 19) with embryos that developed to the blastocyst stage (n = 24). (F) The pie chart displays a summary of the location of integration events relative to genes. Sites are categorized as within an exon, within an intron, intergenic, upstream (within 5,000 bp of the transcription start site), downstream (within 5,000 bp of the transcript end), or gene/other (within the gene body, but none of the other categories).

We next sought to test whether the concentration of transgene/transposon vector used for injection would affect integration efficiency, measured using the transgene copy number per embryo. Presumptive zygotes were injected with piggyBac mRNA (30 ng/μL) and transgene/transposon plasmid at either 30 ng/μL (n = 175) or 100 ng/μL (n = 95). Injection volume was estimated to be 50–100 pL on the basis of pipette dimensions and calculated flow rate. As such, embryos injected with more concentrated transgene/transposon plasmid should receive more copies of the plasmid. The transgene used in these experiments constitutively expresses the red fluorescent protein mCherry.
Embryos were cultured for 7–9 days post-injection and sorted on the basis of red fluorescence. Because non-integrated plasmid can express mCherry, red fluorescence does not necessarily indicate stable transgene integration; however, fluorescence provides a screening method to identify successful injection and the potential for transgene integration. Embryos injected with more concentrated transgene/transposon plasmid had a roughly 2-fold higher rate of mCherry positivity (11 of 175 embryos [6.3%] for 30 ng/μL and 13 of 95 embryos [13.7%] for 100 ng/μL). TE biopsy was performed on all mCherry-positive blastocysts, followed by esTag-PCR. A summary of the number of integrations per embryo is shown in Figure 8D. As might be predicted, injection with more concentrated transgene/transposon plasmid resulted in a higher fraction of embryos with stable transgene integration. It should be noted that the number of integrations per embryo was also higher when injected with more concentrated plasmid (mean 1.1 for 30 ng/μL versus 2.8 for 100 ng/μL). These data demonstrate that although increasing plasmid concentration will increase the number of embryos with stable transgene integration, if single integration or low copy number is required, then reduced plasmid concentration may be preferable. After injection, some embryos arrest before reaching the blastocyst stage. We performed esTag-PCR on 19 arrested embryos to compare transgene copy number relative to embryos that developed into blastocysts (Figure 8E). On average, arrested embryos had more integrations per cell (mean 9.6 integrations) than embryos that reached the blastocyst stage. If the purpose of gene delivery is to create the progenitor for a transgenic line, unintended consequences from transgene integration may be of great importance. A benefit of esTag-PCR data is that embryos can be screened on the basis of the location and putative impact of integrations.
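The mCherry-positive rates quoted above follow directly from the embryo counts given in the text (11 of 175 and 13 of 95). The short sketch below simply recomputes the percentages and their ratio; the function and variable names are illustrative.

```python
def percent_positive(positive: int, injected: int) -> float:
    """Percentage of injected zygotes that were mCherry-positive."""
    return 100.0 * positive / injected


low = percent_positive(11, 175)   # 30 ng/uL plasmid group
high = percent_positive(13, 95)   # 100 ng/uL plasmid group
fold_change = high / low          # relative increase at the higher concentration

print(f"30 ng/uL: {low:.1f}%  100 ng/uL: {high:.1f}%  fold: {fold_change:.2f}")
```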
From the panel of macaque embryos, we categorized 443 piggyBac-mediated integration events according to their predicted effects on transcription and protein coding (Figure 8F). Most integration events occurred in intergenic regions (43.0%), although the next highest category was integration into an intron (41.8%). The two categories that might be predicted to have the greatest effect on protein expression, integration into an exon or immediately upstream of a gene (near enhancers and promoters), represented 2.8% and 4.2% of sites, respectively. Although the location of integrations mediated by piggyBac or lentivirus cannot be controlled, these esTag-PCR data demonstrate that embryos can be categorized on the basis of the location and number of integration events, providing a means to screen and select embryos for implantation.
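To make the positional categories concrete, the following simplified sketch assigns a single integration coordinate to a category relative to one gene model, using the conventional strand-aware definitions (upstream is the 5,000 bp before the transcription start site; downstream is the 5,000 bp after the transcript end). The study used SnpEff for this step; the code below is not SnpEff and omits the gene/other category, and the `Gene` class and coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

FLANK_BP = 5000  # window used for the upstream/downstream categories


@dataclass
class Gene:
    start: int                    # lowest transcript coordinate, inclusive
    end: int                      # highest transcript coordinate, inclusive
    strand: str                   # "+" or "-"
    exons: List[Tuple[int, int]]  # inclusive (start, end) intervals


def categorize(pos: int, gene: Gene) -> str:
    """Classify one integration position relative to one gene."""
    if gene.start <= pos <= gene.end:
        in_exon = any(s <= pos <= e for s, e in gene.exons)
        return "exon" if in_exon else "intron"
    low_flank = gene.start - FLANK_BP <= pos < gene.start
    high_flank = gene.end < pos <= gene.end + FLANK_BP
    if low_flank:
        # On the minus strand the low-coordinate side lies past the transcript end.
        return "upstream" if gene.strand == "+" else "downstream"
    if high_flank:
        return "downstream" if gene.strand == "+" else "upstream"
    return "intergenic"


if __name__ == "__main__":
    # Hypothetical gene model for illustration only.
    gene = Gene(10000, 20000, "+", [(10000, 10500), (19000, 20000)])
    for p in (9000, 10200, 15000, 21000, 30000):
        print(p, categorize(p, gene))
```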

Discussion

Here we present a novel strategy for precise mapping of genomic integration sites. We demonstrate high efficiency and accuracy, with the ability to detect integration location, orientation, and copy number. By using two rounds of gene-specific PCR, esTag-PCR provides a high rate of on-target reads, thereby reducing sequence requirements, reducing cost per experiment, and providing easier discrimination between true integration events and false positives. We validated this method using piggyBac transposase and lentiviral delivery systems. We further demonstrate that esTag-PCR provides an effective means to characterize and screen putative transgenic embryos. Following injection of piggyBac-based editing machinery, embryos are allowed to develop to the blastocyst stage, followed by a TE biopsy and vitrification of the embryo. We show that DNA amplified from TE cells can be used for esTag-PCR, providing the complete set and location of transgene integrations. Although piggyBac and lentiviral integration are quasi-random and can insert into genes, the data obtained by esTag-PCR can be used to exclude embryos with potentially disruptive transgene integrations, and prioritize which embryos are used for implantation. One of the key advantages of esTag-PCR is the relative simplicity and speed of the protocol. Starting with purified DNA, the sequence-ready library can be created in 4–5 h. Because of the high on-target efficiency we demonstrate from esTag-PCR, fewer total sequence reads are required per sample. As such, it is practical to multiplex and sequence esTag-PCR libraries on small-format instruments, such as the Illumina MiSeq or iSeq, further improving data turnaround. Because freezing embryos prior to implantation is a common technique, it is therefore practical to bank and screen panels of putatively edited embryos. The results of esTag-PCR can be used to select optimal embryos for implantation, on the basis of the quantity and location of integration events. 
Furthermore, because results from esTag-PCR could theoretically be obtained in less than 24 h, it may even be possible to screen samples while maintaining the embryo in culture, thereby avoiding vitrification and allowing transfer of fresh embryos, if desired. Although we demonstrate that esTag-PCR can detect transgene integration mediated by either piggyBac or lentivirus, esTag-PCR can easily be adapted to other delivery systems, or completely different applications. For example, mapping of transposable elements or the integration of viruses (such as HIV/SIV) could be accomplished with esTag-PCR. Furthermore, the analysis software created for this study is powerful and highly adaptable. This software could also be used to analyze data generated by methods besides esTag-PCR. Many tagmentation-based protocols have been published in recent years that, although not developed specifically for transgene detection, could be adapted for this purpose. For example, tagmentation-based tag integration site sequencing (TTISS) is a method to identify Cas-mediated cleavage sites. Similar to GUIDE-seq, TTISS uses donor DNA to tag double-stranded DNA breaks. After tagging, tagmentation is performed using single-adapter Tn5, followed by two rounds of nested PCR targeting the donor sequence. Another potentially useful modification is the use of a unique molecular index (UMI) in the Tn5 adapters, as demonstrated by UDiTaS. In this system, each Tn5 adapter encodes a UMI. This sequence is incorporated once during the tagmentation step, thereby tagging each fragment. All molecules amplified from a given template molecule will share this UMI. The UMI can be used to collapse the resulting sequence reads on the basis of template molecule, thereby providing more accurate quantitation of each integration site. Finally, versions of this protocol have been published that employ biotin-tagged oligos and a streptavidin pull-down to increase sensitivity.
Although a strength of esTag-PCR is the relative simplicity and ability to use 100% commercially available reagents, modifications such as these could enhance detection and might be advantageous in some applications. Altogether, this study validates a novel method to accurately map integration events into the genome, with broad utility for in vitro and in vivo genome editing. The system is sensitive and versatile, with potential utility for detecting other genomic alterations, such as mapping transposable elements or endogenous retroviruses. We further show direct applicability to the generation of genome-edited non-human primate embryos, for which it enables rapid screening of embryos prior to implantation while preserving viability, providing a critical tool for the creation of novel transgenic animal models.
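The UMI-based collapsing mentioned in the Discussion can be illustrated with a minimal sketch. It assumes each read has already been assigned to an integration site and that its UMI has been extracted; the function name, site labels, and UMI sequences are hypothetical, and UMIs are compared by exact match, whereas practical pipelines usually tolerate sequencing errors in the UMI.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Read = Tuple[str, str]  # (integration_site_id, umi_sequence)


def count_unique_templates(reads: Iterable[Read]) -> Dict[str, int]:
    """Count distinct UMIs per integration site, collapsing PCR duplicates."""
    umis_by_site: Dict[str, Set[str]] = defaultdict(set)
    for site, umi in reads:
        umis_by_site[site].add(umi)
    return {site: len(umis) for site, umis in umis_by_site.items()}


if __name__ == "__main__":
    # Hypothetical reads: siteA has 3 reads from 2 templates, siteB has 1 read.
    reads = [("siteA", "ACGTAC"), ("siteA", "ACGTAC"),
             ("siteA", "TTGCAA"), ("siteB", "GGATCC")]
    print(count_unique_templates(reads))
```

Counting templates rather than reads removes the amplification bias introduced by 40 cycles of PCR, which is why UMIs could give more accurate per-site quantitation.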

Materials and methods

Cell lines and cell culture

Experiments used HepG2 cells, an immortalized human hepatocyte cell line (American Type Culture Collection [ATCC] HB-8065), and HEK293 cells, an immortalized human kidney cell line (ATCC CRL-1573). Cells were cultured in DMEM-F12 supplemented with 10% fetal bovine serum (FBS) and 1% penicillin/streptomycin.

piggyBac-mediated stable transfection of HepG2 cells and generation of clonal cell lines

HepG2 cells were transfected with a piggyBac-compatible vector containing an expression cassette under the control of the CAG promoter (GenBank: OK413188). This cassette allows the expression of moxGFP-NLS and NanoLuc luciferase linked by a P2A site, which is followed by an internal ribosome entry site (IRES) region and a puromycin resistance gene (see plasmid map in Figure S1A). Cells were co-transfected with the transgene/transposon plasmid and a plasmid encoding codon-optimized hyperactive piggyBac transposase at a ratio of 3:1 (transposon/transposase). Transfection was performed using Lipofectamine 3000 (L3000008; Thermo Fisher Scientific), according to the manufacturer’s instructions. After 3 days, cells were trypsinized and plated at low density. After an additional 3 days, cells were placed on puromycin selection. After 10–14 days, moxGFP+ single-cell-derived clusters were identified by fluorescence microscopy, and individual clusters were transferred to wells of a 96-well plate by detaching and aspirating them carefully with a 1 mL filtered tip. After 2 days of attachment, cell colonies were broken up into single cells using trypsin. These clonal populations were further expanded and analyzed.

Lentivirus production and lentiviral-mediated transduction

The identical moxGFP-P2A-NanoLuc cassette was cloned into a lentiviral vector (see plasmid map in Figure S1B and GenBank: OK413189). Lentiviral particles were generated by co-transfecting the plasmid with packaging plasmids according to published protocols. Aliquots of lentivirus-containing media were thawed and added directly to HepG2 cells in a 6-well plate. Expression of moxGFP was monitored using fluorescence microscopy. Normal human dermal fibroblast cells were obtained from ATCC (PCS-201-012).

Genomic DNA extraction, WGA, and TE biopsies

For experiments with immortalized cells, genomic DNA (gDNA) was extracted using the GeneJET Genomic DNA Purification Kit (K0722; Thermo Fisher Scientific) according to the manufacturer’s protocol. For experiments with rhesus macaque embryos, TE biopsies were performed (described below), followed by WGA using REPLI-g (150345; Qiagen) according to the manufacturer’s instructions. The WGA product was purified using Ampure XP beads (Beckman Coulter) at a bead-to-sample ratio of 0.9.

Tagmentation-assisted PCR and single-adapter tagmentation-assisted PCR

Tagmentation-assisted PCR (Tag-PCR) was performed according to the published protocol. For single-adapter Tag-PCR (saTag-PCR), a modified Tag-PCR protocol using custom Tn5, loaded with two copies of the Nextera Read 1 adapter, was used. Custom Tn5 was generated as previously described, loaded with Nextera Read 1 (TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG), and tagmentation was performed as described. Briefly, tagmentation was performed with 100 ng gDNA in a tagmentation reaction that included 5.5 μL NIB-HEPES, 2.5 μL 4X TTD buffer, 2 μL DNA, and 1.5 μL Tn5. After preheating the thermocycler to 55°C, the reaction mixture was incubated for 10 min at 55°C, followed by a 10°C hold. After incubation, 1.15 μL 10% SDS (approximately 1% final concentration) was added. The fragmented DNA was purified using Ampure XP beads with a bead-to-sample ratio of 0.8 and eluted into 30 μL H2O. Following tagmentation, the first-round PCR was performed using 25 μL KAPA HiFi HotStart ReadyMix (KK2602; Kapa Biosystems), 1.25 μL of each primer at 10 μM (see Table S1), and 21.25 μL tagmentation product. The following cycling conditions were used: 95°C for 3 min, followed by 30 cycles of 98°C for 30 s, 63°C for 30 s, and 72°C for 1 min, followed by 72°C for 10 min and a 4°C hold. Index PCR was performed using commercial Illumina Nextera XT indexes (product number FC-131-2001). Each index PCR contained 12.5 μL KAPA HiFi HotStart ReadyMix, 1.25 μL of each index primer at 10 μM, 2 μL of the first-round PCR product, and 8 μL H2O. Cycling conditions for the index PCR were as follows: 98°C for 45 s, followed by 10 cycles of 98°C for 20 s, 54°C for 30 s, and 72°C for 20 s, ending with 72°C for 1 min and a 4°C hold. The PCR product was cleaned using Ampure XP beads at a bead-to-sample ratio of 0.8. Purified samples were quantified using a Qubit Fluorometer (Invitrogen).

Enhanced-specificity Tag-PCR

For esTag-PCR, the initial tagmentation step was performed with 100 ng gDNA or purified WGA product in a tagmentation reaction that included 25 μL TD buffer, 2.5 μL Illumina TDE1 enzyme, 100 ng DNA, and H2O to final volume of 50 μL. After preheating the thermocycler to 58°C, the reaction mixture was incubated for 5 min at 58°C, followed by a 10°C hold. The fragmented DNA was purified using Ampure XP beads with a bead-to-sample ratio of 0.8 and eluted into 30 μL H2O. Following tagmentation, the first-round PCR was then performed using 25 μL KAPA HiFi HotStart ReadyMix, 1.25 μL of each primer at 10 μM (see Table S1), and 21.25 μL tagmentation product. The following PCR conditions were used: 95°C for 3 min, followed by 30 cycles of 98°C for 30 s, 63°C for 30 s, and 72°C for 1 min, followed by 72°C for 10 min and a 4°C hold. A secondary nested PCR was then performed using 12.5 μL KAPA HiFi HotStart ReadyMix, 1.25 μL index primer at 10 μM, 0.75 μL of each transgene-specific primer at 10 μM (see Table S1), 2 μL of the first-round PCR, and 7.5 μL H2O. PCR conditions for the nested PCR were 98°C for 45 s, followed by 10 cycles of 98°C for 20 s, 54°C for 30 s, and 72°C for 20 s, ending with 72°C for 1 min and a 4°C hold. The second PCR adds a unique index to each sample. The PCR was cleaned using Ampure XP beads at a bead-to-sample ratio of 0.8. Purified samples were quantified using a Qubit Fluorometer. Of note, although esTag-PCR was performed in this study using a single Illumina index, the forward primer could be easily adapted to include an index between the P5 and Read 1 sequence. Using the primer PB-3TR-Inner-P5-TruSeq-R1 (Table S1) as an example, an i5 index could be inserted into the TruSeq universal adapter sequence as indicated: AATGATACGGCGACCACCGAGATCTACAC-[i5]-TCTTTCCCTACACGACGCTCTTCCGATCT-ATTTCAAGAATGCATGCGTCA. 
Although this would require unique primers to be used for each sample in the second PCR, it would significantly increase the number of index combinations and the level of multiplexing that could be performed.
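The indexed primer layout described above can be assembled programmatically. The sketch below concatenates the three segments given in the text around a user-supplied i5 index; the function name is illustrative, and the index used in the example is an arbitrary 8-nt placeholder rather than a recommended or validated index.

```python
# Segments of PB-3TR-Inner-P5-TruSeq-R1 as written in the text.
P5 = "AATGATACGGCGACCACCGAGATCTACAC"
TRUSEQ_READ1 = "TCTTTCCCTACACGACGCTCTTCCGATCT"
TRANSGENE_SPECIFIC = "ATTTCAAGAATGCATGCGTCA"


def build_indexed_primer(i5_index: str) -> str:
    """Return P5 + i5 index + TruSeq Read 1 + transgene-specific sequence."""
    index = i5_index.upper()
    if not index or set(index) - set("ACGT"):
        raise ValueError("i5 index must be a non-empty string of A/C/G/T")
    return P5 + index + TRUSEQ_READ1 + TRANSGENE_SPECIFIC


if __name__ == "__main__":
    # "TAGATCGC" is a placeholder index chosen for illustration only.
    print(build_indexed_primer("TAGATCGC"))
```

In practice each sample would receive a different index, and index sets should be chosen to be compatible with the sequencing instrument and with each other.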

Next-generation sequencing and data analysis

The sequence libraries from Tag-PCR and saTag-PCR were sequenced on an Illumina MiSeq instrument using paired-end 150 bp reads, while esTag-PCR libraries were sequenced using single-end 150 bp reads. The resulting FASTQ data were quality trimmed using Trimmomatic. Next, reads were filtered to retain only reads or read pairs where at least one read contains the terminal nucleotide sequence (20–21 nt, listed below) from either end of the transgene, allowing up to 2 mismatches (Hamming distance). This was performed using the tool PrintReadsContaining, which we created for this project and distribute as part of the DISCVR-seq software package (https://github.com/bimberlab/discvrseq). For piggyBac experiments, the search sequences AGACTATCTTTCTAGGGTTAA, TTAACCCTAGAAAGATAGTCT, GATTATCTTTCTAGGGTTAA, and TTAACCCTAGAAAGATAATC were used. For lentiviral experiments, the search sequences TGGAAGGGCTAATTCACTCC, AGTGTGGAAAATCTCTAGCA, GGAGTGAATTAGCCCTTCCA, and TGCTAGAGATTTTCCACACT were used. The passing reads were aligned to the reference genome using BWA-mem. Human data were aligned to the GRCh38.p13 genome build (release 98, assembly ID GCA_000001405.28), and rhesus macaque data were aligned to the MMul_10 genome build (release 98, assembly ID GCA_003339765.3). The location and orientation of integration events were then mapped using the tool IntegrationSiteMapper, which we created and made available as part of the DISCVR-seq software package (https://github.com/bimberlab/discvrseq). This tool inspects each alignment, creating a summary table of all unique integration events. Alignments are first grouped by read, and then filtered to remove those with low mapping quality (MAPQ < 20). Next, alignments are scanned for the sequence of the transgene terminal region(s) (see Table S1) to identify the precise location and orientation of the genome-transgene junctions. Optionally, the tool can export an annotated GenBank-format file containing the reconstructed sequences of the transgene-genome regions.
Optionally, it can also design amplification primers for each junction using Primer3Plus. SnpEff was used to categorize the location of integration events, using National Center for Biotechnology Information (NCBI) gene annotations (build 103). All raw sequence data have been submitted to the NCBI Sequence Read Archive (SRA) database under BioProject PRJNA750488. A mapping of sample to SRA ID is available in Table S3. Documentation for IntegrationSiteMapper can be found at https://bimberlab.github.io/DISCVRSeq/toolDoc/com_github_discvrseq_walkers_tagpcr_IntegrationSiteMapper.html.
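The read pre-filter described in this section (retain reads containing a transgene terminal sequence within a Hamming distance of 2) can be sketched as below. This is an independent illustration of the idea, not the PrintReadsContaining implementation; the function names are hypothetical, and the two piggyBac terminal sequences are taken from the list above, with their reverse complements derived in code.

```python
from typing import Iterable

# Two of the piggyBac search sequences listed in the text; the other two
# listed sequences are their reverse complements.
PIGGYBAC_TERMINI = ("AGACTATCTTTCTAGGGTTAA", "GATTATCTTTCTAGGGTTAA")
MAX_MISMATCHES = 2

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def contains_with_mismatches(read: str, query: str,
                             max_mismatches: int = MAX_MISMATCHES) -> bool:
    """True if any window of the read matches the query within the limit."""
    k = len(query)
    return any(hamming(read[i:i + k], query) <= max_mismatches
               for i in range(len(read) - k + 1))


def read_passes(read: str, termini: Iterable[str] = PIGGYBAC_TERMINI) -> bool:
    """Check a read against each terminal sequence and its reverse complement."""
    read = read.upper()
    queries = [q for t in termini for q in (t, reverse_complement(t))]
    return any(contains_with_mismatches(read, q) for q in queries)
```

A sliding-window scan like this is quadratic in the worst case but is adequate for 150 bp reads and a handful of short queries; reads that pass would then go on to alignment.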

PCR validation of transgene integration

A subset of integration events was selected for validation by PCR and Sanger sequencing. The list of sites, primers, and annealing temperatures is available in Table S2. For each integration event, PCR was performed in a 25 μL reaction that included 12.5 μL 2X KAPA HiFi HotStart ReadyMix, 10.7 μL ultrapure water, 0.75 μL each of the 10 μM forward and reverse primers, and 0.3 μL of 100 ng/μL template gDNA. The cycling protocol was as follows: 95°C for 3 min, then 35 cycles of 98°C for 20 s, amplicon-specific annealing temperature for 15 s, 72°C for 30 s, followed by 72°C for 1 min, and a 4°C hold. Annealing temperatures were gradient-optimized and generally ranged between 68°C and 72°C (see Table S2). PCR products were imaged on a 2% TBE agarose gel. Single bands were PCR-purified using GeneJET PCR Purification Kit (K0702; Thermo Fisher Scientific) and Sanger-sequenced. If there were multiple bands, the band of the correct size was excised and gel-purified using PureLink Quick Gel Extraction Kit (K210012; Thermo Fisher Scientific) and Sanger-sequenced. The resulting sequence traces were analyzed using either SnapGene or Geneious software and mapped to the appropriate species-specific genome using NCBI BLAST+.
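When many junctions are validated in parallel, the reaction recipe above can be checked and scaled with a few lines of code. The sketch below confirms that the listed components sum to 25 μL and scales only the components shared across reactions, since primers and template differ per site. The 10% overage default is a common bench convention assumed here, not a value from the protocol, and the function names are illustrative.

```python
# Per-reaction volumes in microliters, as listed in the text.
VALIDATION_PCR_UL = {
    "2X KAPA HiFi HotStart ReadyMix": 12.5,
    "ultrapure water": 10.7,
    "forward primer (10 uM)": 0.75,
    "reverse primer (10 uM)": 0.75,
    "template gDNA (100 ng/uL)": 0.3,
}
SHARED_COMPONENTS = ("2X KAPA HiFi HotStart ReadyMix", "ultrapure water")


def reaction_volume_ul() -> float:
    """Total volume of one reaction."""
    return round(sum(VALIDATION_PCR_UL.values()), 2)


def template_mass_ng() -> float:
    """Template mass per reaction: 0.3 uL at 100 ng/uL."""
    return round(VALIDATION_PCR_UL["template gDNA (100 ng/uL)"] * 100.0, 2)


def master_mix_ul(n_reactions: int, overage: float = 0.10) -> dict:
    """Volumes of shared components for n reactions.

    `overage` is an assumed pipetting allowance, not part of the protocol.
    """
    scale = n_reactions * (1.0 + overage)
    return {name: round(VALIDATION_PCR_UL[name] * scale, 2)
            for name in SHARED_COMPONENTS}


if __name__ == "__main__":
    print(reaction_volume_ul(), template_mass_ng(), master_mix_ul(8))
```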

Quantification of moxGFP and luciferase expression

Quantification of mean moxGFP fluorescence was performed using flow cytometry. Luciferase expression was monitored by a luminescence-based assay. For this assay, 10 μL cell culture supernatant was plated on a white 96-well plate and combined with 100 μL PBS-T (0.1% Tween 20) containing a 1:1,000 dilution of 1 mM Coelenterazine N stock solution (PJK Biotech) dissolved in acidified methanol. Luminescence was detected directly afterward using a VICTOR X Light Multilabel plate reader (PerkinElmer).

Animals

Rhesus macaques were socially housed at the Oregon National Primate Research Center (ONPRC) in animal biosafety level 2 rooms with autonomously controlled temperature, humidity, and lighting. Rhesus macaques were fed commercially prepared primate chow twice daily and received supplemental fresh fruit or vegetables daily. Fresh, potable water was provided via automatic water systems. Animal care and all experimental protocols and procedures were approved by the ONPRC Institutional Animal Care and Use Committee (IACUC). The ONPRC is a Category I facility. The Laboratory Animal Care and Use Program at the ONPRC is fully accredited by the American Association for Accreditation of Laboratory Animal Care and has an approved assurance (A3304-01) for the care and use of animals on file with the National Institutes of Health Office for Protection from Research Risks. ONPRC adheres to national guidelines established in the Animal Welfare Act (7 U.S. Code §§ 2131–2159) and the Guide for the Care and Use of Laboratory Animals (8th ed.), as mandated by U.S. Public Health Service policy.

Oocyte collection and in vitro fertilization

The ONPRC Assisted Reproductive Technologies (ART) Core provided gametes and performed in vitro fertilization (IVF) according to published protocols. Sexually mature rhesus monkeys with normal ovarian cyclicity were treated with a standard 10 day controlled ovarian stimulation cycle regimen as previously described to produce multiple pre-ovulatory follicles containing mature ova. Prior to ovulatory events, ultrasound-guided percutaneous follicle aspiration was performed, and recovered oocytes were isolated into warmed TALP-HEPES containing 0.3% BSA with 5 IU/mL heparin. Cumulus-enclosed metaphase II (MII) ova were washed into pre-equilibrated 100 μL drops of BO-IVF (IVF Bioscience, Falmouth, Cornwall, UK) under oil with 5 ova/drop and incubated for approximately 4 h at 37°C in humidified 5% CO2 in air until insemination. Rhesus macaque semen was collected from a male on the same day as oocyte collection, and the sperm were washed in warmed TALP-HEPES to a final concentration of 20 million sperm/mL. A standard in vitro fertilization protocol was followed, with sperm activated by 1 mM caffeine + 1 mM dibutyryl-cAMP for 15 min before adding 10 μL to each culture drop containing ova.

Injection of plasmid and piggyBac mRNA into rhesus macaque zygotes

Approximately 14 h post-insemination, presumptive zygotes were washed to remove sperm attached to the zona pellucida and transferred into a warmed TALP-HEPES drop covered with oil. Injection material containing 30 or 100 ng/μL piggyBac-compatible transgene plasmid, along with 30 ng/μL of piggyBac mRNA (Hera BioLabs), was back-loaded into a glass-pulled micro-injection pipette. Zygotes were stabilized by gentle suction onto a glass holding pipette (Cooper Surgical), and material was injected under continuous positive flow into the cytoplasm of zygotes, using a Narishige micro-manipulator (Narishige International, Amityville, NY). Injected embryos were then transferred into embryo culture media (BO-IVC; IVF Bioscience), covered with oil, and cultured at 37°C in a humidified atmosphere of 6% CO2, 5% O2, and 89% N2.
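The amount of plasmid delivered per zygote follows from the plasmid concentration and the estimated 50–100 pL injection volume reported in the Results. The sketch below computes the delivered mass and, optionally, an approximate copy number. The copy-number step relies on two assumptions introduced here: an average of about 650 g/mol per base pair of double-stranded DNA, and a hypothetical plasmid length that must be supplied by the user, since the plasmid size is not stated in this section.

```python
AVOGADRO = 6.022e23
G_PER_MOL_PER_BP = 650.0  # ASSUMPTION: approximate mass of one dsDNA base pair


def injected_mass_pg(conc_ng_per_ul: float, volume_pl: float) -> float:
    """Mass delivered in picograms; 1 ng/uL equals 0.001 pg/pL."""
    return conc_ng_per_ul * 1e-3 * volume_pl


def plasmid_copies(mass_pg: float, plasmid_bp: int) -> float:
    """Approximate plasmid molecules in a given mass of double-stranded DNA."""
    grams = mass_pg * 1e-12
    return grams / (plasmid_bp * G_PER_MOL_PER_BP) * AVOGADRO


if __name__ == "__main__":
    HYPOTHETICAL_PLASMID_BP = 8000  # placeholder length, not from the paper
    for conc in (30.0, 100.0):      # ng/uL, as used in the study
        for vol in (50.0, 100.0):   # pL, estimated injection volume range
            mass = injected_mass_pg(conc, vol)
            copies = plasmid_copies(mass, HYPOTHETICAL_PLASMID_BP)
            print(f"{conc:>5} ng/uL x {vol:>5} pL = {mass:.1f} pg (~{copies:.1e} copies)")
```

Whatever the true plasmid length, the copy number scales linearly with concentration, so the 100 ng/μL group receives roughly 3.3 times as many plasmid molecules per unit volume as the 30 ng/μL group.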

TE cell biopsy from piggyBac-injected blastocysts

Once embryos developed into expanded blastocysts, between days 7 and 9 post-insemination, a TE biopsy was performed as previously described. Briefly, individual blastocysts were placed in a 20 μL drop of warmed TALP-HEPES under oil and stabilized with a holding pipette. An objective-mounted laser (ZYRCOS, Hamilton Thorne, Beverly, MA) was used to create an opening in the zona pellucida large enough to pass the glass biopsy pipette (Cooper Surgical), and 10–15 TE cells were aspirated following laser dissection. Biopsied TE cells were transferred into a PCR tube for WGA and identification of plasmid integration. Embryos were subjectively assessed before and during biopsy and graded as poor, medium, or good quality. The biopsied blastocyst was then vitrified in a 0.25 mL straw (IMV Technologies, Maple Grove, MN) using the DMSO Blastocyst Vitrification Kit (LifeGlobal Group, Guilford, CT) according to the manufacturer’s guidelines. Embryos were subjectively assessed again after thawing, mainly on the basis of re-expansion of the blastocoel. Embryos that arrested during mitosis and failed to develop to the blastocyst stage underwent complete zona pellucida removal by brief exposure to acidic Tyrode’s solution (prepared in house according to Ramsey and Hanna) and were stored for subsequent analysis of transgene integration into the genome.
