Gene therapies using integrating retrovirus vectors to modify hematopoietic stem and progenitor cells have shown great promise for the treatment of immune system and hematologic diseases. However, activation of proto-oncogenes via insertional mutagenesis has resulted in the development of leukemia. We have utilized cellular bar coding to investigate the impact of different vector designs on the clonal behavior of hematopoietic stem and progenitor cells (HSPCs) during in vivo expansion, as a quantitative surrogate assay for genotoxicity in a non-human primate model with high relevance for human biology. We transplanted two rhesus macaques with autologous CD34+ HSPCs transduced with three lentiviral vectors containing different promoters and/or enhancers of a predicted range of genotoxicities, each containing a high-diversity barcode library that uniquely tags each individual transduced HSPC. Analysis of clonal output from thousands of individual HSPCs transduced with these barcoded vectors revealed sustained clonal diversity, with no progressive dominance of clones containing any of the three vectors for up to almost 3 years post-transplantation. Our data support a low genotoxic risk for lentivirus vectors in HSPCs, even those containing strong promoters and/or enhancers. Additionally, this flexible system can be used for the testing of future vector designs.
The first clinical trials demonstrating clear clinical improvement following transplantation of genetically modified hematopoietic stem and progenitor cells were reported 15 years ago, utilizing γ-retrovirus vectors based on the Moloney murine leukemia virus (MLV) to treat congenital immunodeficiency disorders.1, 2 However, soon thereafter, patients developed both myeloid and lymphoid acute leukemias due to insertional activation of nearby proto-oncogenes by vector enhancer sequences.3, 4, 5, 6, 7 Lentivirus vectors (LVs) with deletion of viral enhancers were already in development, and they have subsequently been used in promising hematopoietic stem and progenitor cell (HSPC) gene therapy clinical trials; however, they have also been linked to at least one instance of clonal expansion.8, 9, 10
A number of murine models and in vitro assays have been explored for the prediction of HSPC genotoxicity, but each has potential limitations. Tracking of insertion sites following transplantation of transduced normal murine HSPCs requires large numbers of mice to track relatively low numbers of clones, and very long-term follow-up and/or secondary transplants are required to uncover clonal dominance and transformation.11, 12, 13 Murine tumor-prone models have more rapid and penetrant tumor onset; however, the genes linked to tumors in these models do not closely match the responsible loci in human clinical trials, potentially decreasing preclinical utility.14, 15
In vitro immortalization of murine HSPCs has been developed as a more practical and rapid screening approach; however, this assay is difficult to quantitate; captures myeloid, but not lymphoid, transformation; and identifies only insertions activating Evi1 pathways.16, 17, 18
Both in vivo and in vitro murine assays may lack predictive value for humans due to differences in HSPC properties between species, including lifetime replicative demand, HSPC frequency, ease of immortalization, telomere length, and spectrum of common hematologic tumors. Non-human primates (NHPs) are phylogenetically closely related to humans, and they have been predictive of human clinical results regarding gene transfer efficacy and insertion site patterns.19, 20 While NHP models are expensive and technically demanding, they can be used to study the behavior of thousands of vector insertion sites in a small cohort of animals over time. However, development of tumors in NHPs is rare and requires long-term follow-up;21, 22 thus, an approach to monitor genotoxic clonal expansion prior to actual malignant transformation would be useful.
Insertion site identification and retrieval methodologies have evolved over time from the detection of restriction fragment length polymorphisms to PCR-based approaches for the retrieval of genomic sequences flanking proviral insertions. Linear amplification-mediated PCR (LAM-PCR) and modifications avoiding bias associated with restriction enzyme fragment generation have been the preferred methods for retrieving proviral integrations in both preclinical and clinical studies, and they have proven invaluable for identifying insertions associated with tumors, as well as providing an overview of insertion patterns with various vector classes.4, 23, 24 However, these insertion retrieval approaches require large amounts of DNA and are semiquantitative, and are thus unable to reproducibly quantitate several-fold changes in individual clonal contributions.
An alternative for clonal tracking is the inclusion of high-diversity barcodes within the vector backbone. In both murine and NHP studies, HSPC clonal dynamics have been mapped in a quantitative and efficient manner using this approach.
We now utilize barcoding to investigate the impact of specific promoter and enhancer elements within HIV-derived LVs on clonal dynamics. We focused on LVs because they are the current standard for efficient transduction of HSPCs in the clinical setting, having replaced MLV due to higher efficiency and lack of overt genotoxicity to date. We assayed vectors containing one of three promoter/enhancers with a range of predicted genotoxicity. Our goal was to develop an approach for screening genotoxicity in an animal model that is predictive of results in human HSPC gene transfer applications.
Results
Development of an In Vivo Platform to Study Relative Vector Toxicity
We generated three high-diversity barcode libraries, each containing one of three 6-bp library IDs followed by a random 35-bp barcode and flanked by sequences complementary to Illumina sequencing primers, and we inserted these cassettes into the U3 region of three different lentiviral vector plasmids (Figure 1).26, 27 Each vector also contains an internal GFP marker for the evaluation of transduction efficiencies and tracking. Following integration, the provirus contains the barcode in both long terminal repeats (LTRs) (Figure 1). The vectors were chosen to reflect a predicted range of genotoxicities. The elongation factor-1 alpha (EF1-α) vector contains a relatively weak internal EF1-α promoter and no additional promoter/enhancer within the LTR, and it would be predicted to be the least genotoxic. The murine stem cell virus (MSCV) vector contains the medium-strength MSCV promoter/enhancer cloned into the U3 region of the LTR. The spleen focus-forming virus (SFFV)-barcoded lentiviral vector contains the very strong SFFV promoter/enhancer within the U3 region of the LTR, predicted to be the most genotoxic LV design based on in vitro immortalization and tumor-prone mouse assays.15, 28
Figure 1
Design of Barcoded Vectors
Three high-diversity barcode libraries were cloned upstream of the U3 region of the long terminal repeat (LTR) in three lentiviral vectors. The structure of the proviral integrated form is shown schematically for each vector: (A) Internal human elongation factor 1-α (EF1-α) promoter driving GFP expression, (B) murine stem cell virus (MSCV) promoter/enhancer in the LTR driving GFP expression, and (C) spleen focus-forming virus (SFFV) viral promoter/enhancer driving GFP expression.
We confirmed that all three vectors contained the expected library ID and retained the ability to transduce and integrate in K562 cells, despite insertion of the barcode cassettes into the LTR (Figure 2). Promoter strength in the three vectors was assessed in transduced K562 populations by measuring GFP expression via flow cytometry. As expected, cells transduced with the SFFV vector had the brightest GFP expression, followed by the MSCV vector and then the EF1-α vector with stepwise dimmer GFP expression (Figure 2A). The larger the separation between GFP+ and GFP− populations, the higher the separation index (SI) value and the stronger the promoter/enhancer. As expected, SFFV-transduced K562 cells had the highest SI followed by MSCV and then EF1-α with the lowest SI value (Figure 2B).
Figure 2
Analysis of Comparative Promoter Strength of Each Lentiviral Vector
(A) Flow cytometric analysis of K562 cells transduced with the different vectors at the same MOI demonstrates varying intensities of GFP expression within the GFP+ cell population, with SFFV > MSCV > EF1-α. (B) Separation Index (SI) was evaluated to compare the distance between positive and negative GFP populations for each vector. The higher the SI value, the larger the separation between the peaks and the stronger the expression from each vector’s promoter/enhancer.
Barcode library diversity for each vector construct was assessed via the transduction of K562 cells, barcode retrieval via PCR, and Illumina sequencing. Monte Carlo simulations were performed using barcode data from K562 cells transduced with each vector preparation to ensure that each of the three barcoded libraries was diverse enough to result in a 95% chance that each barcode was present in a single engrafting HSPC (Figure S1). These simulations indicated that the EF1-α, MSCV, and SFFV lentiviral libraries could transduce 1,547, 1,986, and 1,943 long-term engrafting HSPCs, respectively, with >95% confidence (i.e., p < 0.05) that >95% of individual barcodes would mark a single cell. Based on frequencies of long-term engrafting macaque HSPCs within CD34+ populations estimated via integration site retrieval, flow cytometric phenotyping for primitive markers, or immunodeficient mouse engraftment,23, 29, 30 the diversities of the three libraries were sufficient to transduce the rhesus macaque CD34+ HSPC grafts utilized in our experimental design.
Peripheral blood CD34+ HSPCs collected following G-CSF/plerixafor mobilization from two rhesus macaques were split into three equal fractions, and each fraction was transduced independently with one of the three lentiviral vectors (Figure 3A).
A transduction efficiency of no greater than 30% was targeted in order to favor transduction of individual HSPCs with no more than a single vector copy, containing a unique barcode.27, 31 The three transduced cell fractions were then combined and reinfused simultaneously into the autologous rhesus macaque recipient, following delivery of 1,000 rads ablative total body irradiation to the animal (Figure 3A). Information regarding CD34+ cell collection, transduction, and transplantation for both animals is summarized in Table 1. Following infusion of the transduced CD34+ cells, the animals recovered blood counts promptly. GFP expression in peripheral blood granulocytes, monocytes, B cells, and T cells was assessed over time by flow cytometry, and both animals showed relatively low but stable GFP percentages following engraftment (Figure 3B), well within the target range calculated by Poisson distribution analysis predicting the fraction of HSPCs containing a single barcode (Figure S2).
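The reasoning behind the 30% ceiling can be illustrated with a short Poisson calculation. The sketch below is not the authors' analysis from Figure S2; it assumes that vector copies per cell follow a Poisson distribution, and the function name `poisson_copy_fractions` is hypothetical.

```python
import math


def poisson_copy_fractions(transduced_fraction):
    """Return (mean copies per cell, share of transduced cells with exactly one copy).

    Assumes vector copy number per cell is Poisson distributed (an assumption
    made for this sketch, not a detail taken from the paper).
    """
    if not 0.0 < transduced_fraction < 1.0:
        raise ValueError("transduced_fraction must be strictly between 0 and 1")
    # P(at least one copy) = 1 - exp(-lam)  =>  lam = -ln(1 - transduced fraction)
    lam = -math.log(1.0 - transduced_fraction)
    # P(exactly one copy) = lam * exp(-lam)
    p_single = lam * math.exp(-lam)
    return lam, p_single / transduced_fraction


if __name__ == "__main__":
    for pct in (10, 20, 30):
        lam, single_share = poisson_copy_fractions(pct / 100)
        print(f"{pct}% transduced: mean copies/cell = {lam:.3f}, "
              f"single-copy share of transduced cells = {single_share:.1%}")
```

Under this assumption, roughly 83% of transduced cells carry a single copy at 30% transduction, and the share rises as the transduction efficiency falls.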
Figure 3
Experimental Design and GFP Marking Post-transplantation
(A) Experimental design. Rhesus macaque peripheral blood stem and progenitor cells were mobilized into the blood with G-CSF and plerixafor and collected via apheresis. CD34+ HSPCs were enriched via immunoabsorption. CD34+ cells were split into 3 equal aliquots and each fraction was transduced with one of three LVs as shown. Following transduction, the cells were collected, combined, and reinfused into the autologous macaque after the macaque completed total-body irradiation (TBI) of 1,000 rads. (B) Percentage of GFP+ cells in peripheral blood granulocytes, B cells, and T cells tracked over time to the longest follow-up post-transplantation in animals ZJ41 (38 months) and ZJ48 (33 months).
Table 1
Macaque Information
| Animal ID | ZJ41 | ZJ48 |
| --- | --- | --- |
| Sex | male | female |
| Date of birth | 6/29/2011 | 7/12/2011 |
| Weight (kg) | 5.82 | 4.86 |
| Mobilization cytokines | G-CSF, AMD3100 | G-CSF, AMD3100 |
| Date of transplant | 2/11/2015 | 7/15/2015 |
| Transduction efficiency | NA | SFFV (20.3%), MSCV (6.82%), EF1-α (7.95%) |
| CD34+ cells transduced | 3.60E+07 | 2.82E+07 |
| CD34+ cells infused | 6.00E+07 | 2.84E+07 |
| Age on day of transplant | 43 months 13 days | 48 months 3 days |
| Infused cells/kg | 1.03E+07 | 5.84E+06 |
| SFFV virus titer | 8.10E+07 | 2.90E+08 |
| MSCV virus titer | 1.90E+08 | 1.30E+08 |
| EF1-α virus titer | 2.50E+08 | 2.60E+08 |
Post-transplantation Analysis of Vector Clonal Contributions over Time
To assess clonal patterns in vivo at time points from 1 month through 33 and 38 months post-transplantation, blood was obtained, and granulocytes, T cells, and B cells were lineage purified by flow cytometry, followed by DNA extraction, low-cycle PCR with primers bracketing the library IDs and barcodes, high-throughput Illumina sequencing, and data processing to retrieve and quantitate the read fraction for each individual barcode linked to one of the three vector-associated library IDs. To be included on a master clone list for tracking and analysis, a barcode had to be retrieved at a read count of at least 2,000 at one or more time points, as detailed in the Materials and Methods, in order to exclude potential false barcodes and to account for sampling constraints.
The overall level of contribution for each of the three vectors in a sample was assessed via normalization of the read number of each of the three library IDs to the total overall reads containing a valid library ID in that sample. We observed that the relative contributions of each of the three vectors to ongoing hematopoiesis were not uniform in either animal. In ZJ41, the SFFV vector accounted for over 90% of the reads in almost every sample in all three hematopoietic lineages, with far lower contributions from cells containing the MSCV or EF1-α vectors. In contrast, in animal ZJ48, the MSCV and EF1-α contributions were much higher and quite similar, with much lower contributions from the SFFV vector (Figure 4A). We have no clear explanation for the observed differences in transduction efficiency of engrafting cells between the different vectors within each animal; however, given that each vector performed well in at least one animal, the variability is unlikely to reflect vector design and instead probably resulted from some unknown characteristic of the actual viral preparation used for transplantation or from variability in handling of the cell fractions during transduction with each vector.
To rule out any technical variability or bias regarding barcode retrieval from each of the three vectors, we analyzed a range of mixtures of single-copy barcoded K562 cell clones transduced with each of the three vectors (Figure S3). Barcode retrieval quantitation was closely correlated with the expected barcode frequency, with Pearson correlation coefficients of 0.9961, 0.999, and 0.9994 for EF1-α, MSCV, and SFFV, respectively, as our lab has previously reported for other vector designs.
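The two bookkeeping steps described above, building the master clone list and normalizing library-ID reads to obtain per-vector contributions, can be sketched as follows. This is an illustration only, not the authors' pipeline: the data layout (a dictionary mapping each sample name to `{(library_id, barcode): reads}`) and the function names are assumptions made for the example, and only the 2,000-read threshold comes from the text.

```python
from collections import defaultdict

READ_THRESHOLD = 2000  # from the text: at least 2,000 reads at one or more time points


def build_master_list(samples, threshold=READ_THRESHOLD):
    """Return the set of (library_id, barcode) keys that reach the threshold in any sample."""
    master = set()
    for counts in samples.values():
        for key, reads in counts.items():
            if reads >= threshold:
                master.add(key)
    return master


def vector_contributions(counts):
    """Percent of reads with a valid library ID that belong to each vector in one sample."""
    per_vector = defaultdict(int)
    for (library_id, _barcode), reads in counts.items():
        per_vector[library_id] += reads
    total = sum(per_vector.values())
    if total == 0:
        return {}
    return {vector: 100.0 * reads / total for vector, reads in per_vector.items()}


if __name__ == "__main__":
    # Toy data; the barcodes and read counts are invented for illustration.
    samples = {
        "granulocytes_m1": {("SFFV", "AAA"): 3000, ("MSCV", "CCC"): 500, ("EF1a", "GGG"): 1500},
        "granulocytes_m3": {("SFFV", "AAA"): 1000, ("MSCV", "CCC"): 2500},
    }
    print(sorted(build_master_list(samples)))
    print(vector_contributions(samples["granulocytes_m1"]))
```

Keeping the library ID as part of each barcode key means that clones from the three vectors are tracked independently even when their overall marking levels differ.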
Figure 4
Vector Contributions and Clonal Diversity of EF1-α, MSCV, and SFFV Barcoded Vectors
(A) The percentage contribution of each vector (EF1-α, MSCV, and SFFV) to total transduced hematopoiesis in granulocytes, B cells, and T cells purified from the blood of animals ZJ41 and ZJ48 over time post-transplantation. The read number of each vector’s library ID in a sample over the total read numbers for all library IDs in the sample ×100 is plotted. (B) Number of unique barcodes for each vector (EF1-α, MSCV, and SFFV) retrieved from all lineages (granulocytes, B cells, and T cells) at a single time point for each vector in ZJ41 and ZJ48. A threshold of 2,000 reads was applied to establish a master list of barcodes for each vector within each animal: to be included on the master list, a barcode must have contributed at least 2,000 reads at one or more time points. Once established on the master list, a barcode was counted even at time points where it contributed fewer than 2,000 reads. The same barcode found contributing to more than one lineage was counted only once. (C) Cumulative number of unique barcodes retrieved for each vector (EF1-α, MSCV, and SFFV) from all three lineages combined (granulocytes, B cells, and T cells) over time in ZJ41 and ZJ48. (D) Simpson’s diversity index for all lineages combined (granulocytes, B cells, and T cells) in ZJ41 and ZJ48.
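The counting rules stated in the caption for panels B and C can be sketched in a few lines. The example assumes a hypothetical input format, a list of `(month, lineage, {barcode: reads})` tuples for one vector in one animal plus a precomputed master list, and is not taken from the authors' code.

```python
from collections import defaultdict


def clone_counts_over_time(samples, master):
    """Return [(month, unique barcodes at that month, cumulative unique barcodes)].

    A barcode on the master list is counted at any time point where it has at
    least one read, and it is counted once per time point even if it appears
    in several lineages.
    """
    by_month = defaultdict(set)
    for month, _lineage, counts in samples:
        by_month[month] |= {
            barcode for barcode, reads in counts.items()
            if reads > 0 and barcode in master
        }
    seen = set()
    rows = []
    for month in sorted(by_month):
        seen |= by_month[month]
        rows.append((month, len(by_month[month]), len(seen)))
    return rows


if __name__ == "__main__":
    # Toy data; barcodes and read counts are invented for illustration.
    master = {"A", "B", "C"}
    samples = [
        (1, "granulocytes", {"A": 5000, "B": 10, "X": 9000}),
        (1, "T cells", {"A": 100}),
        (3, "granulocytes", {"B": 2500, "C": 0}),
        (6, "granulocytes", {"C": 3000}),
    ]
    for row in clone_counts_over_time(samples, master):
        print(row)
```

A cumulative count that stops rising, as in the toy output once all master-list barcodes have been seen, corresponds to the plateau described for panel C.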
Clonal Diversity and Clonal Dynamics over Time
Quantitative tracking of clonal contributions from each unique barcode allows tracking of output from large numbers of individual HSPCs, and it allows for the assessment of clonal dominance linked to LV design. Both animals had individual barcode contributions assessed for each vector at each time point through 33–38 months. The number of unique barcodes obtained from granulocytes, B cells, and/or T cells at each time point for the given vectors was counted (Figure 4B). At month 1, a total of over 2,000 unique barcoded contributing HSPC clones, derived from all three vectors, was detected in both ZJ41 and ZJ48. By 2–3 months, the number of barcoded contributing clones detected decreased, as we have previously reported, as contributions from waves of abundant short-term repopulating HSPCs declined and were replaced by contributions from less abundant intermediate and long-term repopulating HSPCs.27, 32 Thereafter, clone numbers contributing at later time points remained relatively stable, with some late gradual decline in both animals, likely due to a loss of intermediate-contributing clones. The cumulative number of barcodes retrieved plateaued for all three vectors in both animals by 6–12 months post-transplantation, indicating that clone retrieval is efficient and that few dormant HSPCs were present that began contributing only at later time points (Figure 4C). As expected, application of different thresholds for the inclusion of clones in the enumeration of clone numbers altered the numbers of clones retrieved, but it did not change the overall patterns obtained over time, the critical issue for genotoxicity analysis (Figure S4).
Although each animal had differences in the absolute number of clones tracked among the three vectors, similar total numbers of clones were tracked and assessed for clonal expansion: 7,314 for SFFV, 5,488 for MSCV, and 4,450 for EF1-α when using a threshold of 2,000 reads at one or more time points to be included in the master clone list.
While obtaining similar marking levels for all three vectors within an animal would have been ideal, it is not crucial since the barcode permits tracking of contributions from each individual HSPC clone independently.
As an overall population measure of in vivo clonal diversity, both the Gini-Simpson and Shannon diversity indexes were calculated. Both indexes measure diversity by taking into account the richness (the number of barcodes present) and evenness of the measured barcodes. However, the Gini-Simpson diversity index provides the option to calculate diversity solely on the evenness of the barcodes measured, independent of sample richness. Because vector marking was not equally balanced among the three vectors in either animal, the Gini-Simpson diversity index provides a more applicable representation in the context of our data. As shown in Figure 4D, the Gini-Simpson diversity index for the vectors with reasonable HSPC engraftment in ZJ41 (SFFV) and ZJ48 (MSCV and EF1-α) remained high for all three lineages. For comparison, the more commonly used Shannon diversity index was also calculated and gave results similar to those of the Gini-Simpson index (Figure S5): all three libraries with reasonable HSPC engraftment maintained a high diversity index over time, with no evidence for progression to oligoclonality with any of the three vectors.
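For readers unfamiliar with these two indexes, the sketch below computes them from barcode read counts using their standard textbook definitions: Shannon H = −Σ p·ln p and Gini-Simpson = 1 − Σ p², where p is each barcode's fraction of the reads. The exact variant or normalization used for Figures 4D and S5 is not specified in this section, so this is an assumption rather than the authors' implementation, and the function names are hypothetical.

```python
import math


def _proportions(read_counts):
    """Convert read counts to fractional contributions, ignoring zero counts."""
    total = sum(read_counts)
    if total <= 0:
        raise ValueError("read_counts must contain at least one positive value")
    return [count / total for count in read_counts if count > 0]


def shannon_index(read_counts):
    """Shannon diversity: H = -sum(p * ln p)."""
    return -sum(p * math.log(p) for p in _proportions(read_counts))


def gini_simpson_index(read_counts):
    """Gini-Simpson diversity: 1 - sum(p^2)."""
    return 1.0 - sum(p * p for p in _proportions(read_counts))


if __name__ == "__main__":
    # Toy read counts, invented for illustration.
    polyclonal = [100] * 50            # 50 clones contributing equally
    oligoclonal = [4900] + [2] * 49    # one clone dominating
    for name, counts in (("polyclonal", polyclonal), ("oligoclonal", oligoclonal)):
        print(name, round(shannon_index(counts), 3), round(gini_simpson_index(counts), 3))
```

Both indexes fall when a single barcode comes to dominate a sample, which is the signature of oligoclonality that the analysis above was designed to detect.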
Clonal Stability of Vectors by Their Top-Contributing Clones
To identify clonal expansion, we evaluated the top-contributing clones for each of the vectors in each lineage, focusing on the largest clones at the time point of longest follow-up as representing potentially expanding clones and tracking each of these clones over all time points. Due to their short half-life and lack of peripheral expansion, granulocytes best represent ongoing hematopoiesis from HSPCs in the context of this study; however, we also performed these analyses on B cells and T cells. Figure 5 shows the contributions mapped back over time from the top 20 clones detected in each lineage for each vector at the latest time point of 33 and 38 months, respectively, in the two animals. As shown in Figure S4, these plots are virtually identical when applying higher or lower thresholds for the inclusion of clones, indicating that the setting of thresholds only impacts clones contributing very low levels at all time points.
Figure 5
Clonal Stability of Top-Contributing Clones
Stacked area plots delineating the percentage contribution of the top 20 contributing clones in each lineage for each vector at the time of longest follow-up (38 months for ZJ41 and 33 months for ZJ48), tracked over all time points. Each of the top 20 clones is shown as a separate shaded region in the stack. Contributions from non-top 20 clones are shown as the remaining single solid color.
Long-term engrafting clones, as expected from our prior study of the dynamics of HSPC clonal reconstitution, first appeared and began replacing contributions from shorter-term progenitors between 2 and 12 months. The contributions from these long-term clones increased sharply at the time of appearance as expected, and then they began to plateau. There was some relative increase in contributions for all these large long-term clones over time as short- and intermediate-term progenitors eventually disappeared (see Figure 4B), a process that appeared slower in ZJ41. Stabilization of lymphoid production from long-term HSPCs, particularly of T cells, was delayed compared to granulocytes, as previously reported,27, 33 likely due to thymic damage from irradiation. As seen in the stacked area plots for all three vectors, the relative contributions from these clones to granulocytes, T cells, and B cells were then overall relatively stable for the duration of follow-up. No individual clone(s) showed a pattern of disproportionate outgrowth expected for genotoxic expansions, in contrast to the overall slow increases in relative contributions from long-term engrafting cells as compared to the intermediate- or short-term HSPCs observed. No single clone contributed more than 4.84%.
To assess comparative lineage contributions for these larger clones, we prepared heatmaps showing the contributions of each individual clone to each lineage over time for SFFV in ZJ41 and MSCV and EF1-α in ZJ48.
These heatmaps show that the 20 largest clones in each lineage were primarily balanced multipotent clones (Figure 6), and they contributed at relatively similar levels to all lineages, in contrast to clones emerging and becoming malignant in the MLV clinical trials. In conclusion, we observed no patterns of clonal expansion of HSPC clones transduced with any of the three vectors consistent with genotoxicity.
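The top-clone analysis described in this section, selecting the largest clones at the last time point and mapping their contributions back over all earlier time points, can be sketched as below. The input format (a chronologically ordered list of `(month, {barcode: reads})` pairs for one vector and lineage) and the function name are assumptions made for the example; it is not the authors' code.

```python
def top_clone_trajectories(timepoints, n_top=20):
    """Return {barcode: [percent contribution at each time point]} for the
    n_top barcodes with the most reads at the final time point.

    timepoints: list of (month, {barcode: reads}) in chronological order.
    """
    if not timepoints:
        return {}
    _last_month, final_counts = timepoints[-1]
    top = sorted(final_counts, key=final_counts.get, reverse=True)[:n_top]
    trajectories = {barcode: [] for barcode in top}
    for _month, counts in timepoints:
        total = sum(counts.values())
        for barcode in top:
            percent = 100.0 * counts.get(barcode, 0) / total if total else 0.0
            trajectories[barcode].append(percent)
    return trajectories


if __name__ == "__main__":
    # Toy data; barcodes and read counts are invented for illustration.
    timepoints = [
        (1, {"A": 50, "B": 50}),
        (12, {"A": 20, "B": 40, "C": 40}),
        (33, {"A": 10, "B": 30, "C": 60}),
    ]
    for barcode, series in top_clone_trajectories(timepoints, n_top=2).items():
        print(barcode, [round(p, 1) for p in series])
```

A clone undergoing genotoxic expansion would show a steadily rising series in such output, whereas the stable clones described above would plateau after their initial appearance.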
Figure 6
Heatmaps Showing Lineage Contributions of Top-Contributing Clones
Each row in the heatmap corresponds to a barcode and each column to a sample. The barcodes are ordered by unsupervised hierarchical clustering of the Euclidean distance between barcodes’ log fractional abundances in the samples. The color gradient depicts the log fractional contribution of individual barcodes to each sample. The contributions from the 20 most abundant barcodes identified in samples from every lineage and time point are plotted over all lineages and time points. The top 20 most abundant barcodes in an individual sample are designated with asterisks, and each column thus contains 20 asterisks. Each row (barcode) contains a minimum of one asterisk, because a barcode must have been a top 20 contributor to at least one sample to be included in the heatmap. (A) Top 20 EF1-α clones at each time point in each lineage mapped for animal ZJ48. (B) Top 20 MSCV clones at each time point in each lineage mapped for animal ZJ48. (C) Top 20 SFFV clones at each time point in each lineage mapped for animal ZJ41. Short-term lineage-restricted progenitors contributing at months 1–3 can be seen being replaced by long-term multilineage progenitors.
Discussion
We developed a highly quantitative and sensitive technology to track the output of thousands of individual transduced HSPCs in the rhesus macaque model, and we applied this approach to assess potential genotoxicity of clinically relevant LVs with a range of predicted genotoxicity, based on murine and in vitro models. Inclusion of an internal EF1-α promoter was predicted to have the least genotoxic effect, followed by an MSCV promoter/enhancer located within the lentiviral LTR, and finally a very strong SFFV promoter/enhancer within the LTR.34, 35 In the animals included in this study, we did not achieve balanced initial levels of engrafted HSPCs transduced with each of the three vectors in the individual animals, due to unknown factors resulting in variability during ex vivo transduction or transplantation; however, a total of at least 4,450 clones was tracked for each of the three vectors between the two animals. Clonal diversity and stability were observed in the absence of any ongoing clonal expansions for 33 and 38 months of follow-up to date in the two animals.
The utility of this type of preclinical in vivo NHP model for insertion-related clonal expansions depends on whether such expansions predict and generally precede later overt clinical transformation to myelodysplastic syndrome (MDS) or leukemia. In trials for both chronic granulomatous disease (CGD) and Wiskott-Aldrich syndrome (WAS) utilizing γ-retrovirus vectors with very strong viral promoter/enhancers, myeloid transformations were preceded by a steady increase in contributions from clones harboring insertions activating MECOM or other proto-oncogenes, preceding overt MDS or acute myeloid leukemia (AML) by a year or more.7, 36, 37 Clinical transformation was accompanied by the acquisition of second hits such as gross chromosomal losses or translocations. Acute T cell leukemias occurring in γ-retroviral clinical trials have been less clearly preceded by long periods of detectable clonal expansions.
The initial report from the French X-linked severe combined immunodeficiency (X-SCID) trial detected an expanding clone 2 years before leukemia in one patient; however, the British X-SCID and German WAS trials reported much shorter windows of detection prior to overt T cell acute lymphoblastic leukemia (T-ALL).4, 5, 36 These and other studies using vector insertion site (VIS) retrieval analyses for detection of dominant clones sampled patients infrequently, demonstrated variable sensitivity levels, and were hampered by the need for complex statistical methodologies to estimate clonal contribution over time; thus, clonal expansion prior to overt leukemia may have been missed.25, 38 Our goal was to apply highly sensitive and quantitative barcoding based on the hypothesis that clonal expansion could be detected earlier and more reproducibly using this methodology.
Prior studies of NHPs transplanted with HSPCs transduced with integrating retroviral vectors and followed long term suggested that these models closely mimic genotoxicity observed in human clinical trials.
Analyses of VIS from human trials and NHP experiments have demonstrated close similarities in integration patterns between species for both retrovirus vectors (RVs) and LVs.39, 40, 41, 42 Both humans and rhesus macaques transplanted with MLV-transduced HSPCs have developed clonal expansions linked to insertions in or near the MECOM (MDS1/EVI1) gene complex, primarily in myeloid cells.20, 37 NHPs transplanted with MLV-transduced CD34+ cells have developed clonal dominance proceeding to overt MDS and/or leukemia due to insertional mutagenesis.21, 22, 43 Although overt leukemia with RV transduction of HSPCs in large animal models has been an infrequent event, most studies involved follow-up shorter than is relevant for clinical trials and relatively low transduction efficiency (median of 5% gene marking), and they did not clearly provide data demonstrating whether overt malignancy was preceded in time by a long period of detectable clonal expansion. Our group previously demonstrated effective LV gene transfer to repopulating NHP HSPCs and long-term polyclonal hematopoiesis, without evidence of marked clonal expansion, even utilizing a vector with an extremely strong internal viral promoter/enhancer.
Other assays used to assess genotoxicity in vitro or in murine models have various strengths and limitations, as summarized in the Introduction.
However, these approaches suggested that the SFFV promoter/enhancer within a γ-retroviral backbone is extremely genotoxic compared to other RV constructs, correctly predicting the extremely high rate of leukemogenesis in the WAS trial utilizing this vector design as compared to other RV clinical gene therapy trials using weaker viral promoters and/or enhancers.17, 36 The initial study comparing RV versus LV with an internal strong promoter in a murine tumor-prone model demonstrated no leukemia induction over background for the LV, but a follow-up study did find detectable tumor acceleration when the strong SFFV viral promoter was included in the LV LTR, albeit 10-fold less than with the same SFFV design in an RV, informing our choice of this design as the potentially most genotoxic LV in the current NHP study.15, 46
In vitro immortalization assays comparing different strengths and locations of promoter/enhancers also demonstrated genotoxicity of the SFFV cassette within an LV LTR or internally, compared to almost undetectable genotoxicity with internal weaker promoters such as phosphoglycerate kinase (PGK). A recent paper from Aranyossy and coworkers applied a competitive barcoding approach in a murine HSPC transplantation model, very similar to the approach in our rhesus macaque model, to study patterns of reconstitution and clonal dynamics with various RV and LV constructs. Unfortunately, in this model, intermouse variation was very high, and long-term engraftment was generated from a very small number of clones in most animals, precluding genotoxicity comparisons.
Clinical trials utilizing LV transduction of HSPCs in a total of over 30 patients to treat WAS,38, 39, 48 metachromatic leukodystrophy (MLD), adrenoleukodystrophy (ALD),8, 50 and sickle cell anemia (SCA) have not demonstrated any consistent or persistent clonal expansions following retrieval of many hundreds of thousands of individual VIS; however, tracking has been limited to relatively few time points and has utilized semiquantitative VIS retrieval. These trials used vectors with internal endogenous promoters or promoter/enhancers (WAS and SCA), weak internal constitutive promoters such as EF1-α or PGK (MLD), or a strong internal retroviral promoter (ALD). However, a single thalassemia patient receiving HSPCs transduced with an LV driving a hemoglobin gene from a strong erythroid-specific promoter/enhancer has shown prolonged but eventually transient clonal expansion of cells, linked to an LV insertion resulting in aberrant splicing and overexpression of the gene HMGA2.
LV insertions have been shown to result in aberrant splicing and chimeric transcripts.9, 15, 52
In sum, these clinical trials, and our similar findings in the NHP model using a more quantitative tracking methodology, are encouraging in terms of the relative lack of genotoxicity of LVs, even those containing strong promoter/enhancer elements such as an LTR-inserted SFFV cassette. However, the risks associated with any integrating vector, including LV, are finite, and genotoxic risks must be considered in the context of disease severity and alternative therapies. All these trials also reported very encouraging clinical results regarding disease amelioration, suggesting that LV HSPC gene therapy approaches will be applied to larger numbers of patients and disease indications in the near future, making safety predictions even more important. While our model did not detect differential genotoxic risk between the LVs utilized in this study, we believe the approach could be useful in the future to test novel vector platforms or pretransplant cell expansion approaches. Sensitivity might be enhanced or clonal expansions accelerated by the application of proliferative stress post-transplant, for instance, with busulfan, which we have previously shown can induce clonal dominance.
Materials and Methods
Construction and Production of Barcoded LVs
The generation and utilization of high-diversity 35-bp barcode libraries preceded by a 6-bp library ID and flanked by PCR primer-binding sites has been previously described. Barcode library cassettes containing primer-binding sites for PCR amplification flanking the library ID and barcodes were subcloned into the U3 region 40 bp from the 5′ end of the LTR in each of three LV plasmids as follows: EF1-α-library ID TCAAGT, vector plasmid pCDH-EF1-MCS-T2A-copGFP (System Biosciences, Palo Alto, CA, CD526A-1); MSCV-library ID GATCTG, vector plasmid pCDH-MCS-T2A-copGFP-MSCV (System Biosciences, CD523A-1); and SFFV-library ID GTAGCC, vector plasmid LV.SF.LTR (obtained from Eugenio Montini, Ospedale San Raffaele Srl).
Lentiviral particles were produced as described using the χHIV system designed to overcome a block to HIV transduction of rhesus macaque target cells. Retention of sufficient barcode diversity within each barcoded vector library was confirmed via Monte Carlo simulations using custom Python code to calculate the number of engrafting cells that could be transduced with 95% certainty that each unique barcode is only present in a single transduced cell 95% of the time, as described and using code available at https://github.com/dunbarlabNIH.27, 32 Vector libraries were only utilized for transplantation experiments if sufficiently diverse.
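The diversity check described above can be illustrated with a minimal Monte Carlo sketch. This is not the published code from the repository cited above; the function names, the default of 100 simulations, and the reading of the two 95% criteria (at least 95% of the barcodes drawn are found in a single cell, in at least 95% of simulations) are assumptions made for illustration. The per-barcode weights stand in for barcode frequencies measured by sequencing the vector library.

```python
import random
from collections import Counter
from itertools import accumulate


def fraction_unique(cum_weights, n_cells, rng):
    """Assign one barcode to each of n_cells cells (n_cells >= 1) and return
    the fraction of the barcodes drawn that ended up in exactly one cell."""
    draws = rng.choices(range(len(cum_weights)), cum_weights=cum_weights, k=n_cells)
    per_barcode = Counter(draws)
    singletons = sum(1 for n in per_barcode.values() if n == 1)
    return singletons / len(per_barcode)


def passes(weights, n_cells, n_sims=100, min_unique=0.95, min_pass=0.95, seed=0):
    """True if at least min_pass of the simulations have at least min_unique
    of their barcodes present in a single cell (assumed criterion)."""
    rng = random.Random(seed)
    cum_weights = list(accumulate(weights))  # computed once, reused per simulation
    n_ok = sum(
        fraction_unique(cum_weights, n_cells, rng) >= min_unique
        for _ in range(n_sims)
    )
    return n_ok / n_sims >= min_pass


def max_cells(weights, candidates, **kwargs):
    """Largest candidate cell number that passes, scanning in increasing order
    and stopping at the first failure; returns 0 if none passes."""
    best = 0
    for n in sorted(candidates):
        if not passes(weights, n, **kwargs):
            break
        best = n
    return best
```

In use, `weights` would be the read counts of each barcode in the sequenced library, and `candidates` a grid of plausible numbers of engrafting cells; a skewed library lowers the cell number that can be tagged uniquely.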
K562 Validation of Quantitative Simultaneous Barcode Retrieval from 3 Lentiviral Libraries
K562 cells were transduced separately with each of the 3 different vectors at an MOI of 0.5 in the presence of 4 μg/mL protamine sulfate (Sigma, St. Louis, MO) in RPMI 1640 medium (Life Technologies, Grand Island, NY), supplemented with 0.5 mg/mL penicillin-streptomycin-glutamine (Life Technologies, Grand Island, NY) and 10% fetal bovine serum (Sigma, St. Louis, MO). Cells were cultured at 37°C in the presence of 5% CO2 followed by single-cell sorting on GFP-expressing cells to obtain single-cell clones. DNA was extracted from the different K562 clones and subjected to Southern blotting as previously described. Clones containing single vector copies based on the presence of a single band on Southern blot were identified. Single-copy clones were mixed at different ratios of cell numbers and then diluted into untransduced K562 cells to create mixtures at a range of vector ratios and overall marking levels, corresponding to the overall levels of marking and ratios between barcode libraries (vectors) obtained in vivo in ZJ48 and ZJ41.
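As a worked illustration of how such mixtures relate to the quantities measured later, the sketch below computes the expected overall marking level and the expected fraction of barcode reads per vector for a given mixture. It assumes that every clone carries exactly one vector copy and that all barcodes amplify with equal efficiency; the function name and the cell numbers in the usage note are hypothetical.

```python
def expected_mixture(clone_cells, untransduced_cells):
    """Return (overall marking level, expected barcode fraction per vector).

    clone_cells maps a vector name to the number of single-copy clone cells
    added to the mixture; untransduced_cells is the number of unmarked cells.
    """
    marked = sum(clone_cells.values())
    total = marked + untransduced_cells
    marking = marked / total
    # With one copy per cell, barcode fractions follow cell-number ratios.
    fractions = {name: n / marked for name, n in clone_cells.items()}
    return marking, fractions
```

For example, mixing 100, 300, and 600 cells of three clones with 9,000 untransduced cells gives 10% overall marking, with the three barcodes expected at 10%, 30%, and 60% of valid barcode reads.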
Transduction and Autologous Transplantation of Rhesus Macaque CD34+ HSPCs
The NHLBI Animal Care and Use Committee approved all animal studies. Rhesus macaque HSPCs were mobilized from the bone marrow by treatment with 4 days of granulocyte colony-stimulating factor (G-CSF) 10 mg/kg/day (Amgen, Thousand Oaks, CA) and AMD3100 (plerixafor) 1 mg/kg (Sigma, St. Louis, MO) on the morning of the fifth day, 3–4 hr prior to apheresis. Collection of peripheral blood mononuclear cells (PBMNCs) via apheresis and enrichment of CD34+ HSPCs via immunoselection were performed as previously described. CD34+ cells were split into three equal fractions, and, following overnight culture on RetroNectin-coated plates (Takara, T100B, Mountain View, CA) in X-VIVO 10 media (Lonza, Rockland, ME) supplemented with 100 ng/mL recombinant human flt3 ligand, stem cell factor, and thrombopoietin (Miltenyi Biotec, Auburn, CA), each fraction was transduced with one of the three vectors at an MOI of 25 in the presence of 4 μg/mL protamine sulfate (Sigma, St. Louis, MO). 24 hr later, all three fractions of transduced cells were collected from the plates, pooled, and reinfused into the autologous rhesus macaque. Each animal received total-body irradiation at a dose of 500 cGy/day for 2 days prior to the day of cell reinfusion. Aliquots of cells were removed prior to pooling and reinfusion, and they were cultured for an additional 48 hr prior to flow cytometric analysis of GFP expression. Cell aliquots from the end of transduction were also removed for DNA extraction and barcode recovery.
Cell Processing and Flow Cytometric Sorting
Peripheral blood was separated into mononuclear cells and granulocytes via centrifugation over Ficoll (MP Biomedicals, Solon, OH); incubated with ammonium-chloride-potassium (ACK) lysis buffer (Quality Biological, Gaithersburg, MD) to remove red blood cells; and labeled with fluorophore-conjugated monoclonal antibodies against CD3, CD20, CD14, CD56, CD16, and CD33 (antibody clone and source information summarized in Table 2), allowing sorting of individual granulocyte, monocyte, T cell, and B cell lineages on a FACSAriaII flow cytometer (BD Biosciences, San Jose, CA). DNA was extracted from sorted cell pellets using the DNeasy Blood and Tissue Kit (QIAGEN, Valencia, CA).
500 ng DNA samples from each cell type were amplified via 28 cycles of PCR using primers flanking the library ID and barcode. Primers added a 6-bp sample ID to facilitate multiplex sequencing. Samples were run on a 2.5% agarose gel, and the resulting 157-bp amplified fragment was gel purified (MinElute Gel Extraction Kit, QIAGEN, Valencia, CA). Equal amounts of product were pooled and prepared for high-throughput sequencing, allowing pooling of 12–20 multiplexed samples per lane in the sequencing flow cell of the Illumina HiSeq 2000 sequencer.
Barcode Retrieval and Data Analysis
After de-multiplexing, 4 million reads from the sequencing output of each sample (selected randomly to avoid lane position bias) were analyzed using custom Python code as described. In brief, reads were extracted via the presence of one of the three library IDs; reads with abundance lower than 100 reads were discarded (a step shown previously not to change results, excluding reads almost certainly resulting from PCR and sequencing artifacts and speeding processing time from days to hours for each file); and reads with 0–2 mismatches or insertions or deletions (indels) were counted as identical and were combined to determine the read number for each barcode with a library ID within a sample. The fractional contribution of each barcode to a sample was then calculated as the ratio of the individual barcode’s read number to the total number of valid barcode reads in the sample. A read number threshold of 2,000 was applied to exclude false barcodes resulting from sequencing artifacts or the presence of false library IDs in the genome.
A master list of barcodes was then generated for each animal and each vector, consisting of any barcode with a read abundance of over 2,000 in any sample for that animal and vector. Once a barcode was included on the master list, all analyses included the read numbers for that barcode, even if the read number in other samples fell below 2,000. The threshold of 2,000 was chosen conservatively and previously validated for this rhesus macaque model.
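The counting steps described above can be sketched in a few functions. This is a simplified illustration rather than the published pipeline: it assumes each de-multiplexed read has already been trimmed so that it begins with the 6-bp library ID followed directly by the 35-bp barcode, it merges barcodes by substitution mismatches only (indels are not handled), and all function names and the short vector labels are hypothetical.

```python
from collections import Counter

# Library IDs from the vector descriptions; the short labels are illustrative.
LIBRARY_IDS = {"TCAAGT": "EF1a", "GATCTG": "MSCV", "GTAGCC": "SFFV"}
BARCODE_LEN = 35


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def count_barcodes(reads, min_abundance=100, max_mismatch=2):
    """Return {(vector, barcode): read number} for one sample."""
    raw = Counter()
    for read in reads:
        vector = LIBRARY_IDS.get(read[:6])
        barcode = read[6:6 + BARCODE_LEN]
        if vector and len(barcode) == BARCODE_LEN:
            raw[(vector, barcode)] += 1
    # Discard low-abundance sequences before the costly merging step.
    kept = {key: n for key, n in raw.items() if n >= min_abundance}
    merged = {}
    # Most abundant first, so near-identical sequences fold into their parent.
    for (vector, barcode), n in sorted(kept.items(), key=lambda kv: -kv[1]):
        for parent in merged:
            if parent[0] == vector and hamming(barcode, parent[1]) <= max_mismatch:
                merged[parent] += n
                break
        else:
            merged[(vector, barcode)] = n
    return merged


def fractional_contributions(counts):
    """Each barcode's read number divided by all valid barcode reads."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}


def master_list(samples, threshold=2000):
    """Barcodes exceeding the threshold in any sample of one animal.

    samples maps a sample name to the output of count_barcodes."""
    return {
        key
        for counts in samples.values()
        for key, n in counts.items()
        if n > threshold
    }
```

Because the keys carry the vector label, the master list can be split per vector afterward, and rerunning `master_list` with other `threshold` values shows how the clone count depends on that choice.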
Figure S4 shows the impact of applying lower and higher thresholds, which affect the enumeration of overall clone numbers but not the tracking of high-contributing or potentially expanding clones. Distinguishing very low contributing clones from false barcodes or recurrent sequencing artifacts is impossible, but real clones contributing at a very low level in every sample are unlikely to be biologically important.
Using our methodology on primate samples with this level of marking, we have previously shown that lower thresholds result in the counting of false barcodes and sampling issues, and higher thresholds result in the exclusion of valid and reproducibly retrieved clones.
Stacked area plots were made using RStudio (Integrated Development for RStudio, 2015, Boston, MA; https://www.rstudio.com/) and custom R code. Heatmaps were produced using custom R code. All custom Python and R code can be accessed at https://github.com/dunbarlabNIH. Prism 6.0 (GraphPad, La Jolla, CA) was used to make dot plot diagrams.
Author Contributions
I.M.Y., C.W., and C.E.D. contributed to the design and conceptualization of the research. L.L.T., D.A.E., and S.K. provided software and conducted formal analyses. L.L.T. performed visualizations. I.M.Y., C.W., S.P., M.A.F.C., T.W., K.-R.Y., S.G.H., A.B., A.K., M.M., and R.E.D. conducted investigations. C.E.D. provided supervision. I.M.Y. and C.E.D. wrote the manuscript.