Gary P Wang, Alexandrine Garrigue, Angela Ciuffi, Keshet Ronen, Jeremy Leipzig, Charles Berry, Chantal Lagresle-Peyrou, Fatine Benjelloun, Salima Hacein-Bey-Abina, Alain Fischer, Marina Cavazzana-Calvo, Frederic D Bushman.
Abstract
Gene transfer has been used to correct inherited immunodeficiencies, but in several patients integration of therapeutic retroviral vectors activated proto-oncogenes and caused leukemia. Here, we describe improved methods for characterizing integration site populations from gene transfer studies using DNA bar coding and pyrosequencing. We characterized 160,232 integration site sequences in 28 tissue samples from eight mice, where Rag1 or Artemis deficiencies were corrected by introducing the missing gene with gamma-retroviral or lentiviral vectors. The integration sites were characterized for their genomic distributions, including proximity to proto-oncogenes. Several mice harbored abnormal lymphoproliferations following therapy. In these cases, comparison of the location and frequency of isolation of integration sites across multiple tissues helped clarify the contribution of specific proviruses to the adverse events. We also took advantage of the large number of pyrosequencing reads to show that recovery of integration sites can be highly biased by the use of restriction enzyme cleavage of genomic DNA, a limitation of all widely used methods, and we describe improved approaches that use the power of pyrosequencing to overcome this problem. The methods described here should allow integration site populations from human gene therapy to be deeply characterized with spatial and temporal resolution.
Year: 2008 PMID: 18411205 PMCID: PMC2396413 DOI: 10.1093/nar/gkn125
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Tissues analyzed by integration site sequencing and frequency of integration near Cancergenes
| Mouse | Disease state | Tissue | Integration sites, total reads (unique sites) | Sites within 50 kb of Cancergenes 5′ end, total reads (unique sites) | Percentage near Cancergenes (tissue), total reads (unique sites) | Percentage near Cancergenes (mouse), total reads (unique sites) |
|---|---|---|---|---|---|---|
| Gamma-retroviral vector in rag-deficient mice | | | | | | |
| 1 | Healthy control | LN | 2360 (274) | 269 (45) | 11.4 (16.4) | 10.1 (15.3) |
| | | marrow | 2701 (101) | 137 (17) | 5.1 (16.8) | |
| | | thymus | 3262 (103) | 436 (11) | 13.4 (10.7) | |
| 215 | Lymphoproliferation | LN | 6950 (60) | 262 (6) | 3.8 (10.0) | 3.3 (13.6) |
| | | marrow | 1893 (33) | 39 (3) | 2.1 (9.1) | |
| | | spleen | 4734 (48) | 173 (7) | 3.7 (14.6) | |
| | | thymus | 4044 (43) | 104 (9) | 2.6 (20.9) | |
| X | Lymphoproliferation | liver | 4049 (40) | 1344 (10) | 33.2 (25.0) | 33.3 (23.5) |
| | | marrow | 4160 (53) | 1433 (11) | 34.4 (20.8) | |
| | | spleen | 5287 (43) | 1717 (11) | 32.5 (25.6) | |
| Lentiviral (EF1α) vector in artemis-deficient mice | | | | | | |
| 22 | Healthy control | LN | 7789 (325) | 268 (21) | 3.4 (6.5) | 5.1 (6.0) |
| | | marrow | 3387 (89) | 322 (4) | 9.5 (4.5) | |
| | | spleen | 7028 (252) | 482 (16) | 6.9 (6.3) | |
| | | thymus | 3325 (50) | 25 (2) | 0.8 (4.0) | |
| 31 | Healthy cells pretransplantation | Sca1+ | 9574 (337) | 735 (25) | 7.7 (7.4) | 7.7 (7.4) |
| | Lymphoproliferation | LN | 3687 (215) | 89 (20) | 2.4 (9.3) | 1.7 (9.0) |
| | | spleen | 1669 (6) | 0 (0) | 0 (0) | |
| 8 | Healthy cells pretransplantation | Sca1+ | 10648 (265) | 597 (18) | 5.6 (6.8) | 5.6 (6.8) |
| | Lymphoproliferation | LN | 737 (46) | 3 (3) | 0.4 (6.5) | 0.04 (3.9) |
| | | liver | 14041 (11) | 0 (0) | 0 (0) | |
| | | pleural fluid | 7183 (55) | 13 (3) | 0.2 (5.5) | |
| | | spleen | 7179 (18) | 0 (0) | 0 (0) | |
| | | thymus | 12290 (23) | 0 (0) | 0 (0) | |
| Lentiviral (PGK) vector in artemis-deficient mice | | | | | | |
| 401 | Healthy control | LN | 1647 (65) | 117 (4) | 7.1 (6.2) | 4.3 (9.5) |
| | | marrow | 379 (41) | 41 (4) | 10.8 (9.8) | |
| | | spleen | 1265 (40) | 40 (4) | 3.2 (10.0) | |
| | | thymus | 2487 (53) | 50 (7) | 2.0 (13.2) | |
| 613 | Healthy cells pretransplantation | Sca1+ | 26477 (37) | 0 (0) | 0 (0) | 0 (0) |
| | Lymphoproliferation | LN | 42 (5) | 0 (0) | 0 (0) | 1.0 (5.9) |
| | | marrow | 31 (6) | 1 (1) | 3.2 (16.7) | |
| | | thymus | 26 (6) | 0 (0) | 0 (0) | |
| MLV-based vector in MEF | | | 7420 (4828) | 1003 (632) | 13.5 (13.1) | |
| HIV-based vector in MEF | | | 3929 (2441) | 329 (209) | 8.4 (8.6) | |
a These samples were sequenced using the Sanger method only.
LN, lymph nodes.
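The percentage columns can be checked against the count columns. The following is a minimal sketch, using the mouse 1 rows from the table, of how the per-tissue and per-mouse percentages relate to the counts. The function names are hypothetical, and pooling by summing counts across tissues is an assumption that happens to reproduce the tabulated per-mouse values, not a statement of how the authors computed them.

```python
# (total reads, unique sites) for mouse 1, copied from the table.
MOUSE_1 = {
    "LN":     {"sites": (2360, 274), "near_cancergenes": (269, 45)},
    "marrow": {"sites": (2701, 101), "near_cancergenes": (137, 17)},
    "thymus": {"sites": (3262, 103), "near_cancergenes": (436, 11)},
}


def percent(near, total):
    """Percentage rounded to one decimal place; 0.0 when there are no sites."""
    return round(100 * near / total, 1) if total else 0.0


def tissue_percentages(row):
    """Return (percent of reads, percent of unique sites) for one tissue."""
    (reads, unique), (near_reads, near_unique) = row["sites"], row["near_cancergenes"]
    return percent(near_reads, reads), percent(near_unique, unique)


def mouse_percentages(tissues):
    """Pool all tissues of one mouse by summing counts (an assumption)."""
    reads = sum(t["sites"][0] for t in tissues.values())
    unique = sum(t["sites"][1] for t in tissues.values())
    near_reads = sum(t["near_cancergenes"][0] for t in tissues.values())
    near_unique = sum(t["near_cancergenes"][1] for t in tissues.values())
    return percent(near_reads, reads), percent(near_unique, unique)


print(tissue_percentages(MOUSE_1["LN"]))  # (11.4, 16.4)
print(mouse_percentages(MOUSE_1))         # (10.1, 15.3)
```

Both results agree with the table entries for mouse 1: 11.4 (16.4) for LN and 10.1 (15.3) for the mouse as a whole.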
Figure 1. The DNA bar coding strategy. Each LTR primer used in ligation-mediated PCR contained a unique DNA barcode that specified the mouse and tissue of origin. Each barcode consists of a unique 4-bp nucleotide sequence, inserted between the sequencing primer binding site and the LTR-specific primer segment. Thus all sequencing reads begin with the 4-bp barcode identifier. A sample primer with a barcode is shown at the bottom of the diagram.
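Because every read begins with its 4-bp barcode, reads from a pooled run can be sorted back to their sample by inspecting the first four bases. The sketch below illustrates that idea only. The barcode sequences, sample labels, and function name are hypothetical and are not taken from the paper, and mismatched barcodes are simply set aside rather than error-corrected.

```python
from collections import defaultdict

# Hypothetical barcode table: 4-bp barcode -> (mouse, tissue).
# These sequences are illustrative, not the ones used in the study.
BARCODES = {
    "ACGT": ("mouse_1", "LN"),
    "TGCA": ("mouse_1", "thymus"),
}
BARCODE_LEN = 4


def demultiplex(reads, barcodes=BARCODES):
    """Group reads by the sample named by their leading 4-bp barcode.

    Returns (reads grouped by sample with the barcode trimmed off,
    list of reads whose barcode is not in the table).
    """
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        code, rest = read[:BARCODE_LEN].upper(), read[BARCODE_LEN:]
        sample = barcodes.get(code)
        if sample is None:
            unassigned.append(read)
        else:
            by_sample[sample].append(rest)
    return dict(by_sample), unassigned


grouped, leftover = demultiplex(["ACGTGGGG", "TGCAAAAA", "NNNNCCCC"])
print(grouped)   # reads keyed by (mouse, tissue), barcode removed
print(leftover)  # reads with an unrecognized barcode
```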
Figure 2. Lentiviral vector and gamma-retroviral vector integration site distributions in the murine genome. Vector integration sites for pooled samples were compared to their matched random control sites. In the matching procedure (20), each unique integration site was matched with 10 control sites in the genome, randomly selected in silico and constrained to lie the same distance from an MseI recognition site as the experimentally determined integration site. Comparison of experimentally determined integration sites to the matched random controls thus ‘washed out’ any possible biases introduced by the use of MseI cleavage. A value above 1 indicates favored integration relative to random control sites; a value below 1 indicates disfavored integration. (A) Frequency of integration in transcription units. MEF: control integration sites from cultured murine embryonic fibroblasts. Healthy: control healthy mice. Tumor: mice with lymphoproliferation. (B) Frequency of integration near transcription start sites. (C) Frequency of integration near CpG islands. (D) Frequency of integration in transcription units. In this analysis the integration site data sets are pooled for all tissues from each mouse. (E) Frequency of integration in transcription units for integration sites pooled by tissue of origin (samples from liver are not included in this analysis due to the low number of integration sites). Comparisons between the lentiviral and gamma-retroviral vectors in each of the panels achieved P < 0.0001 (Fisher's exact test). Comprehensive analysis of integration frequency relative to many genomic features can be found in Supplementary Data 1–3.
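The matching procedure can be illustrated on a toy sequence. This is a simplified sketch rather than the procedure of reference (20): it treats the genome as a single string, ignores strand and chromosome structure, measures distance only to the next downstream MseI site (recognition sequence TTAA), and samples control positions with replacement. All function names are hypothetical.

```python
import random

MSEI_SITE = "TTAA"  # MseI recognition sequence


def find_sites(genome, motif=MSEI_SITE):
    """Return start positions of every (possibly overlapping) motif occurrence."""
    positions, start = [], genome.find(motif)
    while start != -1:
        positions.append(start)
        start = genome.find(motif, start + 1)
    return positions


def distance_to_next_site(pos, site_positions):
    """Distance from pos to the first restriction site at or after it, or None."""
    downstream = [s - pos for s in site_positions if s >= pos]
    return min(downstream) if downstream else None


def matched_controls(pos, genome, n_controls=10, rng=None):
    """Draw control positions lying the same distance from an MseI site as pos."""
    rng = rng or random.Random(0)
    sites = find_sites(genome)
    d = distance_to_next_site(pos, sites)
    if d is None:
        return []
    # Keep positions d bases upstream of a site whose nearest downstream
    # site really is d away, excluding the experimental site itself.
    candidates = [
        s - d
        for s in sites
        if s - d >= 0 and s - d != pos
        and distance_to_next_site(s - d, sites) == d
    ]
    if not candidates:
        return []
    return [rng.choice(candidates) for _ in range(n_controls)]


toy_genome = "GGGGTTAAGGGGGGTTAAGGGGGTTAA"
print(matched_controls(2, toy_genome))
```

On a real genome the candidate pool would be large enough to sample without replacement; the toy sequence has only two valid candidates, so repeats are expected here.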
Figure 3. Correlations of clone counts between tissues in mice with abnormal lymphoproliferation (A and B) or healthy controls (C and D). The mouse studied is indicated on the figure, along with the genes nearest the integration site. Each point represents an individual integration site; the values on the x- and y-axes indicate the number of sequences for each clone. The square root of clone counts for each tissue is plotted to allow very high counts to be displayed conveniently. Integration sites at known proto-oncogenes are indicated by the larger red lettering. The r-values indicate the Spearman correlations for counts between tissues. For detailed analysis of all mice see Supplementary Data 4. We note that it is probable that not all potential cancer-related genes have been identified. Also, for any given insertion, additional studies are required to establish whether the integration event up-regulates Cancergene transcription.
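A Spearman correlation depends only on ranks, so the square-root transform used for plotting leaves the r-value unchanged. The sketch below is a plain-Python illustration with made-up clone counts. The data and function names are hypothetical, and ties are handled with average ranks, which is a common convention rather than something the caption specifies.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical read counts for the same five clones in two tissues.
spleen = [1, 4, 9, 1600, 2]
marrow = [2, 3, 10, 900, 1]
raw = spearman(spleen, marrow)
rooted = spearman([math.sqrt(c) for c in spleen], [math.sqrt(c) for c in marrow])
print(raw, rooted)  # the two values agree
```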
Figure 4. Severe biases in recovery of integration sites arising from the use of different restriction enzymes. (A) Venn diagram indicating the overlap between integration sites isolated by cleaving genomic DNA with MseI versus Tsp509I. (B) Collector's curve (rarefaction) analysis of integration sites recovered after MseI cleavage. Repeated samples of integration site subsets were used to evaluate whether further sampling would likely yield additional integration sites (rarefaction analysis), as indicated by whether the curve has reached a plateau value. The y-axis indicates the number of integration site sequences detected, the x-axis the number of integration sites in the subset analyzed. ‘Sob’ (Species-observed) indicates rarefaction on the original data. The Chao1 estimator was used to estimate the number of undetected integration sites from frequency of isolation information. ‘Chao1’ indicates collector's curve analysis on Chao1 estimates for sequence subsets. (C) Collector's curve analysis for integration sites recovered after cleavage of genomic DNA with Tsp509I. Markings as in (B).
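The two quantities in panels (B) and (C) can be sketched in a few lines. The caption does not say which form of Chao1 was used, so the example below assumes the bias-corrected form, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of sites seen exactly once and exactly twice. The rarefaction routine, its parameters, and the toy data are likewise hypothetical.

```python
import random
from collections import Counter


def chao1(read_sites):
    """Bias-corrected Chao1 richness estimate from one site label per read."""
    counts = Counter(read_sites)
    s_obs = len(counts)                                  # distinct sites seen
    f1 = sum(1 for c in counts.values() if c == 1)       # singletons
    f2 = sum(1 for c in counts.values() if c == 2)       # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def rarefaction_curve(read_sites, subset_sizes, n_repeats=100, seed=0):
    """Mean number of distinct sites in random subsets of each given size."""
    rng = random.Random(seed)
    curve = []
    for size in subset_sizes:
        observed = [
            len(set(rng.sample(read_sites, size))) for _ in range(n_repeats)
        ]
        curve.append((size, sum(observed) / n_repeats))
    return curve


# Toy data: each entry is the integration site that one read maps to.
reads = ["a", "a", "a", "b", "b", "c", "d"]
print(chao1(reads))                          # 4.5
print(rarefaction_curve(reads, [2, 4, 7]))   # approaches 4 distinct sites
```

A curve that is still climbing at the full sample size, or a Chao1 estimate well above the observed count, both suggest that further sequencing would recover additional sites.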
Figure 5. Bias in recovery of integration sites due to the distance between integration sites and the closest MseI sites. The distribution of counts of identical sequence reads is shown as a function of distance to the nearest restriction site. The y-axis represents the percentage of integration sites with the indicated frequencies of isolation, and the x-axis is the range of distances between integration sites and their nearby MseI restriction sites used for binning.
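The binning behind this kind of plot can be sketched as follows. Within each distance bin, the code reports what percentage of integration sites were isolated a given number of times. The bin edges, input data, and function names are hypothetical, since the caption does not state the bins actually used.

```python
from collections import Counter, defaultdict


def bin_label(distance, edges):
    """Return a label such as '0-50' for the half-open bin [lo, hi)."""
    for lo, hi in zip(edges, edges[1:]):
        if lo <= distance < hi:
            return f"{lo}-{hi}"
    return f">={edges[-1]}"


def isolation_profile(sites, edges):
    """Percentage of sites per read count within each distance bin.

    sites: iterable of (distance_to_nearest_MseI_site_bp, read_count).
    Returns {bin label: {read count: percent of sites in that bin}}.
    """
    per_bin = defaultdict(Counter)
    for distance, reads in sites:
        per_bin[bin_label(distance, edges)][reads] += 1
    profile = {}
    for label, counter in per_bin.items():
        total = sum(counter.values())
        profile[label] = {reads: 100 * n / total for reads, n in counter.items()}
    return profile


# Hypothetical sites: (distance in bp, number of identical reads recovered).
example = [(10, 1), (20, 1), (30, 5), (120, 1)]
print(isolation_profile(example, edges=[0, 50, 100]))
```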