Mobile genetic elements with the ability to integrate genetic information into chromosomes can cause disease over short periods of time and shape genomes over eons. These elements can be used for functional genomics, gene transfer and human gene therapy. However, their integration-site preferences, which are critically important for these uses, are poorly understood. We analyzed the insertion sites of several transposons and retroviruses to detect patterns of integration that might be useful for predicting preferred integration sites. Initially we found that a mathematical description of DNA deformability, called V(step), could be used to distinguish preferential integration sites for Sleeping Beauty (SB) transposons into a particular 100 bp region of a plasmid [G. Liu, A. M. Geurts, K. Yae, A. R. Srinivassan, S. C. Fahrenkrug, D. A. Largaespada, J. Takeda, K. Horie, W. K. Olson and P. B. Hackett (2005) J. Mol. Biol., 346, 161-173]. Based on these findings, we extended our examination to the integration of SB transposons into whole plasmids and chromosomal DNA. To accommodate sequences up to 3 Mb for these analyses, we developed an automated method, ProTIS, that can generate profiles of predicted integration events. However, because the available data sets are limited, a similar approach did not reveal any structural pattern of DNA that could be used to predict favored integration sites for other transposons, retroviruses or lentiviruses. Nonetheless, ProTIS is useful for predicting likely SB transposon integration sites in investigator-selected regions of genomes, and our general strategy may be useful for other mobile elements once a sufficiently high density of sites in a single region is obtained. ProTIS analysis can be useful for functional genomics, gene transfer and human gene therapy applications using the SB system.
Mobile genetic vectors have been harnessed for genetic studies in model organisms and are being developed as agents for gene-therapy in humans (1–3). For example, the awakening of the Sleeping Beauty (SB) transposon system as a powerful tool for insertional mutagenesis to identify oncogenes (4,5) and other classes of genes (6,7) complements retroviral vectors, which have been used for decades (8). Importantly, understanding the parameters that affect integration of vectors is required to appreciate fully the results of their applications.
Although transposons and some retroviruses integrate in virtually all regions of host genomes, their integration is not random (9–18). Weak consensus sequences have been described surrounding the sites of integration for retroviruses (16,19) and transposable elements (6,20,21). However, the most-favored integration sites do not always conform to these sequences (6). In addition to specific-sequence recognition, DNA structural characteristics, including protein-induced deformability, A-philicity and bendability, have been shown to influence binding of proteins (22). Although these structural characteristics are sequence-dependent, two dissimilar sequences can have similar structural patterns. As a result, distinct preferred integration sites may not match consensus sequences, but rather share similar structural patterns. Unique patterns of these DNA structural characteristics at integration sites have been reported for retroviruses and lentiviruses (19), P-elements (20) and SB transposons (21,23) that may contribute to mechanisms that differentiate potential loci for integration of mobile genetic elements. We previously used a mathematical description of DNA ‘deformability’ called Vstep to identify shared structural patterns among several preferred integration sites for SB transposons into a short 100 bp region of a target plasmid (23).
DNA deformation is characterized by a non-uniform twisting of the double helix, alteration in the spacing between the base pairs at the integration site and localized tilting of the target site such that the axis around the insertion site is off center. This initial analysis did not answer the question of whether these parameters can be used to effectively predict integration site preferences into chromosomal DNA in mammalian genomes nor whether other integrating vectors followed similar rules.
Here we describe our strategy of using a small dataset of high-density integrations into a defined region of DNA to formulate rules that govern integration-site preferences in lengths of chromatin of more than 3 Mb. To analyze such long stretches of DNA, we developed an algorithm for rapidly scanning DNA sequences to predict favored sites of integration of mobile elements into mammalian chromosomes. We used SB transposons as a model element to establish a method for finding and testing rules that govern integration-site preferences. We used two datasets from forward-genetic studies to verify the predictions made by our algorithms and then examined potential integration preferences for two other transposons as well as retroviral and lentiviral vectors.
MATERIALS AND METHODS
Algorithm for determining the Vstep profile of SB transposon integration sites
Profiling TA sites using the Vstep algorithm. Sequences of 12 bp (N) with TA sites at positions six and seven were analyzed with respect to the 11 Vstep values ([0]–[10]) for transitions from one base pair to the next (brackets). Profiles are charted and subsequently assigned to one of three categories, preferred, semi-preferred or basal, based upon the graphical pattern. In all profiles there is a ‘TA-peak’ that always exists in such profiles because the T-to-A Vstep value is 6.3 and all steps from N to T and from A to N (N = any base) are always <3.0, as shown on the left side of the figure. The ‘TA peak’ formed by the two lines that connect the three Vstep values for the N-to-T, T-to-A and A-to-N steps are shown in boldface.
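The profiling step described in this caption can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ProTIS implementation: the function names are hypothetical, and apart from the T-to-A value of 6.3 and the constraint that every N-to-T and A-to-N step is below 3.0 (both stated above), the entries of the step table are placeholders to be replaced with the published Vstep values. Assignment to the preferred, semi-preferred or basal category is not attempted, because the caption bases it on the graphical pattern rather than on a stated rule.

```python
# Dinucleotide-step table. ASSUMPTION: only TA = 6.3 and "N-to-T and A-to-N
# steps are < 3.0" come from the text; every other value is a placeholder.
VSTEP = {
    "AA": 2.9, "AC": 2.3, "AG": 2.1, "AT": 1.6,
    "CA": 9.8, "CC": 6.1, "CG": 12.1, "CT": 2.1,
    "GA": 4.5, "GC": 4.0, "GG": 6.1, "GT": 2.3,
    "TA": 6.3, "TC": 4.5, "TG": 9.8, "TT": 2.9,
}


def ta_windows(seq):
    """Yield (index_of_T, 12-bp window) for every TA with full flanking sequence.

    The TA occupies positions six and seven of the window (1-based).
    """
    seq = seq.upper()
    for i in range(5, len(seq) - 6):
        if seq[i:i + 2] == "TA":
            yield i, seq[i - 5:i + 7]


def vstep_profile(window, table=VSTEP):
    """Return the 11 step values [0]..[10] for a 12-bp window centred on TA."""
    if len(window) != 12 or window[5:7] != "TA":
        raise ValueError("expected a 12-bp window with TA at positions 6 and 7")
    return [table[window[i:i + 2]] for i in range(11)]


if __name__ == "__main__":
    for position, window in ta_windows("GGCCCCCTACCCCCGG"):
        print(position, window, vstep_profile(window))
```

In each profile, index 5 is the T-to-A step and indices 4 and 6 are the flanking N-to-T and A-to-N steps, which together form the ‘TA peak’ described above.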
Table 1
SB transposition-site preferences as a function of Vstep profiles
Vstep pattern      No. of target sites (% of total)   Sites hit   Insertions/site   Preference
pFV/Luc:
  Basal            299 (61%)                          39          0.13              1X
  Semi-Preferred   154 (31%)                          92          0.60              5X
  Preferred        36 (7%)                            62          1.7               13X
Braf Intron-9:
  Basal            209 (60%)                          5           0.02              1X
  Semi-Preferred   105 (19%)                          12 (a)      0.11              6X
  Preferred        33 (10%)                           8 (b)       0.2               10X
3.2 Mbp Chromosome 1:
  Basal            117 454 (56%)                      5           0.00004           1X
  2.5-peak         67 070 (32%)                       15          0.00022           6X
  Preferred        23 775 (11%)                       14          0.00059           15X
(a) A total of 11 sites were hit; one was hit twice for a total of 12 hits.
(b) Six sites were hit; two were hit twice for a total of eight hits.
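The Preference column of Table 1 appears to be the insertions-per-site value of each category divided by that of the basal category in the same target, rounded to a whole number. The sketch below checks that reading against the tabulated values. The function name is hypothetical, and both the use of the rounded table entries and half-up rounding are assumptions chosen because they reproduce all nine entries; the source does not state how the column was computed.

```python
from decimal import Decimal, ROUND_HALF_UP


def fold_preference(rate, basal_rate):
    """Fold preference relative to basal, rounded half-up to an integer.

    Rates are passed as strings so the tabulated decimals are used exactly.
    """
    ratio = Decimal(rate) / Decimal(basal_rate)
    return int(ratio.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Insertions/site values copied from Table 1.
TABLE_1 = {
    "pFV/Luc": {"Basal": "0.13", "Semi-Preferred": "0.60", "Preferred": "1.7"},
    "Braf Intron-9": {"Basal": "0.02", "Semi-Preferred": "0.11", "Preferred": "0.2"},
    "3.2 Mbp Chromosome 1": {"Basal": "0.00004", "2.5-peak": "0.00022", "Preferred": "0.00059"},
}

if __name__ == "__main__":
    for target, rates in TABLE_1.items():
        for pattern, rate in rates.items():
            print(f"{target:22s} {pattern:15s} {fold_preference(rate, rates['Basal'])}X")
```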
Analyzing Vstep and A-philicity profiles of insect transposons and retroviruses
Total Vstep profile of the 7758 bp plasmid pFV/Luc. The sequence is divided into 78 bins of size 100 bp. (a) Plot of the number of each type of TA site per bin. The hexagon indicates the Chinook salmon poly(A) addition motif and the following square indicates an M13 origin of replication. (b) Plot of Total Vstep score per bin. (c) Distribution of observed insertion sites [adapted from Liu et al. (23)]. Shaded areas are regions required for selection and thus unlikely to be scored. The asterisks indicate the three most likely regions for integration based on ProTIS analysis and the arrow indicates a region that has a high number of TA sites, but relatively few integrations.
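The binning used for this profile can be illustrated with a short Python sketch that counts TA sites per fixed-size bin. It is a simplified, hypothetical helper: it does not separate TA sites by category and does not compute the Total Vstep score, whose formula is not given in this section. Assigning each TA to the bin that contains its T is also an assumption.

```python
import math


def count_ta_per_bin(seq, bin_size=100):
    """Count TA dinucleotides per bin; a TA belongs to the bin holding its T."""
    seq = seq.upper()
    n_bins = math.ceil(len(seq) / bin_size)  # the last bin may be partial
    counts = [0] * n_bins
    start = seq.find("TA")
    while start != -1:
        counts[start // bin_size] += 1
        start = seq.find("TA", start + 1)
    return counts


if __name__ == "__main__":
    demo = "TA" + "C" * 98 + "CCTATA"
    print(count_ta_per_bin(demo, bin_size=100))
```

With a partial final bin counted, a 7758 bp sequence yields 78 bins of 100 bp, which matches the bin count quoted in the caption.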
RESULTS
Development of an algorithm for Vstep profiles of transposon-integration sites
Remobilization of transposons into the ninth intron of the mouse Braf gene
The key to identifying preferred sites in chromatin is to examine multiple integrations into a limited genomic region and quantify variations from Poisson statistics. Such data became available from a study in which the SB transposon, T2/Onc, was engineered to elicit gain-of-function mutations and accelerate tumor formation in somatic tissues of mice lacking the p19Arf tumor suppressor (4). The most frequent oncogenic insertion site was intron-9 of the Braf gene. All of the 25 analyzed insertions in intron-9 were oriented toward the 10th exon (Figure 3a), resulting in a transcript encoding the kinase domain of Braf that acts as a dominant oncogene. Of the 347 potential TA-integration sites in the 4069 bp intron, 22 were targets and three sites were hit twice. In this case, the probability of two insertions into a single TA site is 0.07 and the odds of this happening three times are 0.0004, which strongly suggested the existence of preferential insertion sites.
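The two probabilities quoted above are consistent with the simple arithmetic sketched below. This is one reading that reproduces the numbers, not a calculation confirmed by the source: it assumes that each of the 25 insertions is equally likely to land on any of the 347 TA sites, so that the chance of a given site receiving a second hit is about 25/347, and that the three double hits are treated as independent. The function name is hypothetical.

```python
def double_hit_odds(insertions=25, ta_sites=347, double_hits=3):
    """Return (chance of a repeat hit at one site, chance of `double_hits` such events)."""
    p_repeat = insertions / ta_sites
    return p_repeat, p_repeat ** double_hits


if __name__ == "__main__":
    p_repeat, p_all = double_hit_odds()
    print(f"repeat hit at one site: {p_repeat:.2f}")
    print(f"three such events:      {p_all:.4f}")
```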
Figure 3
Vstep analysis of insertion sites of T2/Onc into the mouse Braf gene. (a) Schematic of mapped insertions into Braf (exons shown as tall vertical lines) with an expanded intron-9. Only T2/Onc transposons that integrated in a left-to-right orientation would be identified in the genetic screen. SA, splice-acceptor site, SD, splice-donor site, LTR, retroviral long terminal repeat, double arrowheads, inverted terminal repeats of the integrating transposon. The long arrow represents the direction of transcription from the LTR promoter within T2/Onc. (b) Total Vstep profile of intron-9 in terms of 82 bins of size 50 bp.
Transposon insertion sites in 3.2 Mb of mouse chromosome 1. (a) SB integration sites in chromosome 1. The downwards arrow marks the location of the concatemer from which the transposons were remobilized, and the asterisk marks the 3.2 Mb region that had the highest density of integrations. The region shown in (b) was divided into 32 000 bins of size 100 bp and the Total Vstep score for each bin was calculated as described in Figure 3. The average Total Vstep value per bin is 23. (b) Blue bars, Total Vstep scores/bin; red bars, insertion sites mapped as a function of position. (c) Insertion sites (red) displayed as a function of Total Vstep score/bin (blue).
Total Vstep Profile for the human LMO2 gene plotted as 100 bp bins. The map of 100 kb of the LMO2 locus is shown above the center of the Vstep profile. Rectangles, exons; block arrows, sites of two activating retroviral insertions [P4 and P5, Ref. (36)]. Spikes 1 and 2 in the Total Vstep profile correspond to short tandem repeats of (TCTA) and (TA), respectively.
Profiling other transposable elements and retroviruses
Although SB was the first DNA-based transposable element developed to deliver DNA sequences into mammalian genomes, lepidopteran piggyBac transposons and Drosophila P-elements are powerful germline-transformation tools in insects (38,39). Although both of these vectors have strong preferences for transcriptional units, we hypothesized that they might exhibit target-site selection patterns related to DNA structure that would further define sites of integration within genes. Accordingly, we examined the integration-site sequence-tags deposited in GenBank from multiple investigations. The single largest deposit of integration sites was generated by Exelixis and comprises over 18 000 piggyBac and 6500 P-element insertions (20,40). We refined the piggyBac data to 11 791 integrations that could be identified by the TTAA sequence recognized by piggyBac transposase and 5070 P-element integrations into validated genomic sequences. For both transposons we used the same procedure to identify preferential integration sites as we did for SB integrations: (i) find insertion hotspots, (ii) develop rules based on these sequences and (iii) test the rules against a much larger set of integrations. In contrast with what we found for SB transposons, there was no consistent Vstep pattern shared amongst either the piggyBac or P-element integration sites (Supplementary Figures 2 and 3).
Retroviruses have been utilized in genetic screens and for germline and somatic transgenesis in vertebrates for decades. Weak consensus sequences are found at the integration sites of several retroviruses (16,19), based upon the examination of relatively few integration sites scattered across a target genome.
Using curated data kindly provided by Drs Xiaolin Wu and Alex Holman, we examined 695 murine leukemia virus (19), 1371 human immunodeficiency virus-1 (10,13), 148 simian immunodeficiency virus (13) and 551 avian sarcoma-leukosis virus (13,14) integration sites for Vstep patterns that would aid in predicting integration preferences (Figure 6). As with P-elements, we found symmetric patterns that overlap with the base pairs involved in the target site duplication for most family members. Importantly, these patterns are based on the same compilations used to identify unique, weak consensus sequences for the various viruses (16,19) and cannot, on their own, be used to generate algorithms. The patterns shown in Figure 6 suggest that Vstep rules for identifying preferential integration sites might exist, but adequately dense sources of in vivo integration sites for these vectors, along with the identification of hotspots, are still required to generate appropriate algorithms.
Figure 6
Vstep analysis of insertion sites of proviruses and transposons. The arrows in the profiles indicate the boundaries of the TSD sequence that occurs with the staggered cuts made by the various integrase enzymes. (a) Average Vstep profiles for 573 SB transposon integrations. (b) Average Vstep profiles for murine leukemia virus. (c) Average Vstep profiles for human immunodeficiency virus. (d) Average Vstep profiles for simian immunodeficiency virus. (e) Average Vstep profiles for avian sarcoma/leucosis virus. (f) Average Vstep profiles for 1006 random DNA 20mer sequences.
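An average Vstep profile of the kind plotted in each panel can be sketched in Python as a per-step mean over equal-length sequences aligned on the integration site. The function name is hypothetical, and the small step table in the usage example is a placeholder: only the TA value of 6.3 is taken from the text, and the other entries are assumptions to be replaced with the published Vstep values.

```python
def average_vstep_profile(sequences, table):
    """Average the dinucleotide-step values position by position.

    `sequences` must be equal-length and already aligned on the integration site.
    """
    if not sequences:
        raise ValueError("no sequences given")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must all have the same length")
    n_steps = length - 1
    totals = [0.0] * n_steps
    for s in sequences:
        s = s.upper()
        for i in range(n_steps):
            totals[i] += table[s[i:i + 2]]
    return [t / len(sequences) for t in totals]


if __name__ == "__main__":
    # Placeholder table covering only the steps used in this toy example.
    toy_table = {"TA": 6.3, "AT": 1.6, "AA": 2.9, "TT": 2.9}
    print(average_vstep_profile(["TAT", "AAT", "TTA"], toy_table))
```

Comparing such an average against the same calculation on random sequences, as in panel (f), is what distinguishes a site-specific pattern from background.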