Analysis of PBase Binding Profile Indicates an Insertion Target Selection Mechanism Dependent on TTAA, But Not Transcriptional Activity.

Dong Yang1, Ruiqi Liao2, Yun Zheng3, Ling Sun2, Tian Xu1.   

Abstract

Transposons and retroviruses are important pathogenic agents and tools for mutagenesis and transgenesis. Insertion target selection is a key feature for a given transposon or retrovirus. The piggyBac (PB) transposon is highly active in mouse and human cells and has a much broader genome-wide distribution than retroviruses and the P-element. However, the underlying reason is not clear. Utilizing a tagged functional PB transposase (PBase), we were able to conduct genome-wide profiling for PBase binding sites in the mouse genome. We have shown that PBase binding mainly depends on the distribution of the tetranucleotide TTAA and is not affected by the presence of PB DNA. Furthermore, PBase binding is negatively influenced by the methylation of CG sites in the genome. Analysis of a large collection of PB insertions in mice has revealed an insertion profile similar to the PBase binding profile. Interestingly, this profile is not correlated with transcriptionally active genes in the genome or transcriptionally active regions within a transcriptional unit. This differs from what has been previously shown for P-element and retrovirus insertions. Our study provides an explanation for PB's genome-wide insertion distribution and also suggests that PB target selection relies on a new mechanism independent of active transcription and open chromatin structure.

Keywords:  Transposons; insertion

Year:  2016        PMID: 27570481      PMCID: PMC4997051          DOI: 10.7150/ijbs.15589

Source DB:  PubMed          Journal:  Int J Biol Sci        ISSN: 1449-2288            Impact factor:   6.580


Introduction

Insertion profiles are a key feature of transposons and retroviruses, and they can aid in increasing our understanding of pathogenesis and in developing tools for mutagenesis and transgenesis. It has been reported that both transposons and retroviruses preferably insert into transcriptionally active genes, which may be due to the higher accessibility of open chromatin and/or the interaction between transposase/integrase and cellular proteins bound to transcriptionally active regions 1-6. The distribution of P-element insertions in the Drosophila melanogaster genome exhibits a high preference for transcriptionally active regions, and especially for the regions near the transcriptional start site 7, 8. A similar preference has also been reported for retroviruses including human immunodeficiency virus (HIV), avian sarcoma-leukosis virus (ASLV), and murine leukaemia virus (MLV) 2, 3, 9-13. It has been shown that transcriptional co-activators bound to HIV integration complexes serve as a tether for insertion targeting 14-17. The piggyBac (PB) transposon has been shown to be highly active in mouse and human cells 18, making it an ideal tool for a variety of genetic manipulations in mammals and human cells 18-28. In fact, PB has many advantageous features, including having the highest transposition efficiency compared to other DNA transposons 29, 30, being active in a broad range of cell types and species from insects to mammals 18-20, 31-36, being capable of carrying a large cargo 23, and excising precisely without leaving damage footprints 18, 37, 38. PB appears to have a preference for transcriptional units 18, 21, 22, suggesting that it could also have a mechanism similar to that of P-element and retroviruses, which target transcriptionally active genes. On the other hand, PB has a significantly broader genome-wide distribution than other transposons and retroviruses 30, 39, 40, arguing that it may employ a different mechanism in target selection.
To better understand the genomic profile of PB insertion site selection, we have tagged the PB transposase (PBase) and examined PBase binding preference in the mouse genome using the Chromatin Immunoprecipitation (ChIP)-Chip method 41, 42, which is a high-throughput method used to identify the interaction between genomic DNA and the protein of interest in living cells. We have analyzed the PBase binding distribution in the genome in relation to the distribution of PB's target sequence TTAA, the distribution of transcriptional units, gene expression levels, and the distribution of CG methylation. Our analysis has revealed that, unlike P-element and retroviruses, PBase binding does not prefer transcriptionally active units, but rather mainly depends on the distribution of TTAA sites, with a negative influence from CG methylation. This unique mechanism provides an explanation for the behavior of PB's genome-wide insertion profile and broad activities in different species and cell types.

Results

Mapping of PBase binding sites in mouse ES cells

To investigate interactions between PBase and the mouse genomic DNA, we applied the ChIP-Chip method using the Affymetrix® Chromatin Immunoprecipitation Assay Protocol (P/N 702238). A 3×Myc tag was added to the C-terminus of PBase (Figure 1a). This Myc-tagged PBase was shown to be expressed (Figure 1b) and able to catalyze PB transposition in the W4/129S6 mouse ES cell (Taconic) genome with the same efficiency as wild-type PBase (Figure 1c, 1d). 15 μg Myc antibody (SC-789X, Santa Cruz) was used to immunoprecipitate the PBase-DNA complex. The Affymetrix® GeneChip® Mouse Tiling 2.0R Array Set (P/N 900852) was then used to identify PBase binding sites. To check whether the PB transposon had an effect on PBase binding, we mapped the PBase binding sites in ES cells with and without the presence of the PB transposon, designated as PB+ and PB-, respectively. For experiments with each group, 2 replicates were performed. Using the peak finding algorithm 43, 44, we defined PBase binding sites across the whole mouse genome. These sites displayed similar distribution patterns in all the analyses carried out for this study. We did not detect any effect of the PB transposon on the distribution of the PBase binding sites in our study. We then aggregated all the sites and identified a total of 2,396,017 PBase binding sites across the mouse genome (Table S1).
Figure 1

Utilizing Myc-tagged PBase for transposition in cultured mouse ES cells. (a) Both PBase and PBase-3×Myc were driven by an actin promoter. A tri-Myc tag (blue triangle) was added to the C-terminus of PBase (PBase-3×Myc). PB [SV40-neo] carrying the neo drug selection marker driven by an SV40 promoter served as the donor plasmid 21. (b) Transient expression of PBase-3×Myc in ES cells 48h after transfection. (c) Statistical results of the PB transposition efficiency test 21. PBase-3×Myc drove transposition with the same efficiency as PBase (p=0.6). Each number is the average obtained from three experiments. (d) An example of the ES cell transposition efficiency test experiments. Blue dots were surviving ES clones stained by methylene blue after G418 selection. The number of surviving cell clones after G418 selection indicated the transposition efficiency.

TTAA is highly enriched at the center of PBase binding sites

PB inserts almost exclusively into TTAA sites 37, 45. But little is known about the mechanism of target determination. We matched our 2,396,017 PBase binding sites to the TTAA sites across the genome. 87.85% (2,104,819) of these binding sites were within 250 bp of the TTAA sites and 25.48% (610,405) of the binding sites were within 25 bp. The TTAA motif was significantly enriched at the center of binding sites compared to the other combinations of 4 nucleotides (χ2 analysis, p < 0.0001) (Figure 2a).
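The distance matching described above can be illustrated with a minimal sketch. It assumes binding sites and TTAA occurrences are given as single integer coordinates on one chromosome; the function names, the toy sequence, and the use of the motif start position as the TTAA coordinate are assumptions for illustration, not the authors' pipeline.

```python
from bisect import bisect_left


def find_motif_positions(sequence, motif="TTAA"):
    """Return the 0-based start of every (possibly overlapping) motif hit."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


def distance_to_nearest(site, sorted_positions):
    """Distance in bp from a site to the closest entry of a non-empty sorted list."""
    i = bisect_left(sorted_positions, site)
    candidates = []
    if i < len(sorted_positions):
        candidates.append(sorted_positions[i] - site)
    if i > 0:
        candidates.append(site - sorted_positions[i - 1])
    return min(candidates)


def fraction_within(sites, sorted_positions, window):
    """Fraction of sites whose nearest motif lies within `window` bp."""
    near = sum(1 for s in sites if distance_to_nearest(s, sorted_positions) <= window)
    return near / len(sites)


# Hypothetical toy data: two TTAA sites and four binding-site coordinates.
toy_sequence = "GGTTAAGG" + "C" * 100 + "TTAA"
ttaa_positions = find_motif_positions(toy_sequence)
toy_sites = [0, 30, 60, 110]
print(fraction_within(toy_sites, ttaa_positions, 25))
print(fraction_within(toy_sites, ttaa_positions, 250))
```

Because the motif positions are sorted, each lookup is a binary search, which keeps the matching practical even for millions of binding sites.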
Figure 2

TTAA sites were enriched at the center of PBase binding sites. (a) Distribution of PBase binding sites relative to the TTAA sites. Other four-nucleotide sequences served as controls. (b) TTAA was enriched at the center of PBase binding sites even without the presence of the PB transposon. (c) The non-random distribution of both PBase binding sites and (d) TTAA sites in the mouse genome. The mouse genome was divided into 5Mb bins. The number of PBase binding sites or TTAA sites in each bin was calculated. Both distributions were significantly different from the Poisson distribution (χ2 test, P < 0.0001).

For those peaks located within 25bp of TTAA sites, 50.21% (306,471) were from the PB- group, while 49.79% (303,934) were from the PB+ group. There was no significant difference between the distribution of binding sites from the PB+ and PB- groups relative to TTAA sites (χ2 analysis, p = 0.8714) (Figure 2b). These data indicated that even without the presence of the PB transposon, PBase already has the ability to bind TTAA sites.

PBase binding site distribution follows that of TTAA sites in the genome, but not gene density or transcriptional activity levels

Global analysis of PBase binding sites showed that the sites spread universally across the whole mouse genome, but are not strictly randomly (Poisson) distributed (χ2 analysis, p < 0.0001) (Figure 2c). This might be partially due to the non-random distribution of TTAA sites in the mouse genome (χ2 analysis, p < 0.0001) (Figure 2d). The distribution of PB insertion sites during large-scale mutagenesis in the D. melanogaster genome is also not random 8. We found that TTAA sites are likewise not randomly distributed in the D. melanogaster genome (χ2 analysis, p < 0.0001). Further analysis of the distribution of PBase binding sites in the mouse genome showed that the PBase binding sites had no preference for any particular chromosome. The number of binding sites on each chromosome was highly correlated with the chromosome length (CC = 0.9800) and TTAA density (CC = 0.9682) (Figure 3a), but not correlated with the gene density (CC = 0.1613). These data suggest that, unlike P-element and retroviruses, PBase binding followed TTAA, but not transcription activity.
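One way to probe whether binned site counts depart from a Poisson expectation is sketched below. The text does not specify how the χ2 analysis was set up, so the variance-to-mean dispersion statistic used here is a swapped-in, commonly used alternative, and the function names and toy counts are assumptions. Under a Poisson model the statistic approximately follows a χ2 distribution with n - 1 degrees of freedom; computing the p-value is left to a statistics library.

```python
def bin_counts(positions, genome_length, bin_size):
    """Count positions falling into consecutive fixed-size bins."""
    n_bins = -(-genome_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        counts[pos // bin_size] += 1
    return counts


def dispersion_statistic(counts):
    """Return (variance/mean index, chi-square statistic, degrees of freedom).

    An index near 1 is what a Poisson process would give; values well
    above 1 indicate clustering of sites in some bins.
    """
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    index = variance / mean
    return index, (n - 1) * index, n - 1


# Hypothetical toy counts: perfectly even bins versus strongly clumped bins.
print(dispersion_statistic([10, 10, 10, 10]))
print(dispersion_statistic([0, 20, 0, 20]))
```

In the setting described above, the bin size would be 5 Mb and the counts would come from the PBase binding sites or from the TTAA sites.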
Figure 3

Global distribution of PBase binding sites followed that of TTAA sites. (a) The distribution of PBase binding sites in the whole mouse genome at the chromosome level. The density of PBase binding sites on each chromosome was highly correlated with the length of the chromosome and the density of TTAA sites. (b) The distribution of PBase binding sites or (c) PB insertion sites in different transcriptional elements. High correlation was detected between the density of PBase binding sites on a gene and (d) the density of TTAA sites or (e) the length of the gene. (f) The expression level of a gene had no correlation with PBase binding. BT: binding sites to TTAA; BL: binding sites to length; IT: insertion sites to TTAA; IL: insertion sites to length.

To further explore the potential relationship between PBase binding and transcriptional activity, we performed two more analyses. Previously, P-element and retrovirus insertions were shown to exhibit a strong preference for the 5' region upstream of the transcription start sites 2, 3, 7-13, 40, 46. We did not detect a strong 5' upstream preference for PBase binding. Our data showed that only 0.72% of PBase binding sites were located in the 5' region within 1000bp upstream of transcription start sites (Figure 3b). Unlike the dramatically distorted distributions of P-element insertions (73%) in the D. melanogaster genome 40 and retroviruses (8%-20%) in the human genome 46, the distribution of PBase binding sites in the 5' region within 1000bp upstream of the transcription start site, exonic and intronic regions (5': 0.72%, exon: 5.36%, intron: 42.11%) is largely correlated with the length (5': 0.37%, exon: 2.36%, intron: 33.89%, CC=0.9653) or the TTAA density (5': 0.37%, exon: 1.86%, intron: 39.73%, CC=0.9926) of these regions in the mouse genome (Figure 3b). Similar distribution patterns were also observed by analyzing our 5,248 PB insertion sites in mouse mutant strains generated in germline transposition experiments (5': 1.39%; exon: 2.93%, intron: 47.18%, CC=0.9434) (Figure 3c). The strong 5' preference of P-element and retrovirus insertion has been attributed to the accessibility of the open chromatin regions that correlate with transcriptional activity 4, 9. Our data for PB suggest that PB does not rely on transcriptional activity. We thus checked whether transcription levels affect PBase binding in the mouse genome. We obtained the microarray expression data of 29,106 transcripts from mouse ES cells 47. No correlation between the expression level of the transcripts and the density of PBase binding sites was detected (CC = -0.1014) (Figure 3f).
On the other hand, a strong correlation between the number of TTAA sites in a gene (or the length of a gene) and the density of PBase binding sites on a gene was detected (CC = 0.8756 / 0.8780) (Figure 3d, 3e). Together these results indicate that PBase binding largely depends on the TTAA sites rather than a region's transcriptional activity status.
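The correlation coefficients (CC) quoted in this section can be illustrated with a short sketch. The text does not state which coefficient was used; the code assumes the Pearson product-moment coefficient, and the per-gene numbers are invented toy values rather than data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical per-gene values: TTAA counts, binding-site counts, expression.
ttaa_per_gene = [12, 40, 75, 130, 210]
binding_per_gene = [3, 9, 20, 31, 52]
expression = [8.1, 2.3, 9.7, 1.2, 5.5]

print(pearson(ttaa_per_gene, binding_per_gene))
print(pearson(expression, binding_per_gene))
```

The same function applies at the chromosome level, for example when comparing binding-site counts with chromosome length or TTAA density.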

Distribution of PBase binding is influenced by methylation of CpG

While the global distribution of PBase binding sites generally followed that of TTAA sites in the genome, we noticed that there are regional distortions of this distribution (Figure 4a), suggesting that other local factors besides TTAA might influence PBase binding. Further analysis showed that those regions in which the density of PBase binding sites was much lower and not in proportion to TTAA density were most probably highly methylated CG regions (Figure 4a, 4d). These trends suggested that CG methylation has a negative influence on PBase binding. To confirm this, we divided all the CG sites into 3 groups according to methylation levels. The highly methylated CG sites were significantly farther away from PBase binding sites than the sites with low methylation (t test, p<0.001) (Figure 4b), which indicates that highly methylated CG sites prevent PBase from binding nearby. This phenomenon was not an artifact of TTAA distribution because TTAA sites have an opposite distribution trend in comparison to CG sites (Figure 4c). Consistent with this, when we only analyzed the CG regions with low methylation (methylation level < 0.1), the density of PBase binding sites was correlated with the density of TTAA sites (CC=0.7741) (Figure 4e). These observations together indicate that CG methylation of target DNA has an inhibitory effect on PBase binding.
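The grouping and comparison of CG sites described above can be sketched as follows. Only the low-methylation cutoff of 0.1 appears in the text; the high cutoff of 0.9, the choice of Welch's unequal-variance form of the t statistic, the function names, and the toy values are assumptions for illustration.

```python
import math
from statistics import mean, variance


def group_by_methylation(cg_sites, low_cutoff=0.1, high_cutoff=0.9):
    """Split (methylation_level, distance_to_binding_site) pairs into 3 groups.

    low_cutoff follows the text; high_cutoff is a hypothetical choice.
    """
    groups = {"low": [], "medium": [], "high": []}
    for methylation, distance in cg_sites:
        if methylation < low_cutoff:
            groups["low"].append(distance)
        elif methylation > high_cutoff:
            groups["high"].append(distance)
        else:
            groups["medium"].append(distance)
    return groups


def welch_t(a, b):
    """Welch's t statistic for two samples, each with at least 2 values."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical toy data: (methylation level, distance in bp to nearest binding site).
toy_cg_sites = [(0.02, 40), (0.05, 60), (0.5, 120), (0.95, 300), (0.99, 340)]
toy_groups = group_by_methylation(toy_cg_sites)
print(toy_groups)
print(welch_t(toy_groups["high"], toy_groups["low"]))
```

A positive statistic here means the highly methylated group sits farther from binding sites on average, which is the direction of effect reported in the text.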
Figure 4

PBase binding was inhibited by the methylation of genomic DNA. (a) A chromosome view of the PBase binding site density, TTAA site density and CG methylation level of chromosome X. PBase binding was affected in the highly methylated CG regions. (b) Relative distribution of PBase binding sites to CG sites. CG sites were divided into 3 groups based on the methylation level. PBase tended to bind near the CG sites with low methylation. This is not caused by the distribution of the TTAA sites, which showed an opposite trend (c). (d) Regions with the lowest PBase binding in the whole genome were more likely to be highly methylated regions (black dots). (e) The correlation coefficient between PBase binding site density and TTAA site density increased when the highly methylated regions were removed.

Discussion

It has been proposed that integration of DNA transposons and retroviruses into the genome depends on active transcription. The piggyBac transposon has been shown previously to target the TTAA motif for integration. However, it is not clear whether the PB transposon has a preference for TTAA sites in actively transcribed regions in the genome. To define the target selection profile for piggyBac, we introduced a Myc-tagged and functional PBase into mouse ES cells and were able to generate a genome-wide map of PBase binding sites in the mouse genome. We found that PBase binding sites spread evenly in the mouse genome depending on the distribution of TTAA sites, but did not correlate with gene density or gene expression levels. The insertion sites from a large germline mutagenesis screen are consistent with the results of our PBase binding site analysis. While retroviruses and P-element have a strong bias for transcriptionally active regions, the PBase binding profile showed that PBase binding selection does not prefer transcriptionally active regions, but rather largely depends on the distribution patterns of TTAA sites. Our data therefore suggest a transposition mechanism different from that of retroviruses and P-element. It has been shown that integrases from retroviruses interact with cellular proteins or co-factors that are bound to transcriptionally active regions. This could give these integrases greater access to open chromatin or transcriptionally active regions. The lack of preference for transcriptionally active regions by PBase suggests that it might not interact with cellular proteins that bind to transcriptionally active regions. Furthermore, TTAA is a short sequence and is widely distributed throughout the genomes of different organisms. This could explain the broader genome-wide distribution patterns of PB and its capacity to transpose in a wide range of hosts. PB's unique target selection profile makes it an ideal tool for genome-wide mutagenesis.
We also detected a negative influence of genomic methylation on PB's target selection, which is similar to the previous finding that the methylation of PB transposon DNA itself inhibits transposition 19. Therefore, methylation could be a mechanism that silences PB. While this feature could affect PB mutagenesis for somatic cells or in tissue culture cells, its germline mutagenesis capacity should not be affected as the genome of the germ line cells is largely unmethylated 48. In summary, PB's distinctive and transcription-independent target selection profile suggests a transposition mechanism different from retroviruses and P-element. Future studies exploring the interaction between the transposases and host factors could provide molecular mechanisms that contribute to this difference.

Materials and Methods

Plasmid Construction

Act-PBase-3×Myc was constructed as follows: the coding sequence of 3×Myc was PCR amplified from pIND-3×Myc with primer1 (5'-TCAACGAAAGTACCGGTAAACC-3') and primer2 (5'-ATAGTATAGCGGCCGCCTTGTACTCGGAAACAA-3'), and cloned into the Not I and Age I sites of Act-PBase 18 to generate Act-PBase-3×Myc.

Mouse ES Cell Transfection

W4/129S6 mouse ES cells (Taconic) were used for the ChIP-chip assay to detect PBase binding sites in the mouse genome in this study. The conditions for culture and electroporation of ES cells were described in the manufacturer-recommended protocols (Taconic). 30μg circular Act-PBase-3×Myc in the PB- group or 30μg Act-PBase-3×Myc plus 5μg PB[SV40-neo] 18 in the PB+ group were used for electroporation of 10⁷ cells. After electroporation, cells were seeded onto a 10 cm dish containing mitomycin C-treated mouse embryonic fibroblast feeder cells. ES cells were then harvested for ChIP after a 48h incubation period.

Transposition efficiency test

20μg circular PB[SV40-neo] and 10μg Act-PBase-3×Myc (or Act-PBase) were used for electroporation. After 48h incubation, geneticin (G418) was added to each dish at a final concentration of 500μg/ml to select neo-resistant clones. The geneticin-containing medium was changed every day. After 7 days of selection, cells were fixed with PBS containing 4% paraformaldehyde for 10 minutes and then stained with 0.2% methylene blue for an hour. Cell clones were counted after washing with deionized water.

ChIP-chip

ES cells were harvested and crosslinked with 1% formaldehyde 48h after transfection. Sonication was performed with a Sonics® VC 130 Vibra cell™ at 35 amplitude, with 10 cycles of 30-second pulses and 1 minute rest, to shear the DNA to 100-1000 bp fragments. Both pulsing and resting steps were performed in an ice bath. ChIP-chip experiments were performed according to the Affymetrix® Chromatin Immunoprecipitation Assay Protocol (P/N 702238). A total of 10⁸ ES cells were used per immunoprecipitation (IP) with 15μg anti-Myc antibody (SC-789X, Santa Cruz). The GeneChip® Mouse Tiling 2.0R Array Set (P/N 900852), a whole mouse genome tiling array, was used for the DNA analysis.

Identification of PBase Binding Sites

The ChIP-chip data were first normalized using quantile normalization. Then the peak finding algorithm 43 was applied to identify PBase binding sites. There are two main steps in this algorithm. First, identification of the binding region. Candidate binding regions should satisfy the following thresholds: (i) the region should contain at least 4 probes with significantly higher signal intensity than the background; and (ii) the distance between neighboring probes with significantly higher signal in the region should be no more than 500bp. Second, identification of candidate binding sites from the binding region. For each binding region identified in the previous step, a double standard linear regression was performed, which fits neighboring signals to asymmetric triangles centered on candidate binding sites. In total, we defined 2,396,017 PBase binding sites in the whole mouse genome (Table S1). This program was implemented using the Java programming language.

Supplementary material: Figure S1 and Table S1.
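The first step of the peak finding procedure in the methods above, grouping significant probes into candidate binding regions, can be sketched as follows. This is an illustrative Python reimplementation of the two stated thresholds only (at least 4 significant probes, neighboring significant probes no more than 500 bp apart); the original program was written in Java, and the function name, input format, and toy coordinates are assumptions. How probes are called significant and the triangle-fitting step are not covered.

```python
def candidate_regions(significant_positions, min_probes=4, max_gap=500):
    """Group significant probe coordinates into candidate binding regions.

    Returns a list of (start, end, probe_count) tuples for every run of
    probes in which consecutive probes are at most `max_gap` bp apart and
    which contains at least `min_probes` probes.
    """
    regions = []
    current = []
    for pos in sorted(significant_positions):
        # A gap larger than max_gap closes the current run of probes.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_probes:
                regions.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    # Do not forget the final run.
    if len(current) >= min_probes:
        regions.append((current[0], current[-1], len(current)))
    return regions


# Hypothetical probe coordinates on one chromosome.
toy_probes = [100, 300, 600, 1000, 2000, 2100, 5000, 5200, 5400, 5600, 5800]
print(candidate_regions(toy_probes))
```

In the toy input, the two probes at 2000 and 2100 form a run that is too short to qualify, so only the first and last runs are reported as candidate regions.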
  48 in total

1.  Genome-wide location and function of DNA binding proteins.

Authors:  B Ren; F Robert; J J Wyrick; O Aparicio; E G Jennings; I Simon; J Zeitlinger; J Schreiber; N Hannett; E Kanin; T L Volkert; C J Wilson; S P Bell; R A Young
Journal:  Science       Date:  2000-12-22       Impact factor: 47.728

2.  Role of the non-homologous DNA end joining pathway in the early steps of retroviral infection.

Authors:  L Li; J M Olvera; K E Yoder; R S Mitchell; S L Butler; M Lieber; S L Martin; F D Bushman
Journal:  EMBO J       Date:  2001-06-15       Impact factor: 11.598

Review 3.  Use of the piggyBac transposon for germ-line transformation of insects.

Authors:  Alfred M Handler
Journal:  Insect Biochem Mol Biol       Date:  2002-10       Impact factor: 4.714

4.  Gene transfer efficiency and genome-wide integration profiling of Sleeping Beauty, Tol2, and piggyBac transposons in human primary T cells.

Authors:  Xin Huang; Hongfeng Guo; Syam Tammana; Yong-Chul Jung; Emil Mellgren; Preetinder Bassi; Qing Cao; Zheng Jin Tu; Yeong C Kim; Stephen C Ekker; Xiaolin Wu; San Ming Wang; Xianzheng Zhou
Journal:  Mol Ther       Date:  2010-07-06       Impact factor: 11.454

5.  piggyBac is a flexible and highly active transposon as compared to sleeping beauty, Tol2, and Mos1 in mammalian cells.

Authors:  Sareina Chiung-Yuan Wu; Yaa-Jyuhn James Meir; Craig J Coates; Alfred M Handler; Pawel Pelczar; Stefan Moisyadi; Joseph M Kaminski
Journal:  Proc Natl Acad Sci U S A       Date:  2006-09-27       Impact factor: 11.205

6.  ChIP-chip: data, model, and analysis.

Authors:  Ming Zheng; Leah O Barrera; Bing Ren; Ying Nian Wu
Journal:  Biometrics       Date:  2007-09       Impact factor: 2.571

7.  Assay for movement of Lepidopteran transposon IFP2 in insect cells using a baculovirus genome as a target DNA.

Authors:  M J Fraser; L Cary; K Boonvisudhi; H G Wang
Journal:  Virology       Date:  1995-08-20       Impact factor: 3.616

8.  Genome-wide analysis of chromosomal features repressing human immunodeficiency virus transcription.

Authors:  M K Lewinski; D Bisgrove; P Shinn; H Chen; C Hoffmann; S Hannenhalli; E Verdin; C C Berry; J R Ecker; F D Bushman
Journal:  J Virol       Date:  2005-06       Impact factor: 5.103

9.  Transcription start regions in the human genome are favored targets for MLV integration.

Authors:  Xiaolin Wu; Yuan Li; Bruce Crise; Shawn M Burgess
Journal:  Science       Date:  2003-06-13       Impact factor: 47.728

10.  Genome-wide target profiling of piggyBac and Tol2 in HEK 293: pros and cons for gene discovery and gene therapy.

Authors:  Yaa-Jyuhn J Meir; Matthew T Weirauch; Herng-Shing Yang; Pei-Cheng Chung; Robert K Yu; Sareina C-Y Wu
Journal:  BMC Biotechnol       Date:  2011-03-30       Impact factor: 2.563
