Sébastien Viollet (1), Clément Monot (1), Gaël Cristofari (1)
(1) INSERM, U1081, Institute for Research on Cancer and Aging of Nice (IRCAN), Nice, France; CNRS, UMR 7284, Institute for Research on Cancer and Aging of Nice (IRCAN), Nice, France; University of Nice-Sophia Antipolis, Faculty of Medicine, Nice, France.
Abstract
LINE-1 (L1) elements are the only active and autonomous transposable elements in humans. The core retrotransposition machinery is a ribonucleoprotein particle (RNP) containing the L1 mRNA, with endonuclease and reverse transcriptase activities. It initiates reverse transcription directly at genomic target sites upon endonuclease cleavage. Recently, using a direct L1 extension assay (DLEA), we systematically tested the ability of native L1 RNPs to extend DNA substrates of various sequences and structures. We deduced from these experiments the general rules guiding the initiation of L1 reverse transcription, referred to as the snap-velcro model. In this model, L1 target choice is not only mediated by the sequence specificity of the endonuclease, but also through base-pairing between the L1 mRNA and the target site, which permits the subsequent L1 reverse transcription step. In addition, L1 reverse transcriptase efficiently primes L1 DNA synthesis only when the 3' end of the DNA substrate is single-stranded, suggesting so-far unrecognized DNA processing steps at the integration site.
L1 Elements are Endogenous Mutagens in the Human Genome
Transposable elements account for half to two-thirds of the human genome. Among them, LINE-1 (L1) non-LTR retrotransposons form the only autonomous and active family and are the most abundant, representing 17% of our DNA. Each individual genome contains hundreds of potentially active L1 copies, and hundreds of thousands of defective copies, which are truncated, fragmented, and/or mutated. The active copies can proliferate via an RNA-mediated copy-and-paste mechanism called retrotransposition. L1 insertions are intrinsically mutagenic; however, their actual impact on gene expression depends on their specific site of integration. Intergenic or deep intronic insertions often have no detectable effect on genes. In contrast, insertions in exons or regulatory sequences have the potential to profoundly alter gene expression or function, by disrupting coding or cis-regulatory sequences, or by introducing cis-regulatory sequences of their own (transcription factor binding sites, cryptic splicing and polyadenylation sites, etc.). Hence, germline L1 insertions sporadically cause de novo genetic diseases, and somatic L1 retrotransposition in cancer has been shown to contribute to tumor genome dynamics, including driver mutations. Therefore, exploring the mechanisms that influence L1 target choice is crucial to our understanding of L1-driven genome plasticity.
L1 Retrotransposition can be Initiated through Two Different Pathways
The L1 replication cycle starts with the synthesis of a bicistronic mRNA coding for the two L1 proteins, ORF1p and ORF2p (Fig. 1). ORF1p is a 40 kDa RNA-binding protein able to form trimers. It exhibits nucleic acid chaperone activity, the function of which has not been elucidated. ORF2p is a large (149 kDa) protein with endonuclease (EN) and reverse transcriptase (RT) activities. Both ORF1p and ORF2p bind the L1 RNA to form a stable ribonucleoprotein particle (RNP), the core of the L1 retrotransposition machinery. The L1 RNP can mediate two different integration processes. In the canonical pathway, called target-primed reverse transcription (TPRT), the L1 EN activity produces a nick at the recognized target site in the chromosomal DNA. The RT moiety then extends the liberated 3′-OH group, using the L1 RNA as a template. Reverse transcription is primed within the poly(A) tail of the L1 RNA. L1 EN preferentially cuts DNA at the consensus sequence 5′-TTTTA-3′, with nicking occurring at the TpA bond. In an alternative pathway, named endonuclease-independent (ENi) retrotransposition or non-classical L1 insertion (NCLI), reverse transcription is initiated at pre-existing DNA lesions, without the need for endonuclease cleavage. A particular case of this pathway is retrotransposition at telomeres, the natural extremities of chromosomes. Regardless of whether a particular retrotransposition event is initiated by EN or not, the subsequent steps of the reaction, such as second-strand DNA synthesis or ligation of the 3′ ends of the newly synthesized DNA to the target DNA, have not yet been explored.
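As a minimal sketch of the EN site preference described above, the following Python function scans one strand of a DNA sequence for the 5′-TTTTA-3′ consensus and reports where the TpA nick would fall. The function name and the return convention are illustrative assumptions, not part of any published tool. It matches only the exact consensus on the strand it is given, whereas the sites actually cleaved by the L1 EN can deviate from it.

```python
import re

def find_consensus_nicks(strand: str) -> list[int]:
    """Return nick positions for exact 5'-TTTTA-3' matches on one strand.

    A nick position p means the cut falls between strand[p-1] (the last T)
    and strand[p] (the A), so strand[:p] ends with a 3'-OH ...TTTT.
    Hypothetical helper for illustration only.
    """
    seq = strand.upper()
    # The motif cannot overlap itself, so a plain scan finds every match.
    return [m.start() + 4 for m in re.finditer("TTTTA", seq)]

target = "GGCTTTTAGGC"
nicks = find_consensus_nicks(target)
# Sequence upstream of the first nick: the strand whose 3' end would be
# offered to the reverse transcriptase as a primer.
released_3prime_end = target[:nicks[0]]
```

Here `released_3prime_end` ends with four Ts, which is the stretch that could in principle anneal to the poly(A) tail of the L1 RNA.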
Figure 1. The L1 life-cycle. L1 replication starts with the transcription of a bicistronic mRNA (A). The L1 RNA is exported to the cytoplasm (B). ORF1p and ORF2p proteins are translated and bind to the L1 RNA to form L1 ribonucleoprotein particles (RNP) (C). The L1 RNP is imported into the nucleus (D). Integration and reverse transcription occur at the genomic target site. First, the L1 endonuclease (EN) activity nicks the target DNA (red arrowhead, E). Then, the L1 reverse transcriptase (RT) initiates the reverse transcription of L1 RNA (black arrowhead, F). The mechanisms involved in the final steps of this process and in the resolution of the integration are not yet understood (G). Partial reverse transcription can lead to 5′-truncated L1 copies.
The Snap-Velcro Model of L1 Reverse Transcription Initiation
Recently, we explored the mechanism of L1 reverse transcription initiation using a direct L1 extension assay (DLEA). In this approach, native L1 RNPs expressed in, and enriched from, human cells are incubated with oligonucleotide primers of various sequences or structures, and with radioactive dTTP only, for a very short time (less than 5 min). The products are then resolved on sequencing gels to directly visualize the extension of the primer. Because of the short incubation time and the use of dTTP only, the assay focuses on the initiation of reverse transcription. Advantages of this method, compared with previous PCR-based techniques, include its versatility with regard to the primers that can be used and its quantitative nature. Limitations of DLEA are the absence of sequence information and its lower sensitivity.
One of the unresolved questions related to L1 reverse transcription priming was whether, and to what degree, the 3′ end of the nicked genomic DNA needs to be accessible and to base-pair with the poly(A) tail of the L1 RNA. Indeed, R2, a related non-LTR retrotransposon which has been used to establish the basis of the TPRT model, does not require such complementarity. Although the consensus sequence released upon L1 EN cleavage (5′-TTTT-3′) could in principle anneal to the poly(A) tail of the L1 RNA, it is extremely short to maintain a stable interaction, and the actual sequences cleaved by the L1 EN can differ significantly from the consensus sequence. To directly address this question, we quantified the efficiency of extension of more than 65 primers by DLEA. Based on the results of these experiments, and on additional analyses of the distribution of polymorphic L1 insertions in the human genome, we proposed the snap-velcro model for L1 reverse transcription initiation. This model is detailed in Figure 2.
The efficiency of reverse transcription initiation is influenced by the last 10 nucleotides of the target DNA. The last 4 nucleotides (the snap) contribute the most to this process. Reverse transcription priming is most efficient when the snap corresponds to 4 Ts (snap closed). However, suboptimal sequences with terminal or internal mismatches can be tolerated (snap open). For terminal mismatches, the efficiency of extension depends on the nature of the base ending the primer (T > C > A > G). These suboptimal sequences can be more efficiently extended if mismatches are compensated by an increased number of matching Ts in the upstream 6 nt (velcro strap fastened). Finally, an important aspect of our results is that priming only occurs if the 3′ end of the DNA substrate is single-stranded. Indeed, double-stranded DNA substrates are extended only if they end with a 3′ overhang. If the 3′ extremity of the target DNA is embedded in duplex DNA, either as a blunt or as a 3′-recessed end, no extension was detected under the DLEA conditions employed.
Figure 2. Features of the snap-velcro model of L1 reverse transcription priming. (A) Reverse transcription priming only occurs if the 3′ end of the DNA substrate is single-stranded. (B) Reverse transcription priming requires base-pairing between the poly(A) tail of the L1 RNA (pink) and the target-site DNA (green). The snap (bold green) corresponds to the last 4 nucleotides at the 3′ end of the DNA primer. The velcro (light green) contains the 6 bases upstream of the snap. The snap is considered closed if all 4 nucleotides are T. The velcro is tightly fastened if the position-weighted T-density is greater than 0.5 (see ref. 28 for the detailed numerical model). The snap-velcro status predicts the efficiency of L1 reverse transcription priming (green arrow).
Consequences of the Snap-Velcro Model Related to Primer-Template Sequence Match
The snap-velcro model indicates that complementarity between the L1 poly(A) tail and the last 10 nucleotides of the target DNA is important for priming of L1 reverse transcription, yet allows sufficient flexibility to accommodate a wide range of potential target sites. It explains a long-standing observation that L1 elements are often flanked by imperfect T-rich sequences significantly longer than expected for the recognition site of the L1 endonuclease. This model also implies that L1 target-site selection relies not only on the sequence specificity of the EN nicking reaction, but also on the subsequent ability of the RT to efficiently extend the cleaved product. This observation has practical and technological implications. Non-LTR retrotransposons fall into two main classes: i) the stringent elements, which always insert into discrete and defined genomic locations with high sequence specificity (such as R1, R2, Tx1L, SART1, Tras1), and ii) more promiscuous elements, which insert into multiple locations within a very short and degenerate sequence (such as L1). EN domain swapping between two stringent retrotransposons is sufficient to exchange their respective target site selectivity both in vitro and in vivo. Similarly, structure-driven domain swapping between a stringent EN and the L1 EN moiety has succeeded in modifying its target selectivity in vitro. However, such EN variants, when reintroduced into a complete L1 element and tested in vivo, are unable to redirect L1 insertions to altered target sites. Instead, they continue to insert in T-rich stretches, indicating that other determinants downstream of the initial EN cleavage contribute to L1 target site selection. Our results suggest that L1 RT priming specificity could be one of these determinants. Therefore, engineering L1 to achieve site-specific integration in vivo might require taking into account both EN and RT specificities.
Consequences of the Snap-Velcro Model Related to the Accessibility of the Target DNA
A second important feature of the snap-velcro model is the requirement for a single-stranded 3′ overhang to initiate reverse transcription at the target site. The original TPRT model stipulates that retrotransposition is initiated by a nick in the target DNA. How does the 3′-OH extremity of the substrate, which is embedded in the duplex DNA, become available to the L1 RT? We can envision several possibilities. First, the L1 EN could produce double-stranded staggered cuts rather than nicks. In vitro, plasmid DNA can indeed be linearized upon prolonged incubation with an isolated recombinant EN domain. Whether L1 ORF2p acts as a monomer or a multimer in the context of the L1 RNP is unknown. However, many other reverse transcriptases form dimers, including the R2 RT and human telomerase. Dimerization of ORF2p could lead to concomitant cleavage of bottom and top strands, while holding the two target site DNA extremities together (Fig. 3A). Of note, the average length of the target-site duplication, which reflects the distance between the top and bottom strand cuts, is 15 nucleotides, a length compatible with the minimal size of the single-stranded region (6 nt). A second hypothetical mechanism involves a strand-transfer or a DNA helicase activity (Fig. 3B). Although ORF1p has been proposed to perform this task through its nucleic acid chaperone activity, the presence of this protein in native L1 RNPs was not sufficient under our experimental conditions to prime reverse transcription with duplex DNA substrates. Recent efforts have identified a number of cellular proteins associated with the L1 retrotransposition machinery. Among them, Upf1, an RNA- and DNA-dependent 5′-3′ helicase implicated in nonsense-mediated mRNA decay and in telomeric DNA replication, is a serious candidate. It is noteworthy that our attempts to extend duplex DNA with L1 RNPs were performed in the absence of ATP, an essential cofactor of helicase activities. Thus, to explore the mechanism allowing annealing of the target site DNA to the L1 poly(A) tail, follow-up DLEA experiments could be performed with ATP included. Finally, unlike a TPRT reaction, DLEA uncouples EN cleavage and RT priming. Thus we cannot exclude that, in vivo, the product of the nicking reaction is somehow channeled to the RT active site, being unwound in the process.
Figure 3. Hypothetical mechanisms allowing L1 RNA base-pairing with target-site DNA. (A) ORF2p dimerization leads to simultaneous staggered cuts through its EN activity. The resulting extremities have 3′ overhangs, which can anneal to the L1 RNA and prime L1 cDNA synthesis. (B) L1 EN initially makes a single cut, but a DNA-dependent helicase unwinds the target site DNA strands, enabling L1 cDNA synthesis. (C) Upon double-strand DNA break, DNA repair factors resect the ends and generate 3′ overhangs. Unlike EN sites, these new extremities do not necessarily end with Ts. Consequently, base-pairing generally occurs at internal sites within the L1 RNA which show spurious matches with the damaged site. (D) Telomeres naturally end with 3′ overhangs. Red arrowhead, EN cut; green, cDNA; pink, L1 RNA.
From this perspective, DLEA reactions are more similar to the reverse transcription taking place in the ENi pathway. The nature of the preexisting lesions used by L1 RT to prime reverse transcription is still unclear, in particular whether they are single- or double-stranded. Telomeres are frequent integration sites in this pathway, and notably they end with a 3′ overhang, which is also the preferred substrate of telomerase (Fig. 3D). Another interesting observation is that ENi retrotransposition only occurs efficiently in cells defective for both p53 and non-homologous end-joining (NHEJ). Impairing NHEJ results in end-resection and extremities with 3′ overhangs, which ultimately leads to homologous recombination-mediated repair. We speculate that NHEJ inhibition could be necessary to permit the generation of these 3′ overhangs and thereby reverse transcription priming in the ENi pathway (Fig. 3C). Possible end-processing factors include the MRN complex (Mre11, Rad50, Nbs1), CtIP, Exo1, BLM or Dna2. Poly(ADP-ribose) polymerase 1 (PARP-1) was shown to interact with both ORF2p and Mre11. Therefore, it could potentially recruit both end-resection factors and the L1 machinery to sites of DNA damage.
Because DLEA uses native L1 RNPs enriched from cells, it can be combined with shRNA-mediated knock-down of specific cellular factors. This approach will help in the future to elucidate the role of cellular factors in retrotransposition.