John B Moldovan1, Yifan Wang2,3, Stewart Shuman4, Ryan E Mills2,3, John V Moran1,5. 1. Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109; jmoldova@umich.edu moranj@umich.edu. 2. Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109. 3. Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109. 4. Molecular Biology Program, Sloan Kettering Institute, New York, NY 10065. 5. Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI 48109.
Abstract
Long interspersed element-1 (LINE-1 or L1) amplifies via retrotransposition. Active L1s encode 2 proteins (ORF1p and ORF2p) that bind their encoding transcript to promote retrotransposition in cis. The L1-encoded proteins also promote the retrotransposition of short interspersed element RNAs, noncoding RNAs, and messenger RNAs in trans. Some L1-mediated retrotransposition events consist of a copy of U6 RNA conjoined to a variably 5'-truncated L1, but how U6/L1 chimeras are formed requires elucidation. Here, we report the following: The RNA ligase RtcB can join U6 RNAs ending in a 2',3'-cyclic phosphate to L1 RNAs containing a 5'-OH in vitro; depletion of endogenous RtcB in HeLa cell extracts reduces U6/L1 RNA ligation efficiency; retrotransposition of U6/L1 RNAs leads to U6/L1 pseudogene formation; and a unique cohort of U6/L1 chimeric RNAs is present in multiple human cell lines. Thus, these data suggest that U6 small nuclear RNA (snRNA) and RtcB participate in the formation of chimeric RNAs and that retrotransposition of chimeric RNA contributes to interindividual genetic variation.
Long interspersed element-1 sequences (LINE-1s or L1s) comprise ∼17% of human DNA and have mobilized by a replicative process termed retrotransposition (1, 2). Most L1s are immobile (3–5); however, an average human genome harbors ∼80 to 100 retrotransposition-competent L1s (RC-L1s) (6–8). Human RC-L1s are ∼6 kilobases (kb) and contain a 5′ untranslated region (UTR) that is followed by 2 open reading frames (ORFs) (ORF1 and ORF2) and a short 3′ UTR that ends in a variable length poly-adenosine [poly(A)] tract (4, 9). ORF1 encodes an ∼40-kDa nucleic acid binding protein (ORF1p) (10–14) that has nucleic acid chaperone activity (14, 15). ORF2 encodes an ∼150-kDa protein (ORF2p) (16–18) that has DNA endonuclease (L1 EN) (19) and reverse transcriptase (L1 RT) (20, 21) activities. ORF1p, ORF2p, and full-length polyadenylated L1 RNA are required for efficient L1 retrotransposition in cis (19, 22, 23).
L1 retrotransposition begins with the transcription of a full-length genomic L1 from an RNA polymerase II promoter that resides within its 5′ UTR (24–26). The bicistronic L1 messenger RNA (mRNA) is exported to the cytoplasm where it undergoes translation (27–29). ORF1p and ORF2p preferentially bind their encoding L1 mRNA by a process termed cis-preference (30, 31), leading to the formation of an L1 ribonucleoprotein particle (RNP) (10, 13, 17, 32, 33). The association of ORF2p with the L1 mRNA poly(A) tail is a critical step in both L1 RNP formation and L1 retrotransposition (23). Components of the L1 RNP then enter the nucleus where a new L1 copy is integrated into genomic DNA by target site primed reverse transcription (TPRT) (19, 34, 35).
U6 small nuclear RNA (snRNA) is a uridine-rich small noncoding RNA that plays an essential role in nuclear intron splicing (36–38). U6 snRNA is the most conserved spliceosomal snRNA (39), is transcribed by RNA polymerase III (40), and has structural and functional similarities to domain V of self-splicing group II introns (41–44). 
The major form of U6 snRNA terminates in a 5-base poly-uridine [poly(U)] tract that ends in a terminal 2′,3′-cyclic phosphate (45). The 2′,3′-cyclic phosphate is generated posttranscriptionally by the Mpn1 enzyme (46, 47), which is encoded by the U6 snRNA biogenesis phosphodiesterase 1 (USB1) gene. Deletions or mutations in USB1 disrupt U6 snRNA 3′ end processing (46, 47) and are associated with the human genetic disease poikiloderma with neutropenia (48).
The L1 proteins can act in trans to promote the retrotransposition of a variety of cellular RNAs, including short interspersed element (SINE) RNAs (49–52), noncoding RNAs (53–56), and messenger RNAs (30, 31). L1-mediated retrotransposition events have likely dispersed hundreds of copies of U6 snRNA throughout the human genome (57). Approximately 100 to 200 U6 pseudogenes consist of a copy of U6 fused to either a variably 5′-truncated L1 or a complementary DNA (cDNA) derived from a cellular RNA (53, 55–58), and they contain structural hallmarks that indicate they were formed by L1 retrotransposition [e.g., they end in a poly(A) tail, integrate into an L1 EN consensus cleavage sequence, and are flanked by short, variably sized target-site duplications] (53, 55, 56, 58). Experiments using engineered human L1s suggest that U6/L1 chimeric pseudogenes account for up to 1 out of 15 L1 retrotransposition events in HeLa cells (55), whereas computational analyses strongly suggest that 3 different LINE clades (LINE-1, LINE-2, and RTE) participate in U6/LINE chimeric pseudogene formation (57). Thus, U6/LINE chimeric pseudogene formation appears to be an ancient and ongoing process.
Here, we used genetic, molecular biological, biochemical, and computational approaches to dissect the mechanism of U6/L1 chimeric pseudogene formation. We demonstrate that U6/L1 chimeric RNAs arise independently of L1 retrotransposition and are formed through the ligation of a 2′,3′-cyclic phosphate on the 3′ end of U6 snRNA and a 5′-OH on L1 RNA. 
Biochemical and genetic evidence suggest that the RNA 2′,3′-cyclic phosphate ligase RtcB (59–61) can join U6 RNAs ending in a 2′,3′-cyclic phosphate to both L1 and other mRNAs (e.g., green fluorescent protein [GFP] RNAs) containing a 5′-OH. Finally, we demonstrate that U6/L1 chimeric RNAs are a component of the transcriptome in multiple human cell lines.
Results
U6/L1 RNA Is Generated Independently of L1 Retrotransposition.
Previous hypotheses suggested that U6/L1 chimeric pseudogene formation could occur if ORF2p undergoes a template-switching event from L1 RNA to the 3′ end of U6 snRNA during TPRT (53, 55). However, recent studies have demonstrated that ORF2p exhibits a profound preference for binding, either directly or indirectly, to the 3′ poly(A) tract of L1, Alu, and other cellular RNAs (23, 62), raising the question of how ORF2p would switch templates to an RNA (e.g., U6 snRNA) that ends in a poly(U) tract. We hypothesized that U6 snRNA could be joined to L1 RNA to form a chimeric U6/L1 RNA prior to retrotransposition.
To test the above hypothesis, HeLa-JVM cells were transfected with either a wild-type engineered human L1 expression plasmid (Fig. 1) (pJM101/L1.3Δneo) or human L1 expression plasmids that contain a nonsense mutation in ORF1 (Fig. 1) (pJM108/L1.3Δneo) or missense mutations in the L1 EN and/or L1 RT domain of ORF2p that severely inhibit L1 retrotransposition (Fig. 1) (pJBM119/L1.3Δneo or pJM105/L1.3Δneo) (19, 22, 31). Whole cell RNAs from transfected HeLa cells were subjected to cDNA synthesis using an oligo-dT primer, and the resultant cDNAs were used as templates in nested reverse-transcriptase PCRs (RT-PCRs) using primers complementary to U6 snRNA and the 3′ end of the engineered L1 plasmid (Fig. 1). To ensure specificity, the outer L1 primer (Fig. 1) (SV40as) was complementary to a specific sequence within the engineered L1 expression construct. The RT-PCR products were separated on agarose gels, visible DNA fragments were isolated from gels, and the products were characterized using Sanger sequencing (Fig. 1). RT-PCR products were not detected in control RT-PCR experiments that lacked the reverse transcriptase.
Fig. 1.
Chimeric U6/L1 RNA is generated in HeLa cells transfected with human L1 expression constructs. (A) Schematics of wild-type and mutant L1s. Gray rectangles represent 5′ UTR, inter-ORF space, and 3′ UTR, respectively; yellow rectangle, ORF1; blue rectangle, ORF2. L1s were cloned into the pCEP4 mammalian expression vector. A cytomegalovirus immediate early promoter (CMV, black rectangle) that augments L1 expression and an SV40 polyadenylation signal (light blue rectangle, pA) flank the L1. The pJM101/L1.3Δneo plasmid expresses an active human L1 (L1.3). The pJM108/L1.3Δneo, pJM105/L1.3Δneo, and pJBM119/L1.3Δneo plasmids express versions of L1.3 that contain mutations that render them unable to retrotranspose; the approximate locations of the respective mutations are indicated in the schematic. (B) Rationale of the RT-PCR experiments used to detect U6/L1 chimeric RNAs. HeLa cells were transfected with L1 expression plasmids, total cellular RNA was extracted ∼48 h posttransfection, and cDNAs were synthesized using an oligo-dT primer. Nested PCR was carried out using primers complementary to sequences within U6 and the 3′ end of the L1 construct (U6s1 and SV40as, then U6s2 and 3UTRas3). (C) Results from a representative RT-PCR experiment. The transfected L1 construct is indicated above each lane of the agarose gel image. Each lane contains a single biological replicate. Lane 1, HeLa UTF (untransfected HeLa cells); lanes 2 and 3, HeLa transfected with pJM101/L1.3Δneo; lanes 4 and 5, HeLa transfected with pJM105/L1.3Δneo; lanes 6 and 7, HeLa transfected with pJBM119/L1.3Δneo; lanes 8 and 9, HeLa transfected with pJM108/L1.3Δneo; lanes 10 and 11, H2O PCR controls. Molecular weight standards (in bp) are shown in the first and last gel lanes. At least 3 independent biological replicates were conducted for each transfection condition. (D) Structures of 38 U6/L1 chimeric RNAs found in transfected HeLa cells. 
U6/L1 RNA chimera sequences contain the 3′ terminus of U6 snRNA cDNA (white arrow) ending in ∼4 to 6 thymidine nucleotides (Tn) conjoined to a variably 5′-truncated L1. A schematic of the full-length L1.3 sequence is shown at the top. The horizontal black lines indicate the approximate length of L1 sequence conjoined to the U6 poly(T) tract. The 5′-most U6/L1 junction occurred at L1.3 nucleotide position 4387.
Sequence analyses revealed the presence of U6/L1 chimeric cDNAs in HeLa cells transfected with either the wild-type or mutant L1 expression plasmids (Fig. 1, lanes 2 to 9), but not in independent control experiments using untransfected HeLa cells (Fig. 1, lane 1). The cDNA products typically consisted of the 3′ end of U6 snRNA cDNA ending in ∼4 to 6 thymidines conjoined to a variably 5′-truncated L1 sequence derived from the transfected L1 expression plasmid and closely resembled those of previously characterized genomic U6/L1 pseudogenes (53, 55–58). The constellation of U6/L1 cDNA products generally varied in size between ∼100 and ∼1,000 base pairs, depending upon where the 3′ end of U6 was conjoined to the 5′-truncated L1 sequence. The size and number of cDNA fragments for each transfection condition also varied between independent RT-PCR experiments. Additional analyses revealed that no specific sequences within L1 facilitated U6/L1 chimeric RNA formation (Fig. 1).
BLAT searches (63) revealed that the U6/L1 junction sequences were not present in the human genome reference sequence (HGR/build GRCh38), indicating that they did not arise from the transcription of existing genomic U6/L1 pseudogenes. Notably, atypical U6/L1 chimeric cDNAs consisting of 3′-truncated U6 conjoined to a 5′-truncated L1 also were recovered from these experiments. 
These cDNAs sometimes exhibited microhomologies of 1 to 3 nucleotides (nt) at the U6/L1 cDNA junction and were similar in structure to previously described artifacts encountered in RT-PCR experiments (64). Thus, U6/L1 chimeric RNAs are generated in transfected HeLa cells independently of L1 retrotransposition.
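The junction microhomology mentioned above can be scored computationally. The following is a minimal Python sketch under stated assumptions, not the analysis pipeline used in this study: it assumes a chimeric cDNA read that begins at the 5′ end of the upstream parent (e.g., U6) and ends at the 3′ end of the downstream parent (e.g., L1). The function name `junction_microhomology` and the toy sequences are hypothetical; the sequences are not derived from U6 or L1.3.

```python
def junction_microhomology(read: str, upstream: str, downstream: str) -> int:
    """Return the number of junction bases that match both parents.

    Assumes `read` starts at the 5' end of `upstream` (which may be
    3'-truncated in the read) and ends at the 3' end of `downstream`
    (which may be 5'-truncated in the read).
    """
    # Longest prefix of the read that matches the upstream parent.
    prefix = 0
    while (prefix < min(len(read), len(upstream))
           and read[prefix] == upstream[prefix]):
        prefix += 1

    # Longest suffix of the read that matches the downstream parent.
    suffix = 0
    while (suffix < min(len(read), len(downstream))
           and read[-1 - suffix] == downstream[-1 - suffix]):
        suffix += 1

    # Bases explained by both parents overlap at the junction.
    overlap = prefix + suffix - len(read)
    if overlap < 0:
        raise ValueError("read is not fully explained by the two parents")
    return overlap


# Toy (hypothetical) sequences: the read uses a 3'-truncated upstream parent
# and a 5'-truncated downstream parent that share "GAT" at the junction.
toy_upstream = "ACGTACGGATTTT"
toy_downstream = "TTGATCCGTA"
toy_read = "ACGTACGGATCCGTA"
shared = junction_microhomology(toy_read, toy_upstream, toy_downstream)
```

In this sketch, a score of 0 corresponds to a junction with no shared bases, whereas scores of 1 to 3 correspond to the range of microhomology described for the putative RT-PCR artifacts.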
Purified RtcB Ligates U6 RNA to L1 RNA In Vitro.
We next investigated the mechanism of U6/L1 chimeric RNA formation. The 3′ end of mature U6 snRNA terminates in a 2′,3′-cyclic phosphate (38, 45). During transfer RNA (tRNA) splicing, the tRNA splicing endoribonuclease excises an intron from a subset of tRNA precursor RNAs, generating a 2′,3′-cyclic phosphate and a 5′-OH on the 3′ and 5′ ends of the cleaved tRNA halves, respectively (65). In archaea and animals, the cleaved tRNAs are spliced together by the RNA 2′,3′-cyclic phosphate ligase RtcB (59–61). RtcB joins 2′,3′-cyclic-PO4 and 5′-OH ends via RNA-3′-PO4 and RNA-(3′)pp(5′)G intermediates (66, 67).
We hypothesized that RtcB could ligate U6 snRNA to L1 RNA to generate U6/L1 chimeric RNAs. To test this hypothesis, we used ribozymes to generate a synthetic human U6 snRNA bearing a 2′,3′-cyclic phosphate (herein called U6 > P) and a synthetic L1 RNA fragment containing a 5′-OH (herein called OH-L1). We then used a cDNA oligonucleotide as a splint to ensure that the U6 snRNA 2′,3′-cyclic phosphate and L1 5′-OH ends remained in close proximity to one another (Fig. 2). We reasoned that the DNA oligonucleotide splint would simulate the situation encountered during tRNA splicing where base pairing interactions within and between the tRNA halves stabilize the cleaved tRNA. The resultant RNA/DNA complex was incubated with purified bacterial RtcB (59) and treated with DNase I to remove the oligonucleotide splint. The RNAs were purified, and nested RT-PCR was used to detect U6/L1 chimeric RNA (Fig. 2).
Fig. 2.
Purified recombinant RtcB ligates U6 RNA to L1 RNA in vitro. (A) The rationale of the U6/L1 in vitro ligation experiment. A synthetic human U6 RNA containing a 2′,3′-cyclic phosphate (>P, red circle) and a synthetic L1 RNA (blue font) containing a 5′-OH (black circle) were generated using a ribozyme-based in vitro transcription reaction. U6 and L1 RNAs were splinted with a cDNA oligonucleotide (DNA splint, italics), and then the resultant RNA/DNA hybrid was incubated with purified RtcB (black rectangle, white font) for 1 h. The reaction was then treated with DNase I, the RNA was purified, and cDNAs were synthesized using the SV40as oligonucleotide primer. RT-PCR reactions using nested primers (U6s1 and SV40as, then U6s2 and 3UTRas3) were used to detect U6/L1 chimeric cDNAs. (B) Schematic representations of the synthetic RNAs used in in vitro experiments. The in vitro transcribed U6 RNA ends in 4 uridine ribonucleotides and contains a 2′,3′-cyclic phosphate (U6 > P) or a 3′-OH (U6-OH). The in vitro transcribed L1 RNA consists of pJM101/L1.3Δneo sequence (nt positions 5752 to 6087) and contains a 5′-OH (OH-L1) or a 5′-triphosphate (P-L1). (C) Results from the in vitro U6/L1 ligation reactions. The constituents of U6/L1 ligation reactions are indicated above each gel lane (+) of the agarose gel image. An asterisk (*) indicates that RtcB was heat treated at 95 °C for 10 min prior to adding it to the reaction. No RT, no RT control; H2O, water PCR controls. DNA size markers (in bp) are shown to the left of the gel image. The predicted position of the 305-bp U6/L1 RT-PCR product is noted on the left side of the gel image (white arrow, red font). (D) Summary of results from product characterization experiments. 
Column 1, synthetic RNAs used in the reaction; column 2, number of RT-PCR products characterized for each reaction condition; column 3, number of RT-PCR products that correspond to the full-length ligation product; column 4, number of RT-PCR products that contain a variably 5′-truncated L1 sequence; column 5, number of putative RT-PCR artifact products. Each in vitro experiment was repeated 3 independent times and yielded similar results.
Agarose gel analyses revealed that reactions containing the U6 > P and OH-L1 templates yielded the predicted 305-bp U6/L1 cDNA product (Fig. 2, lane 3), which was not visible in negative controls (Fig. 2, lanes 1 and 2). DNA sequencing revealed that 39 of 46 (∼85%) U6/L1 cDNA products contained a copy of the 3′ end of a U6 snRNA cDNA ending in 4 thymidine nucleotides precisely conjoined to the 5′-OH of the full-length L1 RNA fragment (Fig. 2). Three products (∼7%) contained a copy of the 3′ end of a U6 snRNA cDNA ending in 4 thymidine nucleotides conjoined to a 5′-truncated L1 fragment. These products could arise from the ligation of U6 > P to broken or degraded L1 RNA fragments containing a 5′-OH (Fig. 2). Four RT-PCR products contained either a 3′-truncated U6 snRNA cDNA sequence conjoined to a 5′-truncated L1, a U6 snRNA followed by a partial hepatitis delta virus ribozyme (HDVr) sequence conjoined to a 5′-truncated L1, and/or untemplated nucleotides at the U6/L1 junction. These products also sometimes exhibited a microhomology of 1 to 3 nucleotides at the U6/L1 junction and structurally resemble previously described artifacts generated during RT-PCR (64) (Fig. 2).
U6/L1 RNA Ligation Requires a 2′,3′-Cyclic Phosphate and a 5′-OH.
To test whether the U6 2′,3′-cyclic phosphate was required for RtcB-mediated U6/L1 RNA ligation, purified RtcB was incubated with a synthetic U6 snRNA that contained a 3′-OH terminus (U6-OH) and an L1 that contained a 5′ hydroxyl end (OH-L1) (Fig. 2). In contrast to reactions with U6 > P, the 305-bp cDNA diagnostic for bona fide U6/L1 chimeric cDNAs was not overtly visible in reactions containing the U6-OH substrate (Fig. 2, lane 4). The majority (39 of 43) of recovered cDNA products appeared to be RT-PCR artifacts similar to those described in the preceding paragraph (Fig. 2). Only 4 RT-PCR products contained a copy of the 3′ end of a U6 snRNA cDNA ending in 4 thymidine nucleotides precisely conjoined to the 5′-OH of the full-length L1 RNA (Fig. 2).
To test whether an L1 5′-OH terminus was required for RtcB-mediated U6/L1 RNA ligation, we incubated purified RtcB with U6 > P and an L1 RNA fragment containing a 5′-triphosphate (P-L1) (Fig. 2). The 305-bp cDNA diagnostic for bona fide U6/L1 chimeras was not overtly visible in reactions that contained P-L1 substrate (Fig. 2, lane 5). We did not recover any full-length U6/L1 chimeric cDNAs (0 of 26) from these reactions; however, 5 of 26 (∼19%) products contained the 3′ end of a U6 snRNA cDNA ending in 4 thymidine nucleotides conjoined to a 5′-truncated P-L1 (Fig. 2). As above, these products could result from the ligation of U6 > P to broken or degraded L1 RNAs containing a 5′-OH. Thus, efficient RtcB-mediated ligation in vitro requires a U6 snRNA ending with a 2′,3′-cyclic phosphate and an L1 RNA that contains a 5′-OH end.
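The substrate requirements summarized above can be written as a small decision rule. The following Python sketch is illustrative only: it encodes just the three substrate pairs tested here (U6 > P with OH-L1, U6-OH with OH-L1, and U6 > P with P-L1), and the names `ThreePrimeEnd`, `FivePrimeEnd`, and `efficient_ligation_expected` are hypothetical. It is not a model of RtcB enzymology beyond these combinations.

```python
from enum import Enum


class ThreePrimeEnd(Enum):
    CYCLIC_PHOSPHATE = "2',3'-cyclic phosphate"  # U6 > P
    HYDROXYL = "3'-OH"                           # U6-OH


class FivePrimeEnd(Enum):
    HYDROXYL = "5'-OH"                   # OH-L1
    TRIPHOSPHATE = "5'-triphosphate"     # P-L1


def efficient_ligation_expected(u6_end: ThreePrimeEnd,
                                l1_end: FivePrimeEnd) -> bool:
    """True only for the end chemistry that yielded the 305-bp product."""
    return (u6_end is ThreePrimeEnd.CYCLIC_PHOSPHATE
            and l1_end is FivePrimeEnd.HYDROXYL)


# The three substrate pairs tested with purified RtcB in this section.
tested_pairs = {
    "U6>P + OH-L1": (ThreePrimeEnd.CYCLIC_PHOSPHATE, FivePrimeEnd.HYDROXYL),
    "U6-OH + OH-L1": (ThreePrimeEnd.HYDROXYL, FivePrimeEnd.HYDROXYL),
    "U6>P + P-L1": (ThreePrimeEnd.CYCLIC_PHOSPHATE, FivePrimeEnd.TRIPHOSPHATE),
}
expected = {name: efficient_ligation_expected(*ends)
            for name, ends in tested_pairs.items()}
```

The rule describes whether the full-length product is expected; it does not account for the truncated products that may arise when degraded L1 RNA exposes a new 5′-OH.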
U6 snRNA and L1 RNA Are Ligated in HeLa Cell Nuclear Extracts.
We next tested whether U6 and L1 RNA could be ligated in cell-free extracts. Briefly, we incubated U6 > P and OH-L1 RNA in HeLa cell nuclear extracts and then used the same nested RT-PCR strategy used in the in vitro assay to detect U6/L1 chimeric cDNAs (Fig. 2). Of note, we did not use a DNA oligonucleotide splint in these assays as endogenous RNase H activity may result in the degradation of the complementary regions of the U6 > P and OH-L1 RNAs. Control Western blots confirmed the presence of RtcB in HeLa nuclear extracts (Fig. 3).
Fig. 3.
HeLa cell nuclear extracts mediate the ligation of U6 and L1 RNAs. (A) Results from U6/L1 ligation reactions using HeLa cell nuclear extracts. The ligated U6/L1 RNA was purified from ligation reactions and analyzed using RT-PCR and agarose gel electrophoresis. The constituents of U6/L1 ligation reactions are indicated above each lane (+) of the representative agarose gel image. An asterisk (*) indicates that the HeLa cell nuclear extract was heat treated at 95 °C for 10 min prior to adding it to the reaction. No RT, no RT control; H2O, water PCR controls. DNA size markers (in bp) are shown to the left of the gel image. The predicted position of the 305-bp U6/L1 RT-PCR product is noted on the left side of the gel image (white arrow, red font). (B) Summary of results of ligation reactions using HeLa cell extracts. Column 1, RNAs used in the reaction; column 2, number of RT-PCR products characterized for each reaction condition; column 3, number of RT-PCR products that correspond to the full-length ligation product; column 4, number of RT-PCR products that contain a variably 5′-truncated L1 sequence; column 5, number of putative RT-PCR artifact products. (C) Structures of U6/L1 chimeric RNAs containing 5′-variably truncated L1 sequences. A schematic of the L1 fragment used as a template for the in vitro transcription reaction is represented at the top. The horizontal black lines indicate the approximate length of the L1 sequence conjoined to the U6 poly(T) tract. U6/L1 chimeric RNAs were isolated from 19 independent experiments. (D) Depletion of RtcB protein expression in HeLa cell nuclear extracts. Western blot images depicting RtcB expression (green arrow) in HeLa nuclear extracts. Extract sources are indicated above each lane (HeLa indicates untransfected HeLa extracts). Each lane represents an independent biological replicate. The Western blot experiment was done twice. Nucleolin (NCL) (red arrow) was used as a loading control. 
Approximate molecular weights are indicated to the left of the gel image. (E) Depletion of RtcB affects U6/L1 ligation efficiency in HeLa extracts. The x axis indicates the experimental condition. The y axis indicates the normalized U6/L1 ligation efficiency. Ligation efficiencies were normalized to untransfected HeLa-JVM extracts, which are set to 1. The ligation efficiency value represents the average of 6 independent RT-qPCR experiments. Error bars indicate SDs. Two-tailed t tests were used to determine significance. An asterisk (*) indicates P value < 0.05; n.s., not significant.
Similar to experiments using purified RtcB, the predicted 305-bp U6/L1 cDNA product was detected in reactions containing U6 > P and OH-L1 (Fig. 3, lane 3) but was not visible in negative control reactions that either lack or contain heat-treated HeLa cell nuclear extracts (Fig. 3, lanes 1 and 2). Sequencing of RT-PCR products revealed that 47 of 106 (∼44%) products consisted of the 3′ end of a copy of a U6 snRNA cDNA ending in 4 thymidine nucleotides conjoined precisely to the 5′-OH of the full-length OH-L1 RNA (Fig. 3). By comparison, 38 of 106 (∼36%) products consisted of the 3′ end of a copy of a U6 snRNA cDNA ending in 4 thymidine nucleotides conjoined to a 5′-truncated OH-L1 (Fig. 3). The prevalence of U6/L1 chimeras conjoined to 5′-truncated L1 sequences was higher in reactions conducted with HeLa cell nuclear extracts when compared to in vitro reactions conducted with purified RtcB (36% vs. 7%), suggesting that L1 RNA may be cleaved by a ribonuclease activity that generates 5′-OH ends in the HeLa cell nuclear extracts. As described above, we also recovered 21 of 106 (∼20%) putative artifactual RT-PCR products (64).
Incubation of the U6-OH and OH-L1 substrates in HeLa cell nuclear extracts did not yield the predicted 305-bp U6/L1 cDNA product (Fig. 3, lane 4). DNA sequence analyses revealed that 38 of 43 (∼88%) products likely represent artifacts generated during RT-PCR (64). 
Four out of 43 (∼9%) products contained the 3′ end of a U6 snRNA cDNA ending in 5 thymidine nucleotides conjoined precisely to the 5′-OH of the full-length OH-L1 RNA substrate. The synthetic U6 > P RNA ends in 4 uridine ribonucleotides whereas the mature forms of human U6 snRNA typically end in 5 uridine ribonucleotides (45); thus, endogenous U6 snRNA may have been ligated to the OH-L1 RNA in these reactions. Finally, 1 product contained the 3′ end of a copy of a U6 snRNA cDNA ending in 4 thymidine nucleotides followed by 5′-truncated L1 (Fig. 3). Thus, a U6 2′,3′-cyclic phosphate is required for efficient U6/L1 ligation in HeLa cell nuclear extracts.
To test whether a 5′-OH was required for U6/L1 ligation, we incubated U6 > P and P-L1 RNAs in HeLa cell nuclear extracts. We did not detect the predicted 305-bp U6/L1 cDNA product (Fig. 3, lane 5). Sequence analyses did not reveal the presence of the predicted U6/L1 chimeric cDNA; however, 16 of 23 (∼70%) of the cDNA products consisted of a copy of the 3′ end of a U6 snRNA cDNA ending in 4 thymidine nucleotides followed by a variably 5′-truncated P-L1 sequence (Fig. 3). These U6/L1 junctions occurred throughout L1 RNA, and there was not an overt sequence motif within L1 RNA that facilitated U6/L1 formation (Fig. 3). These data are consistent with results from reactions where U6 > P and OH-L1 were incubated with HeLa cell nuclear extracts, suggesting that L1 RNA may be processed by a ribonuclease activity in the HeLa cell nuclear extracts either prior to or during the ligation reaction. The remaining 7 of 23 (∼30%) cDNA products were characterized as putative RT-PCR artifacts (64). Thus, efficient U6/L1 ligation in HeLa cell nuclear extracts requires U6 2′,3′-cyclic phosphate and 5′-OH L1 substrates.
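The product tallies reported in the text for the purified RtcB and nuclear extract reactions can be cross-checked with simple arithmetic. The following Python sketch uses only counts stated in the text; the dictionary layout and the helper name `percent` are assumptions made for illustration, and percentages are rounded to the nearest integer to match the approximate values quoted above.

```python
def percent(count: int, total: int) -> int:
    """Percentage of `count` in `total`, rounded to the nearest integer."""
    return round(100 * count / total)


# (full-length, 5'-truncated L1, putative RT-PCR artifact) counts per reaction.
tallies = {
    "purified RtcB: U6>P + OH-L1": (39, 3, 4),
    "nuclear extract: U6>P + OH-L1": (47, 38, 21),
    "nuclear extract: U6-OH + OH-L1": (4, 1, 38),
    "nuclear extract: U6>P + P-L1": (0, 16, 7),
}

summary = {}
for reaction, (full_length, truncated, artifact) in tallies.items():
    total = full_length + truncated + artifact
    summary[reaction] = {
        "total": total,
        "full_length_pct": percent(full_length, total),
        "truncated_pct": percent(truncated, total),
        "artifact_pct": percent(artifact, total),
    }

for reaction, row in summary.items():
    print(reaction, row)
```

Summing the three categories recovers the number of products characterized for each reaction (46, 106, 43, and 23), and the truncated-product percentages reproduce the 36% vs. 7% comparison between nuclear extracts and purified RtcB.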
Depletion of RtcB from HeLa Cells Affects U6/L1 Ligation Efficiency.
Ligation reactions using HeLa nuclear extracts suggested that RtcB might ligate U6/L1 RNA in HeLa cells. To test this hypothesis, we utilized CRISPR/Cas9 gene editing to generate 2 HeLa cell lines (RtcB2.1 and RtcB2.2) exhibiting reduced RtcB protein expression. Sanger sequencing revealed that the RtcB alleles in RtcB2.1 and RtcB2.2 contain genomic edits that are predicted to result in either frame-shift mutations or in-frame deletions in the RtcB amino acid sequence. Western blots confirmed that steady-state RtcB protein levels were reduced by ∼80% and ∼73% for the RtcB2.1 and RtcB2.2 clonal cell lines, respectively (Fig. 3). We were unable to isolate a HeLa cell clone with a complete knockout of RtcB protein expression, likely because RtcB is an essential gene (68) and the total loss of RtcB protein expression would prevent cell growth and/or viability (69, 70).
To examine U6/L1 RNA ligation efficiency in RtcB-depleted HeLa nuclear extracts, we incubated the extracts with synthetic U6 RNA ending in a 2′,3′-cyclic phosphate and an L1 RNA that contained a 5′ hydroxyl. Quantitative reverse transcriptase PCR (RT-qPCR) revealed that U6/L1 ligation efficiency was reduced ∼4- to 5-fold in the RtcB2.1 and RtcB2.2 clonal cell lines, but that neither RtcB protein levels nor U6/L1 ligation efficiency was affected in a clonal cell line containing a negative control sgRNA that targeted GFP (Fig. 3).
To determine whether ectopic RtcB expression could complement the defect in the RtcB2.1 and RtcB2.2 cell lines, we transfected them with an RtcB cDNA expression plasmid. Western blots revealed that ectopic RtcB expression led to an ∼1.4-fold and ∼1.5-fold increase in RtcB protein expression in the RtcB2.1 and RtcB2.2 clonal cell lines when compared to HeLa-JVM cells (Fig. 3). RT-qPCR experiments revealed that ectopic RtcB expression partially rescued ligation efficiency by ∼2-fold in the RtcB2.1 and RtcB2.2 extracts (Fig. 3).
These data indicate that endogenous RtcB contributes to U6/L1 RNA ligation in HeLa cell extracts.
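The normalization described for the RT-qPCR data (ligation efficiencies expressed relative to untransfected HeLa-JVM extracts, which are set to 1) can be illustrated with a short sketch. The raw values below are invented for illustration and are not data from this study; the `normalize` helper is likewise a hypothetical name. Significance testing (the two-tailed t test) is deliberately left out.

```python
from statistics import mean, stdev

def normalize(values, control_values):
    """Scale each value so that the mean of the control condition equals 1."""
    ref = mean(control_values)
    return [v / ref for v in values]

# Hypothetical raw ligation signals from 6 independent experiments (arbitrary units).
control = [1.02, 0.95, 1.10, 0.98, 1.05, 0.90]   # untransfected HeLa-JVM extracts
depleted = [0.24, 0.20, 0.26, 0.21, 0.23, 0.18]  # an RtcB-depleted clone

norm_control = normalize(control, control)
norm_depleted = normalize(depleted, control)

# Fold reduction is the ratio of the normalized means.
fold_reduction = mean(norm_control) / mean(norm_depleted)

print(f"control:  {mean(norm_control):.2f} +/- {stdev(norm_control):.2f}")
print(f"depleted: {mean(norm_depleted):.2f} +/- {stdev(norm_depleted):.2f}")
print(f"fold reduction: {fold_reduction:.1f}")
```

With these made-up inputs the fold reduction falls between 4 and 5, which is the range reported for the RtcB2.1 and RtcB2.2 clones; the error bars in the figure correspond to the standard deviations computed here.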
U6 snRNA and GFP Are Ligated in HeLa Cell Nuclear Extracts.
The HGR contains chimeric pseudogenes that consist of a full-length copy of U6 conjoined to cDNAs derived from other cellular mRNAs (58), suggesting that U6 RNA can be ligated to non-L1 RNAs. To test this hypothesis, we generated a 310-nucleotide RNA fragment that corresponds to the 3′-end of a humanized Renilla GFP cDNA that contains a 5′-OH (OH-GFP). We incubated U6 > P and OH-GFP RNA in HeLa cell nuclear extracts and then used RT-PCR to detect cDNA products. Incubation of the U6 > P and OH-GFP RNA substrates with HeLa cell nuclear extracts yielded the predicted 232-bp cDNA product. Sequencing of the cDNA products revealed that 9 of 23 (∼39%) consisted of a copy of the 3′ end of a U6 snRNA ending in 4 thymidine nucleotides conjoined precisely to the 5′-OH of the full-length GFP RNA fragment whereas 6 of 23 (∼26%) contained a copy of the 3′ end of a U6 snRNA ending in 4 thymidine nucleotides followed by a 5′-truncated GFP RNA fragment. The remaining 8 of 23 (∼35%) cDNA products contained a 3′-truncated U6 snRNA sequence conjoined to a 5′-truncated GFP, U6 snRNA followed by a partial HDVr sequence conjoined to a 5′-truncated GFP, and/or untemplated nucleotides at the U6/GFP junction. These sequences sometimes exhibited 1 to 3 nucleotide microhomologies at the U6/GFP junction and likely represent artifacts generated during RT-PCR (similar in structure to the U6/L1 artifacts described above, except that GFP replaces L1) (64). These data suggest that chimeric RNA formation is not exclusive to L1 RNA.
Endogenous U6/L1 RNA Is Part of the Transcriptome in Human Cells.
Data from transfection-based experiments and in vitro ligation experiments suggested that U6 snRNA could be ligated to L1 RNA in vivo; thus, we sought to determine whether U6/L1 chimeric RNAs were part of the transcriptome in human cells. We searched for U6/L1 junction reads in 100-base pair paired-end RNA sequencing (RNA-seq) data generated from 2 independent HeLa cell lines (HeLa-JVM and HeLa-HA), a human embryonic carcinoma cell line (PA-1), a human embryonic stem cell line (H9-hESCs), and H9-derived neural progenitor cells (NPCs) (Fig. 4). Each of these cell lines can accommodate the retrotransposition of engineered human L1s (22, 71–74). We identified 398 U6/L1 chimeric RNA read-pairs out of ∼1.1 × 10⁹ RNA sequencing reads across the 5 cell lines. After removing PCR duplicate reads, we merged overlapping reads to identify 64 intact U6/L1 junction sequences.
Fig. 4.
RNA-seq detection of endogenous U6/L1 chimeric RNAs in human cell lines. (A) Rationale of the RNA-seq experiments. Step 1: Ribosome-depleted RNA was fragmented to ∼190 nucleotides and subjected to 100-bp paired-end DNA sequencing. Step 2: RNA-seq read pairs (arrows) were aligned to a repeat-masked version of the human reference genome (HGR/build GRCh38), which contained “spiked-in” copies of a single U6 (white rectangle) and a single L1.3 (blue rectangle) sequence. RNA-seq reads that did not map to U6 or L1 were discarded from subsequent analyses. Step 3: Overlapping U6/L1 read pairs were merged to determine the U6/L1 junction sequences. U6/L1 read pairs that contained a gap (i.e., “no overlap”) were discarded from subsequent analyses. Step 4: Overlapping U6/L1 junctions were aligned to the unmasked HGR. If the U6/L1 junction mapped to the HGR with >90% accuracy, it was designated as an “aligned” read. U6/L1 junctions that failed to map to the HGR with at least 90% accuracy were designated as “non-aligned” reads. (B) Structures of RNA-seq U6/L1 junctions. A schematic of a full-length RC-L1 is indicated at the top. The general structure of a U6/L1 chimeric junction sequence consists of the 3′ end of a U6 snRNA cDNA sequence ending in ∼4 to 8 thymidine nucleotides (left side of figure; white arrow ending in Tn) conjoined to a variably 5′-truncated L1 sequence. Two independent RNA-seq libraries (squares and triangles, respectively) were generated from each of the HeLa-JVM, H9, NPC, and PA-1 cell lines. One RNA-seq library was generated from HeLa-HA cells. Red horizontal lines, HeLa-JVM; green horizontal lines, HeLa-HA; yellow horizontal lines, PA-1; black horizontal lines, H9; blue horizontal lines, human NPCs. Each horizontal dashed line represents a single U6/L1 junction RNA-seq merged sequence read. The triangle or square at the left end of the horizontal dashed lines indicates the approximate location of the U6/L1 junction point relative to L1.3.
The top set of dashed lines represents 16 U6/L1 junction sequences that mapped (“aligned”) to the HGR. The bottom set of dashed lines represents 33 U6/L1 junction sequences that did not map (“non-aligned”) to the HGR. These 33 U6/L1 junctions contained a copy of U6 conjoined to an L1 present in the same transcriptional orientation. The remaining 4 U6/L1 chimeras that did not map to the HGR contained a copy of U6 conjoined to an L1 present in the opposite transcriptional orientation.
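Steps 3 and 4 of the rationale in panel A (merging overlapping read pairs, then labeling junctions by how well they map to the reference) can be sketched as follows. This is a simplified illustration under stated assumptions, not the software used in the study: the function names, the exact-match overlap rule, and the `min_overlap` default are all hypothetical, and real read-merging tools tolerate sequencing errors.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(read1, read2, min_overlap=10):
    """Merge a read pair when the 3' end of read1 overlaps the
    reverse complement of read2.

    Returns the merged sequence, or None when no exact overlap of at
    least min_overlap bases exists (the "no overlap" case, discarded).
    """
    mate = reverse_complement(read2)
    # Try the longest candidate overlap first.
    for k in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if read1[-k:] == mate[:k]:
            return read1 + mate[k:]
    return None

def classify(identity, threshold=0.90):
    """Label a merged junction by its alignment identity to the reference.

    Assumption: identity exactly equal to the threshold is treated as
    "non-aligned", because the caption requires >90% for "aligned".
    """
    return "aligned" if identity > threshold else "non-aligned"

# Toy 30-nt fragment sequenced as two 20-nt reads that overlap by 10 nt.
fragment = "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"
r1 = fragment[:20]
r2 = reverse_complement(fragment[-20:])
merged = merge_pair(r1, r2)
print(merged == fragment, classify(0.95), classify(0.85))
```

With ∼190-nucleotide fragments and 100-bp reads, the two mates of a pair are expected to overlap by roughly 10 bases, which is why only pairs with a detectable overlap yield an unambiguous junction sequence.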
Hand annotation of the 64 U6/L1 junctions revealed that 53 (∼83%) consisted of the 3′ end of U6 snRNA cDNA ending in ∼4 to 8 thymidine nucleotides conjoined to a variably 5′-truncated L1 sequence (Fig. 4). Of note, 4 of these 53 U6/L1 chimeras consisted of U6 ending in 5 to 7 thymidines conjoined to an L1 sequence in an antisense orientation, suggesting that U6 can become conjoined to both sense and antisense L1 RNAs. As above, there was not a specific sequence within L1 that appeared to facilitate U6/L1 chimera formation (Fig. 4). The remaining 11 of 64 (∼17%) U6/L1 sequences contained a 3′-truncated U6 snRNA conjoined to a 5′-truncated L1 and were excluded from further analysis as they were structurally similar to the template-switching artifacts generated during cDNA synthesis described above (64). Thus, 53 bona fide unique U6/L1 chimeras were subjected to further analysis.
Most U6/L1 Chimeric RNA Sequences Do Not Align to the Genome.
The low proportion of U6/L1 chimera RNAs in our dataset suggested that they might represent a rare subset of RNAs in human cells. To determine whether the U6/L1 chimeras detected in RNA-seq experiments were derived from the transcription of an existing genomic U6/L1 or represented unique chimeric RNAs, the 53 U6/L1 chimeric RNAs were used as probes in BLAT searches of the HGR (Fig. 4). Sixteen out of 53 (∼30%) U6/L1 junctions were present in the HGR, suggesting that they could have resulted from the transcription of extant U6/L1 pseudogenes. Seven out of the 16 putative transcribed U6/L1 chimeric RNAs were detected in multiple cell lines (Fig. 4), and 7 were supported by multiple reads from the same cell line (Fig. 4). The 16 genomic U6/L1 chimeric pseudogenes that served as putative transcription templates that gave rise to chimeric U6/L1 RNAs exhibited L1 structural hallmarks. They consisted of a full-length U6 snRNA sequence ending in 5 to 7 thymidine nucleotides conjoined to a variably 5′-truncated L1, were flanked by 6- to 19-bp target site duplications, and inserted into an L1 EN consensus cleavage sequence. By comparison, 37 out of 53 (∼70%) U6/L1 junction sequences did not align to the HGR and were unique to a single cell line (Fig. 4). Thirty-one of 37 junctions were supported by a single merged read pair, 5 of 37 junctions were supported by 2 merged reads, and 1 junction was supported by 3 merged reads (Fig. 4).
Human-specific L1 insertions can be polymorphic with respect to presence/absence in the human population (75); thus, it is conceivable that some of the cell lines used to generate RNA-seq data could contain a genomic U6/L1 chimeric pseudogene that is absent from the HGR. To examine this possibility, we used the 53 U6/L1 junctions as probes to query HeLa genome sequencing data available in the database of Genotypes and Phenotypes (dbGaP accession number phs000640.v1.p1) (76–78).
Controls revealed that the 16 U6/L1 junction sequences that aligned to the HGR were also present in the HeLa genome data (Fig. 4). By comparison, the 37 non-aligned U6/L1 junction sequences were absent from HeLa genome data.
To further validate the uniqueness of the 37 U6/L1 junction sequences, we aligned the 53 U6/L1 junction sequences to 23 high-coverage individual genomes representing 23 distinct human geographic populations from the 1000 Genomes Project dataset (79). The 16 U6/L1 junction sequences that were present in the HGR and HeLa cell genomic datasets also were present in each of the 23 high-coverage 1000 Genomes Project individual genomes; 2 genomes (NA20845 and HG03742) contained an SNP in the U6 portion of the junction sequences. In contrast, the 37 non-aligned U6/L1 junction sequences were absent from the high-coverage 1000 Genomes Project individual genomes. Thus, these data suggest that the 37 U6/L1 junctions detected in our RNA-seq experiments do not correspond to existing U6/L1 genomic pseudogenes and that different cell types may contain a unique cohort of chimeric RNAs that are generated by posttranscriptional RNA ligation events.
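The bookkeeping behind the junction counts reported in this section can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the function and key names are hypothetical and introduced purely for illustration.

```python
def summarize_junctions():
    """Recompute the junction tallies reported for the RNA-seq analysis."""
    total = 64                      # intact merged U6/L1 junction sequences
    artifacts = 11                  # 3'-truncated U6 joined to 5'-truncated L1
    bona_fide = total - artifacts   # junctions kept for further analysis
    aligned = 16                    # junctions present in the reference genome
    non_aligned = bona_fide - aligned
    # Merged reads per non-aligned junction -> number of such junctions.
    support = {1: 31, 2: 5, 3: 1}
    return {
        "bona_fide": bona_fide,
        "non_aligned": non_aligned,
        "pct_bona_fide": round(100 * bona_fide / total),
        "pct_aligned": round(100 * aligned / bona_fide),
        "pct_non_aligned": round(100 * non_aligned / bona_fide),
        "support_total": sum(support.values()),
        "single_read_fraction": support[1] / non_aligned,
    }

summary = summarize_junctions()
for key, value in summary.items():
    print(key, value)
```

The support categories (31 + 5 + 1) account for all 37 non-aligned junctions, and most of them rest on a single merged read pair, which is consistent with the interpretation that these chimeras are rare and not templated by an existing genomic locus.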
Discussion
RNA Ligation Generates Chimeric U6/L1 RNA.
Here, we demonstrated that U6/L1 chimeric RNAs are generated in transfected HeLa cells independently of L1 retrotransposition (Fig. 1). In vitro ligation assays demonstrated that purified RtcB can ligate U6 snRNA substrates ending in a 2′,3′-cyclic phosphate to L1 RNAs containing a 5′-OH (Fig. 2) and that replacement of either the U6 2′,3′-cyclic phosphate with a 3′-OH or the L1 5′-OH with a 5′ triphosphate dramatically reduced the formation of RtcB-mediated ligation products. Additional assays revealed that HeLa cell nuclear extracts can mediate ligation of a U6 snRNA substrate ending in a 2′,3′-cyclic phosphate to an L1 RNA containing a 5′-OH (Fig. 3), that depleting RtcB from HeLa extracts decreases U6/L1 ligation efficiency (Fig. 3), and that U6 RNA can be ligated to non-L1 RNAs. Finally, RNA-seq experiments revealed that endogenous U6/L1 RNA chimeras are a component of the transcriptome of multiple human cell types (Fig. 4). Together, these data provide a mechanistic explanation for how RtcB can mediate the ligation of U6 snRNA to a diverse cohort of cellular RNAs containing 5′-OH ends. It remains possible that another RNA ligation activity present in HeLa cells could also contribute to U6/L1 chimera formation. For example, previous studies have shown that a yeast-like tRNA ligase activity can join 2′,3′-cyclic phosphate and 5′-OH containing RNAs in vertebrate cells; however, this ligase remains to be identified (80).
RNA Ligation in HeLa Cell Extracts.
U6/L1 ligation reactions using purified bacterial RtcB required an oligonucleotide splint, presumably to keep the U6 snRNA 2′,3′-cyclic phosphate and L1 5′-OH ends in close proximity to one another. In contrast, U6/L1 ligation reactions using HeLa cell nuclear extracts did not require an oligonucleotide splint. The human form of RtcB is 505 amino acids in length, and protein sequence alignments using the Universal Protein Resource (UniProt) align tool (81) showed that human RtcB is highly conserved (>90%) among other vertebrate forms of RtcB. However, human RtcB only shares ∼25% amino acid identity with RtcB from Escherichia coli (408 amino acids in length), suggesting that changes to the RtcB amino acid sequence over evolutionary time could have affected RtcB enzymatic activity and/or function.
It is also possible that cellular proteins within HeLa nuclear extracts may help facilitate U6/L1 RNA ligation. Previous studies have revealed that L1 RNPs are associated with numerous RNA binding proteins and cellular RNAs, including U6 snRNA (82–87). Moreover, RtcB has been detected in GFP-tagged L1 ORF1p cytoplasmic granules (83), cytoplasmic stress granules (88), and other protein/RNA complexes within the cell (60, 89, 90) whereas U6 snRNA associates with cellular RNA binding proteins, including La and the Lsm2-8 ring (38, 91, 92). Of note, the Lsm2-8 ring is ejected from U6 during splicing, and U6 snRNA must be recycled, which could provide other cellular proteins access to the 3′ end of U6 snRNA. Thus, it is plausible that protein–protein or protein–RNA interactions within HeLa cell nuclear extracts may help bring U6 snRNA and L1 RNA into close proximity to facilitate ligation.
The L1 RNA May Be Processed by Endoribonuclease Activity in HeLa Cells.
In contrast to experiments conducted with recombinant bacterial RtcB, almost half of the U6/L1 ligation products characterized using HeLa cell nuclear extracts consisted of the 3′ end of U6 snRNA cDNA ending in 4 thymidine nucleotides conjoined to a 5′-truncated L1 (Fig. 3). This result was evident in reactions with L1 RNA substrates containing either a 5′-OH or a 5′ triphosphate. Given that efficient ligation reactions were dependent upon U6 snRNA ending in a 2′,3′-cyclic phosphate and L1 RNA with a 5′-OH, these data raise the possibility that L1 RNAs may be processed by an endogenous endoribonucleolytic activity within HeLa cell nuclear extracts prior to their ligation to U6 snRNA. A number of cellular endoribonucleases generate RNA fragments with 5′-OH ends (93), including the tRNA splicing endonuclease (66), RNase L (94, 95), angiogenin (96, 97), and inositol-requiring enzyme 1 (IRE1) (98). RNase L, for example, can target viral as well as host cellular RNAs (94, 95, 99) and inhibits L1 retrotransposition by targeting L1 RNA (100). Recent reports have also provided evidence that RtcB associates with IRE1, an endoribonuclease involved in the unfolded protein response (UPR) (69, 98). It is possible that other classes of RNA endoribonucleases and/or exoribonucleases that generate RNA 5′-phosphate ends also could process the L1 RNA. However, in these instances, the 5′-phosphate end would need to be hydrolyzed by a 5′-phosphatase to generate a 5′-OH end suitable for ligation by RtcB. Thus, the most parsimonious explanation of the data is that the L1 RNA is processed by an as-yet-unknown endoribonucleolytic activity within HeLa cell nuclear extracts to generate a truncated L1 RNA containing a 5′-OH.
Chimeric U6/L1 RNAs Are Present in Human Cells.
RNA-seq experiments demonstrated that U6/L1 chimeric RNAs are a component of the transcriptome in human cancer cell lines, hESCs, and human NPCs (Fig. 4). The majority (∼70%) of U6/L1 chimeric RNA-seq reads failed to align to the HGR, HeLa, or 23 high-coverage genomes in the 1000 Genomes Project dataset (Fig. 4), suggesting that they were generated de novo by a posttranscriptional mechanism that results in the formation of a unique U6/L1 chimeric RNA molecule. Consistent with this conclusion, the majority of the “non-aligned” junctions were supported by a single RNA-seq read and are unique to a single cell line. In contrast, ∼30% of the U6/L1 chimeric RNAs aligned to the HGR, HeLa, and 23 high-coverage genomes in the 1000 Genomes Project dataset (Fig. 4), indicating that they are generated from existing U6/L1 pseudogenes. Of note, 13 of these U6/L1 pseudogenes are embedded within the introns of annotated RefSeq genes, with 8 oriented in the same transcriptional orientation as the gene, suggesting that U6/L1 chimeric RNAs could be expressed as part of an RNA polymerase II pre-mRNA transcript. Vertebrate U6 snRNA is transcribed by RNA polymerase III and relies on upstream promoter elements to drive its transcription (38, 40). Thus, unless U6/L1 chimeric pseudogenes fortuitously inserted downstream of a promoter that could augment RNA polymerase III transcription, it remains unlikely that U6/L1 chimeric RNAs are transcribed as discrete transcription units.
A Model of U6/L1 Pseudogene Formation.
U6 snRNA is enriched in the nucleus (101) whereas RtcB is present in both the nucleus and cytoplasm (69, 102). Our data, in conjunction with previous knowledge of the L1 retrotransposition cycle, lead us to propose a mechanism for how U6/L1 chimeric pseudogenes are generated in human cells (Fig. 5). We posit that a ribonucleoprotein complex that minimally contains U6 snRNA, RtcB, and an undefined endoribonuclease can cleave the L1 RNA present in wild-type L1 RNPs to generate a 5′-OH end, which, in turn, allows L1 RNA to be ligated to the U6 snRNA 2′,3′-cyclic phosphate. As L1 ORF2p preferentially associates with the L1 poly(A) tail (23, 31), the resultant chimeric U6/L1 RNAs likely retrotranspose in cis. The majority of chimeric pseudogenes identified in the HGR consist of the full-length U6 snRNA sequence ending in a poly(T) tract conjoined to a variably 5′-truncated L1 sequence (53, 55, 57, 58); thus, this mechanism provides a plausible explanation for the generation of most U6/L1 chimeric pseudogenes. U6atac snRNA also ends in a 2′,3′-cyclic phosphate (103); thus, the above model also could explain how U6atac/L1 chimeric pseudogenes are generated in human cells (55, 57).
Fig. 5.
A model of U6/L1 RNA pseudogene formation. Following transcription, L1 RNA (black wavy line) is exported to the cytoplasm (step 1). L1 RNA is translated, and the L1 proteins, ORF1p (yellow circles) and ORF2p (blue circle) bind to their encoding L1 RNA to form an L1 ribonucleoprotein particle (RNP) (step 2). After RNP formation, the L1 RNA is cleaved by an unidentified endoribonuclease to generate 5′-truncated L1 RNA with a 5′-OH (step 3). RtcB, or a related ligase, ligates U6 snRNA to 5′-truncated L1 RNA (step 4). The resultant chimeric U6/L1 RNA is inserted into genomic DNA by TPRT (in cis), which results in the formation of a U6/L1 chimeric pseudogene (white band on black chromosome) (step 5). It is possible that L1 RNA could also be processed by nuclease activity (red scissors) and ligated to U6 while still in the nucleus (step 6). In this scenario, it is possible the chimeric U6/L1 RNA could undergo retrotransposition by trans-complementation.
Our data further demonstrate that U6 snRNA can become conjoined to retrotransposition-defective L1 and other cellular RNAs. However, because the retrotransposition of these RNAs would need to coopt the L1-encoded proteins in trans (31), they probably occur less frequently than U6/L1 chimeric pseudogenes that are formed in cis. Our model does not rule out the possibility that some snRNA/L1 chimeric pseudogenes arise during TPRT by a template-switching mechanism. For example, a small number of retro-pseudogene sequences consist of other small RNA species (e.g., U1, U3, U5, and 5S rRNA), which tend to be 3′-truncated and are conjoined to either 5′-truncated L1s or other cellular RNAs (53, 55, 57, 58).
As these small RNAs are not normally modified at their 3′ ends to contain a 2′,3′-cyclic phosphate, it remains possible that the resultant chimeric products are formed by an RtcB-independent mechanism, or the 3′ truncation of the small RNAs is created by an endoribonuclease that generates a 2′,3′-cyclic phosphate or 3′-monophosphate end, either of which may serve as a substrate for ligation by RtcB to a 5′-OH (66, 67).
Conclusions.
We provide mechanistic evidence for how U6/L1 chimeric pseudogenes are formed in the human genome. Our data suggest that, in addition to their roles in mRNA and tRNA splicing, U6 snRNA and RtcB are involved in the formation of chimeric cellular RNAs, thereby contributing to RNA transcriptome diversity. The presence of a 2′,3′-cyclic phosphate on human U6 snRNA stabilizes it and prevents it from degradation (38). We speculate that RtcB-mediated ligation of U6 to L1 or other cellular RNAs containing a 5′-OH would eliminate the U6 2′,3′-cyclic phosphate, which could, in principle, facilitate U6 snRNA turnover.
Methods
Oligonucleotide Sequences.
Complete methods on cell culture, RT-PCR, RNA ligation assays, CRISPR gene editing, quantitative real-time PCR, and RNA-seq can be found in . A list of oligonucleotides used in this study is provided in . The University of Michigan Human Pluripotent Stem Cell Research Oversight (HPSCRO) Committee has approved work with hESCs in the Moran lab (Record no. 1004/1023).